What is the ORDER BY command in Hive?
In Hive, after we create a table and load data, querying it with ORDER BY makes Hive use a single reducer, because
ORDER BY imposes a global ordering on all of our records. In the MapReduce job that the Hive engine executes, this means every record is sent to one reducer in the reduce phase. On a very large data set this can cause memory problems and lead to long execution times.
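As a minimal sketch, assuming a hypothetical table named employees with columns id, name and salary (none of these names come from the original text), a globally ordered query might look like this:

```sql
-- Hypothetical table and columns, used only for illustration.
CREATE TABLE IF NOT EXISTS employees (id INT, name STRING, salary DOUBLE);

-- ORDER BY produces one globally sorted result,
-- so the final sort is handled by a single reducer.
SELECT id, name, salary
FROM employees
ORDER BY salary DESC;
```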
How to set more reducers in Hive
ORDER BY in Hive can lead to memory issues because
Hive sorts the entire data set at a global level. A natural first step to avoid this is to set a larger number of reducers.
The number of reducers is set with the following command.
> SET mapreduce.job.reduces=5;
Here the number of reducers is set to 5.
However, even if we set more than one reducer, a query that uses ORDER BY will still run with only one reducer.
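The behaviour described above can be sketched as follows. The table employees and its columns id, name and salary are hypothetical and are not part of the original text.

```sql
-- Request 5 reducers for the current session.
SET mapreduce.job.reduces=5;

-- Print the current value to confirm the setting was applied.
SET mapreduce.job.reduces;

-- employees is a hypothetical table used only for illustration.
-- Because of ORDER BY, the global sort still runs on one reducer,
-- regardless of the value requested above.
SELECT id, name, salary
FROM employees
ORDER BY salary DESC;
```

One way to check this on your own cluster is to look at the reducer count that Hive reports when the job starts.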