Correct Answer: B
Data skew arises when data is distributed unevenly across the resources that should process it. Because each resource has a capacity limit, some processes receive too little data and others too much, which reduces overall processing efficiency.
When data skew occurs, there are several methods to identify and resolve it. All of the options except reducing the number of partition keys help reduce data skew. Reducing the number of partition keys may actually increase it, because fewer keys means more rows share the same key value and land in the same partition.
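The effect of the number of partition keys can be sketched with a toy model. This is only an illustration, not how Azure Data Lake Analytics assigns partitions: the column names `state` and `day`, the helper functions, and the rule that each distinct key goes to the next partition in order of first appearance (standing in for a hash) are all assumptions made for the example.

```python
def partition_sizes(rows, key_columns, num_partitions):
    """Count rows per partition when partitioning on key_columns.

    Toy assignment rule (an assumption): each distinct key is sent to the
    next partition in order of first appearance, standing in for a hash.
    """
    assignment = {}                 # distinct key -> partition index
    sizes = [0] * num_partitions
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        if key not in assignment:
            assignment[key] = len(assignment) % num_partitions
        sizes[assignment[key]] += 1
    return sizes


def skew_ratio(sizes):
    """Largest partition divided by the mean partition size (1.0 = even)."""
    return max(sizes) / (sum(sizes) / len(sizes))


# Hypothetical data: one "hot" state with 8 rows, four states with 1 row each.
rows = [{"state": "CA", "day": d} for d in range(8)]
rows += [{"state": s, "day": 0} for s in ("WA", "OR", "NV", "AZ")]

one_key = partition_sizes(rows, ["state"], 4)          # [9, 1, 1, 1]
two_keys = partition_sizes(rows, ["state", "day"], 4)  # [3, 3, 3, 3]

print(one_key, skew_ratio(one_key))    # [9, 1, 1, 1] 3.0
print(two_keys, skew_ratio(two_keys))  # [3, 3, 3, 3] 1.0
```

With a single key, every row for the hot value lands in one partition. Adding a second key splits that value across partitions, which is why removing keys tends to make skew worse.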
Option A is incorrect: when there is no appropriate key for partitioning and distribution, it is better to use round-robin distribution.
Option B is correct: Reducing the number of partition keys will not resolve data skew. In fact, it may increase it.
Option C is incorrect: A reducer usually runs in non-recursive mode by default; enabling a recursive reducer where applicable helps reduce data skew.
Option D is incorrect: Combiner mode tries to distribute very large skewed-key value sets across different vertices, which helps reduce data skew.
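The idea behind options C and D, splitting the work for one heavily skewed key across several workers, can be sketched as follows. This is a Python imitation under stated assumptions, not U-SQL and not the Azure Data Lake Analytics API: `reduce_sum` and `recursive_reduce` are hypothetical names, and the sketch assumes the reduction is associative and commutative (a sum here), so that partial results can be reduced again.

```python
def reduce_sum(values):
    """The reduction for one key; here simply a sum."""
    return sum(values)


def recursive_reduce(values, chunk_size):
    """Reduce one key's values in chunks, then reduce the partial results.

    Each chunk could be handled by a separate worker, so a single hot key
    no longer has to be processed in one place. This is only valid because
    the reduction is associative and commutative.
    """
    partials = [
        reduce_sum(values[i:i + chunk_size])
        for i in range(0, len(values), chunk_size)
    ]
    return reduce_sum(partials)


hot_key_values = list(range(1, 101))   # 100 values for one skewed key

single_pass = reduce_sum(hot_key_values)
chunked = recursive_reduce(hot_key_values, chunk_size=10)

print(single_pass, chunked)  # 5050 5050
```

Both paths give the same result, but the chunked version spreads the hot key's values over ten independent partial reductions instead of one large one.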
Reference:
For more information, see the documentation below:
https://docs.microsoft.com/en-us/azure/data-lake-analytics/data-lake-analytics-data-lake-tools-data-skew-solutions