Correct Answers: A, B and C
Choosing the distribution column carefully is essential for balanced parallel processing. A poor choice can cause data skew and processing skew, which degrades parallel query performance considerably. The following are the three major considerations.
Has many unique values. The column may contain duplicate values, but all rows with the same value are assigned to the same distribution. As a result, some distributions may hold more than one unique value, while others may not receive any values at all.
Does not have NULLs or has only a few NULLs. All rows with a NULL value are assigned to the same distribution, so a large number of NULLs increases skew and reduces the performance of parallel processing.
Is not a date column. All the data for a given date lands in a single distribution, so queries that filter on one date are processed by only that distribution.
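The effect of the three considerations above can be illustrated with a small simulation. This is only a sketch: the MD5-based hash is a stand-in chosen for illustration (it is not the hash function the service actually uses), the column data is made up, and the helper names are hypothetical. It assumes the 60 distributions of a dedicated SQL pool and reports what share of the rows ends up in the busiest distribution.

```python
import hashlib
from collections import Counter

NUM_DISTRIBUTIONS = 60  # a dedicated SQL pool spreads rows across 60 distributions


def distribution_of(value):
    """Map a column value to a distribution index.

    Assumption: MD5 is used here only as a deterministic stand-in hash;
    it is not the service's real hash function.
    """
    digest = hashlib.md5(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_DISTRIBUTIONS


def rows_per_distribution(values):
    """Count how many rows land in each distribution."""
    return Counter(distribution_of(v) for v in values)


def busiest_share(values):
    """Fraction of all rows held by the most heavily loaded distribution."""
    counts = rows_per_distribution(values)
    return max(counts.values()) / len(values)


n = 60_000
# Hypothetical sample columns, one per consideration.
customer_ids = list(range(n))                                 # many unique values
mostly_null = [None] * (n * 9 // 10) + list(range(n // 10))   # 90% NULLs
single_date = ["2021-06-01"] * n                              # every row has the same date

for name, column in [
    ("customer_id", customer_ids),
    ("mostly NULL", mostly_null),
    ("single date", single_date),
]:
    print(f"{name}: busiest distribution holds {busiest_share(column):.1%} of rows")
```

In this sketch the column with many unique values spreads roughly evenly, while the mostly-NULL and single-date columns push most or all rows into one distribution, which is the skew the considerations are meant to avoid.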
Options A, B and C are correct: they are considerations to follow when selecting a distribution column.
Options D, E and F are incorrect: they are the opposite of the actual considerations.
Reference:
To learn more, refer to the documentation below:
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-distribute#choose-a-distribution-column-with-data-that-distributes-evenly