Question 235:
Henry is a Data Engineer at Whizlabs Inc. working on Databricks Spark streaming. He is using PySpark to build DataFrames. He needs to group the rows of a DataFrame and, for each group, count the distinct values in a column. Which of the following is the correct code snippet in this scenario?
Answer options:
A.
    countDistinctDF = nonNullDF.select("emp_id", "emp_name").groupBy("emp_id").agg(countDistinct("emp_name").alias("distinct_emp_name"))
    display(countDistinctDF)
B.
    countDistinctDF = nonNullDF.select("emp_id", "emp_name").groupBy("emp_id").aggregate(countDistinct("emp_name").alias("distinct_emp_name"))
    display(countDistinctDF)
C.
    countDistinctDF = nonNullDF.select("emp_id", "emp_name").agg(countDistinct("emp_name").alias("distinct_emp_name")).groupBy("emp_id")
    display(countDistinct)
D.
    countDistinctDF = nonNullDF.select("emp_id", "emp_name").groupBy("emp_id").aggregate().(countDistinct("emp_name").alias("distinct_emp_name"))
    display(countDistinctDF)
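Note: before choosing an option, it may help to be clear about what a per-group distinct count is expected to return. The sketch below models that result in plain Python; it is not PySpark and does not use any Spark API. The sample rows and the helper name count_distinct_by_key are hypothetical and are not part of the question.

```python
from collections import defaultdict

# Hypothetical sample rows of (emp_id, emp_name); not taken from the question.
rows = [
    (1, "Ann"),
    (1, "Ann"),   # duplicate name within the same emp_id group
    (1, "Bob"),
    (2, "Cara"),
]


def count_distinct_by_key(pairs):
    """Group pairs by their first element and count distinct second elements."""
    groups = defaultdict(set)
    for key, value in pairs:
        groups[key].add(value)  # a set keeps each value only once
    return {key: len(values) for key, values in groups.items()}


distinct_emp_name = count_distinct_by_key(rows)
print(distinct_emp_name)  # prints {1: 2, 2: 1}
```

Each key plays the role of emp_id and each count plays the role of the distinct_emp_name column, so the expected output has one row per emp_id.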