Question 199:
You work as a machine learning specialist for a polling organization that uses US census data to predict whether a given polling respondent earns more than $75,000. Your company will then sell the prediction data to candidates running for political office across the country. You need to clean the polling data on which you will train your binary classification model. Specifically, you need to remove duplicate rows with erroneous data, transform the income column into a label column with two values, transform the age column into a categorical feature by binning it, scale the capital gain and capital loss columns, and finally split the data into train and test datasets. Which of the options are the most efficient ways to perform your data sanitizing and feature preparation? (Select TWO)
Answer options:
A. Create a SageMaker Processing job using a SageMaker Scala SDK with Processing container leveraging the pandas PDLearnProcessor package that performs your required preprocessing sanitizing and feature preparation tasks and then splits the data into the training and test datasets.
B. Create a SageMaker Processing job using a SageMaker Python SDK with Processing container leveraging the scikit-learn SKLearnProcessor package that performs your required preprocessing sanitizing and feature preparation tasks and then splits the data into the training and test datasets.
C. Create a SageMaker Processing job using a SageMaker Python SDK with Data Wrangler container leveraging the scikit-learn SKLearnProcessor package that performs your required preprocessing sanitizing and feature preparation tasks and then splits the data into the training and test datasets.
D. Create a SageMaker Processing job using a SageMaker Python SDK with Processing container leveraging the Spark PySparkProcessor package that performs your required preprocessing sanitizing and feature preparation tasks and then splits the data into the training and test datasets.
E. Create a SageMaker Processing job using a SageMaker Python SDK with Processing container leveraging the SparkMLProcessor package that performs your required preprocessing sanitizing and feature preparation tasks and then splits the data into the training and test datasets.
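
For reference, the preprocessing steps named in the question can be sketched as a small pandas and scikit-learn function, roughly the kind of logic a preprocessing script would contain. This is only an illustration under stated assumptions: the column names (`age`, `income`, `capital_gain`, `capital_loss`), the age bin edges, and the reading of "erroneous data" as rows with missing values are all hypothetical and do not come from the question.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical column names; the question does not specify the schema.
SCALED_COLUMNS = ["capital_gain", "capital_loss"]


def preprocess(df: pd.DataFrame, test_size: float = 0.2, seed: int = 0):
    """Clean the data, build features, and return a train/test split."""
    # Remove duplicate rows; rows with missing values stand in for
    # "erroneous data" here (an assumption).
    df = df.drop_duplicates().dropna()

    # Turn income into a two-valued label: 1 if above $75,000, else 0.
    label = (df["income"] > 75_000).astype(int)

    # Scale the capital gain and capital loss columns to zero mean, unit variance.
    scaled = StandardScaler().fit_transform(df[SCALED_COLUMNS])
    features = pd.DataFrame(scaled, columns=SCALED_COLUMNS, index=df.index)

    # Bin age into three categories (assumed edges): (0, 30], (30, 50], (50, 120].
    features["age_bin"] = pd.cut(df["age"], bins=[0, 30, 50, 120], labels=False)

    # Split into train and test sets.
    return train_test_split(features, label, test_size=test_size, random_state=seed)
```

The function follows the order given in the question (scale, then split). In practice one might instead fit the scaler on the training portion only, so that test-set statistics do not influence the transformation.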