ExamQuestions.com

AWS Certified Machine Learning Specialty Exam Questions

Question 31:

You work for a mining company where you are responsible for the data science behind identifying the origin of mineral samples. The possible origins are Canada, Mexico, and the US. Your training data set is imbalanced, with the following number of samples per origin:
Canada | Mexico | US
1,210  | 120    | 68
You train a Random Forest classifier on the training data and get the following results on your test data set (the test data set is balanced):
Confusion matrix (rows: observed class; columns: predicted class):
Observed | Canada | Mexico | US | Accuracy
Canada   | 45     | 3      | 0  | 94%
Mexico   | 5      | 38     | 5  | 79%
US       | 19     | 8      | 21 | 44%
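The Accuracy column can be reproduced from the matrix rows: each value is the row's correctly predicted count divided by the row total, which is the per-class recall. A minimal sketch in Python follows; the function and variable names are chosen here for illustration and are not part of the question.

```python
# Confusion matrix from the question: rows are observed classes,
# columns are predicted classes, both in the order Canada, Mexico, US.
labels = ["Canada", "Mexico", "US"]
matrix = [
    [45, 3, 0],
    [5, 38, 5],
    [19, 8, 21],
]


def per_class_recall(matrix):
    """Return (diagonal entry / row total) for each observed class."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


def overall_accuracy(matrix):
    """Return (sum of the diagonal / sum of all entries)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


recalls = per_class_recall(matrix)
for label, recall in zip(labels, recalls):
    print(f"{label}: {recall:.0%}")
print(f"Overall: {overall_accuracy(matrix):.0%}")
```

Each row sums to 48, which is what makes the test set balanced. The class with the fewest training samples (US) has the lowest per-class figure.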
In order to address the imbalance in your training data, you will need to use a preprocessing step before you create your SageMaker training job. Which technique should you use to address the imbalance?

Answer options:

A. Run your training data through a preprocessing script that uses the SMOTE (Synthetic Minority Over-sampling Technique) approach.
B. Run your training data through a Spark pipeline in AWS Glue to one-hot encode the features.
C. Run your training data through a preprocessing script that uses the feature-split technique.
D. Run your training data through a preprocessing script that uses the min-max normalization technique.
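Option A refers to SMOTE, which creates synthetic minority-class samples by interpolating between an existing minority sample and one of its nearest minority-class neighbours. The sketch below illustrates only that core idea in plain Python. The function `smote`, its parameters, and the two-feature sample data are hypothetical and are not taken from the question. A real preprocessing script would more likely rely on a library implementation such as `imblearn.over_sampling.SMOTE` from imbalanced-learn.

```python
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their k nearest minority neighbours (Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # Rank the other minority samples by squared distance to the base.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Hypothetical two-feature measurements for an under-represented class.
us_samples = [
    (1.0, 2.0), (1.5, 1.8), (2.0, 2.2),
    (1.2, 2.5), (1.8, 2.4), (1.6, 2.1),
]
new_points = smote(us_samples, n_synthetic=10, k=3)
print(len(us_samples) + len(new_points), "samples after oversampling")
```

Because every synthetic point lies on a line segment between two real minority samples, the new points stay inside the region already covered by that class instead of duplicating existing rows exactly. Oversampling of this kind is applied to the training data only, with the test set left untouched.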