AWS Certified Machine Learning Specialty Exam Questions

Amazon

AWS Certified Machine Learning Specialty

240 / 258

Question 240:

You work as a machine learning specialist for an online retailer that is expanding into fresh produce as one of its new product categories. You and your machine learning team have been tasked with creating a model to classify each of your new fresh produce products. Examples of features in your data source include weight, price, country of origin, food group (fruit, vegetable, etc.), and other numeric and categorical features. You plan on using either k-nearest neighbors (KNN) or support vector machines (SVM) to classify your fresh produce products. Which data cleansing technique should you use on your data so that your features with potentially large values, such as weight, don’t take on exaggerated importance in the model when compared to features with potentially smaller values, such as price per unit?
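
The scale problem described above can be seen with a minimal sketch on made-up values (the three items, their weights, and their prices are hypothetical, not from the question). With raw features, the Euclidean distance that KNN relies on is driven almost entirely by weight.

```python
import numpy as np

# Hypothetical items: [weight in grams, price per unit in dollars].
item_0 = np.array([150.0, 0.50])
item_1 = np.array([160.0, 3.00])  # similar weight, six times the price
item_2 = np.array([400.0, 0.55])  # very different weight, similar price

def euclidean(a, b):
    """Plain Euclidean distance (the default metric of scikit-learn's KNeighborsClassifier)."""
    return float(np.sqrt(np.sum((a - b) ** 2)))

d01 = euclidean(item_0, item_1)
d02 = euclidean(item_0, item_2)
print(f"{d01:.2f}")  # 10.31: the 10 g weight gap contributes 100 of 106.25 squared units
print(f"{d02:.2f}")  # 250.00: almost entirely the 250 g weight gap
```

A sixfold price difference adds less to the distance than a 10-gram weight difference, purely because of the units in which each feature happens to be recorded.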

Answer options:

A. Scale your data using scikit-learn MinMaxScaler
B. Normalize your data using scikit-learn normalize
C. Bin your data using scikit-learn KBinsDiscretizer with the uniform strategy
D. Quantile bin your data using scikit-learn KBinsDiscretizer with the quantile strategy

Correct answer: A

Option A is correct. KNN classifies a point by its distance to neighboring points, and SVMs also depend on distances or dot products between samples (for example through the RBF kernel), so both are sensitive to feature scale. scikit-learn's MinMaxScaler rescales each feature (column) independently to a fixed range, [0, 1] by default, so weight no longer outweighs unit price in those calculations simply because its raw values are larger.

Option B is incorrect. scikit-learn's normalize rescales each sample (row) to unit norm (L2 by default) rather than rescaling each feature (column). Within every row, a large weight value still dwarfs a small unit price, so the features remain on very different scales.

Option C is incorrect. KBinsDiscretizer with the uniform strategy converts continuous features into categories by assigning each value to one of several equal-width bins. This is discretization, not scaling: it replaces each value with a bin index and discards the variation within each bin, so it does not address the scaling problem the question describes.

Option D is incorrect. Quantile binning (KBinsDiscretizer with the quantile strategy) also converts continuous features into categories, but it chooses bin edges so that each bin holds roughly the same number of samples. Like uniform binning, it discretizes the data rather than scaling it.

References: Towards Data Science, All about Feature Scaling (https://towardsdatascience.com/all-about-feature-scaling-bcc0ad75cb35); Kaggle, Scaling and Normalization (https://www.kaggle.com/alexisbcook/scaling-and-normalization); Wikipedia, Support-vector machine (https://en.wikipedia.org/wiki/Support-vector_machine); Wikipedia, k-nearest neighbors algorithm (https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm); Towards Data Science, Continuous Numeric Data (https://towardsdatascience.com/understanding-feature-engineering-part-1-continuous-numeric-data-da4e47099a7b); scikit-learn documentation, 6.3. Preprocessing data (https://scikit-learn.org/stable/modules/preprocessing.html); scikit-learn documentation, sklearn.preprocessing.KBinsDiscretizer (https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.KBinsDiscretizer.html).
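
The difference between the options can be sketched with scikit-learn on a tiny, made-up dataset (the four items, their labels, and the query point are hypothetical). It shows MinMaxScaler rescaling each column, normalize rescaling each row, uniform binning replacing values with bin indices, and a 1-nearest-neighbor classifier whose prediction changes once the features are scaled.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import KBinsDiscretizer, MinMaxScaler, normalize

# Hypothetical toy data: columns are [weight_g, price_per_unit].
X = np.array([
    [150.0, 0.50],
    [160.0, 3.00],
    [400.0, 0.55],
    [420.0, 2.80],
])
# Hypothetical labels that depend on price, not weight (0 = budget, 1 = premium).
y = np.array([0, 1, 0, 1])

# Option A: MinMaxScaler rescales each column (feature) to [0, 1].
X_minmax = MinMaxScaler().fit_transform(X)

# Option B: normalize rescales each row (sample) to unit L2 norm,
# so weight still accounts for nearly all of every row.
X_norm = normalize(X)

# Option C: uniform binning replaces values with equal-width bin indices.
# (Option D would pass strategy="quantile" instead.)
X_binned = KBinsDiscretizer(
    n_bins=2, encode="ordinal", strategy="uniform"
).fit_transform(X)

# 1-nearest-neighbor on raw features vs. on MinMax-scaled features.
raw_knn = KNeighborsClassifier(n_neighbors=1).fit(X, y)
scaled_knn = make_pipeline(
    MinMaxScaler(), KNeighborsClassifier(n_neighbors=1)
).fit(X, y)

query = np.array([[153.0, 2.90]])  # light item with a premium price
print(raw_knn.predict(query))      # [0]: weight dominates the raw distances
print(scaled_knn.predict(query))   # [1]: price now has comparable influence
```

Wrapping the scaler in a pipeline is one reasonable design choice here: the per-feature minimum and maximum are learned from the training data only and then applied unchanged to new items at prediction time.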
