AWS Certified Machine Learning Specialty Exam Questions

Amazon

AWS Certified Machine Learning Specialty

43 / 258

Question 43:

You work as a machine learning specialist for a polling company. For the upcoming election, you need to classify the more than 500,000 registered voters in your voter database by age for a campaign your team is about to launch. Your data is structured as follows:
| voter_id | voter_age | voter_occupation | voter_income | … |
| 1 | 21 | student | 0 | … |
| 2 | 35 | nurse | 25000 | … |
| 3 | 49 | manager | 150000 | … |
| 4 | 63 | truck driver | 45000 | … |
| 5 | 55 | teacher | 65000 | … |
…
Because your voter age feature is continuous, classifying your observations by age would result in too many classifications, i.e., one for every possible voter age from 21 through probably over 90. You need uniform classifications that are limited in number to make the best use of your data in your machine learning model.
What numerical feature engineering technique will give you the best distribution of classifications?

Answer options:

A. Cartesian Product Transformation
B. N-Gram Transformation
C. Orthogonal Sparse Bigram (OSB) Transformation
D. Normalization Transformation
E. Quantile Binning Transformation

Correct answer:

Answer: E

Option A is incorrect. From the Amazon Machine Learning developer guide titled Data Transformations Reference, “The Cartesian product transformation takes categorical variables or text as input, and produces new features that capture the interaction between these input variables.” Because this transformation combines categorical or text variables rather than grouping a numeric one, it would not give you uniform age classifications that are limited in number.

Option B is incorrect. From the Amazon Machine Learning developer guide titled Data Transformations Reference, “The n-gram transformation takes a text variable as input and produces strings corresponding to sliding a window of (user-configurable) n words, generating outputs in the process.” Because this transformation is for transforming text, it would not give you uniform age classifications that are limited in number.

Option C is incorrect. From the Amazon Machine Learning developer guide titled Data Transformations Reference, “The OSB transformation is intended to aid in text string analysis and is an alternative to the bi-gram transformation (n-gram with window size 2). OSBs are generated by sliding the window of size n over the text and outputting every pair of words that includes the first word in the window.” Because this transformation is also for transforming text, it would not give you uniform age classifications that are limited in number.

Option D is incorrect. From the Amazon Machine Learning developer guide titled Data Transformations Reference, “The normalization transformer normalizes numeric variables to have a mean of zero and variance of one. Normalization of numeric variables can help the learning process if there are very large range differences between numeric variables because variables with the highest magnitude could dominate the ML model, no matter if the feature is informative with respect to the target or not.” Because this transformation rescales a numeric variable but leaves it continuous, it would not give you uniform age classifications that are limited in number.

Option E is correct. From the Amazon Machine Learning developer guide titled Data Transformations Reference, “The quantile binning processor takes two inputs, a numerical variable and a parameter called bin number, and outputs a categorical variable. The purpose is to discover non-linearity in the variable's distribution by grouping observed values together.” Because quantile binning places the bin boundaries at quantiles of the observed values, each bin holds roughly the same number of observations, so it is the right choice to give you uniform age classifications that are limited in number. For example, with three bins you could end up with classifications such as: Under 30, 30 to 50, Over 50 (the actual boundaries depend on the data). You could then give the bins descriptive labels such as Millennial, Generation X, Baby Boomer, etc.

Reference: Please see the Amazon Machine Learning developer guide titled Data Transformations for Machine Learning and the article Feature Engineering in Machine Learning (Part 1) Handling Numeric Data with Binning.
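The idea behind quantile binning can be illustrated with a minimal sketch using only the Python standard library. This is not the Amazon Machine Learning implementation; the helper names `quantile_bin_edges` and `assign_bin` and the sample ages are hypothetical and chosen purely for illustration. The sketch computes cut points at the quantiles of the observed ages and then maps each age to a bin index, so that each bin receives about the same number of voters.

```python
from bisect import bisect_right
from statistics import quantiles


def quantile_bin_edges(values, n_bins):
    """Return the n_bins - 1 interior cut points that split `values`
    into bins holding roughly equal numbers of observations."""
    return quantiles(values, n=n_bins, method="inclusive")


def assign_bin(value, edges):
    """Map a value to a bin index between 0 and len(edges), inclusive."""
    return bisect_right(edges, value)


# Hypothetical sample of voter ages (not taken from the question's data set).
ages = [21, 23, 25, 29, 35, 38, 41, 49, 52, 55, 63, 67, 72, 80, 91]

edges = quantile_bin_edges(ages, n_bins=3)      # two cut points for three bins
bins = [assign_bin(age, edges) for age in ages]  # categorical bin index per voter
counts = [bins.count(i) for i in range(3)]       # number of voters in each bin

print("cut points:", edges)
print("voters per bin:", counts)
```

With this sample, the three bins each receive the same number of voters, which is the "uniform" property the question asks for. In practice, a library routine such as `pandas.qcut` would likely serve the same purpose on a full data set.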
