Answer: E
Option A is incorrect. From the Amazon Machine Learning developer guide titled Data Transformations Reference, “The Cartesian product transformation takes categorical variables or text as input, and produces new features that capture the interaction between these input variables.” This transformation combines categorical or text variables into interaction features. It does not group numeric values, so it would not give you uniform age classifications that are limited in number.
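The following minimal Python sketch shows the Cartesian product idea. The function name `cartesian` and the `tokenA_tokenB` output format are illustrative assumptions and do not reproduce the Amazon ML implementation. The sketch pairs every token of one input with every token of the other. Its output is always a set of combined strings, never a grouping of numeric values.

```python
from itertools import product

def cartesian(a: str, b: str) -> list[str]:
    """Hypothetical helper: pair every whitespace-separated token of `a`
    with every token of `b`, joined by an underscore (assumed format)."""
    return [f"{x}_{y}" for x, y in product(a.split(), b.split())]

print(cartesian("Hardcover", "Textbook"))  # ['Hardcover_Textbook']
print(cartesian("red blue", "small large"))
# ['red_small', 'red_large', 'blue_small', 'blue_large']
```

If you applied this to an age, the raw number would simply be glued to another token, such as "34_Textbook". That produces one feature per distinct age, which is the opposite of a small set of classifications.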
Option B is incorrect. From the Amazon Machine Learning developer guide titled Data Transformations Reference, “The n-gram transformation takes a text variable as input and produces strings corresponding to sliding a window of (user-configurable) n words, generating outputs in the process.” This transformation also operates only on text, so it would not give you uniform age classifications that are limited in number.
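The sliding window described in the quote can be sketched in a few lines of Python. The function name `ngrams` is a hypothetical helper. This sketch emits only windows of exactly n words, and the Amazon ML transform may also emit shorter n-grams, so treat the exact output set as an assumption.

```python
def ngrams(text: str, n: int) -> list[str]:
    """Hypothetical helper: slide a window of n words across the text
    and return each window as one space-joined string."""
    words = text.split()
    # One output per window position; empty if the text has fewer than n words.
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

print(ngrams("I really enjoyed reading this book", 2))
# ['I really', 'really enjoyed', 'enjoyed reading', 'reading this', 'this book']
```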
Option C is incorrect. From the Amazon Machine Learning developer guide titled Data Transformations Reference, which describes the orthogonal sparse bigram (OSB) transformation, “The OSB transformation is intended to aid in text string analysis and is an alternative to the bi-gram transformation (n-gram with window size 2). OSBs are generated by sliding the window of size n over the text and outputting every pair of words that includes the first word in the window.” This transformation also operates only on text, so it would not give you uniform age classifications that are limited in number.
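Here is a rough Python sketch of OSB generation with a window size of 4. The function name `osb` is hypothetical. The sketch separates the two words of each pair with one underscore per position of distance, and that separator convention is an assumption rather than a confirmed detail of the Amazon ML output. The sketch pairs the first word of each window with every later word inside that window.

```python
def osb(text: str, window: int) -> list[str]:
    """Hypothetical helper: orthogonal sparse bigrams over a sliding window."""
    words = text.split()
    pairs = []
    for i, first in enumerate(words):
        # Pair the first word of the window with each later word in the window.
        for gap in range(1, window):
            j = i + gap
            if j >= len(words):
                break
            # Assumed convention: the underscore count encodes the distance.
            pairs.append(first + "_" * gap + words[j])
    return pairs

print(osb("The quick brown fox", 4))
# ['The_quick', 'The__brown', 'The___fox', 'quick_brown', 'quick__fox', 'brown_fox']
```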
Option D is incorrect. From the Amazon Machine Learning developer guide titled Data Transformations Reference, “The normalization transformer normalizes numeric variables to have a mean of zero and variance of one. Normalization of numeric variables can help the learning process if there are very large range differences between numeric variables because variables with the highest magnitude could dominate the ML model, no matter if the feature is informative with respect to the target or not.” Normalization only rescales numeric values and leaves them continuous. It would therefore not give you uniform age classifications that are limited in number.
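The quote describes z-score normalization. It can be sketched with Python's standard `statistics` module, using the population standard deviation. The function name `normalize` and the sample ages are illustrative assumptions. After normalization, every age remains a distinct continuous number, so no classifications are created.

```python
from statistics import fmean, pstdev

def normalize(values: list[float]) -> list[float]:
    """Hypothetical helper: rescale values to mean 0 and (population) variance 1."""
    mu = fmean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values identical: nothing to rescale, so every z-score is 0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

ages = [20, 30, 40, 50, 60]
print([round(z, 3) for z in normalize(ages)])
# [-1.414, -0.707, 0.0, 0.707, 1.414]
```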
Option E is correct. From the Amazon Machine Learning developer guide titled Data Transformations Reference, “The quantile binning processor takes two inputs, a numerical variable and a parameter called bin number, and outputs a categorical variable. The purpose is to discover non-linearity in the variable's distribution by grouping observed values together.” Quantile binning turns a numeric variable into a fixed number of categories, set by the bin number you choose, and each bin holds roughly the same number of observations. That makes it the right choice for producing uniform age classifications that are limited in number. For example, with a bin number of 3 you might get classification bins such as Under 30, 30 to 50, and Over 50, although the exact boundaries depend on the quantiles of your data. You could also give the bins more descriptive labels, such as Millennial, Generation X, Baby Boomer, and so on.
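Quantile binning can be sketched with Python's `statistics.quantiles`, which computes the cut points that divide the data into equally populated groups. The function name `quantile_bin`, the `bin_1`, `bin_2`, ... labels, and the sample ages are illustrative assumptions. This is not the Amazon ML implementation, whose exact boundary handling may differ.

```python
from collections import Counter
from statistics import quantiles

def quantile_bin(values: list[float], bin_number: int) -> tuple[list[str], list[float]]:
    """Hypothetical helper: assign each value to one of `bin_number` equally populated bins."""
    # bin_number - 1 cut points that split the observed data into equal-count groups.
    cuts = quantiles(values, n=bin_number)
    labels = []
    for v in values:
        # The number of cut points a value lies above is its zero-based bin index.
        index = sum(v > c for c in cuts)
        labels.append(f"bin_{index + 1}")
    return labels, cuts

ages = [22, 25, 29, 31, 35, 41, 47, 52, 58, 64, 70, 75]
labels, cuts = quantile_bin(ages, bin_number=3)
print(cuts)
print(Counter(labels))  # each bin receives 4 of the 12 ages
```

The boundaries come from the data itself rather than from fixed thresholds. Once the bins exist, you could map `bin_1`, `bin_2`, and `bin_3` to friendlier names of your choosing.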
Reference:
Please see the Amazon Machine Learning developer guide titled Data Transformations for Machine Learning and the article Feature Engineering in Machine Learning (Part 1) Handling Numeric Data with Binning.