Answer: A
Option A is CORRECT. You are trying to answer a “how many” question, which means predicting a numeric quantity, and your data is labeled. These two factors make linear regression, a supervised algorithm for numeric prediction, the best of the options given.
Option B is incorrect. Principal component analysis (PCA) is used for dimensionality reduction, not for predicting “how many” quantities. It is also an unsupervised algorithm. Because we have labeled data, we should use a supervised algorithm.
Option C is incorrect. Random Cut Forest (RCF) is an unsupervised algorithm used primarily for detecting anomalous data points within a data set. Since we have labeled data, we should use a supervised algorithm.
Option D is incorrect. Logistic regression is used to solve “yes/no” or binary predictions, not “how many” predictions.
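The distinction between a “how many” prediction and a “yes/no” prediction can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the feature (`ad_spend`), the label (`units_sold`), and the data values are hypothetical and do not come from the question, and the fit is an ordinary least-squares line on a single feature rather than any particular AWS service's implementation.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def sigmoid(z):
    """Logistic function: squashes any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical labeled data: each feature value has a known numeric label.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
units_sold = [12.0, 14.0, 16.0, 18.0, 20.0]

slope, intercept = fit_line(ad_spend, units_sold)

# Linear regression answers "how many": the output is an unbounded quantity.
forecast = slope * 6.0 + intercept
print("predicted units:", forecast)

# A logistic model passes the same linear score through a sigmoid, so its
# output is a probability suited to "yes/no" questions, not a count.
probability = sigmoid(forecast)
print("logistic-style output:", probability)
```

The linear model returns a value on the same scale as the labels, while the logistic-style output is confined between 0 and 1, which is why it fits binary questions rather than quantity questions.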
Reference:
Please refer to the Amazon Machine Learning developer guide titled Regression Model Insights.
Please refer to the Amazon SageMaker developer guide titled Random Cut Forest (RCF) Algorithm.
Please refer to the Amazon SageMaker developer guide titled Principal Component Analysis (PCA) Algorithm.