Answer: C
Option A is incorrect because SageMaker Random Cut Forest is best used for large batch data sets where you don’t need to update the model frequently (see the AWS Kinesis Data Analytics documentation: https://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/sqlrf-random-cut-forest.html).
Option B is incorrect because the Naive Bayes Classifier is a classification algorithm that assumes the features are independent of one another; it is not an anomaly detection algorithm. In addition, the Kinesis Data Streams service does not have machine learning algorithm capabilities (see the AWS Kinesis Streams developer documentation: https://docs.aws.amazon.com/streams/latest/dev/introduction.html).
Option C is correct. The Kinesis Data Analytics Random Cut Forest algorithm is well suited to near-real-time updates to your model (see the AWS Kinesis Data Analytics documentation: https://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/sqlrf-random-cut-forest.html).
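To make "near-real-time updates to your model" concrete, here is a minimal Python sketch of the score-then-update pattern on a stream. It is not the Random Cut Forest algorithm and not the Kinesis Data Analytics API: it swaps in a simple running z-score (Welford's method) purely for illustration, and the names StreamingScorer, score_and_update, and score_stream are hypothetical.

```python
import math


class StreamingScorer:
    """Toy per-record anomaly scorer using a running z-score.

    Assumption: this stands in for a streaming anomaly model only to show
    the score-then-update pattern. It is NOT Random Cut Forest.
    """

    def __init__(self):
        self.n = 0        # number of records seen so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # running sum of squared deviations

    def score_and_update(self, x: float) -> float:
        # Score the new record against the model built from earlier records.
        if self.n < 2:
            score = 0.0   # not enough history to estimate spread
        else:
            std = math.sqrt(self.m2 / (self.n - 1))
            score = abs(x - self.mean) / std if std > 0 else 0.0

        # Update the model with the new record (Welford's online update).
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return score


def score_stream(values):
    """Return one anomaly score per record, in arrival order."""
    scorer = StreamingScorer()
    return [scorer.score_and_update(v) for v in values]


if __name__ == "__main__":
    readings = [10, 11, 9, 10, 11, 10, 50]
    for value, score in zip(readings, score_stream(readings)):
        print(value, round(score, 2))
```

Each record is scored and then folded into the model right away, so there is no separate batch retraining step. That is the property the question is testing for, in contrast to the batch-oriented approach in Option A.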
Option D is incorrect. Kinesis Data Analytics provides a hotspots function that detects higher-than-normal activity using the distance between a hotspot and its nearest neighbor, but it does not provide ML model update capabilities (see the AWS Kinesis Data Analytics documentation: https://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/sqlrf-hotspots.html).
Diagram: [screenshot from the AWS Big Data blog]
Reference:
For a complete description of the use of Kinesis Data Analytics and the random cut forest algorithm, please see the AWS Big Data blog post titled Perform Near Real-time Analytics on Streaming Data with Amazon Kinesis and Amazon Elasticsearch Service: https://aws.amazon.com/blogs/big-data/perform-near-real-time-analytics-on-streaming-data-with-amazon-kinesis-and-amazon-elasticsearch-service/