AWS Certified Machine Learning Specialty Exam Questions

Amazon

AWS Certified Machine Learning Specialty

147 / 258

Question 147:

Your company, a financial services firm, has asked your team to build an analytics and machine learning platform to analyze and forecast the company's trading operations using Athena, S3, and SageMaker Studio. The volume of data received each day is very high. The data, stored in S3, will be used as feature data for a machine learning model that uses the SageMaker built-in XGBoost algorithm. The source systems stream their data into your environment in JSON format in real time. Your team needs to transform the data in real time to prepare it for the model. How can you transform the data before storing it in S3 so that it is ready for training with the SageMaker XGBoost algorithm?

Answer options:

A. Use Kinesis Data Streams to ingest the JSON data from the source systems, then send the data to Kinesis Data Firehose, where you can leverage a Lambda function to convert the JSON to libsvm and then use a Kinesis Data Firehose transform to write the data to S3.
B. Use Apache Spark Structured Streaming in an EMR cluster to ingest the JSON data from the source systems, then run Apache Spark steps to convert the JSON data into x-recordio-protobuf.
C. Use Kinesis Data Streams to ingest the JSON data from the source systems, then use a Glue ETL job to convert data from JSON into x-recordio.
D. Use Apache Kafka Streams running on EC2 instances to ingest the JSON data from the source systems, then use the Kafka Connect S3 connector to serialize the data onto S3 as x-recordio.

Answer correct:

Correct Answer: A

Option A is correct. It satisfies the real-time requirement while being the most efficient choice and requiring the least effort from your team. Also, the XGBoost algorithm only supports the libsvm and CSV content types for training and inference.

Option B is incorrect. It can meet the real-time requirement, but it is far more complex for your team to set up and maintain than the Kinesis Data Streams and Kinesis Data Firehose option. Also, the XGBoost algorithm only supports the libsvm and CSV content types for training and inference, not the x-recordio-protobuf content type.

Option C is incorrect. Glue ETL jobs imply batch processing, which fails to meet the real-time requirement. Also, the XGBoost algorithm only supports the libsvm and CSV content types for training and inference, not the x-recordio content type.

Option D is incorrect. It is far more complex for your team to set up and maintain than the Kinesis Data Streams and Kinesis Data Firehose option. Also, the XGBoost algorithm only supports the libsvm and CSV content types for training and inference, not the x-recordio content type.
References:
AWS blog, Archiving Amazon MSK Data to Amazon S3 with the Lenses.io S3 Kafka Connect Connector (https://aws.amazon.com/blogs/apn/archiving-amazon-msk-data-to-amazon-s3-with-the-lenses-io-s3-kafka-connect-connector/)
Amazon SageMaker developer guide, Prepare ML Data with Amazon SageMaker Data Wrangler (https://docs.aws.amazon.com/sagemaker/latest/dg/data-wrangler.html)
Amazon Kinesis Data Firehose developer guide, Converting Your Input Record Format in Kinesis Data Firehose (https://docs.aws.amazon.com/firehose/latest/dev/record-format-conversion.html)
Amazon SageMaker developer guide, Common Data Formats for Training (https://docs.aws.amazon.com/sagemaker/latest/dg/cdf-training.html)
Amazon SageMaker developer guide, XGBoost Algorithm (https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost.html)
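
The Lambda step in Option A could look roughly like the sketch below. It follows the Kinesis Data Firehose data-transformation contract (base64-encoded records in, one result per recordId out) and emits one libsvm line per JSON record. The field names in LABEL_FIELD and FEATURE_ORDER are hypothetical, since the question does not describe the JSON schema, and the 1-based feature indices are likewise an assumption to adjust to whatever the training job expects.

```python
import base64
import json

# Hypothetical schema: these field names are assumptions for illustration only.
LABEL_FIELD = "label"
FEATURE_ORDER = ["price", "volume", "spread"]


def json_to_libsvm(record: dict) -> str:
    """Render one JSON record as a libsvm line: '<label> <index>:<value> ...'."""
    parts = [str(record[LABEL_FIELD])]
    # Assumed 1-based feature indices, in the fixed order given by FEATURE_ORDER.
    for index, name in enumerate(FEATURE_ORDER, start=1):
        value = record.get(name)
        if value is not None:  # libsvm is sparse, so absent features are skipped
            parts.append(f"{index}:{value}")
    return " ".join(parts)


def lambda_handler(event, context):
    """Firehose data-transformation handler: JSON records in, libsvm text out."""
    output = []
    for rec in event["records"]:
        try:
            payload = json.loads(base64.b64decode(rec["data"]))
            line = json_to_libsvm(payload) + "\n"
            output.append({
                "recordId": rec["recordId"],
                "result": "Ok",
                "data": base64.b64encode(line.encode("utf-8")).decode("utf-8"),
            })
        except (ValueError, KeyError, TypeError):
            # Malformed JSON or a missing label: return the record unchanged,
            # marked as failed so Firehose can route it to its error output.
            output.append({
                "recordId": rec["recordId"],
                "result": "ProcessingFailed",
                "data": rec["data"],
            })
    return {"records": output}
```

Each transformed record ends with a newline so that the objects Firehose writes to S3 contain one libsvm row per line.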
