ExamQuestions.com

AWS Certified Machine Learning Specialty Exam Questions

Question 147:

Your company, a financial services firm, has asked your team to build an analytics and machine learning platform to analyze and forecast the company's trading operations using Athena, S3, and SageMaker Studio. The volume of data received each day is very high. The data, stored in S3, will be used as feature data for a machine learning model that uses the SageMaker built-in XGBoost algorithm. The source systems stream their data into your environment in JSON format in real time, and your team needs to transform it in real time to prepare it for the model. How can you transform the data before storing it in S3 so that it is ready for training with the SageMaker XGBoost algorithm?

Answer options:

A. Use Kinesis Data Streams to ingest the JSON data from the source systems, then send the data to Kinesis Data Firehose, where you can leverage a Lambda function to convert the JSON to libsvm and then use a Kinesis Data Firehose transform to write the data to S3.
B. Use Apache Spark Structured Streaming in an EMR cluster to ingest the JSON data from the source systems, then run Apache Spark steps to convert the JSON data into x-recordio-protobuf.
C. Use Kinesis Data Streams to ingest the JSON data from the source systems, then use a Glue ETL job to convert data from JSON into x-recordio.
D. Use Apache Kafka Streams running on EC2 instances to ingest the JSON data from the source systems, then use the Kafka Connect S3 connector to serialize the data onto S3 as x-recordio.
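
To make the Lambda step described in option A concrete, here is a minimal sketch of what a Kinesis Data Firehose transformation function converting JSON to libsvm might look like. The record layout (a `label` field plus a `features` list), the helper name `json_to_libsvm`, and the choice of 1-based feature indices are assumptions made for illustration; the question does not specify the source schema. The handler follows the Firehose transformation contract of returning each `recordId` with a `result` and base64-encoded `data`.

```python
import base64
import json


def json_to_libsvm(payload: dict) -> str:
    """Render one record as 'label index:value ...'.

    Assumed (hypothetical) input layout: {"label": 1, "features": [0.5, 0.0, 2.0]}.
    Indices are 1-based here and zero-valued features are omitted (sparse form).
    """
    pairs = [
        f"{index}:{value}"
        for index, value in enumerate(payload["features"], start=1)
        if value != 0
    ]
    return " ".join([str(payload["label"]), *pairs])


def lambda_handler(event, context):
    """Firehose transformation handler: JSON in, newline-terminated libsvm out."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            line = json_to_libsvm(payload) + "\n"
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(line.encode("utf-8")).decode("utf-8"),
            })
        except (KeyError, TypeError, ValueError):
            # Malformed records are handed back unchanged and flagged as failed.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}
```

Each transformed record ends with a newline so that the objects Firehose delivers to S3 contain one libsvm row per line. Records that cannot be parsed are flagged as `ProcessingFailed` rather than dropped silently.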