AWS Certified Machine Learning Specialty Exam Questions

Amazon

AWS Certified Machine Learning Specialty

205 / 258

Question 205:

You work as a machine learning specialist for a financial services organization. Your machine learning team is responsible for building models that predict index fund tracking errors for the various funds managed by your mutual fund portfolio management department. You need to ingest data into your data lake for use in your machine learning models. The required securities pricing data comes from various sources that deliver it in near real time for use in your model inferences. You need to transform the data, for example by compressing it, before writing it to your S3 data lake. Which option gives you the most efficient solution for ingesting the data into your data lake?

Answer options:

A. Ingest the pricing data using a Kinesis Data Analytics application where you use Apache Flink to compress your data into the GZIP format and write it to your S3 data lake.
B. Ingest the pricing data into Kinesis Data Streams using a Kinesis Producer Library (KPL) application running on EC2 instances; use a Kinesis Client Library (KCL) application to compress your data into the GZIP format and write it to your S3 data lake.
C. Ingest the pricing data using Kinesis Data Firehose where you use a Lambda function to compress your data into the GZIP format and have the Lambda function write the data to your S3 data lake.
D. Ingest the pricing data using Kinesis Data Firehose where you use a Lambda function to compress your data into the GZIP format; Kinesis Data Firehose writes the data to your S3 data lake.

Correct answer: D

Option A is incorrect. Kinesis Data Analytics cannot ingest data directly; it needs to be fed the streaming data by either Kinesis Data Streams or Kinesis Data Firehose. Also, Apache Flink can write your data to S3 using the streaming file sink, but it writes in the Avro and Parquet formats, not GZIP.

Option B is incorrect. The solution described in this option will technically work. However, because you have to build and run the KPL and KCL applications yourself, it is much less efficient than using Kinesis Data Firehose to ingest the data, compress it using Lambda, and write it to S3.

Option C is incorrect. You can ingest your pricing data using Kinesis Data Firehose and use Lambda to compress your data into the GZIP format. However, you should leverage the Kinesis Data Firehose capability to write your data directly to your S3 bucket. This is more efficient than writing your own code in your Lambda function to write the data to S3.

Option D is correct. Ingesting the data using Kinesis Data Firehose, compressing it into the GZIP format with a Lambda function, and then having Kinesis Data Firehose write it to S3 is a common and efficient data ingestion pattern.
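The pattern in option D can be sketched as a Firehose data-transformation Lambda function. This is a minimal illustration, not a reference implementation: it assumes the standard Firehose transformation contract (each incoming record carries a recordId and base64-encoded data, and the function returns the same recordId with a result status and base64-encoded output), and the sample ticker and price fields in the usage note are hypothetical. The function only transforms the records; it never calls S3, because Firehose performs the delivery.

```python
import base64
import gzip


def lambda_handler(event, context):
    """Firehose transformation sketch: GZIP each record's payload.

    Firehose invokes this function with a batch of records, takes the
    transformed records from the return value, and writes them to S3 itself.
    """
    output = []
    for record in event["records"]:
        try:
            # Firehose delivers each payload base64-encoded.
            raw = base64.b64decode(record["data"])
            compressed = gzip.compress(raw)
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(compressed).decode("utf-8"),
            })
        except Exception:
            # Hand the record back unchanged and flag it as failed so
            # Firehose can route it to its error output.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}
```

A local call with a hypothetical pricing record might look like `lambda_handler({"records": [{"recordId": "1", "data": base64.b64encode(b'{"ticker": "XYZ", "price": 101.25}\n').decode()}]}, None)`. In this setup the delivery stream's own S3 compression setting would presumably be left uncompressed, since the function already compresses the payload.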
References:
AWS whitepaper Building Big Data Storage Solutions (Data Lakes) for Maximum Flexibility, section titled Data Ingestion Methods (https://docs.aws.amazon.com/whitepapers/latest/building-data-lakes/data-ingestion-methods.html)
Investopedia page titled Tracking Error (https://www.investopedia.com/terms/t/trackingerror.asp#:~:text=Tracking%20error%20is%20the%20difference,and%20its%20corresponding%20risk%20level.)
Apache Flink developer guide page titled Streaming File Sink (https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/streamfile_sink.html)
Amazon Kinesis Data Streams product page titled Getting started with Amazon Kinesis Data Streams (https://aws.amazon.com/kinesis/data-streams/getting-started/)
Amazon Kinesis Data Analytics for SQL Applications Developer Guide page titled Amazon Kinesis Data Analytics for SQL Applications: How It Works (https://docs.aws.amazon.com/kinesisanalytics/latest/dev/how-it-works.html)
