Question 122:
You work for an online fashion retailer as a machine learning specialist. You are on a team of machine learning specialists and data scientists who have been given the responsibility of centralizing your company’s product, customer, supplier, and materials data in one source. This new data source will be used for analytics and for making business decisions based on KPIs (Key Performance Indicators). Your company stores its product, customer, supplier, and materials data in many different data sources, and these repositories are housed on several different database technologies. When you load the various data sources into your new centralized data source, you also need to clean and classify the data. What is the most expeditious and efficient way to create this new centralized data source?
Answer options:
A. Use Amazon EMR and its built-in machine learning tool, Apache Spark MLlib, to extract the data from your disparate data sources, transform (clean and classify) the data, and load it into an S3 data lake.
B. Use AWS Glue crawlers to crawl your disparate data sources and create a metastore for your S3 data lake. Then use AWS Glue to extract, transform (clean and classify), and load the source data into your S3 data lake.
C. Use Amazon Kinesis Data Firehose to send the data from your disparate data sources to your S3 data lake. Use AWS Lambda integration with Kinesis Data Firehose to transform (clean and classify) your data as it loads into your S3 data lake.
D. Use AWS Lake Formation to collect and catalog the data from your disparate data sources, transform (clean and classify) the data, and load it into your S3 data lake.