Question 68:
You work for a financial services company where you have a large Hadoop cluster hosting a data lake in your on-premises data center. Your department has loaded your data lake with financial services operational data from your corporate actions, order management, cash management, reconciliations, and trade management systems. Your investment management operations team now wants to use data from the data lake to build financial prediction models. You want to use data from the Hadoop cluster in your machine learning training jobs. Your Hadoop cluster has Hive, Spark, Sqoop, and Flume installed. How can you most effectively load data from your Hadoop cluster into your SageMaker model for training?
Answer options:
A. Use the distcp utility to copy your dataset from your Hadoop platform to the S3 bucket where your SageMaker training job can use it.
B. Use the HadoopActivity command with AWS Data Pipeline to move your dataset from your Hadoop platform to the S3 bucket where your SageMaker training job can use it.
C. Use the SageMaker Spark library, training your model directly from the DataFrames in your Spark cluster.
D. Use the Sqoop export command to export your dataset from your Hadoop cluster to the S3 bucket where your SageMaker training job can use it.
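To make option C concrete, below is a minimal PySpark sketch of how training from a Spark DataFrame with the SageMaker Spark library (sagemaker_pyspark) can look. The IAM role ARN, Hive table name, column names, and instance types are placeholders, and the exact estimator parameters may vary by library version; treat this as an illustrative sketch rather than the definitive setup.

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from sagemaker_pyspark import IAMRole, classpath_jars
from sagemaker_pyspark.algorithms import XGBoostSageMakerEstimator

# Put the SageMaker Spark JARs on the classpath and enable Hive access to the data lake.
jars = ":".join(classpath_jars())
spark = (SparkSession.builder
         .config("spark.driver.extraClassPath", jars)
         .enableHiveSupport()
         .getOrCreate())

# Read operational data already in the data lake (hypothetical Hive table and columns).
raw_df = spark.sql("SELECT quantity, price, settle_amount, label FROM trades.training_set")

# The SageMaker Spark estimators expect a DataFrame with "features" (Vector) and "label" columns.
assembler = VectorAssembler(inputCols=["quantity", "price", "settle_amount"],
                            outputCol="features")
training_df = assembler.transform(raw_df).select("features", "label")

# Placeholder role ARN and instance types; algorithm hyperparameters (e.g. number of
# boosting rounds) would also be configured on the estimator before training.
estimator = XGBoostSageMakerEstimator(
    sagemakerRole=IAMRole("arn:aws:iam::123456789012:role/SageMakerRole"),
    trainingInstanceType="ml.m5.xlarge",
    trainingInstanceCount=1,
    endpointInstanceType="ml.m5.xlarge",
    endpointInitialInstanceCount=1)

# fit() stages the DataFrame in S3 for you and launches a SageMaker training job.
model = estimator.fit(training_df)
```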