Question 343:
A company needs to process large amounts of data and store it in a data store. The data consists of the IP addresses of all clients accessing the company's website. The data store is expected to hold billions of rows. The company has decided to use the AWS EMR service and needs to be able to query the data efficiently by IP address. Which of the following would be an ideal implementation plan?
Answer options:
A. Use S3 as the underlying storage for the EMR cluster. Ensure a bucket is created for each IP address
B. Make use of HBase on EMR. Ensure that the IP address is used as the underlying key
C. Use S3 as the underlying storage for the EMR cluster. Ensure that the prefixes have the IP address attached, such as bucketname/IPaddress-filename
D. Post the data from EMR to Redshift for analysis