ExamQuestions.com

AWS Certified Machine Learning Specialty Exam Questions

Question 119:

You work for a retail clothing manufacturer that has a very active online store. You have been assigned the task of building a model that selects customers to contact for a direct marketing campaign based on their predicted receptiveness to the campaign. Some of your customers have been contacted in past marketing campaigns, and you do not want to contact those customers again for this latest campaign.
Before training this model, you need to clean your data and prepare it for the XGBoost algorithm you are going to use. You have written your cleaning and preparation code in your SageMaker notebook. Based on the following code, what happens on lines 19, 21, and 22? (Select THREE)
1 import sagemaker
2 import boto3
3 from sagemaker.predictor import csv_serializer
4 import numpy as np
5 import pandas as pd
6 from time import gmtime, strftime
7 import os
8 region = boto3.Session().region_name
9 smclient = boto3.Session().client('sagemaker')
10 from sagemaker import get_execution_role
11 role = get_execution_role()
12 bucket = 'sagemakerS3Bucket'
13 prefix = 'sagemaker/xgboost'
14 !wget -N https://.../bank.zip
15 !unzip -o bank.zip
16 data = pd.read_csv('./bank/bank-full.csv', sep=';')
17 pd.set_option('display.max_columns', 500)
18 pd.set_option('display.max_rows', 5)
19 data['no_previous_campaign'] = np.where(data['contacted'] == 999, 1, 0)
20 data['not_employed'] = np.where(np.in1d(data['job'], ['student', 'retired', 'unempl']), 1, 0)
21 model_data = pd.get_dummies(data)
22 model_data = model_data.drop(['duration', 'employee.rate', 'construction.price.idex',
       'construction.confidence.idx', 'lifetime.rate', 'region'], axis=1)
23 train_data, validation_data, test_data = np.split(model_data.sample(frac=1,
       random_state=1729), [int(0.7 * len(model_data)), int(0.9 * len(model_data))])
24 pd.concat([train_data['y_yes'], train_data.drop(['y_no', 'y_yes'], axis=1)],
       axis=1).to_csv('train.csv', index=False, header=False)
25 pd.concat([validation_data['y_yes'], validation_data.drop(['y_no', 'y_yes'], axis=1)],
       axis=1).to_csv('validation.csv', index=False, header=False)
26 pd.concat([test_data['y_yes'], test_data.drop(['y_no', 'y_yes'], axis=1)],
       axis=1).to_csv('test.csv', index=False, header=False)
27 boto3.Session().resource('s3').Bucket(bucket).Object(os.path.join(prefix,
       'train/train.csv')).upload_file('train.csv')
28 boto3.Session().resource('s3').Bucket(bucket).Object(os.path.join(prefix,
       'validation/validation.csv')).upload_file('validation.csv')
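
For readers unfamiliar with the shuffle-and-split idiom on line 23 of the listing, the following is a minimal sketch of the same idea on made-up data. The 10-row frame and its column `x` are hypothetical, and plain `iloc` slicing is swapped in for `np.split` to make the cut points explicit; it is an illustration, not the notebook's actual code.

```python
import pandas as pd

# Hypothetical 10-row frame standing in for model_data (values are made up).
model_data = pd.DataFrame({"x": range(10), "y_yes": [0, 1] * 5})

# sample(frac=1) returns every row in random order;
# random_state makes the shuffle repeatable.
shuffled = model_data.sample(frac=1, random_state=1729)

# Cut points at 70% and 90% of the row count give
# roughly 70/20/10 train/validation/test partitions.
n = len(shuffled)
train_end = int(0.7 * n)
validation_end = int(0.9 * n)

train_data = shuffled.iloc[:train_end]
validation_data = shuffled.iloc[train_end:validation_end]
test_data = shuffled.iloc[validation_end:]

print(len(train_data), len(validation_data), len(test_data))
```

Every row lands in exactly one of the three partitions, so the sizes always sum to the original row count.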

Answer options:

A. Splits the bank dataset into train, validation, and test datasets
B. Sets the attribute no_previous_campaign to 999, 0, or 1 depending on whether the customer in the observation has been contacted via a previous campaign
C. Sets the attribute no_previous_campaign to 1 if the customer in the observation has not been contacted via a previous campaign, or to 0 if they have been contacted via a previous campaign
D. Converts categorical data to a set of indicator variables
E. Converts empty attributes to dummy variables
F. Removes features deemed inconsequential
G. Removes observations deemed inconsequential
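
To experiment with the three calls the question asks about, here is a small sketch on a toy DataFrame. The column names mirror the listing, but the four rows and their values are invented for illustration, and only one column is dropped to keep the example short.

```python
import numpy as np
import pandas as pd

# Toy data: column names follow the question, values are made up.
data = pd.DataFrame({
    "contacted": [999, 3, 999, 6],
    "job": ["student", "admin", "retired", "technician"],
    "duration": [120, 45, 300, 80],
})

# np.where(condition, 1, 0) produces a binary flag:
# 1 where the condition holds, 0 everywhere else.
data["no_previous_campaign"] = np.where(data["contacted"] == 999, 1, 0)

# get_dummies replaces each string (categorical) column with
# one indicator column per distinct category.
model_data = pd.get_dummies(data)

# drop(..., axis=1) removes columns (features); the row count is unchanged.
model_data = model_data.drop(["duration"], axis=1)

print(data["no_previous_campaign"].tolist())
print(model_data.columns.tolist())
print(model_data.shape)
```

Printing the column list before and after each step is a quick way to see which operation touches values, which adds columns, and which removes them.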