Question 160:
You are a machine learning specialist working for a large insurance company. You are building a machine learning model to predict the likelihood that an insured customer commits insurance fraud. Your training dataset has many attributes about the insured, the insurance policy, and their insurance claims. Your model must output a continuous probability of fraud for any given customer claim. Your training data contains labeled outcomes for 100,000 insurance claim observations. When you visualize the training dataset, you see that 24,350 of the 100,000 claim records show a policy term length of 0 years. The remaining features for these observations show no anomalies. Which feature engineering option will give you the best dataset for your model training?
Answer options:
A. Use k-means clustering to impute the missing policy length features.
B. Use KNN to impute the missing policy length features.
C. Populate the 0 policy length feature value with the mean or median value of the feature.
D. Drop the records from the dataset where policy length is 0.
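To make the options more concrete, here is a minimal sketch of three of them, assuming a policy length of 0 is treated as a missing value rather than a real one. The toy rows, column layout, and function names (`zeros_to_missing`, `median_impute`, `knn_impute`) are hypothetical and purely illustrative. The sketch uses plain Python for self-containment; in practice a library imputer such as scikit-learn's `KNNImputer` would usually be used instead. Option A (k-means) is not shown.

```python
import math
from statistics import median

# Hypothetical toy rows: (customer_age, annual_premium, policy_length_years).
# A policy length of 0 is treated as "missing", not as a real value.
POLICY_COL = 2
TOY_ROWS = [
    (25, 800.0, 1),
    (27, 820.0, 1),
    (45, 1500.0, 5),
    (47, 1550.0, 6),
    (26, 810.0, 0),   # policy length recorded as 0 -> missing
    (46, 1520.0, 0),  # policy length recorded as 0 -> missing
]

# Option D would discard this share of the training set (figures from the question).
DROPPED_SHARE = 24_350 / 100_000


def zeros_to_missing(rows, col):
    """Replace 0 in column `col` with None so it is handled as missing."""
    return [tuple(None if i == col and v == 0 else v for i, v in enumerate(r))
            for r in rows]


def median_impute(rows, col):
    """Option C: fill missing values with the median of the observed values."""
    fill = median(r[col] for r in rows if r[col] is not None)
    return [tuple(fill if i == col and v is None else v for i, v in enumerate(r))
            for r in rows]


def _scaled(rows, cols):
    """Min-max scale the given columns so no single feature dominates distances."""
    lo = {c: min(r[c] for r in rows) for c in cols}
    hi = {c: max(r[c] for r in rows) for c in cols}
    return [[(r[c] - lo[c]) / (hi[c] - lo[c]) if hi[c] > lo[c] else 0.0 for c in cols]
            for r in rows]


def knn_impute(rows, col, k=2):
    """Option B: fill each missing value with the mean of `col` over the k
    nearest complete rows, measured on the other (scaled) features."""
    feature_cols = [c for c in range(len(rows[0])) if c != col]
    scaled = _scaled(rows, feature_cols)
    donors = [j for j, r in enumerate(rows) if r[col] is not None]
    result = []
    for i, r in enumerate(rows):
        if r[col] is not None:
            result.append(r)
            continue
        nearest = sorted(donors, key=lambda j: math.dist(scaled[i], scaled[j]))[:k]
        fill = sum(rows[j][col] for j in nearest) / len(nearest)
        result.append(tuple(fill if c == col else v for c, v in enumerate(r)))
    return result


if __name__ == "__main__":
    rows = zeros_to_missing(TOY_ROWS, POLICY_COL)
    print("median:", median_impute(rows, POLICY_COL))
    print("knn:   ", knn_impute(rows, POLICY_COL, k=2))
    print(f"share of rows dropped by option D: {DROPPED_SHARE:.2%}")
```

On these toy rows the median fill gives both incomplete records the same value (3.0), while the KNN fill borrows from similar customers (1.0 for the younger, low-premium record and 5.5 for the older, high-premium one). Dropping the records instead would remove roughly a quarter of the training data.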