Cesar Acosta

A health insurance company is interested in identifying new individuals who may consider purchasing their insurance policy. The company has customer data (from a similar insurance policy) with 85 customer variables and attributes (socio-demographic variables and product ownership variables). The data shows if a customer purchased the insurance policy. Only 6% of individuals in the dataset purchased that insurance policy. Classification models are built to predict if a new customer would buy the new policy. The data is clearly imbalanced and the focus is to find the best accuracy when predicting the positive class (a customer purchases the policy). The final model helped the company characterize those customers that would buy the insurance with a large probability.

Predicting Insurance data