The objective is to build an ML model that accurately predicts an individual's health insurance price from parameters such as age, gender, BMI, number of children, smoking status, and location.
The dataset consists of insurance records from a local health insurance company based in London. It contains 1340 records across 7 features (age, gender, BMI, children, smoking status, location, health insurance price). The target column is health insurance price.
- Loading dataset
- Checking shape
- Checking duplicate records
- Checking null values
- Checking outliers
- Handling Missing Values (Complete Case Analysis Technique)
- Handling Outliers (IQR Technique)
- Univariate Analysis (Histogram, Countplots, Boxplot, Pie Charts)
- Bivariate Analysis (Scatterplot, Lineplot, Boxplot, Barplot)
- Multivariate Analysis (Pairplot)
- Feature Encoding (OneHotEncoder)
- Feature Scaling (StandardScaler)
- Feature Selection (Backward Feature Selection)
- Linear Regressor
- SGD Regressor
- KNN Regressor
- Decision Tree Regressor
- Random Forest Regressor
- SVM Regressor
- XGB Regressor
- AdaBoost Regressor
- Gradient Descent Regressor
- MSE
- R2_Score
- Cross-Validation Score
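
The cleaning steps listed above (duplicate check, complete case analysis, IQR outlier handling) might look roughly like the sketch below. The helper name `clean_records`, the column names, and the sample values are all assumptions for illustration and are not taken from the actual dataset. The sketch caps outliers at the IQR fences; dropping them instead is an equally plausible reading of "IQR Technique".

```python
import numpy as np
import pandas as pd


def clean_records(df, numeric_cols):
    """Drop duplicates and incomplete rows, then cap outliers with the IQR rule."""
    # Remove exact duplicate records, then apply complete case analysis:
    # keep only rows that have no missing values.
    cleaned = df.drop_duplicates().dropna().copy()

    for col in numeric_cols:
        q1 = cleaned[col].quantile(0.25)
        q3 = cleaned[col].quantile(0.75)
        iqr = q3 - q1
        lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
        # Assumption: cap values at the fences rather than dropping the rows.
        cleaned[col] = cleaned[col].clip(lower=lower, upper=upper)
    return cleaned


# Tiny made-up sample; the values are illustrative only.
sample = pd.DataFrame(
    {
        "age": [19, 33, 45, 52, 52, 28, 61],
        "bmi": [27.9, 22.7, np.nan, 30.2, 30.2, 95.0, 29.1],
        "smoker": ["yes", "no", "no", "no", "no", "yes", "no"],
    }
)
cleaned = clean_records(sample, numeric_cols=["bmi"])
print(cleaned.shape)
print(cleaned["bmi"].describe())
```

In this sample, one duplicate row and one row with a missing `bmi` are removed, and the extreme `bmi` of 95.0 is pulled back to the upper fence.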
The Gradient Descent Regressor was the best-performing model, with a cross-validation score of 86%.
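
A comparison of this kind could be run along the following lines. This is a minimal sketch using scikit-learn's `OneHotEncoder`, `StandardScaler`, and `cross_val_score`, with two of the listed models as examples. The data is synthetic, and the column names, price formula, and hyperparameters are assumptions, so the printed scores say nothing about the results reported here.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data; column names and the price formula are assumptions.
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame(
    {
        "age": rng.integers(18, 65, size=n),
        "bmi": rng.normal(30, 6, size=n),
        "children": rng.integers(0, 5, size=n),
        "gender": rng.choice(["female", "male"], size=n),
        "smoker": rng.choice(["no", "yes"], size=n, p=[0.8, 0.2]),
        "location": rng.choice(["north", "south", "east", "west"], size=n),
    }
)
y = (
    250 * X["age"]
    + 300 * X["bmi"]
    + 500 * X["children"]
    + 20000 * (X["smoker"] == "yes")
    + rng.normal(0, 2000, size=n)
)

categorical = ["gender", "smoker", "location"]
numeric = ["age", "bmi", "children"]

# One-hot encode the categorical columns and standardize the numeric ones.
preprocess = ColumnTransformer(
    [
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
        ("num", StandardScaler(), numeric),
    ]
)

models = {
    "Linear Regressor": LinearRegression(),
    "Random Forest Regressor": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Mean 5-fold cross-validated R2 for each model.
results = {}
for name, model in models.items():
    pipeline = Pipeline([("prep", preprocess), ("model", model)])
    scores = cross_val_score(pipeline, X, y, cv=5, scoring="r2")
    results[name] = scores.mean()

for name, score in results.items():
    print(f"{name}: mean R2 = {score:.3f}")
```

Wrapping the preprocessing and the model in one `Pipeline` means the encoder and scaler are refit on each training fold, which keeps information from the validation fold out of the preprocessing step.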