
SCOR Datathon in 2020. Acquired and processed open data, predicted glycohemoglobin and cholesterol levels and the probability of diabetes, then used a Random Survival Forest to identify how that probability changes in order to suggest improvements to a user.


cnai-ds/Datathon-Health-Risk-Prediction


Health Risk Prediction for SCOR Datathon

This repository is for the SCOR Datathon, held in Paris from November 2019 to February 2020.
SCOR is a tier-1 global reinsurance company. Over these four months, we processed real open data (NHANES and NHCS) and built models to predict the health risks of individuals in the U.S.

Identifying the business problem

We identified the two biggest business problems for insurance and reinsurance companies: fraud and the rising cost of chronic diseases.
With the growth of the middle class and urbanisation, sedentary lifestyles are becoming more common across the world. This is accelerating the spread of costly chronic diseases.
Health care payers are looking for ways to reduce their costs, not only by predicting a person's health risks to set the correct premium, but also by improving that person's health: the longer people stay free of health problems, the less the payers spend on health care.

Our solutions

We created a Flask application that predicts a person's diabetes risk and glycohemoglobin and cholesterol levels, and suggests how that person can lower the risk to a specific level. For example, to lower the risk from 50% to 20%, we might suggest walking 10 more minutes or sleeping 1 hour longer per day.
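The recommendation step described above can be pictured as a small what-if search: increase one lifestyle input at a time, re-score the person, and keep the smallest change that brings the predicted risk down to a target. The sketch below is an illustration only, assuming a toy model: the logistic `predict_risk` function, its coefficients, the `smallest_change` helper and the feature names `walk_minutes` and `sleep_hours` are all hypothetical stand-ins, not the trained models or code used in this project.

```python
import math

# HYPOTHETICAL toy model: made-up logistic coefficients, chosen so that the
# example person below starts at a 50% predicted risk. Not the project's model.
COEFS = {"intercept": 12.0, "walk_minutes": -0.15, "sleep_hours": -1.5}


def predict_risk(person):
    """Return a probability in (0, 1) from the hypothetical logistic model."""
    z = COEFS["intercept"]
    z += COEFS["walk_minutes"] * person["walk_minutes"]
    z += COEFS["sleep_hours"] * person["sleep_hours"]
    return 1.0 / (1.0 + math.exp(-z))


def smallest_change(person, feature, step, max_steps, target):
    """Increase one feature in fixed steps until the risk is <= target.

    Returns (amount_added, new_risk), or None if the target is not
    reached within max_steps.
    """
    for k in range(1, max_steps + 1):
        candidate = dict(person)  # copy so the original person is unchanged
        candidate[feature] = person[feature] + k * step
        risk = predict_risk(candidate)
        if risk <= target:
            return k * step, risk
    return None


person = {"walk_minutes": 20, "sleep_hours": 6}
baseline = predict_risk(person)
walk = smallest_change(person, "walk_minutes", step=5, max_steps=12, target=0.2)
sleep = smallest_change(person, "sleep_hours", step=0.5, max_steps=6, target=0.2)
```

With these toy numbers the baseline risk is 50%, and either 10 extra minutes of walking or 1 extra hour of sleep brings it to about 18%. In a real application, `predict_risk` would be replaced by the trained classifier, and the search could cover several features jointly rather than one at a time.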



Main models in our application

  • Cox proportional hazards (Cox PH) model to identify key features for health risks
  • Kaplan-Meier estimator to visualize the survival curve for each indicator
  • Gradient Boosting Regression to predict a person's glycohemoglobin and cholesterol levels
  • XGBoost classification to predict diabetes
  • SHAP values to estimate the marginal contribution of individual variables to diabetes risk
  • Random Survival Forest to identify how the probability changes given a person's age and changes in other parameters
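Of the survival methods listed above, the Kaplan-Meier estimator is short enough to write out by hand: at each distinct event time t, the survival estimate is multiplied by (1 - d/n), where d is the number of events at t and n is the number of subjects still at risk just before t. The plain-Python sketch below runs on a tiny made-up sample and is not the project's code; in practice a library such as `lifelines` (which provides `KaplanMeierFitter` and `CoxPHFitter`) or `scikit-survival` (which provides `RandomSurvivalForest`) would normally be used.

```python
def kaplan_meier(durations, events):
    """Kaplan-Meier survival estimate.

    durations: follow-up time for each subject.
    events: 1 if the event (e.g. diabetes onset) was observed at that time,
            0 if the subject was censored.
    Returns a list of (time, survival_probability) at each distinct event time.
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have the same length")
    pairs = sorted(zip(durations, events))
    n_at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = 0
        removed = 0
        # Group every subject sharing time t; censored subjects at t still
        # count as at risk at t, which is the usual Kaplan-Meier convention.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Both events and censored subjects leave the risk set after t.
        n_at_risk -= removed
    return curve


# Made-up toy data: years of follow-up and whether diabetes was observed.
durations = [2, 3, 3, 5, 6, 8, 8, 9]
events = [1, 1, 0, 1, 0, 1, 1, 0]
curve = kaplan_meier(durations, events)
```

On this toy data the estimate steps down to 0.875, 0.75, 0.6 and 0.2 at times 2, 3, 5 and 8. The censored subjects (at times 3, 6 and 9) shrink the number at risk without adding a step to the curve.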
