
Implementation of KMeans clustering on the Country-data.csv dataset to identify distinct clusters of countries based on various socio-economic indicators. Preprocessing and normalisation of data to ensure accuracy and consistency. Analysis of clusters uncovering patterns and insights for strategic decision-making.


Aiza-D/KMeans-Clustering


Table of contents

  • Data Source
  • Dataset Attributes
  • Objective
  • KMeans Clustering
  • Steps followed

Categorising countries

Data Source

The data used in this task was originally sourced from Help.NGO, an international non-governmental organisation that specialises in emergency response, preparedness, and risk mitigation.

Dataset Attributes

country: Name of the country.
child_mort: Deaths of children under 5 years of age per 1,000 live births.
exports: Exports of goods and services per capita. Given as a percentage of the GDP per capita.
health: Total health spending per capita. Given as a percentage of GDP per capita.
imports: Imports of goods and services per capita. Given as a percentage of the GDP per capita.
income: Net income per person.
inflation: The measurement of the annual growth rate of the Total GDP.
life_expec: The average number of years a newborn child would live if the current mortality patterns remain the same.
total_fer: The number of children that would be born to each woman if the current age-fertility rates remain the same.
gdpp: The GDP per capita. Calculated as the Total GDP divided by the total population.

Objective

To group countries by socio-economic and health factors in order to determine each country's development status.

KMeans Clustering

Steps followed:

The following steps were followed:

  1. Loading the Country-data.csv dataset.
  2. Dropping any non-numeric columns from the dataset.
  3. Plotting nine different scatter plots with different combinations of variables against GDPP and child_mort. For example, GDPP vs health.
    • Note which of these plots looks the most promising for separating into clusters.
  4. Normalising the dataset using MinMaxScaler from sklearn.
  5. Finding the optimal number of clusters using the elbow method and the silhouette score.
  6. Fitting the scaled dataset to the optimal number of clusters. Reporting back on the silhouette score of the model.
  7. Visualising the clusters for the following two groups:
    • Child mortality vs GDPP
    • Inflation vs GDPP
  8. Labeling the groups of countries in the plots created based on child mortality, GDPP, and inflation using terms such as:
    • least developed, developing, and developed.
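
The non-plotting steps above (2, 4, 5, 6 and 8) can be sketched roughly as follows. This is a minimal illustration, not the repository's actual notebook code: the function names `scale_numeric`, `score_k_range` and `fit_and_label` are hypothetical, `k=3` is an assumption chosen only to match the three labels listed in step 8, and ranking clusters by mean `gdpp` is one possible way of assigning those labels.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import MinMaxScaler


def scale_numeric(df: pd.DataFrame) -> pd.DataFrame:
    """Drop non-numeric columns (e.g. country) and scale the rest to [0, 1]."""
    numeric = df.select_dtypes(include="number")
    scaled = MinMaxScaler().fit_transform(numeric)
    return pd.DataFrame(scaled, columns=numeric.columns, index=numeric.index)


def score_k_range(X: pd.DataFrame, k_values=range(2, 11), random_state=42) -> pd.DataFrame:
    """Collect inertia (for an elbow plot) and silhouette score for each k."""
    rows = []
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(X)
        rows.append({
            "k": k,
            "inertia": model.inertia_,
            "silhouette": silhouette_score(X, model.labels_),
        })
    return pd.DataFrame(rows).set_index("k")


def fit_and_label(df: pd.DataFrame, X: pd.DataFrame, k: int = 3, random_state=42):
    """Fit KMeans and name the clusters by ascending mean gdpp.

    Assumption: k matches the number of label names (three here).
    """
    names = ["least developed", "developing", "developed"]
    if k != len(names):
        raise ValueError("this sketch only names exactly three clusters")
    model = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(X)
    out = df.copy()
    out["cluster"] = model.labels_
    # Rank cluster ids from lowest to highest average GDP per capita.
    order = out.groupby("cluster")["gdpp"].mean().sort_values().index
    out["status"] = out["cluster"].map(dict(zip(order, names)))
    return out, silhouette_score(X, model.labels_)
```

With the real data, this might be used as `df = pd.read_csv("Country-data.csv")`, then `X = scale_numeric(df)`, `score_k_range(X)` to inspect the elbow and silhouette values, and finally `fit_and_label(df, X, k=3)` if three clusters turn out to be a reasonable choice.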
