In this project, I embarked on an exploratory journey through a dataset of U.S. medical insurance costs, aiming to uncover insights into the factors that influence these costs. Utilizing the Python programming language and its csv
library, I analyzed various aspects of the data, including patient demographics, smoking status, region, and the number of children. My primary objectives were to understand the average age of the patients, regional distribution, the impact of smoking on insurance costs, and the average age of patients with at least one child.
The first step in my analysis involved loading the data from a CSV file named insurance.csv
. I created lists for each attribute I intended to analyze: ages, sexes, BMIs, number of children, smoker status, regions, and insurance charges. Using the load_list_info
function, I populated these lists with data from the corresponding columns in the CSV file. This function streamlined the process of extracting information and set the stage for a more detailed analysis.
With the data successfully loaded into Python lists, I proceeded to delve into the analysis phase of the project. Here's a summary of the analyses I conducted:
I calculated the average age of the patients in the dataset to get a sense of the demographic we're dealing with. This involved summing all the ages and dividing by the total number of patients.
I analyzed the distribution of patients across different regions to identify where the majority of individuals are from. This required identifying unique regions in the dataset and counting occurrences.
I compared the insurance costs between smokers and non-smokers to evaluate the financial impact of smoking on medical insurance. This involved segmenting the data by smoking status and calculating average costs for each group.
Finally, I determined the average age of patients who have at least one child. This analysis aimed to explore how having dependents might relate to the age of the policyholders.
Through the implementation of the PatientsInfo
class and its methods, I systematically approached each analysis goal. The class's methods enabled me to efficiently analyze ages, distinguish sexes, identify unique regions, calculate average charges, and create a comprehensive dictionary of all patient information. This dictionary became a versatile tool for further exploration and analysis.
Some key insights from the project include:
- The average age of patients in the dataset, providing a window into the age distribution of those seeking medical insurance.
- The regional distribution of patients, highlighting areas with higher concentrations of insured individuals.
- A clear disparity in insurance costs between smokers and non-smokers, underscoring the financial consequences of smoking.
- The average age of patients with at least one child, offering perspective on the demographic of policyholders with dependents.
This project was not just an exercise in data manipulation but an enlightening exploration into the factors affecting U.S. medical insurance costs. By applying Python programming skills and analytical thinking, I uncovered valuable insights that could inform both individuals and policymakers about the dynamics of medical insurance pricing. The findings underscore the significance of lifestyle choices, demographic factors, and regional distribution in the cost of healthcare.