Data Cleaning and Data Preprocessing to prepare a dataset for further Analysis

EmmaSui/Data-Cleaning-Create-an-Analytical-Dataset

Data-Cleaning-Create-an-Analytical-Dataset

This project was completed to practice data preprocessing and data cleaning skills. The original data and project were created by Udacity for its 'Predictive Analytics for Business Nanodegree'. The original project used Alteryx to clean the data, which is much easier than using Python. For practice, I cleaned the data in Python instead and obtained the same result as the provided answer.

Project Overview (Provided by Udacity)

Scenario

Pawdacity is a leading pet store chain in Wyoming with 13 stores throughout the state. This year, Pawdacity would like to expand and open a 14th store. Your manager has asked you to perform an analysis to recommend the city for Pawdacity’s newest store, based on predicted yearly sales.

Your manager has given you the following information to work with:

The monthly sales data for all of the Pawdacity stores for the year 2010.

NAICS data on the most current sales of all competitor stores where total sales is equal to 12 months of sales.

A partially parsed data file that can be used for population numbers.

Demographic data (Households with individuals under 18, Land Area, Population Density, and Total Families) for each city and county in the state of Wyoming.

For readers unfamiliar with the US system: a state contains counties, and each county contains one or more cities.

Data

p2-2010-pawdacity-monthly-sales.csv: This file contains all of the monthly sales for all Pawdacity stores for 2010.

p2-partially-parsed-wy-web-scrape.csv: This is a partially parsed data file that can be used for population numbers.

p2-wy-453910-naics-data.csv: NAICS data on the sales of all competitor stores where total sales is equal to 12 months of sales

p2-wy-demographic-data.csv: This file contains demographic data for each city and county in Wyoming.
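Since the scenario calls for yearly sales while the sales file is monthly, one natural step is to sum the monthly columns and aggregate by city. The following is only a minimal sketch of that idea, not the notebook's actual code: the column names (`CITY`, `January`, ...) and all figures are invented stand-ins, and the real file's layout may differ.

```python
import io

import pandas as pd

# Hypothetical stand-in for p2-2010-pawdacity-monthly-sales.csv.
# Column names and values are assumptions made up for illustration.
sample_csv = io.StringIO(
    "CITY,January,February,March\n"
    "Cheyenne,100,110,120\n"
    "Cheyenne,50,60,70\n"
    "Casper,80,90,100\n"
)

monthly = pd.read_csv(sample_csv)

# Treat every column except the city name as a month of sales.
month_cols = [c for c in monthly.columns if c != "CITY"]

# Sum across the month columns to get one total per store (row).
monthly["total_sales"] = monthly[month_cols].sum(axis=1)

# A city can have more than one store, so aggregate the totals by city.
city_sales = monthly.groupby("CITY", as_index=False)["total_sales"].sum()
print(city_sales)
```

With the real file, `io.StringIO(...)` would be replaced by the path to the CSV, and the month columns would need to be checked against the actual header.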

Language used

Python

Packages used

pandas, matplotlib.pyplot, numpy
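As a rough illustration of the kind of cleanup pandas makes possible for a partially parsed scrape, the sketch below strips leftover markup and footnote markers from a population column and converts it to integers. The column names, the kinds of debris, and the values are all hypothetical; the real file may need different patterns.

```python
import pandas as pd

# Hypothetical rows imitating a partially parsed web scrape.
# Names, markup, and numbers are invented for illustration only.
raw = pd.DataFrame({
    "city": [" Alpha ", "Beta", "Gamma "],
    "population": ["12,345", "6,789[2]", "<td>1,000</td>"],
})

clean = raw.copy()

# Trim stray whitespace around the city names.
clean["city"] = clean["city"].str.strip()

clean["population"] = (
    clean["population"]
    .str.replace(r"<[^>]+>", "", regex=True)   # drop leftover HTML tags
    .str.replace(r"\[\d+\]", "", regex=True)   # drop footnote markers like [2]
    .str.replace(",", "", regex=False)         # drop thousands separators
    .str.strip()
    .astype(int)
)
print(clean)
```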

Files in this project

p2-2010-pawdacity-monthly-sales.csv: This file contains all of the monthly sales for all Pawdacity stores for 2010.

p2-partially-parsed-wy-web-scrape.csv: This is a partially parsed data file that can be used for population numbers.

p2-wy-453910-naics-data.csv: NAICS data on the sales of all competitor stores where total sales is equal to 12 months of sales

p2-wy-demographic-data.csv: This file contains demographic data for each city and county in Wyoming.

Create and Analytical Dataset Code.ipynb: The Python code I wrote in a Jupyter notebook, along with the results of this project.

Others:

If this project inspired you or gave you ideas for your own project, please consider buying me a coffee.
