GitHub - xinyexu/Data-Manipulation-Analysis: Data Manipulation & Analysis

Data Manipulation & Analysis

Course Description:

This course aims to help students get started with their own data harvesting, processing, aggregation, and analysis.

Data analysis is crucial to evaluating and designing solutions and applications, as well as to understanding users' information needs and use. In many cases the data we need is distributed online across many web pages, stored in a database, or available in a large text file. Often these data (e.g., web server logs) are too large to obtain or process manually. Instead, we need an automated way of gathering the data, parsing it, and summarizing it before we can do more advanced analysis.
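As a minimal sketch of the gather, parse, and summarize idea for web server logs, the following uses only the Python standard library. The sample lines, the assumption that the logs follow the Common Log Format, and the names `SAMPLE_LOG`, `LOG_PATTERN`, `parse_log`, and `summarize` are illustrative and do not come from the course materials.

```python
import re
from collections import Counter

# Hypothetical sample lines in Common Log Format; real logs would be read from a file.
SAMPLE_LOG = """\
127.0.0.1 - - [10/Oct/2019:13:55:36 -0400] "GET /index.html HTTP/1.1" 200 2326
127.0.0.1 - - [10/Oct/2019:13:55:40 -0400] "GET /missing.html HTTP/1.1" 404 209
10.0.0.5 - - [10/Oct/2019:13:56:01 -0400] "POST /login HTTP/1.1" 200 512
this line is malformed and should be skipped
"""

# Named groups make each parsed field accessible by name.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_log(text):
    """Return a list of dicts, one per line that matches the pattern."""
    records = []
    for line in text.splitlines():
        match = LOG_PATTERN.match(line)
        if match:  # silently skip lines that do not look like log entries
            records.append(match.groupdict())
    return records


def summarize(records):
    """Count requests per HTTP status code and per client host."""
    status_counts = Counter(r["status"] for r in records)
    host_counts = Counter(r["host"] for r in records)
    return status_counts, host_counts


records = parse_log(SAMPLE_LOG)
status_counts, host_counts = summarize(records)
print(status_counts)
print(host_counts)
```

Skipping non-matching lines keeps the sketch short; a more careful script would probably count or log the rejected lines so that parsing problems are not hidden.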

Students will therefore learn to use Python and its modules to accomplish these tasks in a quick and easy, yet useful and repeatable, way. They will then learn techniques of exploratory data analysis, using scripting, text parsing, Structured Query Language (SQL), regular expressions, graphing, and clustering methods to explore data, so that they can make sense of and see patterns in otherwise intractable quantities of data. The skills covered include: processing big data; converting messy data into a form that can be analyzed with pandas; computing and visualizing summary statistics of datasets; specifying graphical displays with Seaborn and matplotlib; combining graphics with data manipulation to visualize relationships between variables; applying machine learning techniques, including clustering and classification; and applying dimension reduction techniques.
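To illustrate what converting messy data into an analyzable form and computing summary statistics might look like with pandas, here is a small sketch. The records, the column names `country` and `medals`, and the cleaning rules are made up for illustration and are not taken from the course notebooks.

```python
import pandas as pd

# Hypothetical messy records: inconsistent casing, stray whitespace,
# numbers stored as strings, and a non-numeric placeholder.
raw = pd.DataFrame({
    "country": [" usa", "USA ", "china", "China", "Kenya"],
    "medals": ["10", "12", "8", "n/a", "3"],
})

clean = raw.copy()
# Normalize the text key so that variants of the same country group together.
clean["country"] = clean["country"].str.strip().str.upper()
# Convert to numbers; values that cannot be parsed become NaN.
clean["medals"] = pd.to_numeric(clean["medals"], errors="coerce")
# Drop rows whose medal count could not be parsed.
clean = clean.dropna(subset=["medals"])

# Per-country summary statistics.
summary = clean.groupby("country")["medals"].agg(["count", "sum", "mean"])
print(summary)
```

Dropping unparseable rows is only one option; depending on the analysis, keeping them as missing values may be the better choice.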

Contents and related packages for each class and HW

Data manipulation I: pandas DataFrames

Data manipulation II: pandas (Homework 1 (Pandas data manipulation: Olympics))

Data analysis I: univariate stats, visualization, seaborn, intro to correlation (Homework 2 (more data manipulation))

Data analysis II: ANOVA, t-test, linear models (Homework 3 (Visualization and correlation))

Categorical data (contingency tables, chi-square, mosaic plots); text processing: regular expressions (Homework 4 (Linear models))

Natural language processing (NLTK, gensim) (Project Proposal)

Machine Learning I: Clustering (Homework 5 (NLP))

Machine Learning II: Classification (Homework 6 (Clustering))

Machine Learning III: Dimensionality reduction (PCA and t-SNE)

Spark I

Spark II
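For the categorical data topic listed above (contingency tables and chi-square), the Pearson chi-square statistic can be sketched by hand in plain Python. The observed counts and the function name `chi_square_statistic` are hypothetical; the course notebooks may well use a statistics library instead, which is not confirmed by this README.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical 2x2 table of observed counts (rows: groups, columns: outcomes).
observed = [[10, 20],
            [20, 10]]

stat = chi_square_statistic(observed)
# Degrees of freedom: (rows - 1) * (columns - 1).
dof = (len(observed) - 1) * (len(observed[0]) - 1)
print(stat, dof)
```

The statistic alone does not give a p-value; that requires comparing it against the chi-square distribution with `dof` degrees of freedom.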

HWs

Homework 1 (Pandas data manipulation: Olympics)

Homework 2 (more data manipulation)

Homework 3 (Visualization and correlation)

Homework 4 (Linear models)

Homework 5 (NLP)

Homework 6 (Clustering)

Homework 7 (Classification)

Homework 8 (Dimension Reduction)

Homework 9 (Spark)

Folders and files

618_01_Introduction.ipynb
618_02_Pandas_Intro.ipynb
618_03_Pandas_Advanced.ipynb
618_04_data_analysis_I.ipynb
618_05_data_analysis_II.ipynb
618_06_categorical_and_text_data.ipynb
618_07_NLP.ipynb
618_08_Clustering_2019.ipynb
618_09_Classification.ipynb
618_10_Dimension_Reduction.ipynb
618_11a_Big_Data_I.ipynb
618_11b_Big_Data_II.ipynb
618_12_Big_Data_III.ipynb
618_HW3.ipynb
618_HW4.ipynb
618_HW5.ipynb
618_Homework_01.ipynb
618_Homework_06.ipynb
618_Homework_07.ipynb
Readme.md
si618-hw9-xinyexu.ipynb
si618-spark-dataframes.ipynb
