Jiaying-Wu/DataWrangling

Data Wrangling with Python

  • DW01_Parsing_Data: Parsing data in CSV, JSON, XML, and PDF formats.

  • DW02_Regular_Expression: Regular Expressions with Python.

  • DW03_Text_Pre_Processing: Text Pre-processing with Tokenization, Case Normalization, Removing Stop words, Stemming, Lemmatization, Sentence Segmentation.

  • DW04_Text_Feature_Representation: Generating feature representations for text, including Removing Words with Non-alphabetic Characters, Removing the Most and Least Frequent Words, Creating Count Vectors, Creating TF-IDF Vectors, Saving Pre-processed Text to a File, Extracting Nouns and Verbs, Extracting N-grams and Collocations.

  • DW05_Process_Tweets: Processing tweets, including Collecting Tweets from an API, Loading Tweets from a Dump File, Processing Emoticons, Tokenizing Tweet Text, Pre-processing Tweets for Sentiment Analysis.

  • DW06_Data_Auditing: Data Auditing of Syntactic Anomalies and Semantic Anomalies for Titanic Data.

  • DW07_Missing_Value: Investigating missing values and imputation for Titanic Data.

  • DW08_Imputation_Linear_Regression: Using Linear Regression to impute missing values for the Boston house prices data.

  • DW09_Outlier: Detecting outliers using boxplot visualization.

  • DW10_Heat_Map: Data Integration with household heating data and creating a heat map.

  • DW11_Interactive_Map: Processing geospatial data and creating an interactive map for the United States Geological Survey (USGS) dataset from an SQLite database.

  • DW12_Data_Merge: Data Merging, Joining and Reshaping with pandas.

  • DW13_Data_Scaling: Data Scaling using StandardScaler, MinMaxScaler and PowerTransformer for the California housing dataset.

  • DW14_Data_Sampling: Using Random sampling and Stratified sampling to sample data from the wine dataset.

  • DW15_Data_Discretisation: Data Discretization with Binning.
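
As a rough illustration of two of the topics above (min-max scaling from DW13 and equal-width binning from DW15), here is a minimal pure-Python sketch. It is not taken from the notebooks, which are described as using scikit-learn and pandas. The function names `min_max_scale` and `equal_width_bins` and the `ages` values are hypothetical, made up for this example.

```python
def min_max_scale(values):
    """Rescale values linearly to the range [0, 1] (the idea behind min-max scaling)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no range to scale over.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values, n_bins):
    """Assign each value a 0-based index into n_bins equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would fall just past the last interval, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


ages = [2, 18, 25, 40, 62, 80]    # made-up illustrative data
scaled = min_max_scale(ages)      # smallest value maps to 0.0, largest to 1.0
bins = equal_width_bins(ages, 3)  # three intervals, each 26 units wide
print(scaled)
print(bins)
```

With real datasets, library routines such as scikit-learn's MinMaxScaler or pandas' cut would normally replace these hand-written helpers.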