Integration of 7 datasets in various formats about housing information in Victoria, Australia, and a study of the effect of different normalization/transformation methods.

ricardoariasalazar/Housing-Information-Melbourne

Housing-Information-Melbourne

Using Python, we can integrate several datasets into a single schema and find and fix possible problems in the data. This project uses 7 datasets in various formats about housing information in Victoria, Australia. The first task is to integrate all of them into one dataset:

  • Hospitals (HTML format)
  • Supermarkets (Excel format)
  • Shopping centers (PDF format)
  • Real Estate (XML format)
  • Real Estate (JSON format)
  • Vic_suburb_boundary (shapefile format)
  • GTFS_Melbourne_Train_Information (text format)
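
As a rough illustration of the integration step, the sketch below merges the two Real Estate sources (JSON and XML) into one list of records with a shared schema and drops duplicates. It is not the code from this repository: the field names (`property_id`, `suburb`, `price`, `lat`, `lng`), the sample records, and the helper functions are all assumptions made for the example, and it uses only the Python standard library.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical inline samples standing in for the real JSON and XML files.
JSON_TEXT = """[
  {"property_id": 1, "suburb": "CARLTON", "price": 850000, "lat": -37.80, "lng": 144.97}
]"""

XML_TEXT = """<properties>
  <property>
    <property_id>2</property_id>
    <suburb>RICHMOND</suburb>
    <price>920000</price>
    <lat>-37.82</lat>
    <lng>145.00</lng>
  </property>
  <property>
    <property_id>1</property_id>
    <suburb>CARLTON</suburb>
    <price>850000</price>
    <lat>-37.80</lat>
    <lng>144.97</lng>
  </property>
</properties>"""

# Assumed target schema: field name -> type every source is coerced to.
SCHEMA = {"property_id": int, "suburb": str, "price": float, "lat": float, "lng": float}


def coerce(record):
    """Keep only the schema fields and cast each one to its target type."""
    return {field: cast(record[field]) for field, cast in SCHEMA.items()}


def load_json(text):
    """Parse a JSON array of property objects."""
    return [coerce(item) for item in json.loads(text)]


def load_xml(text):
    """Parse <property> elements whose children hold the field values."""
    root = ET.fromstring(text)
    return [
        coerce({child.tag: child.text for child in prop})
        for prop in root.iter("property")
    ]


def integrate(*sources):
    """Merge record lists, keeping the first record seen per property_id."""
    merged = {}
    for records in sources:
        for rec in records:
            merged.setdefault(rec["property_id"], rec)
    return [merged[key] for key in sorted(merged)]


properties = integrate(load_json(JSON_TEXT), load_xml(XML_TEXT))
for row in properties:
    print(row)
```

The same pattern would extend to the other sources: one loader per format that returns records in the shared schema, followed by a single merge step. Parsing the HTML, Excel, PDF, and shapefile inputs would need third-party libraries and is left out here.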

The second task is to study the effect of different normalization/transformation methods:

  • Z-score standardization
  • Min-max normalization

Then observe and explain their effect, assuming we want to develop a linear model that predicts the price of a property from the Distance_to_sc, travel_min_to_CBD, and Distance_to_hospital attributes.
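
To make the two methods concrete, here is a minimal sketch using only the standard library. Z-score standardization maps each value to (x - mean) / std, and min-max normalization maps it to (x - min) / (max - min). The numbers are made up for illustration, and for brevity the model uses a single predictor (travel_min_to_CBD) instead of all three attributes; the helper names are assumptions, not code from this repository.

```python
from statistics import fmean, pstdev


def z_score(values):
    """Rescale to mean 0 and (population) standard deviation 1."""
    mu, sigma = fmean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def min_max(values):
    """Rescale linearly so the smallest value is 0 and the largest is 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


# Made-up illustrative values, not taken from the real datasets.
travel_min_to_CBD = [12.0, 25.0, 31.0, 44.0, 58.0]
price = [1250000.0, 980000.0, 910000.0, 760000.0, 640000.0]

transforms = {"raw": list, "z-score": z_score, "min-max": min_max}
coefficients = {}
predictions = {}
for name, transform in transforms.items():
    x = transform(travel_min_to_CBD)
    a, b = fit_line(x, price)
    coefficients[name] = (a, b)
    predictions[name] = [a + b * xi for xi in x]
    print(f"{name:8s} intercept={a:14.2f} slope={b:14.2f}")
```

Both methods are linear rescalings of a feature, so in this sketch the fitted prices are the same for all three versions; only the intercept and slope change. What scaling does buy for a linear model is coefficients on comparable scales across attributes. Neither method changes the shape of a feature's distribution or makes a non-linear relationship with price linear.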
