Mining of Massive Datasets

Jure Leskovec, Anand Rajaraman and Jeff Ullman welcome you to the self-paced version of the on-line course based on the book Mining of Massive Datasets. It is intended for people who have a reasonable undergraduate education in Computer Science, including courses in data structures, algorithms, databases, calculus, statistics, and linear algebra.

In this course, you will learn many of the interesting algorithms that have been developed for efficient processing of large amounts of data in order to extract simple and useful models of that data. These techniques are often used to predict properties of future instances of the same sort of data, or simply to make sense of the data already available. Many people view data mining, or "big data," as machine learning. There are indeed some techniques for processing large datasets that can be considered machine learning, and we shall cover a number of these. But there are also many algorithms and ideas for dealing with big data that are not usually classified as machine learning, and we shall cover many of these as well.

Course Outline

The course consists of 15 modules of videos and homework, plus a final exam. In the synchronous version of the course, the material is intended to be covered in seven weeks. However, you are free to spend more or less time learning this material. Here is a list of the 15 modules:

MapReduce
Link Analysis (PageRank)
Locality-Sensitive Hashing
Distance Measures and Nearest-Neighbor Learning
Frequent Itemset Analysis
Social-Network Graphs
Algorithms for Data Streams
Recommendation Systems
Dimensionality Reduction
Clustering
Computational Advertising
Machine Learning
More on MapReduce Algorithms
More on Locality-Sensitive Hashing
More on Link Analysis

Course Materials

The material found in this course is supported by a free on-line book, with the same title and authors as the course itself. The book is published by Cambridge University Press, but, by courtesy of the publisher, you can download a free copy at www.mmds.org. In addition to the videos provided, the slide sets used in each video can be accessed via the "Handouts" link beneath each video.
