This repository contains four movie recommender systems based on user ratings:
- Movie-based Collaborative Filtering Recommender System
- Singular Value Decomposition Recommender System
- Latent Factor Model Recommender System
- CUR-based Recommender System
They are implemented in Python using a dataset of 1,000,209 anonymous ratings given by 6,040 users to approximately 3,900 movies.
All ratings are contained in the file "ratings.dat" and are in the following format:
UserID::MovieID::Rating::Timestamp
- UserIDs range between 1 and 6040
- MovieIDs range between 1 and 3952
- Ratings are made on a 5-star scale (whole-star ratings only)
- Timestamp is represented in seconds since the epoch as returned by time(2)
- Each user has at least 20 ratings
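A line in this format can be split on the `::` separator. The following is a minimal sketch rather than the loader used in this repository; the names `Rating` and `parse_rating` are hypothetical, and the sample line is only illustrative.

```python
from typing import NamedTuple


class Rating(NamedTuple):
    user_id: int
    movie_id: int
    rating: int      # whole stars on the 5-star scale
    timestamp: int   # seconds since the epoch


def parse_rating(line: str) -> Rating:
    """Parse one 'UserID::MovieID::Rating::Timestamp' line."""
    user_id, movie_id, rating, timestamp = line.strip().split("::")
    return Rating(int(user_id), int(movie_id), int(rating), int(timestamp))


# Illustrative input line in the documented format.
sample = parse_rating("1::1193::5::978300760")
print(sample.user_id, sample.movie_id, sample.rating)
```

Applying `parse_rating` to each line of "ratings.dat" would give a list of records that can then be arranged into a user-by-movie matrix.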
User information is in the file "users.dat" and is in the following format:
UserID::Gender::Age::Occupation::Zip-code
All demographic information is provided voluntarily by the users and is not checked for accuracy. Only users who have provided some demographic information are included in this data set.
- Gender is denoted by "M" for male and "F" for female
- Age is chosen from the following ranges:
  - 1: "Under 18"
  - 18: "18-24"
  - 25: "25-34"
  - 35: "35-44"
  - 45: "45-49"
  - 50: "50-55"
  - 56: "56+"
- Occupation is chosen from the following choices:
  - 0: "other" or not specified
  - 1: "academic/educator"
  - 2: "artist"
  - 3: "clerical/admin"
  - 4: "college/grad student"
  - 5: "customer service"
  - 6: "doctor/health care"
  - 7: "executive/managerial"
  - 8: "farmer"
  - 9: "homemaker"
  - 10: "K-12 student"
  - 11: "lawyer"
  - 12: "programmer"
  - 13: "retired"
  - 14: "sales/marketing"
  - 15: "scientist"
  - 16: "self-employed"
  - 17: "technician/engineer"
  - 18: "tradesman/craftsman"
  - 19: "unemployed"
  - 20: "writer"
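The age and occupation fields are stored as numeric codes, so a lookup is needed to turn them into labels. The sketch below shows one possible way to decode a "users.dat" line using the tables above; `AGE_GROUPS`, `OCCUPATIONS`, and `parse_user` are hypothetical names, not part of this repository, and the sample line is only illustrative.

```python
# Code-to-label tables copied from the lists above.
AGE_GROUPS = {
    1: "Under 18", 18: "18-24", 25: "25-34", 35: "35-44",
    45: "45-49", 50: "50-55", 56: "56+",
}

# Occupation codes run from 0 to 20, so a tuple indexed by code is enough.
OCCUPATIONS = (
    "other or not specified", "academic/educator", "artist",
    "clerical/admin", "college/grad student", "customer service",
    "doctor/health care", "executive/managerial", "farmer", "homemaker",
    "K-12 student", "lawyer", "programmer", "retired", "sales/marketing",
    "scientist", "self-employed", "technician/engineer",
    "tradesman/craftsman", "unemployed", "writer",
)


def parse_user(line: str) -> dict:
    """Parse one 'UserID::Gender::Age::Occupation::Zip-code' line."""
    user_id, gender, age, occupation, zip_code = line.strip().split("::")
    return {
        "user_id": int(user_id),
        "gender": gender,
        "age_group": AGE_GROUPS[int(age)],
        "occupation": OCCUPATIONS[int(occupation)],
        # Kept as a string so leading zeros and hyphens are preserved.
        "zip_code": zip_code,
    }


print(parse_user("1::F::1::10::48067"))
```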
Movie information is in the file "movies.dat" and is in the following format:
MovieID::Title::Genres
- Titles are identical to titles provided by the IMDB (including year of release)
- Genres are pipe-separated and are selected from the following genres:
  - Action
  - Adventure
  - Animation
  - Children's
  - Comedy
  - Crime
  - Documentary
  - Drama
  - Fantasy
  - Film-Noir
  - Horror
  - Musical
  - Mystery
  - Romance
  - Sci-Fi
  - Thriller
  - War
  - Western
- Some MovieIDs do not correspond to a movie due to accidental duplicate entries and/or test entries
- Movies are mostly entered by hand, so errors and inconsistencies may exist
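A "movies.dat" line can be handled the same way, with the genre field split on the pipe character. This is a minimal sketch with hypothetical names (`parse_movie`, `YEAR_AT_END`). It assumes the release year appears in parentheses at the end of the title, and returns `None` for the year when that assumption does not hold, since hand-entered titles may be inconsistent.

```python
import re

# Assumption: the release year is written as "(YYYY)" at the end of the title.
YEAR_AT_END = re.compile(r"\((\d{4})\)\s*$")


def parse_movie(line: str) -> dict:
    """Parse one 'MovieID::Title::Genres' line."""
    movie_id, title, genres = line.strip().split("::")
    match = YEAR_AT_END.search(title)
    return {
        "movie_id": int(movie_id),
        "title": title,
        "year": int(match.group(1)) if match else None,
        "genres": genres.split("|"),
    }


# Illustrative input line in the documented format.
print(parse_movie("1::Toy Story (1995)::Animation|Children's|Comedy"))
```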
Results obtained:
- Collaborative filtering without baseline approach: RMSE 1.33, time taken 16 secs
- Collaborative filtering with baseline approach: RMSE 1.14, time taken 16 secs
- SVD: RMSE 0.91, time taken 28 secs
- SVD with 90% retained energy: RMSE 0.93, time taken 28 secs
- CUR: RMSE 2.20, time taken 10 secs
- CUR with 90% retained energy: RMSE 2.25, time taken 10 secs
- Latent Factor Model: RMSE 0.90, time taken 32 secs
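For reference, the two quantities behind these results can be sketched as follows. RMSE is the square root of the mean squared difference between predicted and actual ratings. "90% retained energy" is taken here to mean keeping the smallest number of leading singular values whose squared sum reaches 90% of the total. This reading, and the helper names `rmse` and `rank_for_energy`, are assumptions for illustration and may differ from the code in this repository.

```python
import math
from itertools import accumulate


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating sequences."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def rank_for_energy(singular_values, energy=0.90):
    """Smallest k whose first k squared singular values hold `energy` of the total.

    Assumes `singular_values` is sorted in descending order.
    """
    squared = [s * s for s in singular_values]
    total = sum(squared)
    for k, running in enumerate(accumulate(squared), start=1):
        if running >= energy * total:
            return k
    return len(squared)


# Toy values, not taken from the dataset.
print(rmse([3.0, 4.0], [1.0, 4.0]))
print(rank_for_energy([3.0, 2.0, 1.0]))
```

Truncating to fewer singular values reduces the size of the decomposition at the cost of some accuracy, which is consistent with the slightly higher RMSE reported for the 90% variants above.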