Search Relevance Surveys

Analysis of the third run of the search relevance surveys (T175048).

Setup

Libraries

Since T178096 is done, apply the roles discovery::learner or discovery::allstar_cruncher to instances on Wikimedia Cloud (formerly Wikimedia Labs).

Packages

# Essentials:
install.packages(c("tidyverse", "caret", "MLmetrics", "mlbench"))
# For bnclassify (biocLite() has been superseded by BiocManager):
install.packages("BiocManager")
BiocManager::install(c("RBGL", "Rgraphviz"))
# Classifiers:
install.packages(c("xgboost", "C50", "klaR", "e1071", "randomForest", "bnclassify", "keras"))
# Meta-analysis:
install.packages("betareg")
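
Before reinstalling everything, it can help to see which of the CRAN packages above are still missing on an instance. The following is only a minimal sketch: `missing_packages()` is a hypothetical helper, not something provided by this repository, and the install call is left commented out.

```r
# Hypothetical helper (not part of this repository): return the subset of
# `pkgs` that is not installed in the current library paths.
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

# CRAN packages named in the install commands above.
cran_pkgs <- c(
  "tidyverse", "caret", "MLmetrics", "mlbench",
  "xgboost", "C50", "klaR", "e1071", "randomForest", "bnclassify", "keras",
  "betareg"
)

to_install <- missing_packages(cran_pkgs)
if (length(to_install) > 0) {
  message("Not yet installed: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to actually install
}
```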

TODO

Tune & train a bunch of classifiers (thanks, caret!)
Figure out which sets of features yield the best predictive performance
Investigate a multi-level approach based on Discernatron reliability (sort of?)
Investigate a stacking / super learning approach
Investigate how many responses & impressions we need to get reliable score estimates
Write-up
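
One way to start on the question of how many responses are needed is a small simulation. The sketch below is illustrative only and is not the analysis in evaluate-size.R: it assumes each response is a binary relevant / not relevant answer, that the score is the proportion of "relevant" answers, and that the true score is 0.7. All names and values are hypothetical.

```r
# Assumption: responses are binary and the score is the proportion of
# "relevant" answers. The true score of 0.7 is an arbitrary illustration.
set.seed(42)
true_score <- 0.7

# Approximate the standard error of the estimated score by simulating
# `reps` surveys with `n` responses each.
simulate_se <- function(n, prob = true_score, reps = 2000) {
  estimates <- rbinom(reps, size = n, prob = prob) / n
  sd(estimates)
}

sample_sizes <- c(10, 50, 100, 500, 1000)
results <- data.frame(
  responses = sample_sizes,
  simulated_se = vapply(sample_sizes, simulate_se, numeric(1)),
  # Closed-form standard error of a binomial proportion, for comparison.
  theoretical_se = sqrt(true_score * (1 - true_score) / sample_sizes)
)
print(results)
```

The standard error shrinks roughly with the square root of the number of responses, so halving the uncertainty takes about four times as many responses.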

Scripts

Data
- pageviews.R uses the Wikimedia Analytics Pageviews API to fetch a month's worth of daily pageview counts for the relevant articles
- events.R fetches the survey data from the EventLogging database
- discernatron.R fetches relevance scores from Discernatron's API
- data.R combines fetched pageviews, survey data, and Discernatron scores into complete datasets
Model Tuning & Training via models.R
- Outputs models/model-index.csv
- keras.R has the code for training a deep neural network with Keras and outputs models/keras-index.csv
Model Evaluation via evaluate.R
- Outputs models/model-accuracy.csv
- Note that keras.R computes accuracy as part of the training process
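
To illustrate the kind of request the pageview-fetching step involves, here is a minimal sketch that builds a per-article URL for the Wikimedia Pageviews REST API. It is not taken from pageviews.R: `pageviews_url()`, its defaults, the article title, and the date range are all assumptions made for the example.

```r
# Hypothetical helper (not taken from pageviews.R): build a per-article
# request URL for daily counts from the Wikimedia Pageviews REST API.
pageviews_url <- function(article, start, end,
                          project = "en.wikipedia",
                          access = "all-access",
                          agent = "user") {
  # Titles use underscores instead of spaces and must be URL-encoded.
  title <- utils::URLencode(gsub(" ", "_", article), reserved = TRUE)
  paste(
    "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article",
    project, access, agent, title, "daily",
    format(start, "%Y%m%d"), format(end, "%Y%m%d"),
    sep = "/"
  )
}

# Example article and month chosen arbitrarily for illustration.
url <- pageviews_url("Albert Einstein", as.Date("2017-09-01"), as.Date("2017-09-30"))
print(url)

# Fetching requires network access and the jsonlite package:
# counts <- jsonlite::fromJSON(url)$items[, c("timestamp", "views")]
```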

Production

To use the final model for predicting relevance of any query-page combination from users' survey responses, please refer to these instructions.


wikimedia-research/Discovery-Search-Adhoc-RelevanceSurveys

Folders and files

Latest commit

History

Repository files navigation

Search Relevance Surveys

Setup

Libraries

Packages

TODO

Scripts

Production

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages