MarchMadnessDatasets

March Madness datasets for 2002-2019 and 2021. Datasets curated by [Chris Toukmaji](https://www.toukmaji.com/); data published by [Ken Pomeroy](https://en.wikipedia.org/wiki/Ken_Pomeroy) via kenpom.com. Please read the rest of my explanation so the structure of the rows makes sense.

Dataset Architecture

The dataset mirrors the bracket structure of the tournament. First and foremost, the order of the rows matters in the raw dataset, so be cautious if and when you partition your data. For the first region (the first set of 16 teams), each consecutive pair of teams plays each other, and the winner is appended to the set of 16 teams. The process continues until only one winner remains from these 16 teams. The same process is then repeated for the remaining three regions. Finally, we are left with four teams (one per region). These four teams are appended, and the same process continues until a champion remains.
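The row layout described above can be sketched in a few lines of Python. This is only an illustration of the "append the winners" process, not code from this repository: the function names `play_out` and `matchups` are hypothetical, teams are stood in for by plain integers, and the assumption that one region occupies a contiguous block of 16 + 8 + 4 + 2 + 1 rows is my reading of the description and should be checked against the actual CSV files.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def play_out(teams: Sequence[T], winner: Callable[[T, T], T]) -> List[T]:
    """Return the teams followed by each round's winners, in bracket order."""
    n = len(teams)
    if n < 1 or n & (n - 1):
        raise ValueError("number of teams must be a power of two")
    rows = list(teams)
    current = list(teams)
    while len(current) > 1:
        # Each consecutive pair plays; winners form the next round.
        nxt = [winner(current[i], current[i + 1])
               for i in range(0, len(current), 2)]
        rows.extend(nxt)  # winners are appended after the existing rows
        current = nxt
    return rows


def matchups(block: Sequence[T], n_teams: int) -> List[Tuple[T, T, T]]:
    """Recover (team_a, team_b, winner) triples from one appended block."""
    games = []
    start, size = 0, n_teams
    while size > 1:
        nxt = start + size  # index where this round's winners begin
        for k in range(size // 2):
            games.append((block[start + 2 * k],
                          block[start + 2 * k + 1],
                          block[nxt + k]))
        start, size = nxt, size // 2
    return games


# Toy region: teams are the integers 1..16 and the smaller number always wins.
region = play_out(list(range(1, 17)), min)
games = matchups(region, 16)
```

Under these assumptions, shuffling or splitting rows before recovering the matchups would break the pairing, which is why the ordering caveat above matters.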

How to restructure the data is left to you and depends on how you want to set up your data and model. Initially, I one-hot encoded each round, but I have since realized the drawbacks of that approach in this use case. For example, it is impossible for a team to win a 2nd-round game without winning a 1st-round game (since each team is eliminated after one loss), so these features become useless for most teams. Another option is to count the number of games a given team wins. However, both of these approaches are unnatural and base a prediction on overall performance, not on performance against a specific opponent (as in a true bracket). To keep the context of both teams, I am considering some form of data manipulation on each pair of consecutive teams, possibly subtracting one team's data from the other's, so that we train on the difference in performance between two teams rather than on a team's average performance against an arbitrary opponent.
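A minimal sketch of the pairwise-difference idea follows. It is one possible way to do it, not a finished method: the function name `pairwise_differences` is hypothetical, the two-number stat vectors are made-up placeholders rather than real kenpom.com values, and the input is assumed to contain only the numeric columns of the rows for a single round, in bracket order.

```python
from typing import List, Sequence


def pairwise_differences(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Subtract the second team's stats from the first for each consecutive pair."""
    if len(rows) % 2:
        raise ValueError("expected an even number of rows (one pair per game)")
    return [
        [a - b for a, b in zip(rows[i], rows[i + 1])]
        for i in range(0, len(rows), 2)
    ]


# Placeholder numeric features for four teams (two games); values are invented.
round_rows = [
    [110.0, 95.0],
    [100.0, 98.0],
    [105.5, 90.0],
    [101.5, 92.0],
]
diffs = pairwise_differences(round_rows)
```

Each resulting row describes a game rather than a team, so a label such as "did the first team win" could be attached to it. Because subtraction is order-dependent, swapping the two teams flips the sign of every feature, which is worth keeping in mind when building labels.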

