March Madness Data

About this Dataset

This repository contains game results and play-by-play data from 2010 to 2019 in R dataset (.rds) format. You can download the datasets you want to use directly from this GitHub repository.

Note for Stage 2:

All of the data files have been updated through the end of the current regular season. You can find everything you need in these four files:

  1. Stage2DataFiles.zip (this contains the same type of information as the DataFiles.zip did for Stage 1, but it now includes 2019 data as well)

  2. MasseyOrdinals_thru_2019_day_128.zip (this contains the same type of information as MasseyOrdinals.zip did for Stage 1, but it now includes 2019 data as well). For the absolute latest version of the Massey Ordinals, see the Discussion thread “Massey Ordinals Day 133 Thread”.

  3. PlayByPlay_2019.zip (this contains the same type of information as PlayByPlay_2018.zip and the earlier play-by-play files, but it covers games from the current 2019 season)

  4. SampleSubmissionStage2.csv (this has the correct number of rows and the correct teams for the 2019 tourney only, which is all you predict in Stage 2)

You can disregard the Stage 1 files and the Prelim files at this point; they are completely superseded by the above release. The only exceptions are that the play-by-play data for earlier years (PlayByPlay_2010, PlayByPlay_2011, …, PlayByPlay_2018) is still useful, and that the latest Massey Ordinals will continue to be released as they become available.
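
As a rough illustration of the file list above, the following Python sketch checks a local download folder for the four Stage 2 files plus the earlier play-by-play archives, and unpacks any archives it finds. It assumes the files were saved under exactly the names given in this note and that the earlier play-by-play files carry a .zip extension like PlayByPlay_2018.zip. The function names (find_missing, extract_archives) and the folder layout are hypothetical, chosen only for this example.

```python
import zipfile
from pathlib import Path

# File names as listed in the Stage 2 note.
STAGE2_FILES = [
    "Stage2DataFiles.zip",
    "MasseyOrdinals_thru_2019_day_128.zip",
    "PlayByPlay_2019.zip",
    "SampleSubmissionStage2.csv",
]

# Earlier play-by-play archives that remain useful (2010 through 2018).
# Assumption: each one is a .zip, like PlayByPlay_2018.zip.
EARLIER_PLAY_BY_PLAY = [f"PlayByPlay_{year}.zip" for year in range(2010, 2019)]


def find_missing(download_dir):
    """Return the expected file names that are not present in download_dir."""
    folder = Path(download_dir)
    expected = STAGE2_FILES + EARLIER_PLAY_BY_PLAY
    return [name for name in expected if not (folder / name).is_file()]


def extract_archives(download_dir, target_dir):
    """Unpack every expected .zip found in download_dir into its own subfolder."""
    folder, target = Path(download_dir), Path(target_dir)
    extracted = []
    for name in STAGE2_FILES + EARLIER_PLAY_BY_PLAY:
        path = folder / name
        if path.suffix == ".zip" and path.is_file():
            out = target / path.stem  # e.g. <target_dir>/PlayByPlay_2019
            out.mkdir(parents=True, exist_ok=True)
            with zipfile.ZipFile(path) as archive:
                archive.extractall(out)
            extracted.append(out)
    return extracted
```

Calling find_missing("downloads") before starting work gives a quick list of anything still to fetch, and extract_archives("downloads", "data") leaves each archive in a separate subfolder so files from different years do not overwrite one another.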