Congratulations! You’ve just been hired as an analytics scout for an NBA team. More good news: your team has the first overall pick in the upcoming draft. The team is turning to you and your statistical prowess to advise them on which prospect they should select.
You’ve been tasked with submitting a three-page (< 1,500 words) report (plus relevant figures and code) on how you think your team should approach the draft using the information below.
Prospects data
Your predecessor has prepared two files for you. The `prospects` file contains college statistics for each of the prospects in this year’s draft. The variables included in this data file are:
Variable | Description |
---|---|
name | Prospect’s name |
pos | Position - Point Guard (PG), Shooting Guard (SG), Small Forward (SF), Power Forward (PF), Center (C) |
school | School |
games | Number of games played |
mpg | Minutes per game |
pts | Points per game |
reb | Rebounds per game |
fgpercent | Field goal percentage |
thr_percent | Three point percentage |
ftpercent | Free throw percentage |
Players data
The `players` file contains the same information as the `prospects` file (i.e., college stats) for each NBA player from this past season, as well as their points per game (`nba_ppg`) and field goal percentage (`nba_fgp`) in the NBA.
Use the buttons below to download both datasets as CSV files:
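Once downloaded, the two files can be read into R along the following lines. This is a minimal sketch: the file names `prospects.csv` and `players.csv` and the helper `load_if_present()` are assumptions made for illustration, so adjust the paths to wherever you saved the downloads.

```r
# File names are assumptions; change them to match your downloads.
prospects_path <- "prospects.csv"
players_path   <- "players.csv"

# Hypothetical helper: read a CSV if it exists, otherwise return NULL
# with a message instead of stopping the script.
load_if_present <- function(path) {
  if (!file.exists(path)) {
    message("File not found: ", path)
    return(NULL)
  }
  read.csv(path, stringsAsFactors = FALSE)
}

prospects <- load_if_present(prospects_path)
players   <- load_if_present(players_path)

# Inspect column names and types before doing anything else.
if (!is.null(prospects)) str(prospects)
if (!is.null(players)) str(players)
```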
Compile a three-page report that addresses a series of objectives (see below). Your report should consist of explanatory text as well as tables and figures, and the R code you created for your analysis. This could be submitted either as a Word document and an R script, or in a comprehensive R Markdown file (this is the preferred method).
The previous analytics scout wasn’t known for their rigorous data keeping (which is why you’re here now). Before you start your analysis, be sure there aren’t any erroneous entries in either the `prospects` or `players` files. If there are outliers, be sure to remove them from the data and provide a rationale for why each outlier was removed.
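One possible way to screen for erroneous entries is to write down a few plausibility rules and flag every row that breaks one. The sketch below uses a small toy data frame with invented values (not taken from the real files), and it assumes percentages are stored on a 0 to 100 scale; check the actual data before reusing the thresholds. A regulation college game lasts 40 minutes, so a season average well above that deserves a second look.

```r
# Toy data invented for illustration only; not values from the real files.
toy <- data.frame(
  name      = c("A", "B", "C", "D"),
  mpg       = c(31.2, 55.0, 28.4, 33.1),  # 55 exceeds a 40-minute game
  pts       = c(14.1, 17.3, -3.0, 12.8),  # negative points are impossible
  fgpercent = c(45.2, 48.9, 51.0, 143.0), # assumes a 0-100 scale
  stringsAsFactors = FALSE
)

# Each rule is a logical vector: TRUE means the row looks implausible.
checks <- list(
  "mpg above 40"            = toy$mpg > 40,
  "negative pts"            = toy$pts < 0,
  "fgpercent outside 0-100" = toy$fgpercent < 0 | toy$fgpercent > 100
)

# A row is flagged if it fails at least one rule.
flagged <- Reduce(`|`, checks)

# Record which rule(s) each row failed, to support the written rationale.
toy$reason <- sapply(seq_len(nrow(toy)), function(i) {
  failed <- names(checks)[sapply(checks, function(x) isTRUE(x[i]))]
  paste(failed, collapse = "; ")
})

toy[flagged, ]           # rows to investigate and justify removing
clean <- toy[!flagged, ] # rows kept for the analysis
```

Keeping the `reason` column makes it easier to justify each removal in the report rather than dropping rows silently.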
Your general manager has asked you to provide a statistical summary of this year’s prospects. She hasn’t explicitly asked for specific comparisons, so it’s up to you to design the descriptive report using the `prospects` file. Maybe you want to compare positions, or find the top prospects in each statistical category. It’s up to you!
Remember that the front office is not full of stats people, so it’s best to rely on graphs whenever possible.
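As one illustration of the position comparison and category leaders mentioned above, the sketch below computes per-position means, finds the leader in each statistic, and draws a box plot. The data frame is a toy example with invented values, and only base R is used; the same ideas carry over to the real `prospects` file or to a plotting package of your choice.

```r
# Toy data invented for illustration only; not values from the real file.
toy_prospects <- data.frame(
  name = paste0("P", 1:8),
  pos  = factor(c("PG", "PG", "SG", "SG", "SF", "PF", "C", "C"),
                levels = c("PG", "SG", "SF", "PF", "C")),
  pts  = c(15.0, 17.0, 18.5, 12.5, 14.0, 11.0, 9.5, 13.5),
  reb  = c(3.0, 4.0, 4.5, 3.5, 6.0, 8.0, 10.5, 9.5),
  stringsAsFactors = FALSE
)

# Mean points and rebounds per game for each position.
pos_means <- aggregate(cbind(pts, reb) ~ pos, data = toy_prospects, FUN = mean)
pos_means

# Name of the leading prospect in each statistical category.
leaders <- sapply(c("pts", "reb"), function(v) {
  toy_prospects$name[which.max(toy_prospects[[v]])]
})
leaders

# A box plot is often easier for a non-technical audience than a table.
boxplot(pts ~ pos, data = toy_prospects,
        xlab = "Position", ylab = "College points per game",
        main = "Scoring by position (toy data)")
```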
The team’s coach is an old-school guy. He thinks if you want to know how many points per game a prospect will score in the NBA, all you need is how many points they scored in college. Your general manager believes that it’s important to include additional information.
You’ve been tasked with settling the debate. Using the `players` data, create a simple linear regression using collegiate points per game to model NBA points per game, as well as a multiple regression that includes additional covariates. Then compare each model and determine who is right.
Once again, it is up to you to decide which variables to include, how you want to select the best multiple regression model, and the best way to compare both models to each other. Your analysis should clearly state which choices you make and why, and address the underlying assumptions behind these models.
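The comparison between the coach’s model and the general manager’s model could be set up roughly as follows. This is a sketch on simulated toy data, not the real `players` file; the covariates chosen for the multiple regression (`mpg` and `fgpercent`) are an arbitrary assumption, and the nested F-test, adjusted R-squared, and AIC are just three of several reasonable comparison criteria.

```r
# Simulated toy data for illustration only; not the real players file.
set.seed(1)
n <- 60
toy_players <- data.frame(
  pts       = runif(n, 5, 25),
  mpg       = runif(n, 15, 38),
  fgpercent = runif(n, 38, 60)
)
toy_players$nba_ppg <- 1 + 0.5 * toy_players$pts +
  0.1 * toy_players$fgpercent + rnorm(n, sd = 2)

# Coach's model: college points per game only.
simple_fit <- lm(nba_ppg ~ pts, data = toy_players)

# General manager's model: additional covariates (choice is an assumption).
multi_fit <- lm(nba_ppg ~ pts + mpg + fgpercent, data = toy_players)

# Nested F-test: do the extra covariates explain significantly more variance?
anova(simple_fit, multi_fit)

# Side-by-side summary measures that penalize extra parameters.
comparison <- data.frame(
  model  = c("simple", "multiple"),
  adj_r2 = c(summary(simple_fit)$adj.r.squared,
             summary(multi_fit)$adj.r.squared),
  aic    = c(AIC(simple_fit), AIC(multi_fit))
)
comparison

# Residual diagnostics for linearity, constant variance, and normality.
par(mfrow = c(2, 2))
plot(multi_fit)
```

The F-test is only valid here because the simple model is nested inside the multiple model; for non-nested candidates, AIC or out-of-sample error would be the more appropriate yardstick.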
The moment of truth. The team has decided it wants to pick the best scorer in the draft. It is up to you to create a model using the `players` data with `nba_ppg` as the response/dependent variable and use that model to predict the best scorer in the `prospects` data. Be sure to describe why you’ve chosen your model, which predictor/independent variables were most important, and how to account for confidence/uncertainty in the predictions.
Hint: take a look at the help file for the `predict()` function.
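To illustrate the hint, `predict()` applied to an `lm` fit accepts a `newdata` data frame and an `interval` argument, which is one way to attach uncertainty to each prospect’s predicted scoring. The sketch below uses simulated toy data and a deliberately simple one-predictor model; the prospect names and values are invented for illustration.

```r
# Simulated toy data for illustration only; not the real files.
set.seed(2)
toy_players <- data.frame(pts = runif(40, 5, 25))
toy_players$nba_ppg <- 2 + 0.5 * toy_players$pts + rnorm(40, sd = 2)

fit <- lm(nba_ppg ~ pts, data = toy_players)

# Hypothetical prospects; column names must match the model's predictors.
toy_prospects <- data.frame(
  name = c("Prospect A", "Prospect B", "Prospect C"),
  pts  = c(12, 21, 17),
  stringsAsFactors = FALSE
)

# Prediction intervals describe uncertainty for an individual player;
# interval = "confidence" would instead describe the mean response.
pred <- predict(fit, newdata = toy_prospects,
                interval = "prediction", level = 0.95)

# Combine names with fit / lwr / upr and rank by predicted NBA scoring.
ranked <- cbind(toy_prospects, pred)
ranked <- ranked[order(-ranked$fit), ]
ranked
```

If the intervals of the top prospects overlap heavily, the ranking between them is not well supported by the model, which is worth stating plainly in the report.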
Your report will be graded using this rubric.