Congratulations! You’ve just been hired as an analytics scout for an NBA team. More good news, your team has the first overall pick in the upcoming draft. The team is turning to you and your statistical prowess to advise them on which prospect they should select.

You’ve been tasked with submitting a three page (< 1,500 words) report (plus relevant figures and code) on how you think your team should approach the draft using the information below.


The Data

Prospects data

Your predecessor has prepared two files for you. The prospects file contains college statistics for each of the prospects in this year’s draft. The variables included in this data file are:

Variable. Description.
name Prospect’s name
pos Position - Point Guard (PG), Shooting Guard (SG), Small Forward (SF), Power Forward (PF), Center (C)
school School
games Number of games played
mpg Minutes per game
pts Points per game
reb Rebounds per game
fgpercent Field goal percentage
thr_percent Three point percentage
ftpercent Free throw percentage



Players data

The players file contains the same information as prospects file (i.e. college stats) for each NBA player from this past season, as well as the their points per game (nba_ppg) and field goal percentage per game (nba_fgp) in the NBA.



Use the buttons below to download both datasets as CSV files:


The Assignment

Compile a three page report that addresses a series of objectives (see below). Your report should consist of a explanatory text as well as table and figures, and the R code you created for your analysis. This could be submitted either as a Word document and a R script, or in a comprehensive R Markdown file (this is the preferred method).

Note: I should be able to reproduce all of your analyses and figures using the code you submit. This means that you should be commenting your code so that it is reproducible.

Objective 1: Data Validation

The previous analytics scout wasn’t known for their rigorous data keeping (which is why you’re here now). Before your start your analysis, be sure there aren’t any erroneous entries in either the prospects or players files. If there are outliers, be sure to remove them from the data and provide a rationale for why each outlier was removed.

Objective 2: Descriptive Analysis

Your general manager has asked you to provide a statistical summary of this year’s prospects. She hasn’t explicitly asked for specific comparisons, so it’s up to you to design the descriptive report using the prospects file. Maybe you want to compare positions, or find the top prospects in each statistical category. It’s up to you!

Remember that the front office is not full of stats people, so it’s best to rely on graphs whenever possible.

Objective 3: Linear Regression

The team’s coach is an old-school guy. He thinks if you want to know how many points per game a prospect will score in the NBA, all you need is how many points they scored in college. Your general manager believes that it’s important to included additional information.

You’ve been tasked with settling the debate. Using the players data, create a simple linear regression using collegiate points per game to model NBA points per game, as well as a multiple regression that includes additional covariates. Then compare each model and determine who is right.

Once again, it up to you to decide which variables to include, how you want to select the best multiple regression model, and the best way to compare both models to each other. Your analysis should clearly state which choices you make and why, and address underlying assumptions behind these models.

Objective 4: Predictive Analysis

The moment of truth. The team has decided it wants to pick the best scorer in the draft. It is up to you to create a model using the players data with nba_ppg as the response/dependent variable and use that model to predict the best scorer in the prospects data. Be sure describe why you’ve chosen your model, which predictor/independent variables were most important, and how to account for confidence/uncertainty in the predictions.

Hint: take a look at the help file for the predict() function.

Your report will be graded using this rubric.