Project 1: Full Analysis

Greatest Risers and Fallers in the NBA Playoffs


Using Data Analytics to gain insights into NBA players’ performance in high-pressure situations.

MIAMI, FLORIDA — MAY 27 2023 (Photo by Michael Reaves/Getty Images)

Background 

After watching the first round of the 2023 NBA playoffs, one question came to mind: when did Jimmy Butler get this good? Out of nowhere he averaged an insane 37.6 points, 6.0 rebounds, and 4.8 assists over a 5-game stretch, compared to his 2023 regular season averages of 22.9 points, 5.9 rebounds, and 5.3 assists. This 64% increase in scoring helped push the 8th-seeded Miami Heat past the 1st-seeded Milwaukee Bucks, an upset that has only happened 6 times in NBA history. The idea that Jimmy Butler steps up during the playoffs has been a regular occurrence and is common knowledge among the NBA community. The narrative has gotten so big that the NBA has a 1-hour montage of Jimmy Butler playoff highlights posted to its YouTube channel with 300k+ views.

This got me thinking: is "Playoff Jimmy" a real thing? Compared to other NBA players, does Jimmy Butler's playoff performance stand out, or is this just another narrative created by NBA fans to entertain themselves? Furthermore, let's look at all NBA players to see which ones had the largest rise or sharpest decline in the playoffs compared to their regular season performance.

The video walkthrough of this project can be found here: YouTube


Methodology

When starting this project I had three big questions to answer:
1. What single metric am I going to use to compare performance?
2. Which players am I going to analyze?
3. How am I going to adapt for injuries and increased difficulty in competition?


1: What single metric am I going to use to compare performance?

There are multiple single-number metrics used by NBA analysts to evaluate a player’s overall impact on the court. Some of the most notable ones are Player Efficiency Rating (PER), Win Shares (WS), Box Plus/Minus (BPM), and Value Over Replacement Player (VORP). The metric that I chose to use is unadjusted PER (u_PER) for two main reasons. First, the calculation is not adjusted for pace, and a pace adjustment is not needed because we are comparing the same player in the same season. Second, we can arrive at a number using only a player’s box score stats. I calculated a player’s PER with a formula adapted from Zach Fein of Bleacher Report, who used weighted averages to calculate PER.
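As a rough illustration (not the exact project code), a linear-weights PER calculation from box score totals could look like the sketch below. The weights are the ones commonly quoted for this simplified approximation and should be checked against the original article. The column names (FGM, FG3M, OREB, and so on) and the function name linear_per are assumptions made for this example.

```python
# Linear weights as commonly quoted for the simplified, box-score-only PER.
# ASSUMPTION: verify these against the original article before relying on them.
PER_WEIGHTS = {
    "FGM": 85.910, "STL": 53.897, "FG3M": 51.757, "FTM": 46.845,
    "BLK": 39.190, "OREB": 39.190, "AST": 34.677, "DREB": 14.707,
    "PF": -17.174, "FT_MISS": -20.091, "FG_MISS": -39.190, "TOV": -53.897,
}


def linear_per(box: dict) -> float:
    """Weighted sum of box score totals divided by minutes played.

    `box` holds totals for one player over one span (hypothetical keys);
    stats missing from `box` count as zero.
    """
    minutes = box["MIN"]
    if minutes <= 0:
        raise ValueError("minutes played must be positive")
    totals = dict(box)
    # Misses are derived from attempts minus makes.
    totals["FT_MISS"] = box.get("FTA", 0) - box.get("FTM", 0)
    totals["FG_MISS"] = box.get("FGA", 0) - box.get("FGM", 0)
    weighted = sum(w * totals.get(stat, 0) for stat, w in PER_WEIGHTS.items())
    return weighted / minutes


# Placeholder totals, not real player data.
example = {"MIN": 10, "FGM": 2, "FGA": 4, "FTM": 0, "FTA": 0}
example_per = linear_per(example)
```

Because the formula only needs totals and minutes, the same function can be applied to a regular season and a playoff run separately, and the two results compared.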

2: Which players am I going to analyze?

There have been 8271 people who have played at least 1 second in an NBA game. I am going to narrow this down to a list of 101 qualified players using three parameters:

Playoff Minutes: Players who have played over 2000 playoff minutes.
Reason: Narrow my dataset to players who have spent significant time in the postseason.

Draft Year: Players who were drafted in the year 1979 or after.
Reason: 3 point field goal attempts are used in my calculation, however the NBA did not have a 3 point line until the 1979–80 season.

Average Playoff PER: Players with an average playoff PER of 15 or higher.
Reason: The average player will have a PER of 15. This will help me separate the stars from the bench players.


3: How am I going to adapt for injuries and increased difficulty in competition?

Injuries: To adapt for injuries, I am going to keep only the playoff runs where the player played 3 or more games.

Playoff difficulty: To account for increased playoff difficulty, I have built my own formula with inspiration from Ben Taylor of Thinking Basketball. This formula first compares a player’s individual true shooting percentage to the league average true shooting percentage, in both the regular season and the playoffs. Then, I find the difference between these two values. If this number is positive, the player had an increase in scoring efficiency in the playoffs; if it is negative, the player’s scoring efficiency decreased. Lastly, I add the playoff difficulty factor, which is the percent change in overall shooting efficiency from regular season to playoffs for the entire league.

True shooting percentage: a measure of shooting efficiency that takes into account field goals, 3-point field goals, and free throws.
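To make the steps above concrete, here is a minimal sketch under stated assumptions. The true shooting formula, PTS / (2 * (FGA + 0.44 * FTA)), is the standard definition. Everything else (the function names, and especially how the league-wide factor is combined) is one literal reading of the description, not the exact formula used in the project.

```python
def true_shooting(pts: float, fga: float, fta: float) -> float:
    """Standard true shooting percentage, returned as a fraction (0.60 = 60%)."""
    return pts / (2 * (fga + 0.44 * fta))


def relative_ts_change(player_rs: float, league_rs: float,
                       player_po: float, league_po: float) -> float:
    """Change in the player's TS% relative to league average.

    Positive means the player's efficiency edge over the league grew
    in the playoffs; negative means it shrank.
    """
    return (player_po - league_po) - (player_rs - league_rs)


def league_pct_change(league_rs: float, league_po: float) -> float:
    """Percent change in league-wide TS% from regular season to playoffs."""
    return (league_po - league_rs) / league_rs


def adjusted_change(player_rs, league_rs, player_po, league_po):
    # ASSUMPTION: the difficulty factor is simply added, as the text says.
    # The sign convention and units used in the project may differ.
    return (relative_ts_change(player_rs, league_rs, player_po, league_po)
            + league_pct_change(league_rs, league_po))


# Placeholder numbers, not real league or player data.
ts_example = true_shooting(pts=30, fga=20, fta=5)
rel_example = relative_ts_change(0.60, 0.56, 0.62, 0.55)
```

Keeping the three pieces as separate functions makes it easy to swap in a different way of combining them if the intended formula differs from this reading.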

Data Collection: Python

The project’s data was extracted using two methods. The first method employed the Python package nba_api, an API client for www.nba.com. Among other things, I used this tool to gather regular season and playoff data for each player in our dataset. The second method was web scraping using Python’s requests and Beautiful Soup packages, targeting two sites: stats.nba.com and www.basketball-reference.com.

I used a combination of web scraping and the nba_api to curate our initial playerset of 217 athletes. First, I used web scraping to gather playoff league leaders’ data from stats.nba.com. Then I filtered the data to only include players with over 2000 playoff minutes played. This data was then structured into a DataFrame. Following this, I used nba_api to gather the complete draft history of the NBA, which was filtered to contain only players who were drafted in 1979 or later. Finally, the data from these two processes was combined using an inner join, resulting in a DataFrame that included only players present in both data sets. You can find all the code I wrote for the project here: GitHub.
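The filter-then-join step can be sketched with pandas as follows. This is an illustration on placeholder rows, not the project code (which lives on GitHub). The column names PLAYER_ID, MIN, PERSON_ID, and SEASON and the function name build_playerset are assumptions; in the real pipeline the two input frames would come from the scraped leaders table and the nba_api draft history.

```python
import pandas as pd


def build_playerset(playoff_leaders: pd.DataFrame,
                    draft_history: pd.DataFrame,
                    min_minutes: int = 2000,
                    min_draft_year: int = 1979) -> pd.DataFrame:
    """Keep players with enough playoff minutes who were drafted in 1979 or later."""
    qualified = playoff_leaders[playoff_leaders["MIN"] > min_minutes]
    # Draft season is assumed to arrive as a string year, so cast before comparing.
    drafted = draft_history.assign(DRAFT_YEAR=draft_history["SEASON"].astype(int))
    drafted = drafted.loc[drafted["DRAFT_YEAR"] >= min_draft_year,
                          ["PERSON_ID", "DRAFT_YEAR"]]
    # Inner join: only players present in both frames survive.
    merged = qualified.merge(drafted, left_on="PLAYER_ID",
                             right_on="PERSON_ID", how="inner")
    return merged.drop(columns="PERSON_ID")


# Placeholder rows, not real players.
playoff_leaders = pd.DataFrame({
    "PLAYER_ID": [1, 2, 3],
    "PLAYER": ["Player A", "Player B", "Player C"],
    "MIN": [5400, 1500, 2600],
})
draft_history = pd.DataFrame({
    "PERSON_ID": [1, 2, 3],
    "SEASON": ["1984", "2003", "1970"],
})
playerset = build_playerset(playoff_leaders, draft_history)
```

In this toy example only Player A survives: Player B falls short on minutes and Player C was drafted before 1979. A side effect of the inner join is that undrafted players, who never appear in the draft history, are dropped as well.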

Data Cleaning: Pandas

Following the extraction of the raw data, the focus shifted to scrubbing the data to get it ready for the analysis stage. For this, I mainly used Python’s built-in functions and the Pandas package. The initial phase involved identifying what data I wanted to keep for the next step. I settled on 4 data frames: 1) the PER differential stats, 2) regular season stats, 3) playoff stats, and 4) a backup data frame. The PER differential stats data frame is the main analysis piece for the project. The regular season and playoff stats data frames are used in a Tableau dashboard to help provide context for the trends found in the data. Finally, the backup data frame holds all the data used, so that calculations can be double checked.

After identifying my target goal, the next step was building these data frames. For each player in the loop, I created a new data frame from that player’s data and gave its columns a prefix. After that, I combined all of the individual player data frames into one data frame that has the data for all the players. The same approach was used to get regular season data for every player, combine it into one data frame, and export it as a .csv.
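A simplified sketch of that loop is shown here, assuming a per-player fetch function is passed in. The names combine_player_stats and fake_fetch, the RS_ prefix, and the output file name are hypothetical. In the real project the fetch step would presumably wrap an nba_api call rather than return placeholder rows.

```python
import pandas as pd


def combine_player_stats(player_ids, fetch_stats, prefix: str) -> pd.DataFrame:
    """Fetch one frame per player, prefix its columns, and stack them all."""
    frames = []
    for player_id in player_ids:
        # Prefix every stat column, e.g. PTS -> RS_PTS for regular season.
        stats = fetch_stats(player_id).add_prefix(prefix)
        # Keep an unprefixed key column so frames can be joined later.
        stats.insert(0, "PLAYER_ID", player_id)
        frames.append(stats)
    return pd.concat(frames, ignore_index=True)


def fake_fetch(player_id: int) -> pd.DataFrame:
    """Stand-in for a real API call; returns placeholder rows."""
    return pd.DataFrame({
        "SEASON_ID": ["2021-22", "2022-23"],
        "PTS": [100 * player_id, 110 * player_id],
    })


regular_season = combine_player_stats([1, 2], fake_fetch, prefix="RS_")
# Export for the next step (file name is an assumption):
# regular_season.to_csv("regular_season_stats.csv", index=False)
```

Collecting the frames in a list and calling pd.concat once at the end is generally faster than appending to a growing data frame inside the loop.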

My final constraint, players with an average adjusted playoff PER of 15 or higher, is applied with the following line of code. This gives me a final playerset of 101 players to do my analysis on.

comparative_data_15 = comparative_data[comparative_data['adj_P_PER'] >= 15]

After hours of renaming, reorganizing, dropping, popping, locating, inserting, merging, and concatenating, I now have my 4 data frames exported to use in the next step. All code: GitHub.

Data Analysis: Tableau

Now that all of the data has been collected and cleaned, the tool of choice for analysis is Tableau, a business intelligence tool that helps to visualize data. I have compiled two graphs and one dashboard to help me understand the data. All of the graphs below can be found on my Tableau profile.

The first graph is a scatter plot that compares players’ playoff impact to their change in playoff performance. The x-axis shows the percentile of change in playoff performance. This statistic looks at the difference in PER between the regular season and the playoffs. The y-axis shows the percentile of average playoff PER, titled “Playoff Impact”. In this graph, the best 25 players are in the upper right and colored green. These are players who had above average playoff impact and increased their performance come playoff time.
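The percentile axes were built in Tableau, but the same idea can be sketched in pandas for anyone who wants to reproduce it in code. The column names per_diff and playoff_per, the function name add_percentiles, and the 50th-percentile cutoff for the upper-right group are all assumptions for this example, not the dashboard's exact definitions.

```python
import pandas as pd


def add_percentiles(df: pd.DataFrame) -> pd.DataFrame:
    """Add percentile ranks (0-100) for both axes and flag the upper-right group."""
    out = df.copy()
    # rank(pct=True) gives each row's rank as a fraction of the row count.
    out["change_pct"] = out["per_diff"].rank(pct=True) * 100
    out["impact_pct"] = out["playoff_per"].rank(pct=True) * 100
    # ASSUMPTION: "upper right" means above the 50th percentile on both axes.
    out["upper_right"] = (out["change_pct"] > 50) & (out["impact_pct"] > 50)
    return out


# Placeholder rows, not real players.
players = pd.DataFrame({
    "player": ["A", "B", "C", "D"],
    "per_diff": [-2.0, -1.0, 1.0, 2.0],        # playoff PER minus regular season PER
    "playoff_per": [16.0, 20.0, 18.0, 25.0],   # average playoff PER
})
ranked = add_percentiles(players)
```

Using percentiles rather than raw PER values puts both axes on the same 0 to 100 scale, which is what lets the chart split cleanly into quadrants.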

NOTE: Dashboards are not optimized for mobile. Please view on a desktop.


The second graph is a histogram that shows the distribution of the playoff performance differential with the average set to 0. Here, we can clearly separate below average players in red and above average players in green. One trend that I saw in the top 10% of our playerset was that a large majority of them are two-way players, such as Hakeem Olajuwon and Kawhi Leonard.



Lastly, the dashboard I built is split into 5 sections. First is the “Select Player” section, where you can pick the player you would like to analyze. From there, the following 4 sections will update to show accurate information. The first is the “Player Bio”, which provides more detail such as seasons played, draft information, and current team. The next section, “Career Trajectory”, shows how a player trends in the regular season based on their PER. The “Regular Season vs Playoffs” section shows percent change differentials in key stats like PER, TS%, and turnovers. The last section, “Relative Performance”, shows a mini version of the scatter plot so you can visualize how the selected player performs relative to the other 100 players in the dataset.

Feel free to play around with the dashboard. (Only optimized for desktop/PC users.)

The dashboard above shows the data for our key player, Jimmy Butler. Here we can see a general upward trend in regular season PER in “Career Trajectory”, as well as a 10% decline in PER from the regular season to the playoffs in “Regular Season vs Playoffs”. In the “Relative Performance” section you can see that Jimmy is, ironically, exactly average, sitting in the 50th percentile for change in playoff performance. This shows us that Jimmy Butler, based on all the information above, is middle of the pack when it comes to playoff performance differential compared to the other 100 players in the dataset.

Link to Dashboard: https://public.tableau.com/app/profile/james.pavlicek/viz/NBAPlayoffRisersandFallersDashboard/Dashboard

Conclusion

In summary, I first identified the question of which players rise or fall in the playoffs, then I created a plan of methods to use and parameters to set for the next step of collecting data with Python. After that, I cleaned the acquired data using Pandas. Lastly, I analyzed the data in Tableau with graphs and dashboards to help explain the findings of this project.

In conclusion, the data gathered and presented in this project can help in multiple ways. First, it can help inform business decisions for teams evaluating new contracts or extensions for active players based on their performance toward the ultimate team goal of winning a championship. Second, it can give fans, and even players, a rough estimate of what to expect in next year’s playoffs based on past history, and help them navigate media narratives and make their own analysis.

Lastly, I want to thank the YouTube channel “Thinking Basketball” for additional inspiration for this project with the video “Playoff changes | Which stars have improved or declined the most?”.

This is the conclusion of my NBA Playoff Risers and Fallers project, but if you have made it this far into the article, I have a little bonus for you.

Bonus: Does LeBron Choke in the Finals?


During the construction of this project, I remembered another narrative from the NBA community that could be better understood with the data analysis techniques used in this project: Is LeBron a Finals choker?

The premise was simple: How does LeBron’s PER in the first three rounds of the playoffs compare to the 4th and final round of the same year? For this analysis I used the same techniques as in the project above, with PER as my main metric to compare. I also grabbed additional round-specific playoff data from Basketball Reference to use for this bonus project. Here are the results.

First is a graph that shows LeBron’s average playoff PER for the first three series (dark blue) and the Finals (light blue), with the season on the x-axis. In the graph on the right we can see that LeBron had an average PER of 27.971 in the first three rounds and an average PER of 25.995 in the Finals, a 7% decrease. Additionally, if we look at the count, LeBron has been better in the Finals 5 out of 10 times.
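The 7% figure follows directly from the two averages quoted above, as this short check shows. The helper name pct_change is just for illustration.

```python
def pct_change(before: float, after: float) -> float:
    """Percent change from `before` to `after`."""
    return (after - before) / before * 100


# Average PER in the first three rounds vs the Finals, from the text above.
finals_change = pct_change(27.971, 25.995)
print(f"{finals_change:.1f}%")  # roughly a 7% decrease
```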



The second dashboard lays out each year with more descriptive stats. To start, you can select the season on the left and 3 sections will automatically populate. The first is the “Playoffs” section, which shows an image associated with that playoff run as well as a description of where that image is from. The next section, “First Three Series vs Finals”, shows the percent change in comparative stats from the first three series to the Finals. For example, in the 2016 run you can see that LeBron had an increase of 12% in PER and +20% in all other categories. The final section, “Performance by round”, graphs how LeBron’s PER trends as the playoffs progress. It also shows detailed information about each series, such as whether his team won or lost and which team they played against that round.

In summary, the bonus project reveals that LeBron James’ performance, as measured by Player Efficiency Rating (PER), dips slightly in the NBA Finals, with a 7% average decrease compared to the first three rounds of the playoffs. However, this does not necessarily signify consistent underperformance or “choking” in the final round, as he has outperformed his earlier rounds’ average in the Finals 5 out of 10 times. This data suggests that while there is a marginal downward trend in his overall Finals performance, LeBron’s Finals play is still highly impactful and in half of his appearances surpasses his preceding playoff performances.