LaDainianTomlinson.jpg

Running Back Data

Running Some Numbers

I decided to explore data on NFL players who played running back, a role that I find rather fascinating.  Specifically, I looked at running backs who, as of 2021, have been inducted into the Pro Football Hall of Fame and who are designated as being “modern-era” players (i.e., those whose careers began after the end of World War II).  Analyses were performed in Python and used the Pandas and Seaborn libraries.

 

What follows are observations and discussion of the data based on several visualizations of its distribution.  Some insights are indicative of actual running back prowess or changes in the nature of the game of professional football over time, and others are likely merely due to the underlying structure and limitations of the data itself.

I combined all player season stats into one table for a total of 370 different running back seasons across a 66-year period from 1946 to 2011, inclusive.  The players’ career summary statistics were compiled in a second table with 33 rows (one for each back).

RB1.png
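As a rough sketch of how two tables like these might be assembled in Pandas: the player labels, column names, and numbers below are made-up placeholders rather than the actual data set, and deriving the career table from the season table is only one possible approach.

```python
import pandas as pd

# Hypothetical per-player season tables; names, columns, and values are
# illustrative placeholders, not the real data.
back_a = pd.DataFrame({
    "player": ["Back A", "Back A"],
    "year": [1950, 1951],
    "rush_att": [100, 150],
    "rush_yds": [500, 600],
    "rush_td": [4, 6],
})
back_b = pd.DataFrame({
    "player": ["Back B"],
    "year": [1951],
    "rush_att": [200],
    "rush_yds": [900],
    "rush_td": [7],
})

# Stack the individual tables into one season-level table (one row per season).
seasons = pd.concat([back_a, back_b], ignore_index=True)

# Collapse to one row per back. Career yards per attempt is computed from
# career totals rather than by averaging the seasonal averages.
stat_cols = ["rush_att", "rush_yds", "rush_td"]
careers = seasons.groupby("player", as_index=False)[stat_cols].sum()
careers["yds_per_att"] = careers["rush_yds"] / careers["rush_att"]

print(seasons.shape)
print(careers)
```

Computing the career average from totals weights each season by its number of carries, so a low-volume season cannot dominate the career figure.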

I then loaded the data frame and displayed summary information on the variables it contains.

RB2.png
RB3.png

Over time, NFL offenses have become more potent; this is in large part a result of greater restrictions being placed on defenses via rules changes over the years.  Both the rushing and receiving games are more prolific today than decades ago, not only in terms of yardage and touchdowns, but also in the number of plays run (when it comes to the running game, running backs in the current NFL rush about three times more often than their predecessors from the 1940s and 1950s).  Using regression plots, I plotted the relationships between rushes and time, rushing yards and time, and rushing TDs and time for the season stats table.  Not surprisingly, these plots all exhibit positive trends.

RB4.png
RB5.png
RB6.png
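For readers who want to reproduce this kind of trend plot, here is a minimal sketch.  The column names and the six data points are placeholders I made up for illustration, and `plot_trend` is a hypothetical helper rather than the code used for the figures above.  The NumPy fit gives the slope of the same kind of straight line that a Seaborn regression plot draws over the scatter.

```python
import numpy as np
import pandas as pd

# Hypothetical season table; column names and values are placeholders.
seasons = pd.DataFrame({
    "year": [1950, 1960, 1970, 1980, 1990, 2000],
    "rush_att": [90, 150, 200, 240, 310, 330],
})

# Ordinary least-squares line of attempts on year.
slope, intercept = np.polyfit(seasons["year"], seasons["rush_att"], deg=1)
print(f"estimated change in attempts per year: {slope:.2f}")


def plot_trend(df, x, y):
    """Draw a scatter plot with a fitted regression line using Seaborn."""
    import seaborn as sns  # imported here so the fit above runs without it
    return sns.regplot(data=df, x=x, y=y)
```

Calling `plot_trend(seasons, "year", "rush_att")` would draw the scatter and trendline; swapping in a yardage or touchdown column gives the other two plots.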

It practically goes without saying that rushing yards and rushing TDs are partially functions of rushing attempts, and these relationships were also plotted, with the expected positive trends clearly visible.

RB7.png
RB8.png

Touchdowns as a function of yardage were also plotted.  We once again see a positive trend here.

RB9.png

It should be noted that in the plots of TDs with respect to time, rushes, and rushing yards, there is a good deal of spread around the regression line, as the number of touchdowns is also partially dependent upon offensive field position (i.e., if a player’s average rush began closer to the opponent’s end zone in a given season, he would probably score more touchdowns in that same season).

Yards per attempt over time was also examined, to see if there had been a noteworthy change.  What follows is arguably the most fascinating part of the study.  As we will see, the relationship between these variables is far less clear cut than the others we have just looked at.

RB10.png

It is striking that we actually see a slight downward trend in yards per attempt over time.  On the surface, it might be tempting to attribute this to a decline in running back efficiency.  However, there are actually several other factors that potentially explain this phenomenon.  They are as follows:

1) Sample size (i.e., the number of rushes) has an effect.  With fewer rushes, stochasticity comes into play and often leads to higher or lower average rush lengths than would be seen in a season with a greater number of rushes.

RB11.png

To illustrate, when we look at the relationship between number of rushing attempts and yards per attempt, we can see that with more attempts, yards per attempt tends to exhibit a “regression to the mean” effect (i.e., data points become more closely clustered around the mean line), whereas with fewer rushes, random chance becomes more of a factor and yards per carry is more spread out around the mean line.  The running backs who played very early in our study period rushed less often than those who played later, and their average rush lengths are correspondingly skewed.  However, there is a reason that the skew for these backs tends towards higher rather than lower average rush length; this is discussed in the following item.
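The sample-size effect can be checked with a small simulation.  This is only a sketch under an assumed toy model: per-carry gains are drawn from a shifted gamma distribution that I chose for its right skew, not one fitted to real rushing data.  It compares how much season averages vary at 50 carries versus 300 carries.

```python
import numpy as np

rng = np.random.default_rng(0)


def simulate_season_averages(n_carries, n_seasons=2000):
    """Average gain per carry for many simulated seasons of n_carries each."""
    # Assumed toy model: gamma-distributed gain minus one yard, which gives a
    # right-skewed distribution with a mean of about 4 yards per carry.
    gains = rng.gamma(shape=1.0, scale=5.0, size=(n_seasons, n_carries)) - 1.0
    return gains.mean(axis=1)


low_volume = simulate_season_averages(50)
high_volume = simulate_season_averages(300)

print(f"spread of Y/A at 50 carries:  {low_volume.std():.2f}")
print(f"spread of Y/A at 300 carries: {high_volume.std():.2f}")
```

Both sets of simulated seasons share the same underlying per-carry average, so any difference in spread comes from volume alone.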

2) There is a significant age-related component to consider.  The running back seasons with points at the left end of the yards per attempt vs. year plot, in addition to having fewer attempts each, also represent a small handful of younger running backs who would have been more likely, given their fresh legs and more youthful bodies, to consistently pull off long runs rather than short ones.  The reverse is true for points on the right side of the graph; those few HOF running backs still active in the last few seasons of the study period had by that time incurred major physical wear on their bodies, and so would have been more likely to rush for shorter distances on average in the carries that they were afforded by their teams (which would end up numbering fewer than early in their careers as they transitioned from starting to backup or platoon roles).  More toward the middle of the plot, we have years where there was more likely to be a mix of older and younger HOF backs whose yards per attempt numbers offset each other, to some degree.

For another representation of all of this, aggregation is useful.  I aggregated RB seasons by year, and obtained a plot showing median yards per attempt values over time (median was used in lieu of mean to ensure robustness to potential outliers).

RB12.png
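A minimal sketch of this aggregation step follows; the column names and values are placeholders, not the real table.  The 1950 group includes one deliberately extreme season to show why the median is less sensitive to outliers than the mean.

```python
import pandas as pd

# Hypothetical season rows; values are illustrative only.
seasons = pd.DataFrame({
    "year": [1950, 1950, 1950, 1951, 1951],
    "yds_per_att": [4.0, 5.0, 9.0, 3.5, 4.5],
})

# One value per year: the median (robust) alongside the mean (for comparison).
median_by_year = seasons.groupby("year")["yds_per_att"].median()
mean_by_year = seasons.groupby("year")["yds_per_att"].mean()

print(pd.DataFrame({"median": median_by_year, "mean": mean_by_year}))
```

In this toy data the 9.0 season pulls the 1950 mean a full yard above the median, while the 1951 values, which have no extreme season, agree.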

We now have a plot with a trendline that exhibits a noticeable upturn at the left end.  Were we to extend this analysis to pre-modern era HOF backs (those whose careers began anywhere from the league’s inception in 1920 up to the end of WWII), we would very likely see the trendline in the area around the late ’40s flatten out, as we would then be including some older backs’ seasons in the analysis of the years immediately following WWII, with said seasons balancing out those of the very young backs.  The upturn at the left side of the line would probably be “relocated” further to the left, representing the first few years of the league in the 1920s, when most future HOF running backs who were active would have been rather young.

3) The running back in our study period who entered the league the earliest was Marion Motley, an uncommonly efficient running back who holds the highest career yards/attempt average among running backs in league history (the boxplot of career yards per attempt clearly shows his 5.7 as the lone outlier).

RB14.png
RB13.png
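For reference, a boxplot drawn with default Matplotlib or Seaborn settings flags a point as an outlier when it lies more than 1.5 times the interquartile range beyond the quartiles.  The sketch below applies that rule by hand, on the assumption that default settings were used for the plot above.  Only the 5.7 figure comes from this post; the other values are placeholders.

```python
import pandas as pd

# Illustrative career yards-per-attempt values; only 5.7 is taken from the
# text, the rest are made-up placeholders.
career_ypa = pd.Series([3.9, 4.0, 4.1, 4.2, 4.3, 4.4, 4.5, 4.6, 5.7])

# Quartiles and the 1.5 * IQR fences used by default boxplot whiskers.
q1, q3 = career_ypa.quantile([0.25, 0.75])
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = career_ypa[(career_ypa < lower_fence) | (career_ypa > upper_fence)]
print(outliers.tolist())
```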

Charley Trippi and Joe Perry were the second and third running backs to enter the league in the study period, respectively, and they were also extremely efficient rushers who averaged at least 5 yards a carry over their careers.

RB15.png

Together, Motley, Trippi, and Perry represent three of the six total running backs in this study who averaged 5 yards or more per rush for their careers.  The fact that the entirety of Motley’s and Trippi’s careers occurred over the first ten years of our study period (1946-1955) further skews the yards/attempt data on that end of the graph, as does the fact that the first half of Perry’s career is encompassed in those years.

One word of caution here: it is tough to say exactly how much of these backs’ efficiency is due to their inherent skill and how much is due to skew as a result of lower volume of offense, as explored above.  However, when we examine Motley’s and Perry’s careers more closely, the evidence appears to weigh a good deal in their favor.

RB16.png
marionmotley.jpeg
joeperry.jpeg

Motley ran 100 times or more in five of the nine seasons of his career, and Perry ran over 100 times in seven of the eight seasons he played up through 1955; this all came during an era in which it was not common for rushers to exceed 100 attempts per year.  Each of those seasons put the two backs in competition for most heavily utilized rusher league-wide.  Overall, their rushing attempts are perhaps just numerous enough to suggest that their topline efficiency numbers are more than a mirage.  Anecdotes from their careers corroborate this; both men were said to possess exceptional athletic ability compared to most running backs of their time.  It is by chance that both happened to be active in roughly the same era and helped set the trendline for the time.

RB22.png
RB17.png

Trippi is a different story.  He only rushed 100 times or more in two of his nine seasons in the league, ending up with 141 fewer total carries than Motley (who played the same number of seasons) and 1242 fewer than Perry (who had a 16-year career, one of the longest for a running back in league history).

RB18.png

Indeed, we can see that Trippi has the second fewest career attempts among the HOF running backs in our analysis.  The smaller sample size, as it were, certainly leaves room for reasonable doubt as to Trippi’s comparative efficiency.  Unlike Motley and Perry, he is not commonly cited by historians as being a tremendously gifted rusher per se (although he was certainly good), which lends support to the idea that he was not quite their equal in the run game.  This is where things get interesting, however.  His overall value as a player had a lot to do with his versatility; in addition to being a solid runner, his passing, receiving, punting, kick and punt return, and defensive prowess were noteworthy.

CharleyTrippi.jpeg
DoakWalker.jpeg

This kind of jack-of-all-trades career shape is shared by another player included in our analysis, Doak Walker, the only back in this study with fewer rushing attempts than Trippi.  His rushing stats, while good, aren’t nearly as impressive overall as those of his contemporaries who are also Hall inductees, and yet he did many different things for the teams he played on, including kicking and punting.  Trippi and Walker are illustrations of how fluid positional duties could sometimes be in the early years of the modern era; these two were in some respects carryovers from the pre-modern era, in which players were often expected to fill several different roles on the field, a rarity in today’s game.  Their inductions are also indicative of how Hall voters sometimes value overall ability in lieu of transcendence in any one facet of the game.

Returning to our analysis of yards per attempt, it is instructive to look at how the 20 highest seasonal figures in this stat align with number of rushes.

RB19.png
jimbrown.jpeg
barrysanders.jpeg

Of particular note is that 12 of the top 20 (including the top 11) were achieved with fewer than 100 carries, a small number by today’s standards.  It is questionable whether these backs would have been able to maintain such efficiency had they been given a workload typical of more recent starting rushers.  What really stand out on this table are the 6.4 Y/A by Jim Brown in 1963 and the 6.1 Y/A by Barry Sanders in 1997, both achieved with around 300 carries.  In fact, these two also have very high career yards/attempt numbers (visible in one of the tables above); both of them averaged at least 5 yards per carry, putting them in the company of Motley, Trippi, and Perry (as well as Gale Sayers, a back with an injury-shortened career who nevertheless made a huge impact on the league and played on an otherworldly level when he was healthy).  Brown’s and Sanders’s career efficiency figures were attained with a sizable number of overall rushes, so they are statistically “meaningful”.  These two players are widely recognized as among the top 5 running backs of all time, and these data points are definitely illustrations of their greatness.
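A table like this one, and the count of low-volume seasons within it, can be produced along the following lines.  This is a sketch with hypothetical column names and invented rows, using a top 4 instead of a top 20 to keep the example small.

```python
import pandas as pd

# Hypothetical season rows; labels and values are placeholders.
seasons = pd.DataFrame({
    "player": ["A", "B", "C", "D", "E", "F"],
    "rush_att": [40, 85, 280, 60, 310, 220],
    "yds_per_att": [7.5, 6.9, 6.2, 6.6, 4.1, 4.8],
})

TOP_N = 4  # the post uses 20; 4 keeps the toy example readable

# Highest yards-per-attempt seasons, sorted from best to worst.
top = seasons.nlargest(TOP_N, "yds_per_att")

# How many of those seasons came on a light workload?
low_volume = top[top["rush_att"] < 100]
print(f"{len(low_volume)} of the top {TOP_N} seasons came on fewer than 100 carries")
```

Adding a minimum-carries filter before calling `nlargest` would be one way to restrict the ranking to full-workload seasons.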

For an additional perspective on this subject, we can look at each season’s “long” rush (the longest single rush of a player’s season).

RB20.png
hughmcelhenny.jpeg

The top 20 long rushes in the season stats table represent a span of years from 1952 to 2006 (55 years, inclusive, so about 83 percent of our study period), but only one of them (Hugh McElhenny’s long run from ’52) is contained within the study period’s first ten years.  At least measured by long rushes, the earliest part of our study period is not extraordinary; in fact, it is underrepresented.
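The arithmetic behind these figures is spelled out below, using only numbers stated in this post.  The comparison against a calendar-based share is my own rough baseline: it assumes seasons are spread evenly across years.  A stricter check would use the first decade’s share of all 370 seasons, which is not reproduced here.

```python
# Study period and the span covered by the top 20 long rushes (both inclusive).
study_years = 2011 - 1946 + 1
span_years = 2006 - 1952 + 1
span_share = span_years / study_years

# First ten years (1946-1955): share of the calendar vs. share of the top 20.
first_decade_calendar_share = 10 / study_years
first_decade_top20_share = 1 / 20

print(f"span covers {span_years} of {study_years} years ({span_share:.0%})")
print(f"first decade: {first_decade_calendar_share:.0%} of the period, "
      f"{first_decade_top20_share:.0%} of the top 20")
```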

RB21.png

Also, a look at the regression plot of long rush with respect to year shows a trend that is almost completely flat; that is, long rush length, on the whole, has changed little over the decades, if at all.  Observations such as these argue against the idea that running efficiency somehow declined over the decades, which would still have been theoretically possible even in light of the factors enumerated above.

From the foregoing, we can see that the behavior of the yards per attempt variable over chronological time in our analysis has a lot to do with which data are being used and what exactly they represent.  All of this serves as a reminder that statistics often require context, and failing to familiarize oneself with the subject matter that stats represent can lead to inaccurate impressions and interpretations.

galesayers.jpeg

©2021 by Mike Kane. Proudly created with Wix.com
