Know your analytics! (And watch out for the inside change-up)

August 23rd, 2017 / By Jeremy Zasowski

If I told you that Dustin Pedroia, the second baseman for the Boston Red Sox, currently had a batting average of .303 this season, could you predict what he’s going to do the next time he comes up to bat? Or could you tell him what he needs to improve most to help the Red Sox win more games?

I’ll give you a couple more stats for this season.  He is leading his team in batting average, he is tied for seventh in batting average in the American League and he ranks twenty-first in batting average in all of major league baseball. Now can you predict what he’s going to do in his next at bat or tell him what he needs to improve on to help the Red Sox win more games?

What if I throw in all of these stats for this season as well to help you make some predictions, courtesy of the Major League Baseball website: In the 2017 season so far, Dustin has hit 80 singles, 17 doubles, 6 home runs, and 0 triples. He’s also struck out 37 times and has been walked 41 times. [1]

So, the next time Dustin Pedroia steps up to the plate, taps the dirt off of his cleats and settles into the batter’s box, what’s he going to do? Single? Strike out? Walk? And other than telling him to hit more home runs, what specifically does he need to improve on?

It’s not really possible to make an accurate prediction on what Dustin Pedroia will do at his next at bat. All of the data that I’ve listed here on Dustin Pedroia fall into the category of “descriptive analytics.” 

Descriptive Analytics answer the question of “what happened?” by aggregating historical data into a summary view of the past. The classic statistics of mean, median and mode fall into this category. These are the types of analytics that are used to populate performance dashboards, and that sports fans argue over for hours on end.

  • Dustin Pedroia hit 15 home runs and batted .378 against left-handed pitchers in 2016.
  • The median house price in Pittsburgh is currently $123,500.
  • St. Mark’s Hospital had 37 readmissions within 30 days of discharge in 2016.
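To make "aggregating historical data into a summary view" concrete, here is a minimal sketch in Python. The hit counts are the ones quoted above for Pedroia's 2017 season. The monthly readmission breakdown is hypothetical: it is invented for illustration and only chosen so that it adds up to the 37 readmissions in the example.

```python
from statistics import mean, median, mode

# Hit counts quoted earlier in the article for Pedroia's 2017 season so far.
hits_by_type = {"single": 80, "double": 17, "triple": 0, "home_run": 6}
total_hits = sum(hits_by_type.values())
share_of_singles = hits_by_type["single"] / total_hits

# HYPOTHETICAL monthly 30-day readmission counts (invented for illustration).
# They are chosen only so that the yearly total matches the 37 in the example.
monthly_readmissions = [2, 4, 3, 3, 5, 1, 3, 4, 2, 3, 4, 3]

# Classic descriptive statistics: each one summarizes what already happened.
summary = {
    "total": sum(monthly_readmissions),
    "mean": mean(monthly_readmissions),
    "median": median(monthly_readmissions),
    "mode": mode(monthly_readmissions),
}

print(f"Total hits: {total_hits}, share that were singles: {share_of_singles:.0%}")
print(summary)
```

Everything in this sketch looks backward. None of these numbers says anything about the next at bat or the next discharge.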

Descriptive analytics are useful because they give us a more detailed understanding of a situation. They can help point to where an issue might be and possibly give us some ideas about what the issue might be. On their own, however, just like knowing that Dustin Pedroia is batting .303 this year, descriptive analytics can’t help us figure out in detail what he needs to do to improve his batting, or help us predict what he will do at his next at bat.

To figure out what is going to happen in the future, or to figure out what you should do about an issue, there are two other types of analytics that are needed: predictive and prescriptive.

Predictive Analytics look to the future to attempt to answer “what could happen?” through building statistical models that assign a probability to the likelihood of specific events occurring in the future.  No algorithm or magical system can predict the future with 100 percent certainty.  Predictive analytics models are all based on probabilities of a future event occurring:

  • There is a 30 percent probability of rain at 3 PM on Tuesday.
  • Those ads that show up in your Facebook feed? Yep, predictive analytics targeted at your profile (age, gender, likes, posts, etc.)
  • Sam Samuels is an 87-year-old man who was just admitted to St. Mark’s Hospital. He lives alone, doesn’t drive, and has chronic congestive heart failure, early onset dementia and uncontrolled type II diabetes, so he has an 80 percent probability of being readmitted to the hospital within 30 days of discharge.
  • Dustin Pedroia has a 68 percent probability of getting on base when playing at home, with runners on base, facing left-handed pitchers and ahead in the count.
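As a rough illustration of how a model can turn a patient's profile into a probability, here is a small sketch that uses a logistic function, one common choice for this kind of problem. It is not the method behind any of the figures above. The intercept, the weights and the feature names are all hypothetical, picked by hand so that the example patient lands near the 80 percent mentioned for Sam Samuels. A real model would learn them from historical data.

```python
import math

# HYPOTHETICAL coefficients, invented for illustration only.
# A real predictive model would estimate these from historical records.
INTERCEPT = -3.0
WEIGHTS = {
    "age_over_80": 1.2,
    "lives_alone": 0.8,
    "heart_failure": 1.1,
    "dementia": 0.7,
    "uncontrolled_diabetes": 0.6,
}


def readmission_probability(patient):
    """Return a probability between 0 and 1 from yes/no risk factors."""
    # Add up the weights of the risk factors that apply to this patient.
    score = INTERCEPT + sum(
        weight for name, weight in WEIGHTS.items() if patient.get(name)
    )
    # The logistic function squeezes any score into the range 0 to 1.
    return 1.0 / (1.0 + math.exp(-score))


# A patient resembling the Sam Samuels example: every risk factor present.
sam = {name: True for name in WEIGHTS}
print(f"Estimated readmission probability: {readmission_probability(sam):.0%}")
```

The output is a probability, not a verdict, which is the point: the model never says what will happen, only how likely it is.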

Prescriptive Analytics are similar to predictive analytics in that they also look to the future to identify probable future states, but prescriptive analytics also look to answer the question of “what should we do about it?” This is done by using optimization and simulation algorithms to identify possible future states and then suggest the actions most likely to lead to a specified outcome.

So, if you’re Joe Girardi, the manager of the New York Yankees, and you’re playing the Red Sox in Boston, you may know that there is a 68 percent probability that Dustin Pedroia will get on base when playing at home, with runners on base, facing left-handed pitchers and ahead in the count.

But you also know that in that same situation, there is a 70 percent probability that Pedroia hits an infield ground ball when the pitcher throws him an inside change-up pitch. So, you signal to your pitcher to throw that inside change-up pitch as a “prescriptive” approach to avoid Pedroia getting on base.

If you’re John Farrell, the manager of the Boston Red Sox, you know these same stats and probabilities, and you know that Joe Girardi knows them as well, so you predict that the next pitch is likely to be an inside change-up. You know that Dustin struggles with inside change-ups, so you’ve had him on a “prescriptive” training program during practices this season to specifically address this area and improve his ability to hit the inside change-up.
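The pitch call in this scenario can be sketched as a tiny optimization: take a prediction for each available action and pick the action with the best predicted outcome. The pitch names and on-base probabilities below are hypothetical numbers made up for the sketch, not real scouting data.

```python
# HYPOTHETICAL predicted on-base probabilities for each pitch in this
# situation (invented for illustration, not real scouting data).
predicted_on_base = {
    "fastball_outside": 0.72,
    "curveball_low": 0.66,
    "changeup_inside": 0.35,
}


def recommend_pitch(predictions):
    """Return the pitch with the lowest predicted on-base probability."""
    return min(predictions, key=predictions.get)


print(f"Recommended pitch: {recommend_pitch(predicted_on_base)}")
```

The predictive step supplies the probabilities, and the prescriptive step is the choice among actions. Real prescriptive systems weigh many more options and constraints, but the shape is the same.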

Baseball and health care might not seem like they have a lot in common, but one thing they do have in common is that they both generate lots of data.  In baseball, teams play 162 regular season games each year. A starting player might have over 600 at-bats in a season, and a pitcher could throw over 3,000 pitches in a season. With all of that data, you have a lot to analyze to find trends, make predictions, and keep a data analyst busy for months.

Each year in the U.S. there are over 30 million hospital admissions [2], over 130 million emergency department visits [3], over 125 million outpatient department visits [4] and over 922 million physician office visits [5]. These “descriptive” statistics tell us that there are a lot of visits and interactions with the healthcare system occurring across the entire continuum of care, from a doctor’s office, to a radiology clinic, to an emergency department and on into the hospital. That gives us a lot of data to analyze for trends, weaknesses and strengths. It should also allow us to use big data analytics to develop predictive and prescriptive analytics to help improve health care.

The challenge is that with this much data, we’ve got to start to automate data analytics. There aren’t enough healthcare data analysts able to work fast enough to process the ongoing flood of data in health care to turn this trove of information into actionable insights. We need to utilize intelligent automation to enable each healthcare provider organization in the U.S. to optimize its delivery network and to optimize the care given to each patient.

Jeremy Zasowski is innovation manager, 3M Data Informatics for 3M Health Information Systems.


[1] https://www.baseball-reference.com/players/p/pedrodu01.shtml

[2] http://www.aha.org/research/rc/stat-studies/fast-facts.shtml

[3] https://www.cdc.gov/nchs/fastats/emergency-department.htm

[4] https://www.cdc.gov/nchs/fastats/hospital.htm

[5] https://www.cdc.gov/nchs/fastats/physician-visits.htm