Coder variability: A closer look at measuring CAC

September 1st, 2017 / By Jessica Clive

When I began college a few years ago, my parents used it as an opportunity to reminisce about their own college experiences. As they did so, I recognized the conveniences I enjoyed as a result of attending school in an era of rapid technological advances. One of these modern conveniences is the internet and its amazing ability to help students monitor their grades online. During the course of my first semester, I quickly developed a habit of checking my test scores on the university website. While I was somewhat interested in seeing my own scores and how they affected my grades, my real motivation was to compare my own scores to the other students in my classes. I based my satisfaction, or dissatisfaction, with my performance on how well I did in comparison to the class average on a particular exam.

After some time, I realized that not only did I have the ability to see the class average, but I also could see the distribution of student scores on each test. This discovery became both a blessing and a curse. I quickly realized that simply comparing my own score to the class average didn’t give me an accurate view of my performance because there were often outliers within the distribution. Some students performed incredibly well and others did poorly; their scores skewed the class average, making me feel better or worse about my own performance when comparing it to the class average alone. This experience was useful in teaching me the lesson that strictly comparing myself to an average was not sufficient to determine how well I performed. Instead, I needed to see the distribution and variation of the data set that composed that average in order to determine true performance.

This lesson has a useful application in the world of computer-assisted coding (CAC). Rather than looking only at the mean or median of recall, we need to dig deeper and look at the variation among individual coders’ recall in order to get an accurate picture of a facility’s performance and how well it is utilizing the CAC engine.

In previous posts we have discussed the high degree of variation in recall among coders. To illustrate this, we highlighted one enterprise and looked at only one DRG. Even then, recall still varied from 60 to 90 percent. This could have been due to different document types, less evidence for some visits than for others, or the coders themselves having different habits and coding inconsistently even within the same enterprise. This inconsistency matters for computer-assisted coding because variability in performance affects whether an enterprise is capturing the full value of the CAC engine.

In order to better assist enterprises in monitoring the performance of their coders and the variation in their recall, we have added the standard deviation of recall to the performance metrics reports. Standard deviation is a common statistical measure of how far the values in a data set spread from their mean. For our purposes, we use standard deviation to measure how spread out coders’ recall is from their enterprise’s or facility’s mean. We refer to this measurement as coder variability.

For data that are approximately normally distributed, the range within one standard deviation of the mean encompasses about 68 percent of the data set. To provide an example, imagine a group of facilities with a mean recall of 0.64 and a standard deviation of recall of 0.13. This would indicate that roughly 68 percent of the coders from this group of facilities have a recall between 0.51 and 0.77 (0.64 minus 0.13, or 0.64 plus 0.13). It would also indicate that the remaining 32 percent of coders’ recall falls more than 0.13 from the mean, somewhere outside the range of 0.51 to 0.77.
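The arithmetic above can be sketched in a few lines of code. This is a minimal illustration using hypothetical per-coder recall values (not real data from any facility); the mean, standard deviation, and one-standard-deviation band are computed just as described:

```python
import statistics

# Hypothetical per-coder recall values for a group of facilities
# (illustrative numbers only, not real data).
recalls = [0.48, 0.55, 0.60, 0.62, 0.64, 0.66, 0.70, 0.75, 0.80]

mean_recall = statistics.mean(recalls)
std_recall = statistics.pstdev(recalls)  # population standard deviation

# The band one standard deviation on either side of the mean:
# for roughly normal data, about 68% of coders fall inside it.
low, high = mean_recall - std_recall, mean_recall + std_recall

within = sum(low <= r <= high for r in recalls)
print(f"mean={mean_recall:.2f}, sd={std_recall:.2f}, "
      f"band=({low:.2f}, {high:.2f})")
print(f"{within} of {len(recalls)} coders fall within one standard deviation")
```

With the example numbers from the post (mean 0.64, standard deviation 0.13), the band would come out to roughly 0.51–0.77.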

The scatterplots below depict all the facilities of one example enterprise, with each dot representing a specific facility. The x-axis is the mean recall of each facility, and the y-axis is the standard deviation of the coders’ recall at each facility. Additionally, the green line on each plot represents the median of recall, and the red line represents the median of the standard deviation of recall.

To interpret these scatterplots, facilities that fall into the bottom-right quadrant of each plot are those with high recall and consistent coders (as indicated by their low standard deviation of recall), and facilities in the top-left quadrant are those with both low recall and a high degree of inconsistency and variation among their coders. It is interesting to note how spread out the highlighted facilities are on the scatterplots, especially considering that they are all part of the same enterprise and presumably their coders receive similar training and instruction.
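The quadrant reading described above can be expressed as a small classification sketch. The facility names and numbers here are hypothetical stand-ins for the dots on the plots; the two median lines play the same role as the green and red lines:

```python
import statistics

# Hypothetical (mean recall, coder variability) pairs per facility --
# illustrative stand-ins for the dots on the scatterplots.
facilities = {
    "A": (0.72, 0.08), "B": (0.58, 0.16), "C": (0.66, 0.11),
    "D": (0.61, 0.05), "E": (0.75, 0.14),
}

# Median lines: green = median of recall,
# red = median of the standard deviation of recall.
recall_median = statistics.median(m for m, _ in facilities.values())
sd_median = statistics.median(s for _, s in facilities.values())

def quadrant(mean_recall, sd):
    """Classify a facility relative to the two median lines."""
    side = "high recall" if mean_recall >= recall_median else "low recall"
    spread = "consistent coders" if sd <= sd_median else "variable coders"
    return f"{side}, {spread}"

for name, (m, s) in facilities.items():
    print(name, quadrant(m, s))
```

A facility classified as "high recall, consistent coders" corresponds to the bottom-right quadrant; "low recall, variable coders" corresponds to the top-left.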

The key takeaway from these scatterplots and coder variability is that a high standard deviation of recall should be treated as a signal that standardized policies and procedures may need to be put in place to improve the consistency of coder performance. Doing so could help facilities and enterprises obtain as much value as possible from the CAC engine. Additionally, improving the consistency of coder performance could raise mean recall, as low-performing outliers increase their individual recall and no longer skew the facility’s average.

Jessica Clive is a business intelligence analyst at 3M Health Information Systems.