Good discussion of some of the cons of working with administrative or billing data for research. While sensitivity is quite variable when you use stuff like ICD–9 codes to define cases, specificity is generally very high. This can be a good thing, so let’s not throw the baby out with the bathwater. There are instances when these parameters are acceptable.

Most of the research I’ve done has involved administrative data [1]. In the first paper I published [2], I looked at trends in antibiotic use for Staph aureus infections over the past 10 years for hospitalized children. I relied on administrative data, which gave me the benefit of using data from children’s hospitals all over the country and allowed me to look at over 60,000 cases going back to 1999. Because ICD–9 coding is not perfect, sure, there were cases I missed. But I can be almost certain (given the generally high specificity) that all 60,000+ cases I looked at were really due to Staph infections [3]. This same study could easily have been conducted at a single institution, but it would have taken 5 times as long and not had nearly the same power. Or I could have formed a multi-institution consortium (easily a year’s worth of work alone) and spent tens of thousands of dollars conducting this same study at multiple sites over multiple years using confirmed microbiology and pharmacology data. (I think I would have come to the same conclusion.)
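To put rough numbers on that trade-off, here is a minimal sketch in Python. The counts are invented purely for illustration and are not taken from the paper, and `code_accuracy` is a made-up helper name, not part of any library.

```python
def code_accuracy(tp, fp, fn, tn):
    """Return (sensitivity, specificity, ppv) for a code-based case definition.

    tp: coded as a case and truly a case
    fp: coded as a case but not truly a case
    fn: truly a case but not coded as one (the missed cases)
    tn: not coded as a case and truly not a case
    """
    sensitivity = tp / (tp + fn)  # share of true cases the code catches
    specificity = tn / (tn + fp)  # share of non-cases the code correctly leaves out
    ppv = tp / (tp + fp)          # share of coded cases that are real
    return sensitivity, specificity, ppv


# Hypothetical counts (assumption: made up for this example only).
sens, spec, ppv = code_accuracy(tp=600, fp=10, fn=400, tn=98990)
print(f"sensitivity={sens:.2f}  specificity={spec:.4f}  ppv={ppv:.3f}")
```

With these made-up counts the code misses a lot of true cases (low sensitivity), yet nearly every case it does flag is real. One caveat: the share of flagged cases that are real (the positive predictive value) depends on how common the condition is as well as on specificity, so high specificity alone does not guarantee it.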

The conclusion of the blog post by Dr Edmond states:

Perhaps the forthcoming ICD–10 will help, but the fundamental issue of only reviewing physician notes will remain. More sophisticated methods utilizing computerized algorithms for analyzing electronic medical records for case detection will probably be the ultimate solution.

ICD–10 may help, but really the ultimate solution is making the microbiology data (and other laboratory data) more accessible. The Achilles’ heel of using billing data for infectious diseases research is that micro lab data is not contained in administrative datasets [4]. This should improve with wider adoption of EMR systems, as well as natural language processing systems that will allow us to easily parse notes.

Bottom line—many tools exist for clinical research. None is perfect. It’s important to know both the pros and cons of your tools so you can use them appropriately.

[1] Guess what my opinion on administrative data for research purposes is?
[2] Shameless plug!!! Read my research!! (Don’t really, unless you’re in peds…and probably only if you’re in peds ID.)
[3] Admittedly, there is lots of nuance I’m skipping over here. I suggest reading the discussion section of the paper I’ve linked to for some more insight into the limitations of using billing data (and the limitations of that paper specifically).
[4] What IS in administrative or billing datasets, you ask? The general rule of thumb is anything you can be billed for. So, you can see that someone had a blood culture (because they get charged for that) but you don’t have the results from that blood culture. (Depending on the dataset, you may not even know it is a blood culture, only that something was cultured from somewhere on the patient…)