Saturday, October 17, 2015

Is Bad Data Better Than No Data?

We are now in the era of “big data” which, we are told, will answer questions we could never answer before and identify individuals before they are sick so that we can intervene and prevent their illnesses and their problems.  It is exciting, earth-shattering, and the subject of more articles, blog posts and conferences than one can shake a stick at.  However, is it true?  Can “big data,” or little data for that matter, really lead us to salvation (medically speaking, at least)?

Forgive me for the religious association; however, it often seems as though people are taking the pronouncements of the medical data gurus as holy writ from God.  That bothers me a bit because, being fundamentally a monotheist, I do tend to think that we mere mortals are not godlike in our perfection, even if we are physicians, and even if we are that group even godlier than physicians, health policy experts.  There is an old joke about a good man dying and going to heaven.  In heaven he is shown around by one of the angels, who takes him to the dining hall where a line of happy people is patiently waiting their turn to pick up their food for the day.  He then sees one of the heavenly beings, wearing a white coat with a stethoscope in a pocket, cut into line.  He asks the angel who that is and the angel says, “Oh that is just God – sometimes he thinks he is a doctor.”  Since we are not godlike in our analyses, we must better understand what all this data means, and whether bad data is better than no data.  Ultimately we need to know how to use data to help those in need.  That is the essence of medicine – helping those in need.

I speak as a physician and a health data expert who has helped health care organizations, government and large corporations design programs based on the populations they serve.  The work I do is data-based, and thus I must understand the strengths and limitations of data.  I know that data can potentially be used to positively impact the use of precious health care resources and the care a patient receives.  At the same time, in my professional role, I am often the skeptic, challenging those people who claim the data holds magical powers.  Thus I enjoyed the article by Dr. Saurabh Jha in the Health Care Blog entitled, “Quality of Skepticism and Skepticism of Quality.”  It was his section on bad data being better than no data that inspired this post.  He makes the point, and I admit it is a point I make all the time, that perfection is the enemy of good, but he does not stop there as many do.  Dr. Jha understands the limitations and that, while perfection is the enemy of good, sometimes data analytics do not even achieve the standard of good.

What then is this data we are talking about?  All data depends on some information being put into the right format to translate into the binary code that computers understand.  When people speak of big data, at this point in time, they are mainly speaking of claims databases, which take billing codes from insurance claims and assume that they accurately reflect the care that is being rendered.  Billing codes are financial tools that drive payment; they are used by providers to maximize their revenues (there are courses and consultants that constantly try to help people adjust codes to do just that) and by insurance companies to minimize payments.  That results in a game with only passing interest in accurately reflecting what is going on between a doctor and a patient.  With electronic medical records, the hope is that we will obtain more accurate information on what is really happening.  The funny part is that the most popular and widespread EMR gained its market dominance by being able to help hospitals maximize their revenues by capturing all services and materials for accounting and billing purposes, not by accurately telling the clinical story.

Saul Weiner and Alan Schwartz, whom I have spoken about in previous posts, have looked at whether medical records actually reflect what happens in an interaction between doctor and patient.  By comparing the records with tape-recorded encounters, and by using standardized actor patients, they have found that the record does not!  Thus even the data inputs from a medical record, considered to be much stronger than the claims records, have serious flaws.  They point out in their research that medical records leave out the emotions, competing priorities, financial concerns, spiritual beliefs and other aspects of being human that have a major impact on the care rendered.  They call attention to these aspects contextualized care, and have found that the ability to understand the person, and not only the disease, is much more important in driving quality care than the purely biomedical issues.

Data tends to suffer from observational bias, sometimes called the “streetlight effect” from a joke that scientists like to tell.  Late at night, a police officer finds a drunken man crawling around on his hands and knees under a streetlight.  The drunken man tells the officer he’s looking for his wallet.  When the officer asks if he’s sure this is where he dropped the wallet, the man replies that he thinks he more likely dropped it across the street.  “Then why are you looking over here?” the befuddled officer asks.  “Because the light’s better here,” explains the drunken man.  We tend to look at these big databases, designed and optimized for financial purposes, because the light is better, even though the answers, the insights, are more likely found in data ‘across the street’ where it is not captured.

But advances are being made.  Lab data is now included in some databases.  Pharmacy information, which used to be separate, is now incorporated.  Methods that use word search and that mine audio databases of phone calls between providers, patients, and insurers are starting to be used with some potential effectiveness.  However, the databases, on a sheer numbers basis, are still overwhelmingly claims- or EMR-based, and both are designed for financial and not clinical purposes.

All this brings me back to the question which titles this post.  Is bad data better than no data?  I do not have a hard and fast answer.  Bad data can push you to make bad decisions, and when the data is big, the bad decisions can really be whoppers.  Big data used to identify individuals is especially prone to mistakes, as the variability in people is far greater than can be seen from the financially based data in the databases.  The danger is that we assume that the data is correct.  We assume it to be useful.  Dr. Jha takes exception to this and says, “The burden is on proponents of the metrics to prove their usefulness.”  Currently that is not the case, and the burden is on those who question the usefulness.  That does need to change and to be tempered by the medical tradition of skepticism.

None of this is to suggest that the use of data be abandoned.  Perfection is the enemy of the good.  Let’s just understand what we are looking at and what the limitations are, and stop using even good data as if it were perfect.  We need to take a breath and study the use of data to evaluate its effectiveness rather than assume that all answers lie in those numbers.