I understand. I have spent a good deal of time and effort leading a group of very smart data scientists, clinical informaticists, health economists and statisticians in developing algorithms to predict the “path” of illness in order to better intercept and intervene before disaster hits. We have the advantage of using the largest commercial claims database in the US, and possibly in the world, covering over 40 million people per year. It is also possibly the most up-to-date health care claims database, as it is refreshed monthly, and the one with the longest claims history, as it reflects 10 years of claims.
Yet even then, it is very difficult because “one tiny change” can have a dramatic impact. Medicine and health care are filled with low-probability, high-consequence events. Life is as well. Recently, we have been working on a model to predict people who are likely to be sick enough to generate $250,000 in claims in the coming 12 months. It means predicting any illness, each of which has a different profile and a different pattern leading up to catastrophe. Thus, numerous variables are needed to predict numerous diseases and combinations of diseases. This creates a data science problem that defies an easy solution. It means identifying someone who will be one of the 1 in 7,000 people sick enough to generate $250,000 in costs. The prediction challenge is also to identify those who are likely to have an illness that we can impact with proper treatment and/or proper support to improve the outcome. A missed prediction could mean a person has a much worse outcome, even death or disability: a personal “hurricane” that one may not be able to withstand.
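A rough back-of-the-envelope sketch shows why a 1 in 7,000 event is so hard to predict. The prevalence and the 40 million population come from the figures above; the sensitivity and specificity values are purely hypothetical, chosen only for illustration, and do not describe any real model.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Share of flagged people who truly become high-cost cases (Bayes' rule)."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)

population = 40_000_000          # people per year, from the text
prevalence = 1 / 7_000           # 1 in 7,000, from the text
expected_cases = population * prevalence

sensitivity = 0.80               # hypothetical: catches 80% of true cases
specificity = 0.99               # hypothetical: 1% false alarm rate

ppv = positive_predictive_value(prevalence, sensitivity, specificity)
print(f"Expected high-cost cases: {expected_cases:,.0f}")
print(f"Share of flagged people who are true cases: {ppv:.1%}")
```

Under these assumed numbers, only about 1 in 100 flagged people would actually turn out to be a high-cost case, even though the hypothetical model looks accurate on paper. That is the base-rate problem in rare-event prediction.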
The data that one uses must also include all factors that are present when a person becomes ill. That means including data on emotions, social factors, finances and culture, among others. If someone you love is in an accident and you are speeding, distracted, on your way to the hospital, you are more likely to have an accident yourself. That is a data point that medical records or health care claims will not reflect. If you have just been fired from your job, that is another data point. If you are poor and/or lonely, these are extremely important factors in trying to predict illness. It is not just biology.
But biology is the central pathway, and claims databases, while reflecting medical care, interventions and encounters, are basically financial databases. Electronic medical records, genomic information, and social information must be integrated into commingled databases to make analysis and, more specifically, prediction more powerful.
With healthcare, as opposed to hurricanes, there are issues of privacy and confidentiality that bring legal and ethical questions into play when combining data. Dorian does not care if we know the details of its hurricane-force winds, but Jane Smith might care a lot that we are using her genetic information and combining it with her financial profile, any police record, her family dynamics, her friends and her spirituality to try to predict her illnesses and healthcare needs.
All these data feeds help us to become more accurate, but even if the privacy concerns are overcome, there will always be uncertainty. Progress will increase with each new insight, but someone can always trip in their own home and break an arm.
With each new insight we gain more predictability. Recently, in trying to predict patients who are likely to have very high costs due to critical illness, we found that the “velocity” of their medical activities was a useful variable in building our models. Calculating it requires a separate algorithm whose output feeds the predictive model.
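The text does not define “velocity,” so the following is only one plausible reading: the change in the number of medical encounters between the most recent time window and the window before it. The function name, the 90-day window, and the sample dates are all hypothetical, meant as a minimal sketch of a derived feature that could feed a downstream model.

```python
from datetime import date, timedelta

def encounter_velocity(encounter_dates, as_of, window_days=90):
    """Hypothetical feature: change in encounter count between the most
    recent window and the prior window, expressed per day."""
    recent_start = as_of - timedelta(days=window_days)
    prior_start = as_of - timedelta(days=2 * window_days)
    # Encounters in the most recent window (recent_start, as_of]
    recent = sum(1 for d in encounter_dates if recent_start < d <= as_of)
    # Encounters in the window before that (prior_start, recent_start]
    prior = sum(1 for d in encounter_dates if prior_start < d <= recent_start)
    return (recent - prior) / window_days

# Made-up encounter dates for one person: activity is accelerating.
sample_dates = [
    date(2019, 4, 10),
    date(2019, 7, 1),
    date(2019, 7, 20),
    date(2019, 8, 15),
    date(2019, 8, 28),
]
velocity = encounter_velocity(sample_dates, as_of=date(2019, 9, 1))
print(f"Encounter velocity: {velocity:.4f} per day")
```

A positive value means encounters are becoming more frequent, a negative value means they are tapering off. A real feature would likely weight encounter types differently, which this sketch does not attempt.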
In other work I am doing, we have found that knowledge of the genetic makeup of the bacteria in a normal human intestinal tract, part of the microbiome, can help predict things like sensitivity to medications, propensity for diabetes depending on diet, and disease activity in someone with inflammatory bowel disease. Thus, microbiome data has become an important variable in predicting disease.
And yet, our society demands perfection and immediacy. Another article about forecasting the path of a hurricane pointed out that forecasts tend to define a forecast cone. Forecast cones have gotten significantly narrower in the last ten years due to better measurements and techniques, but there is still a 1 in 3 chance that the hurricane will be outside the cone completely. That is far from perfection.
I believe that in healthcare, it will be no different. We will make progress and get better with our predictions, but we must always remember that there will always be uncertainty in medicine. We will need to better differentiate between the need for huge databases, such as the one used by my team, and the deep data that will come from more knowledge of the individual, and understand how the two will interact. In a future post, I will talk more about big data and deep data and how they both must be used. Ultimately, there will always be some uncertainty, but we can decrease it incrementally in our quest to improve care.