Why conversational AI alone won’t improve health outcomes.
Over the past week, I’ve fielded the same question dozens of times: What do you think about ChatGPT Health and Claude for Healthcare?
My answer surprises people. These launches represent genuine progress—especially in making healthcare data more accessible and intelligible. But we’re also at risk of repeating a familiar mistake in digital health: We’re confusing information with intervention.
Access matters. Understanding matters. Neither reliably changes what you do tomorrow morning—and tomorrow morning is where outcomes live.
When Data Becomes King, Behavior Is Queen
A recent analysis of ChatGPT Health captured something essential: “Data is king, not AI.”[1]
I agree.
But there’s a more important follow-up question: What do you do with your data once you finally have it?
Here’s the uncomfortable truth that behavioral science has demonstrated for decades: Having your medical records doesn’t automatically make you healthier.[2,3]
Health data alone rarely drives sustained behavior change. Often it just creates more anxiety.
Nobel laureate Richard Thaler, whose work on “nudge theory” revolutionized our understanding of human decision-making, put it simply: “If you want to encourage some activity, make it easy.”[4] Yet today’s conversational AI makes understanding easy while leaving the hard work of behavior change entirely to the patient.
Consider what these systems can do remarkably well. ChatGPT Health and Claude for Healthcare can summarize medical histories, explain lab results, prepare patients for visits, and translate clinical jargon into plain language. Google’s Personal Health Large Language Model (PH-LLM), integrated with Fitbit data, can contextualize your sleep patterns and fitness metrics.[5,6]
That’s valuable. But it’s largely passive consumption of historical data.
It still doesn’t answer the question that matters most: What should I do differently tomorrow?
The Architecture Healthcare Actually Needs
After years of building AI-driven health systems, I’ve come to believe effective healthcare AI requires three distinct layers working in concert:
Layer One: Systems of Record — The Data Infrastructure
This is the foundation: electronic health records, claims databases, lab systems, wearable devices, and the interoperability standards that connect them. FHIR APIs, SMART on FHIR, and platforms like b.well and HealthEx have made real progress here.[7,8]
But semantic consistency remains poor. Even basic definitions like “normal” or “high risk” vary wildly across systems and populations.[9,10] A non-pregnant woman with a hemoglobin of 11.8 g/dL is anemic under the WHO cutoff of 12.0 g/dL, yet falls within the reference range of many clinical laboratories.[11,12] These aren’t edge cases; they’re endemic to healthcare data.
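To see how both issues surface in practice, here is a minimal sketch that pulls a patient’s latest hemoglobin result over a standard FHIR R4 search and classifies it against two lower limits of normal. The endpoint, patient ID, and the 11.6 g/dL lab cutoff are illustrative assumptions; only the WHO 12.0 g/dL threshold and the LOINC code for hemoglobin are standard.

```python
import requests

# Hypothetical FHIR R4 endpoint and patient ID -- illustrative only.
FHIR_BASE = "https://fhir.example-health.org/r4"
PATIENT_ID = "patient-123"

# Two lower limits of normal for hemoglobin in non-pregnant women:
# the WHO cutoff (12.0 g/dL) and an illustrative lab reference range
# whose lower limit sits below it. Real lab ranges vary by institution.
THRESHOLDS_G_DL = {"WHO 2011": 12.0, "Example lab": 11.6}

def latest_hemoglobin(base: str, patient: str) -> float:
    """Fetch the most recent hemoglobin Observation (LOINC 718-7)."""
    resp = requests.get(
        f"{base}/Observation",
        params={
            "patient": patient,
            "code": "http://loinc.org|718-7",  # Hemoglobin [Mass/volume] in Blood
            "_sort": "-date",
            "_count": 1,
        },
        timeout=10,
    )
    resp.raise_for_status()
    # A real client would handle an empty bundle; omitted for brevity.
    resource = resp.json()["entry"][0]["resource"]
    return resource["valueQuantity"]["value"]

hb = latest_hemoglobin(FHIR_BASE, PATIENT_ID)  # e.g., 11.8
for source, cutoff in THRESHOLDS_G_DL.items():
    label = "anemic" if hb < cutoff else "normal"
    print(f"{source} (lower limit {cutoff} g/dL): {label}")
# With hb = 11.8, WHO says "anemic" while the example lab range says "normal":
# the same number, two contradictory classifications.
```

The data plumbing is the easy part; deciding what the number means, for whom, is where interoperability still breaks down.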
Status: Improving, but fragmented.
Layer Two: Conversational Interfaces — The Comprehension Layer
This is where ChatGPT Health and Claude for Healthcare excel. They translate medical complexity into human understanding, summarize overwhelming information, and reduce cognitive burden for both patients and clinicians.[13,14]
OpenAI’s January 2025 announcement of ChatGPT Health emphasized its ability to help users “understand their health information” and “prepare for doctor visits.”[13] Anthropic’s Claude for Healthcare, launched the same week, highlighted similar capabilities while emphasizing HIPAA compliance and integration with Epic’s MyChart.[14,15]
This is meaningful progress. Claude Opus 4.5, for instance, achieves 92.3% accuracy on medical calculation tasks—though that also means roughly one error per thirteen calculations, which remains problematic for medication dosing.[15,16]
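The back-of-envelope arithmetic behind that caveat:

$$1 - 0.923 = 0.077 \approx \frac{1}{13}\ \text{errors per calculation}$$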
Status: Advancing rapidly.
Layer Three: Personalized Action Engines — The Intervention Layer
This is where the current narrative breaks down completely.
This layer must turn insight into context-aware action, deliver interventions at the right psychological moment, learn from individual response patterns, coordinate across multiple health behaviors and conditions, and sustain engagement over weeks and months.
This layer is largely absent from the conversation.
Without it, we risk building a system that helps people understand exactly why they’re still unhealthy—while their A1C, blood pressure, and weight remain unchanged.
Why Conversational AI Isn’t Built for Behavior Change
This isn’t a criticism of large language models. It’s a category mismatch. Conversational AI is optimized for reactive question-answering. Behavior change requires proactive orchestration.
Take someone with prediabetes. A chatbot can explain what A1C means, offer general diet advice, and discuss exercise benefits. But it typically cannot detect when they’ve been sedentary for three days and send a timely, personalized nudge. It cannot adapt recommendations based on their demonstrated barriers. It cannot learn which interventions actually work for them. And it cannot coordinate behavior change across the multiple conditions most patients are managing simultaneously.
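To make that concrete, here is a deliberately simplified sketch of the kind of proactive trigger a question-answering interface has no natural place for. The step threshold, streak length, and message templates are hypothetical, and the per-user learning is reduced to a response-rate lookup; a production system would use something closer to a contextual bandit.

```python
from dataclasses import dataclass, field

SEDENTARY_STEPS = 3000   # illustrative daily-step threshold
STREAK_DAYS = 3          # trigger after three sedentary days in a row

@dataclass
class NudgeState:
    sedentary_streak: int = 0
    # Per-template success rates let the system learn which phrasing
    # this individual actually responds to.
    response_rates: dict = field(default_factory=dict)

def on_daily_steps(state: NudgeState, steps: int) -> str | None:
    """Proactive orchestration: fire a nudge when a sedentary streak forms,
    rather than waiting for the user to ask a question."""
    state.sedentary_streak = state.sedentary_streak + 1 if steps < SEDENTARY_STEPS else 0
    if state.sedentary_streak >= STREAK_DAYS:
        state.sedentary_streak = 0
        # Clinician-authored templates (illustrative); pick the one with the
        # best observed response for this user.
        templates = [
            "A 10-minute walk after lunch counts.",
            "Three quiet days in a row -- up for a short stroll?",
        ]
        return max(templates, key=lambda t: state.response_rates.get(t, 0.0))
    return None  # no intervention today; silence is also a decision
```

Nothing here generates language. It decides when to act and which clinician-approved message to send, which is precisely the work conversational AI leaves undone.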
Understanding is not adherence. Thaler and his co-author Cass Sunstein demonstrated this principle across hundreds of contexts in their landmark book Nudge: “No choice is ever presented to us in a neutral way,” they wrote, “and we are all susceptible to biases that can lead us to make bad decisions.”[4,17]
The solution isn’t more information. It’s better choice architecture—the deliberate design of environments that make healthy choices easier to execute.
The Empathy Problem Nobody’s Solving
Some research suggests AI can simulate empathy convincingly in text-based interactions. A 2025 systematic review found that ChatGPT has a 73% likelihood of being perceived as more empathic than human clinicians.[18,19]
But here’s what gets lost: These models generate empathy through “linguistic mimicry based on probabilistic text prediction,” not genuine emotional understanding.[19,20] When patients discover responses come from AI rather than humans, perceived authenticity drops significantly.[20]
More fundamentally, clinical empathy involves “emotion-guided imagining of what a moment feels like to the patient—resonating with emotional shifts in real-time.”[21] This genuine empathic connection improves treatment adherence, enables more effective communication about sensitive topics, and helps patients cope with difficult diagnoses.[21]
AI cannot provide this. It can simulate empathic language patterns, but it cannot form the therapeutic alliance that predicts positive outcomes across healthcare interventions.
This is precisely why human clinicians must remain central to the intervention layer.
In systems designed for behavior change at scale, interventions must be authored by clinicians who understand therapeutic context, approved with clinical oversight before deployment, monitored for therapeutic appropriateness across diverse populations, and refined based on clinical judgment—not just algorithmic optimization.
The role of AI should be to scale and personalize clinician-designed interventions, not to replace the clinical wisdom and genuine empathy that make those interventions effective.
What Actually Works
Decades of research on digital health interventions reveal a consistent pattern.[22,23,24] Effective systems include:
- Personalization based on individual characteristics
- Timely prompts delivered at decision points
- Feedback loops that reinforce progress
- Credible sources that establish trust
- Sustained engagement over clinically meaningful timeframes
- Coordination across multiple behaviors
The hard part isn’t the language model. It’s the behavioral system design—the infrastructure that turns momentary insight into sustained habit change.
At CueZen, we’ve built this intervention layer with NudgeStream, serving over 1.1 million users daily in Singapore’s national health system. Our graph neural network-based approach achieved a 6.17% increase in daily steps and 7.61% increase in exercise minutes—not through better explanations, but through personalized, timely nudges informed by social network dynamics and individual response patterns.[25,26]
The point isn’t that our system is unique. It’s that this layer requires entirely different primitives than conversational AI.
It needs:
- Real-time behavioral sensing to detect engagement patterns
- Personalization engines that learn individual responses
- Orchestration logic to coordinate multiple interventions
- Clinical governance to ensure therapeutic appropriateness
- Outcome measurement to validate effectiveness
None of this comes naturally from large language models.
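To make the category difference concrete, here is one way those five primitives might compose into a single orchestration pass. The interfaces and names below are illustrative assumptions, not a description of NudgeStream or any shipping system.

```python
from typing import Protocol

class BehavioralSensor(Protocol):
    def engagement_signals(self, user_id: str) -> dict: ...

class PersonalizationEngine(Protocol):
    def rank_interventions(self, user_id: str, signals: dict) -> list[str]: ...

class ClinicalGovernance(Protocol):
    def approved(self, intervention_id: str) -> bool: ...

class OutcomeTracker(Protocol):
    def record(self, user_id: str, intervention_id: str) -> None: ...

def orchestrate(user_id: str,
                sensor: BehavioralSensor,
                engine: PersonalizationEngine,
                governance: ClinicalGovernance,
                outcomes: OutcomeTracker) -> str | None:
    """One pass of the intervention layer: sense, personalize, govern,
    deliver, measure. Note there is no free-text generation in this loop."""
    signals = sensor.engagement_signals(user_id)   # real-time behavioral sensing
    for intervention_id in engine.rank_interventions(user_id, signals):  # personalization
        if governance.approved(intervention_id):   # clinician-authored, clinically approved
            outcomes.record(user_id, intervention_id)  # outcome measurement
            return intervention_id                 # orchestration picks one timely action
    return None
```

Each component maps to one primitive above. An LLM can help clinicians draft intervention content upstream, but the loop itself is orchestration: sensing, ranking, governing, measuring.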
The Practical Implication for Healthcare Leaders
If you’re evaluating AI strategy for your health system, payer organization, or digital health company, here’s the crucial distinction:
Conversational AI will help you improve patient understanding, reduce administrative burden, and enhance documentation and communication.
Conversational AI will not, by itself, improve medication adherence, sustain lifestyle change, reduce readmissions, or improve chronic disease outcomes.
For those outcomes—the ones that actually matter for population health and total cost of care—you need an intervention engine designed explicitly for behavior change.
An Integrated Ecosystem, Not a Binary Choice
The future isn’t a binary choice between OpenAI and Anthropic, or among data access, conversation, and intervention.
It’s an integrated ecosystem where data is accessible, information is understandable, and actions are personalized and sustained.
Each layer has a distinct job. Confusing them leads to disappointment, wasted investment, and ultimately—unchanged outcomes.
Think of it as a relay race. The data layer gets the baton (your health information) into play. The comprehension layer hands it to you in a form you can grasp. But without the intervention layer to carry it across the finish line, the race remains incomplete.
Data Is King. Behavior Is Queen.
We’ve made genuine progress on healthcare data access and AI-powered comprehension. OpenAI, Anthropic, and Google deserve credit for advances that would have seemed impossible a decade ago.
But access to data is powerful only if we can translate insight into sustained action. Understanding that data is empowering only if it shapes tomorrow’s choices, not just today’s awareness.
The missing layer in healthcare AI isn’t more sophisticated language models. It’s the behavioral infrastructure that transforms momentary insight into lasting habit change—the systems that operate when the conversation ends and real life begins.
As Thaler and Sunstein wrote in their framework for behavior change: “By knowing how people think, we can design choice environments that make it easier for people to choose what is best for themselves, their families, and their society.”[17]
We’ve built the systems that help people know. Now it’s time to build the systems that help people do.
Because in healthcare, knowing your numbers and changing your numbers are two very different things. And only one of them keeps you out of the hospital.
Disclosure: I love working in healthcare, and I love the rapid progress we are making in AI. As both an academic and an entrepreneur, I believe in the power of technology to disrupt traditional thinking and to drive advances that empower systems at scale. I also believe this must be done safely, conscientiously, and in a way that preserves the privacy and sanctity of those we serve.
The author is CEO of CueZen, which builds behavioral intervention systems for healthcare organizations.
References
[1] Becker’s Hospital Review. (2025). ChatGPT Health: Data is king, not AI. Healthcare IT News.
[2] Thaler, R. H., & Sunstein, C. R. (2008). Nudge: Improving Decisions about Health, Wealth, and Happiness. Yale University Press.
[3] Webb, T. L., & Sheeran, P. (2006). Does changing behavioral intentions engender behavior change? A meta-analysis of the experimental evidence. Psychological Bulletin, 132(2), 249-268.
[4] Thaler, R. H. (2015). Misbehaving: The Making of Behavioral Economics. W.W. Norton & Company.
[5] Google Research. (2025). Advancing personal health and wellness insights with AI: Towards a Personal Health Large Language Model. Retrieved from https://research.google/blog/advancing-personal-health-and-wellness-insights-with-ai/
[6] Google for Health. (2025). Advancing Cutting-edge AI Capabilities: Personal Health Large Language Model. Retrieved from https://health.google/ai-models/
[7] b.well Connected Health. (2024). Healthcare Interoperability Platform. Retrieved from https://www.bwell.com
[8] Centers for Medicare & Medicaid Services. (2024). Interoperability and Patient Access. CMS.gov.
[9] Benson, T., & Grieve, G. (2021). Principles of Health Interoperability: FHIR, HL7 and SNOMED CT (4th ed.). Springer.
[10] Lehne, M., Sass, J., Essenwanger, A., Schepers, J., & Thun, S. (2019). Why digital medicine depends on interoperability. npj Digital Medicine, 2(1), 79.
[11] Beutler, E., & Waalen, J. (2006). The definition of anemia: what is the lower limit of normal of the blood hemoglobin concentration? Blood, 107(5), 1747-1750.
[12] World Health Organization. (2011). Hemoglobin concentrations for the diagnosis of anemia and assessment of severity. WHO/NMH/NHD/MNM/11.1.
[13] OpenAI. (2025). Introducing ChatGPT Health. Retrieved from https://openai.com/blog/chatgpt-health
[14] Anthropic. (2025). Claude for Healthcare: Enterprise-Grade AI for Health Systems. Retrieved from https://www.anthropic.com/healthcare
[15] Financial Content. (2025). Anthropic Launches “Claude for Healthcare”: A Paradigm Shift in Medical AI Integration and HIPAA Security. FinancialContent Business News.
[16] Anthropic. (2025). Claude Opus 4.5 System Card. Retrieved from https://www.anthropic.com/system-cards
[17] Sunstein, C. R., & Thaler, R. H. (2021). Nudge: The Final Edition. Penguin Books.
[18] Atkinson, T., Koenig, C. J., Babu, D., & Lawson, S. W. (2025). AI chatbots versus human healthcare professionals: a systematic review and meta-analysis of empathy in patient care. British Medical Bulletin, 156(1), ldaf017.
[19] Weissman, G. E., Halpern, S. D., Perlis, R. H., & Mehta, S. J. (2025). Patient perceptions of empathy in physician and artificial intelligence chatbot responses to patient questions about cancer. npj Digital Medicine, 8, 96.
[20] Mozafari, N., Weiger, W. H., & Hammerschmidt, M. (2024). Artificial empathy in healthcare chatbots: Does it feel authentic? Intelligent Systems with Applications, 22, 200361.
[21] Dranseika, V., Piasecki, J., & Waligora, M. (2021). In principle obstacles for empathic AI: why we can’t replace human empathy in healthcare. AI & Society, 36, 1111-1118.
[22] Krebs, P., Prochaska, J. O., & Rossi, J. S. (2010). A meta-analysis of computer-tailored interventions for health behavior change. Preventive Medicine, 51(3-4), 214-221.
[23] Webb, T. L., Joseph, J., Yardley, L., & Michie, S. (2010). Using the internet to promote health behavior change: a systematic review and meta-analysis of the impact of theoretical basis, use of behavior change techniques, and mode of delivery on efficacy. Journal of Medical Internet Research, 12(1), e4.
[24] Morrison, L. G., Yardley, L., Powell, J., & Michie, S. (2012). What design features are used in effective e-health interventions? A review using techniques from critical interpretive synthesis. Telemedicine and e-Health, 18(2), 137-144.
[25] Teredesai, A., et al. (2021). NudgeRank: A Large-Scale Graph-Based Approach for Digital Health Interventions. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (pp. 3557-3567).
[26] Ministry of Health Singapore. (2024). National Steps Challenge: Impact Assessment Report. MOH Holdings.
Author:
Ankur Teredesai is the Founder and CEO of CueZen and a tenured Professor at the University of Washington, with over 20 years of experience in Healthcare AI. He has authored 100+ publications and built AI systems deployed globally across neonatal care, chronic disease management, and palliative care.