[{"data":1,"prerenderedAt":763},["ShallowReactive",2],{"blog-blog_en-cbt-chatbots-research":3,"alternates-cbt-chatbots-research-en":746},{"id":4,"title":5,"author":6,"body":7,"category":724,"date":725,"description":726,"draft":727,"extension":728,"healthTopics":729,"image":733,"meta":734,"navigation":735,"path":736,"readingTime":737,"reviewedBy":733,"seo":738,"stem":739,"tags":740,"updatedDate":744,"__hash__":745},"blog_en\u002Fblog\u002Fcbt-chatbots-research.md","Five CBT Chatbots, Five Design Choices: How 2024–2025 Studies Map the Field","Nearby",{"type":8,"value":9,"toc":703},"minimark",[10,19,37,42,45,48,55,59,66,77,87,93,97,104,109,123,139,143,158,163,173,183,187,194,199,205,224,228,239,244,249,270,277,281,408,412,415,447,451,454,480,491,495,500,507,511,518,522,525,529,532,536,543,546,551,562,572,581,587,596,605,615,618,633,642,651,661,671,685,694],[11,12,13,14,18],"p",{},"Five specialized CBT chatbots have been clinically evaluated in 2024–2025, each pinned to a different technique: SuDoSys on the WHO PM+ protocol (Chen et al., 2024), a cognitive-restructuring system (Wang et al., 2025), Socrates 2.0 for cognitive reappraisal (Held et al., 2025), a behavioral-activation chatbot for young adults (Kuhlmeier et al., 2025), and a GPT-4 problem-solving therapy system (Mo et al., 2025). All five achieve high protocol fidelity. They differ sharply on the ",[15,16,17],"strong",{},"design choice they make about LLM directiveness"," — the same axis that determines whether the system stays safely inside CBT or drifts into directive advice. This article maps each system to its design choice and to the failure mode it exposes.",[11,20,21,22,27,28,32,33,36],{},"For the pooled effect-size evidence on AI chatbots in mental health (Hedges' g = 0.64 for depression, 2.4× advantage for generative models over scripted), see our ",[23,24,26],"a",{"href":25},"\u002Fblog\u002Fai-chatbot-therapy-meta-analysis","breakdown of the Li et al. 
2023 meta-analysis"," and the ",[23,29,31],{"href":30},"\u002Fblog\u002Frule-based-vs-llm-chatbot-depression","LLM-vs-scripted comparison by Du et al. 2025",". Here we focus on the ",[15,34,35],{},"system-by-system clinical evaluations"," that landed in 2024–2025.",[38,39,41],"h2",{"id":40},"why-cbt-is-the-technique-that-automates","Why CBT is the technique that automates",[11,43,44],{},"CBT decomposes into operationalized building blocks: problem assessment, psychoeducation, a defined set of techniques (cognitive restructuring, behavioral activation, exposure, behavioral experiments, Socratic dialogue), change monitoring, and relapse prevention. Each technique has a script: a hierarchy of avoided situations, a format for recording automatic thoughts, mood-rating scales.",[11,46,47],{},"This structure is what \"general ChatGPT\" lacks and what is critical for safe automation. The systematic review by Karki et al. (2025) shows that chatbots and LLMs offer empathy comparable to humans and round-the-clock availability, but require integration into stepped care to be safe.",[11,49,50,51,54],{},"So the 2024–2025 wave is not \"yet another generative companion.\" It is a hybrid: a structured CBT protocol with an LLM generating natural-language responses inside the protocol's rails. The interesting question is ",[15,52,53],{},"how each system implements the rails"," — and that is what divides the five.",[38,56,58],{"id":57},"system-1-sudosys-a-staged-architecture-on-a-who-protocol","System 1 — SuDoSys: a staged architecture on a WHO protocol",[11,60,61,62,65],{},"Chen et al. (2024) introduced ",[15,63,64],{},"SuDoSys",", an LLM chatbot that runs the conversation on the WHO Problem Management Plus (PM+) protocol — a brief 5-session intervention developed for settings with a shortage of specialists.",[11,67,68,71,72,76],{},[15,69,70],{},"Design choice:"," lowest-directiveness rail. 
The chatbot holds the current stage of the work (contracting → problem assessment → psychoeducation → regulation techniques → change planning → consolidation) and refuses to advance until the stage's exit criteria are met. The LLM generates natural responses ",[73,74,75],"em",{},"inside"," the stage; the protocol gates the transitions.",[11,78,79,82,83,86],{},[15,80,81],{},"What it solves:"," \"general ChatGPT\" loses therapeutic direction in emotionally charged moments — the breakdown documented qualitatively by Song et al. (2024) in ",[73,84,85],{},"Proceedings of the ACM on Human-Computer Interaction"," (Q1). A staged architecture makes the breakdown structurally impossible: the model cannot drift because it does not own the transitions.",[11,88,89,92],{},[15,90,91],{},"Why this matters for safety:"," SuDoSys delivers an already-validated protocol (PM+ has published RCT evidence for depression and anxiety in multiple countries), not an LLM-invented one. The chatbot is a delivery shell for an existing intervention. That is a fundamentally smaller validation surface than \"evaluating the AI's therapy\" from scratch.",[38,94,96],{"id":95},"system-2-a-cognitive-restructuring-chatbot-where-directiveness-leaks-in","System 2 — A cognitive restructuring chatbot: where directiveness leaks in",[11,98,99,100,103],{},"Wang et al. (2025) evaluated an LLM chatbot for ",[15,101,102],{},"cognitive restructuring"," — the central CBT technique in which the client learns to recognize and test automatic dysfunctional thoughts. Expert psychologists rated the system's clinical quality.",[11,105,106,108],{},[15,107,70],{}," higher directiveness budget than SuDoSys. 
The chatbot is allowed to generate prompts that probe specific cognitive distortions.",[11,110,111,114,115,118,119,122],{},[15,112,113],{},"The failure mode the study exposed:"," the model drifts from ",[15,116,117],{},"exploratory questions"," (\"what arguments are there for and against this thought?\") into ",[15,120,121],{},"directive advice"," (\"think about it like this instead\"). This violates one of CBT's foundational principles — the client's own discovery of alternative interpretations is the active ingredient, not the therapist's correct answer delivered from above.",[11,124,125,128,129,133,134,138],{},[15,126,127],{},"The lesson:"," the quality of a CBT chatbot is set not by the volume of the model's knowledge but by how skillfully the protocol throttles its directiveness in the right places. The same problem is addressed in the prompt-engineering framework by Boit & Patil (",[23,130,132],{"href":131},"\u002Fblog\u002Fprompt-engineering-mental-health-chatbot","breakdown of prompt engineering for mental-health chatbots",") and architecturally in ",[23,135,137],{"href":136},"\u002Fblog\u002Fmind-safe-framework-for-clinics","MIND-SAFE",".",[38,140,142],{"id":141},"system-3-socrates-20-the-hardest-technique-to-automate","System 3 — Socrates 2.0: the hardest technique to automate",[11,144,145,146,149,150,153,154,157],{},"Held et al. (2025) in ",[73,147,148],{},"JMIR Mental Health"," published a mixed-methods feasibility study of ",[15,151,152],{},"Socrates 2.0"," — an AI system for ",[15,155,156],{},"cognitive reappraisal"," through Socratic dialogue. Socratic dialogue is the technique in which the therapist, through a sequence of open questions, helps the client arrive at a more balanced interpretation on their own rather than receive the \"right answer\" from outside.",[11,159,160,162],{},[15,161,70],{}," an exploratory-stance rail explicitly engineered into the prompt. 
Asking clarifying questions, probing interpretations, holding focus on the session's goal — without giving the answer.",[11,164,165,168,169,172],{},[15,166,167],{},"What worked:"," contemporary LLMs ",[73,170,171],{},"can"," sustain a Socratic dialogue in a format close to a therapeutic one, and they hold goal-focus across a session.",[11,174,175,178,179,182],{},[15,176,177],{},"Where it broke:"," in complex cases of cognitive distortion the model drifted toward advice and lost its exploratory stance — the same failure mode Wang et al. (2025) flagged for cognitive restructuring. Two independent designs converging on the same boundary makes this ",[15,180,181],{},"not a Socrates-2.0-specific limit but a generic CBT-chatbot limit",": today's LLMs can deliver cognitive techniques at moderate complexity, but need an exploratory-stance guard rail to handle difficult cases.",[38,184,186],{"id":185},"system-4-a-behavioral-activation-chatbot-for-young-adults","System 4 — A behavioral-activation chatbot for young adults",[11,188,189,190,193],{},"Kuhlmeier et al. (2025) developed an LLM chatbot for ",[15,191,192],{},"behavioral activation"," (BA) in young adults with depression and evaluated it with artificial users (client simulators) and clinical experts. BA is one of the best-evidenced CBT techniques for depression: rather than working with thoughts, the client gradually increases the number of activities tied to values and pleasure, breaking the depressive cycle.",[11,195,196,198],{},[15,197,70],{}," strict protocol-fidelity rails. 
The chatbot runs the BA session structure, assigns appropriate homework, and monitors progress.",[11,200,201,204],{},[15,202,203],{},"What the evaluation confirmed:"," LLM chatbots can carry out a CBT protocol with high fidelity — they follow the session structure, give correct homework, and track progress through scales.",[11,206,207,210,211,214,215,218,219,223],{},[15,208,209],{},"The open frontier:"," ",[15,212,213],{},"robust clinical reasoning"," — responding to atypical client answers, recognizing hidden risks, dynamically adapting intensity. This is the boundary that recurs across the evaluations discussed here: protocol delivery is solved; clinical judgment is not. Adjacent designs like CaiTI (Nie et al., 2024, ",[73,216,217],{},"ACM Transactions on Computing for Healthcare",", Q1, 35 citations) — an LLM \"therapist\" delivered through everyday smart devices — push toward ",[23,220,222],{"href":221},"\u002Fblog\u002Fjust-in-time-interventions-ai-crisis","just-in-time CBT intervention at the right moment",", which raises the bar further.",[38,225,227],{"id":226},"system-5-a-gpt-4-problem-solving-therapy-chatbot","System 5 — A GPT-4 problem-solving therapy chatbot",[11,229,230,231,234,235,238],{},"Mo et al. (2025) in ",[73,232,233],{},"Frontiers in Digital Health"," introduced a ",[15,236,237],{},"PST chatbot built on GPT-4"," for self-help in young adults. 
Problem Solving Therapy (PST) is a brief CBT-derived approach: defining the problem → generating alternatives → evaluating and choosing → planning implementation → reviewing the result.",[11,240,241,243],{},[15,242,70],{}," the LLM owns more of the dialogue surface, because the protocol is so tightly stepwise that it constrains drift inherently.",[11,245,246],{},[15,247,248],{},"Why PST fits a chatbot uniquely well:",[250,251,252,260,267],"ul",{},[253,254,255,256,259],"li",{},"The protocol is ",[15,257,258],{},"strictly stepwise"," and easy to hold inside a dialogue — almost no room for the model to wander.",[253,261,262,263,266],{},"It operates on ",[15,264,265],{},"current life tasks",", not deep belief restructuring — which lowers the demand on the system's \"therapeutic intuition.\"",[253,268,269],{},"The chatbot helps structure the user's thinking without having to claim the role of a depth therapist.",[11,271,272,273,276],{},"This makes PST a useful upper-bound on what an LLM can own. When the protocol is ",[73,274,275],{},"that"," well-bounded, the chatbot is on safe ground; when it isn't (cognitive restructuring, Socratic reappraisal), the LLM needs an external rail.",[38,278,280],{"id":279},"five-systems-five-rails-side-by-side","Five systems = five rails. 
Side-by-side",[282,283,284,306],"table",{},[285,286,287],"thead",{},[288,289,290,294,297,300,303],"tr",{},[291,292,293],"th",{},"System",[291,295,296],{},"Technique",[291,298,299],{},"Design choice",[291,301,302],{},"What worked",[291,304,305],{},"Failure mode",[307,308,309,329,349,368,388],"tbody",{},[288,310,311,317,320,323,326],{},[312,313,314,316],"td",{},[15,315,64],{}," (Chen 2024)",[312,318,319],{},"WHO PM+ protocol",[312,321,322],{},"Stage gates control transitions; LLM only inside a stage",[312,324,325],{},"Cannot drift; delivers a pre-validated WHO intervention",[312,327,328],{},"Limited to PM+'s 5-session scope",[288,330,331,337,340,343,346],{},[312,332,333,336],{},[15,334,335],{},"Cognitive restructuring"," (Wang 2025)",[312,338,339],{},"Restructuring",[312,341,342],{},"Higher directiveness budget",[312,344,345],{},"Empathic validation, protocol-holding",[312,347,348],{},"Drifts into directive advice — violates \"client's own discovery\"",[288,350,351,356,359,362,365],{},[312,352,353,355],{},[15,354,152],{}," (Held 2025)",[312,357,358],{},"Cognitive reappraisal",[312,360,361],{},"Exploratory-stance rail in the prompt",[312,363,364],{},"Sustains Socratic dialogue at moderate complexity",[312,366,367],{},"Drifts to advice in complex cognitive distortions",[288,369,370,376,379,382,385],{},[312,371,372,375],{},[15,373,374],{},"BA chatbot"," (Kuhlmeier 2025)",[312,377,378],{},"Behavioral activation",[312,380,381],{},"Strict protocol-fidelity rails",[312,383,384],{},"High fidelity to BA session structure",[312,386,387],{},"Atypical client answers; risk recognition",[288,389,390,396,399,402,405],{},[312,391,392,395],{},[15,393,394],{},"PST chatbot"," (Mo 2025)",[312,397,398],{},"Problem-solving therapy",[312,400,401],{},"Inherent stepwise structure constrains drift",[312,403,404],{},"LLM safely owns more of the dialogue",[312,406,407],{},"Limited to current-task work, not depth",[38,409,411],{"id":410},"limits-common-to-all-five-systems","Limits common to all 
five systems",[11,413,414],{},"Across the five systems, the same risk zones surface:",[416,417,418,424,430,441],"ol",{},[253,419,420,423],{},[15,421,422],{},"Directiveness drift — the central CBT-chatbot failure mode."," Two of the five systems (Wang 2025; Held 2025) independently showed the model leaking into directive advice where CBT calls for collaborative inquiry. The rail design is the safety question.",[253,425,426,429],{},[15,427,428],{},"Uneven empathy across subgroups."," LLM empathy varies across patient groups (Gabriel et al., 2024). Without balanced corpora and guard rails, users from underrepresented groups can receive lower-quality responses than others.",[253,431,432,435,436,440],{},[15,433,434],{},"Crisis handling without dedicated guard rails creates harm."," Fewer than half of the systems in the Li et al. (2023) review reported any safety measures. General-purpose LLMs deployed without dedicated mechanisms ",[23,437,439],{"href":438},"\u002Fblog\u002Fai-guardrails-mental-health","create real harm"," (De Choudhury et al., 2023).",[253,442,443,446],{},[15,444,445],{},"Validation surface is small — five techniques, five systems."," The 2024–2025 evidence shows what works for cognitive restructuring, Socratic reappraisal, BA, PST, and a WHO protocol. 
It does not yet cover exposure therapy, behavioral experiments for OCD, or third-wave techniques (ACT, DBT skills).",[38,448,450],{"id":449},"what-the-five-systems-agree-on","What the five systems agree on",[11,452,453],{},"A coherent design specification for a clinically usable CBT chatbot reads:",[250,455,456,462,468,474],{},[253,457,458,461],{},[15,459,460],{},"Stage gates, not LLM-owned transitions."," Pick a SuDoSys-style staged architecture for protocols with a well-defined session structure (BA, PM+, PST).",[253,463,464,467],{},[15,465,466],{},"Exploratory-stance rail in the prompt."," For techniques where the LLM is allowed to generate dialogue inside a stage (restructuring, Socratic), the rail must throttle directiveness — and even then, it breaks on complex cognitive distortions, so escalate.",[253,469,470,473],{},[15,471,472],{},"A separate safety surface."," Crisis recognition and handoff cannot be a section of the prompt; it has to be an independent layer. EmoAgent (Qiu et al., 2025) and the MIND-SAFE framework demonstrate the architecture.",[253,475,476,479],{},[15,477,478],{},"Bounded scope."," Mild-to-moderate symptoms, not acute crisis or complex comorbidity. The protocol must surface the boundary explicitly to the user.",[11,481,482,483,487,488,138],{},"Nearby implements this specification: CBT protocols with a ",[23,484,486],{"href":485},"\u002Fblog\u002Fmulti-agent-ai-therapist-vs-chatbot","multi-agent architecture"," that separates technique delivery from safety, structured profiling that lowers the directiveness pressure, and explicit escalation to a clinician outside the protocol's scope. 
The interesting work in this space for the next 12 months is not \"more powerful base models.\" It is ",[15,489,490],{},"better rails",[38,492,494],{"id":493},"frequently-asked-questions","Frequently asked questions",[496,497,499],"h3",{"id":498},"what-design-choice-does-the-cbt-chatbot-evaluation-literature-point-to","What design choice does the CBT-chatbot evaluation literature point to?",[11,501,502,503,506],{},"A ",[15,504,505],{},"staged architecture"," in which the protocol owns transitions between phases and the LLM only generates inside a phase. SuDoSys (Chen et al., 2024) on the WHO PM+ protocol is the cleanest example: contracting → assessment → psychoeducation → regulation → planning → consolidation, with the model unable to advance until exit criteria are met. The Mo et al. (2025) PST chatbot reaches a similar safety profile because PST is so tightly stepwise that the structure constrains drift inherently.",[496,508,510],{"id":509},"why-does-directiveness-drift-matter-clinically","Why does directiveness drift matter clinically?",[11,512,513,514,517],{},"CBT relies on ",[15,515,516],{},"collaborative inquiry",": the client discovers alternative interpretations through guided questions, not by receiving the therapist's \"correct answer.\" Two of the five 2024–2025 evaluations (Wang 2025; Held 2025) showed the LLM leaking into directive advice in complex cases. 
This breaks therapeutic contact and reduces the client's sense of authorship over change — the active ingredient in cognitive techniques.",[496,519,521],{"id":520},"which-cbt-techniques-have-actually-been-clinically-evaluated-in-20242025","Which CBT techniques have actually been clinically evaluated in 2024–2025?",[11,523,524],{},"Five: structured dialogue on the WHO PM+ protocol (SuDoSys, Chen et al., 2024), cognitive restructuring (Wang et al., 2025), Socratic reappraisal (Socrates 2.0, Held et al., 2025), behavioral activation in young adults (Kuhlmeier et al., 2025), and problem-solving therapy on GPT-4 (Mo et al., 2025). Exposure therapy, OCD-specific behavioral experiments, and third-wave techniques (ACT, DBT skills) are not yet covered.",[496,526,528],{"id":527},"where-does-each-system-break-and-what-does-that-tell-us","Where does each system break, and what does that tell us?",[11,530,531],{},"SuDoSys breaks only on scope (it is bound to PM+). Wang's cognitive-restructuring chatbot and Socrates 2.0 break on the same failure mode — drifting into advice in complex cognitive distortions — making it a generic limit of today's LLMs, not a system-specific bug. Kuhlmeier's BA chatbot has the cleanest fidelity profile but exposes the boundary between protocol delivery and clinical judgment: delivery is solved; robust clinical reasoning is not. Mo's PST chatbot is the upper bound on what a model can safely own when the protocol is tightly stepwise.",[496,533,535],{"id":534},"is-a-cbt-chatbot-safe-without-an-explicit-safety-layer","Is a CBT chatbot safe without an explicit safety layer?",[11,537,538,539,542],{},"No. Fewer than half of the chatbots in the Li et al. (2023) review reported any safety mechanism. General-purpose LLMs deployed without dedicated guard rails ",[23,540,541],{"href":438},"create documented harms"," (De Choudhury et al., 2023). 
Crisis recognition and handoff must be an independent layer — not a section of the prompt.",[544,545],"hr",{},[11,547,548],{},[15,549,550],{},"References",[11,552,553,554,556,557],{},"Boit, S., & Patil, R. (2025). A prompt engineering framework for large language model–based mental health chatbots: Conceptual framework. ",[73,555,148],{},". ",[23,558,559],{"href":559,"rel":560},"https:\u002F\u002Fdoi.org\u002F10.2196\u002F75078",[561],"nofollow",[11,563,564,565,556,568],{},"Chen, Y., Zhang, X., Wang, J., Xie, X., Yan, N., Chen, H., & Wang, L. (2024). Structured dialogue system for mental health: An LLM chatbot leveraging the PM+ guidelines. ",[73,566,567],{},"ArXiv",[23,569,570],{"href":570,"rel":571},"https:\u002F\u002Fdoi.org\u002F10.48550\u002Farxiv.2411.10681",[561],[11,573,574,575,556,577],{},"De Choudhury, M., Pendse, S. R., & Kumar, N. (2023). Benefits and harms of large language models in digital mental health. ",[73,576,567],{},[23,578,579],{"href":579,"rel":580},"https:\u002F\u002Fdoi.org\u002F10.48550\u002Farxiv.2311.14693",[561],[11,582,583,584,138],{},"Du, Q., Ren, Y., Meng, Z., He, H., & Meng, S. (2025). The efficacy of rule-based versus large language model–based chatbots in alleviating symptoms of depression and anxiety: Systematic review and meta-analysis. ",[73,585,586],{},"Journal of Medical Internet Research",[11,588,589,590,556,592],{},"Gabriel, S., Puri, I., Xu, X., Malgaroli, M., & Ghassemi, M. (2024). Can AI relate: Testing large language model response for mental health support. ",[73,591,567],{},[23,593,594],{"href":594,"rel":595},"https:\u002F\u002Fdoi.org\u002F10.48550\u002Farxiv.2405.12021",[561],[11,597,598,599,556,601],{},"Held, P. et al. (2025). AI-facilitated cognitive reappraisal via Socrates 2.0: Mixed methods feasibility study. ",[73,600,148],{},[23,602,603],{"href":603,"rel":604},"https:\u002F\u002Fdoi.org\u002F10.2196\u002F80461",[561],[11,606,607,608,556,611],{},"Karki, A., Kamble, C., Chavan, R., & Chapke, N. (2025). 
Mental health meets machine learning: The rise of chatbots and LLMs in therapy. ",[73,609,610],{},"International Journal for Research Trends and Innovation",[23,612,613],{"href":613,"rel":614},"https:\u002F\u002Fdoi.org\u002F10.56975\u002Fijrti.v10i5.203281",[561],[11,616,617],{},"Kuhlmeier, F., Hanschmann, L., Rabe, M., Luettke, S., Brakemeier, E.-L., & Maedche, A. (2025). Designing an LLM-based behavioral activation chatbot for young people with depression: Insights from an evaluation with artificial users and clinical experts.",[11,619,620,621,624,625,628,629],{},"Li, H., Zhang, R., Lee, Y.-C., Kraut, R. E., & Mohr, D. C. (2023). Systematic review and meta-analysis of AI-based conversational agents for promoting mental health and well-being. ",[73,622,623],{},"NPJ Digital Medicine",", ",[73,626,627],{},"6","(1), 236. ",[23,630,631],{"href":631,"rel":632},"https:\u002F\u002Fdoi.org\u002F10.1038\u002Fs41746-023-00979-5",[561],[11,634,635,636,556,638],{},"Mo, F. et al. (2025). Self-help psychological intervention for young individuals: PST chatbot using GPT-4. ",[73,637,233],{},[23,639,640],{"href":640,"rel":641},"https:\u002F\u002Fdoi.org\u002F10.3389\u002Ffdgth.2025.1627268",[561],[11,643,644,645,556,647],{},"Nie, J., Shao, H., Fan, Y., Shao, Q., You, H., Preindl, M., & Jiang, X. (2024). LLM-based conversational AI therapist for daily functioning screening and psychotherapeutic intervention via everyday smart devices. ",[73,646,217],{},[23,648,649],{"href":649,"rel":650},"https:\u002F\u002Fdoi.org\u002F10.48550\u002Farxiv.2403.10779",[561],[11,652,653,654,556,657],{},"Obradovich, N. et al. (2024). Opportunities and risks of large language models in psychiatry. ",[73,655,656],{},"NPP Digital Psychiatry and Neuroscience",[23,658,659],{"href":659,"rel":660},"https:\u002F\u002Fdoi.org\u002F10.1038\u002Fs44277-024-00010-z",[561],[11,662,663,664,556,667],{},"Omar, M., Soffer, S., Charney, A. W., Landi, I., Nadkarni, G. N., & Klang, E. (2024). 
Applications of large language models in psychiatry: A systematic review. ",[73,665,666],{},"Frontiers in Psychiatry",[23,668,669],{"href":669,"rel":670},"https:\u002F\u002Fdoi.org\u002F10.3389\u002Ffpsyt.2024.1422807",[561],[11,672,673,674,624,677,680,681],{},"Sharma, A. et al. (2023). Human-centered evaluation of generative AI-based therapy chatbot. ",[73,675,676],{},"NEJM AI",[73,678,679],{},"1","(2). ",[23,682,683],{"href":683,"rel":684},"https:\u002F\u002Fdoi.org\u002F10.1056\u002FAIoa2300127",[561],[11,686,687,688,556,690],{},"Song, I., Pendse, S. R., Kumar, N., & De Choudhury, M. (2024). The typing cure: Experiences with large language model chatbots for mental health support. ",[73,689,85],{},[23,691,692],{"href":692,"rel":693},"https:\u002F\u002Fdoi.org\u002F10.1145\u002F3757430",[561],[11,695,696,697,556,699],{},"Wang, Y. et al. (2025). Evaluating an LLM-powered chatbot for cognitive restructuring: Insights from mental health professionals. ",[73,698,567],{},[23,700,701],{"href":701,"rel":702},"https:\u002F\u002Fdoi.org\u002F10.48550\u002Farxiv.2501.15599",[561],{"title":704,"searchDepth":705,"depth":705,"links":706},"",2,[707,708,709,710,711,712,713,714,715,716],{"id":40,"depth":705,"text":41},{"id":57,"depth":705,"text":58},{"id":95,"depth":705,"text":96},{"id":141,"depth":705,"text":142},{"id":185,"depth":705,"text":186},{"id":226,"depth":705,"text":227},{"id":279,"depth":705,"text":280},{"id":410,"depth":705,"text":411},{"id":449,"depth":705,"text":450},{"id":493,"depth":705,"text":494,"children":717},[718,720,721,722,723],{"id":498,"depth":719,"text":499},3,{"id":509,"depth":719,"text":510},{"id":520,"depth":719,"text":521},{"id":527,"depth":719,"text":528},{"id":534,"depth":719,"text":535},"ai-therapy","2026-05-09","Five clinically evaluated CBT chatbots in 2024–2025 — SuDoSys (WHO PM+), a cognitive-restructuring system, Socrates 2.0, a BA chatbot, and a GPT-4 PST chatbot — each implements a different rail against the central failure mode: 
directiveness drift.",false,"md",[730,731,732],"Mental health","Cognitive behavioral therapy","Digital mental health",null,{},true,"\u002Fblog\u002Fcbt-chatbots-research",12,{"title":5,"description":726},"blog\u002Fcbt-chatbots-research",[741,724,742,743],"AI mental health","CBT","chatbots","2026-05-19","W586EsVQKCojeinWfCWiwFd7dNZFimG6mmP3R1Kor4U",[747,751,755,759],{"locale":748,"label":749,"path":750},"ru","Русский","\u002Fru\u002Fblog\u002Fcbt-chatbots-research",{"locale":752,"label":753,"path":754},"kz","Қазақша","\u002Fkz\u002Fblog\u002Fcbt-chatbots-research",{"locale":756,"label":757,"path":758},"ky","Кыргызча","\u002Fky\u002Fblog\u002Fcbt-chatbots-research",{"locale":760,"label":761,"path":762},"by","Беларуская","\u002Fby\u002Fblog\u002Fcbt-chatbots-research",1780418369489]