Large language models now shape how people gather information, make decisions, and interact with social robots. However, these models often produce fluent but inaccurate responses—termed “confabulations”—which can undermine trust and pose safety risks in embodied agents. To address this, the authors propose a lightweight, five-step Cognitive-Behavioural Therapy (CBT) loop embedded in or layered on top of system prompts. The loop prompts the model to surface its automatic thoughts, critique them, and revise its response with appropriately expressed uncertainty. The authors argue that such a structured self-check grows more important as model internals become more opaque. By advocating the adoption of therapy loops across chatbots, APIs, and social robots, they aim to improve reliability and safety while keeping latency and cost overheads minimal.
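To make the idea concrete, here is a minimal sketch of how such a therapy loop might be wrapped around an existing system prompt. The exact step wording is an assumption: the summary only names identifying automatic thoughts, critiquing them, and adjusting with uncertainty, so the five steps below are an illustrative expansion, not the authors' published prompt.

```python
# Hypothetical sketch of the five-step CBT "therapy loop" as a
# system-prompt wrapper. Step wording is assumed for illustration;
# the paper specifies only identify -> critique -> adjust with uncertainty.

CBT_LOOP_STEPS = [
    "1. Draft an initial answer (the 'automatic thought').",
    "2. List the factual claims that answer depends on.",
    "3. Challenge each claim: could it be a confabulation?",
    "4. Revise the answer, dropping or hedging unsupported claims.",
    "5. State your remaining uncertainty explicitly.",
]


def build_system_prompt(base_instructions: str) -> str:
    """Prepend a CBT-style self-check to an existing system prompt."""
    loop = "\n".join(CBT_LOOP_STEPS)
    return (
        f"{base_instructions}\n\n"
        "Before replying, silently run this self-check:\n"
        f"{loop}\n"
        "Only output the revised, uncertainty-annotated answer."
    )


prompt = build_system_prompt("You are the assistant for a social robot.")
print(prompt)
```

Because the loop lives entirely in the prompt, it works unchanged across chatbots, API calls, and embodied agents, which matches the cross-platform deployment the authors advocate.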
Enhancing Trustworthy AI: A Novel Therapy-Loop Prompt Framework
