Mikhail Belkin, a professor at the Halicioglu Data Science Institute, co-authored research on steering techniques for large language models (LLMs) such as Llama and DeepSeek. The approach let the team steer 512 concepts across five categories, including fears, moods, and locations, improving model performance in languages such as English, Chinese, and Hindi. LLMs, long regarded as ‘black boxes,’ become more transparent under this technique, revealing how they arrive at answers of varying accuracy.
The steering method not only improves outputs on specific tasks, such as translating code, but also exposes vulnerabilities and enables the detection of hallucinations in language models. At the same time, it carries a risk of misuse: the same technique could be used to ‘jailbreak’ LLMs into producing harmful outputs, such as instructions for drug use or biased conspiracy theories. These findings highlight both the capabilities and the dangers of advanced AI systems, underscoring the need for responsible development and monitoring.
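To make the idea of “steering” concrete, the sketch below illustrates a common activation-steering pattern that the described approach resembles: a concept direction is estimated from contrasting prompts and then added to a model’s hidden states during generation. This is a generic illustration, not the authors’ exact method; the small “gpt2” model, the chosen layer, the steering strength, and the toy prompts are all assumptions standing in for the Llama- and DeepSeek-scale models studied in the research.

```python
# Minimal activation-steering sketch (illustrative, not the paper's implementation).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"   # small stand-in model (assumption)
LAYER_IDX = 6         # transformer block to steer (assumption)
ALPHA = 4.0           # steering strength (assumption)

tok = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def mean_hidden(prompts, layer):
    """Average last-token hidden state at a given layer over a set of prompts."""
    vecs = []
    for p in prompts:
        ids = tok(p, return_tensors="pt")
        with torch.no_grad():
            out = model(**ids, output_hidden_states=True)
        vecs.append(out.hidden_states[layer][0, -1])
    return torch.stack(vecs).mean(dim=0)

# Contrastive prompts define the concept direction (toy "fear" example).
concept_prompts = ["I feel afraid.", "This is terrifying."]
neutral_prompts = ["I feel fine.", "This is ordinary."]
steer_vec = mean_hidden(concept_prompts, LAYER_IDX) - mean_hidden(neutral_prompts, LAYER_IDX)

def steering_hook(module, inputs, output):
    # GPT-2 blocks return a tuple; output[0] holds the hidden states.
    hidden = output[0] + ALPHA * steer_vec
    return (hidden,) + output[1:]

# Attach the hook, generate with the steered activations, then clean up.
handle = model.transformer.h[LAYER_IDX].register_forward_hook(steering_hook)
ids = tok("Tell me about your day.", return_tensors="pt")
with torch.no_grad():
    out_ids = model.generate(**ids, max_new_tokens=30, do_sample=False)
handle.remove()
print(tok.decode(out_ids[0], skip_special_tokens=True))
```

The same mechanism cuts both ways, which is the article’s point: a direction that nudges a model toward a harmless mood could, in principle, nudge it toward unsafe behavior, which is why the researchers flag both the interpretability gains and the jailbreaking risk.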