Concerned about the potential misuse of smaller large language models (LLMs) in malware, the author built a self-replicating hacking LLM and presented the findings at BSidesSF. The bot relies on reinforcement learning and vector searches for guidance, reaching a hacking capability the author compares to that of a teenager. Unlike current AI worms, which depend on third-party services, a locally run model cannot be centrally shut down. The author argues that financial incentives make such advanced malware inevitable, pointing to past ransomware profits.
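The summary doesn't detail the retrieval step, but the general vector-search pattern is standard: embed a query describing the current situation, then pull the most similar reference passages into the model's context. Below is a minimal sketch of that pattern. It is not the author's implementation: `embed_text` is a toy stand-in for a real embedding model, and the corpus contents are hypothetical placeholders.

```python
import numpy as np

# Hypothetical stand-in for a real embedding model (e.g. a sentence
# transformer); tokens are hashed into a fixed-size bag-of-words vector
# so the sketch runs without external dependencies.
def embed_text(text: str, dim: int = 256) -> np.ndarray:
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def top_k_passages(query: str, passages: list[str], k: int = 3) -> list[str]:
    """Return the k passages whose embeddings are most similar to the query."""
    query_vec = embed_text(query)
    scores = [float(np.dot(query_vec, embed_text(p))) for p in passages]
    ranked = sorted(range(len(passages)), key=lambda i: scores[i], reverse=True)
    return [passages[i] for i in ranked[:k]]

# Retrieved passages would be prepended to the model's prompt as context.
corpus = [
    "Guide passage about interpreting port-scan output.",
    "Guide passage about parsing web server logs.",
    "Guide passage about summarizing configuration files.",
]
context = top_k_passages("open ports on the target host", corpus, k=2)
prompt = "\n".join(context) + "\n\nCurrent situation: ..."
```

In a real retrieval setup the corpus embeddings would be precomputed and stored in a vector index rather than re-embedded per query.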
The hacking process is driven by a supervisor, a Python script that directs the LLM through pre-exploit and post-exploit modes and invokes tools such as Nmap and TruffleHog for reconnaissance and credential gathering. The author acknowledges the model's limited ability but improves its performance by feeding it extensive hacking guides. Ultimately, they refrained from writing replication code out of safety concerns, stressing the urgent need for AI safety teams to treat these capabilities as threats, especially as AI accelerates the evolution of malware.