The paper, presented at the ICLR Workshop on Memory for LLM-Based Agentic Systems, examines why Small Language Models (SLMs) struggle to retain world knowledge: their limited parameter count caps how many facts they can memorize. To reduce factual errors in SLM outputs, the authors ask which tokens an SLM should learn during pretraining and which it should delegate to external retrieval. They find that training loss alone is an insufficient signal for this decision, since some high-loss tokens are still valid continuations of the pretraining text. Augmenting the loss signal with a spaCy grammar parser refines the choice of which tokens are safe for the SLM to learn and which should be delegated. The resulting method, LaCy, balances token selection and improves FactScore when the SLM generates outputs in conjunction with larger models, outperforming alternatives such as Rho and LLM-judge approaches while remaining simpler and cheaper.
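The core idea, routing tokens based on a combination of loss and grammatical role, can be sketched as follows. This is a minimal illustration, not the paper's actual algorithm: the loss threshold, the rule that high-loss proper nouns and numbers are delegated, and the hand-supplied POS tags (standing in for real spaCy output) are all assumptions made for the example.

```python
# Hypothetical sketch of loss-plus-grammar token routing.
# In the real method, POS tags would come from a spaCy parse of the
# pretraining text; here they are supplied by hand for illustration.

LOSS_THRESHOLD = 2.0  # illustrative cutoff, not from the paper


def route_tokens(tokens):
    """Split tokens into learn/delegate sets.

    tokens: list of (text, loss, pos) triples, with spaCy-style
    coarse POS tags such as "PROPN" or "NOUN".
    """
    learn, delegate = [], []
    for text, loss, pos in tokens:
        # High-loss tokens that carry factual content (proper nouns,
        # numerals) are candidates for delegation to retrieval or a
        # larger model; everything else stays in the SLM's training
        # signal, even if its loss is high.
        if loss > LOSS_THRESHOLD and pos in {"PROPN", "NUM"}:
            delegate.append(text)
        else:
            learn.append(text)
    return learn, delegate


example = [
    ("The", 0.3, "DET"),
    ("capital", 1.1, "NOUN"),
    ("of", 0.2, "ADP"),
    ("Burkina", 4.5, "PROPN"),
    ("Faso", 5.0, "PROPN"),
    ("is", 0.4, "AUX"),
    ("Ouagadougou", 6.2, "PROPN"),
]
learn, delegate = route_tokens(example)
# delegate → ["Burkina", "Faso", "Ouagadougou"]; the function words
# and the common noun stay in the learn set.
```

The design point is that the grammar signal vetoes a naive loss-only rule: a high-loss function word or common noun is still learned, while only fact-bearing tokens are handed off.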
