Ukrainian researchers have unveiled Lapa LLM v0.1.2, a large language model built on Gemma-3-12B and optimized specifically for Ukrainian. The model ships with a reworked tokenizer in which 80,000 of its 250,000 tokens were replaced with Ukrainian equivalents, so Ukrainian text is encoded into fewer tokens and processing cost drops by roughly 1.5x. Lapa LLM is presented as the fastest model for processing Ukrainian, performing nearly on par with the market leader MamayLM on instruction following, translation, summarization, and question answering.

On English-to-Ukrainian translation it scores 33 BLEU on the FLORES benchmark, the best result among open models for this language pair. Beyond text, Lapa LLM also handles image inputs and is effective at identifying propaganda and disinformation, which the team attributes to its advanced data-filtering methods.

The model is fully open for commercial use: the team has published more than 25 training datasets, the model source code, and comprehensive documentation. Future plans include strengthening reasoning capabilities and expanding the datasets. Lapa LLM is available on Hugging Face and GitHub.
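For readers who want to try the model, below is a minimal sketch of how it could be loaded through the Hugging Face transformers library and how the tokenizer savings could be checked on a Ukrainian sentence. The Lapa repository ID, the sample text, and the choice of Auto class are assumptions for illustration only; consult the project's Hugging Face page for the actual model names and recommended usage.

```python
# Minimal sketch: load Lapa LLM via Hugging Face transformers and compare
# how many tokens the base Gemma tokenizer vs. the Lapa tokenizer needs
# for the same Ukrainian sentence (the source of the claimed ~1.5x speedup).
# NOTE: LAPA_REPO is a hypothetical placeholder, not the confirmed repo name.

from transformers import AutoModelForCausalLM, AutoTokenizer

LAPA_REPO = "lapa-llm/lapa-v0.1.2"   # hypothetical repo id for illustration
BASE_REPO = "google/gemma-3-12b-it"  # base model Lapa builds on

sample = "Штучний інтелект допомагає обробляти українські тексти швидше."

# Fewer tokens per Ukrainian sentence means less compute per request.
base_tok = AutoTokenizer.from_pretrained(BASE_REPO)
lapa_tok = AutoTokenizer.from_pretrained(LAPA_REPO)
print("Base Gemma tokens:", len(base_tok(sample)["input_ids"]))
print("Lapa tokens:      ", len(lapa_tok(sample)["input_ids"]))

# Standard causal-LM generation; assumes the checkpoint loads as a text-only
# causal LM (the multimodal variant may require a different Auto class).
# A 12B model needs a GPU with sufficient memory; device_map="auto" offloads
# layers if necessary.
model = AutoModelForCausalLM.from_pretrained(LAPA_REPO, device_map="auto")
prompt = "Переклади англійською: добрий день!"
inputs = lapa_tok(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(lapa_tok.decode(outputs[0], skip_special_tokens=True))
```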
