LLM4MS: Superior Performance in Mass Spectrometry Analysis
LLM4MS excels at large-scale spectral matching for compound identification, using a million-scale in-silico EI-MS library developed by Yang et al. that contains over 2.1 million predicted spectra. The software was tested against a curated test set derived from the NIST23 database, chosen to be representative and chemically diverse. With UMAP visualization confirming significant overlap between the test set and the library, LLM4MS achieved a Recall@1 of 66.3%, outperforming established methods such as Spec2Vec (58.3%) and weighted cosine similarity (WCS, 56.5%).
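Recall@1 here is the fraction of query spectra whose correct compound is ranked first among the library candidates. The following is a minimal sketch of that metric, not the evaluation code used for LLM4MS; the function name `recall_at_k` and the compound IDs are hypothetical and chosen only for illustration.

```python
from typing import Hashable, Sequence


def recall_at_k(
    ranked_candidates: Sequence[Sequence[Hashable]],
    true_ids: Sequence[Hashable],
    k: int = 1,
) -> float:
    """Fraction of queries whose true compound appears in the top-k candidates.

    ranked_candidates[i] is the candidate list for query i, best match first.
    true_ids[i] is the correct compound ID for query i.
    """
    if len(ranked_candidates) != len(true_ids):
        raise ValueError("need one ranked list per query")
    if len(true_ids) == 0:
        raise ValueError("need at least one query")
    if k < 1:
        raise ValueError("k must be at least 1")

    hits = sum(
        1
        for ranked, truth in zip(ranked_candidates, true_ids)
        if truth in ranked[:k]
    )
    return hits / len(true_ids)


# Hypothetical toy example: three queries, two candidates each.
rankings = [["cmpd_a", "cmpd_b"], ["cmpd_c", "cmpd_d"], ["cmpd_e", "cmpd_f"]]
truths = ["cmpd_a", "cmpd_d", "cmpd_x"]
recall_1 = recall_at_k(rankings, truths, k=1)  # only the first query hits at rank 1
recall_2 = recall_at_k(rankings, truths, k=2)  # the second query hits at rank 2
```

Under this definition, a Recall@1 of 66.3% means the top-ranked library entry was the correct compound for roughly two out of three test spectra.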
Additionally, integrating LLM4MS with Approximate Nearest Neighbor Search techniques substantially increased search speed, reaching 14,440 queries per second while maintaining robust accuracy. LLM4MS also generalized well to unseen compounds, achieving a Recall@1 of 41.9%. Our intuitive GUI facilitates user engagement, making compound identification from mass spectrometry data more efficient. The software is available on Zenodo, supporting researchers in rapid analysis with LLM-derived embeddings.
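Embedding-based matching amounts to ranking library entries by their similarity to the query embedding. The sketch below shows the exact, exhaustive version of that search using cosine similarity, which is the baseline an Approximate Nearest Neighbor index would approximate at higher speed. It is not the LLM4MS implementation: the choice of cosine similarity, the function names, the toy two-dimensional vectors, and the compound IDs are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_matches(
    query: Sequence[float],
    library: Dict[str, Sequence[float]],
    k: int = 1,
) -> List[Tuple[str, float]]:
    """Score every library embedding against the query and return the k best.

    This is an exhaustive scan: cost grows linearly with library size,
    which is what an approximate index is meant to avoid.
    """
    scored = [
        (compound_id, cosine_similarity(query, embedding))
        for compound_id, embedding in library.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy library of 2-D embeddings; real embeddings are much larger.
toy_library = {
    "cmpd_a": [1.0, 0.0],
    "cmpd_b": [0.0, 1.0],
    "cmpd_c": [1.0, 1.0],
}
matches = top_k_matches([0.9, 0.1], toy_library, k=2)
best_ids = [compound_id for compound_id, _ in matches]
```

With a library of over 2.1 million spectra, an exhaustive scan like this compares each query against every entry, which is why an approximate index is attractive for high-throughput use.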
