Leveraging a Large Language Model for Precise Spectral Embeddings in Mass Spectrometry-Based Compound Identification

A large language model for deriving spectral embeddings for accurate compound identification in mass spectrometry

LLM4MS: Superior Performance in Mass Spectrometry Analysis

LLM4MS targets large-scale spectral matching for compound identification, using a million-scale in-silico EI-MS library developed by Yang et al. that contains over 2.1 million predicted spectra. It was evaluated on a test set derived from the NIST23 database and curated to ensure high representativeness and chemical diversity, with UMAP visualization confirming significant overlap. LLM4MS achieved a Recall@1 of 66.3%, outperforming established methods such as Spec2Vec (58.3%) and WCS (56.5%).
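To make the Recall@1 figures concrete, here is a minimal sketch of how a Recall@k score can be computed for embedding-based library matching. It assumes that each spectrum has already been turned into a fixed-length embedding and that matches are ranked by cosine similarity; the source does not state the similarity measure, and the function names and toy data here are hypothetical, not taken from the LLM4MS software.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def recall_at_k(query_embs, query_ids, lib_embs, lib_ids, k=1):
    """Fraction of queries whose true compound ID appears among the top-k library hits."""
    hits = 0
    for query, truth in zip(query_embs, query_ids):
        # Rank every library entry by similarity to the query (exhaustive search).
        ranked = sorted(range(len(lib_embs)),
                        key=lambda i: cosine(query, lib_embs[i]),
                        reverse=True)
        top_ids = {lib_ids[i] for i in ranked[:k]}
        if truth in top_ids:
            hits += 1
    return hits / len(query_embs)

# Toy data (hypothetical): three library compounds and three query spectra.
library = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
library_ids = ["A", "B", "C"]
queries = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.6, 0.0, 0.5]]
query_ids = ["A", "B", "C"]

r1 = recall_at_k(queries, query_ids, library, library_ids, k=1)
r2 = recall_at_k(queries, query_ids, library, library_ids, k=2)
```

In this toy case the third query ranks compound A ahead of its true compound C, so two of three queries are correct at rank 1 and all three are correct within the top 2. A Recall@1 of 66.3% reads the same way: the correct compound was the top-ranked hit for 66.3% of the test queries.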

Integrating Approximate Nearest Neighbor Search techniques substantially increased search speed, reaching 14,440 queries per second while maintaining robust accuracy. LLM4MS also generalized to unseen compounds, achieving a Recall@1 of 41.9%. An intuitive GUI makes compound identification from mass spectrometry data more efficient. The software is available on Zenodo, giving researchers rapid analysis with LLM-derived embeddings.
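The speed gain from approximate nearest neighbor search comes from comparing each query against a small candidate subset instead of the whole library. The source does not name the algorithm used, so the sketch below substitutes random-hyperplane locality-sensitive hashing purely as an illustration; every name and parameter in it is hypothetical.

```python
import math
import random

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def signature(vec, planes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(sum(v * p for v, p in zip(vec, plane)) >= 0.0 for plane in planes)

def build_index(lib_embs, n_planes=4, seed=0):
    """Hash every library embedding into a bucket keyed by its bit signature."""
    rng = random.Random(seed)
    dim = len(lib_embs[0])
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]
    buckets = {}
    for idx, emb in enumerate(lib_embs):
        buckets.setdefault(signature(emb, planes), []).append(idx)
    return planes, buckets

def search(query, lib_embs, planes, buckets):
    """Return the index of the most similar entry in the query's bucket, or None.

    Only entries sharing the query's signature are scored, which is why the
    result is approximate: the true nearest neighbor may sit in another bucket.
    """
    candidates = buckets.get(signature(query, planes), [])
    if not candidates:
        return None
    return max(candidates, key=lambda i: cosine(query, lib_embs[i]))

# Toy library (hypothetical): 50 random 8-dimensional embeddings.
data_rng = random.Random(1)
library = [[data_rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(50)]
planes, buckets = build_index(library)

# A query identical to a library entry hashes to that entry's bucket.
self_hits = [search(vec, library, planes, buckets) == i for i, vec in enumerate(library)]
```

Adding hyperplanes makes buckets smaller, so searches get faster but are more likely to miss the true nearest neighbor. That is the same speed-versus-accuracy trade-off behind the reported throughput figure.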
