An Open-Source Software Pipeline for Extracting Medical Information in Oncology Using Large Language Models

September 18, 2025

The information extraction (IE) protocol comprises four stages: problem definition and data preparation, data preprocessing, LLM-based IE, and output evaluation. It is designed for clinical researchers without NLP expertise and extracts information from medical texts supplied in several formats (PDF, CSV, TXT, Excel). The system is cost-effective and runs on low-resource hardware (e.g., a GPU with 48 GB VRAM). Users preprocess documents through a GUI, applying Optical Character Recognition (OCR) where needed. The protocol supports recent LLMs, including Llama, so extraction tasks can be run without complex programming. Once the extraction parameters are defined, the protocol writes its output in CSV format and generates confusion matrices for performance evaluation. It has two primary functions: information extraction and document anonymization. The pipeline is open source and can be set up either via Docker or by manual installation.
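The output evaluation stage can be illustrated with a minimal Python sketch. It is not the pipeline's own code: it assumes the extracted CSV and a manually annotated reference CSV share a document identifier column, and it counts the four confusion-matrix cells for one binary field. The column names `report_id` and `metastasis`, the `yes`/`no` labels, and the function name `confusion_counts` are hypothetical, chosen only for illustration.

```python
import csv
import io


def confusion_counts(extracted_csv, reference_csv, id_col, field, positive="yes"):
    """Count TP/FP/FN/TN for one binary field, matching rows by document id.

    Assumption for this sketch: a document missing from the extracted output
    is treated as a negative prediction.
    """
    def load(text):
        # Map document id -> normalized label for the chosen field.
        reader = csv.DictReader(io.StringIO(text))
        return {row[id_col]: row[field].strip().lower() for row in reader}

    extracted = load(extracted_csv)
    reference = load(reference_csv)

    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for doc_id, truth in reference.items():
        predicted_positive = extracted.get(doc_id, "") == positive
        actually_positive = truth == positive
        if predicted_positive and actually_positive:
            counts["tp"] += 1
        elif predicted_positive:
            counts["fp"] += 1
        elif actually_positive:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


# Hypothetical example data: LLM output versus manual annotation.
EXTRACTED = "report_id,metastasis\nr1,yes\nr2,no\nr3,yes\nr4,no\n"
REFERENCE = "report_id,metastasis\nr1,yes\nr2,no\nr3,no\nr4,yes\n"

if __name__ == "__main__":
    print(confusion_counts(EXTRACTED, REFERENCE, "report_id", "metastasis"))
```

Matching rows by identifier rather than by position keeps the comparison valid even if the two files list documents in a different order. Sensitivity, specificity, and similar metrics follow directly from the four counts.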
