This study, approved by the Shahid Beheshti University IRB, compares conventional machine learning (CML) methods with large language models (LLMs) for predicting COVID-19 mortality from a high-dimensional dataset of 9,134 patients. The study followed strict ethical guidelines: patient confidentiality was maintained and informed consent was obtained. Seven CML algorithms, including logistic regression and random forests, were compared with eight LLMs on patient data such as demographics and clinical history. Data preprocessing included feature selection via Lasso and correction of class imbalance with the synthetic minority oversampling technique (SMOTE). Among the LLMs, the fine-tuned Mistral-7b model showed particularly robust predictive power. Model performance was evaluated with multiple metrics, including accuracy and F1 score, and SHAP analysis was used for interpretability. The results highlight the potential of both CMLs and LLMs to support accurate outcome prediction for COVID-19 patients in clinical settings.
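The abstract names SMOTE for class imbalance and the F1 score for evaluation but gives no implementation details. The following is a minimal, dependency-free sketch of both ideas, not the study's code: the function names `smote` and `f1_score`, the parameter `k`, and the toy two-feature rows are all hypothetical and chosen only for illustration. In practice a maintained implementation, such as the `SMOTE` class in the imbalanced-learn package, would likely be preferable.

```python
import random

def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority rows by interpolating between a
    randomly chosen minority row and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority rows by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

def f1_score(y_true, y_pred, positive=1):
    """F1 = 2*TP / (2*TP + FP + FN) for the chosen positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

# Hypothetical toy rows (two made-up features each), NOT data from the study.
deceased = [[0.9, 0.8], [1.0, 0.7], [0.8, 1.0]]            # minority class
survived = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.1, 0.4],
            [0.2, 0.2], [0.4, 0.1], [0.3, 0.2]]            # majority class

# Oversample the minority class until both classes have the same size.
new_rows = smote(deceased, n_new=len(survived) - len(deceased), k=2)
balanced_minority = deceased + new_rows
```

Because each synthetic row lies on a line segment between two real minority rows, it stays inside the region the minority class already occupies. Oversampling of this kind is normally applied to the training split only, so that evaluation metrics such as F1 are computed on untouched data.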
