Application of large language models in clinical record correction: a comprehensive study on various retraining methods

Abstract

Objectives: We evaluate the effectiveness of large language models (LLMs), specifically GPT-based (GPT-3.5 and GPT-4) and Llama-2 models (13B and 7B architectures), in autonomously assessing clinical records (CRs) to enhance medical education and diagnostic skills. Materials and Methods: Various techniques, including prompt engineering, fine-tuning (FT), and low-rank adaptation (LoRA), were implemented and compared on Llama-2 7B. These methods were assessed using prompts in both English and Spanish to determine their adaptability to different languages. Performance was benchmarked against GPT-3.5, GPT-4, and Llama-2 13B. Results: GPT-based models, particularly GPT-4, demonstrated promising performance closely aligned with specialist evaluations. Applying FT to Llama-2 7B improved text comprehension in Spanish, bringing its performance level with that of Llama-2 13B with English prompts. LoRA significantly enhanced performance, surpassing GPT-3.5 results when combined with FT. This indicates LoRA’s effectiveness in adapting open-source models for specific tasks. Discussion: While GPT-4 showed superior performance, FT and LoRA on Llama-2 7B proved crucial in improving language comprehension and task-specific accuracy. Identified limitations highlight the need for further research. Conclusion: This study underscores the potential of LLMs in medical education, providing an innovative, effective approach to CR correction. LoRA emerged as the most effective technique, enabling open-source models to perform on par with proprietary models. Future research should focus on overcoming current limitations to further improve model performance.
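
For readers unfamiliar with low-rank adaptation, the following is a minimal sketch of the general LoRA idea, not the configuration used in the study. It assumes a single frozen weight matrix `W` adapted by two small trainable matrices `A` and `B`, so that the layer output becomes `W x + (alpha / r) * B (A x)`. The dimensions, the rank `r`, the scaling factor `alpha`, and the helper names `matvec`, `lora_forward`, and `trainable_params` are all illustrative assumptions; the abstract does not state the hyperparameters the authors used.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W, A, B, x, alpha, r):
    """Frozen base projection plus a scaled low-rank update.

    W: frozen weights, shape (d_out, d_in)
    A: trainable down-projection, shape (r, d_in)
    B: trainable up-projection, shape (d_out, r)
    """
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


def trainable_params(d_out, d_in, r):
    """Compare trainable parameter counts: full fine-tuning vs. LoRA."""
    return {"full": d_out * d_in, "lora": r * (d_in + d_out)}


# Hypothetical toy sizes, chosen only to keep the example small.
random.seed(0)
d_out, d_in, r, alpha = 4, 6, 2, 4
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]
B = [[0.0] * r for _ in range(d_out)]  # B starts at zero in standard LoRA
x = [1.0] * d_in

# With B = 0 the adapted layer reproduces the frozen layer exactly.
y0 = lora_forward(W, A, B, x, alpha, r)

# Illustrative count for one 4096 x 4096 projection at an assumed rank of 8.
counts = trainable_params(4096, 4096, 8)
```

Because `B` is initialized to zero, training starts from the unmodified base model, and only `A` and `B` are updated. For the illustrative 4096 x 4096 projection at rank 8, that is 65,536 trainable values instead of 16,777,216, which is why the technique is attractive for adapting smaller open-source models to a specific task.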

Description

Publisher Copyright: © The Author(s) 2024. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved.

Citation

Maitin, A M, Nogales, A, Fernández-Rincón, S, Aranguren, E, Cervera-Barba, E, Denizon-Arranz, S, Mateos-Rodríguez, A & García-Tejedor, Á J 2025, 'Application of large language models in clinical record correction: a comprehensive study on various retraining methods', Journal of the American Medical Informatics Association: JAMIA, vol. 32, no. 2, pp. 341-348. https://doi.org/10.1093/jamia/ocae302