Application of large language models in clinical record correction : a comprehensive study on various retraining methods

Maitín, Ana María; Fernández-Rincón, Sergio; Nogales Moyano, Alberto; Cervera Barba, Emilio Juan; Denizon Arranz, Sophia; Mateos-Rodríguez, Alonso A.; García Tejedor, Álvaro José

doi:10.1093/jamia/ocae302

Identifiers

URI: https://hdl.handle.net/10641/7515

ISSN: 1067-5027

DOI: 10.1093/jamia/ocae302

Publication date

2025-02-01

Abstract

Objectives: We evaluate the effectiveness of large language models (LLMs), specifically GPT-based (GPT-3.5 and GPT-4) and Llama-2 models (13B and 7B architectures), in autonomously assessing clinical records (CRs) to enhance medical education and diagnostic skills.

Materials and Methods: Various techniques, including prompt engineering, fine-tuning (FT), and low-rank adaptation (LoRA), were implemented and compared on Llama-2 7B. These methods were assessed using prompts in both English and Spanish to determine their adaptability to different languages. Performance was benchmarked against GPT-3.5, GPT-4, and Llama-2 13B.

Results: GPT-based models, particularly GPT-4, demonstrated promising performance closely aligned with specialist evaluations. Application of FT on Llama-2 7B improved text comprehension in Spanish, equating its performance to that of Llama-2 13B with English prompts. Low-rank adaptation significantly enhanced performance, surpassing GPT-3.5 results when combined with FT. This indicates LoRA’s effectiveness in adapting open-source models for specific tasks.

Discussion: While GPT-4 showed superior performance, FT and LoRA on Llama-2 7B proved crucial in improving language comprehension and task-specific accuracy. Identified limitations highlight the need for further research.

Conclusion: This study underscores the potential of LLMs in medical education, providing an innovative, effective approach to CR correction. Low-rank adaptation emerged as the most effective technique, enabling open-source models to perform on par with proprietary models. Future research should focus on overcoming current limitations to further improve model performance.

Keywords

LLMs, artificial intelligence, clinical records, retraining, Health Informatics, Journal Article, Research Support, Non-U.S. Gov't

Citation

Maitin, A M, Nogales, A, Fernández-Rincón, S, Aranguren, E, Cervera-Barba, E, Denizon-Arranz, S, Mateos-Rodríguez, A & García-Tejedor, Á J 2025, 'Application of large language models in clinical record correction : a comprehensive study on various retraining methods', Journal of the American Medical Informatics Association : JAMIA, vol. 32, no. 2, pp. 341-348. https://doi.org/10.1093/jamia/ocae302

Collections

FACULTAD DE MEDICINA

Depósito Digital UFV
