Application of natural language processing techniques to network traffic processing for classification using deep learning models

Abstract

Background: The rapid growth of encrypted network traffic has increased the need for effective and unbiased Network Traffic Classification (NTC). Traditional techniques struggle with encrypted data, limited feature availability, and high traffic volume, which reduces their reliability in real-world scenarios. Methods: We propose a novel pre-processing methodology that converts raw network traffic into a textual format (nt2txt), enabling the application of Natural Language Processing (NLP) and Deep Learning techniques. This approach eliminates bias from protocol metadata, structures the data into fixed-size semi-flows, and uses rigorous data splitting to prevent flow overlap between training and testing. An LSTM-based model is then trained to classify traffic using only payload data. Results: This work provides a scalable, protocol-agnostic framework for encrypted traffic classification, demonstrating the effectiveness of NLP techniques in improving model performance and reducing dataset bias. Our methodology achieved 88.87 ± 0.04% accuracy on a blind external dataset, outperforming similar LSTM and hybrid CNN-LSTM models. Metrics such as Cohen’s Kappa and the Matthews Correlation Coefficient further confirm the robustness and generalizability of our approach.
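
The pre-processing steps named in the Methods can be illustrated with a minimal Python sketch. It is not the authors' nt2txt implementation: the abstract does not specify the textual format, so rendering each payload byte as a two-character hex token is an assumption, as is treating a semi-flow as a fixed-size chunk of those tokens. The function names (`payload_to_tokens`, `to_semi_flows`, `split_by_flow`), the padding token, and the toy flows are all hypothetical.

```python
import random


def payload_to_tokens(payload: bytes) -> list[str]:
    """Render each payload byte as a two-character hex 'word' (assumed format)."""
    return [f"{b:02x}" for b in payload]


def to_semi_flows(tokens: list[str], size: int, pad: str = "<pad>") -> list[list[str]]:
    """Cut a token sequence into fixed-size chunks, padding the last chunk."""
    chunks = []
    for start in range(0, len(tokens), size):
        chunk = tokens[start:start + size]
        chunk = chunk + [pad] * (size - len(chunk))
        chunks.append(chunk)
    return chunks


def split_by_flow(samples, test_fraction=0.2, seed=0):
    """Split (flow_id, label, chunk) samples so that no flow_id lands in both sets."""
    flow_ids = sorted({flow_id for flow_id, _, _ in samples})
    rng = random.Random(seed)
    rng.shuffle(flow_ids)
    n_test = max(1, int(len(flow_ids) * test_fraction))
    test_ids = set(flow_ids[:n_test])
    train = [s for s in samples if s[0] not in test_ids]
    test = [s for s in samples if s[0] in test_ids]
    return train, test


# Toy payloads standing in for captured traffic (hypothetical data).
flows = {
    "flow-a": ("web", bytes(range(10))),
    "flow-b": ("video", bytes(range(7))),
    "flow-c": ("web", bytes(range(5))),
    "flow-d": ("video", bytes(range(12))),
}

samples = []
for flow_id, (label, payload) in flows.items():
    for chunk in to_semi_flows(payload_to_tokens(payload), size=4):
        samples.append((flow_id, label, chunk))

train, test = split_by_flow(samples, test_fraction=0.25, seed=1)
```

Splitting on flow identifiers rather than on individual chunks is what keeps pieces of one flow from appearing on both sides of the split; the resulting token sequences could then be fed to a sequence model such as an LSTM.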

Description

Publisher Copyright: © The Author(s) 2025.

Citation

Maitin, A M, Arranz-Luque, C, Alba, E & García-Tejedor, Á J 2025, 'Application of natural language processing techniques to network traffic processing for classification using deep learning models', Journal of Big Data, vol. 12, no. 1, 277. https://doi.org/10.1186/s40537-025-01183-w