Application of natural language processing techniques to network traffic processing for classification using deep learning models
Abstract
Background: The rapid growth of encrypted network traffic has increased the need for effective and unbiased Network Traffic Classification (NTC). Traditional techniques struggle with encrypted data, limited feature availability, and high traffic volumes, which reduces their reliability in real-world scenarios.

Methods: We propose a novel pre-processing methodology that converts raw network traffic into a textual format (nt2txt), enabling the application of Natural Language Processing (NLP) and Deep Learning techniques. This approach eliminates bias introduced by protocol metadata, structures the data into fixed-size semi-flows, and applies rigorous data splitting to prevent flow overlap between the training and testing sets. An LSTM-based model is then trained to classify traffic using only payload data.

Results: This work provides a scalable, protocol-agnostic framework for encrypted traffic classification and demonstrates the effectiveness of NLP techniques in improving model performance and reducing dataset bias. Our methodology achieved 88.87 ± 0.04% accuracy on a blind external dataset, outperforming comparable LSTM and hybrid CNN-LSTM models. Cohen's Kappa and the Matthews Correlation Coefficient further confirm the robustness and generalizability of our approach.
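The abstract does not specify the exact textual encoding produced by nt2txt or the length of a semi-flow, so the following is only a minimal Python sketch of the two pre-processing ideas it names: payload-only text samples cut into fixed-size semi-flows, and a train/test split that never lets one flow appear on both sides. It assumes each payload byte becomes a hex "word" and that a semi-flow is a fixed number of consecutive packets; the helper names `payload_to_text`, `semi_flows`, and `flow_disjoint_split` are hypothetical and do not come from the original work.

```python
import random

# Hypothetical helpers illustrating ideas from the abstract; the real nt2txt
# encoding and semi-flow length are not specified there.


def payload_to_text(payload: bytes) -> str:
    """Render one packet payload as space-separated hex 'words' (assumed encoding)."""
    return " ".join(f"{b:02x}" for b in payload)


def semi_flows(payloads: list[bytes], size: int) -> list[list[str]]:
    """Cut one flow's payloads into consecutive windows of exactly `size` packets.

    Only payload bytes are used (no IP, port, or protocol header fields).
    The incomplete tail window is dropped, which is an assumption; padding
    would be an alternative.
    """
    texts = [payload_to_text(p) for p in payloads]
    return [texts[i:i + size] for i in range(0, len(texts) - size + 1, size)]


def flow_disjoint_split(
    flows: dict[str, tuple[str, list[bytes]]],
    size: int,
    test_fraction: float = 0.2,
    seed: int = 0,
) -> tuple[list[tuple[str, str, list[str]]], list[tuple[str, str, list[str]]]]:
    """Assign whole flows to train or test, then window each side into semi-flows.

    `flows` maps a flow id to (label, payloads). Splitting by flow id before
    windowing guarantees that no flow contributes semi-flows to both sets.
    """
    flow_ids = sorted(flows)
    random.Random(seed).shuffle(flow_ids)
    n_test = max(1, round(len(flow_ids) * test_fraction))
    test_ids = set(flow_ids[:n_test])

    train, test = [], []
    for fid in flow_ids:
        label, payloads = flows[fid]
        target = test if fid in test_ids else train
        for window in semi_flows(payloads, size):
            target.append((fid, label, window))
    return train, test
```

The split is performed on flow identifiers before windowing because windowing first and then shuffling semi-flows would let neighbouring windows of the same flow land in both sets, which inflates test accuracy. In a full pipeline, each window's text would then be tokenized and passed to the LSTM classifier, a step not shown here.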


