A Linguistic-based Method for SQL Injection Attack Detection and Defense
DOI:
https://doi.org/10.65879/3070-5789.2025.01.05
Keywords:
Language processor, machine learning, sanitization, structured query language, SQL injection attacks, tokenization, vectorization
Abstract
As artificial intelligence (AI) has grown in prominence, companies are finding more ways to incorporate it into existing technological infrastructure to optimize performance. Structured Query Language (SQL) is widely used to store and retrieve data, and has recently been combined with AI search algorithms to speed up these processes. However, this widespread use also makes SQL-backed systems a target for attacks in which malicious code is injected to access, modify, or even delete restricted data, known as SQL injection attacks (SQLIA). This paper presents a detection approach based on language processors that shifts the focus of SQLIA defenses from static pattern matching to recognizing the underlying linguistic and structural features of the attacks. Unlike traditional sanitization, the proposed defense combines tokenization and vectorization with a language processor to detect and flag malicious patterns. On a large public dataset, the proposed defense achieved a precision of 0.789, a recall of 0.97, and an F1 score of 0.87, which demonstrates the effectiveness of the method.
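The abstract does not specify the tokenizer, the vectorizer, or the language processor used in the paper, so the following is only a minimal sketch of what tokenization and vectorization of an input string could look like. The token pattern, the bag-of-tokens representation, and the function names (tokenize, build_vocabulary, vectorize, f1_score) are all assumptions made for illustration and are not taken from the paper. The last function checks that the reported F1 score is consistent with the reported precision and recall, using the standard harmonic-mean definition.

```python
import re
from collections import Counter

# Assumed token pattern (not from the paper): SQL comment markers,
# single-quoted strings, numbers, identifiers/keywords, and any other
# single non-space symbol.
TOKEN_RE = re.compile(r"--|/\*|\*/|'[^']*'|\d+|[A-Za-z_]\w*|[^\sA-Za-z0-9_]")


def tokenize(query):
    """Split a raw input string into lowercase lexical tokens."""
    return [tok.lower() for tok in TOKEN_RE.findall(query)]


def build_vocabulary(queries):
    """Assign each distinct token an index in order of first appearance."""
    vocab = {}
    for query in queries:
        for tok in tokenize(query):
            vocab.setdefault(tok, len(vocab))
    return vocab


def vectorize(query, vocab):
    """Turn a query into a count vector over the vocabulary.

    Tokens that are not in the vocabulary are ignored.
    """
    vec = [0] * len(vocab)
    for tok, count in Counter(tokenize(query)).items():
        if tok in vocab:
            vec[vocab[tok]] = count
    return vec


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    samples = [
        "SELECT name FROM users WHERE id = 5",  # benign-looking query
        "' OR 1=1 --",                          # classic injection payload
    ]
    vocab = build_vocabulary(samples)
    print(tokenize(samples[1]))
    print(vectorize(samples[1], vocab))
    # Consistency check on the scores reported in the abstract.
    print(round(f1_score(0.789, 0.97), 2))
```

In a full detector, vectors like these would be passed to a trained classifier that labels each input as benign or malicious; that component is not sketched here because the abstract gives no detail about it.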
License
Copyright (c) 2025 Journal of Cybersecurity, Digital Forensics and Jurisprudence

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.