Intrusion Detection in Database Management System Using Machine Learning

Mayeesha Mahzabin; Brajendra Panda

doi:10.65879/3070-5789.2025.01.10

Authors

Mayeesha Mahzabin, Dept. of Electrical Engineering and Computer Science, University of Arkansas, Fayetteville, AR 72701, USA
Brajendra Panda, Dept. of Electrical Engineering and Computer Science, University of Arkansas, Fayetteville, AR 72701, USA

Keywords:

Database Management System, Intrusion Detection System, Machine Learning

Abstract

As databases become increasingly central to modern information systems, protecting them from unauthorized access and malicious transactions has become a critical research priority. Traditional signature-based intrusion detection systems (IDS) are often ineffective at detecting novel or stealthy attacks because they rely on predefined patterns. To address this limitation, this study proposes an anomaly-based database intrusion detection framework that integrates PrefixSpan sequential pattern mining with adaptive binary feature engineering designed specifically for database transaction semantics. The novel contribution lies in the systematic integration of optimal pattern-mining parameters (support ratio = 0.05, pattern length [2–4]) with a one-class support vector machine (OCSVM) using a radial basis function (RBF) kernel transformation that effectively handles discrete binary feature spaces, addressing the fundamental challenge of learning solely from normal data in transactional contexts. The framework demonstrates robustness under realistic noise conditions (20% transaction-level corruption) and provides a comprehensive algorithm–feature-space compatibility analysis, revealing why kernel methods succeed while covariance-based approaches fail on sparse binary patterns. Experimental results show that OCSVM with the RBF kernel achieves a 98% F1-score and an area under the precision-recall curve (AUPRC) of 95.15%, outperforming Isolation Forest, Local Outlier Factor, Elliptic Envelope, and Probabilistic Neural Network by significant margins. These findings establish generalizable principles for sequential-pattern-based anomaly detection that extend beyond database security to any domain requiring discrete, sparse, high-dimensional feature representations.
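
The pattern-mining and binary-encoding steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the operation labels (`r:balance`, `w:log`, and so on), the toy transactions, and the helper names `prefixspan`, `contains`, and `encode` are all hypothetical, and the miner is a simplified PrefixSpan that treats each transaction as a sequence of single operations. The defaults mirror the parameters stated in the abstract (support ratio 0.05, pattern length 2 to 4); the toy run uses a higher ratio only because the sample is tiny.

```python
import math
from collections import defaultdict


def prefixspan(sequences, min_support_ratio=0.05, min_len=2, max_len=4):
    """Mine frequent subsequences by prefix-projected pattern growth.

    Returns a dict mapping each pattern (a tuple of items) to the number
    of sequences that contain it as a (not necessarily contiguous)
    subsequence.
    """
    min_count = max(1, math.ceil(min_support_ratio * len(sequences)))
    results = {}

    def grow(prefix, projected):
        # projected holds (sequence index, start offset) pairs: the suffix
        # of each sequence that follows the first match of the prefix.
        support = defaultdict(int)
        for sid, start in projected:
            for item in set(sequences[sid][start:]):
                support[item] += 1
        for item in sorted(support):
            if support[item] < min_count:
                continue
            pattern = prefix + (item,)
            if len(pattern) >= min_len:
                results[pattern] = support[item]
            if len(pattern) < max_len:
                next_projected = []
                for sid, start in projected:
                    try:
                        pos = sequences[sid].index(item, start)
                    except ValueError:
                        continue
                    next_projected.append((sid, pos + 1))
                grow(pattern, next_projected)

    grow((), [(sid, 0) for sid in range(len(sequences))])
    return results


def contains(sequence, pattern):
    """True if pattern occurs in sequence as an ordered subsequence."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def encode(sequence, patterns):
    """Binary feature vector: 1 where the transaction matches a pattern."""
    return [1 if contains(sequence, p) else 0 for p in patterns]


# Hypothetical normal transactions (read/write operations on attributes).
normal = [
    ["r:balance", "w:balance", "w:log"],
    ["r:balance", "w:balance", "w:log"],
    ["r:balance", "r:limit", "w:balance", "w:log"],
    ["r:profile", "w:profile"],
]

mined = prefixspan(normal, min_support_ratio=0.5)
patterns = sorted(mined)

normal_features = [encode(t, patterns) for t in normal]
# A transaction that skips the balance update matches only one pattern.
suspicious_features = encode(["r:balance", "w:log"], patterns)
print(patterns)
print(suspicious_features)  # [0, 0, 1, 0]
```

Vectors such as `normal_features` could then be used to fit a one-class model on normal transactions only, for example scikit-learn's `OneClassSVM(kernel="rbf")`, with transactions like the last one scored against the learned boundary. That step is omitted here to keep the sketch dependency-free.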
