Harnessing Big Data Analytics for Advanced Detection of Deepfakes and Cybersecurity Threats Across Industries

Rasheed Afolabi¹, Rianat Abbas²*, Rajesh Vayyala³, Dorcas Folasade Oyebode⁴, Victoria Abosede Ogunsanya⁵, & Adetomiwa Adesokan⁶
¹Department of Information Systems, Baylor University, USA
²Department of Information Systems, Baylor University, USA
³Data Architecture and Design, PRA Group Inc, USA
⁴College of Business, Purdue University Northwest, USA
⁵Department of Computer Science, University of Bradford, USA
⁶Department of Economics, University of Nevada, Reno, USA
DOI – http://doi.org/10.37502/IJSMR.2025.8208

Abstract

The rise of deepfake technology has added a new layer of complexity to cybersecurity, creating opportunities for misuse in areas such as misinformation, fraud, and identity theft. These challenges are amplified by the speed at which deepfakes and other cyber threats evolve, often outpacing traditional detection methods. This study examines how big data analytics can be used to combat these threats, applying machine learning models such as gradient boosting to detect malicious patterns in large-scale datasets. Key findings show that features such as packet length and flow timing are critical for distinguishing web-based attacks from botnet activity. The model achieves an AUC-ROC score of 0.97, indicating that it identifies and classifies threats effectively.

However, the work also highlights open challenges, including the need for greater computational efficiency, more diverse datasets, and adaptability to rapidly changing attack methods. Despite these hurdles, integrating big data analytics into cybersecurity frameworks shows considerable promise for scalable, real-time solutions across industries. Going forward, collaboration across fields and a focus on ethical data practices will be vital to ensuring that these technologies are both effective and trustworthy against emerging cyber risks.

Keywords: Deepfakes, cybersecurity, big data analytics, machine learning, anomaly detection, cyber threats, data privacy, and advanced detection techniques.
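
Illustrative note on the AUC-ROC metric cited in the abstract: the score can be read as the probability that a randomly chosen malicious sample receives a higher model score than a randomly chosen benign one. The sketch below computes it directly from that definition. It is not the authors' evaluation code; the function name `auc_roc` and the labels and scores are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def auc_roc(labels, scores):
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for malicious, 0 for benign (hypothetical encoding).
    scores: model outputs, where higher means "more likely malicious".
    Tied pairs count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Hypothetical toy data: three malicious and four benign flows.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]

# 11 of the 12 (malicious, benign) pairs are ranked correctly.
example_auc = auc_roc(labels, scores)
print(f"AUC-ROC = {example_auc:.3f}")
```

Under this reading, a score of 0.97 means that about 97% of such malicious/benign pairs are ordered correctly, while 0.5 corresponds to random ranking.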
