A Hybrid Deep Learning Framework for Robust Deepfake Detection Using CNN, LSTM, and Vision Transformers

Ashima Gajendra Singh, Pooja Sharma

Abstract

Deepfake technology, powered by advanced deep learning models such as Generative Adversarial Networks (GANs), has enabled the creation of highly realistic synthetic media, including images, audio, and videos. While offering transformative applications in entertainment, education, and communication, deepfakes also present serious threats such as misinformation, political manipulation, privacy invasion, and cybercrime. This study focuses on developing robust detection mechanisms that evolve alongside increasingly sophisticated generative techniques, emphasizing the societal, ethical, and legal implications of synthetic media misuse. The proposed detection framework integrates spatial and temporal deep learning architectures, combining convolutional neural networks (ResNet50, Xception, ResNeXt) with Long Short-Term Memory (LSTM) networks and Vision Transformers (ViT). This hybrid approach effectively captures both the visual artifacts and the temporal inconsistencies characteristic of deepfakes. The model is trained and evaluated on a diverse dataset compiled from public benchmarks (FaceForensics++, Celeb-DF, Deepfake Detection Challenge) and synthetic samples to improve generalization. Extensive preprocessing, augmentation, and transfer learning enhance the system's robustness. Experimental results show that longer input video sequences improve detection performance, with the hybrid model achieving up to 98.7% accuracy. A real-time, user-friendly web interface built on Django supports video uploads, confidence scoring, and frame-level feedback for practical deployment. The research addresses challenges including adversarial attacks, computational efficiency, dataset variability, and ethical considerations such as privacy and misinformation. The system is deployed on scalable cloud infrastructure with GPU acceleration to support real-world applications.
Future work will explore multi-modal detection, explainable AI, federated learning for privacy preservation, and blockchain for transparent authentication. This comprehensive and adaptive deepfake detection framework contributes to safeguarding digital media integrity and promoting ethical AI use in an increasingly synthetic digital ecosystem.
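The spatial-plus-temporal pipeline described in the abstract (per-frame CNN features fed to an LSTM for sequence-level classification) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the small convolutional stack stands in for the ResNet50/Xception/ResNeXt backbones, and all layer sizes, sequence lengths, and hyperparameters are assumptions.

```python
import torch
import torch.nn as nn

class HybridDetector(nn.Module):
    """Illustrative CNN+LSTM deepfake detector: spatial features per frame,
    temporal modeling across frames, binary real/fake classification."""

    def __init__(self, feat_dim=512, hidden=256):
        super().__init__()
        # Per-frame spatial feature extractor (stand-in for a pretrained
        # ResNet50/Xception/ResNeXt backbone used in the paper).
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        # Temporal model over the sequence of frame features.
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        # Binary real/fake classification head.
        self.head = nn.Linear(hidden, 2)

    def forward(self, clip):
        # clip: (batch, frames, channels, height, width)
        b, t = clip.shape[:2]
        feats = self.cnn(clip.flatten(0, 1)).view(b, t, -1)
        out, _ = self.lstm(feats)
        # Classify from the last time step's hidden state.
        return self.head(out[:, -1])

# Batch of 2 clips, 8 frames each, 64x64 RGB (illustrative sizes).
logits = HybridDetector()(torch.randn(2, 8, 3, 64, 64))
print(logits.shape)  # torch.Size([2, 2])
```

A ViT branch, as described above, would typically be added as a parallel per-frame feature extractor whose embeddings are fused with the CNN features before the temporal stage.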
