Adver-MDP: An Adaptive Reinforcement Learning Framework for Multi-Step Identity Verification under Adversarial Markov Decision Processes

Suman Kumar, Sanjeev Prasanna

Abstract

Modern identity fraud rarely occurs as a single event; it is a sequential, multi-step process in which adversaries adapt their tactics based on the system’s defensive responses. This research introduces Adver-MDP, a reinforcement learning framework that models identity verification as an Adversarial Markov Decision Process (MDP), casting the interaction as a dynamic sequential game between the defender and adaptive adversaries. The framework trains an intelligent agent with Proximal Policy Optimization (PPO) to dynamically adjust verification challenges, including biometric checks, behavioral prompts, and transactional validations, based on the evolving risk state of the interaction. An Opponent Modeling component simulates adversary strategies, allowing the agent to learn robust counter-policies. The reward function explicitly balances security integrity, computational cost, and user friction. Empirical evaluation on both high-throughput simulated environments and real-world identity datasets demonstrates that Adver-MDP reduces successful adversarial penetrations by 43–58% compared to static rule-based protocols, improves sequential verification accuracy from 82% to 95%, and reduces verification latency by 18% while preserving the user experience. These results confirm that modeling identity verification as a dynamic, sequential game is a superior paradigm for defending against sophisticated multi-step identity attacks.
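
As a rough illustration of the reward structure described above, the sketch below shows one way a per-step reward balancing security integrity, computational cost, and user friction could be computed. The weights, function name, and inputs are hypothetical assumptions for illustration only and are not taken from the paper.

# Minimal sketch (Python). All names and weights are illustrative
# assumptions; the paper's actual reward formulation may differ.

W_COST = 0.3      # hypothetical penalty weight on computational cost
W_FRICTION = 0.2  # hypothetical penalty weight on user friction

def step_reward(blocked_attack: bool, false_rejection: bool,
                compute_cost: float, friction: float) -> float:
    """Score one verification step: reward stopping an attack, penalize
    wrongly rejecting a legitimate user, and discount costly or
    high-friction challenges."""
    security = 1.0 if blocked_attack else (-1.0 if false_rejection else 0.0)
    return security - W_COST * compute_cost - W_FRICTION * friction

# Example: a biometric challenge that blocks an attack but carries
# moderate cost and friction.
print(step_reward(True, False, compute_cost=0.6, friction=0.4))  # 0.74

In the full framework, a reward of this kind would feed a PPO update over a whole sequence of verification challenges rather than being scored one step at a time.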
