Improving Financial Invoice Workflows with RPA and OCR Using Multimodal Techniques
Abstract
Financial processes worldwide are labor-intensive and time-consuming because they rely heavily on paper documents and manual work. To address this, organizations have moved toward greater automation by combining computer vision (CV), natural language processing (NLP), and robotic process automation (RPA). Tasks such as document categorization and key information extraction are well suited to these techniques. However, text-rich document images are difficult to analyze, and processing bilingual documents typically requires a large training dataset. This study introduces an intelligent document processing framework that automates business processes on real-world financial documents, particularly in banking operations. The framework takes a multimodal approach that combines traditional RPA with a pre-trained deep learning model. The proposed system can analyze multilingual documents and is designed to perform categorization and key information extraction with less training data. Extensive experiments on images of Indian financial documents were conducted to assess the framework's effectiveness. The results indicate that the multimodal approach interprets financial documents more accurately, and that precise labeling can improve performance by as much as 15%. The framework substantially improves the automation and optimization of financial document processing.

This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.