
Russian scientists plan to train artificial intelligence to detect fraud involving counterfeit documents. To that end, they have compiled a dataset of 8,000 images of forged identity documents from Russia, the CIS, and other countries, Smart Engines told RIA Novosti.
“Researchers at the Russian AI company Smart Engines have released the country’s first large-scale dataset of forged documents, MIDV-DM. It contains 8,000 images of identity documents from the Russian Federation, CIS countries, and other parts of the world. The samples were created using the forgery techniques most commonly employed by fraudsters. The dataset will allow developers worldwide to train, validate, and improve their AI-based anti-fraud systems,” the company stated.
The dataset is built on samples of the Russian internal passport, as well as national passports and ID cards from Azerbaijan, Latvia, Estonia, Finland, and several other countries. The developers applied a range of manipulations to these documents: overlaying text fields or photographs taken from a “donor” document, obscuring parts of the document, merging separate fragments into a single image, and inserting elements that do not belong, such as emblems and holograms.
The dataset, the company noted, should improve the accuracy of anti-fraud solutions, which matters given the recent rise in fraud involving forged documents. According to an annual study by Smart Engines specialists and the law firm Intellect, the number of criminal cases involving the forgery, production, and circulation of forged documents (Article 327 of the Russian Criminal Code) rose by 34% to 3,900 in 2024.
“The dataset includes forgeries with altered signatures, holder photographs, and individual document fields, covering the full range of typical attacks that banks, microfinance organizations, and government agencies currently encounter in practice. This will allow AI systems to detect more accurately not only substituted personal data but also complex structural inconsistencies within documents,” said Vladimir Arlazarov, CEO of Smart Engines and Doctor of Technical Sciences.
Going forward, the company plans to continue developing its proprietary anti-fraud system, “Sherlock 2o,” which is designed to analyze document images in the optical, ultraviolet, and infrared ranges simultaneously, along with text fields, NFC chip data, barcodes, metadata, and digital signatures.