AI transformation is accelerating across Vietnam, with OCR technology playing a key role in digitizing documents. Photo: Illustration

CMC’s research team has announced that its AI model, CATI-VLM, placed in the Top 12 worldwide and ranked No. 1 in Vietnam in the Document Visual Question Answering (DocVQA) category of the Robust Reading Competition (RRC), according to results revealed in June 2025.

Developed by the CMC Applied Technology Institute (CMC ATI), CATI-VLM is a visual document understanding model trained on a 5TB dataset. The achievement marks a significant milestone in Vietnam’s AI innovation and research efforts.

“We are thrilled that CMC’s research capabilities have been recognized on such a prestigious global platform as the RRC,” said Dang Minh Tuan, Director of CMC ATI. “We are proud to have reached this level in a short time, competing alongside leading international institutions. More importantly, this demonstrates our ability to master technology and tackle Vietnamese-specific and sectoral challenges.”

As digital transformation surges in Vietnam, AI adoption has gained significant momentum. Optical character recognition (OCR) technologies are becoming vital in digitizing documents, automating workflows, reducing costs, and improving management efficiency.

However, because Vietnamese text is complex, marked by tonal diacritics and often appearing in handwritten form, recognizing it requires more than simply reading characters. It demands full contextual understanding.

CATI-VLM sets itself apart from traditional OCR by not only extracting characters but also interpreting multiple layers of information. This includes textual content, non-textual elements (checkboxes, charts, signatures, formulas), layout structures (pages, tables, forms), and style components (fonts, highlights).

The model can answer questions posed about document images, in a conversational manner akin to ChatGPT, without prior exposure to specific form templates.

The Robust Reading Competition is a prestigious global scientific event organized by the Computer Vision Center at the Universitat Autònoma de Barcelona (UAB), a world-renowned institution in the field of computer vision.

Since its inception in 2011, the competition has been closely linked to the International Conference on Document Analysis and Recognition (ICDAR) - one of the world’s largest forums for document analysis and computer vision. It consistently attracts researchers and engineers from esteemed universities, research institutes, and leading tech companies, including Tsinghua University, Hyundai Motor Group, and Tencent.

RRC tasks are designed to drive technological innovation and address real-world problems, spanning applications such as translation, enterprise data management, urban analysis, and historical document processing.

Thai Khang