news 2026/3/16 9:01:16

VISTA-Bench: Do Vision-Language Models Really Understand Visualized Text as Well as Pure Text?


Authors: Qing’an Liu, Juntong Feng, Yuhao Wang, Xinzhe Han, Yujie Cheng, Yue Zhu, Haiwen Diao, Yunzhi Zhuge, Huchuan Lu


Original Abstract: Vision-Language Models (VLMs) have achieved impressive performance in cross-modal understanding across textual and visual inputs, yet existing benchmarks predominantly focus on pure-text queries. In real-world scenarios, language also frequently appears as visualized text embedded in images, raising the question of whether current VLMs handle such input requests comparably. We introduce VISTA-Bench, a systematic benchmark from multimodal perception, reasoning, to unimodal understanding domains. It evaluates visualized text understanding by contrasting pure-text and visualized-text questions under controlled rendering conditions. Extensive evaluation of over 20 representative VLMs reveals a pronounced modality gap: models that perform well on pure-text queries often degrade substantially when equivalent semantic content is presented as visualized text. This gap is further amplified by increased perceptual difficulty, highlighting sensitivity to rendering variations despite unchanged semantics. Overall, VISTA-Bench provides a principled evaluation framework to diagnose this limitation and to guide progress toward more unified language representations across tokenized text and pixels. The source dataset is available at https://github.com/QingAnLiu/VISTA-Bench.
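To make the "modality gap" in the abstract concrete, here is a minimal sketch of one way such a gap could be computed from paired results. It is not taken from the paper or its repository: the `PairedResult` record, the `modality_gap` helper, and the definition of the gap as pure-text accuracy minus visualized-text accuracy are all assumptions made for illustration, and the toy records are made-up values rather than benchmark results.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class PairedResult:
    """Outcome for one question asked in both forms (hypothetical record layout)."""
    question_id: str
    correct_pure_text: bool   # answered correctly when the question was given as tokenized text
    correct_visualized: bool  # answered correctly when the same question was rendered into an image


def accuracy(flags: Iterable[bool]) -> float:
    """Fraction of True values; returns 0.0 for an empty input."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else 0.0


def modality_gap(results: List[PairedResult]) -> float:
    """Assumed definition: pure-text accuracy minus visualized-text accuracy."""
    pure = accuracy(r.correct_pure_text for r in results)
    visualized = accuracy(r.correct_visualized for r in results)
    return pure - visualized


# Made-up toy records for illustration only; not benchmark data.
toy_results = [
    PairedResult("q1", True, True),
    PairedResult("q2", True, False),
    PairedResult("q3", False, False),
    PairedResult("q4", True, False),
]

print(f"modality gap: {modality_gap(toy_results):.2f}")
```

Under this assumed definition, a positive value means the model does better when the question arrives as pure text than when the same content is rendered as pixels. Pairing the two forms per question keeps the semantics fixed, so any difference is attributable to how the input is presented.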

PDF Link: 2602.04802v1
