TABLET
A large-scale Visual Table Understanding dataset with
4 million examples
across 20 tasks,
grounded in 2 million unique tables where 88% preserve
original visualizations.
Paper accepted at ICLR 2026.
About the dataset
While table understanding increasingly relies on pixel-only settings where tables are processed as visual representations, current benchmarks predominantly use synthetic renderings that lack the complexity and visual diversity of real-world tables. Additionally, existing visual table understanding (VTU) datasets offer fixed examples with single visualizations and pre-defined instructions, providing no access to underlying serialized data for reformulation. We introduce TABLET, a large-scale VTU dataset with 4 million examples across 20 tasks, grounded in 2 million unique tables where 88% preserve original visualizations. Each example includes paired image-HTML representations, comprehensive metadata, and provenance information linking back to the source datasets. By preserving original visualizations and maintaining example traceability in a unified large-scale collection, TABLET establishes a foundation for robust training and extensible evaluation of future VTU models.
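To make the description above concrete, here is a minimal sketch of what a single TABLET example might look like and how provenance could be used to trace it back to its source. The field names (`image`, `html`, `task`, `provenance`, etc.) are illustrative assumptions based on the description, not the dataset's actual schema.

```python
# Hypothetical TABLET record: paired image-HTML representations plus
# metadata and provenance, as described above. Field names are assumptions.
example = {
    "image": "tables/000123.png",          # original table visualization
    "html": "<table><tr><td>...</td></tr></table>",  # serialized counterpart
    "task": "tabfact",                     # one of the 20 task types
    "instruction": "Is the following statement supported by the table?",
    "provenance": {                        # traceability to the source dataset
        "source_dataset": "TabFact",
        "source_id": "2-12345678-1",
    },
}

def from_source(examples, dataset_name):
    """Filter examples by source dataset via their provenance metadata."""
    return [ex for ex in examples
            if ex["provenance"]["source_dataset"] == dataset_name]

print(from_source([example], "TabFact")[0]["task"])  # -> tabfact
```

Because the serialized HTML travels with each example, instructions can be reformulated or new tasks derived without losing the link to the original visualization.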
Previous datasets render table images from serialized tables, losing original visual details. In contrast, TABLET locates and retrieves the original table visualizations across 14 tabular datasets, resulting in 4M examples grounded in 2M unique tables.
Below is a breakdown of examples per task and their source datasets.
Tasks & Data Sources
| Task | Seeds | Examples | Share |
|---|---|---|---|
| ent_link | TURL | 1,523,904 | 37.5% |
| col_type | TURL | 628,396 | 15.4% |
| struct_aware_parse | PubTabNet, | 523,699 | 12.9% |
| wikibio | WikiBIO | 728,321 | 17.9% |
| hybridqa | HybridQA | 69,599 | 1.7% |
| fetaqa | ToTTo | 4,662 | 0.1% |
| hitab | NSF, | 10,671 | 0.3% |
| infotabs | InfoTabs | 23,738 | 0.6% |
| tabfact | TabFact | 112,432 | 2.8% |
| tabmwp | TabMWP | 38,431 | 0.9% |
| tat-qa | TAT-QA | 2,756 | 0.1% |
| totto | ToTTo | 125,095 | 3.1% |
| wikitq | WikiTableQuestions | 22,033 | 0.5% |
| rel_extraction | TURL | 64,790 | 1.6% |
| table_instruction | InfoTabs, | 136,944 | 3.4% |
| row_column_extraction | InfoTabs, | 8,678 | 0.2% |
| table_cell_extraction | InfoTabs, | 8,693 | 0.2% |
| table_cell_location | InfoTabs, | 8,664 | 0.2% |
| table_recognition | InfoTabs, | 7,839 | 0.2% |
| table_size_detection | InfoTabs, | 8,750 | 0.2% |
| merged_cell_detection | InfoTabs, | 8,450 | 0.2% |
| visual_table_qa | ToTTo, | 306 | <0.1% |
| Total | | 4,066,851 | 100% |
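As a quick sanity check on the table above, the per-task counts do add up to the stated total of 4,066,851 examples:

```python
# Per-task example counts, transcribed from the table above.
counts = {
    "ent_link": 1_523_904,
    "col_type": 628_396,
    "struct_aware_parse": 523_699,
    "wikibio": 728_321,
    "hybridqa": 69_599,
    "fetaqa": 4_662,
    "hitab": 10_671,
    "infotabs": 23_738,
    "tabfact": 112_432,
    "tabmwp": 38_431,
    "tat-qa": 2_756,
    "totto": 125_095,
    "wikitq": 22_033,
    "rel_extraction": 64_790,
    "table_instruction": 136_944,
    "row_column_extraction": 8_678,
    "table_cell_extraction": 8_693,
    "table_cell_location": 8_664,
    "table_recognition": 7_839,
    "table_size_detection": 8_750,
    "merged_cell_detection": 8_450,
    "visual_table_qa": 306,
}

total = sum(counts.values())
print(total)  # -> 4066851
```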
Visual Table QA Benchmark
To evaluate whether models can truly combine visual perception with table understanding, we introduce VisualTableQA, a manually curated benchmark of 306 examples. Human annotators selected tables with high visual complexity and formulated questions that can only be answered by jointly attending to visual cues and tabular structure; for instance, identifying a car by its color or a historical figure by visual attributes in an accompanying image.
You can find the VisualTableQA benchmark alongside the other tasks in TABLET-test, or download it directly here.
Cite
If you find this dataset useful, please cite it using the following format:
@misc{alonso2025tabletlargescaledatasetrobust,
  title={TABLET: A Large-Scale Dataset for Robust Visual Table Understanding},
  author={Iñigo Alonso and Imanol Miranda and Eneko Agirre and Mirella Lapata},
  year={2025},
  eprint={2509.21205},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2509.21205},
}
Contact
If you need help accessing data or have questions, please contact Iñigo Alonso.