TABLET

A large-scale Visual Table Understanding dataset with
4 million examples across 20 tasks, grounded in 2 million unique tables, 88% of which preserve their original visualizations. Paper accepted at ICLR 2026.


About the dataset

While table understanding increasingly relies on pixel-only settings, where tables are processed as visual representations, current benchmarks predominantly use synthetic renderings that lack the complexity and visual diversity of real-world tables. Moreover, existing visual table understanding (VTU) datasets offer fixed examples with single visualizations and pre-defined instructions, providing no access to the underlying serialized data for reformulation. We introduce TABLET, a large-scale VTU dataset with 4 million examples across 20 tasks, grounded in 2 million unique tables, 88% of which preserve their original visualizations. Each example includes paired image-HTML representations, comprehensive metadata, and provenance information linking back to the source datasets. By preserving original visualizations and maintaining example traceability in a unified large-scale collection, TABLET establishes a foundation for robust training and extensible evaluation of future VTU models.
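Concretely, each example can be thought of as a record bundling the visual, serialized, and provenance fields described above. The sketch below is illustrative only: the field names and values are assumptions for exposition, not TABLET's actual schema.

```python
from dataclasses import dataclass

# Hypothetical record layout mirroring the description above: a paired
# image-HTML representation plus metadata and provenance linking back
# to the seed dataset. Field names are illustrative, not the real schema.
@dataclass
class TabletExample:
    task: str          # e.g. "tabfact"
    image_path: str    # original (or rendered) table visualization
    html: str          # serialized table, paired with the image
    instruction: str   # task prompt
    answer: str
    seed_dataset: str  # provenance: name of the source dataset
    seed_id: str       # identifier of the example in the seed dataset

ex = TabletExample(
    task="tabfact",
    image_path="tables/example.png",
    html="<table><tr><td>Team</td><td>Wins</td></tr></table>",
    instruction="Is the following statement supported by the table?",
    answer="entailed",
    seed_dataset="TabFact",
    seed_id="example-001",
)
print(ex.seed_dataset)  # TabFact
```

Keeping provenance fields like these alongside the serialized table is what makes examples traceable and reformulable, in contrast to fixed-instruction VTU datasets.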

Main diagram of TABLET

Previous datasets render table images from serialized tables, losing original visual details. In contrast, TABLET locates and retrieves the original table visualizations across 14 tabular datasets, resulting in 4M examples grounded in 2M unique tables.


Here is a breakdown of examples per task and their source dataset.

Tasks & Data Sources

Breakdown of TABLET tasks, seed datasets and example counts.
Task                    Seeds                 Examples    Share
ent_link                TURL                  1,523,904   37.5%
col_type                TURL                    628,396   15.4%
struct_aware_parse      PubTabNet, …            523,699   12.9%
wikibio                 WikiBIO                 728,321   17.9%
hybridqa                HybridQA                 69,599    1.7%
fetaqa                  ToTTo                     4,662    0.1%
hitab                   NSF, …                   10,671    0.3%
infotabs                InfoTabs                 23,738    0.6%
tabfact                 TabFact                 112,432    2.8%
tabmwp                  TabMWP                   38,431    0.9%
tat-qa                  TAT-QA                    2,756   <0.1%
totto                   ToTTo                   125,095    3.1%
wikitq                  WikiTableQuestions       22,033    0.5%
rel_extraction          TURL                     64,790    1.6%
table_instruction       InfoTabs, …             136,944    3.4%
row_column_extraction   InfoTabs, …               8,678    0.2%
table_cell_extraction   InfoTabs, …               8,693    0.2%
table_cell_location     InfoTabs, …               8,664    0.2%
table_recognition       InfoTabs, …               7,839    0.2%
table_size_detection    InfoTabs, …               8,750    0.2%
merged_cell_detection   InfoTabs, …               8,450    0.2%
visual_table_qa         ToTTo, …                    306   <0.1%
Total                                         4,066,851
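As a sanity check, the share column can be recomputed from the raw example counts and the grand total. A minimal sketch, using three rows from the table above:

```python
# Recompute each task's share of the dataset from its example count,
# using the grand total from the table (4,066,851 examples).
TOTAL = 4_066_851

counts = {
    "ent_link": 1_523_904,
    "wikibio": 728_321,
    "struct_aware_parse": 523_699,
}

shares = {task: round(100 * n / TOTAL, 1) for task, n in counts.items()}
print(shares)  # {'ent_link': 37.5, 'wikibio': 17.9, 'struct_aware_parse': 12.9}
```

Tiny tasks such as tat-qa (2,756 examples) and visual_table_qa (306 examples) come out well under 0.1% of the collection, which is why their shares are best reported as "<0.1%".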

Visual Table QA Benchmark

To evaluate whether models can truly combine visual perception with table understanding, we introduce VisualTableQA, a manually curated benchmark of 306 examples. Human annotators selected tables with high visual complexity and formulated questions that can only be answered by jointly attending to visual cues and tabular structure, for instance, identifying a car by its color or a historical figure by visual attributes in an accompanying image.

Example from the VisualTableQA benchmark

The VisualTableQA benchmark is available alongside the other tasks in TABLET-test, or you can download it directly here.

Authors

Iñigo Alonso
School of Informatics
University of Edinburgh
Imanol Miranda
HiTZ Center
University of the Basque Country - UPV/EHU
Eneko Agirre
HiTZ Center
University of the Basque Country - UPV/EHU
Mirella Lapata
School of Informatics
University of Edinburgh

Cite

If you find this dataset useful, please cite it using the following format:

@misc{alonso2025tabletlargescaledatasetrobust,
  title={TABLET: A Large-Scale Dataset for Robust Visual Table Understanding},
  author={Iñigo Alonso and Imanol Miranda and Eneko Agirre and Mirella Lapata},
  year={2025},
  eprint={2509.21205},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2509.21205},
}

Contact

If you need help accessing data or have questions, please contact Iñigo Alonso.