TABLET
A large-scale Visual Table Understanding dataset with
4 million examples
across 20 tasks,
grounded in 2 million unique tables where 88% preserve
original visualizations.
Paper accepted at ICLR 2026.
About the dataset
While table understanding increasingly relies on pixel-only settings where tables are processed as visual representations, current benchmarks predominantly use synthetic renderings that lack the complexity and visual diversity of real-world tables. Additionally, existing visual table understanding (VTU) datasets offer fixed examples with single visualizations and pre-defined instructions, providing no access to underlying serialized data for reformulation. We introduce TABLET, a large-scale VTU dataset with 4 million examples across 20 tasks, grounded in 2 million unique tables where 88% preserve original visualizations. Each example includes paired image-HTML representations, comprehensive metadata, and provenance information linking back to the source datasets. By preserving original visualizations and maintaining example traceability in a unified large-scale collection, TABLET establishes a foundation for robust training and extensible evaluation of future VTU models.
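To make the description above concrete, here is a minimal sketch of what a single TABLET example might look like and how provenance could be used to trace it back to its source. The field names (`image`, `html`, `task`, `provenance`, etc.) are illustrative assumptions based on the description, not the dataset's actual schema.

```python
# Hypothetical TABLET record: paired image-HTML representations plus
# metadata and provenance, as described above. Field names are assumptions.
example = {
    "image": "tables/000123.png",          # original table visualization
    "html": "<table><tr><td>...</td></tr></table>",  # serialized counterpart
    "task": "tabfact",                     # one of the 20 task types
    "instruction": "Is the following statement supported by the table?",
    "provenance": {                        # traceability to the source dataset
        "source_dataset": "TabFact",
        "source_id": "2-12345678-1",
    },
}

def from_source(examples, dataset_name):
    """Filter examples by source dataset via their provenance metadata."""
    return [ex for ex in examples
            if ex["provenance"]["source_dataset"] == dataset_name]

print(from_source([example], "TabFact")[0]["task"])  # -> tabfact
```

Because the serialized HTML travels with each example, instructions can be reformulated or new tasks derived without losing the link to the original visualization.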
Previous datasets render table images from serialized tables, losing original visual details. In contrast, TABLET locates and retrieves the original table visualizations across 14 tabular datasets, resulting in 4M examples grounded in 2M unique tables.
Below is a breakdown of examples per task and their source datasets.
Tasks & Data Sources
| Task | Seeds | Examples | Share |
|---|---|---|---|
| ent_link | TURL | 1,523,904 | 37.5% |
| col_type | TURL | 628,396 | 15.4% |
| struct_aware_parse | PubTabNet, | 523,699 | 12.9% |
| wikibio | WikiBIO | 728,321 | 17.9% |
| hybridqa | HybridQA | 69,599 | 1.7% |
| fetaqa | ToTTo | 4,662 | 0.1% |
| hitab | NSF, | 10,671 | 0.3% |
| infotabs | InfoTabs | 23,738 | 0.6% |
| tabfact | TabFact | 112,432 | 2.8% |
| tabmwp | TabMWP | 38,431 | 0.9% |
| tat-qa | TAT-QA | 2,756 | 0.1% |
| totto | ToTTo | 125,095 | 3.1% |
| wikitq | WikiTableQuestions | 22,033 | 0.5% |
| rel_extraction | TURL | 64,790 | 1.6% |
| table_instruction | InfoTabs, | 136,944 | 3.4% |
| row_column_extraction | InfoTabs, | 8,678 | 0.2% |
| table_cell_extraction | InfoTabs, | 8,693 | 0.2% |
| table_cell_location | InfoTabs, | 8,664 | 0.2% |
| table_recognition | InfoTabs, | 7,839 | 0.2% |
| table_size_detection | InfoTabs, | 8,750 | 0.2% |
| merged_cell_detection | InfoTabs, | 8,450 | 0.2% |
| visual_table_qa | ToTTo, | 306 | <0.1% |
| Total | | 4,066,851 | 100% |
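As a quick sanity check on the table above, the per-task counts do add up to the stated total of 4,066,851 examples:

```python
# Per-task example counts, transcribed from the table above.
counts = {
    "ent_link": 1_523_904,
    "col_type": 628_396,
    "struct_aware_parse": 523_699,
    "wikibio": 728_321,
    "hybridqa": 69_599,
    "fetaqa": 4_662,
    "hitab": 10_671,
    "infotabs": 23_738,
    "tabfact": 112_432,
    "tabmwp": 38_431,
    "tat-qa": 2_756,
    "totto": 125_095,
    "wikitq": 22_033,
    "rel_extraction": 64_790,
    "table_instruction": 136_944,
    "row_column_extraction": 8_678,
    "table_cell_extraction": 8_693,
    "table_cell_location": 8_664,
    "table_recognition": 7_839,
    "table_size_detection": 8_750,
    "merged_cell_detection": 8_450,
    "visual_table_qa": 306,
}

total = sum(counts.values())
print(total)  # -> 4066851
```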
Visual Table QA Benchmark
To evaluate whether models can truly combine visual perception with table understanding, we introduce VisualTableQA, a manually curated benchmark of 306 examples. Human annotators selected tables with high visual complexity and formulated questions that can only be answered by jointly attending to visual cues and tabular structure; for instance, identifying a car by its color or a historical figure by visual attributes in an accompanying image.
You can find the VisualTableQA benchmark alongside the other tasks in TABLET-test, or download it directly here.
Cite
If you find this dataset useful, please cite it using the following format:
@misc{alonso2025tabletlargescaledatasetrobust,
  title={TABLET: A Large-Scale Dataset for Robust Visual Table Understanding},
  author={Iñigo Alonso and Imanol Miranda and Eneko Agirre and Mirella Lapata},
  year={2025},
  eprint={2509.21205},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2509.21205},
}
Contact
If you need help accessing data or have questions, please contact Iñigo Alonso.