Accelerating Tabular Inference: Training Data Generation with TENET

Authors:

Enzo Veltri, Donatello Santoro, Jean-Flavien Bussotti, Paolo Papotti


Abstract

Tabular Natural Language Inference (TNLI) involves machine learning models that assess whether structured tabular data supports or contradicts a hypothesis formulated in natural language. TNLI models typically require large sets of training examples, which are costly to produce manually. In this demonstration, we present Tenet, a system for the automatic generation of training examples for TNLI applications. Existing TNLI training approaches either depend on costly human annotation or generate simplistic examples that lack data diversity and complex reasoning. In contrast, Tenet can start from a small set of manually annotated examples and automatically generate a large and diverse training dataset. Tenet is based on the idea that SQL queries are the right tool for obtaining rich and complex generated examples. To ensure data variety, evidence queries extract cell values from tables based on diverse data patterns. Once the relevant data are identified, semantic queries define different ways to interpret them using SQL clauses. These interpretations are then verbalized as text to create annotated examples for TNLI. This demonstration offers an interactive experience in which users can select evidence from tabular data, inspect and refine generated queries, and observe how Tenet transforms structured data into natural language hypotheses. By engaging with different scenarios, users will see how Tenet enables the rapid creation of high-quality TNLI datasets, leading to inference models with performance comparable to those trained on manually crafted examples.
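The three-stage pipeline described in the abstract (evidence queries, then semantic queries, then verbalization) can be illustrated with a minimal sketch using Python's standard-library sqlite3 module. This is not Tenet's implementation: the table `players`, the functions `evidence_query`, `semantic_query`, and `verbalize`, and the fixed sentence template are hypothetical stand-ins, and the refuted hypothesis is produced by simply perturbing the computed value, which is only one possible assumption about how negative examples could be made.

```python
import sqlite3


def build_table():
    """Create a hypothetical in-memory table standing in for a TNLI source table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE players (name TEXT, team TEXT, goals INTEGER)")
    conn.executemany(
        "INSERT INTO players VALUES (?, ?, ?)",
        [("Ada", "Reds", 12), ("Bo", "Blues", 7), ("Cy", "Reds", 9)],
    )
    return conn


def evidence_query(conn, team):
    """Evidence step (hypothetical): select the cells that share a data pattern (same team)."""
    return conn.execute(
        "SELECT name, goals FROM players WHERE team = ? ORDER BY name", (team,)
    ).fetchall()


def semantic_query(conn, team):
    """Semantic step (hypothetical): interpret the same evidence with an aggregate SQL clause."""
    return conn.execute(
        "SELECT SUM(goals) FROM players WHERE team = ?", (team,)
    ).fetchone()[0]


def verbalize(team, total):
    """Verbalization step (hypothetical): fill a fixed template.

    The refuted example is made by perturbing the computed value; this is an
    assumption for illustration, not the method described in the abstract.
    """
    supported = (f"The {team} players scored {total} goals in total.", "SUPPORTED")
    refuted = (f"The {team} players scored {total + 1} goals in total.", "REFUTED")
    return [supported, refuted]


conn = build_table()
evidence = evidence_query(conn, "Reds")  # [('Ada', 12), ('Cy', 9)]
total = semantic_query(conn, "Reds")     # 21
for text, label in verbalize("Reds", total):
    print(label, "|", text)
```

Running the sketch prints one supported and one refuted hypothesis about the Reds rows. A system aiming for diversity would vary the data patterns chosen by the evidence step and the SQL clauses (aggregates, comparisons, filters) used by the semantic step, rather than relying on a single hard-coded template.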

PVLDB is part of the VLDB Endowment Inc.

Volume 18, No. 12