Volume 18, No. 11
Semantic Operators and Their Optimization: Towards AI-Based Data Analytics with Accuracy Guarantees
Abstract
The semantic capabilities of large language models (LLMs) have the potential to enable rich analytics and reasoning over vast knowledge corpora. Unfortunately, existing systems either empirically optimize expensive LLM-powered operations with no performance guarantees, or limit their support to simple batched-inference primitives. We introduce semantic operators, the first formalism with statistical accuracy guarantees for general-purpose AI-based operations with natural language parameters (e.g., filtering, sorting, joining or aggregating records using natural language criteria). Each operator can be implemented by multiple AI algorithms, which compose individual model invocations to orchestrate the model over the data. Our programming model specifies the expected behavior of each operator with a high-quality reference algorithm, and we develop an optimization framework that reduces cost while providing accuracy guarantees for individual operators. Using this approach, we propose several novel optimizations to accelerate semantic filtering, joining, group-by and top-k operations by up to 1,000×. We implement semantic operators in the LOTUS system and demonstrate LOTUS’ effectiveness on real, bulk-semantic processing applications, including fact-checking, biomedical multilabel classification, search, and topic analysis. We show that the semantic operator model is expressive, capturing state-of-the-art AI pipelines in a few operator calls and making it easy to express new pipelines that match or exceed the quality of recent LLM-based analytic systems by up to 170%, while offering accuracy guarantees. Overall, LOTUS programs match or exceed the accuracy of state-of-the-art AI pipelines for each task while running up to 3.6× faster than the highest-quality baselines. LOTUS is publicly available at https://github.com/lotus-data/lotus.
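The abstract describes each semantic operator as having a high-quality reference algorithm plus cheaper algorithms whose use is governed by accuracy targets. One way to picture this for a semantic filter is a model cascade. A cheap proxy model scores every record, and the expensive reference model (the "oracle") is called only when the proxy score falls between two thresholds. The following is a minimal Python sketch under stated assumptions. The names `pick_thresholds`, `cascade_filter`, `proxy`, and `oracle` are hypothetical and are not the LOTUS API. The thresholds are fit only empirically on an oracle-labeled sample. This simplified calibration gives no out-of-sample statistical guarantee, which is the part the paper's optimization framework is said to provide.

```python
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

R = TypeVar("R")  # a record: any type the proxy and oracle accept


def pick_thresholds(scores: Sequence[float], labels: Sequence[bool],
                    target_precision: float,
                    target_recall: float) -> Tuple[float, float]:
    """Choose (tau_neg, tau_pos) from proxy scores on an oracle-labeled sample.

    Simplified, empirical calibration (hypothetical helper, not LOTUS code):
    tau_pos: smallest score s.t. sample records scoring >= tau_pos reach the
             target precision (auto-accept region).
    tau_neg: largest score s.t. auto-rejecting sample records scoring
             <= tau_neg keeps recall at or above the target (auto-reject region).
    """
    candidates = sorted(set(scores))
    total_pos = sum(labels)

    tau_pos = float("inf")  # default: never auto-accept
    for t in candidates:  # ascending, so the first feasible t is the smallest
        accepted = [y for s, y in zip(scores, labels) if s >= t]
        if sum(accepted) / len(accepted) >= target_precision:
            tau_pos = t
            break

    tau_neg = float("-inf")  # default: never auto-reject
    for t in reversed(candidates):  # descending, so the first feasible t is the largest
        missed = sum(1 for s, y in zip(scores, labels) if y and s <= t)
        recall = 1.0 if total_pos == 0 else (total_pos - missed) / total_pos
        if recall >= target_recall:
            tau_neg = t
            break
    return tau_neg, tau_pos


def cascade_filter(records: Sequence[R], proxy: Callable[[R], float],
                   oracle: Callable[[R], bool], tau_neg: float,
                   tau_pos: float) -> Tuple[List[R], int]:
    """Score every record with the cheap proxy; call the oracle only when unsure."""
    kept: List[R] = []
    oracle_calls = 0
    for r in records:
        s = proxy(r)
        if s >= tau_pos:
            kept.append(r)  # confident "yes": skip the oracle
        elif s <= tau_neg:
            continue  # confident "no": skip the oracle
        else:
            oracle_calls += 1  # uncertain: defer to the reference algorithm
            if oracle(r):
                kept.append(r)
    return kept, oracle_calls


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy stand-ins: the oracle plays the expensive LLM judging a natural
    # language predicate; the proxy is a cheaper, noisier score.
    records = [f"doc-{i}" for i in range(1000)]
    truth = {r: rng.random() < 0.3 for r in records}
    noise = {r: rng.gauss(0.0, 0.2) for r in records}

    def oracle(r: str) -> bool:
        return truth[r]

    def proxy(r: str) -> float:
        return (0.7 if truth[r] else 0.3) + noise[r]

    sample = rng.sample(records, 200)  # labeling this sample costs 200 oracle calls
    tau_neg, tau_pos = pick_thresholds([proxy(r) for r in sample],
                                       [oracle(r) for r in sample], 0.9, 0.9)
    kept, calls = cascade_filter(records, proxy, oracle, tau_neg, tau_pos)
    print(f"tau_neg={tau_neg:.2f} tau_pos={tau_pos:.2f} "
          f"kept={len(kept)} oracle_calls={calls}/{len(records)}")
```

Labeling the calibration sample also costs oracle calls, so any savings come from the records whose proxy scores fall outside the uncertain band between `tau_neg` and `tau_pos`. Because every record inside that band is decided by the oracle, the cascade's precision and recall on the calibration sample, measured relative to the oracle, are at least the targets used to pick the thresholds. Whether they hold on unseen records is exactly what this sketch does not guarantee.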
PVLDB is part of the VLDB Endowment Inc.