Weak-to-Strong Prompts with Lightweight-to-Powerful LLMs for High-Accuracy, Low-Cost, and Explainable Data Transformation

Authors:

Changlun Li, Chenyu Yang, Yuyu Luo, Ju Fan, Nan Tang

Abstract

Data transformation is challenging because input data formats are highly diverse and user requirements differ. Existing approaches, including human-driven, algorithmic, and large language model (LLM)-based solutions, each exhibit trade-offs in cost, accuracy, and the range of supported transformations. To address these limitations, we propose MegaTran, a novel framework for generating accurate and cost-effective data transformation code. MegaTran employs a two-stage process: Weak2StrongPrompt, which converts a user’s weak prompt (a loosely specified user input) into a strong, structured prompt, and Prompt2Code, which generates transformation code based on this refined prompt. In Weak2StrongPrompt, a fine-tuned lightweight LLM predicts the transformation type and generates a detailed task description from the user’s input. In Prompt2Code, a powerful LLM generates the corresponding transformation code, guided by two key optimizations: (1) Sanity-check Reflection with checklist, which iteratively debugs and refines the code by addressing errors; and (2) LazyRAG, a retrieval-augmented generation technique that retrieves relevant code snippets or documentation from external resources (e.g., GitHub, DataPrep) to enhance code quality. Extensive experiments show that MegaTran improves accuracy by +2.2% to +26.1% over state-of-the-art (SoTA) methods.
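
The two-stage flow described in the abstract can be illustrated with a minimal Python sketch. This is not the MegaTran implementation: every name here (StrongPrompt, weak_to_strong, prompt_to_code, sanity_check, and the demo stubs) is hypothetical, the two LLMs are replaced by plain callables, the "checklist" is reduced to running the generated code against input/output examples, and "lazy" retrieval is assumed to mean that external snippets are fetched only after a first attempt fails. The abstract does not confirm any of these details.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class StrongPrompt:
    """Structured prompt produced by stage 1 (hypothetical shape)."""
    transformation_type: str
    task_description: str


def weak_to_strong(weak_prompt: str,
                   light_llm: Callable[[str], Tuple[str, str]]) -> StrongPrompt:
    """Stage 1: a lightweight model predicts a type and a task description."""
    transformation_type, task_description = light_llm(weak_prompt)
    return StrongPrompt(transformation_type, task_description)


def sanity_check(code: str, examples: List[Tuple[str, str]]) -> Optional[str]:
    """Run generated code on examples; return an error message, or None if all pass.

    Assumption: the generated code defines a function named `transform`.
    """
    namespace: dict = {}
    try:
        exec(code, namespace)  # illustration only; never exec untrusted code
        fn = namespace["transform"]
        for source, expected in examples:
            got = fn(source)
            if got != expected:
                return f"transform({source!r}) returned {got!r}, expected {expected!r}"
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


def prompt_to_code(prompt: StrongPrompt,
                   strong_llm: Callable[[StrongPrompt, str, str], str],
                   retrieve: Callable[[StrongPrompt], str],
                   examples: List[Tuple[str, str]],
                   max_rounds: int = 3) -> Optional[str]:
    """Stage 2: generate code, check it, and feed errors back to the model."""
    context = ""   # retrieved snippets; empty until a check fails (assumed laziness)
    feedback = ""  # last error message, passed back for reflection
    for _ in range(max_rounds):
        code = strong_llm(prompt, context, feedback)
        error = sanity_check(code, examples)
        if error is None:
            return code
        feedback = error
        if not context:
            context = retrieve(prompt)
    return None


# Stub models for demonstration; real LLM calls would replace these.
def demo_light_llm(weak_prompt: str) -> Tuple[str, str]:
    return "format", "Convert dates from MM/DD/YYYY to YYYY-MM-DD"


def demo_strong_llm(prompt: StrongPrompt, context: str, feedback: str) -> str:
    if not feedback:
        # First attempt is deliberately wrong so the reflection loop is exercised.
        return "def transform(x):\n    return x"
    return ("def transform(x):\n"
            "    m, d, y = x.split('/')\n"
            "    return f'{y}-{m}-{d}'")
```

With these stubs, calling weak_to_strong("fix dates", demo_light_llm) and passing the result to prompt_to_code with a single example pair fails the first check, triggers one retrieval, and succeeds on the second round. Keeping the models as injected callables keeps the sketch independent of any particular LLM API.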

PVLDB is part of the VLDB Endowment Inc.

Volume 18, No. 8
