Volume 18, No. 8
Weak-to-Strong Prompts with Lightweight-to-Powerful LLMs for High-Accuracy, Low-Cost, and Explainable Data Transformation
Abstract
Data transformation poses significant challenges due to the wide diversity of input data formats and user requirements. Existing approaches, including human-driven, algorithmic, and large language model (LLM)-based solutions, each exhibit trade-offs in cost, accuracy, and the range of supported transformations. To address these limitations, we propose MegaTran, a novel framework for generating accurate and cost-effective data transformation code. MegaTran employs a two-stage process: Weak2StrongPrompt, which converts a user's weak prompt (a loosely specified user input) into a strong, structured prompt, and Prompt2Code, which generates transformation code from this refined prompt. In Weak2StrongPrompt, a fine-tuned lightweight LLM predicts the transformation type and generates a detailed task description from the user's input. In Prompt2Code, a powerful LLM generates the corresponding transformation code, guided by two key optimizations: (1) Sanity-check Reflection with checklist, which iteratively debugs and refines the code by addressing errors; and (2) LazyRAG, a retrieval-augmented generation technique that retrieves relevant code snippets or documentation from external resources (e.g., GitHub, DataPrep) to enhance code quality. Extensive experiments show that MegaTran improves accuracy by +2.2% to +26.1% over state-of-the-art (SoTA) methods.
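The two-stage flow described in the abstract can be pictured with a minimal sketch. This is not the authors' implementation. The fine-tuned lightweight LLM and the powerful LLM are replaced by hypothetical toy stand-ins (`toy_classify`, `toy_generate`). The "sanity check" is assumed here to mean executing the generated code against a few input/output examples and feeding any error back to the generator. LazyRAG retrieval is omitted entirely. All function and field names are illustrative.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass
class StrongPrompt:
    transform_type: str  # type predicted by the (hypothetical) lightweight model
    description: str     # structured task description passed to the code generator

def weak_to_strong(weak_prompt: str, classify: Callable[[str], str]) -> StrongPrompt:
    """Stage 1 (sketch): turn a loose user request into a structured prompt."""
    ttype = classify(weak_prompt)
    description = f"Task type: {ttype}. Instruction: {weak_prompt.strip()}"
    return StrongPrompt(ttype, description)

def sanity_check(code: str, examples: List[Tuple[str, str]]) -> Optional[str]:
    """Run code that defines transform(x) on examples; return an error message or None.
    Assumption: checks are example-based. exec() is for illustration only; a real
    system would sandbox untrusted generated code."""
    namespace: dict = {}
    try:
        exec(code, namespace)
        fn = namespace["transform"]
        for inp, expected in examples:
            got = fn(inp)
            if got != expected:
                return f"transform({inp!r}) returned {got!r}, expected {expected!r}"
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return None

def prompt_to_code(prompt: StrongPrompt,
                   generate: Callable[[str, Optional[str]], str],
                   examples: List[Tuple[str, str]],
                   max_rounds: int = 3) -> Tuple[str, int]:
    """Stage 2 (sketch): generate code, check it, and retry with the error as feedback."""
    feedback: Optional[str] = None
    code = ""
    for round_no in range(1, max_rounds + 1):
        code = generate(prompt.description, feedback)
        feedback = sanity_check(code, examples)
        if feedback is None:
            return code, round_no
    raise RuntimeError(f"no passing code after {max_rounds} rounds: {feedback}")

# Hypothetical stand-ins for the two LLMs.
def toy_classify(text: str) -> str:
    return "date_format" if "date" in text.lower() else "other"

def toy_generate(description: str, feedback: Optional[str]) -> str:
    if feedback is None:  # first attempt: plausible but wrong
        return "def transform(x):\n    return x.replace('/', '-')\n"
    return ("def transform(x):\n"   # repaired attempt after seeing the error
            "    m, d, y = x.split('/')\n"
            "    return f'{y}-{m}-{d}'\n")

if __name__ == "__main__":
    examples = [("03/14/2024", "2024-03-14")]
    prompt = weak_to_strong("convert the date to ISO format", toy_classify)
    code, rounds = prompt_to_code(prompt, toy_generate, examples)
    print(prompt.transform_type, rounds)
```

In this toy run the first generated function produces `03-14-2024`, the check reports the mismatch, and the second attempt passes. Separating a cheap classification step from an expensive generation step is the cost-saving idea the abstract highlights.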
PVLDB is part of the VLDB Endowment Inc.