Coding · arXiv cs.CL · 16 d ago

Source-Grounded Data Generation for Text-to-JSON Learning

STAGE (Spreadsheet-grounded Text-to-JSON Artifact GEneration) is a data generation pipeline for training large language models (LLMs) to produce structured JSON output from unstructured text. On the STAGE-Eval benchmark (851 examples), the Qwen3-4B model's exact match rate rises from 31.37% to 74.27% and its value accuracy from 45.46% to 90.69%. For practitioners, this points to a more reliable and scalable source of training data for text-to-JSON tasks, which makes it easier to feed unstructured data into automated systems.
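The summary does not say how the two reported metrics are computed, so the following is only a minimal sketch of one plausible reading. It assumes exact match means the parsed prediction equals the reference JSON object, and value accuracy means the fraction of reference leaf values that the prediction reproduces at the same path. The function names (`flatten`, `exact_match`, `value_accuracy`) and the sample record are hypothetical and are not taken from the paper.

```python
import json

_MISSING = object()  # sentinel so an absent path never matches a real null


def flatten(obj, prefix=""):
    """Flatten nested JSON into a {path: leaf_value} mapping."""
    items = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            items.update(flatten(value, path))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            items.update(flatten(value, f"{prefix}[{index}]"))
    else:
        items[prefix] = obj
    return items


def exact_match(pred_text, gold):
    """Assumed definition: the parsed prediction equals the reference object."""
    try:
        return json.loads(pred_text) == gold
    except json.JSONDecodeError:
        return False


def value_accuracy(pred_text, gold):
    """Assumed definition: share of reference leaves matched at the same path."""
    try:
        pred = json.loads(pred_text)
    except json.JSONDecodeError:
        return 0.0
    gold_flat = flatten(gold)
    if not gold_flat:
        # No leaves to compare; fall back to whole-object equality.
        return 1.0 if pred == gold else 0.0
    pred_flat = flatten(pred)
    hits = sum(
        1 for path, value in gold_flat.items()
        if pred_flat.get(path, _MISSING) == value
    )
    return hits / len(gold_flat)


# Hypothetical example: one of three leaf values is wrong.
gold = {"name": "Ada", "items": [{"qty": 2}, {"qty": 3}]}
pred = '{"name": "Ada", "items": [{"qty": 2}, {"qty": 5}]}'
em = exact_match(pred, gold)
va = value_accuracy(pred, gold)
```

Under these assumed definitions, a prediction with a single wrong field scores zero on exact match but still earns partial credit on value accuracy, which would be consistent with the value accuracy figures above being higher than the exact match figures.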

text-to-json · data generation · llm
