ai-digest.dev
Coding · arXiv cs.CL · 16 d ago

Source-Grounded Data Generation for Text-to-JSON Learning

The paper introduces STAGE (Spreadsheet-grounded Text-to-JSON Artifact GEneration), a data generation pipeline that uses large language models (LLMs) to build training data for turning unstructured text into structured JSON. On the STAGE-Eval benchmark (851 examples), the Qwen3-4B model improves from 31.37% to 74.27% exact match and from 45.46% to 90.69% value accuracy, gains of 42.90 and 45.23 percentage points respectively. For practitioners, the result suggests that source-grounded generation can make training data for text-to-JSON tasks more reliable and scalable, which in turn eases the integration of unstructured data into automated systems.
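The summary reports exact match and value accuracy without defining them. As a rough illustration only, the sketch below shows one plausible way such metrics could be computed for a single prediction; the function names `flatten` and `score` and both metric definitions (whole-object equality, and the fraction of reference leaf values reproduced at the same path) are assumptions made here, not the definitions used by STAGE-Eval.

```python
import json


def flatten(obj, prefix=()):
    """Yield (path, leaf_value) pairs for a nested JSON-like object."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from flatten(value, prefix + (key,))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            yield from flatten(value, prefix + (index,))
    else:
        yield prefix, obj


def score(pred_text, ref_text):
    """Return (exact_match, value_accuracy) for one prediction.

    Assumed definitions, for illustration only:
    - exact_match: parsed prediction equals the parsed reference
    - value_accuracy: share of reference leaf values found at the
      same path in the prediction
    """
    ref = json.loads(ref_text)
    try:
        pred = json.loads(pred_text)
    except json.JSONDecodeError:
        # Output that is not valid JSON scores zero on both metrics.
        return False, 0.0

    exact = pred == ref
    ref_leaves = dict(flatten(ref))
    pred_leaves = dict(flatten(pred))
    if not ref_leaves:
        # Reference has no leaf values; fall back to exact match.
        return exact, float(exact)

    hits = sum(
        1
        for path, value in ref_leaves.items()
        if path in pred_leaves and pred_leaves[path] == value
    )
    return exact, hits / len(ref_leaves)


if __name__ == "__main__":
    reference = '{"name": "Ada", "address": {"city": "London"}}'
    prediction = '{"name": "Ada", "address": {"city": "Paris"}}'
    print(score(prediction, reference))
```

Under these assumed definitions, a prediction can miss exact match while still earning partial value accuracy, which would be consistent with the value-accuracy figures in the summary being higher than the exact-match figures.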

text-to-json · data generation · llm · relevance 0.00 · engagement 0.00