Research
Know Your Limits : On the Faithfulness of LLMs as Solvers and Autoformalizers in Legal Reasoning
This study investigates the faithfulness of Large Language Models (LLMs) in legal reasoning by comparing LLM classification, LLM-based Formal Reasoning, and solver-based Formal Reasoning using the Z3 SMT solver on a re-annotated subset of ContractNLI. The findings indicate that while LLM-based Formal Reasoning achieves the highest benchmark performance, it does not guarantee faithful reasoning, highlighting issues such as scope laundering, implicit constraint blindness, and program synthesis failures. These results underscore a critical gap between benchmark accuracy and logical faithfulness, which is essential for practitioners relying on LLMs for formal reasoning tasks.
llmlegal reasoningfaithfulness