Research · arXiv cs.AI · 10 d ago

P3B3: A Multi-Turn Conversational Benchmark for Measuring European and Brazilian Portuguese Variety Bias in LLMs

The article introduces P3B3, a benchmark for evaluating how Large Language Models (LLMs) treat European versus Brazilian Portuguese. It provides a set of curated multi-turn conversational prompts and an evaluation framework for measuring variety bias, and it reveals a significant preference for Brazilian Portuguese in existing models. The finding underscores the need for practitioners to account for linguistic diversity in training datasets so that performance is equitable across language varieties.

language bias · LLM · Portuguese · P3B3
