Research
P3B3: A Multi-Turn Conversational Benchmark for Measuring European and Brazilian Portuguese Variety Bias in LLMs
The article introduces P3B3, a benchmark for evaluating how Large Language Models (LLMs) handle the European and Brazilian varieties of Portuguese. It provides a set of curated multi-turn conversational prompts and an evaluation framework for measuring variety bias, and its results reveal a significant preference for Brazilian Portuguese in existing models. This finding underscores the need for practitioners to account for linguistic variety in training datasets so that models perform equitably across both varieties.
language bias, LLM, Portuguese, P3B3