Research
Parallel Causal Associative Fields: Gated Sparse Memory for Long-Context Language Modeling
The paper introduces the Parallel Causal Associative Field (PCAF), an architecture for long-context language modeling built around a parallel content-addressed memory. With 303M parameters and a context length of 2048, PCAF-semantic reaches a perplexity of 36.31 on WikiText-103, outperforming a matched dense Transformer, while processing 0.61-0.62M tokens/s. The design gives sparse access to long contexts without the limitations of a fixed recurrent state, which is relevant to practitioners scaling language models to longer inputs.
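To put the reported perplexity in more familiar terms, it can be converted to an average per-token loss. The sketch below relies only on the standard definition of perplexity as the exponential of the mean negative log-likelihood in nats. The function names are illustrative and do not come from the paper. Under that definition, a perplexity of 36.31 corresponds to roughly 3.59 nats per token.

```python
import math


def perplexity_from_nll(total_nll_nats: float, num_tokens: int) -> float:
    """Perplexity = exp(mean negative log-likelihood per token, in nats)."""
    if num_tokens <= 0:
        raise ValueError("num_tokens must be positive")
    return math.exp(total_nll_nats / num_tokens)


def nll_from_perplexity(ppl: float) -> float:
    """Inverse mapping: mean per-token loss in nats for a given perplexity."""
    if ppl <= 0:
        raise ValueError("perplexity must be positive")
    return math.log(ppl)


def nats_to_bits(nats: float) -> float:
    """Convert a loss in nats to bits (divide by ln 2)."""
    return nats / math.log(2)


if __name__ == "__main__":
    reported_ppl = 36.31  # PCAF-semantic on WikiText-103, as stated in the summary
    loss_nats = nll_from_perplexity(reported_ppl)
    print(f"mean loss: {loss_nats:.3f} nats/token, "
          f"{nats_to_bits(loss_nats):.3f} bits/token")
```

The conversion is useful when comparing against training logs, which usually report cross-entropy loss rather than perplexity.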
language modeling, transformers, long-context