ai-digest.dev
Research · arXiv cs.AI · 14 d ago

TOTEN: Knowledge-Based Ontological Tokenization of Physical Quantities and Technical Notation in Brazilian Portuguese

The article introduces TOTEN, a knowledge-based ontological tokenization framework for Brazilian Portuguese. It aims to improve on traditional Byte-Pair Encoding by incorporating a formal ontology of engineering entities. Its architecture combines a classification function with a family of instantiators, which together produce a structured representation of physical quantities and technical notation. The reported unit ontological atomicity and numerical reconstruction scores range from 0.775 to 0.904 in a comparison against eight state-of-the-art baselines. For practitioners, the framework is meant to improve the semantic handling of technical content, which could make AI applications in engineering and scientific domains more accurate and robust.
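To make the "classification function plus instantiator family" idea concrete, here is a minimal Python sketch. It is not the TOTEN implementation: the names `UNITS`, `Token`, `classify`, `INSTANTIATORS`, and `tokenize` are hypothetical, and a small unit table with a regular expression stands in for the formal ontology the article describes. It only illustrates how a quantity such as "12,5 kN" (with the decimal comma used in Brazilian Portuguese) could be kept as one structured token whose numeric value can be reconstructed.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical mini-ontology (assumption): unit symbol -> physical dimension.
UNITS = {"kN": "force", "MPa": "pressure", "mm": "length", "m": "length", "kg": "mass"}

# Longest symbols first so that "mm" is tried before "m".
_UNIT_ALT = "|".join(sorted(map(re.escape, UNITS), key=len, reverse=True))
_NUMBER = r"\d+(?:,\d+)?"  # integer or decimal-comma number, e.g. 12,5
_QUANTITY = re.compile(rf"({_NUMBER})\s*({_UNIT_ALT})\b")
# Pre-segmentation: a whole quantity, else a word, else a single punctuation mark.
_PIECE = re.compile(rf"{_NUMBER}\s*(?:{_UNIT_ALT})\b|\w+|[^\w\s]")


@dataclass(frozen=True)
class Token:
    kind: str                        # "quantity" or "word"
    text: str                        # original surface form
    value: Optional[float] = None    # reconstructed number (quantities only)
    unit: Optional[str] = None       # unit symbol, kept atomic
    dimension: Optional[str] = None  # ontology class of the unit


def classify(span: str) -> str:
    """Classification function: map a text span to an ontological kind."""
    return "quantity" if _QUANTITY.fullmatch(span) else "word"


def make_quantity(span: str) -> Token:
    """Instantiator for quantities: parse the number and keep the unit whole."""
    number, unit = _QUANTITY.fullmatch(span).groups()
    return Token("quantity", span, float(number.replace(",", ".")), unit, UNITS[unit])


def make_word(span: str) -> Token:
    """Fallback instantiator: keep the span as an unstructured token."""
    return Token("word", span)


# Instantiator family (assumption): one constructor per ontological kind.
INSTANTIATORS = {"quantity": make_quantity, "word": make_word}


def tokenize(text: str) -> list:
    """Segment the text, classify each span, and instantiate a token for it."""
    return [INSTANTIATORS[classify(span)](span) for span in _PIECE.findall(text)]
```

Calling `tokenize("A viga suporta 12,5 kN e mede 300 mm.")` yields eight tokens, two of which are quantities: one with value 12.5, unit "kN", and dimension "force", the other with value 300.0, unit "mm", and dimension "length". A subword tokenizer such as BPE would be free to split the same spans at arbitrary points, which is the behaviour the article's atomicity and reconstruction scores are meant to measure.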

tokenization · ontological · brazilian-portuguese