Coding
findsylls: A Language-Agnostic Toolkit for Syllable-Level Speech Tokenization and Embedding
findsylls is a language-agnostic toolkit for syllable-level speech tokenization and embedding that brings multiple syllable detection methods under a unified interface. It supports syllable segmentation, embedding extraction, and multi-granular evaluation, which allows controlled comparisons of algorithms and representations. For practitioners, it standardizes syllabification across diverse languages, improving reproducibility and supporting research in both high-resource and under-resourced linguistic contexts.
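To make the idea of evaluating syllable segmentation concrete, here is a minimal sketch of a common boundary-level metric: precision, recall, and F1 with a time tolerance. It does not use the findsylls API, which this summary does not describe. The function name `boundary_prf`, the 50 ms default tolerance, and the one-to-one matching rule are all illustrative assumptions.

```python
from typing import Sequence, Tuple


def boundary_prf(
    reference: Sequence[float],
    predicted: Sequence[float],
    tolerance: float = 0.05,  # assumed 50 ms window, not taken from the article
) -> Tuple[float, float, float]:
    """Precision, recall and F1 for predicted boundaries (times in seconds).

    A predicted boundary counts as a hit when it lies within `tolerance`
    of a reference boundary that has not already been matched.
    """
    ref = sorted(reference)
    pred = sorted(predicted)
    hits = 0
    i = j = 0
    # Two-pointer sweep over both sorted lists; each boundary is matched at most once.
    while i < len(ref) and j < len(pred):
        diff = pred[j] - ref[i]
        if abs(diff) <= tolerance:
            hits += 1
            i += 1
            j += 1
        elif diff < 0:
            j += 1  # prediction is too early for this and all later references
        else:
            i += 1  # no remaining prediction is close enough to this reference
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical boundary times, for illustration only.
    print(boundary_prf([0.20, 0.55, 0.90], [0.22, 0.57, 1.30]))
```

Because the function only needs two lists of boundary times, it can score the output of any segmentation algorithm in the same way, which is the kind of controlled comparison the summary refers to.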
speech tokenization, embedding