Research
The Dynamics of Human and AI-Generated Language: How Semantics Fluctuates across Different Timescales
The article introduces a semantic-timescale analysis pipeline that converts timestamped, word-level transcripts into semantic time series, so that human and AI-generated speech can be compared on the same footing. It uses metrics such as semantic specificity, derived from WordNet, and contextual similarity, measured with SBERT embeddings. The analysis finds that segments with a longer autocorrelation window (ACW-0) contain more generic vocabulary, while segments with a shorter ACW-0 are rich in specific words. The method gives practitioners a way to analyze how the semantic content of LLM outputs is organized over time.
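To make the ACW-0 idea concrete, here is a minimal Python sketch, not the article's implementation. It assumes ACW-0 is the first lag at which the autocorrelation function of a semantic time series drops to zero or below, which is the usual definition of the measure. The helper names `to_time_series` and `acw0`, the fixed-width binning, and the linear interpolation of empty bins are all illustrative assumptions; the summary does not say how the article builds its time series.

```python
import numpy as np


def to_time_series(timestamps, scores, bin_width):
    """Average per-word scores into fixed-width time bins.

    Hypothetical helper: the binning scheme is an assumption, not the
    article's documented method. Empty bins are linearly interpolated.
    """
    t = np.asarray(timestamps, dtype=float)
    s = np.asarray(scores, dtype=float)
    idx = np.floor((t - t.min()) / bin_width).astype(int)
    n_bins = idx.max() + 1
    sums = np.bincount(idx, weights=s, minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    filled = counts > 0
    series = np.full(n_bins, np.nan)
    series[filled] = sums[filled] / counts[filled]
    if (~filled).any():
        series[~filled] = np.interp(
            np.flatnonzero(~filled), np.flatnonzero(filled), series[filled]
        )
    return series


def acw0(series, dt=1.0):
    """Return the first lag (in units of dt) where the autocorrelation <= 0.

    Assumes the common definition of ACW-0 as the first zero crossing of
    the autocorrelation function. Returns None when the series is constant
    or the autocorrelation never reaches zero.
    """
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Keep only non-negative lags of the full autocorrelation.
    acf = np.correlate(x, x, mode="full")[n - 1:]
    if acf[0] == 0:
        return None
    acf = acf / acf[0]
    crossings = np.flatnonzero(acf <= 0)
    if crossings.size == 0:
        return None
    return crossings[0] * dt


# Usage: a slowly varying series should yield a longer ACW-0 than a fast one.
steps = np.arange(400)
slow = np.sin(2 * np.pi * steps / 40)
fast = np.sin(2 * np.pi * steps / 8)
print(acw0(slow), acw0(fast))
```

In this sketch, the per-word scores passed to `to_time_series` would be something like a WordNet-based specificity value or an SBERT similarity to the preceding context. Computing `acw0` per segment then gives the quantity that the summary relates to generic versus specific vocabulary.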
llm, language, semantics, time-series