Research
From Genes to Tokens: a GWAS-inspired Approach for Interpretable Stylometric Analysis
The paper presents a novel stylometric analysis method inspired by genome-wide association studies (GWAS): tokens are treated as "genes" and authorship as the "phenotype", and logistic regression is used to assess the association between the two. The approach was tested on English, German, and Russian corpora, where it identified statistically significant lexical markers that differentiate individual authors. It gives practitioners an interpretable tool for authorship attribution and linguistic analysis.
stylometric analysis, GWAS, interpretation
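
To make the summary's core idea concrete, here is a minimal sketch of a GWAS-style association test for a single token. It is an illustration under stated assumptions, not the paper's implementation: the token names, the per-document frequencies, and the helper names (`fit_logistic`, `token_association`) are all hypothetical, and the likelihood-ratio test is one common way to obtain a p-value from a logistic regression, which the paper may or may not use.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def log_likelihood(x, y, b0, b1):
    total = 0.0
    for xi, yi in zip(x, y):
        p = min(max(sigmoid(b0 + b1 * xi), 1e-12), 1 - 1e-12)
        total += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total

def fit_logistic(x, y, iters=50):
    """Fit P(y=1) = sigmoid(b0 + b1*x) by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b0 + b1 * xi)
            r, w = yi - p, p * (1 - p)
            g0 += r
            g1 += r * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Tiny diagonal damping keeps the 2x2 system invertible.
        h00 += 1e-9
        h11 += 1e-9
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < 1e-10:
            break
    return b0, b1

def token_association(freqs, is_author):
    """Return (slope, p_value) for one token via a likelihood-ratio test."""
    b0, b1 = fit_logistic(freqs, is_author)
    ll_full = log_likelihood(freqs, is_author, b0, b1)
    # Null model: intercept only, i.e. the overall share of the author's documents.
    share = sum(is_author) / len(is_author)
    ll_null = log_likelihood(freqs, is_author, math.log(share / (1 - share)), 0.0)
    stat = max(2.0 * (ll_full - ll_null), 0.0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    return b1, math.erfc(math.sqrt(stat / 2.0))

# Hypothetical data: token frequency per 1,000 words in 12 documents,
# the first six written by the target author (label 1), the rest by others (label 0).
labels = [1] * 6 + [0] * 6
toy_tokens = {
    "whilst": [4.0, 5.5, 3.0, 6.0, 2.0, 5.0, 1.0, 2.5, 0.5, 3.5, 1.5, 2.0],
    "the": [60, 62, 58, 61, 59, 60, 61, 59, 60, 62, 58, 60],
}
results = {tok: token_association(f, labels) for tok, f in toy_tokens.items()}
for tok, (slope, p) in results.items():
    print(f"{tok}: slope={slope:.3f}, p={p:.4f}")
```

In this toy setup the author-specific token should come out with a positive slope and a small p-value, while the function word used equally by everyone should show no association. A real analysis would repeat the test across the whole vocabulary and would likely need a multiple-testing correction, which this sketch omits.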