Authors: Mehrdad Dadgostar, Lindsay C. Hanford, Maryam Tavakoli, Steven E. Arnold, David H. Salat, Tatiana Sitnikova, Pia Kivisakk Webb, Jordan R. Green, Hengru Liu, Brian D. Richburg, Mariam Tkeshelashvili, Marziye Eshghi

Abstract

INTRODUCTION We tested whether spontaneous speech acoustics provide a scalable digital marker of biologically defined Alzheimer’s disease (AD) risk.

METHODS Forty-nine cognitively unimpaired older adults were stratified, within APOE genotype, into Low-, Moderate-, and High-Risk groups based on log₁₀-transformed plasma p-tau217. Acoustic features extracted from spontaneous speech were entered into multiclass support vector machine (SVM) classifiers with leave-one-out cross-validation, with and without genetic-algorithm feature selection and age as an additional feature. Parallel models using neuropsychological measures were evaluated for comparison. Feature contributions were interpreted with Shapley additive explanations (SHAP).
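The core evaluation scheme described above (a multiclass SVM scored with leave-one-out cross-validation) can be sketched as follows. This is a minimal illustration using scikit-learn with synthetic stand-in data; the feature matrix, labels, and kernel choice are placeholders, not the study's actual acoustic features or model configuration.

```python
# Sketch of a multiclass SVM with leave-one-out cross-validation (LOOCV).
# X and y are synthetic stand-ins for the 49 participants' acoustic
# features and three-level risk labels (0/1/2 = Low/Moderate/High).
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(49, 10))        # 49 participants x 10 placeholder features
y = rng.integers(0, 3, size=49)      # synthetic three-group labels

# LOOCV: each participant is held out once; the model is trained on the rest.
loo = LeaveOneOut()
correct = 0
for train_idx, test_idx in loo.split(X):
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    clf.fit(X[train_idx], y[train_idx])
    correct += int(clf.predict(X[test_idx])[0] == y[test_idx][0])

accuracy = correct / len(y)          # chance is ~0.333 for balanced three-group data
print(f"LOO accuracy: {accuracy:.2f}")
```

In this scheme every participant serves as the test case exactly once, so the reported accuracy is an average over 49 single-subject predictions; the standardization step is refit inside each fold to avoid leaking held-out information.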

RESULTS Speech-based models substantially outperformed cognition-only models and exceeded the chance level for three-group classification (33.3%), achieving up to 77% accuracy compared with 47% for neuropsychological models. SHAP analyses identified a compact, stage-dependent acoustic signature dominated by voice-quality, spectral-envelope, and formant-bandwidth features, with age contributing secondary effects.

DISCUSSION Spontaneous speech acoustics capture p-tau217/APOE-defined AD risk despite preserved cognition, supporting speech as a scalable, biologically grounded biomarker for preclinical AD risk stratification.

doi: https://doi.org/10.64898/2026.01.15.26344226