EEG Speech Emotion Conversion Demo

Comparison of Speech Emotion Conversion Models Based on EEG Emotional Features (Audio Sample Demonstration)

Abstract

Speech emotion conversion aims to transform the emotional expression of source speech into a target emotion. Although recent studies mainly rely on speech-derived emotional representations, the use of electroencephalography (EEG) signals as emotional conditioning inputs remains relatively underexplored. This work presents an EEG-conditioned speech emotion conversion framework that incorporates EEG-derived emotional representations through cross-modal alignment. To address the modality discrepancy between EEG and speech signals, a three-stage training strategy is adopted, including speech-side pretraining, EEG–speech representation alignment, and EEG-conditioned joint optimization. Different EEG emotion encoder architectures are further investigated to model emotion-related neural representations. Experiments on a synchronized EEG–speech dataset show that explicit EEG emotional representation modeling provides more consistent emotional guidance than directly feeding time-aligned EEG features into the emotion conversion model without explicit emotional modeling. Among the evaluated architectures, the CNN+Transformer encoder achieves the best overall performance, yielding the lowest average emotion embedding distance of 0.1958 and the highest direction consistency of 0.4333. Subjective evaluations further demonstrate improved emotional expressiveness and speech naturalness, with an overall mean opinion score (MOS) of 4.1. These results support the feasibility of incorporating EEG signals as emotional conditioning inputs for speech emotion conversion.
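The two objective metrics quoted in the abstract can be illustrated with a minimal sketch. The definitions used here are assumptions for illustration only and are not taken from the paper or its repository: emotion embedding distance is taken to be the cosine distance between the emotion embedding of the converted speech and that of a target-emotion reference, and direction consistency is taken to be the cosine similarity between the shift actually produced (converted minus source) and the desired shift (target minus source). The function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def embedding_distance(converted, target):
    """Assumed definition: cosine distance between converted and target emotion embeddings."""
    return 1.0 - cosine_similarity(converted, target)


def direction_consistency(source, converted, target):
    """Assumed definition: cosine similarity between the achieved and desired embedding shifts."""
    achieved = [c - s for c, s in zip(converted, source)]
    desired = [t - s for t, s in zip(target, source)]
    return cosine_similarity(achieved, desired)


# Toy 2-D embeddings, purely illustrative (not real model outputs).
source = [1.0, 0.0]      # neutral input
target = [0.0, 1.0]      # target-emotion reference
converted = [0.5, 0.5]   # output that moved halfway toward the target

distance = embedding_distance(converted, target)
consistency = direction_consistency(source, converted, target)
```

Under these assumed definitions, a lower distance means the converted speech lies closer to the target emotion, and a consistency near 1 means the conversion moved the embedding in the intended direction, even if it did not travel the full way.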

Code

https://github.com/roman1115/eeg-sec

Samples

S1 (seen)

N2H (Neutral-to-Happy)

neutral
Baseline
CNN+TCN
CNN+Transformer

N2A (Neutral-to-Angry)

neutral
Baseline
CNN+TCN
CNN+Transformer

N2S (Neutral-to-Sad)

neutral
Baseline
CNN+TCN
CNN+Transformer

S2 (seen)

N2H (Neutral-to-Happy)

neutral
Baseline
CNN+TCN
CNN+Transformer

N2A (Neutral-to-Angry)

neutral
Baseline
CNN+TCN
CNN+Transformer

N2S (Neutral-to-Sad)

neutral
Baseline
CNN+TCN
CNN+Transformer

U1 (unseen)

N2H (Neutral-to-Happy)

neutral
Baseline
CNN+TCN
CNN+Transformer

N2A (Neutral-to-Angry)

neutral
Baseline
CNN+TCN
CNN+Transformer

N2S (Neutral-to-Sad)

neutral
Baseline
CNN+TCN
CNN+Transformer
