<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Speech Emotion Recognition on Speech/Audio Paper Digest</title>
    <link>https://nanless.github.io/audio-paper-digest-blog/tags/%E8%AF%AD%E9%9F%B3%E6%83%85%E6%84%9F%E8%AF%86%E5%88%AB/</link>
    <description>Recent content in Speech Emotion Recognition on Speech/Audio Paper Digest</description>
    <generator>Hugo</generator>
    <language>zh-cn</language>
    <lastBuildDate>Wed, 29 Apr 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://nanless.github.io/audio-paper-digest-blog/tags/%E8%AF%AD%E9%9F%B3%E6%83%85%E6%84%9F%E8%AF%86%E5%88%AB/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Acoustic and Facial Markers of Perceived Conversational Success in Spontaneous Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-acoustic-and-facial-markers-of-perceived/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-acoustic-and-facial-markers-of-perceived/</guid>
      <description>Speech Emotion Recognition | 6.0/10</description>
    </item>
    <item>
      <title>ADH-VA: Adaptive Directed-Hypergraph Convolution with VA Contrastive Learning for Multimodal Conversational Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-adh-va-adaptive-directed-hypergraph-convolution/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-adh-va-adaptive-directed-hypergraph-convolution/</guid>
      <description>Speech Emotion Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Affect-Jigsaw: Integrating Core and Peripheral Emotions for Harmonious Fine-Grained Multimodal Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-affect-jigsaw-integrating-core-and-peripheral/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-affect-jigsaw-integrating-core-and-peripheral/</guid>
      <description>Speech Emotion Recognition | 8.0/10</description>
    </item>
    <item>
      <title>AMBER2: Dual Ambiguity-Aware Emotion Recognition Applied to Speech and Text</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-amber2-dual-ambiguity-aware-emotion-recognition/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-amber2-dual-ambiguity-aware-emotion-recognition/</guid>
      <description>Speech Emotion Recognition | 8.0/10</description>
    </item>
    <item>
      <title>APKD: Aligned And Paced Knowledge Distillation Towards Lightweight Heterogeneous Multimodal Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-apkd-aligned-and-paced-knowledge-distillation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-apkd-aligned-and-paced-knowledge-distillation/</guid>
      <description>Emotion Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Attention-Weighted Centered Kernel Alignment for Knowledge Distillation in Large Audio-Language Models Applied To Speech Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-attention-weighted-centered-kernel-alignment-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-attention-weighted-centered-kernel-alignment-for/</guid>
      <description>Speech Emotion Recognition | 8.0/10</description>
    </item>
    <item>
      <title>B-GRPO: Unsupervised Speech Emotion Recognition Based on Batched-Group Relative Policy Optimization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-b-grpo-unsupervised-speech-emotion-recognition/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-b-grpo-unsupervised-speech-emotion-recognition/</guid>
      <description>Speech Emotion Recognition | 6.5/10</description>
    </item>
    <item>
      <title>Behind the Scenes: Mechanistic Interpretability of Lora-Adapted Whisper for Speech Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-behind-the-scenes-mechanistic-interpretability-of/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-behind-the-scenes-mechanistic-interpretability-of/</guid>
      <description>Speech Emotion Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Bimodal Fusion Framework for Dynamic Facial Expression Recognition In-The-Wild</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-bimodal-fusion-framework-for-dynamic-facial/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-bimodal-fusion-framework-for-dynamic-facial/</guid>
      <description>Speech Emotion Recognition | 7.0/10</description>
    </item>
    <item>
      <title>Clue2Emo: A Brain-Inspired Framework for Open-Vocabulary Multimodal Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-clue2emo-a-brain-inspired-framework-for-open/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-clue2emo-a-brain-inspired-framework-for-open/</guid>
      <description>Speech Emotion Recognition | 8.5/10</description>
    </item>
    <item>
      <title>Context-Aware Dynamic Graph Learning for Multimodal Emotion Recognition with Missing Modalities</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-context-aware-dynamic-graph-learning-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-context-aware-dynamic-graph-learning-for/</guid>
      <description>Speech Emotion Recognition | 8.8/10</description>
    </item>
    <item>
      <title>DDSR-Net: Robust Multimodal Sentiment Analysis via Dynamic Modality Reliability Assessment</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ddsr-net-robust-multimodal-sentiment-analysis-via/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ddsr-net-robust-multimodal-sentiment-analysis-via/</guid>
      <description>Speech Emotion Recognition | 6.5/10</description>
    </item>
    <item>
      <title>DGSDNet: Dual-Graph Spectral Diffusion Network for Incomplete Multimodal Emotion Recognition in Conversations</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dgsdnet-dual-graph-spectral-diffusion-network-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dgsdnet-dual-graph-spectral-diffusion-network-for/</guid>
      <description>Speech Emotion Recognition | 8.0/10</description>
    </item>
    <item>
      <title>Diffemotalk: Audio-Driven Facial Animation with Fine-Grained Emotion Control via Diffusion Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-diffemotalk-audio-driven-facial-animation-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-diffemotalk-audio-driven-facial-animation-with/</guid>
      <description>Speech Emotion Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Do You Hear What I Mean? Quantifying the Instruction-Perception GAP in Instruction-Guided Expressive Text-to-Speech Systems</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-do-you-hear-what-i-mean-quantifying-the/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-do-you-hear-what-i-mean-quantifying-the/</guid>
      <description>Speech Synthesis | 8.0/10</description>
    </item>
    <item>
      <title>Dynamic Balanced Cross-Modal Attention with Gated Sequence Restoration: Towards Robust Multimodal Sentiment Analysis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dynamic-balanced-cross-modal-attention-with-gated/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dynamic-balanced-cross-modal-attention-with-gated/</guid>
      <description>Cross-Modal | 7.5/10</description>
    </item>
    <item>
      <title>ECSA: Dual-Branch Emotion Compensation for Emotion-Consistent Speaker Anonymization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ecsa-dual-branch-emotion-compensation-for-emotion/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ecsa-dual-branch-emotion-compensation-for-emotion/</guid>
      <description>Speaker Anonymization | 8.5/10</description>
    </item>
    <item>
      <title>Emo-TTA: Improving Test-Time Adaptation of Audio-Language Models for Speech Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emo-tta-improving-test-time-adaptation-of-audio/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emo-tta-improving-test-time-adaptation-of-audio/</guid>
      <description>Speech Emotion Recognition | 7.0/10</description>
    </item>
    <item>
      <title>EMORL-TTS: Reinforcement Learning for Fine-Grained Emotion Control in LLM-based TTS</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emorl-tts-reinforcement-learning-for-fine-grained/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emorl-tts-reinforcement-learning-for-fine-grained/</guid>
      <description>Speech Synthesis | 8.5/10</description>
    </item>
    <item>
      <title>Emotion-Aligned Generation in Diffusion Text to Speech Models Via Preference-Guided Optimization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emotion-aligned-generation-in-diffusion-text-to/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emotion-aligned-generation-in-diffusion-text-to/</guid>
      <description>Speech Synthesis | 8.0/10</description>
    </item>
    <item>
      <title>Emotional Dimension Control in Language Model-Based Text-To-Speech: Spanning a Broad Spectrum of Human Emotions</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emotional-dimension-control-in-language-model/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emotional-dimension-control-in-language-model/</guid>
      <description>Speech Synthesis | 7.5/10</description>
    </item>
    <item>
      <title>EmoTri-RL: Emotion- and Cause-Aware Reinforcement Learning for Multi-Modal Empathetic Dialogue</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emotri-rl-emotion-and-cause-aware-reinforcement/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emotri-rl-emotion-and-cause-aware-reinforcement/</guid>
      <description>Speech Emotion Recognition | 7.0/10</description>
    </item>
    <item>
      <title>Encoding Emotion Through Self-Supervised Eye Movement Reconstruction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-encoding-emotion-through-self-supervised-eye/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-encoding-emotion-through-self-supervised-eye/</guid>
      <description>Speech Emotion Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Evaluating Emotion Recognition in Spoken Language Models on Emotionally Incongruent Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-evaluating-emotion-recognition-in-spoken-language/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-evaluating-emotion-recognition-in-spoken-language/</guid>
      <description>Speech Emotion Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Expressive Voice Conversion with Controllable Emotional Intensity</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-expressive-voice-conversion-with-controllable/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-expressive-voice-conversion-with-controllable/</guid>
      <description>Voice Conversion | 7.5/10</description>
    </item>
    <item>
      <title>FIDIC: Fine-Grained Conversational Emotion Recognition via Individual Differences in Inertia and Contagion</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fidicfine-grained-conversational-emotion/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fidicfine-grained-conversational-emotion/</guid>
      <description>Speech Emotion Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Gen-SER: When the Generative Model Meets Speech Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-gen-ser-when-the-generative-model-meets-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-gen-ser-when-the-generative-model-meets-speech/</guid>
      <description>Speech Emotion Recognition | 6.5/10</description>
    </item>
    <item>
      <title>Graph-based Modality Alignment for Robustness in Conversational Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-graph-based-modality-alignment-for-robustness-in/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-graph-based-modality-alignment-for-robustness-in/</guid>
      <description>Speech Emotion Recognition | 8.0/10</description>
    </item>
    <item>
      <title>ICASSP 2026 - Speech Emotion Recognition Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-066/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-066/</guid>
      <description>49 ICASSP 2026 papers in the Speech Emotion Recognition area</description>
    </item>
    <item>
      <title>InconVAD: A Two-Stage Dual-Tower Framework for Multimodal Emotion Inconsistency Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-inconvad-a-two-stage-dual-tower-framework-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-inconvad-a-two-stage-dual-tower-framework-for/</guid>
      <description>Speech Emotion Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Input-Adaptive Differentiable Filterbanks via Hypernetworks for Robust Speech Processing</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-input-adaptive-differentiable-filterbanks-via/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-input-adaptive-differentiable-filterbanks-via/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Inter-Dialog Contrastive Learning for Multimodal Emotion Recognition in Conversations</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-inter-dialog-contrastive-learning-for-multimodal/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-inter-dialog-contrastive-learning-for-multimodal/</guid>
      <description>Speech Emotion Recognition | 7.5/10</description>
    </item>
    <item>
      <title>It Is Personal: The Importance of Personalization for Recognizing Self-Reported Emotion</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-it-is-personal-the-importance-of-personalization/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-it-is-personal-the-importance-of-personalization/</guid>
      <description>Speech Emotion Recognition | 8.0/10</description>
    </item>
    <item>
      <title>Korean aegyo speech shows systematic F1 increase to signal childlike qualities</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-korean-aegyo-speech-shows-systematic-f1-increase/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-korean-aegyo-speech-shows-systematic-f1-increase/</guid>
      <description>Speech Emotion Recognition | 6.0/10</description>
    </item>
    <item>
      <title>LETPAV: Lexicon-Enhanced Text with Progressive Audio-Visual Fusion for Multimodal Sentiment Analysis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-letpav-lexicon-enhanced-text-with-progressive/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-letpav-lexicon-enhanced-text-with-progressive/</guid>
      <description>Speech Emotion Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Leveraging Large Speech Language Models as Evaluators for Expressive Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-leveraging-large-speech-language-models-as/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-leveraging-large-speech-language-models-as/</guid>
      <description>Speech Emotion Recognition | 6.5/10</description>
    </item>
    <item>
      <title>MECap-R1: Emotion-Aware Policy with Reinforcement Learning for Multimodal Emotion Captioning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mecap-r1-emotion-aware-policy-with-reinforcement/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mecap-r1-emotion-aware-policy-with-reinforcement/</guid>
      <description>Speech Emotion Recognition | 7.5/10</description>
    </item>
    <item>
      <title>MI-Fuse: Label Fusion for Unsupervised Domain Adaptation with Closed-Source Large Audio-Language Model</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mi-fuse-label-fusion-for-unsupervised-domain/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mi-fuse-label-fusion-for-unsupervised-domain/</guid>
      <description>Speech Emotion Recognition | 8.0/10</description>
    </item>
    <item>
      <title>Mixture-of-Experts Based Soft-Label Learning for Multi-Label Speech Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mixture-of-experts-based-soft-label-learning-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mixture-of-experts-based-soft-label-learning-for/</guid>
      <description>Speech Emotion Recognition | 7.5/10</description>
    </item>
    <item>
      <title>ML-SAN: Multi-Level Speaker-Adaptive Network for Emotion Recognition in Conversations</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ml-san-multi-level-speaker-adaptive-network-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ml-san-multi-level-speaker-adaptive-network-for/</guid>
      <description>Speech Emotion Recognition | 8.0/10</description>
    </item>
    <item>
      <title>Modeling Both Intra- And Inter-Utterance Variability for Conversational Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-modeling-both-intra-and-inter-utterance/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-modeling-both-intra-and-inter-utterance/</guid>
      <description>Speech Emotion Recognition | 6.5/10</description>
    </item>
    <item>
      <title>MSF-SER: Enriching Acoustic Modeling with Multi-Granularity Semantics for Speech Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-msf-ser-enriching-acoustic-modeling-with-multi/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-msf-ser-enriching-acoustic-modeling-with-multi/</guid>
      <description>Speech Emotion Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Multi-Channel Speech Enhancement for Cocktail Party Speech Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multi-channel-speech-enhancement-for-cocktail/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multi-channel-speech-enhancement-for-cocktail/</guid>
      <description>Speech Emotion Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Multi-View Hierarchical Hypergraph Neural Network for Automatic Stuttering Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multi-view-hierarchical-hypergraph-neural-network/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multi-view-hierarchical-hypergraph-neural-network/</guid>
      <description>Speech Biomarkers | 7.5/10</description>
    </item>
    <item>
      <title>Multimodal Self-Attention Network with Temporal Alignment for Audio-Visual Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multimodal-self-attention-network-with-temporal/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multimodal-self-attention-network-with-temporal/</guid>
      <description>Speech Emotion Recognition | 8.0/10</description>
    </item>
    <item>
      <title>Multimodal Variational Graph Network for Multimodal Sentiment Analysis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multimodal-variational-graph-network-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multimodal-variational-graph-network-for/</guid>
      <description>Speech Emotion Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Plug-and-Play Emotion Graphs for Compositional Prompting in Zero-Shot Speech Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-plug-and-play-emotion-graphs-for-compositional/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-plug-and-play-emotion-graphs-for-compositional/</guid>
      <description>Speech Emotion Recognition | 7.0/10</description>
    </item>
    <item>
      <title>Prompt-Guided Mixture-of-Experts for Robust Multimodal Sentiment Analysis with Missing Modalities</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-prompt-guided-mixture-of-experts-for-robust/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-prompt-guided-mixture-of-experts-for-robust/</guid>
      <description>Speech Emotion Recognition | 8.5/10</description>
    </item>
    <item>
      <title>Rationale-Guided Learning for Multimodal Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-rationale-guided-learning-for-multimodal-emotion/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-rationale-guided-learning-for-multimodal-emotion/</guid>
      <description>Speech Emotion Recognition | 7.0/10</description>
    </item>
    <item>
      <title>Reasoning Driven Captions to Assist Noise Robust Speech Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-reasoning-driven-captions-to-assist-noise-robust/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-reasoning-driven-captions-to-assist-noise-robust/</guid>
      <description>Speech Emotion Recognition | 7.0/10</description>
    </item>
    <item>
      <title>Recovering Performance in Speech Emotion Recognition from Discrete Tokens Via Multi-Layer Fusion and Paralinguistic Feature Integration</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-recovering-performance-in-speech-emotion/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-recovering-performance-in-speech-emotion/</guid>
      <description>Speech Emotion Recognition | 6.5/10</description>
    </item>
    <item>
      <title>Scaling Ambiguity: Augmenting Human Annotation in Speech Emotion Recognition with Audio-Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-scaling-ambiguity-augmenting-human-annotation-in/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-scaling-ambiguity-augmenting-human-annotation-in/</guid>
      <description>Speech Emotion Recognition | 6.5/10</description>
    </item>
    <item>
      <title>SmoothCLAP: Soft-Target Enhanced Contrastive Language-Audio Pretraining for Affective Computing</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-smoothclap-soft-target-enhanced-contrastive/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-smoothclap-soft-target-enhanced-contrastive/</guid>
      <description>Speech Emotion Recognition | 6.5/10</description>
    </item>
    <item>
      <title>Speaker Anonymisation for Speech-Based Suicide Risk Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-speaker-anonymisation-for-speech-based-suicide/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-speaker-anonymisation-for-speech-based-suicide/</guid>
      <description>Speaker Anonymization | 7.5/10</description>
    </item>
    <item>
      <title>Speech Emotion Recognition based on Hierarchical Transformer with Shifted Windows</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-speech-emotion-recognition-based-on-hierarchical/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-speech-emotion-recognition-based-on-hierarchical/</guid>
      <description>Speech Emotion Recognition | 8.0/10</description>
    </item>
    <item>
      <title>Staged Diffusion with Hybrid Mixture-of-Experts (MOE) for Multimodal Sentiment Analysis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-staged-diffusion-with-hybrid-mixture-of-experts/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-staged-diffusion-with-hybrid-mixture-of-experts/</guid>
      <description>Speech Emotion Recognition | 8.0/10</description>
    </item>
    <item>
      <title>Stress Prediction from Temporal Emotion Trajectories in Clinical Patient-Physician Conversations</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-stress-prediction-from-temporal-emotion/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-stress-prediction-from-temporal-emotion/</guid>
      <description>Speech Emotion Recognition | 7.0/10</description>
    </item>
    <item>
      <title>StyleBench: Evaluating Speech Language Models on Conversational Speaking Style Control</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-stylebench-evaluating-speech-language-models-on/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-stylebench-evaluating-speech-language-models-on/</guid>
      <description>Benchmark | 8.5/10</description>
    </item>
    <item>
      <title>SURE: Synergistic Uncertainty-Aware Reasoning for Multimodal Emotion Recognition in Conversations</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sure-synergistic-uncertainty-aware-reasoning-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sure-synergistic-uncertainty-aware-reasoning-for/</guid>
      <description>Speech Emotion Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Synthetic yet Striking? Assessing Vocal Charisma in TTS via Perceptual and Algorithmic Measures</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-synthetic-yet-striking-assessing-vocal-charisma/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-synthetic-yet-striking-assessing-vocal-charisma/</guid>
      <description>Speech Synthesis | 7.5/10</description>
    </item>
    <item>
      <title>Temporal Graph Modeling for Speech Emotion Recognition Using LSTM-Aggregated Multigraph Networks</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-temporal-graph-modeling-for-speech-emotion/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-temporal-graph-modeling-for-speech-emotion/</guid>
      <description>Speech Emotion Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Test Time Adaptation for Speech Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-test-time-adaptation-for-speech-emotion/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-test-time-adaptation-for-speech-emotion/</guid>
      <description>Speech Emotion Recognition | 7.0/10</description>
    </item>
    <item>
      <title>Tpeformer: Temporal Patch Embedding Transformer</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tpeformer-temporal-patch-embedding-transformer/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tpeformer-temporal-patch-embedding-transformer/</guid>
      <description>Speech Emotion Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Unrequited Emotions: Investigating the Gaps in Motivation and Practice in Speech Emotion Recognition Research</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-unrequited-emotions-investigating-the-gaps-in/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-unrequited-emotions-investigating-the-gaps-in/</guid>
      <description>Speech Emotion Recognition | 8.0/10</description>
    </item>
    <item>
      <title>When Audio Matters: A Lightweight, Hierarchical Fusion Model for Speech and Non-Verbal Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-when-audio-matters-a-lightweight-hierarchical/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-when-audio-matters-a-lightweight-hierarchical/</guid>
      <description>Speech Emotion Recognition | 8.0/10</description>
    </item>
    <item>
      <title>Whisper-QF: Leveraging Dual Cross-Attention Q-Former for Speech Emotion Recognition With Multi-Task Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-whisper-qf-leveraging-dual-cross-attention-q/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-whisper-qf-leveraging-dual-cross-attention-q/</guid>
      <description>Speech Emotion Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Psychologically-Grounded Graph Modeling for Interpretable Depression Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-psychologically-grounded-graph-modeling-for/</link>
      <pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-psychologically-grounded-graph-modeling-for/</guid>
      <description>Speech Emotion Recognition | 8.0/10</description>
    </item>
    <item>
      <title>MER 2026: From Discriminative Emotion Recognition to Generative Emotion Understanding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-mer-2026-from-discriminative-emotion-recognition/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-mer-2026-from-discriminative-emotion-recognition/</guid>
      <description>Speech Emotion Recognition | 6.0/10</description>
    </item>
    <item>
      <title>Prosody as Supervision: Bridging the Non-Verbal--Verbal for Multilingual Speech Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-prosody-as-supervision-bridging-the-non-verbal/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-prosody-as-supervision-bridging-the-non-verbal/</guid>
      <description>Speech Emotion Recognition | 8.0/10</description>
    </item>
    <item>
      <title>MoVE: Translating Laughter and Tears via Mixture of Vocalization Experts in Speech-to-Speech Translation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-move-translating-laughter-and-tears-via-mixture/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-move-translating-laughter-and-tears-via-mixture/</guid>
      <description>This paper addresses the problem that speech-to-speech translation (S2ST) systems commonly lose the non-verbal sounds (such as laughter and crying) and emotional information present in the source speech, which severely degrades the naturalness and accuracy of cross-lingual communication. The authors make three core contributions: first, they design a scalable, automated data-synthesis pipeline that generates a large-scale, high-quality English-Chinese expressive S2ST parallel corpus, overcoming the bottleneck of scarce training data</description>
    </item>
    <item>
      <title>Deep Supervised Contrastive Learning of Pitch Contours for Robust Pitch Accent Classification in Seoul Korean</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-deep-supervised-contrastive-learning-of-pitch/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-deep-supervised-contrastive-learning-of-pitch/</guid>
      <description>This paper tackles the difficult problem of mapping continuously varying fundamental-frequency (F0) contours to the discrete, invariant pitch-accent categories of Seoul Korean (e.g., LHLH, HHLH). Traditional methods are susceptible to F0 measurement noise and speaker variability. The authors propose **Dual-Glob**, a deep supervised contrastive learning framework. At its core, a **dual-branch (clean-view and augmented-view) encoder** operates in a shared</description>
    </item>
    <item>
      <title>FreezeEmpath: Efficient Training for Empathetic Spoken Chatbots with Frozen LLMs</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-freezeempath-efficient-training-for-empathetic/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-freezeempath-efficient-training-for-empathetic/</guid>
      <description>This paper addresses three major challenges in training empathetic spoken chatbots: **scarce empathetic speech data, weak model generalization, and degradation of the LLM's general abilities caused by fine-tuning**. The authors propose **FreezeEmpath**, an efficient end-to-end training framework. Its core method **freezes the base LLM** and adopts a **semantic-emotion decoupled encoding strategy**, using an independent semantic adapter and emotion extractor to extract from</description>
    </item>
    <item>
      <title>Prosody as Supervision: Bridging the Non-Verbal--Verbal for Multilingual Speech Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-prosody-as-supervision-bridging-the-non-verbal/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-prosody-as-supervision-bridging-the-non-verbal/</guid>
      <description>This paper addresses the core bottleneck of scarce annotated data in low-resource multilingual speech emotion recognition (SER). The authors propose a disruptive paradigm: **reframing SER as an unsupervised "non-verbal-to-verbal" transfer problem**. The core hypothesis is that the prosodic emotional cues carried by non-verbal vocalizations (such as laughter and crying) are purer and more cross-lingual than those in speech, and can therefore serve as a better supervision source. To this end, the authors design **NOVA-</description>
    </item>
    <item>
      <title>SELF-EMO: Emotional Self-Evolution from Recognition to Consistent Expression</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-self-emo-emotional-self-evolution-from/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-self-emo-emotional-self-evolution-from/</guid>
      <description>This paper addresses the problem that emotion recognition in conversation (ERC) and emotional expression in dialogue systems are limited by high-quality annotated data that is both scarce and static. The **core contribution** is **SELF-EMO**, a psychologically motivated self-evolution framework. The **key method** builds a role-playing self-play paradigm in which the model simultaneously acts as an "emotion recognizer" and a "dialogue responder", and, through a "generate-filter-reuse"</description>
    </item>
  </channel>
</rss>
