ICASSP 2026 - Spoken Dialogue Systems Paper List

ICASSP 2026 - Spoken Dialogue Systems, 10 papers in total

Rank | Paper | Score | Tier
🥇 | DOMA: Leveraging Diffusion Language Models with Adaptive Pri | 8.5 | Top 25%
🥈 | PersonaPlex: Voice and Role Control for Full Duplex Conversa | 8.5 | Top 25%
🥉 | UTI-LLM: A Personalized Articulatory-Speech Therapy Assistan | 7.5 | Top 25%
4 | A Dataset of Robot-Patient and Doctor-Patient Medical Dialog | 7.5 | Top 25%
5 | Game-Time: Evaluating Temporal Dynamics in Spoken Language M | 7.5 | Top 25%
6 | The Role of Prosodic and Lexical Cues in Turn-Taking with Se | 7.5 | Top 25%
7 | Vocalnet-M2: Advancing Low-Latency Spoken Language Modeling | 7.5 | Top 25%
8 | Easy Turn: Integrating Acoustic and Linguistic Modalities fo | 7.0 | Top 25%
9 | Still Thinking or Stopped Talking? Dialogue Silence Intentio | 6.5 | Top 25%
10 | Enhancing Dialogue-Related Speech Tasks with Generated Spoke | 6.5 | Top 25%

📋 Paper Details

🥇 DOMA: Leveraging Diffusion Language Models with Adaptive Prior for Intent Classification and Slot Filling
🔥 8.5/10 | Top 25% | #SpokenDialogueSystems | #DiffusionModels | #IntentRecognition #SlotFilling ...

2026-04-29

ICASSP 2026 - Speech Emotion Recognition Paper List

ICASSP 2026 - Speech Emotion Recognition, 49 papers in total

Rank | Paper | Score | Tier
🥇 | Context-Aware Dynamic Graph Learning for Multimodal Emotion | 8.8 | Top 10%
🥈 | Prompt-Guided Mixture-of-Experts for Robust Multimodal Senti | 8.5 | Top 25%
🥉 | Clue2Emo: A Brain-Inspired Framework for Open-Vocabulary Mul | 8.5 | Top 25%
4 | Attention-Weighted Centered Kernel Alignment for Knowledge D | 8.0 | Top 25%
5 | Staged Diffusion with Hybrid Mixture-of-Experts (MOE) for Mu | 8.0 | Top 25%
6 | DGSDNet: Dual-Graph Spectral Diffusion Network for Incomplet | 8.0 | Top 25%
7 | Graph-based Modality Alignment for Robustness in Conversatio | 8.0 | Top 25%
8 | Multimodal Self-Attention Network with Temporal Alignment fo | 8.0 | Top 25%
9 | It Is Personal: The Importance of Personalization for Recogn | 8.0 | Top 25%
10 | AMBER2: Dual Ambiguity-Aware Emotion Recognition Applied to | 8.0 | Top 25%
11 | MI-Fuse: Label Fusion for Unsupervised Domain Adaptation wit | 8.0 | Top 25%
12 | Speech Emotion Recognition based on Hierarchical Transformer | 8.0 | Top 25%
13 | Affect-Jigsaw: Integrating Core and Peripheral Emotions for | 8.0 | Top 25%
14 | When Audio Matters: A Lightweight, Hierarchical Fusion Model | 8.0 | Top 25%
15 | Behind the Scenes: Mechanistic Interpretability of Lora-Adap | 7.5 | Top 25%
16 | Encoding Emotion Through Self-Supervised Eye Movement Recons | 7.5 | Top 25%
17 | Inter-Dialog Contrastive Learning for Multimodal Emotion Rec | 7.5 | Top 25%
18 | ADH-VA: Adaptive Directed-Hypergraph Convolution with VA Con | 7.5 | Top 10%
19 | SURE: Synergistic Uncertainty-Aware Reasoning for Multimodal | 7.5 | Top 25%
20 | Tpeformer: Temporal Patch Embedding Transformer | 7.5 | Top 25%
21 | LETPAV: Lexicon-Enhanced Text with Progressive Audio-Visual | 7.5 | Top 25%
22 | Multimodal Variational Graph Network for Multimodal Sentimen | 7.5 | Top 25%
23 | Diffemotalk: Audio-Driven Facial Animation with Fine-Grained | 7.5 | Top 25%
24 | MECap-R1: Emotion-Aware Policy with Reinforcement Learning f | 7.5 | Top 25%
25 | FIDIC: Fine-Grained Conversational Emotion Recognition via In | 7.5 | Top 25%
26 | Whisper-QF: Leveraging Dual Cross-Attention Q-Former for Spe | 7.5 | Top 25%
27 | Temporal Graph Modeling for Speech Emotion Recognition Using | 7.5 | Top 25%
28 | Mixture-of-Experts Based Soft-Label Learning for Multi-Label | 7.5 | Top 25%
29 | Multi-Channel Speech Enhancement for Cocktail Party Speech E | 7.5 | Top 25%
30 | Evaluating Emotion Recognition in Spoken Language Models on | 7.5 | Top 50%
31 | InconVAD: A Two-Stage Dual-Tower Framework for Multimodal Em | 7.5 | Top 25%
32 | MSF-SER: Enriching Acoustic Modeling with Multi-Granularity | 7.5 | Top 25%
33 | Rationale-Guided Learning for Multimodal Emotion Recognition | 7.0 | Top 25%
34 | Bimodal Fusion Framework for Dynamic Facial Expression Recog | 7.0 | Top 25%
35 | Stress Prediction from Temporal Emotion Trajectories in Clin | 7.0 | Top 25%
36 | Emo-TTA: Improving Test-Time Adaptation of Audio-Language Mo | 7.0 | Top 25%
37 | Test Time Adaptation for Speech Emotion Recognition | 7.0 | Top 25%
38 | Plug-and-Play Emotion Graphs for Compositional Prompting in | 7.0 | Top 25%
39 | Reasoning Driven Captions to Assist Noise Robust Speech Emot | 7.0 | Top 25%
40 | EmoTri-RL: Emotion- and Cause-Aware Reinforcement Learning f | 7.0 | Top 25%
41 | Modeling Both Intra- And Inter-Utterance Variability for Con | 6.5 | Top 25%
42 | DDSR-Net: Robust Multimodal Sentiment Analysis via Dynamic M | 6.5 | Top 50%
43 | Scaling Ambiguity: Augmenting Human Annotation in Speech Emo | 6.5 | Top 50%
44 | Recovering Performance in Speech Emotion Recognition from Di | 6.5 | Top 50%
45 | B-GRPO: Unsupervised Speech Emotion Recognition Based on Bat | 6.5 | Top 50%
46 | Leveraging Large Speech Language Models as Evaluators for Ex | 6.5 | Top 50%
47 | Gen-SER: When the Generative Model Meets Speech Emotion Reco | 6.5 | Top 50%
48 | SmoothCLAP: Soft-Target Enhanced Contrastive Language-Audio | 6.5 | Top 50%
49 | Acoustic and Facial Markers of Perceived Conversational Succ | 6.0 | Top 50%

📋 Paper Details

🥇 Context-Aware Dynamic Graph Learning for Multimodal Emotion Recognition with Missing Modalities
🔥 8.8/10 | Top 10% | #SpeechEmotionRecognition | #MultimodalModels | #LargeLanguageModels #MultiTaskLearning ...

2026-04-29

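ICASSP 2026 - Speech Summarization Paper List

ICASSP 2026 - Speech Summarization, 1 paper in total

Rank | Paper | Score | Tier
🥇 | Semantic Anchor Transfer from Short to Long Speech in a Dist | 7.5 | Top 25%

📋 Paper Details

🥇 Semantic Anchor Transfer from Short to Long Speech in a Distillation-Based Summarization Framework
✅ 7.5/10 | Top 25% | #SpeechSummarization | #KnowledgeDistillation | #EndToEnd #TransferLearning

👥 Authors and Affiliations
First author: Xiang He (School of Computer Science and Technology, Xinjiang University; Xinjiang Multimodal Information Technology Engineering Research Center)
Corresponding author: Liang He (School of Computer Science and Technology, Xinjiang University; Xinjiang Multimodal Information Technology Engineering Research Center; School of Intelligence Science and Technology, Xinjiang University; Department of Electronic Engineering, Tsinghua University)
Author list:
Xiang He (School of Computer Science and Technology, Xinjiang University; Xinjiang Multimodal Information Technology Engineering Research Center)
Xuejian Zhao (School of Computer Science and Technology, Xinjiang University; Xinjiang Multimodal Information Technology Engineering Research Center)
Longwei Li (School of Computer Science and Technology, Xinjiang University; Xinjiang Multimodal Information Technology Engineering Research Center)
Liang He (School of Computer Science and Technology, Xinjiang University; Xinjiang Multimodal Information Technology Engineering Research Center; School of Intelligence Science and Technology, Xinjiang University; Department of Electronic Engineering, Tsinghua University)

💡 Blunt Commentary ...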

2026-04-29

ICASSP 2026 - Voice Activity Detection Paper List

ICASSP 2026 - Voice Activity Detection, 5 papers in total

Rank | Paper | Score | Tier
🥇 | Lingometer: On-Device Personal Speech Word Counting System | 8.0 | Top 25%
🥈 | EEND-SAA: Enrollment-Less Main Speaker Voice Activity Detect | 7.5 | Top 25%
🥉 | Dual Data Scaling for Robust Two-Stage User-Defined Keyword | 7.5 | Top 25%
4 | EdgeSpot: Efficient and High-Performance Few-Shot Model for | 7.5 | Top 25%
5 | TVP-UNet: Threshold Variance Penalty U-Net for Voice Activit | 7.0 | Top 25%

📋 Paper Details

🥇 Lingometer: On-Device Personal Speech Word Counting System
🔥 8.0/10 | Top 25% | #VoiceActivityDetection | #EndToEnd | #LowResource #DataAugmentation ...

2026-04-29

ICASSP 2026 - Speech Understanding Paper List

ICASSP 2026 - Speech Understanding, 2 papers in total

Rank | Paper | Score | Tier
🥇 | Exploring Fine-Tuning Of Large Audio Language Models For Spo | 8.0 | Top 25%
🥈 | Scaling Spoken Language Models with Syllabic Speech Tokeniza | 7.0 | Top 25%

📋 Paper Details

🥇 Exploring Fine-Tuning Of Large Audio Language Models For Spoken Language Understanding Under Limited Speech Data
🔥 8.0/10 | Top 25% | #SpeechUnderstanding | #TransferLearning | #LowResource #Multilingual

👥 Authors and Affiliations
First author: Youngwon Choi (MAUM AI Inc., Republic of Korea)
Corresponding author: Huu-Kim Nguyen (marked with an asterisk in the author list; now at Atmanity Inc., USA)
Author list:
Youngwon Choi (MAUM AI Inc., Republic of Korea)
Jaeyoon Jung (MAUM AI Inc., Republic of Korea & Soongsil University, Republic of Korea)
Hyeonyu Kim (MAUM AI Inc., Republic of Korea)
Huu-Kim Nguyen (MAUM AI Inc., Republic of Korea, now at Atmanity Inc., USA)
Hwayeon Kim (MAUM AI Inc., Republic of Korea)

💡 Blunt Commentary ...

2026-04-29

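ICASSP 2026 - Speech Generation Paper List

ICASSP 2026 - Speech Generation, 1 paper in total

Rank | Paper | Score | Tier
🥇 | Why Do Speech Language Models Fail to Generate Semantically | 7.0 | Top 25%

📋 Paper Details

🥇 Why Do Speech Language Models Fail to Generate Semantically Coherent Outputs? A Modality Evolving Perspective
✅ 7.0/10 | Top 25% | #SpeechGeneration | #ModelEvaluation | #SpeechLLMs #ZeroShot

👥 Authors and Affiliations
First author: Hankun Wang (X-LANCE Lab, School of Computer Science and Technology, Shanghai Jiao Tong University)
Corresponding author: Kai Yu (X-LANCE Lab, School of Computer Science and Technology, Shanghai Jiao Tong University)
Author list:
Hankun Wang (X-LANCE Lab, Shanghai Jiao Tong University)
Haoran Wang (X-LANCE Lab, Shanghai Jiao Tong University)
Yiwei Guo (X-LANCE Lab, Shanghai Jiao Tong University)
Zhihan Li (X-LANCE Lab, Shanghai Jiao Tong University)
Chenpeng Du (X-LANCE Lab, Shanghai Jiao Tong University)
Kai Yu (X-LANCE Lab, Shanghai Jiao Tong University)

💡 Blunt Commentary ...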

2026-04-29

ICASSP 2026 - Speech Biomarkers Paper List

ICASSP 2026 - Speech Biomarkers, 24 papers in total

Rank | Paper | Score | Tier
🥇 | Interval-Aware Retrieval Framework For Speech-Based Automati | 8.5 | Top 25%
🥈 | Low-Resource Speech-Based Early Alzheimers Detection via Cro | 7.5 | Top 25%
🥉 | Reliable AI via Age-Balanced Validation: Fair Model Selectio | 7.5 | Top 25%
4 | Efficient Depression Detection from Speech via Language-Inde | 7.5 | Top 25%
5 | Multi-View Hierarchical Hypergraph Neural Network for Automa | 7.5 | Top 25%
6 | Evaluating Pretrained Speech Embedding Systems for Dysarthri | 7.5 | Top 50%
7 | Optimizing Domain-Adaptive Self-Supervised Learning for Clin | 7.0 | Top 25%
8 | Does the Pre-Training of an Embedding Influence its Encoding | 7.0 | Top 50%
9 | An Anomaly-Aware and Audio-Enhanced Dual-Pathway Framework f | 7.0 | Top 25%
10 | Leveraging Text-to-Speech and Voice Conversion as Data Augme | 7.0 | Top 50%
11 | DPT-Net: Dual-Path Transformer Network with Hierarchical Fus | 7.0 | Top 25%
12 | CMSA-Mamba: Hierarchical State Space Modeling for Audio-Base | 7.0 | Top 25%
13 | Dual Contrastive Learning for Semi-Supervised Domain Adaptat | 7.0 | Top 25%
14 | An Unsupervised Alignment Feature Fusion System for Spoken L | 7.0 | Top 25%
15 | Modeling Inter-Segment Relationships in Speech for Dementia | 7.0 | Top 25%
16 | When Children Talk and Machines Listen: Toward an Interpreta | 7.0 | Top 50%
17 | Graph-Biased EEG Transformers for Silent Speech Decoding | 6.5 | Top 25%
18 | A Consistent Learning Depression Detection Framework Integra | 6.5 | Top 50%
19 | Obstructive Sleep Apnea Endotype Prediction During Wakefulne | 6.5 | Top 50%
20 | Cross-Lingual Alzheimer’s Disease Detection with Multimodal | 6.5 | Top 25%
21 | Multimodal LLMs as Expert Speech Annotators: Acoustic Macro- | 6.5 | Top 50%
22 | Probing Whisper for Dysarthric Speech in Detection and Asses | 6.5 | Top 25%
23 | Mixture of Experts for Recognizing Depression from Interview | 6.0 | Top 50%
24 | Estimating Hand-Related Features from Speech Using Machine L | 5.0 | Top 50%

📋 Paper Details

🥇 Interval-Aware Retrieval Framework For Speech-Based Automatic Alzheimer’s Detection
🔥 8.5/10 | Top 25% | #SpeechBiomarkers | #RetrievalAugmentedGeneration | #MultimodalModels #TransferLearning ...

2026-04-29

ICASSP 2026 - Speech Coding Paper List

ICASSP 2026 - Speech Coding, 5 papers in total

Rank | Paper | Score | Tier
🥇 | Lisa: Lightweight Yet Superb Neural Speech Coding | 8.5 | Top 25%
🥈 | FocalCodec-Stream: Streaming Low-Bitrate Speech Coding via C | 8.0 | Top 25%
🥉 | CodecSlime: Temporal Redundancy Compression of Neural Speech | 7.5 | Top 10%
4 | Speaking Clearly: A Simplified Whisper-Based Codec for Low-B | 7.5 | Top 25%
5 | IBPCodec: A Low-Bitrate Lightweight Speech Codec With Inter | 7.0 | Top 25%

📋 Paper Details

🥇 Lisa: Lightweight Yet Superb Neural Speech Coding
🔥 8.5/10 | Top 25% | #SpeechCoding | #SignalProcessing | #VectorQuantization #RealTimeProcessing ...

2026-04-29

ICASSP 2026 - Speech Encoder Paper List

ICASSP 2026 - Speech Encoder, 1 paper in total

Rank | Paper | Score | Tier
🥇 | Auden-Voice: General-Purpose Voice Encoder for Speech and La | 7.5 | Top 25%

📋 Paper Details

🥇 Auden-Voice: General-Purpose Voice Encoder for Speech and Language Understanding
✅ 7.5/10 | Top 25% | #SpeechEncoder | #MultiTaskLearning | #SpeakerRecognition #ParalinguisticUnderstanding

👥 Authors and Affiliations
First author: Mingyue Huo (University of Illinois Urbana-Champaign)
Corresponding author: not stated (the source note describes the paper's author list as three people, with no corresponding author explicitly marked)
Author list:
Mingyue Huo (University of Illinois Urbana-Champaign)
Wei-Cheng Tseng (University of Texas at Austin)
Yiwen Shao (Tencent AI Lab, USA)
Hao Zhang (Tencent AI Lab, USA)
Dong Yu (Tencent AI Lab, USA)

💡 Blunt Commentary ...

2026-04-29

ICASSP 2026 - Speech Translation Paper List

ICASSP 2026 - Speech Translation, 8 papers in total

Rank | Paper | Score | Tier
🥇 | MTP-S2UT: Enhancing Speech-to-Speech Translation Quality wit | 8.5 | Top 25%
🥈 | ATOM: Adaptive Token-Level Optimal Transport Mixup for Speec | 8.0 | Top 25%
🥉 | SEP-ST: Incorporating Speech Entity Prompt Into Large Langua | 7.5 | Top 25%
4 | Phrased: Phrase Dictionary Biasing for Speech Translation | 7.5 | Top 25%
5 | Direct Transfer of Prosody in Speech-to-speech Translation u | 7.5 | Top 25%
6 | PROST-LLM: Progressively Enhancing the Speech-to-Speech Tran | 7.5 | Top 25%
7 | Revisiting Direct Speech-to-Text Translation with Speech LLM | 7.5 | Top 50%
8 | Direct Simultaneous Translation Activation for Large Audio-L | 6.0 | Top 25%

📋 Paper Details

🥇 MTP-S2UT: Enhancing Speech-to-Speech Translation Quality with Multi-Token Prediction
🔥 8.5/10 | Top 25% | #SpeechTranslation | #MultiTaskLearning | #SpeechLLMs #Multilingual ...

2026-04-29