ICASSP 2026 - Speech Biomarkers Paper List

ICASSP 2026 - Speech Biomarkers: 24 papers
Rank | Paper | Score | Tier
🥇 Interval-Aware Retrieval Framework For Speech-Based Automati | 8.5 | Top 25%
🥈 Low-Resource Speech-Based Early Alzheimers Detection via Cro | 7.5 | Top 25%
🥉 Reliable AI via Age-Balanced Validation: Fair Model Selectio | 7.5 | Top 25%
4. Efficient Depression Detection from Speech via Language-Inde | 7.5 | Top 25%
5. Multi-View Hierarchical Hypergraph Neural Network for Automa | 7.5 | Top 25%
6. Evaluating Pretrained Speech Embedding Systems for Dysarthri | 7.5 | Top 50%
7. Optimizing Domain-Adaptive Self-Supervised Learning for Clin | 7.0 | Top 25%
8. Does the Pre-Training of an Embedding Influence its Encoding | 7.0 | Top 50%
9. An Anomaly-Aware and Audio-Enhanced Dual-Pathway Framework f | 7.0 | Top 25%
10. Leveraging Text-to-Speech and Voice Conversion as Data Augme | 7.0 | Top 50%
11. DPT-Net: Dual-Path Transformer Network with Hierarchical Fus | 7.0 | Top 25%
12. CMSA-Mamba: Hierarchical State Space Modeling for Audio-Base | 7.0 | Top 25%
13. Dual Contrastive Learning for Semi-Supervised Domain Adaptat | 7.0 | Top 25%
14. An Unsupervised Alignment Feature Fusion System for Spoken L | 7.0 | Top 25%
15. Modeling Inter-Segment Relationships in Speech for Dementia | 7.0 | Top 25%
16. When Children Talk and Machines Listen: Toward an Interpreta | 7.0 | Top 50%
17. Graph-Biased EEG Transformers for Silent Speech Decoding | 6.5 | Top 25%
18. A Consistent Learning Depression Detection Framework Integra | 6.5 | Top 50%
19. Obstructive Sleep Apnea Endotype Prediction During Wakefulne | 6.5 | Top 50%
20. Cross-Lingual Alzheimer’s Disease Detection with Multimodal | 6.5 | Top 25%
21. Multimodal LLMs as Expert Speech Annotators: Acoustic Macro- | 6.5 | Top 50%
22. Probing Whisper for Dysarthric Speech in Detection and Asses | 6.5 | Top 25%
23. Mixture of Experts for Recognizing Depression from Interview | 6.0 | Top 50%
24. Estimating Hand-Related Features from Speech Using Machine L | 5.0 | Top 50%
📋 Paper Details
🥇 Interval-Aware Retrieval Framework For Speech-Based Automatic Alzheimer’s Detection 🔥 8.5/10 | Top 25% | #SpeechBiomarkers | #RetrievalAugmentedGeneration | #MultimodalModels #TransferLearning ...

2026-04-29

ICASSP 2026 - Speech Coding Paper List

ICASSP 2026 - Speech Coding: 5 papers
Rank | Paper | Score | Tier
🥇 Lisa: Lightweight Yet Superb Neural Speech Coding | 8.5 | Top 25%
🥈 FocalCodec-Stream: Streaming Low-Bitrate Speech Coding via C | 8.0 | Top 25%
🥉 CodecSlime: Temporal Redundancy Compression of Neural Speech | 7.5 | Top 10%
4. Speaking Clearly: A Simplified Whisper-Based Codec for Low-B | 7.5 | Top 25%
5. IBPCodec : A Low-Bitrate Lightweight Speech Codec With Inter | 7.0 | Top 25%
📋 Paper Details
🥇 Lisa: Lightweight Yet Superb Neural Speech Coding 🔥 8.5/10 | Top 25% | #SpeechCoding | #SignalProcessing | #VectorQuantization #RealTimeProcessing ...

2026-04-29

ICASSP 2026 - Speech Encoders Paper List

ICASSP 2026 - Speech Encoders: 1 paper
Rank | Paper | Score | Tier
🥇 Auden-Voice: General-Purpose Voice Encoder for Speech and La | 7.5 | Top 25%
📋 Paper Details
🥇 Auden-Voice: General-Purpose Voice Encoder for Speech and Language Understanding ✅ 7.5/10 | Top 25% | #SpeechEncoder | #MultiTaskLearning | #SpeakerRecognition #ParalinguisticUnderstanding
👥 Authors and Affiliations
First author: Mingyue Huo (University of Illinois Urbana-Champaign)
Corresponding author: Not specified (none of the five listed authors is explicitly marked as corresponding author)
Author list: Mingyue Huo (University of Illinois Urbana-Champaign), Wei-Cheng Tseng (University of Texas at Austin), Yiwen Shao (Tencent AI Lab, USA), Hao Zhang (Tencent AI Lab, USA), Dong Yu (Tencent AI Lab, USA)
💡 Sharp-Tongued Review ...

2026-04-29

ICASSP 2026 - Speech Translation Paper List

ICASSP 2026 - Speech Translation: 8 papers
Rank | Paper | Score | Tier
🥇 MTP-S2UT: Enhancing Speech-to-Speech Translation Quality wit | 8.5 | Top 25%
🥈 ATOM: Adaptive Token-Level Optimal Transport Mixup for Speec | 8.0 | Top 25%
🥉 SEP-ST: Incorporating Speech Entity Prompt Into Large Langua | 7.5 | Top 25%
4. Phrased: Phrase Dictionary Biasing for Speech Translation | 7.5 | Top 25%
5. Direct Transfer of Prosody in Speech-to-speech Translation u | 7.5 | Top 25%
6. PROST-LLM: Progressively Enhancing the Speech-to-Speech Tran | 7.5 | Top 25%
7. Revisiting Direct Speech-to-Text Translation with Speech LLM | 7.5 | Top 50%
8. Direct Simultaneous Translation Activation for Large Audio-L | 6.0 | Top 25%
📋 Paper Details
🥇 MTP-S2UT: Enhancing Speech-to-Speech Translation Quality with Multi-Token Prediction 🔥 8.5/10 | Top 25% | #SpeechTranslation | #MultiTaskLearning | #SpeechLLM #Multilingual ...

2026-04-29

ICASSP 2026 - Speech Representation Learning Paper List

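ICASSP 2026 - Speech Representation Learning: 1 paper
Rank | Paper | Score | Tier
🥇 Phonological Tokenizer: Prosody-Aware Phonetic Token Via Mul | 8.0 | Top 25%
📋 Paper Details
🥇 Phonological Tokenizer: Prosody-Aware Phonetic Token Via Multi-Objective Fine-Tuning with Differentiable K-Means 🔥 8.0/10 | Top 25% | #SpeechRepresentationLearning | #DiscreteTokens | #MultiTaskLearning #SelfSupervisedLearning
👥 Authors and Affiliations
First author: Kentaro Onda (The University of Tokyo; Sony Group)
Corresponding author: Not specified
Author list: Kentaro Onda (The University of Tokyo; Sony Group), Hayato Futami (Sony Group), Yosuke Kashiwagi (Sony Group), Emiru Tsunoo (Sony Group), Shinji Watanabe (Carnegie Mellon University)
💡 Sharp-Tongued Review
The paper's strength is its clever use of multi-objective optimization and differentiable k-means to strike a practical, high-performing balance between theoretically "pure" phonetic tokens and "rich" acoustic tokens, with notable gains on prosody-sensitive tasks such as emotion recognition and voice conversion. Its weakness is that it says too little about the optimization challenges (such as the variance of gradient estimates) that the discretization inherent in its core tool, differentiable k-means, may introduce in end-to-end training. In addition, although the vocoder is conditioned on a pretrained speaker encoder to "strip out" speaker information, the paper does not analyze in depth whether this removal is complete or how it might affect downstream tasks.
🔗 Open-Source Details
Code: The paper does not link a code repository; the method is implemented with the ESPnet toolkit.
Model weights: The paper does not say whether the fine-tuned model weights are released.
Datasets: Public datasets including VCTK, LibriSpeech, RAVDESS, VoxCeleb, LJSpeech, TIMIT, Expresso, and LibriLight; see each dataset's official website for access.
Demo: An online demo is available at https://ondatk68.github.io/onda-demo/projects/phonological-tokenizer.
Reproduction materials: Some training details are given (e.g., two-stage training, learning rate, number of epochs, the α value), but no complete configuration files, checkpoints, or detailed hyperparameter list.
Open-source projects cited in the paper: ESPnet, HiFi-GAN (ParallelWaveGAN), ECAPA-TDNN (SpeechBrain), WavLM, Qwen2.5, Llama-3.2, etc.
📌 Core Summary ...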
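The Phonological Tokenizer review names differentiable k-means as the method's core tool. As a minimal sketch of the general idea only, assuming the common soft-assignment relaxation (a softmax over negative squared distances with a temperature `tau`) rather than the paper's own formulation, which this listing does not describe: the hard nearest-centroid choice is replaced by a weighted average of centroids, so gradients can flow into both the features and the codebook. The helper names `soft_kmeans_assign` and `hard_token` are hypothetical.

```python
import math


def soft_kmeans_assign(x, centroids, tau=1.0):
    """Hypothetical helper: soft (differentiable) k-means assignment of one vector.

    Assumed relaxation: weight_k = softmax_k(-||x - c_k||^2 / tau).
    The output is a weighted average of the centroids, which is smooth in
    both x and the centroids. As tau -> 0 the weights approach a one-hot
    (hard k-means) assignment.
    """
    # Squared Euclidean distance from x to every centroid.
    d2 = [sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in centroids]
    # Softmax over negative distances; subtract the max logit for numerical stability.
    logits = [-d / tau for d in d2]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Soft-quantized vector: convex combination of the centroids.
    q = [sum(w * c[j] for w, c in zip(weights, centroids)) for j in range(len(x))]
    return weights, q


def hard_token(x, centroids):
    """Hypothetical helper: discrete token id = index of the nearest centroid."""
    d2 = [sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in centroids]
    return min(range(len(centroids)), key=d2.__getitem__)


if __name__ == "__main__":
    centroids = [[1.0, 0.0], [0.0, 1.0]]
    x = [0.9, 0.1]
    for tau in (1.0, 0.1):
        w, _ = soft_kmeans_assign(x, centroids, tau)
        print(f"tau={tau}: weights={[round(v, 3) for v in w]}, hard token={hard_token(x, centroids)}")
```

Running the script prints the weights at two temperatures; the lower temperature gives a nearly one-hot assignment that agrees with `hard_token`. The mismatch between soft weights during training and hard tokens at inference is one plausible source of the optimization issues the review raises.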

2026-04-29

ICASSP 2026 - Speech Decoding Paper List

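ICASSP 2026 - Speech Decoding: 1 paper
Rank | Paper | Score | Tier
🥇 A Robust Multi-Scale Framework with Test-Time Adaptation for | 7.5 | Top 25%
📋 Paper Details
🥇 A Robust Multi-Scale Framework with Test-Time Adaptation for sEEG-Based Speech Decoding ✅ 7.5/10 | Top 25% | #SpeechDecoding | #DomainAdaptation | #BrainComputerInterface #MultiScaleFeatureLearning
👥 Authors and Affiliations
First author: Yang-yang Li (School of Computer Science and Engineering, Nanjing University of Science and Technology; School of Data Science and School of Artificial Intelligence, The Chinese University of Hong Kong, Shenzhen)
Corresponding author: Siqi Cai (School of Intelligence Science and Engineering and School of Artificial Intelligence, Harbin Institute of Technology, Shenzhen)
Author list: Yang-yang Li (School of Computer Science and Engineering, Nanjing University of Science and Technology; School of Data Science and School of Artificial Intelligence, The Chinese University of Hong Kong, Shenzhen), Suli Wang (Department of Computer Science, Technical University of Darmstadt; School of Data Science and School of Artificial Intelligence, The Chinese University of Hong Kong, Shenzhen), Siqi Cai (School of Intelligence Science and Engineering and School of Artificial Intelligence, Harbin Institute of Technology, Shenzhen), Haizhou Li (School of Data Science and School of Artificial Intelligence, The Chinese University of Hong Kong, Shenzhen)
💡 Sharp-Tongued Review
The paper's strength is that it confronts the core difficulty of sEEG decoding head-on, namely the domain shift caused by signal non-stationarity, and proposes a logically clear two-stage solution ("strengthen the representation first, then adapt online") whose components each prove effective, yielding significant gains on a public dataset. Its weaknesses: the experiments cover only one dataset (DU-IN), and the model size (5.964M) may be too large for implantable BCI applications. The paper pays little attention to model compression and real-time inference, so its case for clinical translation is thin.
🔗 Open-Source Details
Code: The paper links a code repository, https://github.com/lyyi599/MDM-Tent, but does not say whether the code has actually been released or the page is only a placeholder.
Model weights: The paper does not mention releasing pretrained model weights.
Dataset: The experiments use the public DU-IN dataset; the paper does not explain how to obtain it, implying that readers should consult the original study.
Demo: No online demo is mentioned.
Reproduction materials: Some training details (e.g., optimizer, learning rate, batch size) are not specified. Full ablation results are available at the linked GitHub repository.
Open-source projects cited in the paper: open-source implementations of, or related work on, several baselines, such as DU-IN, EEGNet, and Tent.
📌 Core Summary ...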

2026-04-29

ICASSP 2026 - Speech Assessment Paper List

ICASSP 2026 - Speech Assessment: 5 papers
Rank | Paper | Score | Tier
🥇 Mispronunciation Detection and Diagnosis Without Model Train | 8.0 | Top 25%
🥈 Matrix-Structured Hierarchical Convolutional Modeling for Pr | 8.0 | Top 25%
🥉 Reference-Aware SFM Layers for Intrusive Intelligibility Pre | 7.5 | Top 10%
4. Session-Level Spoken Language Assessment with A Multimodal F | 7.5 | Top 25%
5. Fine-Tuning Large Multimodal Models for Automatic Pronunciat | 7.0 | Top 50%
📋 Paper Details
🥇 Mispronunciation Detection and Diagnosis Without Model Training: A Retrieval-Based Approach 🔥 8.0/10 | Top 25% | #SpeechAssessment | #RetrievalAugmentation | #Pretraining #ZeroShot ...

2026-04-29

ICASSP 2026 - Speech Recognition #SpeechSynthesis Paper List

ICASSP 2026 - Speech Recognition #SpeechSynthesis: 1 paper
Rank | Paper | Score | Tier
🥇 TAGARELA - A Portuguese Speech Dataset from Podcasts | 7.0 | Top 25%
📋 Paper Details
🥇 TAGARELA - A Portuguese Speech Dataset from Podcasts ✅ 7.0/10 | Top 25% | #SpeechRecognition #SpeechSynthesis | #Pretraining | #SpeechRecognition #SpeechSynthesis
👥 Authors and Affiliations
First author: Frederico Santos de Oliveira (Federal University of Mato Grosso (UFMT))
Corresponding author: Not specified
Author list: Frederico Santos de Oliveira (UFMT), Lucas Rafael Stefanel Gris (UFG), Alef Iury Siqueira Ferreira (UFG), Augusto Seben da Rosa (UNESP), Alexandre Costa Ferro Filho (UFG), Edresson Casanova (NVIDIA), Christopher Dane Shulby (Elsa Speak), Rafael Teixeira Sousa (UFMT), Diogo Fernandes Costa Silva (UFG), Anderson da Silva Soares (UFG), Arlindo Rodrigues Galvão Filho (UFG)
💡 Sharp-Tongued Review ...

2026-04-29

ICASSP 2026 - Speech Recognition #SpeechTranslation Paper List

ICASSP 2026 - Speech Recognition #SpeechTranslation: 3 papers
Rank | Paper | Score | Tier
🥇 LESS: Large Language Model Enhanced Semi-Supervised Learning | 7.5 | Top 25%
🥈 Equipping Large Language Model with Directional Speech Under | 7.0 | Top 50%
🥉 Joint Autoregressive Modeling of Multi-Talker Overlapped Spe | 7.0 | Top 25%
📋 Paper Details
🥇 LESS: Large Language Model Enhanced Semi-Supervised Learning for Speech Foundational Models Using in-the-wild Data ✅ 7.5/10 | Top 25% | #SpeechRecognition #SpeechTranslation | #SemiSupervisedLearning #LargeLanguageModels | #SpeechRecognition #SpeechTranslation ...

2026-04-29

ICASSP 2026 - Speech Recognition Paper List

ICASSP 2026 - Speech Recognition: 102 papers
Rank | Paper | Score | Tier
🥇 Towards Robust Dysarthric Speech Recognition: LLM-Agent Post | 9.0 | Top 25%
🥈 Target-Speaker LLM-ASR with Speaker-Aware Speech Encoder | 8.8 | Top 10%
🥉 SE-DiCoW: Self-Enrolled Diarization-Conditioned Whisper | 8.5 | Top 25%
4. Scaling Multi-Talker ASR with Speaker-Agnostic Activity Stre | 8.5 | Top 25%
5. Improving Contextual Asr Via Multi-Grained Fusion With Large | 8.5 | Top 25%
6. OMNI-AVSR: Towards Unified Multimodal Speech Recognition Wit | 8.5 | Top 10%
7. AISHELL6-Whisper: A Chinese Mandarin Audio-Visual Whisper Sp | 8.3 | Top 25%
8. Polynomial Mixing for Efficient Self-Supervised Speech Encod | 8.0 | Top 25%
9. GLoRIA: Gated Low-Rank Interpretable Adaptation for Dialecta | 8.0 | Top 25%
10. Voting-Based Pitch Estimation with Temporal and Frequential | 8.0 | Top 25%
11. Identifying the Minimal and Maximal Phonetic Subspace of Spe | 8.0 | Top 25%
12. Lattice-Guided Consistency Regularization of Dual-Mode Trans | 8.0 | Top 25%
13. BiRQ: Bi-Level Self-Labeling Random Quantization for Self-Su | 8.0 | Top 25%
14. Synthetic Data Domain Adaptation for ASR via LLM-Based Text | 8.0 | Top 25%
15. STACodec: Semantic Token Assignment for Balancing Acoustic F | 8.0 | Top 25%
16. Language-Infused Retrieval-Augmented CTC with Adaptive Soft- | 8.0 | Top 25%
17. Relative Time Intervals Representation For Word-Level Timest | 8.0 | Top 25%
18. RLBR: Reinforcement Learning with Biasing Rewards for Contex | 8.0 | Top 25%
19. Grey-Box Prompt Tuning With Graph Alignment for Speech-Langu | 8.0 | Top 25%
20. Frontend Token Enhancement for Token-Based Speech Recognitio | 8.0 | Top 25%
21. Noise-Robust AV-ASR Using Visual Features both in the Whispe | 8.0 | Top 25%
22. Synthesized Data Selection via Score Distribution Matching f | 8.0 | Top 25%
23. Bayesian Low-Rank Factorization for Robust Model Adaptation | 8.0 | Top 25%
24. nGPT as a Scalable Architecture for Speech Recognition and T | 7.5 | Top 25%
25. Input-Adaptive Differentiable Filterbanks via Hypernetworks | 7.5 | Top 25%
26. A Study of Data Selection Strategies for Pre-Training Self-S | 7.5 | Top 25%
27. K-Function: Joint Pronunciation Transcription and Feedback f | 7.5 | Top 25%
28. Flexi-LoRA with Input-Adaptive Ranks: Efficient Finetuning f | 7.5 | Top 25%
29. Adversarial Fine-Tuning on Speech Foundation Model with Vuln | 7.5 | Top 25%
30. WAV2LEV: Predicting Levenshtein Edit Operation Sequences For | 7.5 | Top 25%
31. LOTUSDIS: A Thai Far-Field Meeting Corpus for Robust Convers | 7.5 | Top 25%
32. Whisper-FEST: Single-Channel Far-Field Enhanced Speech-to-te | 7.5 | Top 50%
33. Production-Scale Dynamic Vocabulary ASR Biasing with Word-Le | 7.5 | Top 25%
34. Do we really need self-attention for streaming automatic spe | 7.5 | Top 25%
35. Advancing LLM-Based Multi-Channel Multi-Speaker Speech Recog | 7.5 | Top 25%
36. Adapting Diarization-Conditioned Whisper for End-to-End Mult | 7.5 | Top 25%
37. CALM: Joint Contextual Acoustic-Linguistic Modeling for Pers | 7.5 | Top 25%
38. TTA: Transcribe, Translate and Alignment for Cross-Lingual S | 7.5 | Top 25%
39. Emilia-NV: A Non-Verbal Speech Dataset with Word-Level Annot | 7.5 | Top 25%
40. LLM-Based Post-ASR Error Correction for Disordered Speech | 7.5 | Top 50%
41. Content-Preserving Speech Representation Learning Via Adapti | 7.5 | Top 25%
42. Exploring SSL Discrete Tokens for Multilingual Automatic Spe | 7.5 | Top 25%
43. TICL: Text-Embedding KNN for Speech in-Context Learning Unlo | 7.5 | Top 25%
44. Purification Before Fusion: Toward Mask-Free Speech Enhancem | 7.5 | Top 25%
45. Cross-Modal Bottleneck Fusion for Noise Robust Audio-Visual | 7.5 | Top 25%
46. Inverse-Hessian Regularization for Continual Learning in ASR | 7.5 | Top 25%
47. BEST-RQ-based Self-Supervised Learning for Whisper Domain Ad | 7.5 | Top 25%
48. CCST: Cross-Modal and Consistency-Aware Self-Training for So | 7.5 | Top 25%
49. Chunk-Wise Attention Transducers for Fast and Accurate Strea | 7.5 | Top 25%
50. Chunkwise Aligners for Streaming Speech Recognition | 7.5 | Top 25%
51. FinHuBERT: Hierarchical Feature Imitating Networks for Low-R | 7.5 | Top 25%
52. UMA-SPLIT: Unimodal Aggregation for Both English and Mandari | 7.5 | Top 25%
53. MNV-17: A High-Quality Performative Mandarin Dataset for Non | 7.5 | Top 25%
54. Listen, But Don’t Leak: Sensitive Data Protection for Privac | 7.5 | Top 25%
55. Confidence-Guided Error Correction for Disordered Speech Rec | 7.5 | Top 25%
56. Advancing Semi-Supervised Child Speech Recognition with Omni | 7.5 | Top 25%
57. Variational Low-Rank Adaptation for Personalized Impaired Sp | 7.5 | Top 50%
58. Decoder-Only Conformer with Modality-Aware Sparse Mixtures o | 7.5 | Top 25%
59. Cross-Cultural Bias in Mel-Scale Representations: Evidence a | 7.0 | Top 25%
60. Bridging the Front-End and Back-End for Robust ASR via Cross | 7.0 | Top 25%
61. TASU: Text-only Alignment for Speech Understanding | 7.0 | Top 25%
62. Streaming Speech Recognition with Decoder-Only Large Languag | 7.0 | Top 25%
63. Reducing Prompt Sensitivity in LLM-Based Speech Recognition | 7.0 | Top 25%
64. PAC: Pronunciation-Aware Contextualized Large Language Model | 7.0 | Top 25%
65. Investigating The Effect Of Sentence-Level Syntactic Structu | 7.0 | Top 50%
66. SSVD-O: Parameter-Efficient Fine-Tuning with Structured SVD | 7.0 | Top 25%
67. Three Seconds is Sufficient: A Multi-Pronged Framework for M | 7.0 | Top 50%
68. In-Sync: Adaptation of Speech Aware Large Language Models fo | 7.0 | Top 50%
69. AccLID: Accent-aware Language Identification for Robust Mult | 7.0 | Top 25%
70. BBPE16: UTF-16-Based Byte-Level Byte-Pair Encoding for Impro | 7.0 | Top 50%
71. Mixtures of Lightweight Articulatory Experts for Multilingua | 7.0 | Top 25%
72. Towards Orthographically-Informed Evaluation of Speech Recog | 7.0 | Top 25%
73. Contextual Biasing for ASR in Speech LLM with Common Word Cu | 7.0 | Top 25%
74. Peeking Into the Future for Contextual Biasing | 7.0 | Top 50%
75. SLM-TTA: A Framework for Test-Time Adaptation of Generative | 7.0 | Top 50%
76. Tokenchain: A Discrete Speech Chain via Semantic Token Model | 7.0 | Top 25%
77. Advanced modeling of interlanguage speech intelligibility be | 7.0 | Top 25%
78. Leveraging Segment-Level Speech Representations for LLM-Base | 7.0 | Top 50%
79. Mitigating Attention Sinks and Massive Activations in Audio- | 7.0 | Top 25%
80. Teaching the Teachers: Boosting Unsupervised Domain Adaptati | 7.0 | Top 25%
81. Attention2Probability: Attention-Driven Terminology Probabil | 7.0 | Top 25%
82. Whisper-MLA: Reducing GPU Memory Consumption of ASR Models B | 7.0 | Top 25%
83. Mind the Shift: Using Delta SSL Embeddings to Enhance Child | 7.0 | Top 25%
84. PhoenixDSR: Phoneme-Guided and LLM-Enhanced Dysarthric Speec | 7.0 | Top 50%
85. Audio-Conditioned Diffusion LLMs for ASR and Deliberation Pr | 7.0 | Top 50%
86. Sequence-Level Unsupervised Training in Speech Recognition: | 6.5 | Top 50%
87. Ara-BEST-RQ: Multi Dialectal Arabic SSL | 6.5 | Top 50%
88. Medical ASR Enhancement by Domain-Specific Reinforcement Fin | 6.5 | Top 25%
89. CTC-DID: CTC-Based Arabic Dialect Identification for Streami | 6.5 | Top 50%
90. Towards Fair ASR for Second Language Speakers using Fairness | 6.5 | Top 50%
91. Towards Building Speech Large Language Models for Multitask | 6.5 | Top 25%
92. Whisper: Courtside Edition - Enhancing ASR Performance throu | 6.5 | Top 50%
93. SED: Structural Entropy Based Speech Discretization for Disc | 6.5 | Top 50%
94. Multilingual Supervised Pretraining with Lm-Assisted Decodin | 6.5 | Top 50%
95. Improving Automatic Speech Recognition by Mitigating Distort | 6.5 | Top 25%
96. Windowed SummaryMixing: An Efficient Fine-Tuning of Self-Sup | 6.5 | Top 50%
97. Proficiency-Aware Adaptation and Data Augmentation for Robus | 6.5 | Top 25%
98. Domain-Aware Scheduling for ASR Fine-Tuning | 6.5 | Top 50%
99. Online Register For Dual-Mode Self-Supervised Speech Models: | 6.5 | Top 50%
100. Learning to Align with Unbalanced Optimal Transport in Lingu | 6.5 | Top 50%
101. How Far Do SSL Speech Models Listen for Tone? Temporal Focus | 6.5 | Top 50%
102. Leveraging Audio-Visual Data to Reduce the Multilingual Gap | 6.0 | Top 50%
📋 Paper Details
🥇 Towards Robust Dysarthric Speech Recognition: LLM-Agent Post-ASR Correction Beyond WER 🔥 9.0/10 | Top 25% | #SpeechRecognition | #LargeLanguageModels | #Robustness #Datasets ...

2026-04-29