ICASSP 2026 - Speech Recognition Paper List
102 papers in total

Rank | Paper | Score | Tier
🥇 | Towards Robust Dysarthric Speech Recognition: LLM-Agent Post | 9.0 | Top 25%
🥈 | Target-Speaker LLM-ASR with Speaker-Aware Speech Encoder | 8.8 | Top 10%
🥉 | SE-DiCoW: Self-Enrolled Diarization-Conditioned Whisper | 8.5 | Top 25%
4 | Scaling Multi-Talker ASR with Speaker-Agnostic Activity Stre | 8.5 | Top 25%
5 | Improving Contextual ASR Via Multi-Grained Fusion With Large | 8.5 | Top 25%
6 | OMNI-AVSR: Towards Unified Multimodal Speech Recognition Wit | 8.5 | Top 10%
7 | AISHELL6-Whisper: A Chinese Mandarin Audio-Visual Whisper Sp | 8.3 | Top 25%
8 | Polynomial Mixing for Efficient Self-Supervised Speech Encod | 8.0 | Top 25%
9 | GLoRIA: Gated Low-Rank Interpretable Adaptation for Dialecta | 8.0 | Top 25%
10 | Voting-Based Pitch Estimation with Temporal and Frequential | 8.0 | Top 25%
11 | Identifying the Minimal and Maximal Phonetic Subspace of Spe | 8.0 | Top 25%
12 | Lattice-Guided Consistency Regularization of Dual-Mode Trans | 8.0 | Top 25%
13 | BiRQ: Bi-Level Self-Labeling Random Quantization for Self-Su | 8.0 | Top 25%
14 | Synthetic Data Domain Adaptation for ASR via LLM-Based Text | 8.0 | Top 25%
15 | STACodec: Semantic Token Assignment for Balancing Acoustic F | 8.0 | Top 25%
16 | Language-Infused Retrieval-Augmented CTC with Adaptive Soft- | 8.0 | Top 25%
17 | Relative Time Intervals Representation For Word-Level Timest | 8.0 | Top 25%
18 | RLBR: Reinforcement Learning with Biasing Rewards for Contex | 8.0 | Top 25%
19 | Grey-Box Prompt Tuning With Graph Alignment for Speech-Langu | 8.0 | Top 25%
20 | Frontend Token Enhancement for Token-Based Speech Recognitio | 8.0 | Top 25%
21 | Noise-Robust AV-ASR Using Visual Features both in the Whispe | 8.0 | Top 25%
22 | Synthesized Data Selection via Score Distribution Matching f | 8.0 | Top 25%
23 | Bayesian Low-Rank Factorization for Robust Model Adaptation | 8.0 | Top 25%
24 | nGPT as a Scalable Architecture for Speech Recognition and T | 7.5 | Top 25%
25 | Input-Adaptive Differentiable Filterbanks via Hypernetworks | 7.5 | Top 25%
26 | A Study of Data Selection Strategies for Pre-Training Self-S | 7.5 | Top 25%
27 | K-Function: Joint Pronunciation Transcription and Feedback f | 7.5 | Top 25%
28 | Flexi-LoRA with Input-Adaptive Ranks: Efficient Finetuning f | 7.5 | Top 25%
29 | Adversarial Fine-Tuning on Speech Foundation Model with Vuln | 7.5 | Top 25%
30 | WAV2LEV: Predicting Levenshtein Edit Operation Sequences For | 7.5 | Top 25%
31 | LOTUSDIS: A Thai Far-Field Meeting Corpus for Robust Convers | 7.5 | Top 25%
32 | Whisper-FEST: Single-Channel Far-Field Enhanced Speech-to-te | 7.5 | Top 50%
33 | Production-Scale Dynamic Vocabulary ASR Biasing with Word-Le | 7.5 | Top 25%
34 | Do we really need self-attention for streaming automatic spe | 7.5 | Top 25%
35 | Advancing LLM-Based Multi-Channel Multi-Speaker Speech Recog | 7.5 | Top 25%
36 | Adapting Diarization-Conditioned Whisper for End-to-End Mult | 7.5 | Top 25%
37 | CALM: Joint Contextual Acoustic-Linguistic Modeling for Pers | 7.5 | Top 25%
38 | TTA: Transcribe, Translate and Alignment for Cross-Lingual S | 7.5 | Top 25%
39 | Emilia-NV: A Non-Verbal Speech Dataset with Word-Level Annot | 7.5 | Top 25%
40 | LLM-Based Post-ASR Error Correction for Disordered Speech | 7.5 | Top 50%
41 | Content-Preserving Speech Representation Learning Via Adapti | 7.5 | Top 25%
42 | Exploring SSL Discrete Tokens for Multilingual Automatic Spe | 7.5 | Top 25%
43 | TICL: Text-Embedding KNN for Speech in-Context Learning Unlo | 7.5 | Top 25%
44 | Purification Before Fusion: Toward Mask-Free Speech Enhancem | 7.5 | Top 25%
45 | Cross-Modal Bottleneck Fusion for Noise Robust Audio-Visual | 7.5 | Top 25%
46 | Inverse-Hessian Regularization for Continual Learning in ASR | 7.5 | Top 25%
47 | BEST-RQ-based Self-Supervised Learning for Whisper Domain Ad | 7.5 | Top 25%
48 | CCST: Cross-Modal and Consistency-Aware Self-Training for So | 7.5 | Top 25%
49 | Chunk-Wise Attention Transducers for Fast and Accurate Strea | 7.5 | Top 25%
50 | Chunkwise Aligners for Streaming Speech Recognition | 7.5 | Top 25%
51 | FinHuBERT: Hierarchical Feature Imitating Networks for Low-R | 7.5 | Top 25%
52 | UMA-SPLIT: Unimodal Aggregation for Both English and Mandari | 7.5 | Top 25%
53 | MNV-17: A High-Quality Performative Mandarin Dataset for Non | 7.5 | Top 25%
54 | Listen, But Don’t Leak: Sensitive Data Protection for Privac | 7.5 | Top 25%
55 | Confidence-Guided Error Correction for Disordered Speech Rec | 7.5 | Top 25%
56 | Advancing Semi-Supervised Child Speech Recognition with Omni | 7.5 | Top 25%
57 | Variational Low-Rank Adaptation for Personalized Impaired Sp | 7.5 | Top 50%
58 | Decoder-Only Conformer with Modality-Aware Sparse Mixtures o | 7.5 | Top 25%
59 | Cross-Cultural Bias in Mel-Scale Representations: Evidence a | 7.0 | Top 25%
60 | Bridging the Front-End and Back-End for Robust ASR via Cross | 7.0 | Top 25%
61 | TASU: Text-only Alignment for Speech Understanding | 7.0 | Top 25%
62 | Streaming Speech Recognition with Decoder-Only Large Languag | 7.0 | Top 25%
63 | Reducing Prompt Sensitivity in LLM-Based Speech Recognition | 7.0 | Top 25%
64 | PAC: Pronunciation-Aware Contextualized Large Language Model | 7.0 | Top 25%
65 | Investigating The Effect Of Sentence-Level Syntactic Structu | 7.0 | Top 50%
66 | SSVD-O: Parameter-Efficient Fine-Tuning with Structured SVD | 7.0 | Top 25%
67 | Three Seconds is Sufficient: A Multi-Pronged Framework for M | 7.0 | Top 50%
68 | In-Sync: Adaptation of Speech Aware Large Language Models fo | 7.0 | Top 50%
69 | AccLID: Accent-aware Language Identification for Robust Mult | 7.0 | Top 25%
70 | BBPE16: UTF-16-Based Byte-Level Byte-Pair Encoding for Impro | 7.0 | Top 50%
71 | Mixtures of Lightweight Articulatory Experts for Multilingua | 7.0 | Top 25%
72 | Towards Orthographically-Informed Evaluation of Speech Recog | 7.0 | Top 25%
73 | Contextual Biasing for ASR in Speech LLM with Common Word Cu | 7.0 | Top 25%
74 | Peeking Into the Future for Contextual Biasing | 7.0 | Top 50%
75 | SLM-TTA: A Framework for Test-Time Adaptation of Generative | 7.0 | Top 50%
76 | Tokenchain: A Discrete Speech Chain via Semantic Token Model | 7.0 | Top 25%
77 | Advanced modeling of interlanguage speech intelligibility be | 7.0 | Top 25%
78 | Leveraging Segment-Level Speech Representations for LLM-Base | 7.0 | Top 50%
79 | Mitigating Attention Sinks and Massive Activations in Audio- | 7.0 | Top 25%
80 | Teaching the Teachers: Boosting Unsupervised Domain Adaptati | 7.0 | Top 25%
81 | Attention2Probability: Attention-Driven Terminology Probabil | 7.0 | Top 25%
82 | Whisper-MLA: Reducing GPU Memory Consumption of ASR Models B | 7.0 | Top 25%
83 | Mind the Shift: Using Delta SSL Embeddings to Enhance Child | 7.0 | Top 25%
84 | PhoenixDSR: Phoneme-Guided and LLM-Enhanced Dysarthric Speec | 7.0 | Top 50%
85 | Audio-Conditioned Diffusion LLMs for ASR and Deliberation Pr | 7.0 | Top 50%
86 | Sequence-Level Unsupervised Training in Speech Recognition: | 6.5 | Top 50%
87 | Ara-BEST-RQ: Multi Dialectal Arabic SSL | 6.5 | Top 50%
88 | Medical ASR Enhancement by Domain-Specific Reinforcement Fin | 6.5 | Top 25%
89 | CTC-DID: CTC-Based Arabic Dialect Identification for Streami | 6.5 | Top 50%
90 | Towards Fair ASR for Second Language Speakers using Fairness | 6.5 | Top 50%
91 | Towards Building Speech Large Language Models for Multitask | 6.5 | Top 25%
92 | Whisper: Courtside Edition - Enhancing ASR Performance throu | 6.5 | Top 50%
93 | SED: Structural Entropy Based Speech Discretization for Disc | 6.5 | Top 50%
94 | Multilingual Supervised Pretraining with LM-Assisted Decodin | 6.5 | Top 50%
95 | Improving Automatic Speech Recognition by Mitigating Distort | 6.5 | Top 25%
96 | Windowed SummaryMixing: An Efficient Fine-Tuning of Self-Sup | 6.5 | Top 50%
97 | Proficiency-Aware Adaptation and Data Augmentation for Robus | 6.5 | Top 25%
98 | Domain-Aware Scheduling for ASR Fine-Tuning | 6.5 | Top 50%
99 | Online Register For Dual-Mode Self-Supervised Speech Models: | 6.5 | Top 50%
100 | Learning to Align with Unbalanced Optimal Transport in Lingu | 6.5 | Top 50%
101 | How Far Do SSL Speech Models Listen for Tone? Temporal Focus | 6.5 | Top 50%
102 | Leveraging Audio-Visual Data to Reduce the Multilingual Gap | 6.0 | Top 50%

📋 Paper Details

🥇 Towards Robust Dysarthric Speech Recognition: LLM-Agent Post-ASR Correction Beyond WER
🔥 9.0/10 | Top 25% | #Speech Recognition | #Large Language Models | #Robustness #Datasets ...