ICASSP 2026 - Speech Synthesis Paper List

ICASSP 2026 - Speech Synthesis: 63 papers in total

Rank | Paper | Score | Tier
🥇 | T-Cache: Fast Inference For Masked Generative Transformer-Ba | 9.0 | Top 25%
🥈 | Wavenext 2: Convnext-Based Fast Neural Vocoders with Residua | 9.0 | Top 25%
🥉 | VoXtream: Full-Stream Text-To-Speech With Extremely Low Late | 8.5 | Top 25%
4 | EMORL-TTS: Reinforcement Learning for Fine-Grained Emotion C | 8.5 | Top 25%
5 | No Verifiable Reward for Prosody: Toward Preference-Guided P | 8.0 | Top 25%
6 | Marco-Voice: A Unified Framework for Expressive Speech Synth | 8.0 | Top 25%
7 | Neuromamba: Adaptive Frequency Filtering with a Pyramid Mamb | 8.0 | Top 25%
8 | Group Relative Policy Optimization for Text-to-Speech with L | 8.0 | Top 25%
9 | Do You Hear What I Mean? Quantifying the Instruction-Percept | 8.0 | Top 25%
10 | OV-INSTRUCTTTS: Towards Open-Vocabulary Instruct Text-to-Spe | 8.0 | Top 25%
11 | HD-PPT: Hierarchical Decoding of Content- and Prompt-Prefere | 8.0 | Top 25%
12 | Emotion-Aligned Generation in Diffusion Text to Speech Model | 8.0 | Top 25%
13 | Measuring Prosody Diversity in Zero-Shot TTS: A New Metric, | 8.0 | Top 25%
14 | DAIEN-TTS: Disentangled Audio Infilling for Environment-Awar | 8.0 | Top 25%
15 | BridgeCode: A Dual Speech Representation Paradigm for Autore | 8.0 | Top 25%
16 | Continuous-Token Diffusion for Speaker-Referenced TTS in Mul | 8.0 | Top 10%
17 | Prosody-Guided Harmonic Attention for Phase-Coherent Neural | 8.0 | Top 25%
18 | Optimizing Speech Language Models for Acoustic Consistency | 8.0 | Top 25%
19 | NCF-TTS: Enhancing Flow Matching Based Text-To-Speech with N | 8.0 | Top 25%
20 | ARCHI-TTS: A Flow-Matching-Based Text-to-Speech Model with S | 8.0 | Top 25%
21 | EMG-to-Speech with Fewer Channels | 7.5 | Top 25%
22 | VividTalker: A Modular Framework for Expressive 3D Talking A | 7.5 | Top 25%
23 | Real-Time Streaming MEL Vocoding with Generative Flow Matchi | 7.5 | Top 25%
24 | From Hallucination to Articulation: Language Model-Driven Lo | 7.5 | Top 25%
25 | SynParaSpeech: Automated Synthesis of Paralinguistic Dataset | 7.5 | Top 25%
26 | Asynchrony-Aware Decoupled Multimodal Control for Cued Speec | 7.5 | Top 10%
27 | DMP-TTS: Disentangled Multi-Modal Prompting for Controllable | 7.5 | Top 25%
28 | RRPO: Robust Reward Policy Optimization for LLM-Based Emotio | 7.5 | Top 25%
29 | Syncspeech: Efficient and Low-Latency Text-to-Speech Based o | 7.5 | Top 25%
30 | Principled Coarse-Grained Acceptance For Speculative Decodin | 7.5 | Top 25%
31 | SPADE: Structured Pruning and Adaptive Distillation for Effi | 7.5 | Top 25%
32 | Entropy-Guided GRVQ for Ultra-Low Bitrate Neural Speech Code | 7.5 | Top 25%
33 | Discrete Diffusion for Generative Modeling of Text-Aligned S | 7.5 | Top 25%
34 | Emotional Dimension Control in Language Model-Based Text-To- | 7.5 | Top 25%
35 | Beyond Global Emotion: Fine-Grained Emotional Speech Synthes | 7.5 | Top 25%
36 | QFOCUS: Controllable Synthesis for Automated Speech Stress E | 7.5 | Top 50%
37 | Synthetic yet Striking? Assessing Vocal Charisma in TTS via | 7.5 | Top 25%
38 | TMD-TTS: A Unified Tibetan Multi-Dialect Text-to-Speech Fram | 7.5 | Top 25%
39 | Deep Dubbing: End-to-End Auto-Audiobook System with Text-to- | 7.5 | Top 25%
40 | Erasing Your Voice Before it’s Heard: Training-Free Speaker | 7.5 | Top 25%
41 | InstructAudio: Unified Speech and Music Generation with Natu | 7.5 | Top 25%
42 | GLA-GRAD++: An Improved Griffin-Lim Guided Diffusion Model f | 7.5 | Top 25%
43 | Int-MeanFlow: Few-Step Speech Generation with Integral Veloc | 7.5 | Top 25%
44 | Training Flow Matching Models with Reliable Labels via Self- | 7.5 | Top 25%
45 | Hierarchical Discrete Flow Matching For Multi-Codebook Codec | 7.5 | Top 25%
46 | Frame-Stacked Local Transformers for Efficient Multi-Codeboo | 7.5 | Top 25%
47 | Direct Preference Optimization For Speech Autoregressive Dif | 7.5 | Top 25%
48 | MirrorTalk: Forging Personalized Avatars Via Disentangled St | 7.0 | Top 25%
49 | Residual Tokens Enhance Masked Autoencoders for Speech Model | 7.0 | Top 50%
50 | SP-MCQA: Evaluating Intelligibility of TTS Beyond the Word L | 7.0 | Top 50%
51 | SPAM: Style Prompt Adherence Metric for Prompt-Based TTS | 7.0 | Top 50%
52 | Gelina: Unified Speech and Gesture Synthesis Via Interleaved | 7.0 | Top 50%
53 | Retrieval-Based Speculative Decoding For Autoregressive Spee | 7.0 | Top 50%
54 | T-Mimi: A Transformer-Based Mimi Decoder for Real-Time On-Ph | 7.0 | Top 50%
55 | Wave-Trainer-Fit: Neural Vocoder With Trainable Prior And Fi | 7.0 | Top 25%
56 | EmoShift: Lightweight Activation Steering for Enhanced Emoti | 7.0 | Top 50%
57 | Task Vector in TTS: Toward Emotionally Expressive Dialectal | 7.0 | Top 50%
58 | Quantifying Speaker Embedding Phonological Rule Interactions | 7.0 | Top 25%
59 | PFluxTTS: Hybrid Flow-Matching TTS with Robust Cross-Lingual | 7.0 | Top 50%
60 | LP-CFM: Perceptual Invariance-Aware Conditional Flow Matchin | 7.0 | Top 25%
61 | SFM-TTS: Lightweight and Rapid Speech Synthesis with Flexibl | 7.0 | Top 25%
62 | MELA-TTS: Joint Transformer-Diffusion Model with Representat | 7.0 | Top 25%
63 | Combining Multi-Order Attention and Multi-Resolution Discrim | 6.5 | Top 50%

📋 Paper Details

🥇 T-Cache: Fast Inference For Masked Generative Transformer-Based TTS Via Prompt-Aware Feature Caching
🔥 9.0/10 | Top 25% | #SpeechSynthesis | #RealTimeProcessing | #ZeroShot #SpeechLLM ...

2026-04-29

ICASSP 2026 - Speech Enhancement #AdversarialDefense Paper List

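ICASSP 2026 - Speech Enhancement #AdversarialDefense: 1 paper in total

Rank | Paper | Score | Tier
🥇 | Adversarial Defense via Generative Speech Enhancement Module | 7.5 | Top 25%

📋 Paper Details

🥇 Adversarial Defense via Generative Speech Enhancement Module
✅ 7.5/10 | Top 25% | #SpeechEnhancement #AdversarialDefense | #SpeechEnhancement #DataAugmentation | #SpeechEnhancement #AdversarialDefense

👥 Authors and Affiliations
First author: not stated
Corresponding author: not stated
Author list: Chi-Tao Chen (Department of Computer Science and Information Engineering, National Central University), Chun-Shien Lu (Institute of Information Technology, Academia Sinica), Jia-Ching Wang (Department of Computer Science and Information Engineering, National Central University)

💡 Blunt Review
The paper cleverly recasts adversarial defense as a speech enhancement task. It uses a lightweight (2M parameters) and efficient generative model (MP-SENet) to achieve strong defense across multiple datasets and attack types, with inference far faster than diffusion-based competitors. However, its core defense mechanism (Gaussian noise injection followed by enhancement) may not be theoretically "robust" enough: against carefully designed adaptive attacks (see Table 5 of the paper), performance still drops significantly, and on the SC09 benchmark it does not surpass the strongest baseline, AudioPure.

🔗 Open-Source Details
Code: an official GitHub repository is provided: apoman123/SpeechEnhancementDefense.
Model weights: the paper mentions using an MP-SENet model pretrained on the DNS Challenge, but does not state whether the fine-tuned, defense-specific weights are released.
Datasets: public datasets are used: SC09 (a subset of Google Speech Commands), VCTK, QKWS, DNS-Challenge.
Demo: not mentioned.
Reproduction materials: the paper gives the key training data augmentation details (noise dBFS range and optimal value), the loss function formulas and weights, and the attack parameter settings. The optimizer, learning rate, and other training configuration are not stated.
Open-source projects cited: relies on the public MP-SENet model and cites open-source implementations of several baselines and attacks (e.g., DefenseGAN, AudioPure, PGD attack code).

📌 Core Summary ...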

2026-04-29

ICASSP 2026 - Speech Enhancement Paper List

ICASSP 2026 - Speech Enhancement: 75 papers in total

Rank | Paper | Score | Tier
🥇 | A Lightweight Fourier-Based Network for Binaural Speech Enha | 8.5 | Top 25%
🥈 | DiTSE: High-Fidelity Generative Speech Enhancement via Laten | 8.5 | Top 10%
🥉 | Towards Lightweight Adaptation of Speech Enhancement Models | 8.5 | Top 25%
4 | FastEnhancer: Speed-Optimized Streaming Neural Speech Enhanc | 8.5 | Top 25%
5 | DisContSE: Single-Step Diffusion Speech Enhancement based on | 8.5 | Top 10%
6 | Sidon: Fast and Robust Open-Source Multilingual Speech Resto | 8.5 | Top 25%
7 | Spike-Driven Low-Power Speech Bandwidth Extension | 8.0 | Top 25%
8 | MAGE: A Coarse-to-Fine Speech Enhancer with Masked Generativ | 8.0 | Top 25%
9 | Deep Learning-Based Joint Optimization of Adaptive Feedback | 8.0 | Top 25%
10 | HyFlowSE: Hybrid End-To-End Flow-Matching Speech Enhancement | 8.0 | Top 25%
11 | HCGAN: Harmonic-Coupled Generative Adversarial Network for S | 8.0 | Top 50%
12 | Exploring Resolution-Wise Shared Attention in Hybrid Mamba-U | 8.0 | Top 25%
13 | Mixture To Beamformed Mixture: Leveraging Beamformed Mixture | 8.0 | Top 25%
14 | Modeling Strategies For Speech Enhancement in The Latent Spa | 8.0 | Top 50%
15 | LAFUFU: Latent Acoustic Features For Ultra-Fast Utterance Re | 8.0 | Top 25%
16 | Influence of Clean Speech Characteristics on Speech Enhancem | 8.0 | Top 25%
17 | LipsAM: Lipschitz-Continuous Amplitude Modifier for Audio Si | 7.5 | Top 25%
18 | MSANET: Multi-Scale Semantic Aggregation Network for Brain-A | 7.5 | Top 25%
19 | Bone-Conduction Guided Multimodal Speech Enhancement with Co | 7.5 | Top 25%
20 | The 3rd Clarity Prediction Challenge: A Machine Learning Cha | 7.5 | Top 25%
21 | Two-Stage Language Model Framework for Acoustic Echo Cancell | 7.5 | Top 25%
22 | E2E-AEC: Implementing An End-To-End Neural Network Learning | 7.5 | Top 25%
23 | SpatialNet-Echo: Real-Time Acoustic Echo Cancellation via In | 7.5 | Top 25%
24 | A Stabilized Hybrid Active Noise Control Algorithm of GFANC | 7.5 | Top 25%
25 | Enhancing Speech Intelligibility Prediction for Hearing Aids | 7.5 | Top 25%
26 | H-nnPBFDAF: Hierarchical Neural Network Partitioned Block Fr | 7.5 | Top 25%
27 | Joint Deep Secondary Path Estimation and Adaptive Control fo | 7.5 | Top 25%
28 | Enhancing Noise Robustness for Neural Speech Codecs Through | 7.5 | Top 25%
29 | Low-Bandwidth High-Fidelity Speech Transmission with Generat | 7.5 | Top 25%
30 | From Diet to Free Lunch: Estimating Auxiliary Signal Propert | 7.5 | Top 25%
31 | Beamforming Using Virtual Microphones for Hearing Aid Applic | 7.5 | Top 50%
32 | I-DCCRN-VAE: An Improved Deep Representation Learning Framew | 7.5 | Top 25%
33 | Do We Need EMA for Diffusion-Based Speech Enhancement? Towar | 7.5 | Top 50%
34 | Hair Noise Analysis and Mitigation for Smart Glasses Audio C | 7.5 | Top 25%
35 | Are Modern Speech Enhancement Systems Vulnerable to Adversar | 7.5 | Top 25%
36 | UJCodec: An End-to-end Unet-Style Codec for Joint Speech Com | 7.5 | Top 25%
37 | Spatial Covariance Matrix Reconstruction for Speech Enhancem | 7.5 | Top 25%
38 | Training-Free Inference-Time Scaling for Audio Source Separa | 7.5 | Top 25%
39 | Forward Convolutive Prediction for Frame Online Monaural Spe | 7.5 | Top 50%
40 | MeanFlowSE: One-Step Generative Speech Enhancement via Condi | 7.5 | Top 10%
41 | FlowSE-GRPO: Training Flow Matching Speech Enhancement via O | 7.5 | Top 25%
42 | Aligning Generative Speech Enhancement with Perceptual Feedb | 7.5 | Top 25%
43 | PG-SE: Predictive Acceleration and Correction for Generative | 7.5 | Top 25%
44 | Dynamically Slimmable Speech Enhancement Network with Metric | 7.5 | Top 25%
45 | Lightweight Phoneme-Conditioned Bandwidth Extension for Body | 7.5 | Top 25%
46 | Fast-ULCNet: A Fast and Ultra Low Complexity Network for Sin | 7.5 | Top 25%
47 | ParaGSE: Parallel Generative Speech Enhancement with Group-V | 7.5 | Top 25%
48 | High-Fidelity Speech Enhancement Via Discrete Audio Tokens | 7.5 | Top 25%
49 | DISSR: Disentangling Speech Representation for Degradation-P | 7.5 | Top 25%
50 | Ranking The Impact of Contextual Specialization in Neural Sp | 7.5 | Top 25%
51 | BSMP-SENet: Band-Split Magnitude-Phase Network for Speech Enh | 7.0 | Top 25%
52 | DECAF: Dynamic Envelope Context-Aware Fusion for Speech-Enve | 7.0 | Top 25%
53 | DAT-CFTNet: Speech Enhancement for Cochlear Implant Recipien | 7.0 | Top 50%
54 | Acoustic Teleportation Via Disentangled Neural Audio Codec R | 7.0 | Top 25%
55 | Reference Microphone Selection for Guided Source Separation | 7.0 | Top 50%
56 | Low-Latency Audio Front-End Region-of-Interest Beamforming f | 7.0 | Top 25%
57 | AmbiDrop: Array-Agnostic Speech Enhancement Using Ambisonics | 7.0 | Top 50%
58 | Joint Multichannel Acoustic Feedback Cancellation and Speake | 7.0 | Top 25%
59 | Gdiffuse: Diffusion-Based Speech Enhancement with Noise Mode | 7.0 | Top 25%
60 | An Efficient Neural Network for Modeling Human Auditory Neur | 7.0 | Top 25%
61 | Shortcut Flow Matching for Speech Enhancement: Step-Invarian | 7.0 | Top 25%
62 | Generalizability of Predictive and Generative Speech Enhance | 7.0 | Top 50%
63 | Mambaformer: State-Space Augmented Self-Attention with Downu | 7.0 | Top 25%
64 | Auditory-Inspired Transformer for Binaural Speech Enhancemen | 7.0 | Top 25%
65 | A State-Dependent Markov Diffusion Process for Generative Sp | 6.5 | Top 25%
66 | Confidence-Based Filtering for Speech Dataset Curation with | 6.5 | Top 50%
67 | Sampling-Rate-Agnostic Speech Super-Resolution Based on Gaus | 6.5 | Top 25%
68 | Low-Frequency Harmonic Control for Speech Intelligibility in | 6.5 | Top 50%
69 | What the student learns in knowledge distillation: A subspac | 6.5 | Top 50%
70 | MeanSE: Efficient Generative Speech Enhancement with Mean Fl | 6.5 | Top 25%
71 | On The Design of Efficient Neural Methods for Geometry-Agnos | 6.5 | Top 50%
72 | Position-Invariant Fine-Tuning Of Speech Enhancement Models | 6.5 | Top 50%
73 | Stereophonic Acoustic Echo Cancellation Using an Improved Af | 6.0 | Top 50%
74 | Towards Real-Time Generative Speech Restoration with Flow-Ma | 6.0 | Top 50%
75 | Is Phase Really Needed for Weakly-Supervised Dereverberation | 6.0 | Top 50%

📋 Paper Details

🥇 A Lightweight Fourier-Based Network for Binaural Speech Enhancement with Spatial Cue Preservation
🔥 8.5/10 | Top 25% | #SpeechEnhancement | #DeepLearning | #LightweightModel #SpatialAudio ...

2026-04-29

ICASSP 2026 - Speech LLMs Paper List

ICASSP 2026 - Speech LLMs: 3 papers in total

Rank | Paper | Score | Tier
🥇 | Cross-Lingual Interleaving for Speech Language Models | 7.5 | Top 25%
🥈 | Cross-Modal Knowledge Distillation for Speech Large Language | 7.0 | Top 25%
🥉 | SpeechMapper: Speech-To-Text Embedding Projector for LLMs | 7.0 | Top 25%

📋 Paper Details

🥇 Cross-Lingual Interleaving for Speech Language Models
✅ 7.5/10 | Top 25% | #SpeechLLM | #Pretraining #Multilingual | #Pretraining #Multilingual

👥 Authors and Affiliations
First author: Adel Moumen (Department of Engineering, University of Cambridge, UK)
Corresponding author: not stated
Author list: Adel Moumen (Department of Engineering, University of Cambridge, UK), Guangzhi Sun (Department of Engineering, University of Cambridge, UK), Philip C. Woodland (Department of Engineering, University of Cambridge, UK)

💡 Blunt Review ...

2026-04-29

ICASSP 2026 - Spoken Dialogue Systems Paper List

ICASSP 2026 - Spoken Dialogue Systems: 10 papers in total

Rank | Paper | Score | Tier
🥇 | DOMA: Leveraging Diffusion Language Models with Adaptive Pri | 8.5 | Top 25%
🥈 | PersonaPlex: Voice and Role Control for Full Duplex Conversa | 8.5 | Top 25%
🥉 | UTI-LLM: A Personalized Articulatory-Speech Therapy Assistan | 7.5 | Top 25%
4 | A Dataset of Robot-Patient and Doctor-Patient Medical Dialog | 7.5 | Top 25%
5 | Game-Time: Evaluating Temporal Dynamics in Spoken Language M | 7.5 | Top 25%
6 | The Role of Prosodic and Lexical Cues in Turn-Taking with Se | 7.5 | Top 25%
7 | Vocalnet-M2: Advancing Low-Latency Spoken Language Modeling | 7.5 | Top 25%
8 | Easy Turn: Integrating Acoustic and Linguistic Modalities fo | 7.0 | Top 25%
9 | Still Thinking or Stopped Talking? Dialogue Silence Intentio | 6.5 | Top 25%
10 | Enhancing Dialogue-Related Speech Tasks with Generated Spoke | 6.5 | Top 25%

📋 Paper Details

🥇 DOMA: Leveraging Diffusion Language Models with Adaptive Prior for Intent Classification and Slot Filling
🔥 8.5/10 | Top 25% | #SpokenDialogueSystems | #DiffusionModel | #IntentRecognition #SlotFilling ...

2026-04-29

ICASSP 2026 - Speech Emotion Recognition Paper List

ICASSP 2026 - Speech Emotion Recognition: 49 papers in total

Rank | Paper | Score | Tier
🥇 | Context-Aware Dynamic Graph Learning for Multimodal Emotion | 8.8 | Top 10%
🥈 | Prompt-Guided Mixture-of-Experts for Robust Multimodal Senti | 8.5 | Top 25%
🥉 | Clue2Emo: A Brain-Inspired Framework for Open-Vocabulary Mul | 8.5 | Top 25%
4 | Attention-Weighted Centered Kernel Alignment for Knowledge D | 8.0 | Top 25%
5 | Staged Diffusion with Hybrid Mixture-of-Experts (MOE) for Mu | 8.0 | Top 25%
6 | DGSDNet: Dual-Graph Spectral Diffusion Network for Incomplet | 8.0 | Top 25%
7 | Graph-based Modality Alignment for Robustness in Conversatio | 8.0 | Top 25%
8 | Multimodal Self-Attention Network with Temporal Alignment fo | 8.0 | Top 25%
9 | It Is Personal: The Importance of Personalization for Recogn | 8.0 | Top 25%
10 | AMBER2: Dual Ambiguity-Aware Emotion Recognition Applied to | 8.0 | Top 25%
11 | MI-Fuse: Label Fusion for Unsupervised Domain Adaptation wit | 8.0 | Top 25%
12 | Speech Emotion Recognition based on Hierarchical Transformer | 8.0 | Top 25%
13 | Affect-Jigsaw: Integrating Core and Peripheral Emotions for | 8.0 | Top 25%
14 | When Audio Matters: A Lightweight, Hierarchical Fusion Model | 8.0 | Top 25%
15 | Behind the Scenes: Mechanistic Interpretability of Lora-Adap | 7.5 | Top 25%
16 | Encoding Emotion Through Self-Supervised Eye Movement Recons | 7.5 | Top 25%
17 | Inter-Dialog Contrastive Learning for Multimodal Emotion Rec | 7.5 | Top 25%
18 | ADH-VA: Adaptive Directed-Hypergraph Convolution with VA Con | 7.5 | Top 10%
19 | SURE: Synergistic Uncertainty-Aware Reasoning for Multimodal | 7.5 | Top 25%
20 | Tpeformer: Temporal Patch Embedding Transformer | 7.5 | Top 25%
21 | LETPAV: Lexicon-Enhanced Text with Progressive Audio-Visual | 7.5 | Top 25%
22 | Multimodal Variational Graph Network for Multimodal Sentimen | 7.5 | Top 25%
23 | Diffemotalk: Audio-Driven Facial Animation with Fine-Grained | 7.5 | Top 25%
24 | MECap-R1: Emotion-Aware Policy with Reinforcement Learning f | 7.5 | Top 25%
25 | FIDIC: Fine-Grained Conversational Emotion Recognition via In | 7.5 | Top 25%
26 | Whisper-QF: Leveraging Dual Cross-Attention Q-Former for Spe | 7.5 | Top 25%
27 | Temporal Graph Modeling for Speech Emotion Recognition Using | 7.5 | Top 25%
28 | Mixture-of-Experts Based Soft-Label Learning for Multi-Label | 7.5 | Top 25%
29 | Multi-Channel Speech Enhancement for Cocktail Party Speech E | 7.5 | Top 25%
30 | Evaluating Emotion Recognition in Spoken Language Models on | 7.5 | Top 50%
31 | InconVAD: A Two-Stage Dual-Tower Framework for Multimodal Em | 7.5 | Top 25%
32 | MSF-SER: Enriching Acoustic Modeling with Multi-Granularity | 7.5 | Top 25%
33 | Rationale-Guided Learning for Multimodal Emotion Recognition | 7.0 | Top 25%
34 | Bimodal Fusion Framework for Dynamic Facial Expression Recog | 7.0 | Top 25%
35 | Stress Prediction from Temporal Emotion Trajectories in Clin | 7.0 | Top 25%
36 | Emo-TTA: Improving Test-Time Adaptation of Audio-Language Mo | 7.0 | Top 25%
37 | Test Time Adaptation for Speech Emotion Recognition | 7.0 | Top 25%
38 | Plug-and-Play Emotion Graphs for Compositional Prompting in | 7.0 | Top 25%
39 | Reasoning Driven Captions to Assist Noise Robust Speech Emot | 7.0 | Top 25%
40 | EmoTri-RL: Emotion- and Cause-Aware Reinforcement Learning f | 7.0 | Top 25%
41 | Modeling Both Intra- And Inter-Utterance Variability for Con | 6.5 | Top 25%
42 | DDSR-Net: Robust Multimodal Sentiment Analysis via Dynamic M | 6.5 | Top 50%
43 | Scaling Ambiguity: Augmenting Human Annotation in Speech Emo | 6.5 | Top 50%
44 | Recovering Performance in Speech Emotion Recognition from Di | 6.5 | Top 50%
45 | B-GRPO: Unsupervised Speech Emotion Recognition Based on Bat | 6.5 | Top 50%
46 | Leveraging Large Speech Language Models as Evaluators for Ex | 6.5 | Top 50%
47 | Gen-SER: When the Generative Model Meets Speech Emotion Reco | 6.5 | Top 50%
48 | SmoothCLAP: Soft-Target Enhanced Contrastive Language-Audio | 6.5 | Top 50%
49 | Acoustic and Facial Markers of Perceived Conversational Succ | 6.0 | Top 50%

📋 Paper Details

🥇 Context-Aware Dynamic Graph Learning for Multimodal Emotion Recognition with Missing Modalities
🔥 8.8/10 | Top 10% | #SpeechEmotionRecognition | #MultimodalModel | #LargeLanguageModel #MultiTaskLearning ...

2026-04-29

ICASSP 2026 - Speech Summarization Paper List

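ICASSP 2026 - Speech Summarization: 1 paper in total

Rank | Paper | Score | Tier
🥇 | Semantic Anchor Transfer from Short to Long Speech in a Dist | 7.5 | Top 25%

📋 Paper Details

🥇 Semantic Anchor Transfer from Short to Long Speech in a Distillation-Based Summarization Framework
✅ 7.5/10 | Top 25% | #SpeechSummarization | #KnowledgeDistillation | #EndToEnd #TransferLearning

👥 Authors and Affiliations
First author: Xiang He (School of Computer Science and Technology, Xinjiang University; Xinjiang Multimodal Information Technology Engineering Research Center)
Corresponding author: Liang He (School of Computer Science and Technology, Xinjiang University; Xinjiang Multimodal Information Technology Engineering Research Center; School of Intelligent Science and Technology, Xinjiang University; Department of Electronic Engineering, Tsinghua University)
Author list:
Xiang He (School of Computer Science and Technology, Xinjiang University; Xinjiang Multimodal Information Technology Engineering Research Center)
Xuejian Zhao (School of Computer Science and Technology, Xinjiang University; Xinjiang Multimodal Information Technology Engineering Research Center)
Longwei Li (School of Computer Science and Technology, Xinjiang University; Xinjiang Multimodal Information Technology Engineering Research Center)
Liang He (School of Computer Science and Technology, Xinjiang University; Xinjiang Multimodal Information Technology Engineering Research Center; School of Intelligent Science and Technology, Xinjiang University; Department of Electronic Engineering, Tsinghua University)

💡 Blunt Review ...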

2026-04-29

ICASSP 2026 - Voice Activity Detection Paper List

ICASSP 2026 - Voice Activity Detection: 5 papers in total

Rank | Paper | Score | Tier
🥇 | Lingometer: On-Device Personal Speech Word Counting System | 8.0 | Top 25%
🥈 | EEND-SAA: Enrollment-Less Main Speaker Voice Activity Detect | 7.5 | Top 25%
🥉 | Dual Data Scaling for Robust Two-Stage User-Defined Keyword | 7.5 | Top 25%
4 | EdgeSpot: Efficient and High-Performance Few-Shot Model for | 7.5 | Top 25%
5 | TVP-UNet: Threshold Variance Penalty U-Net for Voice Activit | 7.0 | Top 25%

📋 Paper Details

🥇 Lingometer: On-Device Personal Speech Word Counting System
🔥 8.0/10 | Top 25% | #VoiceActivityDetection | #EndToEnd | #LowResource #DataAugmentation ...

2026-04-29

ICASSP 2026 - Speech Understanding Paper List

ICASSP 2026 - Speech Understanding: 2 papers in total

Rank | Paper | Score | Tier
🥇 | Exploring Fine-Tuning Of Large Audio Language Models For Spo | 8.0 | Top 25%
🥈 | Scaling Spoken Language Models with Syllabic Speech Tokeniza | 7.0 | Top 25%

📋 Paper Details

🥇 Exploring Fine-Tuning Of Large Audio Language Models For Spoken Language Understanding Under Limited Speech Data
🔥 8.0/10 | Top 25% | #SpeechUnderstanding | #TransferLearning | #LowResource #Multilingual

👥 Authors and Affiliations
First author: Youngwon Choi (MAUM AI Inc., Republic of Korea)
Corresponding author: Huu-Kim Nguyen (marked with an asterisk ∗ in the author list; now at Atmanity Inc., USA)
Author list:
Youngwon Choi (MAUM AI Inc., Republic of Korea)
Jaeyoon Jung (MAUM AI Inc., Republic of Korea & Soongsil University, Republic of Korea)
Hyeonyu Kim (MAUM AI Inc., Republic of Korea)
Huu-Kim Nguyen (MAUM AI Inc., Republic of Korea → now Atmanity Inc., USA)
Hwayeon Kim (MAUM AI Inc., Republic of Korea)

💡 Blunt Review ...

2026-04-29

ICASSP 2026 - Speech Generation Paper List

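ICASSP 2026 - Speech Generation: 1 paper in total

Rank | Paper | Score | Tier
🥇 | Why Do Speech Language Models Fail to Generate Semantically | 7.0 | Top 25%

📋 Paper Details

🥇 Why Do Speech Language Models Fail to Generate Semantically Coherent Outputs? A Modality Evolving Perspective
✅ 7.0/10 | Top 25% | #SpeechGeneration | #ModelEvaluation | #SpeechLLM #ZeroShot

👥 Authors and Affiliations
First author: Hankun Wang (X-LANCE Lab, School of Computer Science and Technology, Shanghai Jiao Tong University)
Corresponding author: Kai Yu (X-LANCE Lab, School of Computer Science and Technology, Shanghai Jiao Tong University)
Author list: Hankun Wang (X-LANCE Lab, Shanghai Jiao Tong University), Haoran Wang (X-LANCE Lab, Shanghai Jiao Tong University), Yiwei Guo (X-LANCE Lab, Shanghai Jiao Tong University), Zhihan Li (X-LANCE Lab, Shanghai Jiao Tong University), Chenpeng Du (X-LANCE Lab, Shanghai Jiao Tong University), Kai Yu (X-LANCE Lab, Shanghai Jiao Tong University)

💡 Blunt Review ...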

2026-04-29