ICASSP 2026 - 语音合成 论文列表
ICASSP 2026 - 语音合成 共 63 篇论文 ← 返回 ICASSP 2026 总览 排名 论文 评分 分档 🥇 T-Cache: Fast Inference For Masked Generative Transformer-Ba 9.0分 前25% 🥈 Wavenext 2: Convnext-Based Fast Neural Vocoders with Residua 9.0分 前25% 🥉 VoXtream: Full-Stream Text-To-Speech With Extremely Low Late 8.5分 前25% 4. EMORL-TTS: Reinforcement Learning for Fine-Grained Emotion C 8.5分 前25% 5. No Verifiable Reward for Prosody: Toward Preference-Guided P 8.0分 前25% 6. Marco-Voice: A Unified Framework for Expressive Speech Synth 8.0分 前25% 7. Neuromamba: Adaptive Frequency Filtering with a Pyramid Mamb 8.0分 前25% 8. Group Relative Policy Optimization for Text-to-Speech with L 8.0分 前25% 9. Do You Hear What I Mean? Quantifying the Instruction-Percept 8.0分 前25% 10. OV-INSTRUCTTTS: Towards Open-Vocabulary Instruct Text-to-Spe 8.0分 前25% 11. HD-PPT: Hierarchical Decoding of Content- and Prompt-Prefere 8.0分 前25% 12. Emotion-Aligned Generation in Diffusion Text to Speech Model 8.0分 前25% 13. Measuring Prosody Diversity in Zero-Shot TTS: A New Metric, 8.0分 前25% 14. DAIEN-TTS: Disentangled Audio Infilling for Environment-Awar 8.0分 前25% 15. BridgeCode: A Dual Speech Representation Paradigm for Autore 8.0分 前25% 16. Continuous-Token Diffusion for Speaker-Referenced TTS in Mul 8.0分 前10% 17. Prosody-Guided Harmonic Attention for Phase-Coherent Neural 8.0分 前25% 18. Optimizing Speech Language Models for Acoustic Consistency 8.0分 前25% 19. NCF-TTS: Enhancing Flow Matching Based Text-To-Speech with N 8.0分 前25% 20. ARCHI-TTS: A Flow-Matching-Based Text-to-Speech Model with S 8.0分 前25% 21. EMG-to-Speech with Fewer Channels 7.5分 前25% 22. VividTalker: A Modular Framework for Expressive 3D Talking A 7.5分 前25% 23. Real-Time Streaming MEL Vocoding with Generative Flow Matchi 7.5分 前25% 24. From Hallucination to Articulation: Language Model-Driven Lo 7.5分 前25% 25. SynParaSpeech: Automated Synthesis of Paralinguistic Dataset 7.5分 前25% 26. Asynchrony-Aware Decoupled Multimodal Control for Cued Speec 7.5分 前10% 27. DMP-TTS: Disentangled Multi-Modal Prompting for Controllable 7.5分 前25% 28. RRPO: Robust Reward Policy Optimization for LLM-Based Emotio 7.5分 前25% 29. Syncspeech: Efficient and Low-Latency Text-to-Speech Based o 7.5分 前25% 30. Principled Coarse-Grained Acceptance For Speculative Decodin 7.5分 前25% 31. SPADE: Structured Pruning and Adaptive Distillation for Effi 7.5分 前25% 32. Entropy-Guided GRVQ for Ultra-Low Bitrate Neural Speech Code 7.5分 前25% 33. Discrete Diffusion for Generative Modeling of Text-Aligned S 7.5分 前25% 34. Emotional Dimension Control in Language Model-Based Text-To- 7.5分 前25% 35. Beyond Global Emotion: Fine-Grained Emotional Speech Synthes 7.5分 前25% 36. QFOCUS: Controllable Synthesis for Automated Speech Stress E 7.5分 前50% 37. Synthetic yet Striking? Assessing Vocal Charisma in TTS via 7.5分 前25% 38. TMD-TTS: A Unified Tibetan Multi-Dialect Text-to-Speech Fram 7.5分 前25% 39. Deep Dubbing: End-to-End Auto-Audiobook System with Text-to- 7.5分 前25% 40. Erasing Your Voice Before it’s Heard: Training-Free Speaker 7.5分 前25% 41. InstructAudio: Unified Speech and Music Generation with Natu 7.5分 前25% 42. GLA-GRAD++: An Improved Griffin-Lim Guided Diffusion Model f 7.5分 前25% 43. Int-MeanFlow: Few-Step Speech Generation with Integral Veloc 7.5分 前25% 44. Training Flow Matching Models with Reliable Labels via Self- 7.5分 前25% 45. Hierarchical Discrete Flow Matching For Multi-Codebook Codec 7.5分 前25% 46. Frame-Stacked Local Transformers for Efficient Multi-Codeboo 7.5分 前25% 47. Direct Preference Optimization For Speech Autoregressive Dif 7.5分 前25% 48. MirrorTalk: Forging Personalized Avatars Via Disentangled St 7.0分 前25% 49. Residual Tokens Enhance Masked Autoencoders for Speech Model 7.0分 前50% 50. SP-MCQA: Evaluating Intelligibility of TTS Beyond the Word L 7.0分 前50% 51. SPAM: Style Prompt Adherence Metric for Prompt-Based TTS 7.0分 前50% 52. Gelina: Unified Speech and Gesture Synthesis Via Interleaved 7.0分 前50% 53. Retrieval-Based Speculative Decoding For Autoregressive Spee 7.0分 前50% 54. T-Mimi: A Transformer-Based Mimi Decoder for Real-Time On-Ph 7.0分 前50% 55. Wave-Trainer-Fit: Neural Vocoder With Trainable Prior And Fi 7.0分 前25% 56. EmoShift: Lightweight Activation Steering for Enhanced Emoti 7.0分 前50% 57. Task Vector in TTS: Toward Emotionally Expressive Dialectal 7.0分 前50% 58. Quantifying Speaker Embedding Phonological Rule Interactions 7.0分 前25% 59. PFluxTTS: Hybrid Flow-Matching TTS with Robust Cross-Lingual 7.0分 前50% 60. LP-CFM: Perceptual Invariance-Aware Conditional Flow Matchin 7.0分 前25% 61. SFM-TTS: Lightweight and Rapid Speech Synthesis with Flexibl 7.0分 前25% 62. MELA-TTS: Joint Transformer-Diffusion Model with Representat 7.0分 前25% 63. Combining Multi-Order Attention and Multi-Resolution Discrim 6.5分 前50% 📋 论文详情 🥇 T-Cache: Fast Inference For Masked Generative Transformer-Based TTS Via Prompt-Aware Feature Caching 🔥 9.0/10 | 前25% | #语音合成 | #实时处理 | #零样本 #语音大模型 ...