Masked Generative Modeling on Speech/Audio Paper Digest

Recent content in Masked Generative Modeling on Speech/Audio Paper Digest
https://nanless.github.io/audio-paper-digest-blog/tags/%E6%8E%A9%E7%A0%81%E7%94%9F%E6%88%90%E5%BB%BA%E6%A8%A1/
Thu, 23 Apr 2026 00:00:00 +0000

Text-To-Speech with Chain-of-Details: modeling temporal dynamics in speech generation
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-text-to-speech-with-chain-of-details-modeling/
Thu, 23 Apr 2026 00:00:00 +0000

1. **Problem**: In existing discrete-token TTS models, the "coarse-to-fine" generation paradigm shows up mainly as the conversion from semantic tokens to acoustic tokens, while the temporal dynamics inherent in speech are not modeled explicitly.
2. **Core method**: The paper proposes the Chain-of-Details (CoD) framework, which decomposes speech generation into multiple…