<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>语音合成 on 语音/音频论文速递</title>
    <link>https://nanless.github.io/audio-paper-digest-blog/tags/%E8%AF%AD%E9%9F%B3%E5%90%88%E6%88%90/</link>
    <description>Recent content in 语音合成 on 语音/音频论文速递</description>
    <generator>Hugo</generator>
    <language>zh-cn</language>
    <lastBuildDate>Wed, 29 Apr 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://nanless.github.io/audio-paper-digest-blog/tags/%E8%AF%AD%E9%9F%B3%E5%90%88%E6%88%90/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>ARCHI-TTS: A Flow-Matching-Based Text-to-Speech Model with Self-Supervised Semantic Aligner and Accelerated Inference</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-archi-tts-a-flow-matching-based-text-to-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-archi-tts-a-flow-matching-based-text-to-speech/</guid>
      <description>语音合成 | 8.0/10</description>
    </item>
    <item>
      <title>Asynchrony-Aware Decoupled Multimodal Control for Cued Speech Video Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-asynchrony-aware-decoupled-multimodal-control-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-asynchrony-aware-decoupled-multimodal-control-for/</guid>
      <description>语音合成 | 7.5/10</description>
    </item>
    <item>
      <title>AudioGen-Omni: A Unified Multimodal Diffusion Transformer for Video-Synchronized Audio, Speech, and Song Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-audiogen-omni-a-unified-multimodal-diffusion/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-audiogen-omni-a-unified-multimodal-diffusion/</guid>
      <description>音频生成 | 7.5/10</description>
    </item>
    <item>
      <title>Beyond Global Emotion: Fine-Grained Emotional Speech Synthesis with Dynamic Word-Level Modulation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-beyond-global-emotion-fine-grained-emotional/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-beyond-global-emotion-fine-grained-emotional/</guid>
      <description>语音合成 | 7.5/10</description>
    </item>
    <item>
      <title>BridgeCode: A Dual Speech Representation Paradigm for Autoregressive Zero-Shot Text-to-Speech Synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-bridgecode-a-dual-speech-representation-paradigm/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-bridgecode-a-dual-speech-representation-paradigm/</guid>
      <description>语音合成 | 8.0/10</description>
    </item>
    <item>
      <title>Combining Multi-Order Attention and Multi-Resolution Discriminator for High-Fidelity Neural Vocoder</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-combining-multi-order-attention-and-multi/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-combining-multi-order-attention-and-multi/</guid>
      <description>语音合成 | 6.5/10</description>
    </item>
    <item>
      <title>Confidence-Based Filtering for Speech Dataset Curation with Generative Speech Enhancement Using Discrete Tokens</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-confidence-based-filtering-for-speech-dataset/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-confidence-based-filtering-for-speech-dataset/</guid>
      <description>语音增强 | 6.5/10</description>
    </item>
    <item>
      <title>Continuous-Token Diffusion for Speaker-Referenced TTS in Multimodal LLMs</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-continuous-token-diffusion-for-speaker-referenced/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-continuous-token-diffusion-for-speaker-referenced/</guid>
      <description>语音合成 | 8.0/10</description>
    </item>
    <item>
      <title>CosyAccent: Duration-Controllable Accent Normalization using Source-Synthesis Training Data</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cosyaccent-duration-controllable-accent/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cosyaccent-duration-controllable-accent/</guid>
      <description>语音转换 | 7.8/10</description>
    </item>
    <item>
      <title>Cross-Lingual F5-TTS: Towards Language-Agnostic Voice Cloning and Speech Synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cross-lingual-f5-tts-towards-language-agnostic/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cross-lingual-f5-tts-towards-language-agnostic/</guid>
      <description>语音克隆 | 7.5/10</description>
    </item>
    <item>
      <title>DAIEN-TTS: Disentangled Audio Infilling for Environment-Aware Text-to-Speech Synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-daien-tts-disentangled-audio-infilling-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-daien-tts-disentangled-audio-infilling-for/</guid>
      <description>语音合成 | 8.0/10</description>
    </item>
    <item>
      <title>Deep Dubbing: End-to-End Auto-Audiobook System with Text-to-Timbre and Context-Aware Instruct-TTS</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-deep-dubbing-end-to-end-auto-audiobook-system/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-deep-dubbing-end-to-end-auto-audiobook-system/</guid>
      <description>语音合成 | 7.5/10</description>
    </item>
    <item>
      <title>Direct Preference Optimization For Speech Autoregressive Diffusion Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-direct-preference-optimization-for-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-direct-preference-optimization-for-speech/</guid>
      <description>语音合成 | 7.5/10</description>
    </item>
    <item>
      <title>Discrete Diffusion for Generative Modeling of Text-Aligned Speech Tokens</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-discrete-diffusion-for-generative-modeling-of/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-discrete-diffusion-for-generative-modeling-of/</guid>
      <description>语音合成 | 7.5/10</description>
    </item>
    <item>
      <title>DMP-TTS: Disentangled Multi-Modal Prompting for Controllable Text-to-Speech with Chained Guidance</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dmp-tts-disentangled-multi-modal-prompting-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dmp-tts-disentangled-multi-modal-prompting-for/</guid>
      <description>语音合成 | 7.5/10</description>
    </item>
    <item>
      <title>Do You Hear What I Mean? Quantifying the Instruction-Perception GAP in Instruction-Guided Expressive Text-to-Speech Systems</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-do-you-hear-what-i-mean-quantifying-the/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-do-you-hear-what-i-mean-quantifying-the/</guid>
      <description>语音合成 | 8.0/10</description>
    </item>
    <item>
      <title>ECSA: Dual-Branch Emotion Compensation for Emotion-Consistent Speaker Anonymization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ecsa-dual-branch-emotion-compensation-for-emotion/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ecsa-dual-branch-emotion-compensation-for-emotion/</guid>
      <description>语音匿名化 | 8.5/10</description>
    </item>
    <item>
      <title>EMG-to-Speech with Fewer Channels</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emg-to-speech-with-fewer-channels/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emg-to-speech-with-fewer-channels/</guid>
      <description>语音合成 | 7.5/10</description>
    </item>
    <item>
      <title>Emilia-NV: A Non-Verbal Speech Dataset with Word-Level Annotation for Human-Like Speech Modeling</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emilia-nv-a-non-verbal-speech-dataset-with-word/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emilia-nv-a-non-verbal-speech-dataset-with-word/</guid>
      <description>语音识别 | 7.5/10</description>
    </item>
    <item>
      <title>EMORL-TTS: Reinforcement Learning for Fine-Grained Emotion Control in LLM-based TTS</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emorl-tts-reinforcement-learning-for-fine-grained/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emorl-tts-reinforcement-learning-for-fine-grained/</guid>
      <description>语音合成 | 8.5/10</description>
    </item>
    <item>
      <title>EmoShift: Lightweight Activation Steering for Enhanced Emotion-Aware Speech Synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emoshift-lightweight-activation-steering-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emoshift-lightweight-activation-steering-for/</guid>
      <description>语音合成 | 7.0/10</description>
    </item>
    <item>
      <title>Emotion-Aligned Generation in Diffusion Text to Speech Models Via Preference-Guided Optimization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emotion-aligned-generation-in-diffusion-text-to/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emotion-aligned-generation-in-diffusion-text-to/</guid>
      <description>语音合成 | 8.0/10</description>
    </item>
    <item>
      <title>Emotional Damage: Investigating Safety Vulnerabilities of Large Audio-Language Models Under Speaker Emotional Variations</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emotional-damage-investigating-safety/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emotional-damage-investigating-safety/</guid>
      <description>音频安全 | 7.5/10</description>
    </item>
    <item>
      <title>Emotional Dimension Control in Language Model-Based Text-To-Speech: Spanning a Broad Spectrum of Human Emotions</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emotional-dimension-control-in-language-model/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emotional-dimension-control-in-language-model/</guid>
      <description>语音合成 | 7.5/10</description>
    </item>
    <item>
      <title>Entropy-Guided GRVQ for Ultra-Low Bitrate Neural Speech Codec</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-entropy-guided-grvq-for-ultra-low-bitrate-neural/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-entropy-guided-grvq-for-ultra-low-bitrate-neural/</guid>
      <description>语音合成 | 7.5/10</description>
    </item>
    <item>
      <title>Erasing Your Voice Before it’s Heard: Training-Free Speaker Unlearning for Zero-Shot Text-to-Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-erasing-your-voice-before-its-heard-training-free/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-erasing-your-voice-before-its-heard-training-free/</guid>
      <description>语音合成 | 7.5/10</description>
    </item>
    <item>
      <title>FED-PISA: Federated Voice Cloning Via Personalized Identity-Style Adaptation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fed-pisa-federated-voice-cloning-via-personalized/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fed-pisa-federated-voice-cloning-via-personalized/</guid>
      <description>语音克隆 | 8.0/10</description>
    </item>
    <item>
      <title>Frame-Stacked Local Transformers for Efficient Multi-Codebook Speech Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-frame-stacked-local-transformers-for-efficient/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-frame-stacked-local-transformers-for-efficient/</guid>
      <description>语音合成 | 7.5/10</description>
    </item>
    <item>
      <title>From Hallucination to Articulation: Language Model-Driven Losses for Ultra Low-Bitrate Neural Speech Coding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-from-hallucination-to-articulation-language-model/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-from-hallucination-to-articulation-language-model/</guid>
      <description>语音合成 | 7.5/10</description>
    </item>
    <item>
      <title>Gelina: Unified Speech and Gesture Synthesis Via Interleaved Token Prediction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-gelina-unified-speech-and-gesture-synthesis-via/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-gelina-unified-speech-and-gesture-synthesis-via/</guid>
      <description>语音合成 | 7.0/10</description>
    </item>
    <item>
      <title>GLA-GRAD&#43;&#43;: An Improved Griffin-Lim Guided Diffusion Model for Speech Synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-gla-grad-an-improved-griffin-lim-guided-diffusion/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-gla-grad-an-improved-griffin-lim-guided-diffusion/</guid>
      <description>语音合成 | 7.5/10</description>
    </item>
    <item>
      <title>Group Relative Policy Optimization for Text-to-Speech with Large Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-group-relative-policy-optimization-for-text-to/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-group-relative-policy-optimization-for-text-to/</guid>
      <description>语音合成 | 8.0/10</description>
    </item>
    <item>
      <title>HD-PPT: Hierarchical Decoding of Content- and Prompt-Preference Tokens for Instruction-Based TTS</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hd-ppt-hierarchical-decoding-of-content-and/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hd-ppt-hierarchical-decoding-of-content-and/</guid>
      <description>语音合成 | 8.0/10</description>
    </item>
    <item>
      <title>Hierarchical Discrete Flow Matching For Multi-Codebook Codec-Based Text-To-Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hierarchical-discrete-flow-matching-for-multi/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hierarchical-discrete-flow-matching-for-multi/</guid>
      <description>语音合成 | 7.5/10</description>
    </item>
    <item>
      <title>How to Label Resynthesized Audio: The Dual Role of Neural Audio Codecs in Audio Deepfake Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-how-to-label-resynthesized-audio-the-dual-role-of/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-how-to-label-resynthesized-audio-the-dual-role-of/</guid>
      <description>音频深度伪造检测 | 7.5/10</description>
    </item>
    <item>
      <title>IBPCodec : A Low-Bitrate Lightweight Speech Codec With Inter-Band Prediction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ibpcodec-a-low-bitrate-lightweight-speech-codec/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ibpcodec-a-low-bitrate-lightweight-speech-codec/</guid>
      <description>语音编码 | 7.0/10</description>
    </item>
    <item>
      <title>ICASSP 2026 - 语音合成 Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-061/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-061/</guid>
      <description>63 ICASSP 2026 papers in the 语音合成 category</description>
    </item>
    <item>
      <title>InstructAudio: Unified Speech and Music Generation with Natural Language Instruction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-instructaudio-unified-speech-and-music-generation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-instructaudio-unified-speech-and-music-generation/</guid>
      <description>语音合成 | 7.5/10</description>
    </item>
    <item>
      <title>Int-MeanFlow: Few-Step Speech Generation with Integral Velocity Distillation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-int-meanflow-few-step-speech-generation-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-int-meanflow-few-step-speech-generation-with/</guid>
      <description>语音合成 | 7.5/10</description>
    </item>
    <item>
      <title>Learning Vocal-Tract Area And Radiation With A Physics-Informed Webster Model</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-learning-vocal-tract-area-and-radiation-with-a/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-learning-vocal-tract-area-and-radiation-with-a/</guid>
      <description>歌唱语音合成 | 7.0/10</description>
    </item>
    <item>
      <title>Leveraging Text-to-Speech and Voice Conversion as Data Augmentation for Alzheimer&#39;s Disease Detection from Spontaneous Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-leveraging-text-to-speech-and-voice-conversion-as/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-leveraging-text-to-speech-and-voice-conversion-as/</guid>
      <description>语音生物标志物 | 7.0/10</description>
    </item>
    <item>
      <title>LP-CFM: Perceptual Invariance-Aware Conditional Flow Matching for Speech Modeling</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lp-cfm-perceptual-invariance-aware-conditional/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lp-cfm-perceptual-invariance-aware-conditional/</guid>
      <description>语音合成 | 7.0/10</description>
    </item>
    <item>
      <title>Marco-Voice: A Unified Framework for Expressive Speech Synthesis with Voice Cloning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-marco-voice-a-unified-framework-for-expressive/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-marco-voice-a-unified-framework-for-expressive/</guid>
      <description>语音合成 | 8.0/10</description>
    </item>
    <item>
      <title>Measuring Prosody Diversity in Zero-Shot TTS: A New Metric, Benchmark, and Exploration</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-measuring-prosody-diversity-in-zero-shot-tts-a/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-measuring-prosody-diversity-in-zero-shot-tts-a/</guid>
      <description>语音合成 | 8.0/10</description>
    </item>
    <item>
      <title>MELA-TTS: Joint Transformer-Diffusion Model with Representation Alignment for Speech Synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mela-tts-joint-transformer-diffusion-model-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mela-tts-joint-transformer-diffusion-model-with/</guid>
      <description>语音合成 | 7.0/10</description>
    </item>
    <item>
      <title>Mind Your [m]S, Cross Your [t]S: a Large-Scale Phonetic Analysis of Speech Reproduction in Modern Speech Generators</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mind-your-ms-cross-your-ts-a-large-scale-phonetic/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mind-your-ms-cross-your-ts-a-large-scale-phonetic/</guid>
      <description>语音伪造检测 | 7.0/10</description>
    </item>
    <item>
      <title>MirrorTalk: Forging Personalized Avatars Via Disentangled Style and Hierarchical Motion Control</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mirrortalk-forging-personalized-avatars-via/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mirrortalk-forging-personalized-avatars-via/</guid>
      <description>语音合成 | 7.0/10</description>
    </item>
    <item>
      <title>Mitigating Intra-Speaker Variability in Diarization with Style-Controllable Speech Augmentation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mitigating-intra-speaker-variability-in/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mitigating-intra-speaker-variability-in/</guid>
      <description>说话人日志 | 7.0/10</description>
    </item>
    <item>
      <title>NCF-TTS: Enhancing Flow Matching Based Text-To-Speech with Neighborhood Consistency Flow</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ncf-tts-enhancing-flow-matching-based-text-to/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ncf-tts-enhancing-flow-matching-based-text-to/</guid>
      <description>语音合成 | 8.0/10</description>
    </item>
    <item>
      <title>Neuromamba: Adaptive Frequency Filtering with a Pyramid Mamba for sEEG-driven Speech Synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-neuromamba-adaptive-frequency-filtering-with-a/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-neuromamba-adaptive-frequency-filtering-with-a/</guid>
      <description>语音合成 | 8.0/10</description>
    </item>
    <item>
      <title>No Verifiable Reward for Prosody: Toward Preference-Guided Prosody Learning in TTS</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-no-verifiable-reward-for-prosody-toward/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-no-verifiable-reward-for-prosody-toward/</guid>
      <description>语音合成 | 8.0/10</description>
    </item>
    <item>
      <title>Optimizing Speech Language Models for Acoustic Consistency</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-optimizing-speech-language-models-for-acoustic/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-optimizing-speech-language-models-for-acoustic/</guid>
      <description>语音合成 | 8.0/10</description>
    </item>
    <item>
      <title>OV-INSTRUCTTTS: Towards Open-Vocabulary Instruct Text-to-Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ov-instructtts-towards-open-vocabulary-instruct/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ov-instructtts-towards-open-vocabulary-instruct/</guid>
      <description>语音合成 | 8.0/10</description>
    </item>
    <item>
      <title>PFluxTTS: Hybrid Flow-Matching TTS with Robust Cross-Lingual Voice Cloning and Inference-Time Model Fusion</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-pfluxtts-hybrid-flow-matching-tts-with-robust/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-pfluxtts-hybrid-flow-matching-tts-with-robust/</guid>
      <description>语音合成 | 7.0/10</description>
    </item>
    <item>
      <title>Phonological Tokenizer: Prosody-Aware Phonetic Token Via Multi-Objective Fine-Tuning with Differentiable K-Means</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-phonological-tokenizer-prosody-aware-phonetic/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-phonological-tokenizer-prosody-aware-phonetic/</guid>
      <description>语音表示学习 | 8.0/10</description>
    </item>
    <item>
      <title>Praxy Voice: Voice-Prompt Recovery &#43; BUPS for Commercial-Class Indic TTS from a Frozen Non-Indic Base at Zero Commercial-Training-Data Cost</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-praxy-voice-voice-prompt-recovery-bups-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-praxy-voice-voice-prompt-recovery-bups-for/</guid>
      <description>语音合成 | 8.0/10</description>
    </item>
    <item>
      <title>Principled Coarse-Grained Acceptance For Speculative Decoding In Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-principled-coarse-grained-acceptance-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-principled-coarse-grained-acceptance-for/</guid>
      <description>语音合成 | 7.5/10</description>
    </item>
    <item>
      <title>Prosody-Guided Harmonic Attention for Phase-Coherent Neural Vocoding in the Complex Spectrum</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-prosody-guided-harmonic-attention-for-phase/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-prosody-guided-harmonic-attention-for-phase/</guid>
      <description>语音合成 | 8.0/10</description>
    </item>
    <item>
      <title>PRSA: Preventing Malicious Speaker Recognition and Speech Synthesis Simultaneously with Adversarial Examples</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-prsa-preventing-malicious-speaker-recognition-and/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-prsa-preventing-malicious-speaker-recognition-and/</guid>
      <description>语音匿名化 | 7.0/10</description>
    </item>
    <item>
      <title>PSP: An Interpretable Per-Dimension Accent Benchmark for Indic Text-to-Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-psp-an-interpretable-per-dimension-accent/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-psp-an-interpretable-per-dimension-accent/</guid>
      <description>基准测试 | 7.5/10</description>
    </item>
    <item>
      <title>PSTalker: Realistic 3D Talking Head Synthesis via a Semantic-Aware Audio-Driven Point-Based Shape</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-pstalker-realistic-3d-talking-head-synthesis-via/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-pstalker-realistic-3d-talking-head-synthesis-via/</guid>
      <description>说话人合成 | 7.5/10</description>
    </item>
    <item>
      <title>QFOCUS: Controllable Synthesis for Automated Speech Stress Editing to Deliver Human-Like Emphatic Intent</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-qfocus-controllable-synthesis-for-automated/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-qfocus-controllable-synthesis-for-automated/</guid>
      <description>语音合成 | 7.5/10</description>
    </item>
    <item>
      <title>Quantifying Speaker Embedding Phonological Rule Interactions in Accented Speech Synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-quantifying-speaker-embedding-phonological-rule/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-quantifying-speaker-embedding-phonological-rule/</guid>
<description>Speech synthesis | 7.0/10</description>
    </item>
    <item>
      <title>Real-Time Streaming MEL Vocoding with Generative Flow Matching</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-real-time-streaming-mel-vocoding-with-generative/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-real-time-streaming-mel-vocoding-with-generative/</guid>
<description>Speech synthesis | 7.5/10</description>
    </item>
    <item>
      <title>Residual Tokens Enhance Masked Autoencoders for Speech Modeling</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-residual-tokens-enhance-masked-autoencoders-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-residual-tokens-enhance-masked-autoencoders-for/</guid>
<description>Speech synthesis | 7.0/10</description>
    </item>
    <item>
      <title>Retrieval-Based Speculative Decoding For Autoregressive Speech Synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-retrieval-based-speculative-decoding-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-retrieval-based-speculative-decoding-for/</guid>
<description>Speech synthesis | 7.0/10</description>
    </item>
    <item>
      <title>RoCo: Robust Code for Fast and Effective Proactive Defense against Voice Cloning Attack</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-roco-robust-code-for-fast-and-effective-proactive/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-roco-robust-code-for-fast-and-effective-proactive/</guid>
<description>Audio security | 7.5/10</description>
    </item>
    <item>
      <title>RRPO: Robust Reward Policy Optimization for LLM-Based Emotional TTS</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-rrpo-robust-reward-policy-optimization-for-llm/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-rrpo-robust-reward-policy-optimization-for-llm/</guid>
<description>Speech synthesis | 7.5/10</description>
    </item>
    <item>
      <title>SFM-TTS: Lightweight and Rapid Speech Synthesis with Flexible Shortcut Flow Matching</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sfm-tts-lightweight-and-rapid-speech-synthesis/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sfm-tts-lightweight-and-rapid-speech-synthesis/</guid>
<description>Speech synthesis | 7.0/10</description>
    </item>
    <item>
      <title>Sidon: Fast and Robust Open-Source Multilingual Speech Restoration for Large-Scale Dataset Cleansing</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sidon-fast-and-robust-open-source-multilingual/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sidon-fast-and-robust-open-source-multilingual/</guid>
<description>Speech enhancement | 8.5/10</description>
    </item>
    <item>
      <title>SP-MCQA: Evaluating Intelligibility of TTS Beyond the Word Level</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sp-mcqa-evaluating-intelligibility-of-tts-beyond/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sp-mcqa-evaluating-intelligibility-of-tts-beyond/</guid>
<description>Speech synthesis | 7.0/10</description>
    </item>
    <item>
      <title>SPADE: Structured Pruning and Adaptive Distillation for Efficient LLM-TTS</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-spade-structured-pruning-and-adaptive/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-spade-structured-pruning-and-adaptive/</guid>
<description>Speech synthesis | 7.5/10</description>
    </item>
    <item>
      <title>SPAM: Style Prompt Adherence Metric for Prompt-Based TTS</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-spam-style-prompt-adherence-metric-for-prompt/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-spam-style-prompt-adherence-metric-for-prompt/</guid>
<description>Speech synthesis | 7.0/10</description>
    </item>
    <item>
      <title>Speech Quality-Based Localization of Low-Quality Speech and Text-to-Speech Synthesis Artefacts</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-speech-quality-based-localization-of-low-quality/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-speech-quality-based-localization-of-low-quality/</guid>
<description>Speech quality assessment | 7.0/10</description>
    </item>
    <item>
      <title>STACodec: Semantic Token Assignment for Balancing Acoustic Fidelity and Semantic Information in Audio Codecs</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-stacodec-semantic-token-assignment-for-balancing/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-stacodec-semantic-token-assignment-for-balancing/</guid>
<description>Speech recognition | 8.0/10</description>
    </item>
    <item>
      <title>Syncspeech: Efficient and Low-Latency Text-to-Speech Based on Temporal Masked Transformer</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-syncspeech-efficient-and-low-latency-text-to/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-syncspeech-efficient-and-low-latency-text-to/</guid>
<description>Speech synthesis | 7.5/10</description>
    </item>
    <item>
      <title>SynParaSpeech: Automated Synthesis of Paralinguistic Datasets for Speech Generation and Understanding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-synparaspeech-automated-synthesis-of/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-synparaspeech-automated-synthesis-of/</guid>
<description>Speech synthesis | 7.5/10</description>
    </item>
    <item>
      <title>Synthetic yet Striking? Assessing Vocal Charisma in TTS via Perceptual and Algorithmic Measures</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-synthetic-yet-striking-assessing-vocal-charisma/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-synthetic-yet-striking-assessing-vocal-charisma/</guid>
<description>Speech synthesis | 7.5/10</description>
    </item>
    <item>
      <title>T-Cache: Fast Inference For Masked Generative Transformer-Based TTS Via Prompt-Aware Feature Caching</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-t-cache-fast-inference-for-masked-generative/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-t-cache-fast-inference-for-masked-generative/</guid>
<description>Speech synthesis | 9.0/10</description>
    </item>
    <item>
      <title>T-Mimi: A Transformer-Based Mimi Decoder for Real-Time On-Phone TTS</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-t-mimi-a-transformer-based-mimi-decoder-for-real/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-t-mimi-a-transformer-based-mimi-decoder-for-real/</guid>
<description>Speech synthesis | 7.0/10</description>
    </item>
    <item>
      <title>TAGARELA - A Portuguese Speech Dataset from Podcasts</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tagarela-a-portuguese-speech-dataset-from-podcasts/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tagarela-a-portuguese-speech-dataset-from-podcasts/</guid>
<description>Speech recognition, speech synthesis | 7.0/10</description>
    </item>
    <item>
      <title>Task Vector in TTS: Toward Emotionally Expressive Dialectal Speech Synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-task-vector-in-tts-toward-emotionally-expressive/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-task-vector-in-tts-toward-emotionally-expressive/</guid>
<description>Speech synthesis | 7.0/10</description>
    </item>
    <item>
      <title>TMD-TTS: A Unified Tibetan Multi-Dialect Text-to-Speech Framework for Ü-Tsang, Amdo and Kham Speech Dataset Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tmd-tts-a-unified-tibetan-multi-dialect-text-to/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tmd-tts-a-unified-tibetan-multi-dialect-text-to/</guid>
<description>Speech synthesis | 7.5/10</description>
    </item>
    <item>
      <title>Training Flow Matching Models with Reliable Labels via Self-Purification</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-training-flow-matching-models-with-reliable/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-training-flow-matching-models-with-reliable/</guid>
<description>Speech synthesis | 7.5/10</description>
    </item>
    <item>
      <title>Understanding the Strengths and Weaknesses of SSL Models for Audio Deepfake Model Attribution</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-understanding-the-strengths-and-weaknesses-of-ssl/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-understanding-the-strengths-and-weaknesses-of-ssl/</guid>
<description>Audio deepfake detection | 7.0/10</description>
    </item>
    <item>
      <title>VividTalker: A Modular Framework for Expressive 3D Talking Avatars with Controllable Gaze and Blink</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-vividtalker-a-modular-framework-for-expressive-3d/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-vividtalker-a-modular-framework-for-expressive-3d/</guid>
<description>Speech synthesis | 7.5/10</description>
    </item>
    <item>
      <title>VoxMorph: Scalable Zero-Shot Voice Identity Morphing via Disentangled Embeddings</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-voxmorph-scalable-zero-shot-voice-identity/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-voxmorph-scalable-zero-shot-voice-identity/</guid>
<description>Voice cloning | 9.0/10</description>
    </item>
    <item>
      <title>VoXtream: Full-Stream Text-To-Speech With Extremely Low Latency</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-voxtream-full-stream-text-to-speech-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-voxtream-full-stream-text-to-speech-with/</guid>
<description>Speech synthesis | 8.5/10</description>
    </item>
    <item>
      <title>Wave-Trainer-Fit: Neural Vocoder With Trainable Prior And Fixed-Point Iteration Towards High-Quality Speech Generation From SSL Features</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-wave-trainer-fit-neural-vocoder-with-trainable/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-wave-trainer-fit-neural-vocoder-with-trainable/</guid>
<description>Speech synthesis | 7.0/10</description>
    </item>
    <item>
      <title>Wavenext 2: Convnext-Based Fast Neural Vocoders with Residual Denoising and Sub-Modeling for Gan And Diffusion Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-wavenext-2-convnext-based-fast-neural-vocoders/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-wavenext-2-convnext-based-fast-neural-vocoders/</guid>
<description>Speech synthesis | 9.0/10</description>
    </item>
    <item>
      <title>When Voice Matters: A Controlled Study of Audio LLM Behavior in Clinical Decision-Making</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-when-voice-matters-a-controlled-study-of-audio/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-when-voice-matters-a-controlled-study-of-audio/</guid>
<description>Model evaluation | 7.0/10</description>
    </item>
    <item>
      <title>ZSV2C-MLLM: Zero-Shot Visual Voice Cloning Via Multimodal Large Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-zsv2c-mllm-zero-shot-visual-voice-cloning-via/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-zsv2c-mllm-zero-shot-visual-voice-cloning-via/</guid>
<description>Voice cloning | 6.5/10</description>
    </item>
    <item>
      <title>MAGIC-TTS: Fine-Grained Controllable Speech Synthesis with Explicit Local Duration and Pause Control</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-magic-tts-fine-grained-controllable-speech/</link>
      <pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-magic-tts-fine-grained-controllable-speech/</guid>
<description>Speech synthesis | 7.0/10</description>
    </item>
    <item>
      <title>Talker-T2AV: Joint Talking Audio-Video Generation with Autoregressive Diffusion Modeling</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-talker-t2av-joint-talking-audio-video-generation/</link>
      <pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-talker-t2av-joint-talking-audio-video-generation/</guid>
<description>Speech synthesis | 7.5/10</description>
    </item>
    <item>
      <title>TTS-PRISM: A Perceptual Reasoning and Interpretable Speech Model for Fine-Grained Diagnosis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-tts-prism-a-perceptual-reasoning-and/</link>
      <pubDate>Mon, 27 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-tts-prism-a-perceptual-reasoning-and/</guid>
<description>Speech quality assessment | 7.5/10</description>
    </item>
    <item>
      <title>UniSonate: A Unified Model for Speech, Music, and Sound Effect Generation with Text Instructions</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-unisonate-a-unified-model-for-speech-music-and/</link>
      <pubDate>Mon, 27 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-unisonate-a-unified-model-for-speech-music-and/</guid>
<description>Audio generation | 8.5/10</description>
    </item>
    <item>
      <title>MAGIC-TTS: Fine-Grained Controllable Speech Synthesis with Explicit Local Duration and Pause Control</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-25-magic-tts-fine-grained-controllable-speech/</link>
      <pubDate>Sat, 25 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-25-magic-tts-fine-grained-controllable-speech/</guid>
<description>Speech synthesis | 7.5/10</description>
    </item>
    <item>
      <title>Speech/Audio Paper Digest 2026-04-25</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-25/</link>
      <pubDate>Sat, 25 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-25/</guid>
<description>2 speech/AI papers analyzed in total</description>
    </item>
    <item>
      <title>ATRIE: Adaptive Tuning for Robust Inference and Emotion in Persona-Driven Speech Synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-atrie-adaptive-tuning-for-robust-inference-and/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-atrie-adaptive-tuning-for-robust-inference-and/</guid>
<description>Speech synthesis | 7.0/10</description>
    </item>
    <item>
      <title>MAGIC-TTS: Fine-Grained Controllable Speech Synthesis with Explicit Local Duration and Pause Control</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-magic-tts-fine-grained-controllable-speech/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-magic-tts-fine-grained-controllable-speech/</guid>
<description>Speech synthesis | 7.5/10</description>
    </item>
    <item>
      <title>Preferences of a Voice-First Nation: Large-Scale Pairwise Evaluation and Preference Analysis for TTS in Indian Languages</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-preferences-of-a-voice-first-nation-large-scale/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-preferences-of-a-voice-first-nation-large-scale/</guid>
<description>Speech synthesis | 7.5/10</description>
    </item>
    <item>
      <title>Qwen3.5-Omni Technical Report</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-qwen35-omni-technical-report/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-qwen35-omni-technical-report/</guid>
<description>This paper introduces Qwen3.5-Omni, an omni-modal large language model supporting text, image, audio, and audio-video inputs. To address the shortcomings of existing models in real-time interaction, cross-modal reasoning, and tool use, its core approach adopts a "Thinker-Talker" architecture and introduces a Mixture-of-Experts (MoE) design for efficiency. Compared with its predecessor, the main innovations are: 1) model scale extended to hundreds of billions</description>
    </item>
    <item>
      <title>SpeechParaling-Bench: A Comprehensive Benchmark for Paralinguistic-Aware Speech Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-speechparaling-bench-a-comprehensive-benchmark/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-speechparaling-bench-a-comprehensive-benchmark/</guid>
<description>1.  **Problem**: Evaluation of large audio language models' ability to generate and understand paralinguistics (e.g., emotion, tone, timbre) suffers from incomplete feature coverage and subjective, non-scalable evaluation methods. 2.  **Method**: Proposes SpeechParaling-Bench, a comprehensive benchmark of over 1,000 Chinese-English parallel speech queries covering more than 100 fine-grained paralinguistic features. The benchmark</description>
    </item>
    <item>
      <title>Text-To-Speech with Chain-of-Details: modeling temporal dynamics in speech generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-text-to-speech-with-chain-of-details-modeling/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-text-to-speech-with-chain-of-details-modeling/</guid>
<description>1. **Problem**: In existing discrete-token TTS models, the "coarse-to-fine" generation paradigm is realized mainly as a conversion from semantic tokens to acoustic tokens, with no explicit modeling of the temporal dynamics inherent in speech. 2. **Core method**: Proposes the Chain-of-Details (CoD) framework, which decomposes speech generation into multiple</description>
    </item>
    <item>
      <title>ATRIE: Adaptive Tuning for Robust Inference and Emotion in Persona-Driven Speech Synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-atrie-adaptive-tuning-for-robust-inference-and/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-atrie-adaptive-tuning-for-robust-inference-and/</guid>
<description>To address the difficulty existing speech synthesis systems have in simultaneously maintaining persona identity consistency and accurate emotional expression when generating persona-driven, emotionally rich speech, this paper proposes the ATRIE framework. Its core is the **Persona-Prosody Dual-Track (P2-DT) architecture**, which decouples speech generation into a static **timbre track** (preserving the identity anchor via scalar quantization) and a dynamic **prosody</description>
    </item>
    <item>
      <title>NVBench: A Benchmark for Speech Synthesis with Non-Verbal Vocalizations</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-nvbench-a-benchmark-for-speech-synthesis-with-non/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-nvbench-a-benchmark-for-speech-synthesis-with-non/</guid>
<description>This paper addresses a key but overlooked problem in text-to-speech (TTS): how to standardize evaluation of a system's ability to generate non-verbal vocalizations (NVVs, e.g., laughter, sighs). The authors propose **NVBench**, a bilingual (English/Chinese) benchmark built on a **unified taxonomy of 45 NVV classes**. Its core methods include: 1) constructing a high-quality, balanced evaluation set of 50 examples per class, 4,500 in total</description>
    </item>
    <item>
      <title>Qwen3.5-Omni Technical Report</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-qwen35-omni-technical-report/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-qwen35-omni-technical-report/</guid>
<description>This technical report gives a comprehensive introduction to Qwen3.5-Omni, an omni-modal large language model that unifies understanding and generation of text, image, audio, and audio-video content. **The problem addressed** is the limitations of existing models in real-time interaction, cross-modal reasoning, and autonomous agent behavior. **The approach** builds on a "Thinker-Talker" architecture with several key innovations: 1) both the Thinker and the Talker adopt hybrid attention</description>
    </item>
    <item>
      <title>Text-To-Speech with Chain-of-Details: modeling temporal dynamics in speech generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-text-to-speech-with-chain-of-details-modeling/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-text-to-speech-with-chain-of-details-modeling/</guid>
<description>For the text-to-speech (TTS) task, this paper proposes a new framework named Chain-of-Details (CoD). **The problem addressed** is that existing TTS methods inadequately model the temporal dynamics of speech generation (the progressive process from coarse timing to fine acoustic detail). **The approach** decomposes speech generation into multiple stages of increasing temporal resolution, at each stage using</description>
    </item>
    <item>
      <title>MINT-Bench: A Comprehensive Multilingual Benchmark for Instruction-Following Text-to-Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-mint-bench-a-comprehensive-multilingual-benchmark/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-mint-bench-a-comprehensive-multilingual-benchmark/</guid>
<description>This paper addresses the lack of systematic evaluation tools for instruction-following text-to-speech (TTS). Current evaluations suffer from incomplete coverage, coarse diagnostic granularity, and weak multilingual support. The authors therefore propose **MINT-Bench**, a comprehensive multilingual benchmark. Its core methods include: 1) a **hierarchical multi-axis taxonomy** based on 10 atomic acoustic attributes, systematically organizing instructions from simple to complex</description>
    </item>
    <item>
      <title>AST: Adaptive, Seamless, and Training-Free Precise Speech Editing</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-ast-adaptive-seamless-and-training-free-precise/</link>
      <pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-ast-adaptive-seamless-and-training-free-precise/</guid>
<description>To address existing speech-editing methods' reliance on task-specific training and their poor temporal consistency in unedited regions, this paper proposes AST (Adaptive, Seamless, and Training-free), a precise speech-editing framework built on a pretrained AM-FM (autoregressive + flow-matching) paradigm TTS model. AST first inverts the original speech into the latent space via an inverse Euler ODE solver</description>
    </item>
    <item>
      <title>Hierarchical Codec Diffusion for Video-to-Speech Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-hierarchical-codec-diffusion-for-video-to-speech/</link>
      <pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-hierarchical-codec-diffusion-for-video-to-speech/</guid>
<description>Targeting the visual-speech modality information asymmetry in Video-to-Speech (VTS) generation, this paper argues that existing methods ignore the hierarchical structure of speech from coarse-grained semantics to fine-grained prosody, so visual conditioning cannot be precisely aligned with the speech representation. The authors therefore propose HiCoDiT (Hierarchical Codec Diffusion Transformer),</description>
    </item>
    <item>
      <title>PS-TTS: Phonetic Synchronization in Text-to-Speech for Achieving Natural Automated Dubbing</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-ps-tts-phonetic-synchronization-in-text-to-speech/</link>
      <pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-ps-tts-phonetic-synchronization-in-text-to-speech/</guid>
<description>This paper tackles the difficulty in automated dubbing (AD) of synchronizing target and source speech in both duration and lip shape. Its core contribution is a two-stage text-rewriting method integrated into a TTS system: first, **isochrony** rewriting via a language model ensures the target speech duration matches the source; second, **phonetic synchronization (PS)** is introduced, using dynamic time warping (DTW) and vowel distances learned from training data,</description>
    </item>
    <item>
      <title>An Ultra-Low Latency, End-to-End Streaming Speech Synthesis Architecture via Block-Wise Generation and Depth-Wise Codec Decoding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-an-ultra-low-latency-end-to-end-streaming-speech/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-an-ultra-low-latency-end-to-end-streaming-speech/</guid>
<description>This paper addresses the core tension in real-time interactive speech synthesis between **high inference latency** and **fragile acoustic quality (especially high-frequency detail)**. Traditional pipelines rely on computationally intensive neural vocoders for waveform reconstruction, and acoustic models based on continuous regression tend to over-smooth the spectrum.</description>
    </item>
  </channel>
</rss>
