<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Generative Models on Speech/Audio Paper Digest</title>
    <link>https://nanless.github.io/audio-paper-digest-blog/tags/%E7%94%9F%E6%88%90%E6%A8%A1%E5%9E%8B/</link>
    <description>Recent content in Generative Models on Speech/Audio Paper Digest</description>
    <generator>Hugo</generator>
    <language>zh-cn</language>
    <lastBuildDate>Wed, 29 Apr 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://nanless.github.io/audio-paper-digest-blog/tags/%E7%94%9F%E6%88%90%E6%A8%A1%E5%9E%8B/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>A Generative-First Neural Audio Autoencoder</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-generative-first-neural-audio-autoencoder/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-generative-first-neural-audio-autoencoder/</guid>
      <description>Music Generation | 8.5/10</description>
    </item>
    <item>
      <title>Adaptive Deterministic Flow Matching for Target Speaker Extraction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-adaptive-deterministic-flow-matching-for-target/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-adaptive-deterministic-flow-matching-for-target/</guid>
      <description>Target Speaker Extraction | 8.0/10</description>
    </item>
    <item>
      <title>Bleed No More: Generative Interference Reduction for Musical Recordings</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-bleed-no-more-generative-interference-reduction/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-bleed-no-more-generative-interference-reduction/</guid>
      <description>Music Source Separation | 7.0/10</description>
    </item>
    <item>
      <title>Combining Multi-Order Attention and Multi-Resolution Discriminator for High-Fidelity Neural Vocoder</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-combining-multi-order-attention-and-multi/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-combining-multi-order-attention-and-multi/</guid>
      <description>Speech Synthesis | 6.5/10</description>
    </item>
    <item>
      <title>Confidence-Based Filtering for Speech Dataset Curation with Generative Speech Enhancement Using Discrete Tokens</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-confidence-based-filtering-for-speech-dataset/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-confidence-based-filtering-for-speech-dataset/</guid>
      <description>Speech Enhancement | 6.5/10</description>
    </item>
    <item>
      <title>Cutscene Agent: An LLM Agent Framework for Automated 3D Cutscene Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cutscene-agent-an-llm-agent-framework-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cutscene-agent-an-llm-agent-framework-for/</guid>
      <description>Generative Models | 8.5/10</description>
    </item>
    <item>
      <title>ECSA: Dual-Branch Emotion Compensation for Emotion-Consistent Speaker Anonymization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ecsa-dual-branch-emotion-compensation-for-emotion/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ecsa-dual-branch-emotion-compensation-for-emotion/</guid>
      <description>Speaker Anonymization | 8.5/10</description>
    </item>
    <item>
      <title>EmoTri-RL: Emotion- and Cause-Aware Reinforcement Learning for Multi-Modal Empathetic Dialogue</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emotri-rl-emotion-and-cause-aware-reinforcement/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emotri-rl-emotion-and-cause-aware-reinforcement/</guid>
      <description>Speech Emotion Recognition | 7.0/10</description>
    </item>
    <item>
      <title>Enhanced Generative Machine Listener</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-enhanced-generative-machine-listener/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-enhanced-generative-machine-listener/</guid>
      <description>Audio Classification | 7.0/10</description>
    </item>
    <item>
      <title>Etude: Piano Cover Generation with a Three-Stage Approach — Extract, Structuralize, and Decode</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-etude-piano-cover-generation-with-a-three-stage/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-etude-piano-cover-generation-with-a-three-stage/</guid>
      <description>Music Generation | 7.0/10</description>
    </item>
    <item>
      <title>Gen-SER: When the Generative Model Meets Speech Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-gen-ser-when-the-generative-model-meets-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-gen-ser-when-the-generative-model-meets-speech/</guid>
      <description>Speech Emotion Recognition | 6.5/10</description>
    </item>
    <item>
      <title>Hanui: Harnessing Distributional Discrepancies for Singing Voice Deepfake Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hanui-harnessing-distributional-discrepancies-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hanui-harnessing-distributional-discrepancies-for/</guid>
      <description>Audio Deepfake Detection | 8.0/10</description>
    </item>
    <item>
      <title>HCGAN: Harmonic-Coupled Generative Adversarial Network for Speech Super-Resolution in Low-Bandwidth Scenarios</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hcgan-harmonic-coupled-generative-adversarial/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hcgan-harmonic-coupled-generative-adversarial/</guid>
      <description>Speech Enhancement | 8.0/10</description>
    </item>
    <item>
      <title>Hierarchical Tokenization of Multimodal Music Data for Generative Music Retrieval</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hierarchical-tokenization-of-multimodal-music/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hierarchical-tokenization-of-multimodal-music/</guid>
      <description>Music Retrieval | 7.0/10</description>
    </item>
    <item>
      <title>Huí Sù: Co-constructing a Dual Feedback Apparatus</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hu-s-co-constructing-a-dual-feedback-apparatus/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hu-s-co-constructing-a-dual-feedback-apparatus/</guid>
      <description>Music Generation | 5.5/10</description>
    </item>
    <item>
      <title>LLAC: Learned Lossless Audio Codec</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-llac-learned-lossless-audio-codec/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-llac-learned-lossless-audio-codec/</guid>
      <description>Lossless Audio Coding | 7.5/10</description>
    </item>
    <item>
      <title>MAGE: A Coarse-to-Fine Speech Enhancer with Masked Generative Model</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mage-a-coarse-to-fine-speech-enhancer-with-masked/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mage-a-coarse-to-fine-speech-enhancer-with-masked/</guid>
      <description>Speech Enhancement | 8.0/10</description>
    </item>
    <item>
      <title>MeanFlowSE: One-Step Generative Speech Enhancement via Conditional Mean Flow</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-meanflowse-one-step-generative-speech-enhancement/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-meanflowse-one-step-generative-speech-enhancement/</guid>
      <description>Speech Enhancement | 7.5/10</description>
    </item>
    <item>
      <title>MeanSE: Efficient Generative Speech Enhancement with Mean Flows</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-meanse-efficient-generative-speech-enhancement/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-meanse-efficient-generative-speech-enhancement/</guid>
      <description>Speech Enhancement | 6.5/10</description>
    </item>
    <item>
      <title>MECap-R1: Emotion-Aware Policy with Reinforcement Learning for Multimodal Emotion Captioning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mecap-r1-emotion-aware-policy-with-reinforcement/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mecap-r1-emotion-aware-policy-with-reinforcement/</guid>
      <description>Speech Emotion Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Noise-to-Notes: Diffusion-Based Generation and Refinement for Automatic Drum Transcription</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-noise-to-notes-diffusion-based-generation-and/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-noise-to-notes-diffusion-based-generation-and/</guid>
      <description>Music Information Retrieval | 8.0/10</description>
    </item>
    <item>
      <title>ParaGSE: Parallel Generative Speech Enhancement with Group-Vector-Quantization-Based Neural Speech Codec</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-paragse-parallel-generative-speech-enhancement/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-paragse-parallel-generative-speech-enhancement/</guid>
      <description>Speech Enhancement | 7.5/10</description>
    </item>
    <item>
      <title>PG-SE: Predictive Acceleration and Correction for Generative Speech Enhancement</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-pg-se-predictive-acceleration-and-correction-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-pg-se-predictive-acceleration-and-correction-for/</guid>
      <description>Speech Enhancement | 7.5/10</description>
    </item>
    <item>
      <title>Prosody-Guided Harmonic Attention for Phase-Coherent Neural Vocoding in the Complex Spectrum</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-prosody-guided-harmonic-attention-for-phase/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-prosody-guided-harmonic-attention-for-phase/</guid>
      <description>Speech Synthesis | 8.0/10</description>
    </item>
    <item>
      <title>PSTalker: Realistic 3D Talking Head Synthesis via a Semantic-Aware Audio-Driven Point-Based Shape</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-pstalker-realistic-3d-talking-head-synthesis-via/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-pstalker-realistic-3d-talking-head-synthesis-via/</guid>
      <description>Talking Head Synthesis | 7.5/10</description>
    </item>
    <item>
      <title>ReCoM: Realistic Co-Speech Motion Generation with Recurrent Embedded Transformer</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-recom-realistic-co-speech-motion-generation-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-recom-realistic-co-speech-motion-generation-with/</guid>
      <description>Audio Generation | 7.0/10</description>
    </item>
    <item>
      <title>SAGA-SR: Semantically and Acoustically Guided Audio Super-Resolution</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-saga-sr-semantically-and-acoustically-guided/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-saga-sr-semantically-and-acoustically-guided/</guid>
      <description>Audio Enhancement | 7.5/10</description>
    </item>
    <item>
      <title>Timbre-Based Pretraining with Pseudo-Labels for Multi-Instrument Automatic Music Transcription</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-timbre-based-pretraining-with-pseudo-labels-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-timbre-based-pretraining-with-pseudo-labels-for/</guid>
      <description>Music Information Retrieval | 7.0/10</description>
    </item>
    <item>
      <title>Tldiffgan: A Latent Diffusion-Gan Framework with Temporal Information Fusion for Anomalous Sound Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tldiffgan-a-latent-diffusion-gan-framework-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tldiffgan-a-latent-diffusion-gan-framework-with/</guid>
      <description>Audio Event Detection | 7.5/10</description>
    </item>
    <item>
      <title>Two-Stage Language Model Framework for Acoustic Echo Cancellation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-two-stage-language-model-framework-for-acoustic/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-two-stage-language-model-framework-for-acoustic/</guid>
      <description>Speech Enhancement | 7.5/10</description>
    </item>
    <item>
      <title>Uncertainty-Aware 3D Emotional Talking Face Synthesis with Emotion Prior Distillation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-uncertainty-aware-3d-emotional-talking-face/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-uncertainty-aware-3d-emotional-talking-face/</guid>
      <description>Audio-Visual | 8.0/10</description>
    </item>
    <item>
      <title>Wave-Trainer-Fit: Neural Vocoder With Trainable Prior And Fixed-Point Iteration Towards High-Quality Speech Generation From SSL Features</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-wave-trainer-fit-neural-vocoder-with-trainable/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-wave-trainer-fit-neural-vocoder-with-trainable/</guid>
      <description>Speech Synthesis | 7.0/10</description>
    </item>
    <item>
      <title>Text-To-Speech with Chain-of-Details: modeling temporal dynamics in speech generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-text-to-speech-with-chain-of-details-modeling/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-text-to-speech-with-chain-of-details-modeling/</guid>
      <description>This paper targets the text-to-speech (TTS) task and proposes a new framework called Chain-of-Details (CoD). The **problem to solve** is that existing TTS methods fall short in modeling the temporal dynamics of speech generation, i.e. the progressive process from coarse timing to fine-grained acoustic detail. The **method used** decomposes speech generation into multiple stages of increasing temporal resolution, where each stage</description>
    </item>
    <item>
      <title>Latent Fourier Transform</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-latent-fourier-transform/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-latent-fourier-transform/</guid>
      <description>This paper addresses the difficulty existing music generation models have in precisely controlling musical patterns at **arbitrary time scales**. The authors propose the **Latent Fourier Transform (LatentFT)** framework, whose core idea is to apply the discrete Fourier transform to the **sequence of latent vectors** produced by a diffusion autoencoder, yielding a "latent spectrum". By randomly masking frequencies of the latent spectrum during training, the decoder is forced</description>
    </item>
    <item>
      <title>Elucidating the SNR-t Bias of Diffusion Probabilistic Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-elucidating-the-snr-t-bias-of-diffusion/</link>
      <pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-elucidating-the-snr-t-bias-of-diffusion/</guid>
      <description>The core contribution of this paper is to identify and systematically analyze a fundamental issue in diffusion probabilistic models (DPMs): the SNR-timestep (SNR-t) bias. This bias refers to the mismatch between the actual SNR of a denoised sample at inference time and the SNR theoretically associated with its assigned timestep t; the misalignment arises because the strict coupling enforced during training is broken by accumulated errors at inference. Through detailed experiments (sliding-window tests, comparisons of the forward and reverse processes) the authors reveal</description>
    </item>
    <item>
      <title>ClariCodec: Optimising Neural Speech Codes for 200bps Communication using Reinforcement Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-claricodec-optimising-neural-speech-codes-for/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-claricodec-optimising-neural-speech-codes-for/</guid>
      <description>This paper tackles the severe loss of speech intelligibility in extremely bandwidth-constrained scenarios (e.g. 200 bps) such as satellite and underwater communication. Conventional codecs aim at waveform reconstruction and, at ultra-low bitrates, spend precious bits on unnecessary acoustic detail rather than on the core semantic information. To this end,</description>
    </item>
    <item>
      <title>Dual-Axis Generative Reward Model Toward Semantic and Turn-taking Robustness in Interactive Spoken Dialogue Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-dual-axis-generative-reward-model-toward-semantic/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-dual-axis-generative-reward-model-toward-semantic/</guid>
      <description>This paper addresses the core challenge of achieving human-like interaction with full-duplex spoken dialogue models (SDMs). Existing automated evaluation metrics remain superficial (e.g. behavioral statistics or turn-taking timing accuracy) and cannot provide a reliable reward signal for reinforcement learning, while human evaluation is expensive and hard to scale. To this end, the authors</description>
    </item>
    <item>
      <title>UniPASE: A Generative Model for Universal Speech Enhancement with High Fidelity and Low Hallucinations</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-unipase-a-generative-model-for-universal-speech/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-unipase-a-generative-model-for-universal-speech/</guid>
      <description>This paper addresses the core tension faced by generative models in universal speech enhancement (USE): high perceptual quality and low content hallucination are hard to achieve at the same time. The authors propose the UniPASE framework, which extends their earlier low-hallucination PASE model to handle distortions including noise, reverberation, packet loss, wind</description>
    </item>
  </channel>
</rss>
