<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Flow Matching on Speech/Audio Paper Digest</title>
    <link>https://nanless.github.io/audio-paper-digest-blog/tags/%E6%B5%81%E5%8C%B9%E9%85%8D/</link>
    <description>Recent content in Flow Matching on Speech/Audio Paper Digest</description>
    <generator>Hugo</generator>
    <language>en-us</language>
    <lastBuildDate>Wed, 29 Apr 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://nanless.github.io/audio-paper-digest-blog/tags/%E6%B5%81%E5%8C%B9%E9%85%8D/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Adaptive Deterministic Flow Matching for Target Speaker Extraction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-adaptive-deterministic-flow-matching-for-target/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-adaptive-deterministic-flow-matching-for-target/</guid>
      <description>Target speaker extraction | 8.0/10</description>
    </item>
    <item>
      <title>AnyAccomp: Generalizable Accompaniment Generation Via Quantized Melodic Bottleneck</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-anyaccomp-generalizable-accompaniment-generation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-anyaccomp-generalizable-accompaniment-generation/</guid>
      <description>Music generation | 8.0/10</description>
    </item>
    <item>
      <title>ARCHI-TTS: A Flow-Matching-Based Text-to-Speech Model with Self-Supervised Semantic Aligner and Accelerated Inference</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-archi-tts-a-flow-matching-based-text-to-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-archi-tts-a-flow-matching-based-text-to-speech/</guid>
      <description>Speech synthesis | 8.0/10</description>
    </item>
    <item>
      <title>Asynchrony-Aware Decoupled Multimodal Control for Cued Speech Video Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-asynchrony-aware-decoupled-multimodal-control-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-asynchrony-aware-decoupled-multimodal-control-for/</guid>
      <description>Speech synthesis | 7.5/10</description>
    </item>
    <item>
      <title>Beyond Global Emotion: Fine-Grained Emotional Speech Synthesis with Dynamic Word-Level Modulation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-beyond-global-emotion-fine-grained-emotional/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-beyond-global-emotion-fine-grained-emotional/</guid>
      <description>Speech synthesis | 7.5/10</description>
    </item>
    <item>
      <title>CosyAccent: Duration-Controllable Accent Normalization using Source-Synthesis Training Data</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cosyaccent-duration-controllable-accent/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cosyaccent-duration-controllable-accent/</guid>
      <description>Voice conversion | 7.8/10</description>
    </item>
    <item>
      <title>Cross-Lingual F5-TTS: Towards Language-Agnostic Voice Cloning and Speech Synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cross-lingual-f5-tts-towards-language-agnostic/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cross-lingual-f5-tts-towards-language-agnostic/</guid>
      <description>Voice cloning | 7.5/10</description>
    </item>
    <item>
      <title>DAIEN-TTS: Disentangled Audio Infilling for Environment-Aware Text-to-Speech Synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-daien-tts-disentangled-audio-infilling-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-daien-tts-disentangled-audio-infilling-for/</guid>
      <description>Speech synthesis | 8.0/10</description>
    </item>
    <item>
      <title>Deep Dubbing: End-to-End Auto-Audiobook System with Text-to-Timbre and Context-Aware Instruct-TTS</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-deep-dubbing-end-to-end-auto-audiobook-system/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-deep-dubbing-end-to-end-auto-audiobook-system/</guid>
      <description>Speech synthesis | 7.5/10</description>
    </item>
    <item>
      <title>Diverse and Few-Step Audio Captioning via Flow Matching</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-diverse-and-few-step-audio-captioning-via-flow/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-diverse-and-few-step-audio-captioning-via-flow/</guid>
      <description>Audio captioning | 6.5/10</description>
    </item>
    <item>
      <title>EmoShift: Lightweight Activation Steering for Enhanced Emotion-Aware Speech Synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emoshift-lightweight-activation-steering-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emoshift-lightweight-activation-steering-for/</guid>
      <description>Speech synthesis | 7.0/10</description>
    </item>
    <item>
      <title>Emotional Dimension Control in Language Model-Based Text-To-Speech: Spanning a Broad Spectrum of Human Emotions</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emotional-dimension-control-in-language-model/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emotional-dimension-control-in-language-model/</guid>
      <description>Speech synthesis | 7.5/10</description>
    </item>
    <item>
      <title>Erasing Your Voice Before it’s Heard: Training-Free Speaker Unlearning for Zero-Shot Text-to-Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-erasing-your-voice-before-its-heard-training-free/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-erasing-your-voice-before-its-heard-training-free/</guid>
      <description>Speech synthesis | 7.5/10</description>
    </item>
    <item>
      <title>FlashFoley: Fast Interactive Sketch2audio Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-flashfoley-fast-interactive-sketch2audio/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-flashfoley-fast-interactive-sketch2audio/</guid>
      <description>Audio generation | 7.5/10</description>
    </item>
    <item>
      <title>FlowSE-GRPO: Training Flow Matching Speech Enhancement via Online Reinforcement Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-flowse-grpo-training-flow-matching-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-flowse-grpo-training-flow-matching-speech/</guid>
      <description>Speech enhancement | 7.5/10</description>
    </item>
    <item>
      <title>Gelina: Unified Speech and Gesture Synthesis Via Interleaved Token Prediction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-gelina-unified-speech-and-gesture-synthesis-via/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-gelina-unified-speech-and-gesture-synthesis-via/</guid>
      <description>Speech synthesis | 7.0/10</description>
    </item>
    <item>
      <title>Gen-SER: When the Generative Model Meets Speech Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-gen-ser-when-the-generative-model-meets-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-gen-ser-when-the-generative-model-meets-speech/</guid>
      <description>Speech emotion recognition | 6.5/10</description>
    </item>
    <item>
      <title>Hierarchical Discrete Flow Matching For Multi-Codebook Codec-Based Text-To-Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hierarchical-discrete-flow-matching-for-multi/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hierarchical-discrete-flow-matching-for-multi/</guid>
      <description>Speech synthesis | 7.5/10</description>
    </item>
    <item>
      <title>HyFlowSE: Hybrid End-To-End Flow-Matching Speech Enhancement via Generative-Discriminative Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hyflowse-hybrid-end-to-end-flow-matching-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hyflowse-hybrid-end-to-end-flow-matching-speech/</guid>
      <description>Speech enhancement | 8.0/10</description>
    </item>
    <item>
      <title>Instrument Generation Through Distributional Flow Matching and Test-Time Search</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-instrument-generation-through-distributional-flow/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-instrument-generation-through-distributional-flow/</guid>
      <description>Music generation | 7.0/10</description>
    </item>
    <item>
      <title>Int-MeanFlow: Few-Step Speech Generation with Integral Velocity Distillation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-int-meanflow-few-step-speech-generation-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-int-meanflow-few-step-speech-generation-with/</guid>
      <description>Speech synthesis | 7.5/10</description>
    </item>
    <item>
      <title>LP-CFM: Perceptual Invariance-Aware Conditional Flow Matching for Speech Modeling</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lp-cfm-perceptual-invariance-aware-conditional/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lp-cfm-perceptual-invariance-aware-conditional/</guid>
      <description>Speech synthesis | 7.0/10</description>
    </item>
    <item>
      <title>Marco-Voice: A Unified Framework for Expressive Speech Synthesis with Voice Cloning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-marco-voice-a-unified-framework-for-expressive/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-marco-voice-a-unified-framework-for-expressive/</guid>
      <description>Speech synthesis | 8.0/10</description>
    </item>
    <item>
      <title>Meanflow-Accelerated Multimodal Video-to-Audio Synthesis Via One-Step Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-meanflow-accelerated-multimodal-video-to-audio/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-meanflow-accelerated-multimodal-video-to-audio/</guid>
      <description>Audio generation | 7.5/10</description>
    </item>
    <item>
      <title>MeanFlowSE: One-Step Generative Speech Enhancement via Conditional Mean Flow</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-meanflowse-one-step-generative-speech-enhancement/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-meanflowse-one-step-generative-speech-enhancement/</guid>
      <description>Speech enhancement | 7.5/10</description>
    </item>
    <item>
      <title>MeanSE: Efficient Generative Speech Enhancement with Mean Flows</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-meanse-efficient-generative-speech-enhancement/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-meanse-efficient-generative-speech-enhancement/</guid>
      <description>Speech enhancement | 6.5/10</description>
    </item>
    <item>
      <title>MeanVC: Lightweight and Streaming Zero-Shot Voice Conversion via Mean Flows</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-meanvc-lightweight-and-streaming-zero-shot-voice/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-meanvc-lightweight-and-streaming-zero-shot-voice/</guid>
      <description>Voice conversion | 7.5/10</description>
    </item>
    <item>
      <title>MeanVoiceFlow: One-Step Nonparallel Voice Conversion with Mean Flows</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-meanvoiceflow-one-step-nonparallel-voice/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-meanvoiceflow-one-step-nonparallel-voice/</guid>
      <description>Voice conversion | 7.0/10</description>
    </item>
    <item>
      <title>Mitigating Intra-Speaker Variability in Diarization with Style-Controllable Speech Augmentation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mitigating-intra-speaker-variability-in/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mitigating-intra-speaker-variability-in/</guid>
      <description>Speaker diarization | 7.0/10</description>
    </item>
    <item>
      <title>MMAudioSep: Taming Video-to-Audio Generative Model Towards Video/Text-Queried Sound Separation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mmaudiosep-taming-video-to-audio-generative-model/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mmaudiosep-taming-video-to-audio-generative-model/</guid>
      <description>Speech separation | 8.0/10</description>
    </item>
    <item>
      <title>MR-FlowDPO: Multi-Reward Direct Preference Optimization for Flow-Matching Text-to-Music Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mr-flowdpo-multi-reward-direct-preference/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mr-flowdpo-multi-reward-direct-preference/</guid>
      <description>Music generation | 7.5/10</description>
    </item>
    <item>
      <title>Multimodal Room Impulse Response Generation Through Latent Rectified Flow Matching</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multimodal-room-impulse-response-generation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multimodal-room-impulse-response-generation/</guid>
      <description>Audio generation | 7.5/10</description>
    </item>
    <item>
      <title>Mutual Forcing: Dual-Mode Self-Evolution for Fast Autoregressive Audio-Video Character Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mutual-forcing-dual-mode-self-evolution-for-fast/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mutual-forcing-dual-mode-self-evolution-for-fast/</guid>
      <description>Audio generation | 7.5/10</description>
    </item>
    <item>
      <title>NCF-TTS: Enhancing Flow Matching Based Text-To-Speech with Neighborhood Consistency Flow</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ncf-tts-enhancing-flow-matching-based-text-to/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ncf-tts-enhancing-flow-matching-based-text-to/</guid>
      <description>Speech synthesis | 8.0/10</description>
    </item>
    <item>
      <title>PFluxTTS: Hybrid Flow-Matching TTS with Robust Cross-Lingual Voice Cloning and Inference-Time Model Fusion</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-pfluxtts-hybrid-flow-matching-tts-with-robust/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-pfluxtts-hybrid-flow-matching-tts-with-robust/</guid>
      <description>Speech synthesis | 7.0/10</description>
    </item>
    <item>
      <title>Poly-SVC: Polyphony-Aware Singing Voice Conversion with Harmonic Modeling</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-poly-svc-polyphony-aware-singing-voice-conversion/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-poly-svc-polyphony-aware-singing-voice-conversion/</guid>
      <description>Singing voice conversion | 6.5/10</description>
    </item>
    <item>
      <title>QE-XVC: Zero-Shot Cross-Lingual Voice Conversion via Query-Enhancement and Conditional Flow Matching</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-qe-xvc-zero-shot-cross-lingual-voice-conversion/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-qe-xvc-zero-shot-cross-lingual-voice-conversion/</guid>
      <description>Voice conversion | 7.5/10</description>
    </item>
    <item>
      <title>RAP: Real-Time Audio-Driven Portrait Animation with Video Diffusion Transformer</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-rap-real-time-audio-driven-portrait-animation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-rap-real-time-audio-driven-portrait-animation/</guid>
      <description>Audio-visual | 7.0/10</description>
    </item>
    <item>
      <title>Real-Time Streaming MEL Vocoding with Generative Flow Matching</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-real-time-streaming-mel-vocoding-with-generative/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-real-time-streaming-mel-vocoding-with-generative/</guid>
      <description>Speech synthesis | 7.5/10</description>
    </item>
    <item>
      <title>Refgen: Reference-Guided Synthetic Data Generation for Anomalous Sound Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-refgen-reference-guided-synthetic-data-generation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-refgen-reference-guided-synthetic-data-generation/</guid>
      <description>Audio event detection | 7.5/10</description>
    </item>
    <item>
      <title>RFM-Editing: Rectified Flow Matching for Text-Guided Audio Editing</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-rfm-editing-rectified-flow-matching-for-text/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-rfm-editing-rectified-flow-matching-for-text/</guid>
      <description>Audio editing | 7.5/10</description>
    </item>
    <item>
      <title>S2Voice: Style-Aware Autoregressive Modeling with Enhanced Conditioning for Singing Style Conversion</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-s2voice-style-aware-autoregressive-modeling-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-s2voice-style-aware-autoregressive-modeling-with/</guid>
      <description>Singing voice conversion | 7.0/10</description>
    </item>
    <item>
      <title>SAGA-SR: Semantically and Acoustically Guided Audio Super-Resolution</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-saga-sr-semantically-and-acoustically-guided/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-saga-sr-semantically-and-acoustically-guided/</guid>
      <description>Audio enhancement | 7.5/10</description>
    </item>
    <item>
      <title>Scalable Evaluation for Audio Identification Via Synthetic Latent Fingerprint Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-scalable-evaluation-for-audio-identification-via/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-scalable-evaluation-for-audio-identification-via/</guid>
      <description>Audio retrieval | 7.0/10</description>
    </item>
    <item>
      <title>SFM-TTS: Lightweight and Rapid Speech Synthesis with Flexible Shortcut Flow Matching</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sfm-tts-lightweight-and-rapid-speech-synthesis/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sfm-tts-lightweight-and-rapid-speech-synthesis/</guid>
      <description>Speech synthesis | 7.0/10</description>
    </item>
    <item>
      <title>Shortcut Flow Matching for Speech Enhancement: Step-Invariant Flows via Single Stage Training</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-shortcut-flow-matching-for-speech-enhancement/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-shortcut-flow-matching-for-speech-enhancement/</guid>
      <description>Speech enhancement | 7.0/10</description>
    </item>
    <item>
      <title>Single-Step Controllable Music Bandwidth extension with Flow Matching</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-single-step-controllable-music-bandwidth/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-single-step-controllable-music-bandwidth/</guid>
      <description>Music information retrieval | 7.0/10</description>
    </item>
    <item>
      <title>Stemphonic: All-At-Once Flexible Multi-Stem Music Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-stemphonic-all-at-once-flexible-multi-stem-music/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-stemphonic-all-at-once-flexible-multi-stem-music/</guid>
      <description>Music generation | 7.7/10</description>
    </item>
    <item>
      <title>StylePitcher: Generating Style-Following and Expressive Pitch Curves for Versatile Singing Tasks</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-stylepitcher-generating-style-following-and/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-stylepitcher-generating-style-following-and/</guid>
      <description>Singing voice synthesis | 7.5/10</description>
    </item>
    <item>
      <title>Symphony Rendering: Midi and Composer-Conditioned Auto Orchestration with Flow-Matching Transformers</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-symphony-rendering-midi-and-composer-conditioned/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-symphony-rendering-midi-and-composer-conditioned/</guid>
      <description>Music generation | 7.0/10</description>
    </item>
    <item>
      <title>Task Vector in TTS: Toward Emotionally Expressive Dialectal Speech Synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-task-vector-in-tts-toward-emotionally-expressive/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-task-vector-in-tts-toward-emotionally-expressive/</guid>
      <description>Speech synthesis | 7.0/10</description>
    </item>
    <item>
      <title>TMD-TTS: A Unified Tibetan Multi-Dialect Text-to-Speech Framework for Ü-Tsang, Amdo and Kham Speech Dataset Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tmd-tts-a-unified-tibetan-multi-dialect-text-to/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tmd-tts-a-unified-tibetan-multi-dialect-text-to/</guid>
      <description>Speech synthesis | 7.5/10</description>
    </item>
    <item>
      <title>Towards Real-Time Generative Speech Restoration with Flow-Matching</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-real-time-generative-speech-restoration/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-real-time-generative-speech-restoration/</guid>
      <description>Speech enhancement | 6.0/10</description>
    </item>
    <item>
      <title>Training Flow Matching Models with Reliable Labels via Self-Purification</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-training-flow-matching-models-with-reliable/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-training-flow-matching-models-with-reliable/</guid>
      <description>Speech synthesis | 7.5/10</description>
    </item>
    <item>
      <title>Universr: Unified and Versatile Audio Super-Resolution Via Vocoder-Free Flow Matching</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-universr-unified-and-versatile-audio-super/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-universr-unified-and-versatile-audio-super/</guid>
      <description>Audio super-resolution | 8.0/10</description>
    </item>
    <item>
      <title>V2A-DPO: Omni-Preference Optimization for Video-To-Audio Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-v2a-dpo-omni-preference-optimization-for-video-to/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-v2a-dpo-omni-preference-optimization-for-video-to/</guid>
      <description>Video-to-audio generation | 7.5/10</description>
    </item>
    <item>
      <title>VoxMorph: Scalable Zero-Shot Voice Identity Morphing via Disentangled Embeddings</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-voxmorph-scalable-zero-shot-voice-identity/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-voxmorph-scalable-zero-shot-voice-identity/</guid>
      <description>Voice cloning | 9.0/10</description>
    </item>
    <item>
      <title>MAGIC-TTS: Fine-Grained Controllable Speech Synthesis with Explicit Local Duration and Pause Control</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-magic-tts-fine-grained-controllable-speech/</link>
      <pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-magic-tts-fine-grained-controllable-speech/</guid>
      <description>Speech synthesis | 7.0/10</description>
    </item>
    <item>
      <title>Speech Enhancement Based on Drifting Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-speech-enhancement-based-on-drifting-models/</link>
      <pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-speech-enhancement-based-on-drifting-models/</guid>
      <description>Speech enhancement | 7.5/10</description>
    </item>
    <item>
      <title>Talker-T2AV: Joint Talking Audio-Video Generation with Autoregressive Diffusion Modeling</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-talker-t2av-joint-talking-audio-video-generation/</link>
      <pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-talker-t2av-joint-talking-audio-video-generation/</guid>
      <description>Speech synthesis | 7.5/10</description>
    </item>
    <item>
      <title>UniSonate: A Unified Model for Speech, Music, and Sound Effect Generation with Text Instructions</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-unisonate-a-unified-model-for-speech-music-and/</link>
      <pubDate>Mon, 27 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-unisonate-a-unified-model-for-speech-music-and/</guid>
      <description>Audio generation | 8.5/10</description>
    </item>
    <item>
      <title>MAGIC-TTS: Fine-Grained Controllable Speech Synthesis with Explicit Local Duration and Pause Control</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-25-magic-tts-fine-grained-controllable-speech/</link>
      <pubDate>Sat, 25 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-25-magic-tts-fine-grained-controllable-speech/</guid>
      <description>Speech synthesis | 7.5/10</description>
    </item>
    <item>
      <title>Speech/Audio Paper Digest 2026-04-25</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-25/</link>
      <pubDate>Sat, 25 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-25/</guid>
      <description>2 speech/AI papers analyzed in total</description>
    </item>
    <item>
      <title>ATRIE: Adaptive Tuning for Robust Inference and Emotion in Persona-Driven Speech Synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-atrie-adaptive-tuning-for-robust-inference-and/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-atrie-adaptive-tuning-for-robust-inference-and/</guid>
      <description>Speech synthesis | 7.0/10</description>
    </item>
    <item>
      <title>MAGIC-TTS: Fine-Grained Controllable Speech Synthesis with Explicit Local Duration and Pause Control</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-magic-tts-fine-grained-controllable-speech/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-magic-tts-fine-grained-controllable-speech/</guid>
      <description>Speech synthesis | 7.5/10</description>
    </item>
    <item>
      <title>X-VC: Zero-shot Streaming Voice Conversion in Codec Space</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-x-vc-zero-shot-streaming-voice-conversion-in/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-x-vc-zero-shot-streaming-voice-conversion-in/</guid>
      <description>1. **Problem**: Zero-shot voice conversion must deliver both high-quality speaker characteristic transfer and low-latency streaming inference, a challenge that remains largely unsolved. 2. **Core method**: Proposes the X-VC system, which performs one-step conversion in the latent space of a pretrained SAC speech codec. At its core is a dual-condition acoustic converter that jointly processes the codec latent representation of the source speech and the frame-level mel</description>
    </item>
    <item>
      <title>ATRIE: Adaptive Tuning for Robust Inference and Emotion in Persona-Driven Speech Synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-atrie-adaptive-tuning-for-robust-inference-and/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-atrie-adaptive-tuning-for-robust-inference-and/</guid>
      <description>To address the difficulty existing speech synthesis systems have in simultaneously preserving persona identity consistency and accurate emotional expression when generating persona-driven, emotionally rich speech, this paper proposes the ATRIE framework. Its core is the **Persona-Prosody Dual-Track (P2-DT) architecture**, which decouples speech generation into a static **timbre track** (anchoring identity via scalar quantization) and a dynamic **prosody</description>
    </item>
    <item>
      <title>Anonymization, Not Elimination: Utility-Preserved Speech Anonymization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-anonymization-not-elimination-utility-preserved/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-anonymization-not-elimination-utility-preserved/</guid>
      <description>This paper tackles the central tension in speech data privacy between "privacy leakage" and "loss of data utility" with a novel two-stage framework. First, to address the limited identity diversity and poor controllability of speaker anonymization (protecting "who is speaking"), it proposes a flow-matching-based speaker embedding anonymizer (F3-VA) that generates new identities that are diverse and well separated from the original speaker. Second, to address content anonymization (protecting "what</description>
    </item>
    <item>
      <title>AST: Adaptive, Seamless, and Training-Free Precise Speech Editing</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-ast-adaptive-seamless-and-training-free-precise/</link>
      <pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-ast-adaptive-seamless-and-training-free-precise/</guid>
      <description>To address the reliance of existing speech editing methods on task-specific training and their poor temporal consistency in unedited regions, this paper proposes AST (Adaptive, Seamless, and Training-free), a precise speech editing framework built on a pretrained TTS model following the AM-FM (autoregressive flow-matching) paradigm. AST first inverts the original speech via an inverse Euler ODE solver into the latent space</description>
    </item>
    <item>
      <title>CoSyncDiT: Cognitive Synchronous Diffusion Transformer for Movie Dubbing</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-cosyncdit-cognitive-synchronous-diffusion/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-cosyncdit-cognitive-synchronous-diffusion/</guid>
      <description>To address the pain point in movie dubbing (visual voice cloning) that timbre fidelity and lip synchronization are hard to achieve together, this paper proposes a flow-matching-based Cognitive Synchronous Diffusion Transformer (CoSyncDiT) framework. Inspired by the cognitive process of professional dubbing artists, the method takes the noise-to-speech</description>
    </item>
  </channel>
</rss>
