<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Flow Matching on Speech/Audio Paper Digest</title>
    <link>https://nanless.github.io/audio-paper-digest-blog/tags/%E6%B5%81%E5%8C%B9%E9%85%8D/</link>
    <description>Recent content in Flow Matching on Speech/Audio Paper Digest</description>
    <generator>Hugo</generator>
    <language>en-us</language>
    <lastBuildDate>Wed, 29 Apr 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://nanless.github.io/audio-paper-digest-blog/tags/%E6%B5%81%E5%8C%B9%E9%85%8D/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Adaptive Deterministic Flow Matching for Target Speaker Extraction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-adaptive-deterministic-flow-matching-for-target/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-adaptive-deterministic-flow-matching-for-target/</guid>
      <description>Target speaker extraction | 8.0/10</description>
    </item>
    <item>
      <title>AnyAccomp: Generalizable Accompaniment Generation Via Quantized Melodic Bottleneck</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-anyaccomp-generalizable-accompaniment-generation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-anyaccomp-generalizable-accompaniment-generation/</guid>
      <description>Music generation | 8.0/10</description>
    </item>
    <item>
      <title>ARCHI-TTS: A Flow-Matching-Based Text-to-Speech Model with Self-Supervised Semantic Aligner and Accelerated Inference</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-archi-tts-a-flow-matching-based-text-to-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-archi-tts-a-flow-matching-based-text-to-speech/</guid>
      <description>Speech synthesis | 8.0/10</description>
    </item>
    <item>
      <title>Asynchrony-Aware Decoupled Multimodal Control for Cued Speech Video Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-asynchrony-aware-decoupled-multimodal-control-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-asynchrony-aware-decoupled-multimodal-control-for/</guid>
      <description>Speech synthesis | 7.5/10</description>
    </item>
    <item>
      <title>Beyond Global Emotion: Fine-Grained Emotional Speech Synthesis with Dynamic Word-Level Modulation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-beyond-global-emotion-fine-grained-emotional/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-beyond-global-emotion-fine-grained-emotional/</guid>
      <description>Speech synthesis | 7.5/10</description>
    </item>
    <item>
      <title>CosyAccent: Duration-Controllable Accent Normalization using Source-Synthesis Training Data</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cosyaccent-duration-controllable-accent/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cosyaccent-duration-controllable-accent/</guid>
      <description>Voice conversion | 7.8/10</description>
    </item>
    <item>
      <title>Cross-Lingual F5-TTS: Towards Language-Agnostic Voice Cloning and Speech Synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cross-lingual-f5-tts-towards-language-agnostic/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cross-lingual-f5-tts-towards-language-agnostic/</guid>
      <description>Voice cloning | 7.5/10</description>
    </item>
    <item>
      <title>DAIEN-TTS: Disentangled Audio Infilling for Environment-Aware Text-to-Speech Synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-daien-tts-disentangled-audio-infilling-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-daien-tts-disentangled-audio-infilling-for/</guid>
      <description>Speech synthesis | 8.0/10</description>
    </item>
    <item>
      <title>Deep Dubbing: End-to-End Auto-Audiobook System with Text-to-Timbre and Context-Aware Instruct-TTS</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-deep-dubbing-end-to-end-auto-audiobook-system/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-deep-dubbing-end-to-end-auto-audiobook-system/</guid>
      <description>Speech synthesis | 7.5/10</description>
    </item>
    <item>
      <title>Diverse and Few-Step Audio Captioning via Flow Matching</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-diverse-and-few-step-audio-captioning-via-flow/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-diverse-and-few-step-audio-captioning-via-flow/</guid>
      <description>Audio captioning | 6.5/10</description>
    </item>
    <item>
      <title>EmoShift: Lightweight Activation Steering for Enhanced Emotion-Aware Speech Synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emoshift-lightweight-activation-steering-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emoshift-lightweight-activation-steering-for/</guid>
      <description>Speech synthesis | 7.0/10</description>
    </item>
    <item>
      <title>Emotional Dimension Control in Language Model-Based Text-To-Speech: Spanning a Broad Spectrum of Human Emotions</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emotional-dimension-control-in-language-model/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emotional-dimension-control-in-language-model/</guid>
      <description>Speech synthesis | 7.5/10</description>
    </item>
    <item>
      <title>Erasing Your Voice Before it’s Heard: Training-Free Speaker Unlearning for Zero-Shot Text-to-Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-erasing-your-voice-before-its-heard-training-free/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-erasing-your-voice-before-its-heard-training-free/</guid>
      <description>Speech synthesis | 7.5/10</description>
    </item>
    <item>
      <title>FlashFoley: Fast Interactive Sketch2audio Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-flashfoley-fast-interactive-sketch2audio/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-flashfoley-fast-interactive-sketch2audio/</guid>
      <description>Audio generation | 7.5/10</description>
    </item>
    <item>
      <title>FlowSE-GRPO: Training Flow Matching Speech Enhancement via Online Reinforcement Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-flowse-grpo-training-flow-matching-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-flowse-grpo-training-flow-matching-speech/</guid>
      <description>Speech enhancement | 7.5/10</description>
    </item>
    <item>
      <title>Gelina: Unified Speech and Gesture Synthesis Via Interleaved Token Prediction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-gelina-unified-speech-and-gesture-synthesis-via/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-gelina-unified-speech-and-gesture-synthesis-via/</guid>
      <description>Speech synthesis | 7.0/10</description>
    </item>
    <item>
      <title>Gen-SER: When the Generative Model Meets Speech Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-gen-ser-when-the-generative-model-meets-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-gen-ser-when-the-generative-model-meets-speech/</guid>
      <description>Speech emotion recognition | 6.5/10</description>
    </item>
    <item>
      <title>Hierarchical Discrete Flow Matching For Multi-Codebook Codec-Based Text-To-Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hierarchical-discrete-flow-matching-for-multi/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hierarchical-discrete-flow-matching-for-multi/</guid>
      <description>Speech synthesis | 7.5/10</description>
    </item>
    <item>
      <title>HyFlowSE: Hybrid End-To-End Flow-Matching Speech Enhancement via Generative-Discriminative Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hyflowse-hybrid-end-to-end-flow-matching-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hyflowse-hybrid-end-to-end-flow-matching-speech/</guid>
      <description>Speech enhancement | 8.0/10</description>
    </item>
    <item>
      <title>Instrument Generation Through Distributional Flow Matching and Test-Time Search</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-instrument-generation-through-distributional-flow/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-instrument-generation-through-distributional-flow/</guid>
      <description>Music generation | 7.0/10</description>
    </item>
    <item>
      <title>Int-MeanFlow: Few-Step Speech Generation with Integral Velocity Distillation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-int-meanflow-few-step-speech-generation-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-int-meanflow-few-step-speech-generation-with/</guid>
      <description>Speech synthesis | 7.5/10</description>
    </item>
    <item>
      <title>LP-CFM: Perceptual Invariance-Aware Conditional Flow Matching for Speech Modeling</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lp-cfm-perceptual-invariance-aware-conditional/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lp-cfm-perceptual-invariance-aware-conditional/</guid>
      <description>Speech synthesis | 7.0/10</description>
    </item>
    <item>
      <title>Marco-Voice: A Unified Framework for Expressive Speech Synthesis with Voice Cloning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-marco-voice-a-unified-framework-for-expressive/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-marco-voice-a-unified-framework-for-expressive/</guid>
      <description>Speech synthesis | 8.0/10</description>
    </item>
    <item>
      <title>Meanflow-Accelerated Multimodal Video-to-Audio Synthesis Via One-Step Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-meanflow-accelerated-multimodal-video-to-audio/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-meanflow-accelerated-multimodal-video-to-audio/</guid>
      <description>Audio generation | 7.5/10</description>
    </item>
    <item>
      <title>MeanFlowSE: One-Step Generative Speech Enhancement via Conditional Mean Flow</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-meanflowse-one-step-generative-speech-enhancement/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-meanflowse-one-step-generative-speech-enhancement/</guid>
      <description>Speech enhancement | 7.5/10</description>
    </item>
    <item>
      <title>MeanSE: Efficient Generative Speech Enhancement with Mean Flows</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-meanse-efficient-generative-speech-enhancement/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-meanse-efficient-generative-speech-enhancement/</guid>
      <description>Speech enhancement | 6.5/10</description>
    </item>
    <item>
      <title>MeanVC: Lightweight and Streaming Zero-Shot Voice Conversion via Mean Flows</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-meanvc-lightweight-and-streaming-zero-shot-voice/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-meanvc-lightweight-and-streaming-zero-shot-voice/</guid>
      <description>Voice conversion | 7.5/10</description>
    </item>
    <item>
      <title>MeanVoiceFlow: One-Step Nonparallel Voice Conversion with Mean Flows</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-meanvoiceflow-one-step-nonparallel-voice/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-meanvoiceflow-one-step-nonparallel-voice/</guid>
      <description>Voice conversion | 7.0/10</description>
    </item>
    <item>
      <title>Mitigating Intra-Speaker Variability in Diarization with Style-Controllable Speech Augmentation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mitigating-intra-speaker-variability-in/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mitigating-intra-speaker-variability-in/</guid>
      <description>Speaker diarization | 7.0/10</description>
    </item>
    <item>
      <title>MMAudioSep: Taming Video-to-Audio Generative Model Towards Video/Text-Queried Sound Separation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mmaudiosep-taming-video-to-audio-generative-model/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mmaudiosep-taming-video-to-audio-generative-model/</guid>
      <description>Speech separation | 8.0/10</description>
    </item>
    <item>
      <title>MR-FlowDPO: Multi-Reward Direct Preference Optimization for Flow-Matching Text-to-Music Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mr-flowdpo-multi-reward-direct-preference/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mr-flowdpo-multi-reward-direct-preference/</guid>
      <description>Music generation | 7.5/10</description>
    </item>
    <item>
      <title>Multimodal Room Impulse Response Generation Through Latent Rectified Flow Matching</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multimodal-room-impulse-response-generation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multimodal-room-impulse-response-generation/</guid>
      <description>Audio generation | 7.5/10</description>
    </item>
    <item>
      <title>Mutual Forcing: Dual-Mode Self-Evolution for Fast Autoregressive Audio-Video Character Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mutual-forcing-dual-mode-self-evolution-for-fast/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mutual-forcing-dual-mode-self-evolution-for-fast/</guid>
      <description>Audio generation | 7.5/10</description>
    </item>
    <item>
      <title>NCF-TTS: Enhancing Flow Matching Based Text-To-Speech with Neighborhood Consistency Flow</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ncf-tts-enhancing-flow-matching-based-text-to/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ncf-tts-enhancing-flow-matching-based-text-to/</guid>
      <description>Speech synthesis | 8.0/10</description>
    </item>
    <item>
      <title>PFluxTTS: Hybrid Flow-Matching TTS with Robust Cross-Lingual Voice Cloning and Inference-Time Model Fusion</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-pfluxtts-hybrid-flow-matching-tts-with-robust/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-pfluxtts-hybrid-flow-matching-tts-with-robust/</guid>
      <description>Speech synthesis | 7.0/10</description>
    </item>
    <item>
      <title>Poly-SVC: Polyphony-Aware Singing Voice Conversion with Harmonic Modeling</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-poly-svc-polyphony-aware-singing-voice-conversion/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-poly-svc-polyphony-aware-singing-voice-conversion/</guid>
      <description>Singing voice conversion | 6.5/10</description>
    </item>
    <item>
      <title>QE-XVC: Zero-Shot Cross-Lingual Voice Conversion via Query-Enhancement and Conditional Flow Matching</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-qe-xvc-zero-shot-cross-lingual-voice-conversion/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-qe-xvc-zero-shot-cross-lingual-voice-conversion/</guid>
      <description>Voice conversion | 7.5/10</description>
    </item>
    <item>
      <title>RAP: Real-Time Audio-Driven Portrait Animation with Video Diffusion Transformer</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-rap-real-time-audio-driven-portrait-animation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-rap-real-time-audio-driven-portrait-animation/</guid>
      <description>Audio-visual | 7.0/10</description>
    </item>
    <item>
      <title>Real-Time Streaming MEL Vocoding with Generative Flow Matching</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-real-time-streaming-mel-vocoding-with-generative/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-real-time-streaming-mel-vocoding-with-generative/</guid>
      <description>Speech synthesis | 7.5/10</description>
    </item>
    <item>
      <title>Refgen: Reference-Guided Synthetic Data Generation for Anomalous Sound Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-refgen-reference-guided-synthetic-data-generation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-refgen-reference-guided-synthetic-data-generation/</guid>
      <description>Audio event detection | 7.5/10</description>
    </item>
    <item>
      <title>RFM-Editing: Rectified Flow Matching for Text-Guided Audio Editing</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-rfm-editing-rectified-flow-matching-for-text/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-rfm-editing-rectified-flow-matching-for-text/</guid>
      <description>Audio editing | 7.5/10</description>
    </item>
    <item>
      <title>S2Voice: Style-Aware Autoregressive Modeling with Enhanced Conditioning for Singing Style Conversion</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-s2voice-style-aware-autoregressive-modeling-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-s2voice-style-aware-autoregressive-modeling-with/</guid>
      <description>Singing voice conversion | 7.0/10</description>
    </item>
    <item>
      <title>SAGA-SR: Semantically and Acoustically Guided Audio Super-Resolution</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-saga-sr-semantically-and-acoustically-guided/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-saga-sr-semantically-and-acoustically-guided/</guid>
      <description>Audio enhancement | 7.5/10</description>
    </item>
    <item>
      <title>Scalable Evaluation for Audio Identification Via Synthetic Latent Fingerprint Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-scalable-evaluation-for-audio-identification-via/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-scalable-evaluation-for-audio-identification-via/</guid>
      <description>Audio retrieval | 7.0/10</description>
    </item>
    <item>
      <title>SFM-TTS: Lightweight and Rapid Speech Synthesis with Flexible Shortcut Flow Matching</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sfm-tts-lightweight-and-rapid-speech-synthesis/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sfm-tts-lightweight-and-rapid-speech-synthesis/</guid>
      <description>Speech synthesis | 7.0/10</description>
    </item>
    <item>
      <title>Shortcut Flow Matching for Speech Enhancement: Step-Invariant Flows via Single Stage Training</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-shortcut-flow-matching-for-speech-enhancement/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-shortcut-flow-matching-for-speech-enhancement/</guid>
      <description>Speech enhancement | 7.0/10</description>
    </item>
    <item>
      <title>Single-Step Controllable Music Bandwidth extension with Flow Matching</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-single-step-controllable-music-bandwidth/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-single-step-controllable-music-bandwidth/</guid>
      <description>Music information retrieval | 7.0/10</description>
    </item>
    <item>
      <title>Stemphonic: All-At-Once Flexible Multi-Stem Music Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-stemphonic-all-at-once-flexible-multi-stem-music/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-stemphonic-all-at-once-flexible-multi-stem-music/</guid>
      <description>Music generation | 7.7/10</description>
    </item>
    <item>
      <title>StylePitcher: Generating Style-Following and Expressive Pitch Curves for Versatile Singing Tasks</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-stylepitcher-generating-style-following-and/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-stylepitcher-generating-style-following-and/</guid>
      <description>Singing voice synthesis | 7.5/10</description>
    </item>
    <item>
      <title>Symphony Rendering: Midi and Composer-Conditioned Auto Orchestration with Flow-Matching Transformers</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-symphony-rendering-midi-and-composer-conditioned/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-symphony-rendering-midi-and-composer-conditioned/</guid>
      <description>Music generation | 7.0/10</description>
    </item>
    <item>
      <title>Task Vector in TTS: Toward Emotionally Expressive Dialectal Speech Synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-task-vector-in-tts-toward-emotionally-expressive/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-task-vector-in-tts-toward-emotionally-expressive/</guid>
      <description>Speech synthesis | 7.0/10</description>
    </item>
    <item>
      <title>TMD-TTS: A Unified Tibetan Multi-Dialect Text-to-Speech Framework for Ü-Tsang, Amdo and Kham Speech Dataset Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tmd-tts-a-unified-tibetan-multi-dialect-text-to/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tmd-tts-a-unified-tibetan-multi-dialect-text-to/</guid>
      <description>Speech synthesis | 7.5/10</description>
    </item>
    <item>
      <title>Towards Real-Time Generative Speech Restoration with Flow-Matching</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-real-time-generative-speech-restoration/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-real-time-generative-speech-restoration/</guid>
      <description>Speech enhancement | 6.0/10</description>
    </item>
    <item>
      <title>Training Flow Matching Models with Reliable Labels via Self-Purification</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-training-flow-matching-models-with-reliable/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-training-flow-matching-models-with-reliable/</guid>
      <description>Speech synthesis | 7.5/10</description>
    </item>
    <item>
      <title>Universr: Unified and Versatile Audio Super-Resolution Via Vocoder-Free Flow Matching</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-universr-unified-and-versatile-audio-super/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-universr-unified-and-versatile-audio-super/</guid>
      <description>Audio super-resolution | 8.0/10</description>
    </item>
    <item>
      <title>V2A-DPO: Omni-Preference Optimization for Video-To-Audio Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-v2a-dpo-omni-preference-optimization-for-video-to/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-v2a-dpo-omni-preference-optimization-for-video-to/</guid>
      <description>Video-to-audio generation | 7.5/10</description>
    </item>
    <item>
      <title>VoxMorph: Scalable Zero-Shot Voice Identity Morphing via Disentangled Embeddings</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-voxmorph-scalable-zero-shot-voice-identity/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-voxmorph-scalable-zero-shot-voice-identity/</guid>
      <description>Voice cloning | 9.0/10</description>
    </item>
    <item>
      <title>MAGIC-TTS: Fine-Grained Controllable Speech Synthesis with Explicit Local Duration and Pause Control</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-magic-tts-fine-grained-controllable-speech/</link>
      <pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-magic-tts-fine-grained-controllable-speech/</guid>
      <description>Speech synthesis | 7.0/10</description>
    </item>
    <item>
      <title>Speech Enhancement Based on Drifting Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-speech-enhancement-based-on-drifting-models/</link>
      <pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-speech-enhancement-based-on-drifting-models/</guid>
      <description>Speech enhancement | 7.5/10</description>
    </item>
    <item>
      <title>Talker-T2AV: Joint Talking Audio-Video Generation with Autoregressive Diffusion Modeling</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-talker-t2av-joint-talking-audio-video-generation/</link>
      <pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-talker-t2av-joint-talking-audio-video-generation/</guid>
      <description>Speech synthesis | 7.5/10</description>
    </item>
    <item>
      <title>UniSonate: A Unified Model for Speech, Music, and Sound Effect Generation with Text Instructions</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-unisonate-a-unified-model-for-speech-music-and/</link>
      <pubDate>Mon, 27 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-unisonate-a-unified-model-for-speech-music-and/</guid>
      <description>Audio generation | 8.5/10</description>
    </item>
    <item>
      <title>MAGIC-TTS: Fine-Grained Controllable Speech Synthesis with Explicit Local Duration and Pause Control</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-25-magic-tts-fine-grained-controllable-speech/</link>
      <pubDate>Sat, 25 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-25-magic-tts-fine-grained-controllable-speech/</guid>
      <description>Speech synthesis | 7.5/10</description>
    </item>
    <item>
      <title>Speech/Audio Paper Digest 2026-04-25</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-25/</link>
      <pubDate>Sat, 25 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-25/</guid>
      <description>2 speech/AI papers analyzed in total</description>
    </item>
    <item>
      <title>ATRIE: Adaptive Tuning for Robust Inference and Emotion in Persona-Driven Speech Synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-atrie-adaptive-tuning-for-robust-inference-and/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-atrie-adaptive-tuning-for-robust-inference-and/</guid>
      <description>Speech synthesis | 7.0/10</description>
    </item>
    <item>
      <title>MAGIC-TTS: Fine-Grained Controllable Speech Synthesis with Explicit Local Duration and Pause Control</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-magic-tts-fine-grained-controllable-speech/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-magic-tts-fine-grained-controllable-speech/</guid>
      <description>Speech synthesis | 7.5/10</description>
    </item>
    <item>
      <title>X-VC: Zero-shot Streaming Voice Conversion in Codec Space</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-x-vc-zero-shot-streaming-voice-conversion-in/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-x-vc-zero-shot-streaming-voice-conversion-in/</guid>
      <description>1. **Problem**: Zero-shot voice conversion must deliver both high-quality speaker characteristic transfer and low-latency streaming inference, a challenge that remains largely unsolved. 2. **Core method**: Proposes the X-VC system, which performs one-step conversion in the latent space of a pretrained SAC speech codec. At its core is a dual-condition acoustic converter that jointly processes the codec latent representation of the source speech and the frame-level mel</description>
    </item>
    <item>
      <title>ATRIE: Adaptive Tuning for Robust Inference and Emotion in Persona-Driven Speech Synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-atrie-adaptive-tuning-for-robust-inference-and/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-atrie-adaptive-tuning-for-robust-inference-and/</guid>
      <description>To address the difficulty existing speech synthesis systems have in simultaneously preserving persona identity consistency and accurate emotional expression when generating persona-driven, emotionally rich speech, this paper proposes the ATRIE framework. Its core is the **Persona-Prosody Dual-Track (P2-DT) architecture**, which decouples speech generation into a static **timbre track** (anchoring identity via scalar quantization) and a dynamic **prosody</description>
    </item>
    <item>
      <title>Anonymization, Not Elimination: Utility-Preserved Speech Anonymization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-anonymization-not-elimination-utility-preserved/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-anonymization-not-elimination-utility-preserved/</guid>
      <description>This paper tackles the central tension in speech data privacy between "privacy leakage" and "loss of data utility" with a novel two-stage framework. First, to address the limited identity diversity and poor controllability of speaker anonymization (protecting "who is speaking"), it proposes a flow-matching-based speaker embedding anonymizer (F3-VA) that generates new identities that are diverse and well separated from the original speaker. Second, to address content anonymization (protecting "what</description>
    </item>
    <item>
      <title>AST: Adaptive, Seamless, and Training-Free Precise Speech Editing</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-ast-adaptive-seamless-and-training-free-precise/</link>
      <pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-ast-adaptive-seamless-and-training-free-precise/</guid>
      <description>To address the reliance of existing speech editing methods on task-specific training and their poor temporal consistency in unedited regions, this paper proposes AST (Adaptive, Seamless, and Training-free), a precise speech editing framework built on a pretrained TTS model following the AM-FM (autoregressive flow-matching) paradigm. AST first inverts the original speech via an inverse Euler ODE solver into the latent space</description>
    </item>
    <item>
      <title>CoSyncDiT: Cognitive Synchronous Diffusion Transformer for Movie Dubbing</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-cosyncdit-cognitive-synchronous-diffusion/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-cosyncdit-cognitive-synchronous-diffusion/</guid>
      <description>To address the pain point in movie dubbing (visual voice cloning) that timbre fidelity and lip synchronization are hard to achieve together, this paper proposes a flow-matching-based Cognitive Synchronous Diffusion Transformer (CoSyncDiT) framework. Inspired by the cognitive process of professional dubbing artists, the method takes the noise-to-speech</description>
    </item>
  </channel>
</rss>
