<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>音频生成 on 语音/音频论文速递</title>
    <link>https://nanless.github.io/audio-paper-digest-blog/tags/%E9%9F%B3%E9%A2%91%E7%94%9F%E6%88%90/</link>
    <description>Recent content in 音频生成 on 语音/音频论文速递</description>
    <generator>Hugo</generator>
    <language>zh-cn</language>
    <lastBuildDate>Wed, 29 Apr 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://nanless.github.io/audio-paper-digest-blog/tags/%E9%9F%B3%E9%A2%91%E7%94%9F%E6%88%90/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>A Speech-Driven Paradigm for Physics-Informed Modeling of Coupled Micro-Speakers</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-speech-driven-paradigm-for-physics-informed/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-speech-driven-paradigm-for-physics-informed/</guid>
      <description>音频生成 | 7.0/10</description>
    </item>
    <item>
      <title>Arbitrarily Settable Frame Rate Neural Speech Codec with Content Adaptive Variable Length Segmentation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-arbitrarily-settable-frame-rate-neural-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-arbitrarily-settable-frame-rate-neural-speech/</guid>
      <description>音频生成 | 7.0/10</description>
    </item>
    <item>
      <title>Assessing The Perceptual Impact of Low-Altitude Aircraft Noise in Cities: An Auralization Framework Using Gaussian Beam Tracing</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-assessing-the-perceptual-impact-of-low-altitude/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-assessing-the-perceptual-impact-of-low-altitude/</guid>
      <description>音频生成 | 8.0/10</description>
    </item>
    <item>
      <title>Audience-Aware Co-speech Gesture Generation in Public Speaking via Anticipation Tokens</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-audience-aware-co-speech-gesture-generation-in/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-audience-aware-co-speech-gesture-generation-in/</guid>
      <description>音频生成 | 8.0/10</description>
    </item>
    <item>
      <title>AudioGen-Omni: A Unified Multimodal Diffusion Transformer for Video-Synchronized Audio, Speech, and Song Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-audiogen-omni-a-unified-multimodal-diffusion/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-audiogen-omni-a-unified-multimodal-diffusion/</guid>
      <description>音频生成 | 7.5/10</description>
    </item>
    <item>
      <title>AUV: Teaching Audio Universal Vector Quantization with Single Nested Codebook</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-auv-teaching-audio-universal-vector-quantization/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-auv-teaching-audio-universal-vector-quantization/</guid>
      <description>音频生成 | 8.0/10</description>
    </item>
    <item>
      <title>Break-the-Beat! Controllable MIDI-to-Drum Audio Synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-break-the-beat-controllable-midi-to-drum-audio/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-break-the-beat-controllable-midi-to-drum-audio/</guid>
      <description>音乐生成 | 7.5/10</description>
    </item>
    <item>
      <title>Cardiobridge-DM: Bridging Cross-Cohort Heart Sound Synthesis via Rhythm-Aware Semi-Supervised Diffusion</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cardiobridge-dm-bridging-cross-cohort-heart-sound/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cardiobridge-dm-bridging-cross-cohort-heart-sound/</guid>
      <description>音频生成 | 7.5/10</description>
    </item>
    <item>
      <title>Combining Multi-Order Attention and Multi-Resolution Discriminator for High-Fidelity Neural Vocoder</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-combining-multi-order-attention-and-multi/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-combining-multi-order-attention-and-multi/</guid>
      <description>语音合成 | 6.5/10</description>
    </item>
    <item>
      <title>Constraint Optimized Multichannel Mixer-Limiter Design</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-constraint-optimized-multichannel-mixer-limiter/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-constraint-optimized-multichannel-mixer-limiter/</guid>
      <description>多通道 | 7.0/10</description>
    </item>
    <item>
      <title>Diff-vs: Efficient Audio-Aware Diffusion U-Net for Vocals Separation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-diff-vs-efficient-audio-aware-diffusion-u-net-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-diff-vs-efficient-audio-aware-diffusion-u-net-for/</guid>
      <description>语音分离 | 7.5/10</description>
    </item>
    <item>
      <title>Diffusion Timbre Transfer via Mutual Information Guided Inpainting</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-diffusion-timbre-transfer-via-mutual-information/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-diffusion-timbre-transfer-via-mutual-information/</guid>
      <description>音乐生成 | 7.5/10</description>
    </item>
    <item>
      <title>Disentangling Physiology from Fidelity: Latent-Guided Diffusion Models for Cross-Modal Cardiac Synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-disentangling-physiology-from-fidelity-latent/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-disentangling-physiology-from-fidelity-latent/</guid>
      <description>音频生成 | 7.5/10</description>
    </item>
    <item>
      <title>Diverse and Few-Step Audio Captioning via Flow Matching</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-diverse-and-few-step-audio-captioning-via-flow/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-diverse-and-few-step-audio-captioning-via-flow/</guid>
      <description>音频字幕生成 | 6.5/10</description>
    </item>
    <item>
      <title>EuleroDec: A Complex-Valued RVQ-VAE for Efficient and Robust Audio Coding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-eulerodec-a-complex-valued-rvq-vae-for-efficient/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-eulerodec-a-complex-valued-rvq-vae-for-efficient/</guid>
      <description>音频生成 | 8.0/10</description>
    </item>
    <item>
      <title>Feedback-Driven Retrieval-Augmented Audio Generation with Large Audio Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-feedback-driven-retrieval-augmented-audio/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-feedback-driven-retrieval-augmented-audio/</guid>
      <description>音频生成 | 6.5/10</description>
    </item>
    <item>
      <title>FlashFoley: Fast Interactive Sketch2audio Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-flashfoley-fast-interactive-sketch2audio/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-flashfoley-fast-interactive-sketch2audio/</guid>
      <description>音频生成 | 7.5/10</description>
    </item>
    <item>
      <title>FODGE: High-Fidelity Dance Generation via Full-Body Optimization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fodge-high-fidelity-dance-generation-via-full/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fodge-high-fidelity-dance-generation-via-full/</guid>
      <description>音频生成 | 6.5/10</description>
    </item>
    <item>
      <title>FoleyBench: A Benchmark for Video-to-Audio Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-foleybench-a-benchmark-for-video-to-audio-models/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-foleybench-a-benchmark-for-video-to-audio-models/</guid>
      <description>音频生成 | 7.5/10</description>
    </item>
    <item>
      <title>FxSearcher: Gradient-Free Text-Driven Audio Transformation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fxsearcher-gradient-free-text-driven-audio/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fxsearcher-gradient-free-text-driven-audio/</guid>
      <description>音频生成 | 7.0/10</description>
    </item>
    <item>
      <title>Generating Localized Audible Zones Using a Single-Channel Parametric Loudspeaker</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-generating-localized-audible-zones-using-a-single/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-generating-localized-audible-zones-using-a-single/</guid>
      <description>空间音频 | 6.5/10</description>
    </item>
    <item>
      <title>Generating Moving 3D Soundscapes with Latent Diffusion Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-generating-moving-3d-soundscapes-with-latent/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-generating-moving-3d-soundscapes-with-latent/</guid>
      <description>空间音频 | 7.5/10</description>
    </item>
    <item>
      <title>Generative Audio Extension and Morphing</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-generative-audio-extension-and-morphing/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-generative-audio-extension-and-morphing/</guid>
      <description>音频生成 | 7.5/10</description>
    </item>
    <item>
      <title>GMS-CAVP: Improving Audio-Video Correspondence with Multi-Scale Contrastive and Generative Pretraining</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-gms-cavp-improving-audio-video-correspondence/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-gms-cavp-improving-audio-video-correspondence/</guid>
      <description>音频生成 | 7.5/10</description>
    </item>
    <item>
      <title>HFSQVAE: Hierarchical Vector Quantization with Residuals for Frequency-Specific Embedding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hfsqvae-hierarchical-vector-quantization-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hfsqvae-hierarchical-vector-quantization-with/</guid>
      <description>音频生成 | 7.0/10</description>
    </item>
    <item>
      <title>Hierarchical Discrete Flow Matching For Multi-Codebook Codec-Based Text-To-Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hierarchical-discrete-flow-matching-for-multi/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hierarchical-discrete-flow-matching-for-multi/</guid>
      <description>语音合成 | 7.5/10</description>
    </item>
    <item>
      <title>ICASSP 2026 - 音频生成 论文列表</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-133/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-133/</guid>
      <description>共 39 篇 ICASSP 2026 音频生成 方向论文</description>
    </item>
    <item>
      <title>Improving Interpretability in Generative Multitimbral DDSP Frameworks via Semantically-Disentangled Musical Attributes</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-improving-interpretability-in-generative/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-improving-interpretability-in-generative/</guid>
      <description>音频生成 | 7.5/10</description>
    </item>
    <item>
      <title>KSDIFF: Keyframe-Augmented Speech-Aware Dual-Path Diffusion for Facial Animation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ksdiff-keyframe-augmented-speech-aware-dual-path/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ksdiff-keyframe-augmented-speech-aware-dual-path/</guid>
      <description>音频生成 | 7.5/10</description>
    </item>
    <item>
      <title>Learning Linearity in Audio Consistency Autoencoders via Implicit Regularization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-learning-linearity-in-audio-consistency/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-learning-linearity-in-audio-consistency/</guid>
      <description>音频生成 | 7.5/10</description>
    </item>
    <item>
      <title>MAG: Multi-Modal Aligned Autoregressive Co-Speech Gesture Generation Without Vector Quantization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mag-multi-modal-aligned-autoregressive-co-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mag-multi-modal-aligned-autoregressive-co-speech/</guid>
      <description>音频生成 | 8.0/10</description>
    </item>
    <item>
      <title>Matching Reverberant Speech Through Learned Acoustic Embeddings</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-matching-reverberant-speech-through-learned/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-matching-reverberant-speech-through-learned/</guid>
      <description>音频生成 | 8.0/10</description>
    </item>
    <item>
      <title>Meanflow-Accelerated Multimodal Video-to-Audio Synthesis Via One-Step Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-meanflow-accelerated-multimodal-video-to-audio/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-meanflow-accelerated-multimodal-video-to-audio/</guid>
      <description>音频生成 | 7.5/10</description>
    </item>
    <item>
      <title>Mitigating Data Replication in Text-to-Audio Generative Diffusion Models Through Anti-Memorization Guidance</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mitigating-data-replication-in-text-to-audio/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mitigating-data-replication-in-text-to-audio/</guid>
      <description>音频生成 | 7.5/10</description>
    </item>
    <item>
      <title>Mix2Morph: Learning Sound Morphing from Noisy Mixes</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mix2morph-learning-sound-morphing-from-noisy-mixes/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mix2morph-learning-sound-morphing-from-noisy-mixes/</guid>
      <description>音频生成 | 7.5/10</description>
    </item>
    <item>
      <title>Multimodal Room Impulse Response Generation Through Latent Rectified Flow Matching</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multimodal-room-impulse-response-generation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multimodal-room-impulse-response-generation/</guid>
      <description>音频生成 | 7.5/10</description>
    </item>
    <item>
      <title>Parametric Neural Amp Modeling with Active Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-parametric-neural-amp-modeling-with-active/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-parametric-neural-amp-modeling-with-active/</guid>
      <description>音频生成 | 8.0/10</description>
    </item>
    <item>
      <title>Phase-Retrieval-Based Physics-Informed Neural Networks For Acoustic Magnitude Field Reconstruction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-phase-retrieval-based-physics-informed-neural/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-phase-retrieval-based-physics-informed-neural/</guid>
      <description>声源定位 | 7.0/10</description>
    </item>
    <item>
      <title>PICOAUDIO2: Temporal Controllable Text-to-Audio Generation with Natural Language Description</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-picoaudio2-temporal-controllable-text-to-audio/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-picoaudio2-temporal-controllable-text-to-audio/</guid>
      <description>音频生成 | 7.5/10</description>
    </item>
    <item>
      <title>PRoADS: Provably Secure And Robust Audio Diffusion Steganography With Latent Optimization And Backward Euler Inversion</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-proads-provably-secure-and-robust-audio-diffusion/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-proads-provably-secure-and-robust-audio-diffusion/</guid>
      <description>音频安全 | 6.5/10</description>
    </item>
    <item>
      <title>ReCoM: Realistic Co-Speech Motion Generation with Recurrent Embedded Transformer</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-recom-realistic-co-speech-motion-generation-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-recom-realistic-co-speech-motion-generation-with/</guid>
      <description>音频生成 | 7.0/10</description>
    </item>
    <item>
      <title>S-PRESSO: Ultra Low Bitrate Sound Effect Compression with Diffusion Autoencoders and Offline Quantization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-s-presso-ultra-low-bitrate-sound-effect/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-s-presso-ultra-low-bitrate-sound-effect/</guid>
      <description>音频生成 | 7.5/10</description>
    </item>
    <item>
      <title>Sounds that Shape: Audio-Driven 3D Mesh Generation with Attribute-Decoupled Score Distillation Sampling</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sounds-that-shape-audio-driven-3d-mesh-generation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sounds-that-shape-audio-driven-3d-mesh-generation/</guid>
      <description>音频生成 | 7.0/10</description>
    </item>
    <item>
      <title>Spring Reverb Emulation with Hybrid Gated Convolutional Networks and State Space Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-spring-reverb-emulation-with-hybrid-gated/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-spring-reverb-emulation-with-hybrid-gated/</guid>
      <description>音频生成 | 7.5/10</description>
    </item>
    <item>
      <title>StereoFoley: Object-Aware Stereo Audio Generation from Video</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-stereofoley-object-aware-stereo-audio-generation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-stereofoley-object-aware-stereo-audio-generation/</guid>
      <description>音频生成 | 7.5/10</description>
    </item>
    <item>
      <title>StylePitcher: Generating Style-Following and Expressive Pitch Curves for Versatile Singing Tasks</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-stylepitcher-generating-style-following-and/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-stylepitcher-generating-style-following-and/</guid>
      <description>歌唱语音合成 | 7.5/10</description>
    </item>
    <item>
      <title>Subsequence SDTW: Differentiable Alignment with Flexible Boundary Conditions</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-subsequence-sdtw-differentiable-alignment-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-subsequence-sdtw-differentiable-alignment-with/</guid>
      <description>音乐信息检索 | 8.0/10</description>
    </item>
    <item>
      <title>Sunac: Source-Aware Unified Neural Audio Codec</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sunac-source-aware-unified-neural-audio-codec/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sunac-source-aware-unified-neural-audio-codec/</guid>
      <description>音频生成 | 7.5/10</description>
    </item>
    <item>
      <title>SwitchCodec: Adaptive Residual-Expert Sparse Quantization for High-Fidelity Neural Audio Coding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-switchcodec-adaptive-residual-expert-sparse/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-switchcodec-adaptive-residual-expert-sparse/</guid>
      <description>音频生成 | 8.5/10</description>
    </item>
    <item>
      <title>Synthcloner: Synthesizer-Style Audio Transfer via Factorized Codec with ADSR Envelope Control</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-synthcloner-synthesizer-style-audio-transfer-via/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-synthcloner-synthesizer-style-audio-transfer-via/</guid>
      <description>音频生成 | 8.5/10</description>
    </item>
    <item>
      <title>TAG: Structured Temporal Audio Generation via LLM-Guided Manual Scription and Control</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tag-structured-temporal-audio-generation-via-llm/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tag-structured-temporal-audio-generation-via-llm/</guid>
      <description>音频生成 | 7.5/10</description>
    </item>
    <item>
      <title>Taming Audio VAEs via Target-KL Regularization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-taming-audio-vaes-via-target-kl-regularization/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-taming-audio-vaes-via-target-kl-regularization/</guid>
      <description>音频生成 | 6.5/10</description>
    </item>
    <item>
      <title>Text2Move: Text-To-Moving Sound Generation via Trajectory Prediction and Temporal Alignment</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-text2move-text-to-moving-sound-generation-via/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-text2move-text-to-moving-sound-generation-via/</guid>
      <description>空间音频 | 8.0/10</description>
    </item>
    <item>
      <title>Training-Free Multimodal Guidance for Video to Audio Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-training-free-multimodal-guidance-for-video-to/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-training-free-multimodal-guidance-for-video-to/</guid>
      <description>音频生成 | 8.0/10</description>
    </item>
    <item>
      <title>Universr: Unified and Versatile Audio Super-Resolution Via Vocoder-Free Flow Matching</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-universr-unified-and-versatile-audio-super/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-universr-unified-and-versatile-audio-super/</guid>
      <description>音频超分辨率 | 8.0/10</description>
    </item>
    <item>
      <title>Via Score to Performance: Efficient Human-Controllable Long Song Generation with Bar-Level Symbolic Notation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-via-score-to-performance-efficient-human/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-via-score-to-performance-efficient-human/</guid>
      <description>音乐生成 | 7.5/10</description>
    </item>
    <item>
      <title>UniSonate: A Unified Model for Speech, Music, and Sound Effect Generation with Text Instructions</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-unisonate-a-unified-model-for-speech-music-and/</link>
      <pubDate>Mon, 27 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-unisonate-a-unified-model-for-speech-music-and/</guid>
      <description>音频生成 | 8.5/10</description>
    </item>
    <item>
      <title>Materialistic RIR: Material Conditioned Realistic RIR Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-materialistic-rir-material-conditioned-realistic/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-materialistic-rir-material-conditioned-realistic/</guid>
      <description>音频生成 | 7.5/10</description>
    </item>
    <item>
      <title>BEAT: Tokenizing and Generating Symbolic Music by Uniform Temporal Steps</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-beat-tokenizing-and-generating-symbolic-music-by/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-beat-tokenizing-and-generating-symbolic-music-by/</guid>
      <description>Addressing a limitation of the event-based tokenization that dominates symbolic music generation, namely that temporal regularity is only implicitly encoded and the model must additionally learn the time grid, this paper proposes **BEAT**, a grid-based tokenization framework. Its core idea is to discretize music uniformly in time into beats as the basic unit, and within each beat to represent each pitch…</description>
    </item>
    <item>
      <title>Latent Fourier Transform</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-latent-fourier-transform/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-latent-fourier-transform/</guid>
      <description>This paper tackles the difficulty existing music generation models have in precisely controlling musical patterns at **arbitrary time scales**. The authors propose the **Latent Fourier Transform (LatentFT)** framework, whose core is to apply the discrete Fourier transform to the **sequence of latent vectors** produced by a diffusion autoencoder, yielding a "latent spectrum". Randomly masking frequencies of this latent spectrum during training forces the decoder…</description>
    </item>
    <item>
      <title>ControlFoley: Unified and Controllable Video-to-Audio Generation with Cross-Modal Conflict Handling</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-controlfoley-unified-and-controllable-video-to/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-controlfoley-unified-and-controllable-video-to/</guid>
      <description>This paper presents ControlFoley, a unified and controllable video-to-audio generation framework that addresses two weaknesses of existing methods: weak text control under cross-modal conflicts, and the entanglement of timbre and timing information when controlling with reference audio. Its core contributions include: 1) a joint visual encoding paradigm…</description>
    </item>
    <item>
      <title>Enhancing time-frequency resolution with optimal transport and barycentric fusion of multiple spectrogram</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-enhancing-time-frequency-resolution-with-optimal/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-enhancing-time-frequency-resolution-with-optimal/</guid>
      <description>**Core problem**: spectrograms produced by the short-time Fourier transform (STFT) are bound by the uncertainty principle and cannot achieve high time and frequency resolution simultaneously, and conventional fusion methods (such as geometric averaging) require the input spectrograms to be grid-aligned while offering limited performance. **Core method**: this paper proposes a…</description>
    </item>
    <item>
      <title>Four Decades of Digital Waveguides</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-four-decades-of-digital-waveguides/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-four-decades-of-digital-waveguides/</guid>
      <description>This paper comprehensively reviews the four decades of development, core applications, and latest advances of digital waveguide physical modeling since its inception. The central problem it addresses is how to simulate acoustic wave propagation efficiently while preserving physical accuracy, so as to meet the demands of real-time audio processing such as virtual instruments…</description>
    </item>
    <item>
      <title>Geo2Sound: A Scalable Geo-Aligned Framework for Soundscape Generation from Satellite Imagery</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-geo2sound-a-scalable-geo-aligned-framework-for/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-geo2sound-a-scalable-geo-aligned-framework-for/</guid>
      <description>This paper introduces **Geo2Sound**, a new task and framework for generating geographically consistent and realistic soundscapes from satellite imagery. **The problem it addresses**: existing image-to-audio models face three major challenges when handling top-down satellite views, beginning with a lack of…</description>
    </item>
    <item>
      <title>Tora3: Trajectory-Guided Audio-Video Generation with Physical Coherence</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-tora3-trajectory-guided-audio-video-generation/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-tora3-trajectory-guided-audio-video-generation/</guid>
      <description>Targeting the shortcomings of existing audio-video (AV) generation models, namely unrealistic motion, sound out of sync with motion events, and sound intensity mismatched with motion intensity, this paper proposes the Tora3 framework. Its core innovation is to **treat object trajectories as a shared kinematic prior linking the visual and auditory modalities**…</description>
    </item>
  </channel>
</rss>
