<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Music Generation on Speech/Audio Paper Digest</title>
    <link>https://nanless.github.io/audio-paper-digest-blog/tags/%E9%9F%B3%E4%B9%90%E7%94%9F%E6%88%90/</link>
    <description>Recent content in Music Generation on Speech/Audio Paper Digest</description>
    <generator>Hugo</generator>
    <language>en-us</language>
    <lastBuildDate>Wed, 29 Apr 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://nanless.github.io/audio-paper-digest-blog/tags/%E9%9F%B3%E4%B9%90%E7%94%9F%E6%88%90/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>A Generative-First Neural Audio Autoencoder</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-generative-first-neural-audio-autoencoder/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-generative-first-neural-audio-autoencoder/</guid>
      <description>Music Generation | 8.5/10</description>
    </item>
    <item>
      <title>Aligning Language Models for Lyric-to-Melody Generation with Rule-Based Musical Constraints</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-aligning-language-models-for-lyric-to-melody/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-aligning-language-models-for-lyric-to-melody/</guid>
      <description>Music Generation | 7.5/10</description>
    </item>
    <item>
      <title>AnyAccomp: Generalizable Accompaniment Generation Via Quantized Melodic Bottleneck</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-anyaccomp-generalizable-accompaniment-generation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-anyaccomp-generalizable-accompaniment-generation/</guid>
      <description>Music Generation | 8.0/10</description>
    </item>
    <item>
      <title>Automatic Music Mixing Using a Generative Model of Effect Embeddings</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-automatic-music-mixing-using-a-generative-model/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-automatic-music-mixing-using-a-generative-model/</guid>
      <description>Music Generation | 7.5/10</description>
    </item>
    <item>
      <title>Break-the-Beat! Controllable MIDI-to-Drum Audio Synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-break-the-beat-controllable-midi-to-drum-audio/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-break-the-beat-controllable-midi-to-drum-audio/</guid>
      <description>Music Generation | 7.5/10</description>
    </item>
    <item>
      <title>Compression meets Sampling: LZ78-SPA for Efficient Symbolic Music Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-compression-meets-sampling-lz78-spa-for-efficient/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-compression-meets-sampling-lz78-spa-for-efficient/</guid>
      <description>Music Generation | 7.5/10</description>
    </item>
    <item>
      <title>D3PIA: A Discrete Denoising Diffusion Model for Piano Accompaniment Generation from Lead Sheet</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-d3pia-a-discrete-denoising-diffusion-model-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-d3pia-a-discrete-denoising-diffusion-model-for/</guid>
      <description>Music Generation | 7.5/10</description>
    </item>
    <item>
      <title>Differentiable Pulsetable Synthesis for Wind Instrument Modeling</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-differentiable-pulsetable-synthesis-for-wind/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-differentiable-pulsetable-synthesis-for-wind/</guid>
      <description>Music Generation | 7.5/10</description>
    </item>
    <item>
      <title>Diffusion Timbre Transfer via Mutual Information Guided Inpainting</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-diffusion-timbre-transfer-via-mutual-information/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-diffusion-timbre-transfer-via-mutual-information/</guid>
      <description>Music Generation | 7.5/10</description>
    </item>
    <item>
      <title>Etude: Piano Cover Generation with a Three-Stage Approach — Extract, Structuralize, and Decode</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-etude-piano-cover-generation-with-a-three-stage/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-etude-piano-cover-generation-with-a-three-stage/</guid>
      <description>Music Generation | 7.0/10</description>
    </item>
    <item>
      <title>Evaluating Disentangled Representations for Controllable Music Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-evaluating-disentangled-representations-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-evaluating-disentangled-representations-for/</guid>
      <description>Music Generation | 7.5/10</description>
    </item>
    <item>
      <title>Fine-Tuning BigVGAN-v2 for Robust Musical Tuning Preservation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fine-tuning-bigvgan-v2-for-robust-musical-tuning/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fine-tuning-bigvgan-v2-for-robust-musical-tuning/</guid>
      <description>Music Generation | 7.5/10</description>
    </item>
    <item>
      <title>FUSEMOS: Perceptual Evaluation of Text-to-Music Generation with Dual-Encoder Fusion and Ranking-Aware Composite Loss</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fusemos-perceptual-evaluation-of-text-to-music/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fusemos-perceptual-evaluation-of-text-to-music/</guid>
      <description>Music Generation | 7.5/10</description>
    </item>
    <item>
      <title>Huí Sù: Co-constructing a Dual Feedback Apparatus</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hu-s-co-constructing-a-dual-feedback-apparatus/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hu-s-co-constructing-a-dual-feedback-apparatus/</guid>
      <description>Music Generation | 5.5/10</description>
    </item>
    <item>
      <title>ICASSP 2026 - Music Generation Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-110/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-110/</guid>
      <description>31 ICASSP 2026 papers on music generation</description>
    </item>
    <item>
      <title>Improving Interpretability in Generative Multitimbral DDSP Frameworks via Semantically-Disentangled Musical Attributes</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-improving-interpretability-in-generative/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-improving-interpretability-in-generative/</guid>
      <description>Audio Generation | 7.5/10</description>
    </item>
    <item>
      <title>InstructAudio: Unified Speech and Music Generation with Natural Language Instruction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-instructaudio-unified-speech-and-music-generation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-instructaudio-unified-speech-and-music-generation/</guid>
      <description>Speech Synthesis | 7.5/10</description>
    </item>
    <item>
      <title>Instrument Generation Through Distributional Flow Matching and Test-Time Search</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-instrument-generation-through-distributional-flow/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-instrument-generation-through-distributional-flow/</guid>
      <description>Music Generation | 7.0/10</description>
    </item>
    <item>
      <title>Learning Linearity in Audio Consistency Autoencoders via Implicit Regularization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-learning-linearity-in-audio-consistency/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-learning-linearity-in-audio-consistency/</guid>
      <description>Audio Generation | 7.5/10</description>
    </item>
    <item>
      <title>Low-Resource Guidance for Controllable Latent Audio Diffusion</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-low-resource-guidance-for-controllable-latent/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-low-resource-guidance-for-controllable-latent/</guid>
      <description>Music Generation | 8.5/10</description>
    </item>
    <item>
      <title>Melos: Sentence-To-Section Training with Multi-Task Learning for LLM-Driven Song Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-melos-sentence-to-section-training-with-multi/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-melos-sentence-to-section-training-with-multi/</guid>
      <description>Music Generation | 6.5/10</description>
    </item>
    <item>
      <title>Motionbeat: Motion-Aligned Music Representation via Embodied Contrastive Learning and Bar-Equivariant Contact-Aware Encoding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-motionbeat-motion-aligned-music-representation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-motionbeat-motion-aligned-music-representation/</guid>
      <description>Dance Generation | 7.5/10</description>
    </item>
    <item>
      <title>MR-FlowDPO: Multi-Reward Direct Preference Optimization for Flow-Matching Text-to-Music Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mr-flowdpo-multi-reward-direct-preference/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mr-flowdpo-multi-reward-direct-preference/</guid>
      <description>Music Generation | 7.5/10</description>
    </item>
    <item>
      <title>MuseTok: Symbolic Music Tokenization for Generation and Semantic Understanding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-musetok-symbolic-music-tokenization-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-musetok-symbolic-music-tokenization-for/</guid>
      <description>Music Generation | 8.5/10</description>
    </item>
    <item>
      <title>Pianoroll-Event: A Novel Score Representation for Symbolic Music</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-pianoroll-event-a-novel-score-representation-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-pianoroll-event-a-novel-score-representation-for/</guid>
      <description>Music Generation | 6.5/10</description>
    </item>
    <item>
      <title>Sing2Song: An Accompaniment Generation System Based on Solo Singing</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sing2song-an-accompaniment-generation-system/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sing2song-an-accompaniment-generation-system/</guid>
      <description>Music Generation | 7.5/10</description>
    </item>
    <item>
      <title>Stemphonic: All-At-Once Flexible Multi-Stem Music Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-stemphonic-all-at-once-flexible-multi-stem-music/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-stemphonic-all-at-once-flexible-multi-stem-music/</guid>
      <description>Music Generation | 7.7/10</description>
    </item>
    <item>
      <title>Symphony Rendering: MIDI and Composer-Conditioned Auto Orchestration with Flow-Matching Transformers</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-symphony-rendering-midi-and-composer-conditioned/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-symphony-rendering-midi-and-composer-conditioned/</guid>
      <description>Music Generation | 7.0/10</description>
    </item>
    <item>
      <title>SymphonyGen: 3D Hierarchical Orchestral Generation with Controllable Harmony Skeleton</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-symphonygen-3d-hierarchical-orchestral-generation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-symphonygen-3d-hierarchical-orchestral-generation/</guid>
      <description>Music Generation | 7.5/10</description>
    </item>
    <item>
      <title>Text2midi-InferAlign: Improving Symbolic Music Generation with Inference-Time Alignment</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-text2midi-inferalign-improving-symbolic-music/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-text2midi-inferalign-improving-symbolic-music/</guid>
      <description>Music Generation | 7.5/10</description>
    </item>
    <item>
      <title>Time-Shifted Token Scheduling for Symbolic Music Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-time-shifted-token-scheduling-for-symbolic-music/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-time-shifted-token-scheduling-for-symbolic-music/</guid>
      <description>Music Generation | 8.5/10</description>
    </item>
    <item>
      <title>Towards Multi-View Hierarchical Video-to-Piano Generation with MIDI Guidance</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-multi-view-hierarchical-video-to-piano/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-multi-view-hierarchical-video-to-piano/</guid>
      <description>Music Generation | 7.0/10</description>
    </item>
    <item>
      <title>Via Score to Performance: Efficient Human-Controllable Long Song Generation with Bar-Level Symbolic Notation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-via-score-to-performance-efficient-human/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-via-score-to-performance-efficient-human/</guid>
      <description>Music Generation | 7.5/10</description>
    </item>
    <item>
      <title>Virtual Consistency for Audio Editing</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-virtual-consistency-for-audio-editing/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-virtual-consistency-for-audio-editing/</guid>
      <description>Music Generation | 8.0/10</description>
    </item>
    <item>
      <title>Visual Keys to Symphonies: Latent Diffusion for Multi-Scene Video-to-Music Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-visual-keys-to-symphonies-latent-diffusion-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-visual-keys-to-symphonies-latent-diffusion-for/</guid>
      <description>Music Generation | 7.5/10</description>
    </item>
    <item>
      <title>ViTex: Visual Texture Control for Multi-Track Symbolic Music Generation via Discrete Diffusion Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-vitex-visual-texture-control-for-multi-track/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-vitex-visual-texture-control-for-multi-track/</guid>
      <description>Music Generation | 7.0/10</description>
    </item>
    <item>
      <title>VMSP: Video-to-Music Generation with Two-Stage Alignment and Synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-vmsp-video-to-music-generation-with-two-stage/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-vmsp-video-to-music-generation-with-two-stage/</guid>
      <description>Music Generation | 7.0/10</description>
    </item>
    <item>
      <title>When Noise Lowers the Loss: Rethinking Likelihood-Based Evaluation in Music Large Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-when-noise-lowers-the-loss-rethinking-likelihood/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-when-noise-lowers-the-loss-rethinking-likelihood/</guid>
      <description>Music Generation | 7.0/10</description>
    </item>
    <item>
      <title>Opening the Design Space: Two Years of Performance with Intelligent Musical Instruments</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-opening-the-design-space-two-years-of-performance/</link>
      <pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-opening-the-design-space-two-years-of-performance/</guid>
      <description>Music Generation | 6.5/10</description>
    </item>
    <item>
      <title>Video-Robin: Autoregressive Diffusion Planning for Intent-Grounded Video-to-Music Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-video-robin-autoregressive-diffusion-planning-for/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-video-robin-autoregressive-diffusion-planning-for/</guid>
      <description>Music Generation | 7.0/10</description>
    </item>
    <item>
      <title>BEAT: Tokenizing and Generating Symbolic Music by Uniform Temporal Steps</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-beat-tokenizing-and-generating-symbolic-music-by/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-beat-tokenizing-and-generating-symbolic-music-by/</guid>
      <description>This paper targets a problem with the event-based tokenization methods that dominate symbolic music generation: they handle temporal regularity only implicitly, forcing the model to additionally learn the time grid. It proposes **BEAT**, a novel grid-based tokenization framework whose core idea is to discretize music uniformly in time with the beat as the basic unit and, within each beat, represent each pitch…</description>
    </item>
    <item>
      <title>A novel LSTM music generator based on the fractional time-frequency feature extraction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-a-novel-lstm-music-generator-based-on-the/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-a-novel-lstm-music-generator-based-on-the/</guid>
      <description>This paper presents a novel AI music generation system based on the fractional Fourier transform (FrFT) and long short-term memory (LSTM) networks. The **core goal** is to use the FrFT to extract, in the fractional-order domain (a rotated representation of the time-frequency plane), richer music-signal features than conventional time- or frequency-domain analysis provides, addressing plain LSTMs' weakness at capturing the complex time-frequency structure of music. The **key method** applies the FrFT to the input music signal…</description>
    </item>
    <item>
      <title>Latent Fourier Transform</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-latent-fourier-transform/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-latent-fourier-transform/</guid>
      <description>This paper addresses the difficulty existing music generation models have in precisely controlling musical patterns at **arbitrary time scales**. The authors propose the **Latent Fourier Transform (LatentFT)** framework, whose core is to apply the discrete Fourier transform to the **sequence of latent vectors** encoded by a diffusion autoencoder, yielding a "latent spectrum." Randomly masking frequencies of this latent spectrum during training forces the decoder…</description>
    </item>
    <item>
      <title>Video-Robin: Autoregressive Diffusion Planning for Intent-Grounded Video-to-Music Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-video-robin-autoregressive-diffusion-planning-for/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-video-robin-autoregressive-diffusion-planning-for/</guid>
      <description>Addressing existing video-to-music (V2M) generation models' lack of fine-grained control over creator intent such as style and theme, this paper proposes Video-Robin, a video soundtrack framework that incorporates text prompts. Its core method decouples generation into two stages: first, a multimodal autoregressive planning head (AR-Head) integrates video frames and text prompts via a semantic language model, finite scalar quantization (FSQ), and residual…</description>
    </item>
  </channel>
</rss>
