<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Video Generation on Speech/Audio Paper Digest</title>
    <link>https://nanless.github.io/audio-paper-digest-blog/tags/%E8%A7%86%E9%A2%91%E7%94%9F%E6%88%90/</link>
    <description>Recent content in Video Generation on Speech/Audio Paper Digest</description>
    <generator>Hugo</generator>
    <language>zh-cn</language>
    <lastBuildDate>Wed, 29 Apr 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://nanless.github.io/audio-paper-digest-blog/tags/%E8%A7%86%E9%A2%91%E7%94%9F%E6%88%90/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>ICASSP 2026 - Video Generation Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-052/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-052/</guid>
      <description>2 ICASSP 2026 papers in the Video Generation category</description>
    </item>
    <item>
      <title>MirrorTalk: Forging Personalized Avatars Via Disentangled Style and Hierarchical Motion Control</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mirrortalk-forging-personalized-avatars-via/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mirrortalk-forging-personalized-avatars-via/</guid>
      <description>Speech Synthesis | 7.0/10</description>
    </item>
    <item>
      <title>StyHarmo: Efficient Style-Specific Video Generation with Music Synchronization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-styharmo-efficient-style-specific-video/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-styharmo-efficient-style-specific-video/</guid>
      <description>Video Generation | 6.5/10</description>
    </item>
    <item>
      <title>VT-Heads: Voice Cloning and Talking Head Generation from Text Based on V-DiT</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-vt-heads-voice-cloning-and-talking-head/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-vt-heads-voice-cloning-and-talking-head/</guid>
      <description>Video Generation | 6.5/10</description>
    </item>
    <item>
      <title>CoInteract: Physically-Consistent Human-Object Interaction Video Synthesis via Spatially-Structured Co-Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-cointeract-physically-consistent-human-object/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-cointeract-physically-consistent-human-object/</guid>
      <description>1. **Problem**: When generating human-object interaction (HOI) videos, existing video diffusion models often exhibit hand/face structural collapse and human-object physical interpenetration, which stems from the models' lack of understanding of 3D spatial relationships and interaction structure. 2. **Core method**: Proposes the CoInteract framework, built around a "spatially-structured co-generation" paradigm. Within a shared DiT backbone, it jointly trains an RGB appearance stream and an auxiliary</description>
    </item>
  </channel>
</rss>
