<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Audio Retrieval on Speech/Audio Paper Digest (语音/音频论文速递)</title>
    <link>https://nanless.github.io/audio-paper-digest-blog/tags/%E9%9F%B3%E9%A2%91%E6%A3%80%E7%B4%A2/</link>
    <description>Recent content in Audio Retrieval on Speech/Audio Paper Digest (语音/音频论文速递)</description>
    <generator>Hugo</generator>
    <language>zh-cn</language>
    <lastBuildDate>Wed, 29 Apr 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://nanless.github.io/audio-paper-digest-blog/tags/%E9%9F%B3%E9%A2%91%E6%A3%80%E7%B4%A2/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>AUDIOCARDS: Structured Metadata Improves Audio Language Models for Sound Design</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-audiocards-structured-metadata-improves-audio/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-audiocards-structured-metadata-improves-audio/</guid>
      <description>Audio Retrieval | 7.5/10</description>
    </item>
    <item>
      <title>Auto-MatchCut: An Audio-Visual Retrieval Framework for Seamless Match Cutting</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-auto-matchcut-an-audio-visual-retrieval-framework/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-auto-matchcut-an-audio-visual-retrieval-framework/</guid>
      <description>Cross-Modal Retrieval | 7.0/10</description>
    </item>
    <item>
      <title>Automatic Music Sample Identification with Multi-Track Contrastive Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-automatic-music-sample-identification-with-multi/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-automatic-music-sample-identification-with-multi/</guid>
      <description>Audio Retrieval | 7.5/10</description>
    </item>
    <item>
      <title>BEST-STD 2.0: Balanced and Efficient Speech Tokenizer for Spoken Term Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-best-std-20-balanced-and-efficient-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-best-std-20-balanced-and-efficient-speech/</guid>
      <description>Audio Retrieval | 7.5/10</description>
    </item>
    <item>
      <title>CASTELLA: Long Audio Dataset with Captions and Temporal Boundaries</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-castella-long-audio-dataset-with-captions-and/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-castella-long-audio-dataset-with-captions-and/</guid>
      <description>Audio Retrieval | 8.5/10</description>
    </item>
    <item>
      <title>Contrastive Timbre Representations for Musical Instrument And Synthesizer Retrieval</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-contrastive-timbre-representations-for-musical/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-contrastive-timbre-representations-for-musical/</guid>
      <description>Audio Retrieval | 7.5/10</description>
    </item>
    <item>
      <title>Do Speech LLMs Learn Crossmodal Embedding Spaces?</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-do-speech-llms-learn-crossmodal-embedding-spaces/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-do-speech-llms-learn-crossmodal-embedding-spaces/</guid>
      <description>Audio Retrieval | 6.5/10</description>
    </item>
    <item>
      <title>EchoRAG: A Two-Stage Framework for Audio-Text Retrieval and Temporal Grounding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-echorag-a-two-stage-framework-for-audio-text/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-echorag-a-two-stage-framework-for-audio-text/</guid>
      <description>Audio Retrieval | 7.5/10</description>
    </item>
    <item>
      <title>GLAP: General Contrastive Audio-Text Pretraining Across Domains and Languages</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-glap-general-contrastive-audio-text-pretraining/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-glap-general-contrastive-audio-text-pretraining/</guid>
      <description>Audio Retrieval | 8.5/10</description>
    </item>
    <item>
      <title>Hashing-Baseline: Rethinking Hashing in the Age of Pretrained Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hashing-baseline-rethinking-hashing-in-the-age-of/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hashing-baseline-rethinking-hashing-in-the-age-of/</guid>
      <description>Audio Retrieval, Audio Classification | 8.0/10</description>
    </item>
    <item>
      <title>ICASSP 2026 - Audio Retrieval Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-129/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-129/</guid>
      <description>11 ICASSP 2026 papers in the Audio Retrieval track</description>
    </item>
    <item>
      <title>Leveraging Whisper Embeddings For Audio-Based Lyrics Matching</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-leveraging-whisper-embeddings-for-audio-based/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-leveraging-whisper-embeddings-for-audio-based/</guid>
      <description>Music Information Retrieval | 7.0/10</description>
    </item>
    <item>
      <title>MMEB-V3: Measuring the Performance Gaps of Omni-Modality Embedding Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mmeb-v3-measuring-the-performance-gaps-of-omni/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mmeb-v3-measuring-the-performance-gaps-of-omni/</guid>
      <description>Benchmarking | 7.5/10</description>
    </item>
    <item>
      <title>MusiCRS: Benchmarking Audio-Centric Conversational Recommendation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-musicrs-benchmarking-audio-centric-conversational/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-musicrs-benchmarking-audio-centric-conversational/</guid>
      <description>Music Recommendation | 7.5/10</description>
    </item>
    <item>
      <title>Scalable Evaluation for Audio Identification Via Synthetic Latent Fingerprint Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-scalable-evaluation-for-audio-identification-via/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-scalable-evaluation-for-audio-identification-via/</guid>
      <description>Audio Retrieval | 7.0/10</description>
    </item>
    <item>
      <title>Separate this, and all of these Things Around It: Music Source Separation Via Hyperellipsoidal Queries</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-separate-this-and-all-of-these-things-around-it/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-separate-this-and-all-of-these-things-around-it/</guid>
      <description>Music Source Separation | 7.0/10</description>
    </item>
    <item>
      <title>Shared Representation Learning for Reference-Guided Targeted Sound Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-shared-representation-learning-for-reference/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-shared-representation-learning-for-reference/</guid>
      <description>Audio Event Detection | 8.5/10</description>
    </item>
    <item>
      <title>SLAP: Scalable Language-Audio Pretraining with Variable-Duration Audio and Multi-Objective Training</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-slap-scalable-language-audio-pretraining-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-slap-scalable-language-audio-pretraining-with/</guid>
      <description>Audio Retrieval | 8.0/10</description>
    </item>
    <item>
      <title>WavLink: Compact Audio–Text Embeddings with a Global Whisper Token</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-wavlink-compact-audiotext-embeddings-with-a/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-wavlink-compact-audiotext-embeddings-with-a/</guid>
      <description>Audio Retrieval | 8.0/10</description>
    </item>
    <item>
      <title>Robust Audio-Text Retrieval via Cross-Modal Attention and Hybrid Loss</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-robust-audio-text-retrieval-via-cross-modal/</link>
      <pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-robust-audio-text-retrieval-via-cross-modal/</guid>
      <description>Audio Retrieval | 7.5/10</description>
    </item>
    <item>
      <title>ATIR: Towards Audio-Text Interleaved Contextual Retrieval</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-atir-towards-audio-text-interleaved-contextual/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-atir-towards-audio-text-interleaved-contextual/</guid>
      <description>This paper addresses a limitation of existing audio-text retrieval methods: they cannot handle queries and documents in which audio and text appear interleaved (e.g., multi-turn dialogue, mixed inputs). The authors define the Audio-Text Interleaved Contextual Retrieval (ATIR) task and build a large-scale benchmark of roughly 88k paired samples. To tackle the efficiency and accuracy problems caused by redundant audio tokens when directly applying multimodal large language models (MLLMs), the…</description>
    </item>
    <item>
      <title>Omni-Embed-Audio: Leveraging Multimodal LLMs for Robust Audio-Text Retrieval</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-omni-embed-audio-leveraging-multimodal-llms-for/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-omni-embed-audio-leveraging-multimodal-llms-for/</guid>
      <description>This paper addresses the performance degradation of current audio-text retrieval models under **real, diverse user queries**. The authors point out that existing benchmarks (e.g., AudioCaps, Clotho) rely on descriptive, caption-style queries, which diverge sharply from real-world search behavior that is short and varied (questions, commands, keywords, exclusionary queries). The paper makes two core contributions: 1) **Omni…</description>
    </item>
  </channel>
</rss>
