<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>音频问答 on 语音/音频论文速递</title>
    <link>https://nanless.github.io/audio-paper-digest-blog/tags/%E9%9F%B3%E9%A2%91%E9%97%AE%E7%AD%94/</link>
    <description>Recent content in 音频问答 on 语音/音频论文速递</description>
    <generator>Hugo</generator>
    <language>zh-cn</language>
    <lastBuildDate>Wed, 29 Apr 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://nanless.github.io/audio-paper-digest-blog/tags/%E9%9F%B3%E9%A2%91%E9%97%AE%E7%AD%94/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Advancing Speech Summarization in Multi-Modal LLMs with Reinforcement Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-advancing-speech-summarization-in-multi-modal/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-advancing-speech-summarization-in-multi-modal/</guid>
      <description>音频问答 | 7.0/10</description>
    </item>
    <item>
      <title>AQUA-Bench: Beyond finding answers to knowing when there are None in Audio Question Answering</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-aqua-bench-beyond-finding-answers-to-knowing-when/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-aqua-bench-beyond-finding-answers-to-knowing-when/</guid>
      <description>音频问答 | 7.0/10</description>
    </item>
    <item>
      <title>AUDIOGENIE-Reasoner: A Training-Free Multi-Agent Framework for Coarse-to-Fine Audio Deep Reasoning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-audiogenie-reasoner-a-training-free-multi-agent/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-audiogenie-reasoner-a-training-free-multi-agent/</guid>
      <description>音频问答 | 7.0/10</description>
    </item>
    <item>
      <title>Benchmarking Humans And Machines On Complex Multilingual Speech Understanding Tasks</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-benchmarking-humans-and-machines-on-complex/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-benchmarking-humans-and-machines-on-complex/</guid>
      <description>音频问答 | 7.5/10</description>
    </item>
    <item>
      <title>DSpAST: Disentangled Representations for Spatial Audio Reasoning with Large Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dspast-disentangled-representations-for-spatial/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dspast-disentangled-representations-for-spatial/</guid>
      <description>音频问答 | 8.0/10</description>
    </item>
    <item>
      <title>Efficient Audio-Visual Inference Via Token Clustering And Modality Fusion</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-efficient-audio-visual-inference-via-token/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-efficient-audio-visual-inference-via-token/</guid>
      <description>音频问答 | 7.5/10</description>
    </item>
    <item>
      <title>Enhancing Audio Question-Answering Performance Through Log-Likelihood Guided Reward Functions</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-enhancing-audio-question-answering-performance/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-enhancing-audio-question-answering-performance/</guid>
      <description>音频问答 | 8.5/10</description>
    </item>
    <item>
      <title>FastAV: Efficient Token Pruning for Audio-Visual Large Language Model Inference</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fastav-efficient-token-pruning-for-audio-visual/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fastav-efficient-token-pruning-for-audio-visual/</guid>
      <description>音频问答 | 7.0/10</description>
    </item>
    <item>
      <title>ICASSP 2026 - 音频问答 Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-137/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-137/</guid>
      <description>15 ICASSP 2026 papers tagged 音频问答</description>
    </item>
    <item>
      <title>Improving Audio Question Answering with Variational Inference</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-improving-audio-question-answering-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-improving-audio-question-answering-with/</guid>
      <description>音频问答 | 7.5/10</description>
    </item>
    <item>
      <title>Keeping Models Listening: Segment- and time-aware attention rescaling at decoding time</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-keeping-models-listening-segment-and-time-aware/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-keeping-models-listening-segment-and-time-aware/</guid>
      <description>音频问答 | 7.5/10</description>
    </item>
    <item>
      <title>Mitigating Language Prior-Induced Hallucinations via Bi-Level Contrastive Decoding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mitigating-language-prior-induced-hallucinations/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mitigating-language-prior-induced-hallucinations/</guid>
      <description>多模态模型 | 7.5/10</description>
    </item>
    <item>
      <title>Segmentwise Pruning in Audio-Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-segmentwise-pruning-in-audio-language-models/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-segmentwise-pruning-in-audio-language-models/</guid>
      <description>音频问答 | 7.0/10</description>
    </item>
    <item>
      <title>SightSound-R1: Cross-Modal Reasoning Distillation from Vision to Audio Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sightsound-r1-cross-modal-reasoning-distillation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sightsound-r1-cross-modal-reasoning-distillation/</guid>
      <description>音频问答 | 7.5/10</description>
    </item>
    <item>
      <title>TAU: A Benchmark for Cultural Sound Understanding Beyond Semantics</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tau-a-benchmark-for-cultural-sound-understanding/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tau-a-benchmark-for-cultural-sound-understanding/</guid>
      <description>音频问答 | 7.5/10</description>
    </item>
    <item>
      <title>Teaching Audio Models to Reason: A Unified Framework for Source- and Layer-Wise Distillation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-teaching-audio-models-to-reason-a-unified/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-teaching-audio-models-to-reason-a-unified/</guid>
      <description>音频问答 | 7.0/10</description>
    </item>
    <item>
      <title>Test-Time Scaling for Auditory Cognition in Audio Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-test-time-scaling-for-auditory-cognition-in-audio/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-test-time-scaling-for-auditory-cognition-in-audio/</guid>
      <description>音频问答 | 7.0/10</description>
    </item>
    <item>
      <title>TinyMU: A Compact Audio-Language Model for Music Understanding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tinymu-a-compact-audio-language-model-for-music/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tinymu-a-compact-audio-language-model-for-music/</guid>
      <description>音乐理解 | 7.5/10</description>
    </item>
    <item>
      <title>Walking Through Uncertainty: An Empirical Study of Uncertainty Estimation for Audio-Aware Large Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-walking-through-uncertainty-an-empirical-study-of/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-walking-through-uncertainty-an-empirical-study-of/</guid>
      <description>音频问答 | 7.5/10</description>
    </item>
    <item>
      <title>All That Glitters Is Not Audio: Rethinking Text Priors and Audio Reliance in Audio-Language Evaluation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-all-that-glitters-is-not-audio-rethinking-text/</link>
      <pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-all-that-glitters-is-not-audio-rethinking-text/</guid>
      <description>音频问答 | 6.5/10</description>
    </item>
    <item>
      <title>Listening with Time: Precise Temporal Awareness for Long-Form Audio Understanding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-listening-with-time-precise-temporal-awareness/</link>
      <pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-listening-with-time-precise-temporal-awareness/</guid>
      <description>音频场景理解 | 8.0/10</description>
    </item>
    <item>
      <title>AUDITA: A New Dataset to Audit Humans vs. AI Skill at Audio QA</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-audita-a-new-dataset-to-audit-humans-vs-ai-skill/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-audita-a-new-dataset-to-audit-humans-vs-ai-skill/</guid>
      <description>音频问答 | 6.5/10</description>
    </item>
    <item>
      <title>Audio-Cogito: Towards Deep Audio Reasoning in Large Audio Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-audio-cogito-towards-deep-audio-reasoning-in/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-audio-cogito-towards-deep-audio-reasoning-in/</guid>
      <description>This paper tackles the limited capability and opaque reasoning of large audio language models (LALMs) on complex audio reasoning tasks. Its **core contribution** is a fully open-source solution named **Audio-Cogito**, centered on a four-stage automated data-construction pipeline, **Cogito-Pipe**, for generating high-quality, diverse audio chain-of-thought (CoT) reasoning da…</description>
    </item>
    <item>
      <title>Audio-DeepThinker: Progressive Reasoning-Aware Reinforcement Learning for High-Quality Chain-of-Thought Emergence in Audio Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-audio-deepthinker-progressive-reasoning-aware/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-audio-deepthinker-progressive-reasoning-aware/</guid>
      <description>This paper addresses the lack of explicit, high-quality reasoning in large audio language models (LALMs). Existing methods are either limited by the quality of supervised data or rely on coarse rewards, yielding chains of thought that are well-formed but lack acoustic grounding. The authors propose the **Audio-DeepThinker** framework with three core contributions: 1) a **hybrid reasoning-similarity reward** combining LLM ev…</description>
    </item>
    <item>
      <title>Temporal Contrastive Decoding: A Training-Free Method for Large Audio-Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-temporal-contrastive-decoding-a-training-free/</link>
      <pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-temporal-contrastive-decoding-a-training-free/</guid>
      <description>Unified large audio-language models (LALMs) suffer a "temporal smoothing bias" during autoregressive decoding: brief, transient acoustic cues (e.g., a phone ringing, a plucked string) are easily drowned out by language priors and temporally smooth context, so the generated output lacks audio specificity. This paper proposes Temporal Contrastive Decoding (TCD), a fully training-free method that takes effect only at inference…</description>
    </item>
  </channel>
</rss>
