<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Speech Translation on 语音/音频论文速递</title>
    <link>https://nanless.github.io/audio-paper-digest-blog/tags/%E8%AF%AD%E9%9F%B3%E7%BF%BB%E8%AF%91/</link>
    <description>Recent content in Speech Translation on 语音/音频论文速递</description>
    <generator>Hugo</generator>
    <language>zh-cn</language>
    <lastBuildDate>Wed, 29 Apr 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://nanless.github.io/audio-paper-digest-blog/tags/%E8%AF%AD%E9%9F%B3%E7%BF%BB%E8%AF%91/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Advancing Speech Understanding in Speech-Aware Language Models with GRPO</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-advancing-speech-understanding-in-speech-aware/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-advancing-speech-understanding-in-speech-aware/</guid>
      <description>Speech QA | 7.0/10</description>
    </item>
    <item>
      <title>ATOM: Adaptive Token-Level Optimal Transport Mixup for Speech Translation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-atom-adaptive-token-level-optimal-transport-mixup/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-atom-adaptive-token-level-optimal-transport-mixup/</guid>
      <description>Speech Translation | 8.0/10</description>
    </item>
    <item>
      <title>Attention2Probability: Attention-Driven Terminology Probability Estimation for Robust Speech-to-text System</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-attention2probability-attention-driven/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-attention2probability-attention-driven/</guid>
      <description>Speech Recognition | 7.0/10</description>
    </item>
    <item>
      <title>Chunk-Wise Attention Transducers for Fast and Accurate Streaming Speech-to-Text</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-chunk-wise-attention-transducers-for-fast-and/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-chunk-wise-attention-transducers-for-fast-and/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Direct Simultaneous Translation Activation for Large Audio-Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-direct-simultaneous-translation-activation-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-direct-simultaneous-translation-activation-for/</guid>
      <description>Speech Translation | 6.0/10</description>
    </item>
    <item>
      <title>Direct Transfer of Prosody in Speech-to-speech Translation using Disentangled Speech Tokens</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-direct-transfer-of-prosody-in-speech-to-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-direct-transfer-of-prosody-in-speech-to-speech/</guid>
      <description>Speech Translation | 7.5/10</description>
    </item>
    <item>
      <title>Equipping Large Language Model with Directional Speech Understanding Capabilities</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-equipping-large-language-model-with-directional/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-equipping-large-language-model-with-directional/</guid>
      <description>Speech Recognition, Speech Translation | 7.0/10</description>
    </item>
    <item>
      <title>ICASSP 2026 - Speech Translation Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-074/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-074/</guid>
      <description>8 ICASSP 2026 papers on Speech Translation</description>
    </item>
    <item>
      <title>Joint Autoregressive Modeling of Multi-Talker Overlapped Speech Recognition and Translation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-joint-autoregressive-modeling-of-multi-talker/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-joint-autoregressive-modeling-of-multi-talker/</guid>
      <description>Speech Recognition, Speech Translation | 7.0/10</description>
    </item>
    <item>
      <title>LESS: Large Language Model Enhanced Semi-Supervised Learning for Speech Foundational Models Using in-the-wild Data</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-less-large-language-model-enhanced-semi/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-less-large-language-model-enhanced-semi/</guid>
      <description>Speech Recognition, Speech Translation | 7.5/10</description>
    </item>
    <item>
      <title>LongSpeech: A Scalable Benchmark for Transcription, Translation and Understanding in Long Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-longspeech-a-scalable-benchmark-for-transcription/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-longspeech-a-scalable-benchmark-for-transcription/</guid>
      <description>Benchmark | 7.8/10</description>
    </item>
    <item>
      <title>MTP-S2UT: Enhancing Speech-to-Speech Translation Quality with Multi-Token Prediction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mtp-s2ut-enhancing-speech-to-speech-translation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mtp-s2ut-enhancing-speech-to-speech-translation/</guid>
      <description>Speech Translation | 8.5/10</description>
    </item>
    <item>
      <title>nGPT as a Scalable Architecture for Speech Recognition and Translation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ngpt-as-a-scalable-architecture-for-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ngpt-as-a-scalable-architecture-for-speech/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Phrased: Phrase Dictionary Biasing for Speech Translation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-phrased-phrase-dictionary-biasing-for-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-phrased-phrase-dictionary-biasing-for-speech/</guid>
      <description>Speech Translation | 7.5/10</description>
    </item>
    <item>
      <title>PROST-LLM: Progressively Enhancing the Speech-to-Speech Translation Capability in LLMs</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-prost-llm-progressively-enhancing-the-speech-to/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-prost-llm-progressively-enhancing-the-speech-to/</guid>
      <description>Speech Translation | 7.5/10</description>
    </item>
    <item>
      <title>Revisiting Direct Speech-to-Text Translation with Speech LLMs: Better Scaling than CoT Prompting?</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-revisiting-direct-speech-to-text-translation-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-revisiting-direct-speech-to-text-translation-with/</guid>
      <description>Speech Translation | 7.5/10</description>
    </item>
    <item>
      <title>SEP-ST: Incorporating Speech Entity Prompt Into Large Language Models for Speech Translation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sep-st-incorporating-speech-entity-prompt-into/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sep-st-incorporating-speech-entity-prompt-into/</guid>
      <description>Speech Translation | 7.5/10</description>
    </item>
    <item>
      <title>SLM-TTA: A Framework for Test-Time Adaptation of Generative Spoken Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-slm-tta-a-framework-for-test-time-adaptation-of/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-slm-tta-a-framework-for-test-time-adaptation-of/</guid>
      <description>Speech Recognition | 7.0/10</description>
    </item>
    <item>
      <title>TTA: Transcribe, Translate and Alignment for Cross-Lingual Speech Representation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tta-transcribe-translate-and-alignment-for-cross/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tta-transcribe-translate-and-alignment-for-cross/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Hierarchical Policy Optimization for Simultaneous Translation of Unbounded Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-hierarchical-policy-optimization-for-simultaneous/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-hierarchical-policy-optimization-for-simultaneous/</guid>
      <description>Speech Translation | 7.5/10</description>
    </item>
    <item>
      <title>MoVE: Translating Laughter and Tears via Mixture of Vocalization Experts in Speech-to-Speech Translation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-move-translating-laughter-and-tears-via-mixture/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-move-translating-laughter-and-tears-via-mixture/</guid>
      <description>This paper addresses the problem that speech-to-speech translation (S2ST) systems commonly lose the non-verbal sounds (such as laughter and crying) and emotional information in the source speech, which severely harms the naturalness and accuracy of cross-lingual communication. To this end, the authors make three core contributions: first, they design a scalable automated data-synthesis pipeline that generates a large-scale, high-quality, expressive English-Chinese S2ST parallel corpus, overcoming the bottleneck of scarce training data</description>
    </item>
    <item>
      <title>Detecting Hallucinations in SpeechLLMs at Inference Time Using Attention Maps</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-detecting-hallucinations-in-speechllms-at/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-detecting-hallucinations-in-speechllms-at/</guid>
      <description>This paper addresses the "hallucination" problem of speech large language models (SpeechLLMs) at inference time, i.e., generating fluent text that does not match the input audio. Existing methods rely on expensive gold-standard outputs, while text-LLM approaches cannot capture audio-specific signals. To this end, the authors propose four lightweight attention-map-based metrics (AudioRatio, AudioConsistency, AudioEnt</description>
    </item>
    <item>
      <title>MoVE: Translating Laughter and Tears via Mixture of Vocalization Experts in Speech-to-Speech Translation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-move-translating-laughter-and-tears-via-mixture/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-move-translating-laughter-and-tears-via-mixture/</guid>
      <description>This paper addresses the problem that speech-to-speech translation (S2ST) systems commonly lack non-verbal sounds (such as laughter and crying) and emotional prosody, which severely limits the naturalness and pragmatic accuracy of cross-lingual communication. The authors present three major contributions: 1) a **scalable expressive data-synthesis pipeline** that automatically generates high-quality, emotion-annotated S2ST training pairs, overcoming the data-scarcity bottleneck; 2) **MoVE (Mixture of Voc</description>
    </item>
    <item>
      <title>NaijaS2ST: A Multi-Accent Benchmark for Speech-to-Speech Translation in Low-Resource Nigerian Languages</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-naijas2st-a-multi-accent-benchmark-for-speech-to/</link>
      <pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-naijas2st-a-multi-accent-benchmark-for-speech-to/</guid>
      <description>This paper tackles the core bottleneck facing African low-resource languages in speech translation (S2ST and S2TT) research: a severe scarcity of high-quality, multi-accent parallel speech data. To this end, the authors build the **NaijaS2ST** dataset, covering parallel speech between English and Hausa, Igbo, Yoruba, and Nigerian Pidgin, with about 50 hours per language, capturing real speaker and accent diversity. Based on this dataset, the</description>
    </item>
  </channel>
</rss>
