<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Multilingual on Speech/Audio Paper Digest</title>
    <link>https://nanless.github.io/audio-paper-digest-blog/tags/%E5%A4%9A%E8%AF%AD%E8%A8%80/</link>
    <description>Recent content in Multilingual on Speech/Audio Paper Digest</description>
    <generator>Hugo</generator>
    <language>zh-cn</language>
    <lastBuildDate>Wed, 29 Apr 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://nanless.github.io/audio-paper-digest-blog/tags/%E5%A4%9A%E8%AF%AD%E8%A8%80/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>A Generative-First Neural Audio Autoencoder</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-generative-first-neural-audio-autoencoder/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-generative-first-neural-audio-autoencoder/</guid>
      <description>Music Generation | 8.5/10</description>
    </item>
    <item>
      <title>A Text-To-Text Alignment Algorithm for Better Evaluation of Modern Speech Recognition Systems</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-text-to-text-alignment-algorithm-for-better/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-text-to-text-alignment-algorithm-for-better/</guid>
      <description>Model Evaluation | 7.5/10</description>
    </item>
    <item>
      <title>AccLID: Accent-aware Language Identification for Robust Multilingual Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-acclid-accent-aware-language-identification-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-acclid-accent-aware-language-identification-for/</guid>
      <description>Speech Recognition | 7.0/10</description>
    </item>
    <item>
      <title>Advancing Speech Summarization in Multi-Modal LLMs with Reinforcement Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-advancing-speech-summarization-in-multi-modal/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-advancing-speech-summarization-in-multi-modal/</guid>
      <description>Audio Question Answering | 7.0/10</description>
    </item>
    <item>
      <title>Ara-BEST-RQ: Multi-Dialectal Arabic SSL</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ara-best-rq-multi-dialectal-arabic-ssl/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ara-best-rq-multi-dialectal-arabic-ssl/</guid>
      <description>Speech Recognition | 6.5/10</description>
    </item>
    <item>
      <title>ARCHI-TTS: A Flow-Matching-Based Text-to-Speech Model with Self-Supervised Semantic Aligner and Accelerated Inference</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-archi-tts-a-flow-matching-based-text-to-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-archi-tts-a-flow-matching-based-text-to-speech/</guid>
      <description>Speech Synthesis | 8.0/10</description>
    </item>
    <item>
      <title>B-GRPO: Unsupervised Speech Emotion Recognition Based on Batched-Group Relative Policy Optimization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-b-grpo-unsupervised-speech-emotion-recognition/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-b-grpo-unsupervised-speech-emotion-recognition/</guid>
      <description>Speech Emotion Recognition | 6.5/10</description>
    </item>
    <item>
      <title>Bayesian Low-Rank Factorization for Robust Model Adaptation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-bayesian-low-rank-factorization-for-robust-model/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-bayesian-low-rank-factorization-for-robust-model/</guid>
      <description>Speech Recognition | 8.0/10</description>
    </item>
    <item>
      <title>BBPE16: UTF-16-Based Byte-Level Byte-Pair Encoding for Improved Multilingual Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-bbpe16-utf-16-based-byte-level-byte-pair-encoding/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-bbpe16-utf-16-based-byte-level-byte-pair-encoding/</guid>
      <description>Speech Recognition | 7.0/10</description>
    </item>
    <item>
      <title>Benchmarking Humans And Machines On Complex Multilingual Speech Understanding Tasks</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-benchmarking-humans-and-machines-on-complex/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-benchmarking-humans-and-machines-on-complex/</guid>
      <description>Audio Question Answering | 7.5/10</description>
    </item>
    <item>
      <title>CALM: Joint Contextual Acoustic-Linguistic Modeling for Personalization of Multi-Speaker ASR</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-calm-joint-contextual-acoustic-linguistic/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-calm-joint-contextual-acoustic-linguistic/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Cross-Cultural Bias in Mel-Scale Representations: Evidence and Alternatives from Speech and Music</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cross-cultural-bias-in-mel-scale-representations/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cross-cultural-bias-in-mel-scale-representations/</guid>
      <description>Speech Recognition | 7.0/10</description>
    </item>
    <item>
      <title>Cross-Lingual Alzheimer’s Disease Detection with Multimodal LLMs via Speech Cue-Augmented Prompting and Instruction Tuning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cross-lingual-alzheimers-disease-detection-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cross-lingual-alzheimers-disease-detection-with/</guid>
      <description>Speech Biomarkers | 6.5/10</description>
    </item>
    <item>
      <title>Cross-Lingual F5-TTS: Towards Language-Agnostic Voice Cloning and Speech Synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cross-lingual-f5-tts-towards-language-agnostic/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cross-lingual-f5-tts-towards-language-agnostic/</guid>
      <description>Voice Cloning | 7.5/10</description>
    </item>
    <item>
      <title>Cross-Lingual Interleaving for Speech Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cross-lingual-interleaving-for-speech-language/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cross-lingual-interleaving-for-speech-language/</guid>
      <description>Large Speech Models | 7.5/10</description>
    </item>
    <item>
      <title>Decoder-Only Conformer with Modality-Aware Sparse Mixtures of Experts for ASR</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-decoder-only-conformer-with-modality-aware-sparse/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-decoder-only-conformer-with-modality-aware-sparse/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Detecting and Attributing Synthetic Spanish Speech: The HISPASpoof Dataset</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-detecting-and-attributing-synthetic-spanish/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-detecting-and-attributing-synthetic-spanish/</guid>
      <description>Speech Spoofing Detection | 7.5/10</description>
    </item>
    <item>
      <title>Direct Simultaneous Translation Activation for Large Audio-Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-direct-simultaneous-translation-activation-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-direct-simultaneous-translation-activation-for/</guid>
      <description>Speech Translation | 6.0/10</description>
    </item>
    <item>
      <title>Direct Transfer of Prosody in Speech-to-speech Translation using Disentangled Speech Tokens</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-direct-transfer-of-prosody-in-speech-to-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-direct-transfer-of-prosody-in-speech-to-speech/</guid>
      <description>Speech Translation | 7.5/10</description>
    </item>
    <item>
      <title>Exploring Fine-Tuning Of Large Audio Language Models For Spoken Language Understanding Under Limited Speech Data</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-exploring-fine-tuning-of-large-audio-language/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-exploring-fine-tuning-of-large-audio-language/</guid>
      <description>Speech Understanding | 8.0/10</description>
    </item>
    <item>
      <title>Exploring SSL Discrete Tokens for Multilingual Automatic Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-exploring-ssl-discrete-tokens-for-multilingual/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-exploring-ssl-discrete-tokens-for-multilingual/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>GLAP: General Contrastive Audio-Text Pretraining Across Domains and Languages</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-glap-general-contrastive-audio-text-pretraining/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-glap-general-contrastive-audio-text-pretraining/</guid>
      <description>Audio Retrieval | 8.5/10</description>
    </item>
    <item>
      <title>Group Relative Policy Optimization for Text-to-Speech with Large Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-group-relative-policy-optimization-for-text-to/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-group-relative-policy-optimization-for-text-to/</guid>
      <description>Speech Synthesis | 8.0/10</description>
    </item>
    <item>
      <title>How Far Do SSL Speech Models Listen for Tone? Temporal Focus of Tone Representation under Low-Resource Transfer</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-how-far-do-ssl-speech-models-listen-for-tone/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-how-far-do-ssl-speech-models-listen-for-tone/</guid>
      <description>Speech Recognition | 6.5/10</description>
    </item>
    <item>
      <title>Human-1 by Josh Talks: A Full-Duplex Conversational Modeling Framework in Hindi using Real-World Conversations</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-human-1-by-josh-talks-a-full-duplex/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-human-1-by-josh-talks-a-full-duplex/</guid>
      <description>Spoken Dialogue Systems | 7.5/10</description>
    </item>
    <item>
      <title>Improving Contextual ASR via Multi-Grained Fusion with Large Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-improving-contextual-asr-via-multi-grained-fusion/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-improving-contextual-asr-via-multi-grained-fusion/</guid>
      <description>Speech Recognition | 8.5/10</description>
    </item>
    <item>
      <title>Influence of Clean Speech Characteristics on Speech Enhancement Performance</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-influence-of-clean-speech-characteristics-on/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-influence-of-clean-speech-characteristics-on/</guid>
      <description>Speech Enhancement | 8.0/10</description>
    </item>
    <item>
      <title>Korean aegyo speech shows systematic F1 increase to signal childlike qualities</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-korean-aegyo-speech-shows-systematic-f1-increase/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-korean-aegyo-speech-shows-systematic-f1-increase/</guid>
      <description>Speech Emotion Recognition | 6.0/10</description>
    </item>
    <item>
      <title>Language-Infused Retrieval-Augmented CTC with Adaptive Soft-Hard Gating for Robust Code-Switching ASR</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-language-infused-retrieval-augmented-ctc-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-language-infused-retrieval-augmented-ctc-with/</guid>
      <description>Speech Recognition | 8.0/10</description>
    </item>
    <item>
      <title>LESS: Large Language Model Enhanced Semi-Supervised Learning for Speech Foundational Models Using in-the-wild Data</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-less-large-language-model-enhanced-semi/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-less-large-language-model-enhanced-semi/</guid>
      <description>Speech Recognition, Speech Translation | 7.5/10</description>
    </item>
    <item>
      <title>Leveraging Audio-Visual Data to Reduce the Multilingual Gap in Self-Supervised Speech Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-leveraging-audio-visual-data-to-reduce-the/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-leveraging-audio-visual-data-to-reduce-the/</guid>
      <description>Speech Recognition | 6.0/10</description>
    </item>
    <item>
      <title>Leveraging Whisper Embeddings For Audio-Based Lyrics Matching</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-leveraging-whisper-embeddings-for-audio-based/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-leveraging-whisper-embeddings-for-audio-based/</guid>
      <description>Music Information Retrieval | 7.0/10</description>
    </item>
    <item>
      <title>LongSpeech: A Scalable Benchmark for Transcription, Translation and Understanding in Long Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-longspeech-a-scalable-benchmark-for-transcription/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-longspeech-a-scalable-benchmark-for-transcription/</guid>
      <description>Benchmarking | 7.8/10</description>
    </item>
    <item>
      <title>Low-Resource Speech-Based Early Alzheimer’s Detection via Cross-Lingual and Few-Shot Transfer Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-low-resource-speech-based-early-alzheimers/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-low-resource-speech-based-early-alzheimers/</guid>
      <description>Speech Biomarkers | 7.5/10</description>
    </item>
    <item>
      <title>Mixtures of Lightweight Articulatory Experts for Multilingual ASR</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mixtures-of-lightweight-articulatory-experts-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mixtures-of-lightweight-articulatory-experts-for/</guid>
      <description>Speech Recognition | 7.0/10</description>
    </item>
    <item>
      <title>MTP-S2UT: Enhancing Speech-to-Speech Translation Quality with Multi-Token Prediction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mtp-s2ut-enhancing-speech-to-speech-translation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mtp-s2ut-enhancing-speech-to-speech-translation/</guid>
      <description>Speech Translation | 8.5/10</description>
    </item>
    <item>
      <title>Multilingual Supervised Pretraining with LM-Assisted Decoding for Visual Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multilingual-supervised-pretraining-with-lm/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multilingual-supervised-pretraining-with-lm/</guid>
      <description>Speech Recognition | 6.5/10</description>
    </item>
    <item>
      <title>Natural Language to Spatial Audio Parameters: Lightweight Deterministic Rendering for Creative Authoring</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-natural-language-to-spatial-audio-parameters/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-natural-language-to-spatial-audio-parameters/</guid>
      <description>Spatial Audio | 7.5/10</description>
    </item>
    <item>
      <title>NCF-TTS: Enhancing Flow Matching Based Text-To-Speech with Neighborhood Consistency Flow</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ncf-tts-enhancing-flow-matching-based-text-to/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ncf-tts-enhancing-flow-matching-based-text-to/</guid>
      <description>Speech Synthesis | 8.0/10</description>
    </item>
    <item>
      <title>nGPT as a Scalable Architecture for Speech Recognition and Translation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ngpt-as-a-scalable-architecture-for-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ngpt-as-a-scalable-architecture-for-speech/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>PAC: Pronunciation-Aware Contextualized Large Language Model-Based Automatic Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-pac-pronunciation-aware-contextualized-large/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-pac-pronunciation-aware-contextualized-large/</guid>
      <description>Speech Recognition | 7.0/10</description>
    </item>
    <item>
      <title>PFluxTTS: Hybrid Flow-Matching TTS with Robust Cross-Lingual Voice Cloning and Inference-Time Model Fusion</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-pfluxtts-hybrid-flow-matching-tts-with-robust/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-pfluxtts-hybrid-flow-matching-tts-with-robust/</guid>
      <description>Speech Synthesis | 7.0/10</description>
    </item>
    <item>
      <title>Phrased: Phrase Dictionary Biasing for Speech Translation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-phrased-phrase-dictionary-biasing-for-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-phrased-phrase-dictionary-biasing-for-speech/</guid>
      <description>Speech Translation | 7.5/10</description>
    </item>
    <item>
      <title>Praxy Voice: Voice-Prompt Recovery &#43; BUPS for Commercial-Class Indic TTS from a Frozen Non-Indic Base at Zero Commercial-Training-Data Cost</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-praxy-voice-voice-prompt-recovery-bups-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-praxy-voice-voice-prompt-recovery-bups-for/</guid>
      <description>Speech Synthesis | 8.0/10</description>
    </item>
    <item>
      <title>PROST-LLM: Progressively Enhancing the Speech-to-Speech Translation Capability in LLMs</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-prost-llm-progressively-enhancing-the-speech-to/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-prost-llm-progressively-enhancing-the-speech-to/</guid>
      <description>Speech Translation | 7.5/10</description>
    </item>
    <item>
      <title>PSP: An Interpretable Per-Dimension Accent Benchmark for Indic Text-to-Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-psp-an-interpretable-per-dimension-accent/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-psp-an-interpretable-per-dimension-accent/</guid>
      <description>Benchmarking | 7.5/10</description>
    </item>
    <item>
      <title>Revisiting Direct Speech-to-Text Translation with Speech LLMs: Better Scaling than CoT Prompting?</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-revisiting-direct-speech-to-text-translation-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-revisiting-direct-speech-to-text-translation-with/</guid>
      <description>Speech Translation | 7.5/10</description>
    </item>
    <item>
      <title>SA-SSL-MOS: Self-Supervised Learning MOS Prediction with Spectral Augmentation for Generalized Multi-Rate Speech Assessment</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sa-ssl-mos-self-supervised-learning-mos/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sa-ssl-mos-self-supervised-learning-mos/</guid>
      <description>Speech Quality Assessment | 7.0/10</description>
    </item>
    <item>
      <title>SEP-ST: Incorporating Speech Entity Prompt Into Large Language Models for Speech Translation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sep-st-incorporating-speech-entity-prompt-into/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sep-st-incorporating-speech-entity-prompt-into/</guid>
      <description>Speech Translation | 7.5/10</description>
    </item>
    <item>
      <title>Sidon: Fast and Robust Open-Source Multilingual Speech Restoration for Large-Scale Dataset Cleansing</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sidon-fast-and-robust-open-source-multilingual/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sidon-fast-and-robust-open-source-multilingual/</guid>
      <description>Speech Enhancement | 8.5/10</description>
    </item>
    <item>
      <title>StyleBench: Evaluating Speech Language Models on Conversational Speaking Style Control</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-stylebench-evaluating-speech-language-models-on/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-stylebench-evaluating-speech-language-models-on/</guid>
      <description>Benchmarking | 8.5/10</description>
    </item>
    <item>
      <title>SyncSpeech: Efficient and Low-Latency Text-to-Speech Based on Temporal Masked Transformer</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-syncspeech-efficient-and-low-latency-text-to/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-syncspeech-efficient-and-low-latency-text-to/</guid>
      <description>Speech Synthesis | 7.5/10</description>
    </item>
    <item>
      <title>TICL: Text-Embedding KNN for Speech in-Context Learning Unlocks Speech Recognition Abilities of Large Multimodal Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ticl-text-embedding-knn-for-speech-in-context/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ticl-text-embedding-knn-for-speech-in-context/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Towards Fair ASR for Second Language Speakers using Fairness Prompted Finetuning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-fair-asr-for-second-language-speakers/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-fair-asr-for-second-language-speakers/</guid>
      <description>Speech Recognition | 6.5/10</description>
    </item>
    <item>
      <title>Towards Orthographically-Informed Evaluation of Speech Recognition Systems for Indian Languages</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-orthographically-informed-evaluation-of/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-orthographically-informed-evaluation-of/</guid>
      <description>Speech Recognition | 7.0/10</description>
    </item>
    <item>
      <title>TTA: Transcribe, Translate and Alignment for Cross-Lingual Speech Representation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tta-transcribe-translate-and-alignment-for-cross/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tta-transcribe-translate-and-alignment-for-cross/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>UMA-SPLIT: Unimodal Aggregation for Both English and Mandarin Non-Autoregressive Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-uma-split-unimodal-aggregation-for-both-english/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-uma-split-unimodal-aggregation-for-both-english/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Variational Low-Rank Adaptation for Personalized Impaired Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-variational-low-rank-adaptation-for-personalized/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-variational-low-rank-adaptation-for-personalized/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>VividTalker: A Modular Framework for Expressive 3D Talking Avatars with Controllable Gaze and Blink</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-vividtalker-a-modular-framework-for-expressive-3d/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-vividtalker-a-modular-framework-for-expressive-3d/</guid>
      <description>Speech Synthesis | 7.5/10</description>
    </item>
    <item>
      <title>Windowed SummaryMixing: An Efficient Fine-Tuning of Self-Supervised Learning Models for Low-Resource Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-windowed-summarymixing-an-efficient-fine-tuning/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-windowed-summarymixing-an-efficient-fine-tuning/</guid>
      <description>Speech Recognition | 6.5/10</description>
    </item>
    <item>
      <title>Scaling Properties of Continuous Diffusion Spoken Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-scaling-properties-of-continuous-diffusion-spoken/</link>
      <pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-scaling-properties-of-continuous-diffusion-spoken/</guid>
      <description>Speech Generation | 8.0/10</description>
    </item>
    <item>
      <title>DM-ASR: Diarization-aware Multi-speaker ASR with Large Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-dm-asr-diarization-aware-multi-speaker-asr-with/</link>
      <pubDate>Mon, 27 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-dm-asr-diarization-aware-multi-speaker-asr-with/</guid>
      <description>Speaker Recognition | 8.0/10</description>
    </item>
    <item>
      <title>Speech/Audio Paper Digest 2026-04-27</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27/</link>
      <pubDate>Mon, 27 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27/</guid>
      <description>13 speech/AI papers analyzed in total</description>
    </item>
    <item>
      <title>&#34;This Wasn&#39;t Made for Me&#34;: Recentering User Experience and Emotional Impact in the Evaluation of ASR Bias</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-this-wasnt-made-for-me-recentering-user/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-this-wasnt-made-for-me-recentering-user/</guid>
      <description>Speech Recognition | 7.0/10</description>
    </item>
    <item>
      <title>Hierarchical Policy Optimization for Simultaneous Translation of Unbounded Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-hierarchical-policy-optimization-for-simultaneous/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-hierarchical-policy-optimization-for-simultaneous/</guid>
      <description>Speech Translation | 7.5/10</description>
    </item>
    <item>
      <title>Misinformation Span Detection in Videos via Audio Transcripts</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-misinformation-span-detection-in-videos-via-audio/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-misinformation-span-detection-in-videos-via-audio/</guid>
      <description>Audio Safety | 7.5/10</description>
    </item>
    <item>
      <title>Preferences of a Voice-First Nation: Large-Scale Pairwise Evaluation and Preference Analysis for TTS in Indian Languages</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-preferences-of-a-voice-first-nation-large-scale/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-preferences-of-a-voice-first-nation-large-scale/</guid>
      <description>Speech Synthesis | 7.5/10</description>
    </item>
    <item>
      <title>Prosody as Supervision: Bridging the Non-Verbal–Verbal for Multilingual Speech Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-prosody-as-supervision-bridging-the-non-verbal/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-prosody-as-supervision-bridging-the-non-verbal/</guid>
      <description>Speech Emotion Recognition | 8.0/10</description>
    </item>
    <item>
      <title>Speech/Audio Paper Digest 2026-04-24</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24/</guid>
      <description>21 speech/AI papers analyzed in total</description>
    </item>
    <item>
      <title>Aligning Stuttered-Speech Research with End-User Needs: Scoping Review, Survey, and Guidelines</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-aligning-stuttered-speech-research-with-end-user/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-aligning-stuttered-speech-research-with-end-user/</guid>
      <description>1.  **Problem**: Current stuttered-speech technology research is systematically disconnected from the actual needs of people who stutter (PWS) and speech-language pathologists (SLPs); research priorities, task definitions, and evaluation methods are not sufficiently user-centered. 2.  **Core method**: a two-part combined analysis: 1) a scoping review of 228 related papers, proposing a taxonomy of research tasks and analyzing the state of the field; 2) a survey of 70 stakeholders…</description>
    </item>
    <item>
      <title>FLiP: Towards understanding and interpreting multimodal multilingual sentence embeddings</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-flip-towards-understanding-and-interpreting/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-flip-towards-understanding-and-interpreting/</guid>
      <description>This paper addresses the interpretability of multilingual, multimodal sentence embeddings (e.g., SONAR, LaBSE). The core method proposes a model called Factorized Linear Projection (FLiP), which extracts keywords by linearly projecting an embedding vector into the vocabulary space, as a proxy task for understanding what the embedding encodes. Compared with previous non-factorized linear probing methods (e.g., LiP) and SpLiCE, in keyword extraction FLiP…</description>
    </item>
    <item>
      <title>Indic-CodecFake meets SATYAM: Towards Detecting Neural Audio Codec Synthesized Speech Deepfakes in Indic Languages</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-indic-codecfake-meets-satyam-towards-detecting/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-indic-codecfake-meets-satyam-towards-detecting/</guid>
      <description>1.  **Problem**: Existing research on detecting speech deepfakes synthesized with neural audio codecs (CodecFake) focuses mainly on English and Chinese; the linguistically highly diverse Indic languages lack large-scale benchmark datasets and effective detection methods. 2.  **Method**: The authors build the first large-scale Indic-language CodecFake dataset (ICF) and propose a method named SATYAM…</description>
    </item>
    <item>
      <title>Qwen3.5-Omni Technical Report</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-qwen35-omni-technical-report/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-qwen35-omni-technical-report/</guid>
      <description>This paper introduces Qwen3.5-Omni, an omni-modal large language model that accepts text, image, audio, and audio-video inputs. To address the shortcomings of existing models in real-time interaction, cross-modal reasoning, and tool use, its core approach adopts a "Thinker-Talker" architecture and introduces a Mixture-of-Experts (MoE) design for efficiency. Compared with its predecessor, the main innovations are: 1) scaling the model to hundreds of billions…</description>
    </item>
    <item>
      <title>SpeechParaling-Bench: A Comprehensive Benchmark for Paralinguistic-Aware Speech Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-speechparaling-bench-a-comprehensive-benchmark/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-speechparaling-bench-a-comprehensive-benchmark/</guid>
      <description>1.  **Problem**: Existing evaluations of large audio-language models' ability to generate and understand paralinguistics (e.g., emotion, tone, timbre) suffer from incomplete feature coverage and from subjective, non-scalable evaluation methods. 2.  **Method**: Proposes SpeechParaling-Bench, a comprehensive benchmark of over 1,000 Chinese-English parallel speech queries covering more than 100 fine-grained paralinguistic features. The benchmark…</description>
    </item>
    <item>
      <title>Tadabur: A Large-Scale Quran Audio Dataset</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-tadabur-a-large-scale-quran-audio-dataset/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-tadabur-a-large-scale-quran-audio-dataset/</guid>
      <description>1. **Problem**: Existing Quran speech datasets fall seriously short in scale, reciter diversity, audio quality, and annotation depth, limiting research progress on tasks such as Quran ASR and reciter identification. 2. **Core method**: Proposes the Tadabur dataset and its construction pipeline. The pipeline's core is the "Ayah Alignment Module" (AAM), which combines WhisperX for initial transcription and then uses…</description>
    </item>
    <item>
      <title>Utterance-Level Methods for Identifying Reliable ASR-Output for Child Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-utterance-level-methods-for-identifying-reliable/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-utterance-level-methods-for-identifying-reliable/</guid>
      <description>1.  **Problem addressed**: Automatic speech recognition (ASR) of child speech has high error rates, hurting applications such as language learning and reading assistance. Traditional confidence estimation methods can fail on noisy, highly variable child speech. A post-transcription (utterance-level) method is needed to automatically identify which ASR outputs are reliable, reducing the manual review burden. 2.  **Core method**: Proposes two…</description>
    </item>
    <item>
      <title>NVBench: A Benchmark for Speech Synthesis with Non-Verbal Vocalizations</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-nvbench-a-benchmark-for-speech-synthesis-with-non/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-nvbench-a-benchmark-for-speech-synthesis-with-non/</guid>
      <description>This paper tackles a key but overlooked problem in text-to-speech (TTS): how to evaluate, in a standardized way, a system's ability to generate non-verbal vocalizations (NVVs, such as laughter and sighs). The authors propose **NVBench**, a bilingual (English/Chinese) benchmark built on a **unified taxonomy of 45 NVV classes**. Its core method includes: 1) constructing a high-quality, balanced evaluation dataset of 50 examples per class, 4,500 in total…</description>
    </item>
    <item>
      <title>Tadabur: A Large-Scale Quran Audio Dataset</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-tadabur-a-large-scale-quran-audio-dataset/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-tadabur-a-large-scale-quran-audio-dataset/</guid>
      <description>This paper addresses the lack of large-scale, diverse, finely annotated datasets for Quran speech research. The authors propose the **Tadabur** dataset and its automated construction pipeline. The pipeline first collects audio from public platforms and uses a large language model (Gemini) to extract standardized metadata (e.g., surah, reciter) from unstructured text. The core step is the **Ayah Alignment Module…</description>
    </item>
    <item>
      <title>Voice of India: A Large-Scale Benchmark for Real-World Speech Recognition in India</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-voice-of-india-a-large-scale-benchmark-for-real/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-voice-of-india-a-large-scale-benchmark-for-real/</guid>
      <description>This paper addresses the core problems that existing Indic ASR benchmarks do not reflect real-world scenarios and that their evaluation methods are unfair. The authors build the large-scale "Voice of India" benchmark, sourced from unscripted phone conversations of 36,000 speakers, covering 15 major Indian languages and 139 regional clusters, 536 hours in total. The key innovation is adopting a spelling-variant-aware…</description>
    </item>
    <item>
      <title>Speech/Audio Paper Digest 2026-04-22</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22/</guid>
      <description>21 speech/AI papers analyzed in total</description>
    </item>
    <item>
      <title>BhashaSutra: A Task-Centric Unified Survey of Indian NLP Datasets, Corpora, and Resources</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-bhashasutra-a-task-centric-unified-survey-of/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-bhashasutra-a-task-centric-unified-survey-of/</guid>
      <description>This paper addresses the pain point that Indian-language NLP research resources are scattered and lack a unified overview. The authors are the first to propose a task-centric unified taxonomy, systematically organizing and consolidating over 200 datasets, 50 benchmarks, and more than 100 models, tools, and systems, covering everything from core language processing (e.g., tokenization, POS tagging) to text classification, generation and translation, information retrieval, speech and multimodality, and even socio-cultural tasks (…</description>
    </item>
    <item>
      <title>FLiP: Towards understanding and interpreting multimodal multilingual sentence embeddings</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-flip-towards-understanding-and-interpreting/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-flip-towards-understanding-and-interpreting/</guid>
      <description>This paper proposes **FLiP**, a **factorized linear projection model** that aims to **understand and interpret** multilingual, multimodal sentence-embedding spaces (e.g., SONAR, LaBSE, Gemini). The core idea is to recast the interpretation of an embedding space as a **linear keyword-extraction task**: a simple linear projection recovers, from a sentence-embedding vector, the words that make up the sentence. Experiments show that a well-trained…</description>
    </item>
    <item>
      <title>MimicLM: Zero-Shot Voice Imitation through Autoregressive Modeling of Pseudo-Parallel Speech Corpora</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-mimiclm-zero-shot-voice-imitation-through/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-mimiclm-zero-shot-voice-imitation-through/</guid>
      <description>This paper targets the core bottleneck of scarce high-quality parallel training data for zero-shot voice imitation. Traditional methods either rely on complex disentanglement architectures or use synthetic speech as the training target, capping output quality at the synthesis system's ability. The authors propose a new framework named **MimicLM**, whose core innovation is a **"role-swapping" data construction strategy**: speech generated by TTS is used as the **training…</description>
    </item>
    <item>
      <title>MINT-Bench: A Comprehensive Multilingual Benchmark for Instruction-Following Text-to-Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-mint-bench-a-comprehensive-multilingual-benchmark/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-mint-bench-a-comprehensive-multilingual-benchmark/</guid>
      <description>This paper addresses the lack of systematic evaluation tools for instruction-following text-to-speech (TTS). Current evaluations suffer from incomplete coverage, coarse diagnostic granularity, and weak multilingual support. The authors propose **MINT-Bench**, a comprehensive multilingual benchmark. Its core method includes: 1) a **hierarchical multi-axis taxonomy** built on 10 atomic acoustic attributes, systematically organizing instructions from simple to complex (…</description>
    </item>
    <item>
      <title>Prosody as Supervision: Bridging the Non-Verbal–Verbal for Multilingual Speech Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-prosody-as-supervision-bridging-the-non-verbal/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-prosody-as-supervision-bridging-the-non-verbal/</guid>
      <description>This paper targets the core bottleneck of scarce labeled data in low-resource multilingual speech emotion recognition (SER). The authors propose a disruptive paradigm: **redefining SER as an unsupervised "non-verbal to verbal" transfer problem**. The core hypothesis is that the prosodic emotional cues carried in non-verbal vocalizations (e.g., laughing, crying) are purer and more cross-lingual than those in speech, and can therefore serve as a better source of supervision. To this end, the authors design **NOVA-…</description>
    </item>
    <item>
      <title>VoxSafeBench: Not Just What Is Said, but Who, How, and Where</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-voxsafebench-not-just-what-is-said-but-who-how/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-voxsafebench-not-just-what-is-said-but-who-how/</guid>
      <description>This paper addresses the incomplete and shallow evaluation of social alignment in current speech language models (SLMs). Existing benchmarks either focus only on basic audio understanding or study a single risk in isolation, and cannot distinguish whether a model fails because it "does not understand" or because it "applies understanding in the wrong place." To this end, the authors propose **VoxSafeBench**, the first benchmark to jointly evaluate SLMs across the three social-alignment dimensions of **safety, fairness, and privacy**…</description>
    </item>
    <item>
      <title>Where Do Self-Supervised Speech Models Become Unfair?</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-where-do-self-supervised-speech-models-become/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-where-do-self-supervised-speech-models-become/</guid>
      <description>This paper investigates at which layer unfairness arises in self-supervised speech models (S3Ms). The team uses a lightweight linear-probing approach: on the embeddings of every layer of several S3Ms (e.g., WavLM, Wav2Vec2, BEST-RQ, Whisper), they simultaneously evaluate overall performance on speaker identification (SID) and automatic speech recognition (ASR), as well as performance across speaker groups…</description>
    </item>
    <item>
      <title>HARNESS: Lightweight Distilled Arabic Speech Foundation Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-harness-lightweight-distilled-arabic-speech/</link>
      <pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-harness-lightweight-distilled-arabic-speech/</guid>
      <description>Targeting the weak performance of general multilingual/English models on Arabic speech recognition, dialect identification, and emotion recognition, and the difficulty of deploying large models, this paper proposes HArnESS, an Arabic-centric family of self-supervised speech models. The authors adopt a HuBERT-style iterative self-distillation framework, first training a 24-layer teacher model H…</description>
    </item>
    <item>
      <title>Interactive ASR: Towards Human-Like Interaction and Semantic Coherence Evaluation for Agentic Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-interactive-asr-towards-human-like-interaction/</link>
      <pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-interactive-asr-towards-human-like-interaction/</guid>
      <description>Targeting two blind spots of traditional ASR (the insensitivity of WER to semantic errors, and the inability of systems to correct errors through natural interaction), this paper proposes the Interactive ASR framework. First, the authors introduce S²ER (Sentence-level Semantic Error Rate), which uses an LLM-as-a-Judge binary judgment of whether the recognized result and the reference text…</description>
    </item>
    <item>
      <title>MUSCAT: MUltilingual, SCientific ConversATion Benchmark</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-muscat-multilingual-scientific-conversation/</link>
      <pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-muscat-multilingual-scientific-conversation/</guid>
      <description>This paper presents MUSCAT, a new benchmark for evaluating automatic speech recognition (ASR) performance in multilingual scientific conversation settings. The dataset contains 6 bilingual conversation recordings (about 65 minutes and 9,066 words in total), pairing English with German, Turkish, Chinese, and Vietnamese; each conversation was recorded with a Meeting Owl 3, a ReSpeaker USB microphone array, and a Me…</description>
    </item>
    <item>
      <title>PS-TTS: Phonetic Synchronization in Text-to-Speech for Achieving Natural Automated Dubbing</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-ps-tts-phonetic-synchronization-in-text-to-speech/</link>
      <pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-ps-tts-phonetic-synchronization-in-text-to-speech/</guid>
      <description>This paper addresses the difficulty of synchronizing target speech with source speech in duration and lip movement for automatic dubbing (AD). Its core contribution is a two-stage text-rewriting method integrated into a TTS system: first, a language model performs **isochrony**-aware rewriting to ensure the target speech duration matches the source; second, it introduces **phonetic synchronization (PS)**, which uses dynamic time warping (DTW) and vowel distances learned from the training data…</description>
    </item>
    <item>
      <title>UniPASE: A Generative Model for Universal Speech Enhancement with High Fidelity and Low Hallucinations</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-unipase-a-generative-model-for-universal-speech/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-unipase-a-generative-model-for-universal-speech/</guid>
      <description>This paper addresses the core tension in universal speech enhancement (USE) between generative models' "high perceptual quality" and "low content hallucination," which are hard to achieve together. The authors propose the UniPASE framework, which extends their earlier low-hallucination PASE model to handle distortions including noise, reverberation, packet loss, wind…</description>
    </item>
  </channel>
</rss>
