<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Speech Recognition on 语音/音频论文速递</title>
    <link>https://nanless.github.io/audio-paper-digest-blog/tags/%E8%AF%AD%E9%9F%B3%E8%AF%86%E5%88%AB/</link>
    <description>Recent content in Speech Recognition on 语音/音频论文速递</description>
    <generator>Hugo</generator>
    <language>zh-cn</language>
    <lastBuildDate>Wed, 29 Apr 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://nanless.github.io/audio-paper-digest-blog/tags/%E8%AF%AD%E9%9F%B3%E8%AF%86%E5%88%AB/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>A Dataset of Robot-Patient and Doctor-Patient Medical Dialogues for Spoken Language Processing Tasks</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-dataset-of-robot-patient-and-doctor-patient/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-dataset-of-robot-patient-and-doctor-patient/</guid>
      <description>Spoken Dialogue Systems | 7.5/10</description>
    </item>
    <item>
      <title>A Personalized Real-Time Proactive Voice Memory Assistant</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-personalized-real-time-proactive-voice-memory/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-personalized-real-time-proactive-voice-memory/</guid>
      <description>Real-Time Processing | 7.0/10</description>
    </item>
    <item>
      <title>A Study of Data Selection Strategies for Pre-Training Self-Supervised Speech Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-study-of-data-selection-strategies-for-pre/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-study-of-data-selection-strategies-for-pre/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>A Text-To-Text Alignment Algorithm for Better Evaluation of Modern Speech Recognition Systems</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-text-to-text-alignment-algorithm-for-better/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-text-to-text-alignment-algorithm-for-better/</guid>
      <description>Model Evaluation | 7.5/10</description>
    </item>
    <item>
      <title>AccLID: Accent-aware Language Identification for Robust Multilingual Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-acclid-accent-aware-language-identification-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-acclid-accent-aware-language-identification-for/</guid>
      <description>Speech Recognition | 7.0/10</description>
    </item>
    <item>
      <title>Adapting Diarization-Conditioned Whisper for End-to-End Multi-Talker Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-adapting-diarization-conditioned-whisper-for-end/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-adapting-diarization-conditioned-whisper-for-end/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Advanced modeling of interlanguage speech intelligibility benefit with L1-L2 multi-task learning using differentiable K-means for accent-robust discrete token-based ASR</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-advanced-modeling-of-interlanguage-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-advanced-modeling-of-interlanguage-speech/</guid>
      <description>Speech Recognition | 7.0/10</description>
    </item>
    <item>
      <title>Advancing LLM-Based Multi-Channel Multi-Speaker Speech Recognition with Global Cross-Channel Attention and Sentence-Ordered First-In First-Out Serialized Output Training</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-advancing-llm-based-multi-channel-multi-speaker/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-advancing-llm-based-multi-channel-multi-speaker/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Advancing Semi-Supervised Child Speech Recognition with Omni-Temporal Classification under Label Noise</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-advancing-semi-supervised-child-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-advancing-semi-supervised-child-speech/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Adversarial Fine-Tuning on Speech Foundation Model with Vulnerable Attention Consistency Regularization for Robust Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-adversarial-fine-tuning-on-speech-foundation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-adversarial-fine-tuning-on-speech-foundation/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>AISHELL6-Whisper: A Chinese Mandarin Audio-Visual Whisper Speech Dataset with Speech Recognition Baselines</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-aishell6-whisper-a-chinese-mandarin-audio-visual/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-aishell6-whisper-a-chinese-mandarin-audio-visual/</guid>
      <description>Speech Recognition | 8.3/10</description>
    </item>
    <item>
      <title>An End-to-End Multimodal System for Subtitle Recognition and Chinese-Japanese Translation in Short Dramas</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-an-end-to-end-multimodal-system-for-subtitle/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-an-end-to-end-multimodal-system-for-subtitle/</guid>
      <description>Multimodal Models | 7.0/10</description>
    </item>
    <item>
      <title>Ara-BEST-RQ: Multi-Dialectal Arabic SSL</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ara-best-rq-multi-dialectal-arabic-ssl/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ara-best-rq-multi-dialectal-arabic-ssl/</guid>
      <description>Speech Recognition | 6.5/10</description>
    </item>
    <item>
      <title>Attention2Probability: Attention-Driven Terminology Probability Estimation for Robust Speech-to-text System</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-attention2probability-attention-driven/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-attention2probability-attention-driven/</guid>
      <description>Speech Recognition | 7.0/10</description>
    </item>
    <item>
      <title>Audio-Conditioned Diffusion LLMs for ASR and Deliberation Processing</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-audio-conditioned-diffusion-llms-for-asr-and/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-audio-conditioned-diffusion-llms-for-asr-and/</guid>
      <description>Speech Recognition | 7.0/10</description>
    </item>
    <item>
      <title>Bayesian Low-Rank Factorization for Robust Model Adaptation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-bayesian-low-rank-factorization-for-robust-model/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-bayesian-low-rank-factorization-for-robust-model/</guid>
      <description>Speech Recognition | 8.0/10</description>
    </item>
    <item>
      <title>BBPE16: UTF-16-Based Byte-Level Byte-Pair Encoding for Improved Multilingual Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-bbpe16-utf-16-based-byte-level-byte-pair-encoding/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-bbpe16-utf-16-based-byte-level-byte-pair-encoding/</guid>
      <description>Speech Recognition | 7.0/10</description>
    </item>
    <item>
      <title>BEST-RQ-based Self-Supervised Learning for Whisper Domain Adaptation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-best-rq-based-self-supervised-learning-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-best-rq-based-self-supervised-learning-for/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>BiRQ: Bi-Level Self-Labeling Random Quantization for Self-Supervised Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-birq-bi-level-self-labeling-random-quantization/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-birq-bi-level-self-labeling-random-quantization/</guid>
      <description>Speech Recognition | 8.0/10</description>
    </item>
    <item>
      <title>Bridging the Front-End and Back-End for Robust ASR via Cross-Attention-Based U-Net</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-bridging-the-front-end-and-back-end-for-robust/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-bridging-the-front-end-and-back-end-for-robust/</guid>
      <description>Speech Recognition | 7.0/10</description>
    </item>
    <item>
      <title>CALM: Joint Contextual Acoustic-Linguistic Modeling for Personalization of Multi-Speaker ASR</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-calm-joint-contextual-acoustic-linguistic/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-calm-joint-contextual-acoustic-linguistic/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Can Large Audio Language Models Understand Audio Well? Speech, Scene and Events Understanding Benchmark for LALMs</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-can-large-audio-language-models-understand-audio/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-can-large-audio-language-models-understand-audio/</guid>
      <description>Benchmarking | 7.0/10</description>
    </item>
    <item>
      <title>CCST: Cross-Modal and Consistency-Aware Self-Training for Source-Free Unsupervised Domain Adaptation in Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ccst-cross-modal-and-consistency-aware-self/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ccst-cross-modal-and-consistency-aware-self/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Chunk-Wise Attention Transducers for Fast and Accurate Streaming Speech-to-Text</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-chunk-wise-attention-transducers-for-fast-and/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-chunk-wise-attention-transducers-for-fast-and/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Chunkwise Aligners for Streaming Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-chunkwise-aligners-for-streaming-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-chunkwise-aligners-for-streaming-speech/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Confidence-Guided Error Correction for Disordered Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-confidence-guided-error-correction-for-disordered/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-confidence-guided-error-correction-for-disordered/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Content-Preserving Speech Representation Learning Via Adaptive Segment-Level Alignment</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-content-preserving-speech-representation-learning/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-content-preserving-speech-representation-learning/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Contextual Biasing for ASR in Speech LLM with Common Word Cues and Bias Word Position Prediction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-contextual-biasing-for-asr-in-speech-llm-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-contextual-biasing-for-asr-in-speech-llm-with/</guid>
      <description>Speech Recognition | 7.0/10</description>
    </item>
    <item>
      <title>Cross-Cultural Bias in Mel-Scale Representations: Evidence and Alternatives from Speech and Music</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cross-cultural-bias-in-mel-scale-representations/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cross-cultural-bias-in-mel-scale-representations/</guid>
      <description>Speech Recognition | 7.0/10</description>
    </item>
    <item>
      <title>Cross-Modal Bottleneck Fusion for Noise Robust Audio-Visual Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cross-modal-bottleneck-fusion-for-noise-robust/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cross-modal-bottleneck-fusion-for-noise-robust/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>CTC-DID: CTC-Based Arabic Dialect Identification for Streaming Applications</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ctc-did-ctc-based-arabic-dialect-identification/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ctc-did-ctc-based-arabic-dialect-identification/</guid>
      <description>Speech Recognition | 6.5/10</description>
    </item>
    <item>
      <title>Decoder-Only Conformer with Modality-Aware Sparse Mixtures of Experts for ASR</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-decoder-only-conformer-with-modality-aware-sparse/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-decoder-only-conformer-with-modality-aware-sparse/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Do we really need self-attention for streaming automatic speech recognition?</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-do-we-really-need-self-attention-for-streaming/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-do-we-really-need-self-attention-for-streaming/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Domain-Aware Scheduling for ASR Fine-Tuning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-domain-aware-scheduling-for-asr-fine-tuning/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-domain-aware-scheduling-for-asr-fine-tuning/</guid>
      <description>Speech Recognition | 6.5/10</description>
    </item>
    <item>
      <title>Emilia-NV: A Non-Verbal Speech Dataset with Word-Level Annotation for Human-Like Speech Modeling</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emilia-nv-a-non-verbal-speech-dataset-with-word/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emilia-nv-a-non-verbal-speech-dataset-with-word/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Equipping Large Language Model with Directional Speech Understanding Capabilities</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-equipping-large-language-model-with-directional/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-equipping-large-language-model-with-directional/</guid>
      <description>Speech Recognition, Speech Translation | 7.0/10</description>
    </item>
    <item>
      <title>Exploring SSL Discrete Tokens for Multilingual Automatic Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-exploring-ssl-discrete-tokens-for-multilingual/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-exploring-ssl-discrete-tokens-for-multilingual/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>FinHuBERT: Hierarchical Feature Imitating Networks for Low-Resource Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-finhubert-hierarchical-feature-imitating-networks/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-finhubert-hierarchical-feature-imitating-networks/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Flexi-LoRA with Input-Adaptive Ranks: Efficient Finetuning for Speech and Reasoning Tasks</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-flexi-lora-with-input-adaptive-ranks-efficient/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-flexi-lora-with-input-adaptive-ranks-efficient/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Frontend Token Enhancement for Token-Based Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-frontend-token-enhancement-for-token-based-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-frontend-token-enhancement-for-token-based-speech/</guid>
      <description>Speech Recognition | 8.0/10</description>
    </item>
    <item>
      <title>GLoRIA: Gated Low-Rank Interpretable Adaptation for Dialectal ASR</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-gloria-gated-low-rank-interpretable-adaptation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-gloria-gated-low-rank-interpretable-adaptation/</guid>
      <description>Speech Recognition | 8.0/10</description>
    </item>
    <item>
      <title>Grey-Box Prompt Tuning With Graph Alignment for Speech-Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-grey-box-prompt-tuning-with-graph-alignment-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-grey-box-prompt-tuning-with-graph-alignment-for/</guid>
      <description>Speech Recognition | 8.0/10</description>
    </item>
    <item>
      <title>How Far Do SSL Speech Models Listen for Tone? Temporal Focus of Tone Representation under Low-Resource Transfer</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-how-far-do-ssl-speech-models-listen-for-tone/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-how-far-do-ssl-speech-models-listen-for-tone/</guid>
      <description>Speech Recognition | 6.5/10</description>
    </item>
    <item>
      <title>ICASSP 2026 - Speech Recognition Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-078/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-078/</guid>
      <description>A total of 102 ICASSP 2026 papers on Speech Recognition</description>
    </item>
    <item>
      <title>Identifying the Minimal and Maximal Phonetic Subspace of Speech Representations</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-identifying-the-minimal-and-maximal-phonetic/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-identifying-the-minimal-and-maximal-phonetic/</guid>
      <description>Speech Recognition | 8.0/10</description>
    </item>
    <item>
      <title>Impact of Phonetics on Speaker Identity in Adversarial Voice Attack</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-impact-of-phonetics-on-speaker-identity-in/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-impact-of-phonetics-on-speaker-identity-in/</guid>
      <description>Speaker Verification | 7.0/10</description>
    </item>
    <item>
      <title>Improving Automatic Speech Recognition by Mitigating Distortions Introduced by Speech Enhancement Under Drone Noise</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-improving-automatic-speech-recognition-by/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-improving-automatic-speech-recognition-by/</guid>
      <description>Speech Recognition | 6.5/10</description>
    </item>
    <item>
      <title>Improving Contextual ASR via Multi-Grained Fusion with Large Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-improving-contextual-asr-via-multi-grained-fusion/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-improving-contextual-asr-via-multi-grained-fusion/</guid>
      <description>Speech Recognition | 8.5/10</description>
    </item>
    <item>
      <title>In-Sync: Adaptation of Speech Aware Large Language Models for ASR with Word level timestamp predictions</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-in-sync-adaptation-of-speech-aware-large-language/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-in-sync-adaptation-of-speech-aware-large-language/</guid>
      <description>Speech Recognition | 7.0/10</description>
    </item>
    <item>
      <title>Input-Adaptive Differentiable Filterbanks via Hypernetworks for Robust Speech Processing</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-input-adaptive-differentiable-filterbanks-via/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-input-adaptive-differentiable-filterbanks-via/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Inverse-Hessian Regularization for Continual Learning in ASR</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-inverse-hessian-regularization-for-continual/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-inverse-hessian-regularization-for-continual/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Investigating The Effect Of Sentence-Level Syntactic Structure On Information Loss In The Human Auditory System</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-investigating-the-effect-of-sentence-level/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-investigating-the-effect-of-sentence-level/</guid>
      <description>Speech Recognition | 7.0/10</description>
    </item>
    <item>
      <title>Joint Autoregressive Modeling of Multi-Talker Overlapped Speech Recognition and Translation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-joint-autoregressive-modeling-of-multi-talker/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-joint-autoregressive-modeling-of-multi-talker/</guid>
      <description>Speech Recognition, Speech Translation | 7.0/10</description>
    </item>
    <item>
      <title>K-Function: Joint Pronunciation Transcription and Feedback for Evaluating Kids Language Function</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-k-function-joint-pronunciation-transcription-and/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-k-function-joint-pronunciation-transcription-and/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Language-Infused Retrieval-Augmented CTC with Adaptive Soft-Hard Gating for Robust Code-Switching ASR</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-language-infused-retrieval-augmented-ctc-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-language-infused-retrieval-augmented-ctc-with/</guid>
      <description>Speech Recognition | 8.0/10</description>
    </item>
    <item>
      <title>Lattice-Guided Consistency Regularization of Dual-Mode Transducers for Automatic Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lattice-guided-consistency-regularization-of-dual/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lattice-guided-consistency-regularization-of-dual/</guid>
      <description>Speech Recognition | 8.0/10</description>
    </item>
    <item>
      <title>Learning to Align with Unbalanced Optimal Transport in Linguistic Knowledge Transfer for ASR</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-learning-to-align-with-unbalanced-optimal/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-learning-to-align-with-unbalanced-optimal/</guid>
      <description>Speech Recognition | 6.5/10</description>
    </item>
    <item>
      <title>LESS: Large Language Model Enhanced Semi-Supervised Learning for Speech Foundational Models Using in-the-wild Data</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-less-large-language-model-enhanced-semi/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-less-large-language-model-enhanced-semi/</guid>
      <description>Speech Recognition, Speech Translation | 7.5/10</description>
    </item>
    <item>
      <title>Leveraging Audio-Visual Data to Reduce the Multilingual Gap in Self-Supervised Speech Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-leveraging-audio-visual-data-to-reduce-the/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-leveraging-audio-visual-data-to-reduce-the/</guid>
      <description>Speech Recognition | 6.0/10</description>
    </item>
    <item>
      <title>Leveraging Segment-Level Speech Representations for LLM-Based Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-leveraging-segment-level-speech-representations/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-leveraging-segment-level-speech-representations/</guid>
      <description>Speech Recognition | 7.0/10</description>
    </item>
    <item>
      <title>Leveraging Text-to-Speech and Voice Conversion as Data Augmentation for Alzheimer&#39;s Disease Detection from Spontaneous Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-leveraging-text-to-speech-and-voice-conversion-as/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-leveraging-text-to-speech-and-voice-conversion-as/</guid>
      <description>Speech Biomarkers | 7.0/10</description>
    </item>
    <item>
      <title>Linguard: Authenticating Speech Recordings Using Speech Recognition and Watermark</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-linguard-authenticating-speech-recordings-using/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-linguard-authenticating-speech-recordings-using/</guid>
      <description>Audio Security | 6.5/10</description>
    </item>
    <item>
      <title>Listen, But Don&#39;t Leak: Sensitive Data Protection for Privacy Aware Automatic Speech Recognition with Acoustic Triggers</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-listen-but-dont-leak-sensitive-data-protection/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-listen-but-dont-leak-sensitive-data-protection/</guid>
      <description>语音识别 | 7.5/10</description>
    </item>
    <item>
      <title>LLM-Based Post-ASR Error Correction for Disordered Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-llm-based-post-asr-error-correction-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-llm-based-post-asr-error-correction-for/</guid>
      <description>语音识别 | 7.5/10</description>
    </item>
    <item>
      <title>LongSpeech: A Scalable Benchmark for Transcription, Translation and Understanding in Long Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-longspeech-a-scalable-benchmark-for-transcription/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-longspeech-a-scalable-benchmark-for-transcription/</guid>
      <description>基准测试 | 7.8/10</description>
    </item>
    <item>
      <title>LOTUSDIS: A Thai Far-Field Meeting Corpus for Robust Conversational ASR</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lotusdis-a-thai-far-field-meeting-corpus-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lotusdis-a-thai-far-field-meeting-corpus-for/</guid>
      <description>语音识别 | 7.5/10</description>
    </item>
    <item>
      <title>Medical ASR Enhancement by Domain-Specific Reinforcement Fine-Tuning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-medical-asr-enhancement-by-domain-specific/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-medical-asr-enhancement-by-domain-specific/</guid>
      <description>语音识别 | 6.5/10</description>
    </item>
    <item>
      <title>Mind the Shift: Using Delta SSL Embeddings to Enhance Child ASR</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mind-the-shift-using-delta-ssl-embeddings-to/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mind-the-shift-using-delta-ssl-embeddings-to/</guid>
      <description>语音识别 | 7.0/10</description>
    </item>
    <item>
      <title>Mitigating Attention Sinks and Massive Activations in Audio-Visual Speech Recognition with LLMs</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mitigating-attention-sinks-and-massive/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mitigating-attention-sinks-and-massive/</guid>
      <description>语音识别 | 7.0/10</description>
    </item>
    <item>
      <title>Mixture To Beamformed Mixture: Leveraging Beamformed Mixture As Weak-Supervision for Speech Enhancement and Noise-Robust ASR</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mixture-to-beamformed-mixture-leveraging/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mixture-to-beamformed-mixture-leveraging/</guid>
      <description>语音增强 | 8.0/10</description>
    </item>
    <item>
      <title>Mixtures of Lightweight Articulatory Experts for Multilingual ASR</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mixtures-of-lightweight-articulatory-experts-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mixtures-of-lightweight-articulatory-experts-for/</guid>
      <description>语音识别 | 7.0/10</description>
    </item>
    <item>
      <title>MNV-17: A High-Quality Performative Mandarin Dataset for Nonverbal Vocalization Recognition in Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mnv-17-a-high-quality-performative-mandarin/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mnv-17-a-high-quality-performative-mandarin/</guid>
      <description>语音识别 | 7.5/10</description>
    </item>
    <item>
      <title>Multilingual Supervised Pretraining with LM-Assisted Decoding for Visual Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multilingual-supervised-pretraining-with-lm/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multilingual-supervised-pretraining-with-lm/</guid>
      <description>语音识别 | 6.5/10</description>
    </item>
    <item>
      <title>nGPT as a Scalable Architecture for Speech Recognition and Translation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ngpt-as-a-scalable-architecture-for-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ngpt-as-a-scalable-architecture-for-speech/</guid>
      <description>语音识别 | 7.5/10</description>
    </item>
    <item>
      <title>Noise-Robust AV-ASR Using Visual Features both in the Whisper Encoder and Decoder</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-noise-robust-av-asr-using-visual-features-both-in/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-noise-robust-av-asr-using-visual-features-both-in/</guid>
      <description>语音识别 | 8.0/10</description>
    </item>
    <item>
      <title>OMNI-AVSR: Towards Unified Multimodal Speech Recognition With Large Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-omni-avsr-towards-unified-multimodal-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-omni-avsr-towards-unified-multimodal-speech/</guid>
      <description>语音识别 | 8.5/10</description>
    </item>
    <item>
      <title>Online Register For Dual-Mode Self-Supervised Speech Models: Mitigating the Lack of Future Context</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-online-register-for-dual-mode-self-supervised/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-online-register-for-dual-mode-self-supervised/</guid>
      <description>语音识别 | 6.5/10</description>
    </item>
    <item>
      <title>PAC: Pronunciation-Aware Contextualized Large Language Model-Based Automatic Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-pac-pronunciation-aware-contextualized-large/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-pac-pronunciation-aware-contextualized-large/</guid>
      <description>语音识别 | 7.0/10</description>
    </item>
    <item>
      <title>Peeking Into the Future for Contextual Biasing</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-peeking-into-the-future-for-contextual-biasing/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-peeking-into-the-future-for-contextual-biasing/</guid>
      <description>语音识别 | 7.0/10</description>
    </item>
    <item>
      <title>PhoenixDSR: Phoneme-Guided and LLM-Enhanced Dysarthric Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-phoenixdsr-phoneme-guided-and-llm-enhanced/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-phoenixdsr-phoneme-guided-and-llm-enhanced/</guid>
      <description>语音识别 | 7.0/10</description>
    </item>
    <item>
      <title>Polynomial Mixing for Efficient Self-Supervised Speech Encoders</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-polynomial-mixing-for-efficient-self-supervised/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-polynomial-mixing-for-efficient-self-supervised/</guid>
      <description>语音识别 | 8.0/10</description>
    </item>
    <item>
      <title>Position-Invariant Fine-Tuning Of Speech Enhancement Models With Self-Supervised Speech Representations</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-position-invariant-fine-tuning-of-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-position-invariant-fine-tuning-of-speech/</guid>
      <description>语音增强 | 6.5/10</description>
    </item>
    <item>
      <title>Production-Scale Dynamic Vocabulary ASR Biasing with Word-Level FST and Robust Training</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-production-scale-dynamic-vocabulary-asr-biasing/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-production-scale-dynamic-vocabulary-asr-biasing/</guid>
      <description>语音识别 | 7.5/10</description>
    </item>
    <item>
      <title>Proficiency-Aware Adaptation and Data Augmentation for Robust L2 ASR</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-proficiency-aware-adaptation-and-data/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-proficiency-aware-adaptation-and-data/</guid>
      <description>语音识别 | 6.5/10</description>
    </item>
    <item>
      <title>Purification Before Fusion: Toward Mask-Free Speech Enhancement for Robust Audio-Visual Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-purification-before-fusion-toward-mask-free/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-purification-before-fusion-toward-mask-free/</guid>
      <description>语音识别 | 7.5/10</description>
    </item>
    <item>
      <title>RAS: a Reliability Oriented Metric for Automatic Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ras-a-reliability-oriented-metric-for-automatic/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ras-a-reliability-oriented-metric-for-automatic/</guid>
      <description>语音识别 | 7.5/10</description>
    </item>
    <item>
      <title>Reducing Prompt Sensitivity in LLM-Based Speech Recognition Through Learnable Projection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-reducing-prompt-sensitivity-in-llm-based-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-reducing-prompt-sensitivity-in-llm-based-speech/</guid>
      <description>语音识别 | 7.0/10</description>
    </item>
    <item>
      <title>Reference Microphone Selection for Guided Source Separation Based on The Normalized L-P Norm</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-reference-microphone-selection-for-guided-source/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-reference-microphone-selection-for-guided-source/</guid>
      <description>语音增强 | 7.0/10</description>
    </item>
    <item>
      <title>Relative Time Intervals Representation For Word-Level Timestamping With Masked Training</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-relative-time-intervals-representation-for-word/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-relative-time-intervals-representation-for-word/</guid>
      <description>语音识别 | 8.0/10</description>
    </item>
    <item>
      <title>RLBR: Reinforcement Learning with Biasing Rewards for Contextual Speech Large Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-rlbr-reinforcement-learning-with-biasing-rewards/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-rlbr-reinforcement-learning-with-biasing-rewards/</guid>
      <description>语音识别 | 8.0/10</description>
    </item>
    <item>
      <title>Robust Accent Identification via Voice Conversion and Non-Timbral Embeddings</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-robust-accent-identification-via-voice-conversion/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-robust-accent-identification-via-voice-conversion/</guid>
      <description>语音识别 | 7.5/10</description>
    </item>
    <item>
      <title>Scaling Multi-Talker ASR with Speaker-Agnostic Activity Streams</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-scaling-multi-talker-asr-with-speaker-agnostic/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-scaling-multi-talker-asr-with-speaker-agnostic/</guid>
      <description>语音识别 | 8.5/10</description>
    </item>
    <item>
      <title>SE-DiCoW: Self-Enrolled Diarization-Conditioned Whisper</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-se-dicow-self-enrolled-diarization-conditioned/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-se-dicow-self-enrolled-diarization-conditioned/</guid>
      <description>语音识别 | 8.5/10</description>
    </item>
    <item>
      <title>SED: Structural Entropy Based Speech Discretization for Discrete Token-Based ASR</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sed-structural-entropy-based-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sed-structural-entropy-based-speech/</guid>
      <description>语音识别 | 6.5/10</description>
    </item>
    <item>
      <title>Sequence-Level Unsupervised Training in Speech Recognition: A Theoretical Study</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sequence-level-unsupervised-training-in-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sequence-level-unsupervised-training-in-speech/</guid>
      <description>语音识别 | 6.5/10</description>
    </item>
    <item>
      <title>SLM-TTA: A Framework for Test-Time Adaptation of Generative Spoken Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-slm-tta-a-framework-for-test-time-adaptation-of/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-slm-tta-a-framework-for-test-time-adaptation-of/</guid>
      <description>语音识别 | 7.0/10</description>
    </item>
    <item>
      <title>SSVD-O: Parameter-Efficient Fine-Tuning with Structured SVD for Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ssvd-o-parameter-efficient-fine-tuning-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ssvd-o-parameter-efficient-fine-tuning-with/</guid>
      <description>语音识别 | 7.0/10</description>
    </item>
    <item>
      <title>STACodec: Semantic Token Assignment for Balancing Acoustic Fidelity and Semantic Information in Audio Codecs</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-stacodec-semantic-token-assignment-for-balancing/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-stacodec-semantic-token-assignment-for-balancing/</guid>
      <description>语音识别 | 8.0/10</description>
    </item>
    <item>
      <title>Streaming Speech Recognition with Decoder-Only Large Language Models and Latency Optimization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-streaming-speech-recognition-with-decoder-only/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-streaming-speech-recognition-with-decoder-only/</guid>
      <description>语音识别 | 7.0/10</description>
    </item>
    <item>
      <title>Synthesized Data Selection via Score Distribution Matching for Te Reo Māori Automatic Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-synthesized-data-selection-via-score-distribution/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-synthesized-data-selection-via-score-distribution/</guid>
      <description>语音识别 | 8.0/10</description>
    </item>
    <item>
      <title>Synthetic Data Domain Adaptation for ASR via LLM-Based Text and Phonetic Respelling Augmentation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-synthetic-data-domain-adaptation-for-asr-via-llm/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-synthetic-data-domain-adaptation-for-asr-via-llm/</guid>
      <description>语音识别 | 8.0/10</description>
    </item>
    <item>
      <title>TAGARELA - A Portuguese Speech Dataset from Podcasts</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tagarela-a-portuguese-speech-dataset-from-podcasts/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tagarela-a-portuguese-speech-dataset-from-podcasts/</guid>
      <description>语音识别 语音合成 | 7.0/10</description>
    </item>
    <item>
      <title>Target-Speaker LLM-ASR with Speaker-Aware Speech Encoder</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-target-speaker-llm-asr-with-speaker-aware-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-target-speaker-llm-asr-with-speaker-aware-speech/</guid>
      <description>语音识别 | 8.8/10</description>
    </item>
    <item>
      <title>TASU: Text-only Alignment for Speech Understanding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tasu-text-only-alignment-for-speech-understanding/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tasu-text-only-alignment-for-speech-understanding/</guid>
      <description>语音识别 | 7.0/10</description>
    </item>
    <item>
      <title>Teaching the Teachers: Boosting Unsupervised Domain Adaptation In Speech Recognition By Ensemble Update</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-teaching-the-teachers-boosting-unsupervised/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-teaching-the-teachers-boosting-unsupervised/</guid>
      <description>语音识别 | 7.0/10</description>
    </item>
    <item>
      <title>Three Seconds is Sufficient: A Multi-Pronged Framework for Model-Based Speaker Adaptation in ASR Under Data-Scarce Conditions</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-three-seconds-is-sufficient-a-multi-pronged/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-three-seconds-is-sufficient-a-multi-pronged/</guid>
      <description>语音识别 | 7.0/10</description>
    </item>
    <item>
      <title>TICL: Text-Embedding KNN for Speech in-Context Learning Unlocks Speech Recognition Abilities of Large Multimodal Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ticl-text-embedding-knn-for-speech-in-context/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ticl-text-embedding-knn-for-speech-in-context/</guid>
      <description>语音识别 | 7.5/10</description>
    </item>
    <item>
      <title>Tokenchain: A Discrete Speech Chain via Semantic Token Modeling</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tokenchain-a-discrete-speech-chain-via-semantic/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tokenchain-a-discrete-speech-chain-via-semantic/</guid>
      <description>语音识别 | 7.0/10</description>
    </item>
    <item>
      <title>Towards Building Speech Large Language Models for Multitask Understanding in Low-Resource Languages</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-building-speech-large-language-models-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-building-speech-large-language-models-for/</guid>
      <description>语音识别 | 6.5/10</description>
    </item>
    <item>
      <title>Towards Fair ASR for Second Language Speakers using Fairness Prompted Finetuning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-fair-asr-for-second-language-speakers/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-fair-asr-for-second-language-speakers/</guid>
      <description>语音识别 | 6.5/10</description>
    </item>
    <item>
      <title>Towards Orthographically-Informed Evaluation of Speech Recognition Systems for Indian Languages</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-orthographically-informed-evaluation-of/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-orthographically-informed-evaluation-of/</guid>
      <description>语音识别 | 7.0/10</description>
    </item>
    <item>
      <title>Towards Robust Dysarthric Speech Recognition: LLM-Agent Post-ASR Correction Beyond WER</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-robust-dysarthric-speech-recognition-llm/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-robust-dysarthric-speech-recognition-llm/</guid>
      <description>语音识别 | 9.0/10</description>
    </item>
    <item>
      <title>Train Short, Infer Long: Speech-LLM Enables Zero-Shot Streamable Joint ASR and Diarization on Long Audio</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-train-short-infer-long-speech-llm-enables-zero/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-train-short-infer-long-speech-llm-enables-zero/</guid>
      <description>说话人分离 | 9.0/10</description>
    </item>
    <item>
      <title>TTA: Transcribe, Translate and Alignment for Cross-Lingual Speech Representation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tta-transcribe-translate-and-alignment-for-cross/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tta-transcribe-translate-and-alignment-for-cross/</guid>
      <description>语音识别 | 7.5/10</description>
    </item>
    <item>
      <title>UMA-SPLIT: Unimodal Aggregation for Both English and Mandarin Non-Autoregressive Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-uma-split-unimodal-aggregation-for-both-english/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-uma-split-unimodal-aggregation-for-both-english/</guid>
      <description>语音识别 | 7.5/10</description>
    </item>
    <item>
      <title>Variational Low-Rank Adaptation for Personalized Impaired Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-variational-low-rank-adaptation-for-personalized/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-variational-low-rank-adaptation-for-personalized/</guid>
      <description>语音识别 | 7.5/10</description>
    </item>
    <item>
      <title>Voting-Based Pitch Estimation with Temporal and Frequential Alignment and Correlation Aware Selection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-voting-based-pitch-estimation-with-temporal-and/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-voting-based-pitch-estimation-with-temporal-and/</guid>
      <description>语音识别 | 8.0/10</description>
    </item>
    <item>
      <title>WAV2LEV: Predicting Levenshtein Edit Operation Sequences For Fine-Grained Estimation of Automatic Speech Recognition Error</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-wav2lev-predicting-levenshtein-edit-operation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-wav2lev-predicting-levenshtein-edit-operation/</guid>
      <description>语音识别 | 7.5/10</description>
    </item>
    <item>
      <title>Whisper-FEST: Single-Channel Far-Field Enhanced Speech-to-text without Parallel Data</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-whisper-fest-single-channel-far-field-enhanced/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-whisper-fest-single-channel-far-field-enhanced/</guid>
      <description>语音识别 | 7.5/10</description>
    </item>
    <item>
      <title>Whisper-MLA: Reducing GPU Memory Consumption of ASR Models Based on MHA2MLA Conversion</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-whisper-mla-reducing-gpu-memory-consumption-of/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-whisper-mla-reducing-gpu-memory-consumption-of/</guid>
      <description>语音识别 | 7.0/10</description>
    </item>
    <item>
      <title>Whisper: Courtside Edition - Enhancing ASR Performance through LLM-Driven Context Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-whisper-courtside-edition-enhancing-asr/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-whisper-courtside-edition-enhancing-asr/</guid>
      <description>语音识别 | 6.5/10</description>
    </item>
    <item>
      <title>WhisperPipe: A Resource-Efficient Streaming Architecture for Real-Time Automatic Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-whisperpipe-a-resource-efficient-streaming/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-whisperpipe-a-resource-efficient-streaming/</guid>
      <description>语音识别 | 6.5/10</description>
    </item>
    <item>
      <title>Windowed SummaryMixing: An Efficient Fine-Tuning of Self-Supervised Learning Models for Low-Resource Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-windowed-summarymixing-an-efficient-fine-tuning/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-windowed-summarymixing-an-efficient-fine-tuning/</guid>
      <description>语音识别 | 6.5/10</description>
    </item>
    <item>
      <title>Z-Scores: A Metric for Linguistically Assessing Disfluency Removal</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-z-scores-a-metric-for-linguistically-assessing/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-z-scores-a-metric-for-linguistically-assessing/</guid>
      <description>模型评估 | 6.5/10</description>
    </item>
    <item>
      <title>RAS: a Reliability Oriented Metric for Automatic Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-ras-a-reliability-oriented-metric-for-automatic/</link>
      <pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-ras-a-reliability-oriented-metric-for-automatic/</guid>
      <description>语音识别 | 7.5/10</description>
    </item>
    <item>
      <title>Advancing automatic speech recognition using feature fusion with self-supervised learning features: A case study on Fearless Steps Apollo corpus</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-advancing-automatic-speech-recognition-using/</link>
      <pubDate>Mon, 27 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-advancing-automatic-speech-recognition-using/</guid>
      <description>语音识别 | 7.0/10</description>
    </item>
    <item>
      <title>DM-ASR: Diarization-aware Multi-speaker ASR with Large Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-dm-asr-diarization-aware-multi-speaker-asr-with/</link>
      <pubDate>Mon, 27 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-dm-asr-diarization-aware-multi-speaker-asr-with/</guid>
      <description>Speaker Recognition | 8.0/10</description>
    </item>
    <item>
      <title>Identifying and typifying demographic unfairness in phoneme-level embeddings of self-supervised speech recognition models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-identifying-and-typifying-demographic-unfairness/</link>
      <pubDate>Mon, 27 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-identifying-and-typifying-demographic-unfairness/</guid>
      <description>Speech Recognition | 7.0/10</description>
    </item>
    <item>
      <title>&#34;This Wasn&#39;t Made for Me&#34;: Recentering User Experience and Emotional Impact in the Evaluation of ASR Bias</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-this-wasnt-made-for-me-recentering-user/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-this-wasnt-made-for-me-recentering-user/</guid>
      <description>Speech Recognition | 7.0/10</description>
    </item>
    <item>
      <title>Do LLM Decoders Listen Fairly? Benchmarking How Language Model Priors Shape Bias in Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-do-llm-decoders-listen-fairly-benchmarking-how/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-do-llm-decoders-listen-fairly-benchmarking-how/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Evaluation of Automatic Speech Recognition Using Generative Large Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-evaluation-of-automatic-speech-recognition-using/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-evaluation-of-automatic-speech-recognition-using/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Aligning Stuttered-Speech Research with End-User Needs: Scoping Review, Survey, and Guidelines</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-aligning-stuttered-speech-research-with-end-user/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-aligning-stuttered-speech-research-with-end-user/</guid>
      <description>1. **Problem**: Current research on stuttered-speech technology is systematically disconnected from the actual needs of people who stutter (PWS) and speech-language pathologists (SLPs); research priorities, task definitions, and evaluation methods are not sufficiently user-centered. 2. **Core method**: a two-part analysis: 1) a scoping review of 228 relevant papers, proposing a taxonomy of research tasks and analyzing the state of the field; 2) a survey of 70 stakeholders…</description>
    </item>
    <item>
      <title>Enhancing ASR Performance in the Medical Domain for Dravidian Languages</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-enhancing-asr-performance-in-the-medical-domain/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-enhancing-asr-performance-in-the-medical-domain/</guid>
      <description>This paper addresses the two main challenges facing medical-domain automatic speech recognition (ASR) for Dravidian languages (Telugu and Kannada): scarce labeled data and complex morphology. Its core method is a "confidence-aware training framework" with a hybrid confidence-scoring mechanism (combining static perceptual and acoustic-similarity measures and WER scores with dynamic model entropy), applied to training data that mixes real and synthetic speech…</description>
    </item>
    <item>
      <title>Reducing the Offline-Streaming Gap for Unified ASR Transducer with Consistency Regularization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-reducing-the-offline-streaming-gap-for-unified/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-reducing-the-offline-streaming-gap-for-unified/</guid>
      <description>1. **Problem**: Training a unified ASR model that delivers both high-accuracy offline transcription and low-latency streaming recognition is highly challenging; conventional approaches degrade sharply at low latency. 2. **Core method**: a unified Transducer framework combining chunked attention (with right context) and dynamic chunk convolution (DCConv) to support both modes. The key innovation is a mode-consistency regularization loss…</description>
    </item>
    <item>
      <title>Tadabur: A Large-Scale Quran Audio Dataset</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-tadabur-a-large-scale-quran-audio-dataset/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-tadabur-a-large-scale-quran-audio-dataset/</guid>
      <description>1. **Problem**: Existing Quran audio datasets fall seriously short in scale, reciter diversity, audio quality, and annotation depth, limiting research progress on Quranic ASR, reciter identification, and related tasks. 2. **Core method**: the Tadabur dataset and its construction pipeline. At the pipeline's core is the "Ayah Alignment Module" (AAM), which uses WhisperX for initial transcription and then…</description>
    </item>
    <item>
      <title>Utterance-Level Methods for Identifying Reliable ASR-Output for Child Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-utterance-level-methods-for-identifying-reliable/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-utterance-level-methods-for-identifying-reliable/</guid>
      <description>1. **Problem**: Automatic speech recognition (ASR) for child speech has high error rates, hurting applications such as language learning and reading assistance. Traditional confidence-estimation methods can fail on noisy, highly variable child speech. A post-transcription (utterance-level) method is needed to automatically identify which ASR outputs are reliable, reducing the manual-review burden. 2. **Core method**: two approaches based on…</description>
    </item>
    <item>
      <title>APRVOS: 1st Place Winner of 5th PVUW MeViS-Audio Track</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-aprvos-1st-place-winner-of-5th-pvuw-mevis-audio/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-aprvos-1st-place-winner-of-5th-pvuw-mevis-audio/</guid>
      <description>This paper reports APRVOS, the winning system designed for the MEVIS_Audio task (audio-conditioned referring video object segmentation). **Problem**: conventional text-referring segmentation models cannot directly handle speech input that is noisy, incomplete, and may describe objects that do not appear in the video. **Method**: a four-stage pipeline that first uses VibeVoice-ASR to convert the speech…</description>
    </item>
    <item>
      <title>Detecting Hallucinations in SpeechLLMs at Inference Time Using Attention Maps</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-detecting-hallucinations-in-speechllms-at/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-detecting-hallucinations-in-speechllms-at/</guid>
      <description>This paper tackles the "hallucination" problem of speech large language models (SpeechLLMs) at inference time, i.e. generating fluent text that does not match the input audio. Existing methods rely on expensive gold-standard outputs, while approaches for text LLMs cannot capture audio-specific signals. The authors therefore propose four lightweight attention-map-based metrics (AudioRatio, AudioConsistency, AudioEnt…</description>
    </item>
    <item>
      <title>Qwen3.5-Omni Technical Report</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-qwen35-omni-technical-report/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-qwen35-omni-technical-report/</guid>
      <description>This technical report gives a comprehensive introduction to Qwen3.5-Omni, an omni-modal large language model that unifies understanding and generation of text, images, audio, and audiovisual content. **Problem**: existing models' limitations in real-time interaction, cross-modal reasoning, and autonomous agent behavior. **Method**: a "Thinker-Talker" architecture with several key innovations: 1) both the Thinker and the Talker use hybrid attention…</description>
    </item>
    <item>
      <title>Tadabur: A Large-Scale Quran Audio Dataset</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-tadabur-a-large-scale-quran-audio-dataset/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-tadabur-a-large-scale-quran-audio-dataset/</guid>
      <description>This paper addresses the lack of large-scale, diverse, finely annotated datasets in Quranic speech research. The authors propose the **Tadabur** dataset and its automated construction pipeline. The pipeline first collects audio from public platforms and uses a large language model (Gemini) to extract standardized metadata (e.g., surah, reciter) from unstructured text. The core step is the **Ayah Alignment …</description>
    </item>
    <item>
      <title>Voice of India: A Large-Scale Benchmark for Real-World Speech Recognition in India</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-voice-of-india-a-large-scale-benchmark-for-real/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-voice-of-india-a-large-scale-benchmark-for-real/</guid>
      <description>This paper tackles the core problem that existing Indic ASR benchmarks neither reflect real-world scenarios nor evaluate models fairly. The authors build the large-scale "Voice of India" benchmark from unscripted telephone conversations by 36,000 speakers, covering 15 major Indian languages and 139 regional clusters, 536 hours in total. The key innovation is a spelling-variant-aware…</description>
    </item>
    <item>
      <title>ClariCodec: Optimising Neural Speech Codes for 200bps Communication using Reinforcement Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-claricodec-optimising-neural-speech-codes-for/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-claricodec-optimising-neural-speech-codes-for/</guid>
      <description>This paper proposes ClariCodec for ultra-low-bitrate (200 bps) scenarios such as satellite and underwater communication, where conventional neural speech codecs sacrifice intelligibility in favor of reconstruction quality. The core method recasts the encoder's quantization as a stochastic policy and fine-tunes the encoder with reinforcement learning (RL), using word error rate (WER) as the reward signal while freezing the decoder and the rest of the acoustic reconstruction pipeline. Experiments show…</description>
    </item>
    <item>
      <title>Where Do Self-Supervised Speech Models Become Unfair?</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-where-do-self-supervised-speech-models-become/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-where-do-self-supervised-speech-models-become/</guid>
      <description>This paper investigates at which layer unfairness arises in self-supervised speech models (S3Ms). The team uses lightweight linear probes on the per-layer embeddings of several S3Ms (e.g., WavLM, Wav2Vec2, BEST-RQ, Whisper), jointly evaluating overall performance on speaker identification (SID) and automatic speech recognition (ASR) as well as performance across different speaker groups…</description>
    </item>
    <item>
      <title>HARNESS: Lightweight Distilled Arabic Speech Foundation Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-harness-lightweight-distilled-arabic-speech/</link>
      <pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-harness-lightweight-distilled-arabic-speech/</guid>
      <description>Addressing the weak performance of general multilingual/English models on Arabic speech recognition, dialect identification, and emotion recognition, as well as the difficulty of deploying large models, this paper proposes HArnESS, a family of Arabic-centric self-supervised speech models. The authors adopt a HuBERT-style iterative self-distillation framework, first training on large-scale Arabic-English bilingual data (about 23K hours) the 24-layer teacher model H…</description>
    </item>
    <item>
      <title>Interactive ASR: Towards Human-Like Interaction and Semantic Coherence Evaluation for Agentic Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-interactive-asr-towards-human-like-interaction/</link>
      <pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-interactive-asr-towards-human-like-interaction/</guid>
      <description>This paper proposes the Interactive ASR framework to address two blind spots of conventional ASR: the WER metric's insensitivity to semantic errors, and systems' inability to correct errors through natural interaction. First, the authors introduce S²ER (Sentence-level Semantic Error Rate), which uses LLM-as-a-Judge binary judgments of whether the recognized output and the reference text…</description>
    </item>
    <item>
      <title>MUSCAT: MUltilingual, SCientific ConversATion Benchmark</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-muscat-multilingual-scientific-conversation/</link>
      <pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-muscat-multilingual-scientific-conversation/</guid>
      <description>This paper presents MUSCAT, a new benchmark for evaluating automatic speech recognition (ASR) performance in multilingual scientific conversations. The dataset contains 6 bilingual conversation recordings (about 65 minutes, 9,066 words in total), pairing English with German, Turkish, Chinese, and Vietnamese; each conversation was recorded with a Meeting Owl 3, a ReSpeaker USB microphone array, and a Me…</description>
    </item>
    <item>
      <title>ClariCodec: Optimising Neural Speech Codes for 200bps Communication using Reinforcement Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-claricodec-optimising-neural-speech-codes-for/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-claricodec-optimising-neural-speech-codes-for/</guid>
      <description>This paper addresses the severe loss of speech intelligibility in extremely bandwidth-constrained scenarios (e.g., 200 bps) such as satellite and underwater communication. Conventional codecs target waveform reconstruction and, at ultra-low bitrates, spend precious bits on unnecessary acoustic detail rather than core semantic information. To this end,…</description>
    </item>
    <item>
      <title>Contextual Biasing for ASR in Speech LLM with Common Word Cues and Bias Word Position Prediction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-contextual-biasing-for-asr-in-speech-llm-with/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-contextual-biasing-for-asr-in-speech-llm-with/</guid>
      <description>This paper addresses the poor performance of speech large language models (SLLMs) when recognizing "bias words" that are rare or unseen in the training data. Conventional methods rely on supplying exact phoneme sequences for the bias words (generated by a G2P system), which demands expertise from users and suffers from poor tool compatibility. To this end,…</description>
    </item>
    <item>
      <title>Diffusion Language Models for Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-diffusion-language-models-for-speech-recognition/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-diffusion-language-models-for-speech-recognition/</guid>
      <description>This paper explores a new approach that applies diffusion language models (DLMs) to automatic speech recognition (ASR). Its core goal is to exploit diffusion models' bidirectional attention and parallel generation to improve the accuracy of ASR candidate hypotheses produced by conventional encoders (e.g., CTC). The paper mainly…</description>
    </item>
  </channel>
</rss>
