<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>低资源 on 语音/音频论文速递</title>
    <link>https://nanless.github.io/audio-paper-digest-blog/tags/%E4%BD%8E%E8%B5%84%E6%BA%90/</link>
    <description>Recent content in 低资源 on 语音/音频论文速递</description>
    <generator>Hugo</generator>
    <language>zh-cn</language>
    <lastBuildDate>Wed, 29 Apr 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://nanless.github.io/audio-paper-digest-blog/tags/%E4%BD%8E%E8%B5%84%E6%BA%90/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>A Bimodal Approach for Detecting Fatigue Using Speech and Personal Assessments in College Students</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-bimodal-approach-for-detecting-fatigue-using/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-bimodal-approach-for-detecting-fatigue-using/</guid>
      <description>A Bimodal Approach for Detecting Fatigue Using Speech and Personal Assessments in College Students</description>
    </item>
    <item>
      <title>AFT: An Exemplar-Free Class Incremental Learning Method for Environmental Sound Classification</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-aft-an-exemplar-free-class-incremental-learning/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-aft-an-exemplar-free-class-incremental-learning/</guid>
      <description>音频分类 | 7.0/10</description>
    </item>
    <item>
      <title>Ara-BEST-RQ: Multi Dialectal Arabic SSL</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ara-best-rq-multi-dialectal-arabic-ssl/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ara-best-rq-multi-dialectal-arabic-ssl/</guid>
      <description>语音识别 | 6.5/10</description>
    </item>
    <item>
      <title>Asynchrony-Aware Decoupled Multimodal Control for Cued Speech Video Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-asynchrony-aware-decoupled-multimodal-control-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-asynchrony-aware-decoupled-multimodal-control-for/</guid>
      <description>语音合成 | 7.5/10</description>
    </item>
    <item>
      <title>ATOM: Adaptive Token-Level Optimal Transport Mixup for Speech Translation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-atom-adaptive-token-level-optimal-transport-mixup/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-atom-adaptive-token-level-optimal-transport-mixup/</guid>
      <description>语音翻译 | 8.0/10</description>
    </item>
    <item>
      <title>Bayesian Low-Rank Factorization for Robust Model Adaptation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-bayesian-low-rank-factorization-for-robust-model/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-bayesian-low-rank-factorization-for-robust-model/</guid>
      <description>语音识别 | 8.0/10</description>
    </item>
    <item>
      <title>Behind the Scenes: Mechanistic Interpretability of LoRA-Adapted Whisper for Speech Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-behind-the-scenes-mechanistic-interpretability-of/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-behind-the-scenes-mechanistic-interpretability-of/</guid>
      <description>语音情感识别 | 7.5/10</description>
    </item>
    <item>
      <title>BEST-RQ-based Self-Supervised Learning for Whisper Domain Adaptation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-best-rq-based-self-supervised-learning-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-best-rq-based-self-supervised-learning-for/</guid>
      <description>语音识别 | 7.5/10</description>
    </item>
    <item>
      <title>BiRQ: Bi-Level Self-Labeling Random Quantization for Self-Supervised Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-birq-bi-level-self-labeling-random-quantization/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-birq-bi-level-self-labeling-random-quantization/</guid>
      <description>语音识别 | 8.0/10</description>
    </item>
    <item>
      <title>CTC-DID: CTC-Based Arabic Dialect Identification for Streaming Applications</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ctc-did-ctc-based-arabic-dialect-identification/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ctc-did-ctc-based-arabic-dialect-identification/</guid>
      <description>语音识别 | 6.5/10</description>
    </item>
    <item>
      <title>DDSC: Dynamic Dual-Signal Curriculum for Data-Efficient Acoustic Scene Classification Under Domain Shift</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ddsc-dynamic-dual-signal-curriculum-for-data/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ddsc-dynamic-dual-signal-curriculum-for-data/</guid>
      <description>音频场景分类 | 7.0/10</description>
    </item>
    <item>
      <title>Domain-Aware Scheduling for ASR Fine-Tuning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-domain-aware-scheduling-for-asr-fine-tuning/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-domain-aware-scheduling-for-asr-fine-tuning/</guid>
      <description>语音识别 | 6.5/10</description>
    </item>
    <item>
      <title>Efficient Depression Detection from Speech via Language-Independent Prompt-Driven Reprogramming</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-efficient-depression-detection-from-speech-via/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-efficient-depression-detection-from-speech-via/</guid>
      <description>语音生物标志物 | 7.5/10</description>
    </item>
    <item>
      <title>Entropy-Guided GRVQ for Ultra-Low Bitrate Neural Speech Codec</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-entropy-guided-grvq-for-ultra-low-bitrate-neural/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-entropy-guided-grvq-for-ultra-low-bitrate-neural/</guid>
      <description>语音合成 | 7.5/10</description>
    </item>
    <item>
      <title>Exploring Fine-Tuning Of Large Audio Language Models For Spoken Language Understanding Under Limited Speech Data</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-exploring-fine-tuning-of-large-audio-language/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-exploring-fine-tuning-of-large-audio-language/</guid>
      <description>语音理解 | 8.0/10</description>
    </item>
    <item>
      <title>Fast-ULCNet: A Fast and Ultra Low Complexity Network for Single-Channel Speech Enhancement</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fast-ulcnet-a-fast-and-ultra-low-complexity/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fast-ulcnet-a-fast-and-ultra-low-complexity/</guid>
      <description>语音增强 | 7.5/10</description>
    </item>
    <item>
      <title>FinHuBERT: Hierarchical Feature Imitating Networks for Low-Resource Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-finhubert-hierarchical-feature-imitating-networks/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-finhubert-hierarchical-feature-imitating-networks/</guid>
      <description>语音识别 | 7.5/10</description>
    </item>
    <item>
      <title>FocalCodec-Stream: Streaming Low-Bitrate Speech Coding via Causal Distillation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-focalcodec-stream-streaming-low-bitrate-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-focalcodec-stream-streaming-low-bitrate-speech/</guid>
      <description>语音编码 | 8.0/10</description>
    </item>
    <item>
      <title>From Hallucination to Articulation: Language Model-Driven Losses for Ultra Low-Bitrate Neural Speech Coding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-from-hallucination-to-articulation-language-model/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-from-hallucination-to-articulation-language-model/</guid>
      <description>语音合成 | 7.5/10</description>
    </item>
    <item>
      <title>H-nnPBFDAF: Hierarchical Neural Network Partitioned Block Frequency Domain Adaptive Filter with Novel Block Activation Probability</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-h-nnpbfdaf-hierarchical-neural-network/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-h-nnpbfdaf-hierarchical-neural-network/</guid>
      <description>语音增强 | 7.5/10</description>
    </item>
    <item>
      <title>HCGAN: Harmonic-Coupled Generative Adversarial Network for Speech Super-Resolution in Low-Bandwidth Scenarios</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hcgan-harmonic-coupled-generative-adversarial/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hcgan-harmonic-coupled-generative-adversarial/</guid>
      <description>语音增强 | 8.0/10</description>
    </item>
    <item>
      <title>How Far Do SSL Speech Models Listen for Tone? Temporal Focus of Tone Representation under Low-Resource Transfer</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-how-far-do-ssl-speech-models-listen-for-tone/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-how-far-do-ssl-speech-models-listen-for-tone/</guid>
      <description>语音识别 | 6.5/10</description>
    </item>
    <item>
      <title>Hybrid Pruning: In-Situ Compression of Self-Supervised Speech Models for Speaker Verification and Anti-Spoofing</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hybrid-pruning-in-situ-compression-of-self/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hybrid-pruning-in-situ-compression-of-self/</guid>
      <description>说话人验证 | 8.0/10</description>
    </item>
    <item>
      <title>Improving Audio Event Recognition with Consistency Regularization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-improving-audio-event-recognition-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-improving-audio-event-recognition-with/</guid>
      <description>音频事件检测 | 7.0/10</description>
    </item>
    <item>
      <title>Leveraging Diffusion U-Net Features for Predominant Instrument Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-leveraging-diffusion-u-net-features-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-leveraging-diffusion-u-net-features-for/</guid>
      <description>音乐信息检索 | 8.0/10</description>
    </item>
    <item>
      <title>Lightweight and Perceptually-Guided Voice Conversion for Electro-Laryngeal Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lightweight-and-perceptually-guided-voice/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lightweight-and-perceptually-guided-voice/</guid>
      <description>语音转换 | 7.5/10</description>
    </item>
    <item>
      <title>Lingometer: On-Device Personal Speech Word Counting System</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lingometer-on-device-personal-speech-word/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lingometer-on-device-personal-speech-word/</guid>
      <description>语音活动检测 | 8.0/10</description>
    </item>
    <item>
      <title>LLM-Based Post-ASR Error Correction for Disordered Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-llm-based-post-asr-error-correction-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-llm-based-post-asr-error-correction-for/</guid>
      <description>语音识别 | 7.5/10</description>
    </item>
    <item>
      <title>LOTUSDIS: A Thai Far-Field Meeting Corpus for Robust Conversational ASR</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lotusdis-a-thai-far-field-meeting-corpus-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lotusdis-a-thai-far-field-meeting-corpus-for/</guid>
      <description>语音识别 | 7.5/10</description>
    </item>
    <item>
      <title>Low-Resource Speech-Based Early Alzheimer&#39;s Detection via Cross-Lingual and Few-Shot Transfer Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-low-resource-speech-based-early-alzheimers/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-low-resource-speech-based-early-alzheimers/</guid>
      <description>语音生物标志物 | 7.5/10</description>
    </item>
    <item>
      <title>LP-CFM: Perceptual Invariance-Aware Conditional Flow Matching for Speech Modeling</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lp-cfm-perceptual-invariance-aware-conditional/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lp-cfm-perceptual-invariance-aware-conditional/</guid>
      <description>语音合成 | 7.0/10</description>
    </item>
    <item>
      <title>Mind the Shift: Using Delta SSL Embeddings to Enhance Child ASR</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mind-the-shift-using-delta-ssl-embeddings-to/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mind-the-shift-using-delta-ssl-embeddings-to/</guid>
      <description>语音识别 | 7.0/10</description>
    </item>
    <item>
      <title>Mixtures of Lightweight Articulatory Experts for Multilingual ASR</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mixtures-of-lightweight-articulatory-experts-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mixtures-of-lightweight-articulatory-experts-for/</guid>
      <description>语音识别 | 7.0/10</description>
    </item>
    <item>
      <title>Multilingual Supervised Pretraining with LM-Assisted Decoding for Visual Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multilingual-supervised-pretraining-with-lm/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multilingual-supervised-pretraining-with-lm/</guid>
      <description>语音识别 | 6.5/10</description>
    </item>
    <item>
      <title>Neuromamba: Adaptive Frequency Filtering with a Pyramid Mamba for sEEG-driven Speech Synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-neuromamba-adaptive-frequency-filtering-with-a/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-neuromamba-adaptive-frequency-filtering-with-a/</guid>
      <description>语音合成 | 8.0/10</description>
    </item>
    <item>
      <title>One Model–Three Tasks: Discovering a Shared Winning Ticket for Low-Complexity Audio Intelligence</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-one-modelthree-tasks-discovering-a-shared-winning/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-one-modelthree-tasks-discovering-a-shared-winning/</guid>
      <description>音频分类 | 7.5/10</description>
    </item>
    <item>
      <title>Polynomial Mixing for Efficient Self-Supervised Speech Encoders</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-polynomial-mixing-for-efficient-self-supervised/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-polynomial-mixing-for-efficient-self-supervised/</guid>
      <description>语音识别 | 8.0/10</description>
    </item>
    <item>
      <title>Praxy Voice: Voice-Prompt Recovery &#43; BUPS for Commercial-Class Indic TTS from a Frozen Non-Indic Base at Zero Commercial-Training-Data Cost</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-praxy-voice-voice-prompt-recovery-bups-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-praxy-voice-voice-prompt-recovery-bups-for/</guid>
      <description>语音合成 | 8.0/10</description>
    </item>
    <item>
      <title>Prompt-Guided Mixture-of-Experts for Robust Multimodal Sentiment Analysis with Missing Modalities</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-prompt-guided-mixture-of-experts-for-robust/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-prompt-guided-mixture-of-experts-for-robust/</guid>
      <description>语音情感识别 | 8.5/10</description>
    </item>
    <item>
      <title>Quantifying Speaker Embedding Phonological Rule Interactions in Accented Speech Synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-quantifying-speaker-embedding-phonological-rule/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-quantifying-speaker-embedding-phonological-rule/</guid>
      <description>语音合成 | 7.0/10</description>
    </item>
    <item>
      <title>Ranking The Impact of Contextual Specialization in Neural Speech Enhancement</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ranking-the-impact-of-contextual-specialization/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ranking-the-impact-of-contextual-specialization/</guid>
      <description>语音增强 | 7.5/10</description>
    </item>
    <item>
      <title>Representation-Diverse Self-Supervision for Cross-Domain Bioacoustic Learning in Low-Resource Settings</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-representation-diverse-self-supervision-for-cross/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-representation-diverse-self-supervision-for-cross/</guid>
      <description>生物声学 | 7.0/10</description>
    </item>
    <item>
      <title>Scaling Ambiguity: Augmenting Human Annotation in Speech Emotion Recognition with Audio-Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-scaling-ambiguity-augmenting-human-annotation-in/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-scaling-ambiguity-augmenting-human-annotation-in/</guid>
      <description>语音情感识别 | 6.5/10</description>
    </item>
    <item>
      <title>Self-Supervised Note Tracking and Multi-Pitch Estimation Via Reconstruction-Based Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-self-supervised-note-tracking-and-multi-pitch/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-self-supervised-note-tracking-and-multi-pitch/</guid>
      <description>多音高估计 音符跟踪 | 8.5/10</description>
    </item>
    <item>
      <title>Sequence-Level Unsupervised Training in Speech Recognition: A Theoretical Study</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sequence-level-unsupervised-training-in-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sequence-level-unsupervised-training-in-speech/</guid>
      <description>语音识别 | 6.5/10</description>
    </item>
    <item>
      <title>SSVD-O: Parameter-Efficient Fine-Tuning with Structured SVD for Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ssvd-o-parameter-efficient-fine-tuning-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ssvd-o-parameter-efficient-fine-tuning-with/</guid>
      <description>语音识别 | 7.0/10</description>
    </item>
    <item>
      <title>Synthesized Data Selection via Score Distribution Matching for Te Reo Māori Automatic Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-synthesized-data-selection-via-score-distribution/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-synthesized-data-selection-via-score-distribution/</guid>
      <description>语音识别 | 8.0/10</description>
    </item>
    <item>
      <title>TAGARELA - A Portuguese Speech Dataset from Podcasts</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tagarela-a-portuguese-speech-dataset-from-podcasts/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tagarela-a-portuguese-speech-dataset-from-podcasts/</guid>
      <description>语音识别 语音合成 | 7.0/10</description>
    </item>
    <item>
      <title>Taming Audio VAEs via Target-KL Regularization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-taming-audio-vaes-via-target-kl-regularization/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-taming-audio-vaes-via-target-kl-regularization/</guid>
      <description>音频生成 | 6.5/10</description>
    </item>
    <item>
      <title>Task Vector in TTS: Toward Emotionally Expressive Dialectal Speech Synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-task-vector-in-tts-toward-emotionally-expressive/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-task-vector-in-tts-toward-emotionally-expressive/</guid>
      <description>语音合成 | 7.0/10</description>
    </item>
    <item>
      <title>Three Seconds is Sufficient: A Multi-Pronged Framework for Model-Based Speaker Adaptation in ASR Under Data-Scarce Conditions</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-three-seconds-is-sufficient-a-multi-pronged/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-three-seconds-is-sufficient-a-multi-pronged/</guid>
      <description>语音识别 | 7.0/10</description>
    </item>
    <item>
      <title>TICL: Text-Embedding KNN for Speech in-Context Learning Unlocks Speech Recognition Abilities of Large Multimodal Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ticl-text-embedding-knn-for-speech-in-context/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ticl-text-embedding-knn-for-speech-in-context/</guid>
      <description>语音识别 | 7.5/10</description>
    </item>
    <item>
      <title>TMD-TTS: A Unified Tibetan Multi-Dialect Text-to-Speech Framework for Ü-Tsang, Amdo and Kham Speech Dataset Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tmd-tts-a-unified-tibetan-multi-dialect-text-to/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tmd-tts-a-unified-tibetan-multi-dialect-text-to/</guid>
      <description>语音合成 | 7.5/10</description>
    </item>
    <item>
      <title>Towards Building Speech Large Language Models for Multitask Understanding in Low-Resource Languages</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-building-speech-large-language-models-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-building-speech-large-language-models-for/</guid>
      <description>语音识别 | 6.5/10</description>
    </item>
    <item>
      <title>Towards Lightweight Adaptation of Speech Enhancement Models in Real-World Environments</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-lightweight-adaptation-of-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-lightweight-adaptation-of-speech/</guid>
      <description>语音增强 | 8.5/10</description>
    </item>
    <item>
      <title>Towards Orthographically-Informed Evaluation of Speech Recognition Systems for Indian Languages</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-orthographically-informed-evaluation-of/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-orthographically-informed-evaluation-of/</guid>
      <description>语音识别 | 7.0/10</description>
    </item>
    <item>
      <title>Transfer Learning for Paediatric Sleep Apnoea Detection using Physiology-Guided Acoustic Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-transfer-learning-for-paediatric-sleep-apnoea/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-transfer-learning-for-paediatric-sleep-apnoea/</guid>
      <description>音频分类 | 7.0/10</description>
    </item>
    <item>
      <title>UJCodec: An End-to-End UNet-Style Codec for Joint Speech Compression and Enhancement</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ujcodec-an-end-to-end-unet-style-codec-for-joint/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ujcodec-an-end-to-end-unet-style-codec-for-joint/</guid>
      <description>语音增强 | 7.5/10</description>
    </item>
    <item>
      <title>UNMIXX: Untangling Highly Correlated Singing Voices Mixtures</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-unmixx-untangling-highly-correlated-singing/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-unmixx-untangling-highly-correlated-singing/</guid>
      <description>语音分离 | 8.5/10</description>
    </item>
    <item>
      <title>Unsupervised Lexicon Learning from Speech is Limited by Representations Rather than Clustering</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-unsupervised-lexicon-learning-from-speech-is/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-unsupervised-lexicon-learning-from-speech-is/</guid>
      <description>语音发现 | 8.0/10</description>
    </item>
    <item>
      <title>Variational Low-Rank Adaptation for Personalized Impaired Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-variational-low-rank-adaptation-for-personalized/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-variational-low-rank-adaptation-for-personalized/</guid>
      <description>语音识别 | 7.5/10</description>
    </item>
    <item>
      <title>WhisperPipe: A Resource-Efficient Streaming Architecture for Real-Time Automatic Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-whisperpipe-a-resource-efficient-streaming/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-whisperpipe-a-resource-efficient-streaming/</guid>
      <description>语音识别 | 6.5/10</description>
    </item>
    <item>
      <title>Windowed SummaryMixing: An Efficient Fine-Tuning of Self-Supervised Learning Models for Low-Resource Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-windowed-summarymixing-an-efficient-fine-tuning/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-windowed-summarymixing-an-efficient-fine-tuning/</guid>
      <description>Speech recognition | 6.5/10</description>
    </item>
    <item>
      <title>Speech/Audio Paper Digest 2026-04-29</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29/</guid>
      <description>Analysis of 29 speech/AI papers</description>
    </item>
    <item>
      <title>Dilated CNNs for Periodic Signal Processing: A Low-Complexity Approach</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-dilated-cnns-for-periodic-signal-processing-a-low/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-dilated-cnns-for-periodic-signal-processing-a-low/</guid>
      <description>Speech enhancement | 6.5/10</description>
    </item>
    <item>
      <title>Speech/Audio Paper Digest 2026-04-24</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24/</guid>
      <description>Analysis of 21 speech/AI papers</description>
    </item>
    <item>
      <title>Enhancing ASR Performance in the Medical Domain for Dravidian Languages</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-enhancing-asr-performance-in-the-medical-domain/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-enhancing-asr-performance-in-the-medical-domain/</guid>
      <description>This paper tackles two challenges facing medical-domain automatic speech recognition (ASR) for Dravidian languages (Telugu and Kannada): scarce labeled data and complex linguistic morphology. Its core contribution is a "confidence-aware training framework" that applies a hybrid confidence scoring mechanism (combining static perceptual and acoustic-similarity measures and WER scores with dynamic model entropy) to training data that mixes real and synthetic speech…</description>
    </item>
    <item>
      <title>Speech/Audio Paper Digest 2026-04-23</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23/</guid>
      <description>Analysis of 27 speech/AI papers</description>
    </item>
    <item>
      <title>Voice of India: A Large-Scale Benchmark for Real-World Speech Recognition in India</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-voice-of-india-a-large-scale-benchmark-for-real/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-voice-of-india-a-large-scale-benchmark-for-real/</guid>
      <description>This paper addresses two core problems with existing Indic ASR benchmarks: they do not reflect real-world scenarios, and their evaluation methods are unfair. The authors build "Voice of India", a large-scale benchmark drawn from unscripted telephone conversations with 36,000 speakers, covering 15 major Indian languages and 139 regional clusters for 536 hours in total. A key innovation is the adoption of spelling-variant-aware…</description>
    </item>
    <item>
      <title>Speech/Audio Paper Digest 2026-04-22</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22/</guid>
      <description>Analysis of 21 speech/AI papers</description>
    </item>
    <item>
      <title>BhashaSutra: A Task-Centric Unified Survey of Indian NLP Datasets, Corpora, and Resources</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-bhashasutra-a-task-centric-unified-survey-of/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-bhashasutra-a-task-centric-unified-survey-of/</guid>
      <description>This paper addresses a pain point of Indian-language NLP research: resources are fragmented and lack a unified overview. The authors propose the first task-centric unified taxonomy, systematically organizing and integrating over 200 datasets, 50 benchmarks, and more than 100 models, tools, and systems, spanning core language processing (e.g. tokenization, POS tagging), text classification, generation and translation, information retrieval, speech and multimodality, and even socio-cultural tasks…</description>
    </item>
    <item>
      <title>ClariCodec: Optimising Neural Speech Codes for 200bps Communication using Reinforcement Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-claricodec-optimising-neural-speech-codes-for/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-claricodec-optimising-neural-speech-codes-for/</guid>
      <description>For ultra-low-bitrate (200 bps) scenarios such as satellite and underwater communication, where conventional neural speech codecs sacrifice intelligibility by optimizing for reconstruction quality, this paper proposes ClariCodec. Its core method reframes the encoder's quantization as a stochastic policy and fine-tunes the encoder with reinforcement learning (RL), using word error rate (WER) as the reward signal while freezing the decoder and the rest of the acoustic reconstruction pipeline. Experiments…</description>
    </item>
    <item>
      <title>Speech/Audio Paper Digest 2026-04-21</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21/</guid>
      <description>Analysis of 34 speech/AI papers</description>
    </item>
    <item>
      <title>NaijaS2ST: A Multi-Accent Benchmark for Speech-to-Speech Translation in Low-Resource Nigerian Languages</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-naijas2st-a-multi-accent-benchmark-for-speech-to/</link>
      <pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-naijas2st-a-multi-accent-benchmark-for-speech-to/</guid>
      <description>This paper targets the core bottleneck in speech translation (S2ST and S2TT) research for low-resource African languages: a severe shortage of high-quality, multi-accent parallel speech data. The authors build the NaijaS2ST dataset, comprising parallel speech between English and Hausa, Igbo, Yoruba, and Nigerian Pidgin, about 50 hours per language, capturing real speaker and accent diversity. Building on this dataset, the…</description>
    </item>
    <item>
      <title>Speech/Audio Paper Digest 2026-04-20</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20/</link>
      <pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20/</guid>
      <description>Analysis of 24 speech/AI papers</description>
    </item>
    <item>
      <title>Few-Shot and Pseudo-Label Guided Speech Quality Evaluation with Large Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-few-shot-and-pseudo-label-guided-speech-quality/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-few-shot-and-pseudo-label-guided-speech-quality/</guid>
      <description>This paper addresses the performance bottleneck of non-intrusive speech quality assessment when labeled data is limited. The authors propose the GatherMOS framework, whose core idea is to use a large language model (e.g. GPT-5) as a meta-evaluator that, via carefully designed text prompts, fuses multiple classes of heterogeneous signals, including…</description>
    </item>
    <item>
      <title>Speech/Audio Paper Digest 2026-04-19</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19/</guid>
      <description>Analysis of 42 speech/AI papers</description>
    </item>
    <item>
      <title>Speech/Audio Paper Digest 2026-04-18</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-18/</link>
      <pubDate>Sat, 18 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-18/</guid>
      <description>Analysis of 39 speech/AI papers</description>
    </item>
  </channel>
</rss>
