<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Pre-training on Speech/Audio Paper Digest</title>
    <link>https://nanless.github.io/audio-paper-digest-blog/tags/%E9%A2%84%E8%AE%AD%E7%BB%83/</link>
    <description>Recent content in Pre-training on Speech/Audio Paper Digest</description>
    <generator>Hugo</generator>
    <language>zh-cn</language>
    <lastBuildDate>Wed, 29 Apr 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://nanless.github.io/audio-paper-digest-blog/tags/%E9%A2%84%E8%AE%AD%E7%BB%83/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>A Study of Data Selection Strategies for Pre-Training Self-Supervised Speech Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-study-of-data-selection-strategies-for-pre/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-study-of-data-selection-strategies-for-pre/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>A Task-Aware Dual-Level Self-Supervised Learning Method for Effective Sound Event Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-task-aware-dual-level-self-supervised-learning/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-task-aware-dual-level-self-supervised-learning/</guid>
      <description>Sound Event Detection | 7.5/10</description>
    </item>
    <item>
      <title>ACAVCaps: Enabling Large-Scale Training for Fine-Grained and Diverse Audio Understanding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-acavcaps-enabling-large-scale-training-for-fine/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-acavcaps-enabling-large-scale-training-for-fine/</guid>
      <description>Audio Classification | 8.5/10</description>
    </item>
    <item>
      <title>Advancing LLM-Based Multi-Channel Multi-Speaker Speech Recognition with Global Cross-Channel Attention and Sentence-Ordered First-In First-Out Serialized Output Training</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-advancing-llm-based-multi-channel-multi-speaker/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-advancing-llm-based-multi-channel-multi-speaker/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Adversarial Fine-Tuning on Speech Foundation Model with Vulnerable Attention Consistency Regularization for Robust Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-adversarial-fine-tuning-on-speech-foundation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-adversarial-fine-tuning-on-speech-foundation/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>An Event-Based Sequence Modeling Approach to Recognizing Non-Triad Chords with Oversegmentation Minimization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-an-event-based-sequence-modeling-approach-to/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-an-event-based-sequence-modeling-approach-to/</guid>
      <description>Music Information Retrieval | 7.5/10</description>
    </item>
    <item>
      <title>An Unsupervised Alignment Feature Fusion System for Spoken Language-Based Dementia Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-an-unsupervised-alignment-feature-fusion-system/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-an-unsupervised-alignment-feature-fusion-system/</guid>
      <description>Speech Biomarkers | 7.0/10</description>
    </item>
    <item>
      <title>Audio-Conditioned Diffusion LLMs for ASR and Deliberation Processing</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-audio-conditioned-diffusion-llms-for-asr-and/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-audio-conditioned-diffusion-llms-for-asr-and/</guid>
      <description>Speech Recognition | 7.0/10</description>
    </item>
    <item>
      <title>Audio-Guided Multimodal Approach for Fine-Grained Alignment and Boundary Modeling in Active Speaker Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-audio-guided-multimodal-approach-for-fine-grained/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-audio-guided-multimodal-approach-for-fine-grained/</guid>
      <description>Speaker Detection | 7.5/10</description>
    </item>
    <item>
      <title>Benchmarking Music Autotagging with MGPHot Expert Annotations vs. Generic Tag Datasets</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-benchmarking-music-autotagging-with-mgphot-expert/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-benchmarking-music-autotagging-with-mgphot-expert/</guid>
      <description>Music Information Retrieval | 7.5/10</description>
    </item>
    <item>
      <title>Bimodal Fusion Framework for Dynamic Facial Expression Recognition In-The-Wild</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-bimodal-fusion-framework-for-dynamic-facial/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-bimodal-fusion-framework-for-dynamic-facial/</guid>
      <description>Speech Emotion Recognition | 7.0/10</description>
    </item>
    <item>
      <title>BiRQ: Bi-Level Self-Labeling Random Quantization for Self-Supervised Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-birq-bi-level-self-labeling-random-quantization/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-birq-bi-level-self-labeling-random-quantization/</guid>
      <description>Speech Recognition | 8.0/10</description>
    </item>
    <item>
      <title>Break-the-Beat! Controllable MIDI-to-Drum audio synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-break-the-beat-controllable-midi-to-drum-audio/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-break-the-beat-controllable-midi-to-drum-audio/</guid>
      <description>Music Generation | 7.5/10</description>
    </item>
    <item>
      <title>Bridging the Semantic Gap: Cross-Attentive Fusion for Joint Acoustic-Semantic Speech Quality Assessment</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-bridging-the-semantic-gap-cross-attentive-fusion/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-bridging-the-semantic-gap-cross-attentive-fusion/</guid>
      <description>Speech Quality Assessment | 8.5/10</description>
    </item>
    <item>
      <title>CASTELLA: Long Audio Dataset with Captions and Temporal Boundaries</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-castella-long-audio-dataset-with-captions-and/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-castella-long-audio-dataset-with-captions-and/</guid>
      <description>Audio Retrieval | 8.5/10</description>
    </item>
    <item>
      <title>Combining SSL Speech Features, Contextual Transformers and Mamba Models for Realistic Audio Spoofing Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-combining-ssl-speech-features-contextual/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-combining-ssl-speech-features-contextual/</guid>
      <description>Audio Deepfake Detection | 7.5/10</description>
    </item>
    <item>
      <title>Contrastive Timbre Representations for Musical Instrument And Synthesizer Retrieval</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-contrastive-timbre-representations-for-musical/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-contrastive-timbre-representations-for-musical/</guid>
      <description>Audio Retrieval | 7.5/10</description>
    </item>
    <item>
      <title>Cross-Lingual Interleaving for Speech Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cross-lingual-interleaving-for-speech-language/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cross-lingual-interleaving-for-speech-language/</guid>
      <description>Speech Language Models | 7.5/10</description>
    </item>
    <item>
      <title>DisContSE: Single-Step Diffusion Speech Enhancement based on Joint Discrete and Continuous Embeddings</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-discontse-single-step-diffusion-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-discontse-single-step-diffusion-speech/</guid>
      <description>Speech Enhancement | 8.5/10</description>
    </item>
    <item>
      <title>Do Foundational Audio Encoders Understand Music Structure?</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-do-foundational-audio-encoders-understand-music/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-do-foundational-audio-encoders-understand-music/</guid>
      <description>Music Information Retrieval | 7.0/10</description>
    </item>
    <item>
      <title>Does the Pre-Training of an Embedding Influence its Encoding of Age?</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-does-the-pre-training-of-an-embedding-influence/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-does-the-pre-training-of-an-embedding-influence/</guid>
      <description>Speech Biomarkers | 7.0/10</description>
    </item>
    <item>
      <title>Domain Partitioning Meets Parameter-Efficient Fine-Tuning: A Novel Method for Improved Language-Queried Audio Source Separation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-domain-partitioning-meets-parameter-efficient/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-domain-partitioning-meets-parameter-efficient/</guid>
      <description>Audio Source Separation | 7.5/10</description>
    </item>
    <item>
      <title>Easy Turn: Integrating Acoustic and Linguistic Modalities for Robust Turn-Taking in Full-Duplex Spoken Dialogue Systems</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-easy-turn-integrating-acoustic-and-linguistic/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-easy-turn-integrating-acoustic-and-linguistic/</guid>
      <description>Spoken Dialogue Systems | 7.0/10</description>
    </item>
    <item>
      <title>Efficient Audio-Visual Inference Via Token Clustering And Modality Fusion</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-efficient-audio-visual-inference-via-token/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-efficient-audio-visual-inference-via-token/</guid>
      <description>Audio Question Answering | 7.5/10</description>
    </item>
    <item>
      <title>Efficient Depression Detection from Speech via Language-Independent Prompt-Driven Reprogramming</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-efficient-depression-detection-from-speech-via/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-efficient-depression-detection-from-speech-via/</guid>
      <description>Speech Biomarkers | 7.5/10</description>
    </item>
    <item>
      <title>Emotional Dimension Control in Language Model-Based Text-To-Speech: Spanning a Broad Spectrum of Human Emotions</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emotional-dimension-control-in-language-model/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emotional-dimension-control-in-language-model/</guid>
      <description>Speech Synthesis | 7.5/10</description>
    </item>
    <item>
      <title>Enhancing Speaker Verification with w2v-BERT 2.0 and Knowledge Distillation Guided Structured Pruning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-enhancing-speaker-verification-with-w2v-bert-20/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-enhancing-speaker-verification-with-w2v-bert-20/</guid>
      <description>Speaker Verification | 7.5/10</description>
    </item>
    <item>
      <title>Enhancing Speech Intelligibility Prediction for Hearing Aids with Complementary Speech Foundation Model Representations</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-enhancing-speech-intelligibility-prediction-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-enhancing-speech-intelligibility-prediction-for/</guid>
      <description>Speech Enhancement | 7.5/10</description>
    </item>
    <item>
      <title>Exploring How Audio Effects Alter Emotion with Foundation Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-exploring-how-audio-effects-alter-emotion-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-exploring-how-audio-effects-alter-emotion-with/</guid>
      <description>Music Understanding | 7.0/10</description>
    </item>
    <item>
      <title>FUSEMOS: Perceptual Evaluation of Text-to-Music Generation with Dual-Encoder Fusion and Ranking-Aware Composite Loss</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fusemos-perceptual-evaluation-of-text-to-music/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fusemos-perceptual-evaluation-of-text-to-music/</guid>
      <description>Music Generation | 7.5/10</description>
    </item>
    <item>
      <title>Gen-SER: When the Generative Model Meets Speech Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-gen-ser-when-the-generative-model-meets-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-gen-ser-when-the-generative-model-meets-speech/</guid>
      <description>Speech Emotion Recognition | 6.5/10</description>
    </item>
    <item>
      <title>GLAP: General Contrastive Audio-Text Pretraining Across Domains and Languages</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-glap-general-contrastive-audio-text-pretraining/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-glap-general-contrastive-audio-text-pretraining/</guid>
      <description>Audio Retrieval | 8.5/10</description>
    </item>
    <item>
      <title>GLUE: Gradient-free Learning to Unify Experts</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-glue-gradient-free-learning-to-unify-experts/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-glue-gradient-free-learning-to-unify-experts/</guid>
      <description>Transfer Learning | 6.5/10</description>
    </item>
    <item>
      <title>Graph-Biased EEG Transformers for Silent Speech Decoding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-graph-biased-eeg-transformers-for-silent-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-graph-biased-eeg-transformers-for-silent-speech/</guid>
      <description>Speech Biomarkers | 6.5/10</description>
    </item>
    <item>
      <title>Hashing-Baseline: Rethinking Hashing in the Age of Pretrained Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hashing-baseline-rethinking-hashing-in-the-age-of/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hashing-baseline-rethinking-hashing-in-the-age-of/</guid>
      <description>Audio Retrieval / Audio Classification | 8.0/10</description>
    </item>
    <item>
      <title>Hierarchical Activity Recognition and Captioning from Long-Form Audio</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hierarchical-activity-recognition-and-captioning/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hierarchical-activity-recognition-and-captioning/</guid>
      <description>Sound Event Detection | 7.5/10</description>
    </item>
    <item>
      <title>High-Fidelity Speech Enhancement Via Discrete Audio Tokens</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-high-fidelity-speech-enhancement-via-discrete/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-high-fidelity-speech-enhancement-via-discrete/</guid>
      <description>Speech Enhancement | 7.5/10</description>
    </item>
    <item>
      <title>I-DCCRN-VAE: An Improved Deep Representation Learning Framework for Complex VAE-Based Single-Channel Speech Enhancement</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-i-dccrn-vae-an-improved-deep-representation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-i-dccrn-vae-an-improved-deep-representation/</guid>
      <description>Speech Enhancement | 7.5/10</description>
    </item>
    <item>
      <title>ICASSP 2026 - Pre-training Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-138/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-138/</guid>
      <description>1 ICASSP 2026 paper in the Pre-training area</description>
    </item>
    <item>
      <title>Improving Anomalous Sound Detection with Attribute-Aware Representation from Domain-Adaptive Pre-Training</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-improving-anomalous-sound-detection-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-improving-anomalous-sound-detection-with/</guid>
      <description>Sound Event Detection | 8.0/10</description>
    </item>
    <item>
      <title>Leveraging Large Speech Language Models as Evaluators for Expressive Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-leveraging-large-speech-language-models-as/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-leveraging-large-speech-language-models-as/</guid>
      <description>Speech Emotion Recognition | 6.5/10</description>
    </item>
    <item>
      <title>Leveraging Multiple Speech Enhancers for Non-Intrusive Intelligibility Prediction for Hearing-Impaired Listeners</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-leveraging-multiple-speech-enhancers-for-non/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-leveraging-multiple-speech-enhancers-for-non/</guid>
      <description>Model Evaluation | 7.5/10</description>
    </item>
    <item>
      <title>Leveraging Segment-Level Speech Representations for LLM-Based Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-leveraging-segment-level-speech-representations/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-leveraging-segment-level-speech-representations/</guid>
      <description>Speech Recognition | 7.0/10</description>
    </item>
    <item>
      <title>Mispronunciation Detection and Diagnosis Without Model Training: A Retrieval-Based Approach</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mispronunciation-detection-and-diagnosis-without/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mispronunciation-detection-and-diagnosis-without/</guid>
      <description>Speech Assessment | 8.0/10</description>
    </item>
    <item>
      <title>Mitigating Attention Sinks and Massive Activations in Audio-Visual Speech Recognition with LLMs</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mitigating-attention-sinks-and-massive/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mitigating-attention-sinks-and-massive/</guid>
      <description>Speech Recognition | 7.0/10</description>
    </item>
    <item>
      <title>Mixture-of-Experts Based Soft-Label Learning for Multi-Label Speech Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mixture-of-experts-based-soft-label-learning-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mixture-of-experts-based-soft-label-learning-for/</guid>
      <description>Speech Emotion Recognition | 7.5/10</description>
    </item>
    <item>
      <title>MMAudioSep: Taming Video-to-Audio Generative Model Towards Video/Text-Queried Sound Separation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mmaudiosep-taming-video-to-audio-generative-model/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mmaudiosep-taming-video-to-audio-generative-model/</guid>
      <description>Speech Separation | 8.0/10</description>
    </item>
    <item>
      <title>Modeling Inter-Segment Relationships in Speech for Dementia Detection with Audio Spectrogram Transformers and Graph Attention Networks</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-modeling-inter-segment-relationships-in-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-modeling-inter-segment-relationships-in-speech/</guid>
      <description>Speech Biomarkers | 7.0/10</description>
    </item>
    <item>
      <title>MSF-SER: Enriching Acoustic Modeling with Multi-Granularity Semantics for Speech Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-msf-ser-enriching-acoustic-modeling-with-multi/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-msf-ser-enriching-acoustic-modeling-with-multi/</guid>
      <description>Speech Emotion Recognition | 7.5/10</description>
    </item>
    <item>
      <title>MTP-S2UT: Enhancing Speech-to-Speech Translation Quality with Multi-Token Prediction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mtp-s2ut-enhancing-speech-to-speech-translation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mtp-s2ut-enhancing-speech-to-speech-translation/</guid>
      <description>Speech Translation | 8.5/10</description>
    </item>
    <item>
      <title>Multi-Channel Speech Enhancement for Cocktail Party Speech Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multi-channel-speech-enhancement-for-cocktail/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multi-channel-speech-enhancement-for-cocktail/</guid>
      <description>Speech Emotion Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Multi-Task Learning For Speech Quality Assessment Using ASR-Derived Entropy Features</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multi-task-learning-for-speech-quality-assessment/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multi-task-learning-for-speech-quality-assessment/</guid>
      <description>Speech Quality Assessment | 7.5/10</description>
    </item>
    <item>
      <title>Multilingual Supervised Pretraining with Lm-Assisted Decoding for Visual Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multilingual-supervised-pretraining-with-lm/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multilingual-supervised-pretraining-with-lm/</guid>
      <description>Speech Recognition | 6.5/10</description>
    </item>
    <item>
      <title>Multimodal Transformer with Multiperspective Training for Predicting Self-Expression Skills from Video Interview</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multimodal-transformer-with-multiperspective/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multimodal-transformer-with-multiperspective/</guid>
      <description>Multimodal Models | 7.0/10</description>
    </item>
    <item>
      <title>MuseTok: Symbolic Music Tokenization for Generation and Semantic Understanding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-musetok-symbolic-music-tokenization-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-musetok-symbolic-music-tokenization-for/</guid>
      <description>Music Generation | 8.5/10</description>
    </item>
    <item>
      <title>Noise-Robust AV-ASR Using Visual Features both in the Whisper Encoder and Decoder</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-noise-robust-av-asr-using-visual-features-both-in/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-noise-robust-av-asr-using-visual-features-both-in/</guid>
      <description>Speech Recognition | 8.0/10</description>
    </item>
    <item>
      <title>On deepfake voice detection - It’s all in the presentation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-on-deepfake-voice-detection-its-all-in-the/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-on-deepfake-voice-detection-its-all-in-the/</guid>
      <description>Audio Deepfake Detection | 8.0/10</description>
    </item>
    <item>
      <title>Online Register For Dual-Mode Self-Supervised Speech Models: Mitigating the Lack of Future Context</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-online-register-for-dual-mode-self-supervised/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-online-register-for-dual-mode-self-supervised/</guid>
      <description>Speech Recognition | 6.5/10</description>
    </item>
    <item>
      <title>PADAM: Perceptual Audio Defect Assessment Model</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-padam-perceptual-audio-defect-assessment-model/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-padam-perceptual-audio-defect-assessment-model/</guid>
      <description>Audio Classification | 7.0/10</description>
    </item>
    <item>
      <title>Probing the Hidden Talent of ASR foundation models for L2 English Oral Assessment</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-probing-the-hidden-talent-of-asr-foundation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-probing-the-hidden-talent-of-asr-foundation/</guid>
      <description>Pre-training | 7.5/10</description>
    </item>
    <item>
      <title>Quality Assessment of Noisy and Enhanced Speech with Limited Data: UWB-NTIS System for Voicemos 2024</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-quality-assessment-of-noisy-and-enhanced-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-quality-assessment-of-noisy-and-enhanced-speech/</guid>
      <description>Speech Quality Assessment | 7.0/10</description>
    </item>
    <item>
      <title>RASD-SR: A Robust Anomalous Sound Detection Framework with Score Recalibration</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-rasd-sr-a-robust-anomalous-sound-detection/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-rasd-sr-a-robust-anomalous-sound-detection/</guid>
      <description>Anomalous Sound Detection | 8.5/10</description>
    </item>
    <item>
      <title>Reading Between the Waves: Robust Topic Segmentation Using Inter-Sentence Audio Features</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-reading-between-the-waves-robust-topic/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-reading-between-the-waves-robust-topic/</guid>
      <description>Audio Classification | 7.0/10</description>
    </item>
    <item>
      <title>Reasoning Driven Captions to Assist Noise Robust Speech Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-reasoning-driven-captions-to-assist-noise-robust/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-reasoning-driven-captions-to-assist-noise-robust/</guid>
      <description>Speech Emotion Recognition | 7.0/10</description>
    </item>
    <item>
      <title>Recovering Performance in Speech Emotion Recognition from Discrete Tokens Via Multi-Layer Fusion and Paralinguistic Feature Integration</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-recovering-performance-in-speech-emotion/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-recovering-performance-in-speech-emotion/</guid>
      <description>Speech Emotion Recognition | 6.5/10</description>
    </item>
    <item>
      <title>Reference-Aware SFM Layers for Intrusive Intelligibility Prediction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-reference-aware-sfm-layers-for-intrusive/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-reference-aware-sfm-layers-for-intrusive/</guid>
      <description>Speech Assessment | 7.5/10</description>
    </item>
    <item>
      <title>SAASDNet: An EEG-Based Streaming Auditory Attention Switch Decoding Network for Self-Initiated Attention Switching in Mixed Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-saasdnet-an-eeg-based-streaming-auditory/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-saasdnet-an-eeg-based-streaming-auditory/</guid>
      <description>Brain-Computer Interface | 8.0/10</description>
    </item>
    <item>
      <title>SAUNA: Song-Level Audio &amp; User-Listening Data Neural Alignment</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sauna-song-level-audio-user-listening-data-neural/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sauna-song-level-audio-user-listening-data-neural/</guid>
      <description>Music Information Retrieval | 7.0/10</description>
    </item>
    <item>
      <title>Scaling Multi-Talker ASR with Speaker-Agnostic Activity Streams</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-scaling-multi-talker-asr-with-speaker-agnostic/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-scaling-multi-talker-asr-with-speaker-agnostic/</guid>
      <description>Speech Recognition | 8.5/10</description>
    </item>
    <item>
      <title>SE-DiCoW: Self-Enrolled Diarization-Conditioned Whisper</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-se-dicow-self-enrolled-diarization-conditioned/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-se-dicow-self-enrolled-diarization-conditioned/</guid>
      <description>Speech Recognition | 8.5/10</description>
    </item>
    <item>
      <title>Shared Representation Learning for Reference-Guided Targeted Sound Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-shared-representation-learning-for-reference/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-shared-representation-learning-for-reference/</guid>
      <description>Sound Event Detection | 8.5/10</description>
    </item>
    <item>
      <title>SLAP: Scalable Language-Audio Pretraining with Variable-Duration Audio and Multi-Objective Training</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-slap-scalable-language-audio-pretraining-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-slap-scalable-language-audio-pretraining-with/</guid>
      <description>Audio Retrieval | 8.0/10</description>
    </item>
    <item>
      <title>SmoothCLAP: Soft-Target Enhanced Contrastive Language-Audio Pretraining for Affective Computing</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-smoothclap-soft-target-enhanced-contrastive/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-smoothclap-soft-target-enhanced-contrastive/</guid>
      <description>Speech Emotion Recognition | 6.5/10</description>
    </item>
    <item>
      <title>SONAR: Self-Distilled Continual Pre-Training for Domain Adaptive Audio Representation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sonar-self-distilled-continual-pre-training-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sonar-self-distilled-continual-pre-training-for/</guid>
      <description>Sound Event Detection | 7.0/10</description>
    </item>
    <item>
      <title>SPAM: Style Prompt Adherence Metric for Prompt-Based TTS</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-spam-style-prompt-adherence-metric-for-prompt/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-spam-style-prompt-adherence-metric-for-prompt/</guid>
      <description>Speech Synthesis | 7.0/10</description>
    </item>
    <item>
      <title>Speaking Clearly: A Simplified Whisper-Based Codec for Low-Bitrate Speech Coding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-speaking-clearly-a-simplified-whisper-based-codec/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-speaking-clearly-a-simplified-whisper-based-codec/</guid>
      <description>Speech Coding | 7.5/10</description>
    </item>
    <item>
      <title>Speech Emotion Recognition based on Hierarchical Transformer with Shifted Windows</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-speech-emotion-recognition-based-on-hierarchical/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-speech-emotion-recognition-based-on-hierarchical/</guid>
      <description>Speech Emotion Recognition | 8.0/10</description>
    </item>
    <item>
      <title>SpeechMapper: Speech-To-Text Embedding Projector for LLMs</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-speechmapper-speech-to-text-embedding-projector/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-speechmapper-speech-to-text-embedding-projector/</guid>
      <description>Speech LLM | 7.0/10</description>
    </item>
    <item>
      <title>Syncspeech: Efficient and Low-Latency Text-to-Speech Based on Temporal Masked Transformer</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-syncspeech-efficient-and-low-latency-text-to/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-syncspeech-efficient-and-low-latency-text-to/</guid>
      <description>Speech Synthesis | 7.5/10</description>
    </item>
    <item>
      <title>TAGARELA - A Portuguese Speech Dataset from Podcasts</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tagarela-a-portuguese-speech-dataset-from-podcasts/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tagarela-a-portuguese-speech-dataset-from-podcasts/</guid>
      <description>Speech Recognition, Speech Synthesis | 7.0/10</description>
    </item>
    <item>
      <title>TASU: Text-only Alignment for Speech Understanding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tasu-text-only-alignment-for-speech-understanding/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tasu-text-only-alignment-for-speech-understanding/</guid>
      <description>Speech Recognition | 7.0/10</description>
    </item>
    <item>
      <title>Test Time Adaptation for Speech Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-test-time-adaptation-for-speech-emotion/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-test-time-adaptation-for-speech-emotion/</guid>
      <description>Speech Emotion Recognition | 7.0/10</description>
    </item>
    <item>
      <title>Text2Move: Text-To-Moving Sound Generation via Trajectory Prediction and Temporal Alignment</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-text2move-text-to-moving-sound-generation-via/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-text2move-text-to-moving-sound-generation-via/</guid>
      <description>Spatial Audio | 8.0/10</description>
    </item>
    <item>
      <title>The 3rd Clarity Prediction Challenge: A Machine Learning Challenge for Hearing aid Speech Intelligibility Prediction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-the-3rd-clarity-prediction-challenge-a-machine/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-the-3rd-clarity-prediction-challenge-a-machine/</guid>
      <description>Speech Enhancement | 7.5/10</description>
    </item>
    <item>
      <title>The Synergistic Role of Audio and Large Video-Language Model in Source-Free Video Domain Adaptation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-the-synergistic-role-of-audio-and-large-video/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-the-synergistic-role-of-audio-and-large-video/</guid>
      <description>Domain Adaptation | 7.0/10</description>
    </item>
    <item>
      <title>Thinking While Listening: Simple Test Time Scaling for Audio Classification</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-thinking-while-listening-simple-test-time-scaling/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-thinking-while-listening-simple-test-time-scaling/</guid>
      <description>Audio Classification | 6.5/10</description>
    </item>
    <item>
      <title>Timbre-Based Pretraining with Pseudo-Labels for Multi-Instrument Automatic Music Transcription</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-timbre-based-pretraining-with-pseudo-labels-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-timbre-based-pretraining-with-pseudo-labels-for/</guid>
      <description>Music Information Retrieval | 7.0/10</description>
    </item>
    <item>
      <title>Tldiffgan: A Latent Diffusion-Gan Framework with Temporal Information Fusion for Anomalous Sound Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tldiffgan-a-latent-diffusion-gan-framework-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tldiffgan-a-latent-diffusion-gan-framework-with/</guid>
      <description>Sound Event Detection | 7.5/10</description>
    </item>
    <item>
      <title>Tpeformer: Temporal Patch Embedding Transformer</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tpeformer-temporal-patch-embedding-transformer/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tpeformer-temporal-patch-embedding-transformer/</guid>
      <description>Speech Emotion Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Training-Free Inference-Time Scaling for Audio Source Separation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-training-free-inference-time-scaling-for-audio/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-training-free-inference-time-scaling-for-audio/</guid>
      <description>Speech Enhancement | 7.5/10</description>
    </item>
    <item>
      <title>Tri-Attention Fusion: Joint Temporal-Spectral and Bidirectional Modeling for Speech Spoofing Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tri-attention-fusion-joint-temporal-spectral-and/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tri-attention-fusion-joint-temporal-spectral-and/</guid>
      <description>Speech Spoofing Detection | 7.0/10</description>
    </item>
    <item>
      <title>WaveSP-Net: Learnable Wavelet-Domain Sparse Prompt Tuning for Speech Deepfake Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-wavesp-net-learnable-wavelet-domain-sparse-prompt/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-wavesp-net-learnable-wavelet-domain-sparse-prompt/</guid>
      <description>Speech Spoofing Detection | 8.0/10</description>
    </item>
    <item>
      <title>WavLink: Compact Audio–Text Embeddings with a Global Whisper Token</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-wavlink-compact-audiotext-embeddings-with-a/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-wavlink-compact-audiotext-embeddings-with-a/</guid>
      <description>Audio Retrieval | 8.0/10</description>
    </item>
    <item>
      <title>An event-based sequence modeling approach to recognizing non-triad chords with oversegmentation minimization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-an-event-based-sequence-modeling-approach-to/</link>
      <pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-an-event-based-sequence-modeling-approach-to/</guid>
      <description>Music Understanding | 7.5/10</description>
    </item>
    <item>
      <title>Scaling Properties of Continuous Diffusion Spoken Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-scaling-properties-of-continuous-diffusion-spoken/</link>
      <pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-scaling-properties-of-continuous-diffusion-spoken/</guid>
      <description>Speech Generation | 8.0/10</description>
    </item>
    <item>
      <title>DiariZen Explained: A Tutorial for the Open Source State-of-the-Art Speaker Diarization Pipeline</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-diarizen-explained-a-tutorial-for-the-open-source/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-diarizen-explained-a-tutorial-for-the-open-source/</guid>
      <description>Speaker Diarization | 6.5/10</description>
    </item>
    <item>
      <title>Misinformation Span Detection in Videos via Audio Transcripts</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-misinformation-span-detection-in-videos-via-audio/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-misinformation-span-detection-in-videos-via-audio/</guid>
      <description>Audio Security | 7.5/10</description>
    </item>
    <item>
      <title>Environmental Sound Deepfake Detection Using Deep-Learning Framework</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-environmental-sound-deepfake-detection-using-deep/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-environmental-sound-deepfake-detection-using-deep/</guid>
      <description>1.  **Problem**: Deepfake detection for environmental sounds (ESDD), covering both acoustic scenes and sound events, remains under-studied, and it is unclear whether detecting fakes of acoustic scenes and of sound events requires separate models. 2.  **Core method**: A deep-learning framework built around a pre-trained audio model (BEATs) as the feature extractor, combined with a three-stage training strategy (including …</description>
    </item>
    <item>
      <title>Indic-CodecFake meets SATYAM: Towards Detecting Neural Audio Codec Synthesized Speech Deepfakes in Indic Languages</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-indic-codecfake-meets-satyam-towards-detecting/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-indic-codecfake-meets-satyam-towards-detecting/</guid>
      <description>1.  **Problem**: Existing research on detecting neural-audio-codec speech deepfakes (CodecFake) focuses mainly on English and Chinese; the highly diverse Indic languages lack both large-scale benchmark datasets and effective detection methods. 2.  **Method**: The authors build the first large-scale Indic-language CodecFake dataset (ICF) and propose a method named SATYA…</description>
    </item>
    <item>
      <title>MoVE: Translating Laughter and Tears via Mixture of Vocalization Experts in Speech-to-Speech Translation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-move-translating-laughter-and-tears-via-mixture/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-move-translating-laughter-and-tears-via-mixture/</guid>
      <description>This paper addresses the problem that speech-to-speech translation (S2ST) systems commonly lose the non-verbal vocalizations (such as laughter and crying) and emotional information of the source speech, which severely hurts the naturalness and accuracy of cross-lingual communication. The authors make three core contributions: first, a scalable automated data-synthesis pipeline that produces a large-scale, high-quality expressive English-Chinese parallel S2ST corpus, overcoming the training-data scarcity bottleneck…</description>
    </item>
    <item>
      <title>Environmental Sound Deepfake Detection Using Deep-Learning Framework</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-environmental-sound-deepfake-detection-using-deep/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-environmental-sound-deepfake-detection-using-deep/</guid>
      <description>For the emerging task of deepfake detection on environmental sounds (e.g., sound events and acoustic scenes), this paper proposes a systematic deep-learning framework. **Core contribution**: through extensive experiments, it systematically evaluates different spectrograms (Mel, CQT, Gammatone), multiple CNN architectures (ResNet, Inception, etc.), and a pre-trained model (BEATs) on this task, and verifies …</description>
    </item>
    <item>
      <title>Qwen3.5-Omni Technical Report</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-qwen35-omni-technical-report/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-qwen35-omni-technical-report/</guid>
      <description>This technical report gives a comprehensive introduction to Qwen3.5-Omni, an omni-modal large language model that can uniformly understand and generate text, image, audio, and audiovisual content. **Problem addressed**: the limitations of existing models in real-time interaction, cross-modal reasoning, and autonomous agent behavior. **Approach**: a Thinker-Talker architecture with several key innovations: 1) both the Thinker and the Talker adopt hybrid atten…</description>
    </item>
    <item>
      <title>ProSDD: Learning Prosodic Representations for Speech Deepfake Detection against Expressive and Emotional Attacks</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-prosdd-learning-prosodic-representations-for/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-prosdd-learning-prosodic-representations-for/</guid>
      <description>This paper tackles the core problem that current speech deepfake detection (SDD) systems generalize poorly against expressive and emotional synthetic-speech attacks. Existing methods over-rely on spoofed data and tend to learn dataset-specific artifacts rather than transferable characteristics of natural speech. To this end, the authors…</description>
    </item>
  </channel>
</rss>
