<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>知识蒸馏 on 语音/音频论文速递</title>
    <link>https://nanless.github.io/audio-paper-digest-blog/tags/%E7%9F%A5%E8%AF%86%E8%92%B8%E9%A6%8F/</link>
    <description>Recent content in 知识蒸馏 on 语音/音频论文速递</description>
    <generator>Hugo</generator>
    <language>zh-cn</language>
    <lastBuildDate>Wed, 29 Apr 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://nanless.github.io/audio-paper-digest-blog/tags/%E7%9F%A5%E8%AF%86%E8%92%B8%E9%A6%8F/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Advancing Speech Summarization in Multi-Modal LLMs with Reinforcement Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-advancing-speech-summarization-in-multi-modal/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-advancing-speech-summarization-in-multi-modal/</guid>
      <description>Audio QA | 7.0/10</description>
    </item>
    <item>
      <title>AFT: An Exemplar-Free Class Incremental Learning Method for Environmental Sound Classification</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-aft-an-exemplar-free-class-incremental-learning/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-aft-an-exemplar-free-class-incremental-learning/</guid>
      <description>Audio classification | 7.0/10</description>
    </item>
    <item>
      <title>AMBER2: Dual Ambiguity-Aware Emotion Recognition Applied to Speech and Text</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-amber2-dual-ambiguity-aware-emotion-recognition/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-amber2-dual-ambiguity-aware-emotion-recognition/</guid>
      <description>Speech emotion recognition | 8.0/10</description>
    </item>
    <item>
      <title>APKD: Aligned And Paced Knowledge Distillation Towards Lightweight Heterogeneous Multimodal Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-apkd-aligned-and-paced-knowledge-distillation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-apkd-aligned-and-paced-knowledge-distillation/</guid>
      <description>Emotion recognition | 7.5/10</description>
    </item>
    <item>
      <title>Attention-Weighted Centered Kernel Alignment for Knowledge Distillation in Large Audio-Language Models Applied To Speech Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-attention-weighted-centered-kernel-alignment-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-attention-weighted-centered-kernel-alignment-for/</guid>
      <description>Speech emotion recognition | 8.0/10</description>
    </item>
    <item>
      <title>Attentive Masked Self-Distillation for Respiratory Sound Classification</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-attentive-masked-self-distillation-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-attentive-masked-self-distillation-for/</guid>
      <description>Audio classification | 7.5/10</description>
    </item>
    <item>
      <title>AUV: Teaching Audio Universal Vector Quantization with Single Nested Codebook</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-auv-teaching-audio-universal-vector-quantization/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-auv-teaching-audio-universal-vector-quantization/</guid>
      <description>Audio generation | 8.0/10</description>
    </item>
    <item>
      <title>Cross-Architecture Knowledge Distillation of WavLM for Lightweight Speaker Verification</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cross-architecture-knowledge-distillation-of/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cross-architecture-knowledge-distillation-of/</guid>
      <description>Speaker verification | 8.0/10</description>
    </item>
    <item>
      <title>Cross-Modal Knowledge Distillation for Speech Large Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cross-modal-knowledge-distillation-for-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cross-modal-knowledge-distillation-for-speech/</guid>
      <description>Speech LLMs | 7.0/10</description>
    </item>
    <item>
      <title>Curriculum Learning with Contrastive Loss for Lightweight Speaker Verification</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-curriculum-learning-with-contrastive-loss-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-curriculum-learning-with-contrastive-loss-for/</guid>
      <description>Speaker verification | 6.5/10</description>
    </item>
    <item>
      <title>DBFT-SD: Weakly Supervised Multimodal Detection of Sensitive Audio-Visual Content</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dbft-sd-weakly-supervised-multimodal-detection-of/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dbft-sd-weakly-supervised-multimodal-detection-of/</guid>
      <description>Audio event detection | 8.0/10</description>
    </item>
    <item>
      <title>Distilling Attention Knowledge for Speaker Verification</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-distilling-attention-knowledge-for-speaker/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-distilling-attention-knowledge-for-speaker/</guid>
      <description>Speaker verification | 8.0/10</description>
    </item>
    <item>
      <title>EchoRAG: A Two-Stage Framework for Audio-Text Retrieval and Temporal Grounding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-echorag-a-two-stage-framework-for-audio-text/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-echorag-a-two-stage-framework-for-audio-text/</guid>
      <description>Audio retrieval | 7.5/10</description>
    </item>
    <item>
      <title>EdgeSpot: Efficient and High-Performance Few-Shot Model for Keyword Spotting</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-edgespot-efficient-and-high-performance-few-shot/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-edgespot-efficient-and-high-performance-few-shot/</guid>
      <description>Voice activity detection | 7.5/10</description>
    </item>
    <item>
      <title>Enabling Multi-Species Bird Classification on Low-Power Bioacoustic Loggers</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-enabling-multi-species-bird-classification-on-low/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-enabling-multi-species-bird-classification-on-low/</guid>
      <description>Bioacoustics | 8.0/10</description>
    </item>
    <item>
      <title>Enhancing Speaker Verification with w2v-BERT 2.0 and Knowledge Distillation Guided Structured Pruning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-enhancing-speaker-verification-with-w2v-bert-20/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-enhancing-speaker-verification-with-w2v-bert-20/</guid>
      <description>Speaker verification | 7.5/10</description>
    </item>
    <item>
      <title>FocalCodec-Stream: Streaming Low-Bitrate Speech Coding via Causal Distillation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-focalcodec-stream-streaming-low-bitrate-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-focalcodec-stream-streaming-low-bitrate-speech/</guid>
      <description>Speech coding | 8.0/10</description>
    </item>
    <item>
      <title>From Hallucination to Articulation: Language Model-Driven Losses for Ultra Low-Bitrate Neural Speech Coding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-from-hallucination-to-articulation-language-model/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-from-hallucination-to-articulation-language-model/</guid>
      <description>Speech synthesis | 7.5/10</description>
    </item>
    <item>
      <title>GLUE: Gradient-free Learning to Unify Experts</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-glue-gradient-free-learning-to-unify-experts/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-glue-gradient-free-learning-to-unify-experts/</guid>
      <description>Transfer learning | 6.5/10</description>
    </item>
    <item>
      <title>Int-MeanFlow: Few-Step Speech Generation with Integral Velocity Distillation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-int-meanflow-few-step-speech-generation-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-int-meanflow-few-step-speech-generation-with/</guid>
      <description>Speech synthesis | 7.5/10</description>
    </item>
    <item>
      <title>Learning to Align with Unbalanced Optimal Transport in Linguistic Knowledge Transfer for ASR</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-learning-to-align-with-unbalanced-optimal/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-learning-to-align-with-unbalanced-optimal/</guid>
      <description>Speech recognition | 6.5/10</description>
    </item>
    <item>
      <title>Lightweight and Generalizable Acoustic Scene Representations Via Contrastive Fine-Tuning and Distillation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lightweight-and-generalizable-acoustic-scene/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lightweight-and-generalizable-acoustic-scene/</guid>
      <description>Audio scene understanding | 8.0/10</description>
    </item>
    <item>
      <title>MI-Fuse: Label Fusion for Unsupervised Domain Adaptation with Closed-Source Large Audio-Language Model</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mi-fuse-label-fusion-for-unsupervised-domain/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mi-fuse-label-fusion-for-unsupervised-domain/</guid>
      <description>Speech emotion recognition | 8.0/10</description>
    </item>
    <item>
      <title>Mutual Forcing: Dual-Mode Self-Evolution for Fast Autoregressive Audio-Video Character Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mutual-forcing-dual-mode-self-evolution-for-fast/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mutual-forcing-dual-mode-self-evolution-for-fast/</guid>
      <description>Audio generation | 7.5/10</description>
    </item>
    <item>
      <title>Prompt-Guided Mixture-of-Experts for Robust Multimodal Sentiment Analysis with Missing Modalities</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-prompt-guided-mixture-of-experts-for-robust/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-prompt-guided-mixture-of-experts-for-robust/</guid>
      <description>Speech emotion recognition | 8.5/10</description>
    </item>
    <item>
      <title>S-SONDO: Self-Supervised Knowledge Distillation for General Audio Foundation Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-s-sondo-self-supervised-knowledge-distillation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-s-sondo-self-supervised-knowledge-distillation/</guid>
      <description>Audio classification | 7.0/10</description>
    </item>
    <item>
      <title>Salad-VAE: Semantic Audio Compression with Language-Audio Distillation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-salad-vae-semantic-audio-compression-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-salad-vae-semantic-audio-compression-with/</guid>
      <description>Audio compression | 7.5/10</description>
    </item>
    <item>
      <title>Semantic Anchor Transfer from Short to Long Speech in a Distillation-Based Summarization Framework</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-semantic-anchor-transfer-from-short-to-long/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-semantic-anchor-transfer-from-short-to-long/</guid>
      <description>Speech summarization | 7.5/10</description>
    </item>
    <item>
      <title>SightSound-R1: Cross-Modal Reasoning Distillation from Vision to Audio Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sightsound-r1-cross-modal-reasoning-distillation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sightsound-r1-cross-modal-reasoning-distillation/</guid>
      <description>Audio QA | 7.5/10</description>
    </item>
    <item>
      <title>Sounds that Shape: Audio-Driven 3D Mesh Generation with Attribute-Decoupled Score Distillation Sampling</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sounds-that-shape-audio-driven-3d-mesh-generation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sounds-that-shape-audio-driven-3d-mesh-generation/</guid>
      <description>Audio generation | 7.0/10</description>
    </item>
    <item>
      <title>SPADE: Structured Pruning and Adaptive Distillation for Efficient LLM-TTS</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-spade-structured-pruning-and-adaptive/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-spade-structured-pruning-and-adaptive/</guid>
      <description>Speech synthesis | 7.5/10</description>
    </item>
    <item>
      <title>SpeechCT-CLIP: Distilling Text-Image Knowledge to Speech for Voice-Native Multimodal CT Analysis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-speechct-clip-distilling-text-image-knowledge-to/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-speechct-clip-distilling-text-image-knowledge-to/</guid>
      <description>Medical AI | 7.5/10</description>
    </item>
    <item>
      <title>STACodec: Semantic Token Assignment for Balancing Acoustic Fidelity and Semantic Information in Audio Codecs</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-stacodec-semantic-token-assignment-for-balancing/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-stacodec-semantic-token-assignment-for-balancing/</guid>
      <description>Speech recognition | 8.0/10</description>
    </item>
    <item>
      <title>Stream-Voice-Anon: Enhancing Utility of Real-Time Speaker Anonymization Via Neural Audio Codec and Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-stream-voice-anon-enhancing-utility-of-real-time/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-stream-voice-anon-enhancing-utility-of-real-time/</guid>
      <description>Speaker anonymization | 7.0/10</description>
    </item>
    <item>
      <title>Target-Speaker LLM-ASR with Speaker-Aware Speech Encoder</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-target-speaker-llm-asr-with-speaker-aware-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-target-speaker-llm-asr-with-speaker-aware-speech/</guid>
      <description>Speech recognition | 8.8/10</description>
    </item>
    <item>
      <title>Teacher-Guided Pseudo Supervision and Cross-Modal Alignment for Audio-Visual Video Parsing</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-teacher-guided-pseudo-supervision-and-cross-modal/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-teacher-guided-pseudo-supervision-and-cross-modal/</guid>
      <description>Audio-visual | 7.0/10</description>
    </item>
    <item>
      <title>Teaching Audio Models to Reason: A Unified Framework for Source- and Layer-Wise Distillation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-teaching-audio-models-to-reason-a-unified/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-teaching-audio-models-to-reason-a-unified/</guid>
      <description>Audio QA | 7.0/10</description>
    </item>
    <item>
      <title>Teaching the Teachers: Boosting Unsupervised Domain Adaptation In Speech Recognition By Ensemble Update</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-teaching-the-teachers-boosting-unsupervised/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-teaching-the-teachers-boosting-unsupervised/</guid>
      <description>Speech recognition | 7.0/10</description>
    </item>
    <item>
      <title>Temporal Distillation for Music Representation Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-temporal-distillation-for-music-representation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-temporal-distillation-for-music-representation/</guid>
      <description>Music information retrieval | 7.5/10</description>
    </item>
    <item>
      <title>The Impact of Audio Watermarking on Audio Anti-Spoofing Countermeasures</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-the-impact-of-audio-watermarking-on-audio-anti/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-the-impact-of-audio-watermarking-on-audio-anti/</guid>
      <description>Audio deepfake detection | 8.5/10</description>
    </item>
    <item>
      <title>The Synergistic Role of Audio and Large Video-Language Model in Source-Free Video Domain Adaptation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-the-synergistic-role-of-audio-and-large-video/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-the-synergistic-role-of-audio-and-large-video/</guid>
      <description>Domain adaptation | 7.0/10</description>
    </item>
    <item>
      <title>Triage Knowledge Distillation for Speaker Verification</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-triage-knowledge-distillation-for-speaker/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-triage-knowledge-distillation-for-speaker/</guid>
      <description>Speaker verification | 7.5/10</description>
    </item>
    <item>
      <title>What the student learns in knowledge distillation: A subspace view and evidence on Convolutional Recurrent Network</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-what-the-student-learns-in-knowledge-distillation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-what-the-student-learns-in-knowledge-distillation/</guid>
      <description>Speech enhancement | 6.5/10</description>
    </item>
    <item>
      <title>Hallo-Live: Real-Time Streaming Joint Audio-Video Avatar Generation with Asynchronous Dual-Stream and Human-Centric Preference Distillation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-hallo-live-real-time-streaming-joint-audio-video/</link>
      <pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-hallo-live-real-time-streaming-joint-audio-video/</guid>
      <description>Audio-visual | 8.5/10</description>
    </item>
    <item>
      <title>Beyond Acoustic Sparsity and Linguistic Bias: A Prompt-Free Paradigm for Mispronunciation Detection and Diagnosis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-beyond-acoustic-sparsity-and-linguistic-bias-a/</link>
      <pubDate>Mon, 27 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-beyond-acoustic-sparsity-and-linguistic-bias-a/</guid>
      <description>Mispronunciation detection | 8.5/10</description>
    </item>
    <item>
      <title>ATRIE: Adaptive Tuning for Robust Inference and Emotion in Persona-Driven Speech Synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-atrie-adaptive-tuning-for-robust-inference-and/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-atrie-adaptive-tuning-for-robust-inference-and/</guid>
      <description>Speech synthesis | 7.0/10</description>
    </item>
    <item>
      <title>ATRIE: Adaptive Tuning for Robust Inference and Emotion in Persona-Driven Speech Synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-atrie-adaptive-tuning-for-robust-inference-and/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-atrie-adaptive-tuning-for-robust-inference-and/</guid>
      <description>To address the difficulty existing speech synthesis systems have in simultaneously preserving persona identity and emotional expressiveness when generating persona-driven, emotionally rich speech, this paper proposes the ATRIE framework. Its core is the **Persona-Prosody Dual-Track (P2-DT) architecture**, which decouples speech generation into a static **timbre track** (an identity anchor preserved via scalar quantization) and a dynamic **prosody…</description>
    </item>
    <item>
      <title>Audio-Cogito: Towards Deep Audio Reasoning in Large Audio Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-audio-cogito-towards-deep-audio-reasoning-in/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-audio-cogito-towards-deep-audio-reasoning-in/</guid>
      <description>This paper addresses large audio language models' (LALMs) limited capability and opaque reasoning on complex audio reasoning tasks. Its **core contribution** is a fully open-source solution named **Audio-Cogito**, built around **Cogito-Pipe**, a four-stage automated data-construction pipeline for generating high-quality, diverse audio chain-of-thought (CoT) reasoning data…</description>
    </item>
    <item>
      <title>AVRT: Audio-Visual Reasoning Transfer through Single-Modality Teachers</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-avrt-audio-visual-reasoning-transfer-through/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-avrt-audio-visual-reasoning-transfer-through/</guid>
      <description>This paper tackles the core challenge that multimodal large models lack high-quality training data for joint audio-visual reasoning. Its **core contribution** is the AVRT framework, which synthesizes multimodal reasoning data by composing the capabilities of single-modality expert models. The **key method** has two steps: 1) **data generation**: using a dedicated vision teacher (Kimi-VL-Thinking) and an audio teacher (Audio Flami…</description>
    </item>
    <item>
      <title>HARNESS: Lightweight Distilled Arabic Speech Foundation Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-harness-lightweight-distilled-arabic-speech/</link>
      <pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-harness-lightweight-distilled-arabic-speech/</guid>
      <description>Targeting the underperformance of general multilingual/English models on Arabic speech recognition, dialect identification, and emotion recognition, as well as the difficulty of deploying large models, this paper proposes HArnESS, a family of Arabic-centric self-supervised speech models. The authors adopt a HuBERT-style iterative self-distillation framework, first training, on large-scale Arabic-English bilingual data (about 23K hours), a 24-layer teacher model H…</description>
    </item>
    <item>
      <title>Audio-Cogito: Towards Deep Audio Reasoning in Large Audio Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-audio-cogito-towards-deep-audio-reasoning-in/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-audio-cogito-towards-deep-audio-reasoning-in/</guid>
      <description>This paper aims to solve large audio language models' (LALMs) limited capability on complex audio reasoning tasks and their reliance on expensive closed-source data. The authors propose a fully open-source solution named **Audio-Cogito**, centered on **Cogito-Pip…</description>
    </item>
    <item>
      <title>On the Distillation Loss Functions of Speech VAE for Unified Reconstruction, Understanding, and Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-on-the-distillation-loss-functions-of-speech-vae/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-on-the-distillation-loss-functions-of-speech-vae/</guid>
      <description>Addressing the imbalanced performance of existing speech variational autoencoders (VAEs) on unified speech reconstruction, understanding, and generation (understanding in particular being weak), this paper systematically studies the design space of distillation loss functions. The authors explore three ways of distilling self-supervised learning (SSL) model knowledge into the VA…</description>
    </item>
    <item>
      <title>Why Your Tokenizer Fails in Information Fusion: A Timing-Aware Pre-Quantization Fusion for Video-Enhanced Audio Tokenization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-why-your-tokenizer-fails-in-information-fusion-a/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-why-your-tokenizer-fails-in-information-fusion-a/</guid>
      <description>This paper examines the core tension, common in end-to-end audio language models, that fusing visual information into an audio tokenizer improves understanding while degrading reconstruction quality. Through systematic experiments, the authors reveal three key findings: the fusion position (before vs. after quantization) is critical; in…</description>
    </item>
  </channel>
</rss>
