<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>多任务学习 on 语音/音频论文速递</title>
    <link>https://nanless.github.io/audio-paper-digest-blog/tags/%E5%A4%9A%E4%BB%BB%E5%8A%A1%E5%AD%A6%E4%B9%A0/</link>
    <description>Recent content in 多任务学习 on 语音/音频论文速递</description>
    <generator>Hugo</generator>
    <language>zh-cn</language>
    <lastBuildDate>Wed, 29 Apr 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://nanless.github.io/audio-paper-digest-blog/tags/%E5%A4%9A%E4%BB%BB%E5%8A%A1%E5%AD%A6%E4%B9%A0/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>A Task-Aware Dual-Level Self-Supervised Learning Method for Effective Sound Event Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-task-aware-dual-level-self-supervised-learning/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-task-aware-dual-level-self-supervised-learning/</guid>
      <description>音频事件检测 | 7.5/10</description>
    </item>
    <item>
      <title>ACAVCaps: Enabling Large-Scale Training for Fine-Grained and Diverse Audio Understanding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-acavcaps-enabling-large-scale-training-for-fine/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-acavcaps-enabling-large-scale-training-for-fine/</guid>
      <description>音频分类 | 8.5/10</description>
    </item>
    <item>
      <title>AccLID: Accent-aware Language Identification for Robust Multilingual Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-acclid-accent-aware-language-identification-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-acclid-accent-aware-language-identification-for/</guid>
      <description>语音识别 | 7.0/10</description>
    </item>
    <item>
      <title>Advanced modeling of interlanguage speech intelligibility benefit with L1-L2 multi-task learning using differentiable K-means for accent-robust discrete token-based ASR</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-advanced-modeling-of-interlanguage-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-advanced-modeling-of-interlanguage-speech/</guid>
      <description>语音识别 | 7.0/10</description>
    </item>
    <item>
      <title>An Envelope Separation Aided Multi-Task Learning Model for Blind Source Counting and Localization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-an-envelope-separation-aided-multi-task-learning/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-an-envelope-separation-aided-multi-task-learning/</guid>
      <description>声源定位 | 6.5/10</description>
    </item>
    <item>
      <title>Assessing the Impact of Speaker Identity in Speech Spoofing Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-assessing-the-impact-of-speaker-identity-in/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-assessing-the-impact-of-speaker-identity-in/</guid>
      <description>音频深度伪造检测 | 8.0/10</description>
    </item>
    <item>
      <title>ATOM: Adaptive Token-Level Optimal Transport Mixup for Speech Translation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-atom-adaptive-token-level-optimal-transport-mixup/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-atom-adaptive-token-level-optimal-transport-mixup/</guid>
      <description>语音翻译 | 8.0/10</description>
    </item>
    <item>
      <title>Auden-Voice: General-Purpose Voice Encoder for Speech and Language Understanding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-auden-voice-general-purpose-voice-encoder-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-auden-voice-general-purpose-voice-encoder-for/</guid>
      <description>语音编码器 | 7.5/10</description>
    </item>
    <item>
      <title>Audio-Visual Feature Fusion for Calibrating Relevance Scores of Video Moment Retrieval</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-audio-visual-feature-fusion-for-calibrating/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-audio-visual-feature-fusion-for-calibrating/</guid>
      <description>视频片段检索 | 7.0/10</description>
    </item>
    <item>
      <title>Auxiliary Multi-Label Training For Improving the Robustness of Audio Deepfake Detection on AI-Processed Data</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-auxiliary-multi-label-training-for-improving-the/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-auxiliary-multi-label-training-for-improving-the/</guid>
      <description>音频深度伪造检测 | 6.5/10</description>
    </item>
    <item>
      <title>Beyond Global Emotion: Fine-Grained Emotional Speech Synthesis with Dynamic Word-Level Modulation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-beyond-global-emotion-fine-grained-emotional/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-beyond-global-emotion-fine-grained-emotional/</guid>
      <description>语音合成 | 7.5/10</description>
    </item>
    <item>
      <title>Brainprint-Modulated Target Speaker Extraction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-brainprint-modulated-target-speaker-extraction/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-brainprint-modulated-target-speaker-extraction/</guid>
      <description>语音分离 | 8.0/10</description>
    </item>
    <item>
      <title>CALM: Joint Contextual Acoustic-Linguistic Modeling for Personalization of Multi-Speaker ASR</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-calm-joint-contextual-acoustic-linguistic/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-calm-joint-contextual-acoustic-linguistic/</guid>
      <description>语音识别 | 7.5/10</description>
    </item>
    <item>
      <title>Class-Aware Permutation-Invariant Signal-to-Distortion Ratio for Semantic Segmentation of Sound Scene with Same-Class Sources</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-class-aware-permutation-invariant-signal-to/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-class-aware-permutation-invariant-signal-to/</guid>
      <description>音频场景理解 | 7.5/10</description>
    </item>
    <item>
      <title>CodeSep: Low-Bitrate Codec-Driven Speech Separation with Base-Token Disentanglement and Auxiliary-Token Serial Prediction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-codesep-low-bitrate-codec-driven-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-codesep-low-bitrate-codec-driven-speech/</guid>
      <description>语音分离 | 7.5/10</description>
    </item>
    <item>
      <title>CompSpoof: A Dataset and Joint Learning Framework for Component-Level Audio Anti-Spoofing Countermeasures</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-compspoof-a-dataset-and-joint-learning-framework/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-compspoof-a-dataset-and-joint-learning-framework/</guid>
      <description>音频深度伪造检测 | 7.0/10</description>
    </item>
    <item>
      <title>Context-Aware Dynamic Graph Learning for Multimodal Emotion Recognition with Missing Modalities</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-context-aware-dynamic-graph-learning-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-context-aware-dynamic-graph-learning-for/</guid>
      <description>语音情感识别 | 8.8/10</description>
    </item>
    <item>
      <title>Contextual Biasing for ASR in Speech LLM with Common Word Cues and Bias Word Position Prediction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-contextual-biasing-for-asr-in-speech-llm-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-contextual-biasing-for-asr-in-speech-llm-with/</guid>
      <description>语音识别 | 7.0/10</description>
    </item>
    <item>
      <title>Cross-Modal Knowledge Distillation for Speech Large Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cross-modal-knowledge-distillation-for-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cross-modal-knowledge-distillation-for-speech/</guid>
      <description>语音大模型 | 7.0/10</description>
    </item>
    <item>
      <title>Decoder-Only Conformer with Modality-Aware Sparse Mixtures of Experts for ASR</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-decoder-only-conformer-with-modality-aware-sparse/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-decoder-only-conformer-with-modality-aware-sparse/</guid>
      <description>语音识别 | 7.5/10</description>
    </item>
    <item>
      <title>DMP-TTS: Disentangled Multi-Modal Prompting for Controllable Text-to-Speech with Chained Guidance</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dmp-tts-disentangled-multi-modal-prompting-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dmp-tts-disentangled-multi-modal-prompting-for/</guid>
      <description>语音合成 | 7.5/10</description>
    </item>
    <item>
      <title>DPO-Regularized Regression for Age Prediction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dpo-regularized-regression-for-age-prediction/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dpo-regularized-regression-for-age-prediction/</guid>
      <description>说话人识别 | 7.5/10</description>
    </item>
    <item>
      <title>DSpAST: Disentangled Representations for Spatial Audio Reasoning with Large Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dspast-disentangled-representations-for-spatial/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dspast-disentangled-representations-for-spatial/</guid>
      <description>音频问答 | 8.0/10</description>
    </item>
    <item>
      <title>Dual Data Scaling for Robust Two-Stage User-Defined Keyword Spotting</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dual-data-scaling-for-robust-two-stage-user/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dual-data-scaling-for-robust-two-stage-user/</guid>
      <description>语音活动检测 | 7.5/10</description>
    </item>
    <item>
      <title>Dual-Strategy-Enhanced Conbimamba for Neural Speaker Diarization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dual-strategy-enhanced-conbimamba-for-neural/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dual-strategy-enhanced-conbimamba-for-neural/</guid>
      <description>说话人分离 | 8.0/10</description>
    </item>
    <item>
      <title>Dynamic Balanced Cross-Modal Attention with Gated Sequence Restoration: Towards Robust Multimodal Sentiment Analysis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dynamic-balanced-cross-modal-attention-with-gated/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dynamic-balanced-cross-modal-attention-with-gated/</guid>
      <description>跨模态 | 7.5/10</description>
    </item>
    <item>
      <title>E2E-AEC: Implementing An End-To-End Neural Network Learning Approach for Acoustic Echo Cancellation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-e2e-aec-implementing-an-end-to-end-neural-network/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-e2e-aec-implementing-an-end-to-end-neural-network/</guid>
      <description>语音增强 | 7.5/10</description>
    </item>
    <item>
      <title>EEG and Eye-Tracking Driven Dynamic Target Speaker Extraction with Spontaneous Attention Switching</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-eeg-and-eye-tracking-driven-dynamic-target/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-eeg-and-eye-tracking-driven-dynamic-target/</guid>
      <description>语音分离 | 7.0/10</description>
    </item>
    <item>
      <title>EMG-to-Speech with Fewer Channels</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emg-to-speech-with-fewer-channels/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emg-to-speech-with-fewer-channels/</guid>
      <description>语音合成 | 7.5/10</description>
    </item>
    <item>
      <title>EmoTri-RL: Emotion- and Cause-Aware Reinforcement Learning for Multi-Modal Empathetic Dialogue</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emotri-rl-emotion-and-cause-aware-reinforcement/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emotri-rl-emotion-and-cause-aware-reinforcement/</guid>
      <description>语音情感识别 | 7.0/10</description>
    </item>
    <item>
      <title>Enhancing Speech Intelligibility Prediction for Hearing Aids with Complementary Speech Foundation Model Representations</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-enhancing-speech-intelligibility-prediction-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-enhancing-speech-intelligibility-prediction-for/</guid>
      <description>语音增强 | 7.5/10</description>
    </item>
    <item>
      <title>Estimating Respiratory Effort from Nocturnal Breathing Sounds for Obstructive Sleep Apnoea Screening</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-estimating-respiratory-effort-from-nocturnal/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-estimating-respiratory-effort-from-nocturnal/</guid>
      <description>音频分类 | 6.5/10</description>
    </item>
    <item>
      <title>From Contrast to Commonality: Audio Commonality Captioning for Enhanced Audio-Text Cross-Modal Understanding in Multimodal LLMS</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-from-contrast-to-commonality-audio-commonality/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-from-contrast-to-commonality-audio-commonality/</guid>
      <description>音频场景理解 | 7.5/10</description>
    </item>
    <item>
      <title>From Diet to Free Lunch: Estimating Auxiliary Signal Properties Using Dynamic Pruning Masks in Speech Enhancement Networks</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-from-diet-to-free-lunch-estimating-auxiliary/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-from-diet-to-free-lunch-estimating-auxiliary/</guid>
      <description>语音增强 | 7.5/10</description>
    </item>
    <item>
      <title>FUSEMOS: Perceptual Evaluation of Text-to-Music Generation with Dual-Encoder Fusion and Ranking-Aware Composite Loss</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fusemos-perceptual-evaluation-of-text-to-music/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fusemos-perceptual-evaluation-of-text-to-music/</guid>
      <description>音乐生成 | 7.5/10</description>
    </item>
    <item>
      <title>Fusion of Multimodal Estimations by Extended State Hidden Markov Model: Application to Fetal Heart Rate Monitoring</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fusion-of-multimodal-estimations-by-extended/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fusion-of-multimodal-estimations-by-extended/</guid>
      <description>生物声学 | 7.0/10</description>
    </item>
    <item>
      <title>GLUE: Gradient-free Learning to Unify Experts</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-glue-gradient-free-learning-to-unify-experts/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-glue-gradient-free-learning-to-unify-experts/</guid>
      <description>迁移学习 | 6.5/10</description>
    </item>
    <item>
      <title>GRNet: Graph Reconstruction Network for Robust Multimodal Sentiment Analysis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-grnet-graph-reconstruction-network-for-robust/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-grnet-graph-reconstruction-network-for-robust/</guid>
      <description>多模态情感分析 | 7.5/10</description>
    </item>
    <item>
      <title>Hierarchical Activity Recognition and Captioning from Long-Form Audio</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hierarchical-activity-recognition-and-captioning/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hierarchical-activity-recognition-and-captioning/</guid>
      <description>音频事件检测 | 7.5/10</description>
    </item>
    <item>
      <title>Improving Binaural Distance Estimation in Reverberant Rooms Through Contrastive And Multi-Task Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-improving-binaural-distance-estimation-in/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-improving-binaural-distance-estimation-in/</guid>
      <description>声源定位 | 7.0/10</description>
    </item>
    <item>
      <title>In-Sync: Adaptation of Speech Aware Large Language Models for ASR with Word level timestamp predictions</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-in-sync-adaptation-of-speech-aware-large-language/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-in-sync-adaptation-of-speech-aware-large-language/</guid>
      <description>语音识别 | 7.0/10</description>
    </item>
    <item>
      <title>InstructAudio: Unified Speech and Music Generation with Natural Language Instruction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-instructaudio-unified-speech-and-music-generation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-instructaudio-unified-speech-and-music-generation/</guid>
      <description>语音合成 | 7.5/10</description>
    </item>
    <item>
      <title>It Is Personal: The Importance of Personalization for Recognizing Self-Reported Emotion</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-it-is-personal-the-importance-of-personalization/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-it-is-personal-the-importance-of-personalization/</guid>
      <description>语音情感识别 | 8.0/10</description>
    </item>
    <item>
      <title>Joint Autoregressive Modeling of Multi-Talker Overlapped Speech Recognition and Translation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-joint-autoregressive-modeling-of-multi-talker/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-joint-autoregressive-modeling-of-multi-talker/</guid>
      <description>语音识别 语音翻译 | 7.0/10</description>
    </item>
    <item>
      <title>Joint Estimation of Piano Dynamics and Metrical Structure with a Multi-Task Multi-Scale Network</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-joint-estimation-of-piano-dynamics-and-metrical/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-joint-estimation-of-piano-dynamics-and-metrical/</guid>
      <description>音乐理解 | 7.5/10</description>
    </item>
    <item>
      <title>Malefa: Multi-Granularity Learning and Effective False Alarm Suppression for Zero-Shot Keyword Spotting</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-malefa-multi-granularity-learning-and-effective/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-malefa-multi-granularity-learning-and-effective/</guid>
      <description>零样本关键词检测 | 7.5/10</description>
    </item>
    <item>
      <title>Matrix-Structured Hierarchical Convolutional Modeling for Pronunciation Assessment and Mispronunciation Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-matrix-structured-hierarchical-convolutional/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-matrix-structured-hierarchical-convolutional/</guid>
      <description>语音评估 | 8.0/10</description>
    </item>
    <item>
      <title>MC-MRX: Reference- and Midi-Guided Music Source Extraction with Contrastive Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mc-mrx-reference-and-midi-guided-music-source/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mc-mrx-reference-and-midi-guided-music-source/</guid>
      <description>音乐源提取 | 7.0/10</description>
    </item>
    <item>
      <title>Melos: Sentence-To-Section Training with Multi-Task Learning for LLM-Driven Song Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-melos-sentence-to-section-training-with-multi/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-melos-sentence-to-section-training-with-multi/</guid>
      <description>音乐生成 | 6.5/10</description>
    </item>
    <item>
      <title>Mixtures of Lightweight Articulatory Experts for Multilingual Asr</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mixtures-of-lightweight-articulatory-experts-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mixtures-of-lightweight-articulatory-experts-for/</guid>
      <description>语音识别 | 7.0/10</description>
    </item>
    <item>
      <title>ML-SAN: Multi-Level Speaker-Adaptive Network for Emotion Recognition in Conversations</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ml-san-multi-level-speaker-adaptive-network-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ml-san-multi-level-speaker-adaptive-network-for/</guid>
      <description>语音情感识别 | 8.0/10</description>
    </item>
    <item>
      <title>MNV-17: A High-Quality Performative Mandarin Dataset for Nonverbal Vocalization Recognition in Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mnv-17-a-high-quality-performative-mandarin/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mnv-17-a-high-quality-performative-mandarin/</guid>
      <description>语音识别 | 7.5/10</description>
    </item>
    <item>
      <title>MTP-S2UT: Enhancing Speech-to-Speech Translation Quality with Multi-Token Prediction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mtp-s2ut-enhancing-speech-to-speech-translation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mtp-s2ut-enhancing-speech-to-speech-translation/</guid>
      <description>语音翻译 | 8.5/10</description>
    </item>
    <item>
      <title>Multi-Task Learning For Speech Quality Assessment Using ASR-Derived Entropy Features</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multi-task-learning-for-speech-quality-assessment/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multi-task-learning-for-speech-quality-assessment/</guid>
      <description>语音质量评估 | 7.5/10</description>
    </item>
    <item>
      <title>Multi-Task Transformer for Explainable Speech Deepfake Detection via Formant Modeling</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multi-task-transformer-for-explainable-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multi-task-transformer-for-explainable-speech/</guid>
      <description>语音伪造检测 | 7.5/10</description>
    </item>
    <item>
      <title>NeuroSIFT: A Biologically-Inspired Framework with Explicit Signal-Noise Separation for Robust Multimodal Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-neurosift-a-biologically-inspired-framework-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-neurosift-a-biologically-inspired-framework-with/</guid>
      <description>多模态情感识别 | 8.0/10</description>
    </item>
    <item>
      <title>Obstructive Sleep Apnea Endotype Prediction During Wakefulness Using Voice Biomarkers</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-obstructive-sleep-apnea-endotype-prediction/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-obstructive-sleep-apnea-endotype-prediction/</guid>
      <description>语音生物标志物 | 6.5/10</description>
    </item>
    <item>
      <title>OMNI-AVSR: Towards Unified Multimodal Speech Recognition With Large Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-omni-avsr-towards-unified-multimodal-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-omni-avsr-towards-unified-multimodal-speech/</guid>
      <description>语音识别 | 8.5/10</description>
    </item>
    <item>
      <title>One Model–Three Tasks: Discovering a Shared Winning Ticket for Low-Complexity Audio Intelligence</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-one-modelthree-tasks-discovering-a-shared-winning/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-one-modelthree-tasks-discovering-a-shared-winning/</guid>
      <description>音频分类 | 7.5/10</description>
    </item>
    <item>
      <title>PC-MCL: Patient-Consistent Multi-Cycle Learning with Multi-Label Bias Correction for Respiratory Sound Classification</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-pc-mcl-patient-consistent-multi-cycle-learning/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-pc-mcl-patient-consistent-multi-cycle-learning/</guid>
      <description>音频分类 | 7.5/10</description>
    </item>
    <item>
      <title>Peeking Into the Future for Contextual Biasing</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-peeking-into-the-future-for-contextual-biasing/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-peeking-into-the-future-for-contextual-biasing/</guid>
      <description>语音识别 | 7.0/10</description>
    </item>
    <item>
      <title>Phonological Tokenizer: Prosody-Aware Phonetic Token Via Multi-Objective Fine-Tuning with Differentiable K-Means</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-phonological-tokenizer-prosody-aware-phonetic/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-phonological-tokenizer-prosody-aware-phonetic/</guid>
      <description>Speech Representation Learning | 8.0/10</description>
    </item>
    <item>
      <title>Probing Whisper for Dysarthric Speech in Detection and Assessment</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-probing-whisper-for-dysarthric-speech-in/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-probing-whisper-for-dysarthric-speech-in/</guid>
      <description>Speech Biomarkers | 6.5/10</description>
    </item>
    <item>
      <title>Proficiency-Aware Adaptation and Data Augmentation for Robust L2 ASR</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-proficiency-aware-adaptation-and-data/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-proficiency-aware-adaptation-and-data/</guid>
      <description>Speech Recognition | 6.5/10</description>
    </item>
    <item>
      <title>PROST-LLM: Progressively Enhancing the Speech-to-Speech Translation Capability in LLMs</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-prost-llm-progressively-enhancing-the-speech-to/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-prost-llm-progressively-enhancing-the-speech-to/</guid>
      <description>Speech Translation | 7.5/10</description>
    </item>
    <item>
      <title>Purification Before Fusion: Toward Mask-Free Speech Enhancement for Robust Audio-Visual Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-purification-before-fusion-toward-mask-free/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-purification-before-fusion-toward-mask-free/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Reference-Aware SFM Layers for Intrusive Intelligibility Prediction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-reference-aware-sfm-layers-for-intrusive/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-reference-aware-sfm-layers-for-intrusive/</guid>
      <description>Speech Assessment | 7.5/10</description>
    </item>
    <item>
      <title>SEP-ST: Incorporating Speech Entity Prompt Into Large Language Models for Speech Translation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sep-st-incorporating-speech-entity-prompt-into/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sep-st-incorporating-speech-entity-prompt-into/</guid>
      <description>Speech Translation | 7.5/10</description>
    </item>
    <item>
      <title>Session-Level Spoken Language Assessment with A Multimodal Foundation Model Via Multi-Target Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-session-level-spoken-language-assessment-with-a/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-session-level-spoken-language-assessment-with-a/</guid>
      <description>Speech Assessment | 7.5/10</description>
    </item>
    <item>
      <title>Shared Representation Learning for Reference-Guided Targeted Sound Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-shared-representation-learning-for-reference/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-shared-representation-learning-for-reference/</guid>
      <description>Sound Event Detection | 8.5/10</description>
    </item>
    <item>
      <title>Stress Prediction from Temporal Emotion Trajectories in Clinical Patient-Physician Conversations</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-stress-prediction-from-temporal-emotion/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-stress-prediction-from-temporal-emotion/</guid>
      <description>Speech Emotion Recognition | 7.0/10</description>
    </item>
    <item>
      <title>Task-Oriented Sound Privacy Preservation for Sound Event Detection Via End-to-End Adversarial Multi-Task Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-task-oriented-sound-privacy-preservation-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-task-oriented-sound-privacy-preservation-for/</guid>
      <description>Sound Event Detection | 7.5/10</description>
    </item>
    <item>
      <title>Text2Move: Text-To-Moving Sound Generation via Trajectory Prediction and Temporal Alignment</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-text2move-text-to-moving-sound-generation-via/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-text2move-text-to-moving-sound-generation-via/</guid>
      <description>Spatial Audio | 8.0/10</description>
    </item>
    <item>
      <title>Tokenchain: A Discrete Speech Chain via Semantic Token Modeling</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tokenchain-a-discrete-speech-chain-via-semantic/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tokenchain-a-discrete-speech-chain-via-semantic/</guid>
      <description>Speech Recognition | 7.0/10</description>
    </item>
    <item>
      <title>Towards Building Speech Large Language Models for Multitask Understanding in Low-Resource Languages</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-building-speech-large-language-models-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-building-speech-large-language-models-for/</guid>
      <description>Speech Recognition | 6.5/10</description>
    </item>
    <item>
      <title>Transfer Learning for Paediatric Sleep Apnoea Detection using Physiology-Guided Acoustic Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-transfer-learning-for-paediatric-sleep-apnoea/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-transfer-learning-for-paediatric-sleep-apnoea/</guid>
      <description>Audio Classification | 7.0/10</description>
    </item>
    <item>
      <title>Triad: Tri-Head with Auxiliary Duplicating Permutation Invariant Training for Multi-Task Sound Event Localization and Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-triad-tri-head-with-auxiliary-duplicating/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-triad-tri-head-with-auxiliary-duplicating/</guid>
      <description>Sound Event Detection | 7.5/10</description>
    </item>
    <item>
      <title>TTA: Transcribe, Translate and Alignment for Cross-Lingual Speech Representation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tta-transcribe-translate-and-alignment-for-cross/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tta-transcribe-translate-and-alignment-for-cross/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Vioptt: Violin Technique-Aware Transcription from Synthetic Data Augmentation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-vioptt-violin-technique-aware-transcription-from/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-vioptt-violin-technique-aware-transcription-from/</guid>
      <description>Music Information Retrieval | 6.5/10</description>
    </item>
    <item>
      <title>Whisper-FEST: Single-Channel Far-Field Enhanced Speech-to-text without Parallel Data</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-whisper-fest-single-channel-far-field-enhanced/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-whisper-fest-single-channel-far-field-enhanced/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Whisper-QF: Leveraging Dual Cross-Attention Q-Former for Speech Emotion Recognition With Multi-Task Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-whisper-qf-leveraging-dual-cross-attention-q/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-whisper-qf-leveraging-dual-cross-attention-q/</guid>
      <description>Speech Emotion Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Speech/Audio Paper Digest 2026-04-29</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29/</guid>
      <description>29 speech/AI papers analyzed in total</description>
    </item>
    <item>
      <title>Explicit Dropout: Deterministic Regularization for Transformer Architectures</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-explicit-dropout-deterministic-regularization-for/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-explicit-dropout-deterministic-regularization-for/</guid>
      <description>This paper addresses the problem that conventional Dropout relies on random masking, leaving its regularization effect opaque and hard to control precisely. Its core method is a deterministic formulation that recasts Dropout as an explicit regularization term added directly to the training loss, with regularization expressions derived for the attention mechanism (Q, K, V) and the feed-forward networks of Transformer architectures. Compared with existing methods,</description>
    </item>
    <item>
      <title>FastTurn: Unifying Acoustic and Streaming Semantic Cues for Low-Latency and Robust Turn Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-fastturn-unifying-acoustic-and-streaming-semantic/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-fastturn-unifying-acoustic-and-streaming-semantic/</guid>
      <description>Targeting the challenge that full-duplex spoken dialogue systems must decide with low latency and high accuracy whether the user has finished speaking (turn detection), this paper proposes the unified FastTurn framework. Its core method feeds the fast partial semantic information provided by streaming CTC decoding, together with acoustic features extracted by a Conformer encoder, through adapters into a large language model (LLM) for reasoning, and finally fuses acoustic and semantic features for turn prediction.</description>
    </item>
    <item>
      <title>Speech/Audio Paper Digest 2026-04-23</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23/</guid>
      <description>27 speech/AI papers analyzed in total</description>
    </item>
    <item>
      <title>Incremental learning for audio classification with Hebbian Deep Neural Networks</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-incremental-learning-for-audio-classification/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-incremental-learning-for-audio-classification/</guid>
      <description>This paper proposes a biologically inspired solution to incremental (continual) learning for audio classification. The core problem is the "catastrophic forgetting" of old knowledge when deep learning models learn new tasks. The authors are the first to combine **Hebbian learning** (an unsupervised, feedback-free learning rule based on synchronized neuron activation) with **incremental learning**, and design a **kernel plasticity** mechanism. This mechanism analyzes the training</description>
    </item>
    <item>
      <title>SELF-EMO: Emotional Self-Evolution from Recognition to Consistent Expression</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-self-emo-emotional-self-evolution-from/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-self-emo-emotional-self-evolution-from/</guid>
      <description>This paper addresses the problem that emotion recognition in conversation (ERC) and emotional expression in dialogue systems are limited by scarce and static high-quality annotated data. The **core contribution** is a psychologically motivated self-evolution framework, **SELF-EMO**. The **key method** builds a role-playing self-play paradigm in which the model acts as both an "emotion recognizer" and a "dialogue responder", and uses a "generate-filter-reuse" data</description>
    </item>
    <item>
      <title>Speech/Audio Paper Digest 2026-04-21</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21/</guid>
      <description>34 speech/AI papers analyzed in total</description>
    </item>
    <item>
      <title>Generalizable Audio-Visual Navigation via Binaural Difference Attention and Action Transition Prediction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-generalizable-audio-visual-navigation-via/</link>
      <pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-generalizable-audio-visual-navigation-via/</guid>
      <description>This paper addresses the core problem that audio-visual navigation (AVN) agents generalize poorly to unseen environments and unheard sound categories. The authors attribute the performance drop of existing methods to two factors: audio representations entangle semantic and spatial information, leading to inaccurate localization of unheard sounds; and reinforcement learning policies overfit to the dynamics and layouts of the training environments. To this end, the paper proposes a plug-and-play framework named BDATP. At the perception level</description>
    </item>
    <item>
      <title>Speech/Audio Paper Digest 2026-04-20</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20/</link>
      <pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20/</guid>
      <description>24 speech/AI papers analyzed in total</description>
    </item>
  </channel>
</rss>
