<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>多任务学习 on 语音/音频论文速递</title>
    <link>https://nanless.github.io/audio-paper-digest-blog/tags/%E5%A4%9A%E4%BB%BB%E5%8A%A1%E5%AD%A6%E4%B9%A0/</link>
    <description>Recent content in 多任务学习 on 语音/音频论文速递</description>
    <generator>Hugo</generator>
    <language>zh-cn</language>
    <lastBuildDate>Wed, 29 Apr 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://nanless.github.io/audio-paper-digest-blog/tags/%E5%A4%9A%E4%BB%BB%E5%8A%A1%E5%AD%A6%E4%B9%A0/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>A Task-Aware Dual-Level Self-Supervised Learning Method for Effective Sound Event Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-task-aware-dual-level-self-supervised-learning/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-task-aware-dual-level-self-supervised-learning/</guid>
      <description>音频事件检测 | 7.5/10</description>
    </item>
    <item>
      <title>ACAVCaps: Enabling Large-Scale Training for Fine-Grained and Diverse Audio Understanding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-acavcaps-enabling-large-scale-training-for-fine/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-acavcaps-enabling-large-scale-training-for-fine/</guid>
      <description>音频分类 | 8.5/10</description>
    </item>
    <item>
      <title>AccLID: Accent-aware Language Identification for Robust Multilingual Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-acclid-accent-aware-language-identification-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-acclid-accent-aware-language-identification-for/</guid>
      <description>语音识别 | 7.0/10</description>
    </item>
    <item>
      <title>Advanced modeling of interlanguage speech intelligibility benefit with L1-L2 multi-task learning using differentiable K-means for accent-robust discrete token-based ASR</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-advanced-modeling-of-interlanguage-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-advanced-modeling-of-interlanguage-speech/</guid>
      <description>语音识别 | 7.0/10</description>
    </item>
    <item>
      <title>An Envelope Separation Aided Multi-Task Learning Model for Blind Source Counting and Localization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-an-envelope-separation-aided-multi-task-learning/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-an-envelope-separation-aided-multi-task-learning/</guid>
      <description>声源定位 | 6.5/10</description>
    </item>
    <item>
      <title>Assessing the Impact of Speaker Identity in Speech Spoofing Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-assessing-the-impact-of-speaker-identity-in/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-assessing-the-impact-of-speaker-identity-in/</guid>
      <description>音频深度伪造检测 | 8.0/10</description>
    </item>
    <item>
      <title>ATOM: Adaptive Token-Level Optimal Transport Mixup for Speech Translation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-atom-adaptive-token-level-optimal-transport-mixup/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-atom-adaptive-token-level-optimal-transport-mixup/</guid>
      <description>语音翻译 | 8.0/10</description>
    </item>
    <item>
      <title>Auden-Voice: General-Purpose Voice Encoder for Speech and Language Understanding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-auden-voice-general-purpose-voice-encoder-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-auden-voice-general-purpose-voice-encoder-for/</guid>
      <description>语音编码器 | 7.5/10</description>
    </item>
    <item>
      <title>Audio-Visual Feature Fusion for Calibrating Relevance Scores of Video Moment Retrieval</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-audio-visual-feature-fusion-for-calibrating/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-audio-visual-feature-fusion-for-calibrating/</guid>
      <description>视频片段检索 | 7.0/10</description>
    </item>
    <item>
      <title>Auxiliary Multi-Label Training For Improving the Robustness of Audio Deepfake Detection on AI-Processed Data</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-auxiliary-multi-label-training-for-improving-the/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-auxiliary-multi-label-training-for-improving-the/</guid>
      <description>音频深度伪造检测 | 6.5/10</description>
    </item>
    <item>
      <title>Beyond Global Emotion: Fine-Grained Emotional Speech Synthesis with Dynamic Word-Level Modulation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-beyond-global-emotion-fine-grained-emotional/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-beyond-global-emotion-fine-grained-emotional/</guid>
      <description>语音合成 | 7.5/10</description>
    </item>
    <item>
      <title>Brainprint-Modulated Target Speaker Extraction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-brainprint-modulated-target-speaker-extraction/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-brainprint-modulated-target-speaker-extraction/</guid>
      <description>语音分离 | 8.0/10</description>
    </item>
    <item>
      <title>CALM: Joint Contextual Acoustic-Linguistic Modeling for Personalization of Multi-Speaker ASR</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-calm-joint-contextual-acoustic-linguistic/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-calm-joint-contextual-acoustic-linguistic/</guid>
      <description>语音识别 | 7.5/10</description>
    </item>
    <item>
      <title>Class-Aware Permutation-Invariant Signal-to-Distortion Ratio for Semantic Segmentation of Sound Scene with Same-Class Sources</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-class-aware-permutation-invariant-signal-to/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-class-aware-permutation-invariant-signal-to/</guid>
      <description>音频场景理解 | 7.5/10</description>
    </item>
    <item>
      <title>CodeSep: Low-Bitrate Codec-Driven Speech Separation with Base-Token Disentanglement and Auxiliary-Token Serial Prediction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-codesep-low-bitrate-codec-driven-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-codesep-low-bitrate-codec-driven-speech/</guid>
      <description>语音分离 | 7.5/10</description>
    </item>
    <item>
      <title>CompSpoof: A Dataset and Joint Learning Framework for Component-Level Audio Anti-Spoofing Countermeasures</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-compspoof-a-dataset-and-joint-learning-framework/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-compspoof-a-dataset-and-joint-learning-framework/</guid>
      <description>音频深度伪造检测 | 7.0/10</description>
    </item>
    <item>
      <title>Context-Aware Dynamic Graph Learning for Multimodal Emotion Recognition with Missing Modalities</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-context-aware-dynamic-graph-learning-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-context-aware-dynamic-graph-learning-for/</guid>
      <description>语音情感识别 | 8.8/10</description>
    </item>
    <item>
      <title>Contextual Biasing for ASR in Speech LLM with Common Word Cues and Bias Word Position Prediction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-contextual-biasing-for-asr-in-speech-llm-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-contextual-biasing-for-asr-in-speech-llm-with/</guid>
      <description>语音识别 | 7.0/10</description>
    </item>
    <item>
      <title>Cross-Modal Knowledge Distillation for Speech Large Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cross-modal-knowledge-distillation-for-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cross-modal-knowledge-distillation-for-speech/</guid>
      <description>语音大模型 | 7.0/10</description>
    </item>
    <item>
      <title>Decoder-Only Conformer with Modality-Aware Sparse Mixtures of Experts for ASR</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-decoder-only-conformer-with-modality-aware-sparse/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-decoder-only-conformer-with-modality-aware-sparse/</guid>
      <description>语音识别 | 7.5/10</description>
    </item>
    <item>
      <title>DMP-TTS: Disentangled Multi-Modal Prompting for Controllable Text-to-Speech with Chained Guidance</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dmp-tts-disentangled-multi-modal-prompting-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dmp-tts-disentangled-multi-modal-prompting-for/</guid>
      <description>语音合成 | 7.5/10</description>
    </item>
    <item>
      <title>DPO-Regularized Regression for Age Prediction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dpo-regularized-regression-for-age-prediction/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dpo-regularized-regression-for-age-prediction/</guid>
      <description>说话人识别 | 7.5/10</description>
    </item>
    <item>
      <title>DSpAST: Disentangled Representations for Spatial Audio Reasoning with Large Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dspast-disentangled-representations-for-spatial/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dspast-disentangled-representations-for-spatial/</guid>
      <description>音频问答 | 8.0/10</description>
    </item>
    <item>
      <title>Dual Data Scaling for Robust Two-Stage User-Defined Keyword Spotting</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dual-data-scaling-for-robust-two-stage-user/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dual-data-scaling-for-robust-two-stage-user/</guid>
      <description>语音活动检测 | 7.5/10</description>
    </item>
    <item>
      <title>Dual-Strategy-Enhanced Conbimamba for Neural Speaker Diarization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dual-strategy-enhanced-conbimamba-for-neural/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dual-strategy-enhanced-conbimamba-for-neural/</guid>
      <description>说话人分离 | 8.0/10</description>
    </item>
    <item>
      <title>Dynamic Balanced Cross-Modal Attention with Gated Sequence Restoration: Towards Robust Multimodal Sentiment Analysis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dynamic-balanced-cross-modal-attention-with-gated/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dynamic-balanced-cross-modal-attention-with-gated/</guid>
      <description>跨模态 | 7.5/10</description>
    </item>
    <item>
      <title>E2E-AEC: Implementing An End-To-End Neural Network Learning Approach for Acoustic Echo Cancellation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-e2e-aec-implementing-an-end-to-end-neural-network/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-e2e-aec-implementing-an-end-to-end-neural-network/</guid>
      <description>语音增强 | 7.5/10</description>
    </item>
    <item>
      <title>EEG and Eye-Tracking Driven Dynamic Target Speaker Extraction with Spontaneous Attention Switching</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-eeg-and-eye-tracking-driven-dynamic-target/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-eeg-and-eye-tracking-driven-dynamic-target/</guid>
      <description>语音分离 | 7.0/10</description>
    </item>
    <item>
      <title>EMG-to-Speech with Fewer Channels</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emg-to-speech-with-fewer-channels/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emg-to-speech-with-fewer-channels/</guid>
      <description>语音合成 | 7.5/10</description>
    </item>
    <item>
      <title>EmoTri-RL: Emotion- and Cause-Aware Reinforcement Learning for Multi-Modal Empathetic Dialogue</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emotri-rl-emotion-and-cause-aware-reinforcement/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emotri-rl-emotion-and-cause-aware-reinforcement/</guid>
      <description>语音情感识别 | 7.0/10</description>
    </item>
    <item>
      <title>Enhancing Speech Intelligibility Prediction for Hearing Aids with Complementary Speech Foundation Model Representations</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-enhancing-speech-intelligibility-prediction-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-enhancing-speech-intelligibility-prediction-for/</guid>
      <description>语音增强 | 7.5/10</description>
    </item>
    <item>
      <title>Estimating Respiratory Effort from Nocturnal Breathing Sounds for Obstructive Sleep Apnoea Screening</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-estimating-respiratory-effort-from-nocturnal/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-estimating-respiratory-effort-from-nocturnal/</guid>
      <description>音频分类 | 6.5/10</description>
    </item>
    <item>
      <title>From Contrast to Commonality: Audio Commonality Captioning for Enhanced Audio-Text Cross-Modal Understanding in Multimodal LLMS</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-from-contrast-to-commonality-audio-commonality/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-from-contrast-to-commonality-audio-commonality/</guid>
      <description>音频场景理解 | 7.5/10</description>
    </item>
    <item>
      <title>From Diet to Free Lunch: Estimating Auxiliary Signal Properties Using Dynamic Pruning Masks in Speech Enhancement Networks</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-from-diet-to-free-lunch-estimating-auxiliary/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-from-diet-to-free-lunch-estimating-auxiliary/</guid>
      <description>语音增强 | 7.5/10</description>
    </item>
    <item>
      <title>FUSEMOS: Perceptual Evaluation of Text-to-Music Generation with Dual-Encoder Fusion and Ranking-Aware Composite Loss</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fusemos-perceptual-evaluation-of-text-to-music/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fusemos-perceptual-evaluation-of-text-to-music/</guid>
      <description>音乐生成 | 7.5/10</description>
    </item>
    <item>
      <title>Fusion of Multimodal Estimations by Extended State Hidden Markov Model: Application to Fetal Heart Rate Monitoring</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fusion-of-multimodal-estimations-by-extended/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fusion-of-multimodal-estimations-by-extended/</guid>
      <description>生物声学 | 7.0/10</description>
    </item>
    <item>
      <title>GLUE: Gradient-free Learning to Unify Experts</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-glue-gradient-free-learning-to-unify-experts/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-glue-gradient-free-learning-to-unify-experts/</guid>
      <description>迁移学习 | 6.5/10</description>
    </item>
    <item>
      <title>GRNet: Graph Reconstruction Network for Robust Multimodal Sentiment Analysis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-grnet-graph-reconstruction-network-for-robust/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-grnet-graph-reconstruction-network-for-robust/</guid>
      <description>多模态情感分析 | 7.5/10</description>
    </item>
    <item>
      <title>Hierarchical Activity Recognition and Captioning from Long-Form Audio</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hierarchical-activity-recognition-and-captioning/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hierarchical-activity-recognition-and-captioning/</guid>
      <description>音频事件检测 | 7.5/10</description>
    </item>
    <item>
      <title>Improving Binaural Distance Estimation in Reverberant Rooms Through Contrastive And Multi-Task Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-improving-binaural-distance-estimation-in/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-improving-binaural-distance-estimation-in/</guid>
      <description>声源定位 | 7.0/10</description>
    </item>
    <item>
      <title>In-Sync: Adaptation of Speech Aware Large Language Models for ASR with Word level timestamp predictions</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-in-sync-adaptation-of-speech-aware-large-language/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-in-sync-adaptation-of-speech-aware-large-language/</guid>
      <description>语音识别 | 7.0/10</description>
    </item>
    <item>
      <title>InstructAudio: Unified Speech and Music Generation with Natural Language Instruction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-instructaudio-unified-speech-and-music-generation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-instructaudio-unified-speech-and-music-generation/</guid>
      <description>语音合成 | 7.5/10</description>
    </item>
    <item>
      <title>It Is Personal: The Importance of Personalization for Recognizing Self-Reported Emotion</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-it-is-personal-the-importance-of-personalization/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-it-is-personal-the-importance-of-personalization/</guid>
      <description>语音情感识别 | 8.0/10</description>
    </item>
    <item>
      <title>Joint Autoregressive Modeling of Multi-Talker Overlapped Speech Recognition and Translation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-joint-autoregressive-modeling-of-multi-talker/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-joint-autoregressive-modeling-of-multi-talker/</guid>
      <description>语音识别 语音翻译 | 7.0/10</description>
    </item>
    <item>
      <title>Joint Estimation of Piano Dynamics and Metrical Structure with a Multi-Task Multi-Scale Network</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-joint-estimation-of-piano-dynamics-and-metrical/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-joint-estimation-of-piano-dynamics-and-metrical/</guid>
      <description>音乐理解 | 7.5/10</description>
    </item>
    <item>
      <title>Malefa: Multi-Granularity Learning and Effective False Alarm Suppression for Zero-Shot Keyword Spotting</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-malefa-multi-granularity-learning-and-effective/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-malefa-multi-granularity-learning-and-effective/</guid>
      <description>零样本关键词检测 | 7.5/10</description>
    </item>
    <item>
      <title>Matrix-Structured Hierarchical Convolutional Modeling for Pronunciation Assessment and Mispronunciation Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-matrix-structured-hierarchical-convolutional/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-matrix-structured-hierarchical-convolutional/</guid>
      <description>语音评估 | 8.0/10</description>
    </item>
    <item>
      <title>MC-MRX: Reference- and Midi-Guided Music Source Extraction with Contrastive Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mc-mrx-reference-and-midi-guided-music-source/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mc-mrx-reference-and-midi-guided-music-source/</guid>
      <description>音乐源提取 | 7.0/10</description>
    </item>
    <item>
      <title>Melos: Sentence-To-Section Training with Multi-Task Learning for LLM-Driven Song Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-melos-sentence-to-section-training-with-multi/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-melos-sentence-to-section-training-with-multi/</guid>
      <description>音乐生成 | 6.5/10</description>
    </item>
    <item>
      <title>Mixtures of Lightweight Articulatory Experts for Multilingual Asr</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mixtures-of-lightweight-articulatory-experts-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mixtures-of-lightweight-articulatory-experts-for/</guid>
      <description>语音识别 | 7.0/10</description>
    </item>
    <item>
      <title>ML-SAN: Multi-Level Speaker-Adaptive Network for Emotion Recognition in Conversations</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ml-san-multi-level-speaker-adaptive-network-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ml-san-multi-level-speaker-adaptive-network-for/</guid>
      <description>语音情感识别 | 8.0/10</description>
    </item>
    <item>
      <title>MNV-17: A High-Quality Performative Mandarin Dataset for Nonverbal Vocalization Recognition in Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mnv-17-a-high-quality-performative-mandarin/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mnv-17-a-high-quality-performative-mandarin/</guid>
      <description>语音识别 | 7.5/10</description>
    </item>
    <item>
      <title>MTP-S2UT: Enhancing Speech-to-Speech Translation Quality with Multi-Token Prediction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mtp-s2ut-enhancing-speech-to-speech-translation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mtp-s2ut-enhancing-speech-to-speech-translation/</guid>
      <description>语音翻译 | 8.5/10</description>
    </item>
    <item>
      <title>Multi-Task Learning For Speech Quality Assessment Using ASR-Derived Entropy Features</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multi-task-learning-for-speech-quality-assessment/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multi-task-learning-for-speech-quality-assessment/</guid>
      <description>语音质量评估 | 7.5/10</description>
    </item>
    <item>
      <title>Multi-Task Transformer for Explainable Speech Deepfake Detection via Formant Modeling</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multi-task-transformer-for-explainable-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multi-task-transformer-for-explainable-speech/</guid>
      <description>语音伪造检测 | 7.5/10</description>
    </item>
    <item>
      <title>NeuroSIFT: A Biologically-Inspired Framework with Explicit Signal-Noise Separation for Robust Multimodal Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-neurosift-a-biologically-inspired-framework-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-neurosift-a-biologically-inspired-framework-with/</guid>
      <description>多模态情感识别 | 8.0/10</description>
    </item>
    <item>
      <title>Obstructive Sleep Apnea Endotype Prediction During Wakefulness Using Voice Biomarkers</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-obstructive-sleep-apnea-endotype-prediction/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-obstructive-sleep-apnea-endotype-prediction/</guid>
      <description>语音生物标志物 | 6.5/10</description>
    </item>
    <item>
      <title>OMNI-AVSR: Towards Unified Multimodal Speech Recognition With Large Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-omni-avsr-towards-unified-multimodal-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-omni-avsr-towards-unified-multimodal-speech/</guid>
      <description>语音识别 | 8.5/10</description>
    </item>
    <item>
      <title>One Model–Three Tasks: Discovering a Shared Winning Ticket for Low-Complexity Audio Intelligence</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-one-modelthree-tasks-discovering-a-shared-winning/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-one-modelthree-tasks-discovering-a-shared-winning/</guid>
      <description>音频分类 | 7.5/10</description>
    </item>
    <item>
      <title>PC-MCL: Patient-Consistent Multi-Cycle Learning with Multi-Label Bias Correction for Respiratory Sound Classification</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-pc-mcl-patient-consistent-multi-cycle-learning/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-pc-mcl-patient-consistent-multi-cycle-learning/</guid>
      <description>音频分类 | 7.5/10</description>
    </item>
    <item>
      <title>Peeking Into the Future for Contextual Biasing</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-peeking-into-the-future-for-contextual-biasing/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-peeking-into-the-future-for-contextual-biasing/</guid>
      <description>语音识别 | 7.0/10</description>
    </item>
    <item>
      <title>Phonological Tokenizer: Prosody-Aware Phonetic Token Via Multi-Objective Fine-Tuning with Differentiable K-Means</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-phonological-tokenizer-prosody-aware-phonetic/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-phonological-tokenizer-prosody-aware-phonetic/</guid>
      <description>Speech Representation Learning | 8.0/10</description>
    </item>
    <item>
      <title>Probing Whisper for Dysarthric Speech in Detection and Assessment</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-probing-whisper-for-dysarthric-speech-in/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-probing-whisper-for-dysarthric-speech-in/</guid>
      <description>Speech Biomarkers | 6.5/10</description>
    </item>
    <item>
      <title>Proficiency-Aware Adaptation and Data Augmentation for Robust L2 ASR</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-proficiency-aware-adaptation-and-data/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-proficiency-aware-adaptation-and-data/</guid>
      <description>Speech Recognition | 6.5/10</description>
    </item>
    <item>
      <title>PROST-LLM: Progressively Enhancing the Speech-to-Speech Translation Capability in LLMs</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-prost-llm-progressively-enhancing-the-speech-to/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-prost-llm-progressively-enhancing-the-speech-to/</guid>
      <description>Speech Translation | 7.5/10</description>
    </item>
    <item>
      <title>Purification Before Fusion: Toward Mask-Free Speech Enhancement for Robust Audio-Visual Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-purification-before-fusion-toward-mask-free/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-purification-before-fusion-toward-mask-free/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Reference-Aware SFM Layers for Intrusive Intelligibility Prediction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-reference-aware-sfm-layers-for-intrusive/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-reference-aware-sfm-layers-for-intrusive/</guid>
      <description>Speech Assessment | 7.5/10</description>
    </item>
    <item>
      <title>SEP-ST: Incorporating Speech Entity Prompt Into Large Language Models for Speech Translation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sep-st-incorporating-speech-entity-prompt-into/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sep-st-incorporating-speech-entity-prompt-into/</guid>
      <description>Speech Translation | 7.5/10</description>
    </item>
    <item>
      <title>Session-Level Spoken Language Assessment with A Multimodal Foundation Model Via Multi-Target Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-session-level-spoken-language-assessment-with-a/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-session-level-spoken-language-assessment-with-a/</guid>
      <description>Speech Assessment | 7.5/10</description>
    </item>
    <item>
      <title>Shared Representation Learning for Reference-Guided Targeted Sound Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-shared-representation-learning-for-reference/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-shared-representation-learning-for-reference/</guid>
      <description>Sound Event Detection | 8.5/10</description>
    </item>
    <item>
      <title>Stress Prediction from Temporal Emotion Trajectories in Clinical Patient-Physician Conversations</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-stress-prediction-from-temporal-emotion/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-stress-prediction-from-temporal-emotion/</guid>
      <description>Speech Emotion Recognition | 7.0/10</description>
    </item>
    <item>
      <title>Task-Oriented Sound Privacy Preservation for Sound Event Detection Via End-to-End Adversarial Multi-Task Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-task-oriented-sound-privacy-preservation-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-task-oriented-sound-privacy-preservation-for/</guid>
      <description>Sound Event Detection | 7.5/10</description>
    </item>
    <item>
      <title>Text2Move: Text-To-Moving Sound Generation via Trajectory Prediction and Temporal Alignment</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-text2move-text-to-moving-sound-generation-via/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-text2move-text-to-moving-sound-generation-via/</guid>
      <description>Spatial Audio | 8.0/10</description>
    </item>
    <item>
      <title>Tokenchain: A Discrete Speech Chain via Semantic Token Modeling</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tokenchain-a-discrete-speech-chain-via-semantic/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tokenchain-a-discrete-speech-chain-via-semantic/</guid>
      <description>Speech Recognition | 7.0/10</description>
    </item>
    <item>
      <title>Towards Building Speech Large Language Models for Multitask Understanding in Low-Resource Languages</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-building-speech-large-language-models-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-building-speech-large-language-models-for/</guid>
      <description>Speech Recognition | 6.5/10</description>
    </item>
    <item>
      <title>Transfer Learning for Paediatric Sleep Apnoea Detection using Physiology-Guided Acoustic Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-transfer-learning-for-paediatric-sleep-apnoea/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-transfer-learning-for-paediatric-sleep-apnoea/</guid>
      <description>Audio Classification | 7.0/10</description>
    </item>
    <item>
      <title>Triad: Tri-Head with Auxiliary Duplicating Permutation Invariant Training for Multi-Task Sound Event Localization and Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-triad-tri-head-with-auxiliary-duplicating/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-triad-tri-head-with-auxiliary-duplicating/</guid>
      <description>Sound Event Detection | 7.5/10</description>
    </item>
    <item>
      <title>TTA: Transcribe, Translate and Alignment for Cross-Lingual Speech Representation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tta-transcribe-translate-and-alignment-for-cross/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tta-transcribe-translate-and-alignment-for-cross/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Vioptt: Violin Technique-Aware Transcription from Synthetic Data Augmentation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-vioptt-violin-technique-aware-transcription-from/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-vioptt-violin-technique-aware-transcription-from/</guid>
      <description>Music Information Retrieval | 6.5/10</description>
    </item>
    <item>
      <title>Whisper-FEST: Single-Channel Far-Field Enhanced Speech-to-text without Parallel Data</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-whisper-fest-single-channel-far-field-enhanced/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-whisper-fest-single-channel-far-field-enhanced/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Whisper-QF: Leveraging Dual Cross-Attention Q-Former for Speech Emotion Recognition With Multi-Task Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-whisper-qf-leveraging-dual-cross-attention-q/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-whisper-qf-leveraging-dual-cross-attention-q/</guid>
      <description>Speech Emotion Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Speech/Audio Paper Digest 2026-04-29</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29/</guid>
      <description>29 speech/AI papers analyzed in total</description>
    </item>
    <item>
      <title>Explicit Dropout: Deterministic Regularization for Transformer Architectures</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-explicit-dropout-deterministic-regularization-for/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-explicit-dropout-deterministic-regularization-for/</guid>
      <description>This paper addresses the problem that conventional Dropout relies on random masking, leaving its regularization effect opaque and hard to control precisely. Its core method is a deterministic formulation that recasts Dropout as an explicit regularization term added directly to the training loss, with regularization expressions derived for the attention mechanism (Q, K, V) and the feed-forward networks of Transformer architectures. Compared with existing methods,</description>
    </item>
    <item>
      <title>FastTurn: Unifying Acoustic and Streaming Semantic Cues for Low-Latency and Robust Turn Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-fastturn-unifying-acoustic-and-streaming-semantic/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-fastturn-unifying-acoustic-and-streaming-semantic/</guid>
      <description>Targeting the challenge that full-duplex spoken dialogue systems must decide with low latency and high accuracy whether the user has finished speaking (turn detection), this paper proposes the unified FastTurn framework. Its core method feeds the fast partial semantic information provided by streaming CTC decoding, together with acoustic features extracted by a Conformer encoder, through adapters into a large language model (LLM) for reasoning, and finally fuses acoustic and semantic features for turn prediction.</description>
    </item>
    <item>
      <title>Speech/Audio Paper Digest 2026-04-23</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23/</guid>
      <description>27 speech/AI papers analyzed in total</description>
    </item>
    <item>
      <title>Incremental learning for audio classification with Hebbian Deep Neural Networks</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-incremental-learning-for-audio-classification/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-incremental-learning-for-audio-classification/</guid>
      <description>This paper proposes a biologically inspired solution to incremental (continual) learning for audio classification. The core problem is the "catastrophic forgetting" of old knowledge when deep learning models learn new tasks. The authors are the first to combine **Hebbian learning** (an unsupervised, feedback-free learning rule based on synchronized neuron activation) with **incremental learning**, and design a **kernel plasticity** mechanism. This mechanism analyzes the training</description>
    </item>
    <item>
      <title>SELF-EMO: Emotional Self-Evolution from Recognition to Consistent Expression</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-self-emo-emotional-self-evolution-from/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-self-emo-emotional-self-evolution-from/</guid>
      <description>This paper addresses the problem that emotion recognition in conversation (ERC) and emotional expression in dialogue systems are limited by scarce and static high-quality annotated data. The **core contribution** is a psychologically motivated self-evolution framework, **SELF-EMO**. The **key method** builds a role-playing self-play paradigm in which the model acts as both an "emotion recognizer" and a "dialogue responder", and uses a "generate-filter-reuse" data</description>
    </item>
    <item>
      <title>Speech/Audio Paper Digest 2026-04-21</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21/</guid>
      <description>34 speech/AI papers analyzed in total</description>
    </item>
    <item>
      <title>Generalizable Audio-Visual Navigation via Binaural Difference Attention and Action Transition Prediction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-generalizable-audio-visual-navigation-via/</link>
      <pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-generalizable-audio-visual-navigation-via/</guid>
      <description>This paper addresses the core problem that audio-visual navigation (AVN) agents generalize poorly to unseen environments and unheard sound categories. The authors attribute the performance drop of existing methods to two factors: audio representations entangle semantic and spatial information, leading to inaccurate localization of unheard sounds; and reinforcement learning policies overfit to the dynamics and layouts of the training environments. To this end, the paper proposes a plug-and-play framework named BDATP. At the perception level</description>
    </item>
    <item>
      <title>Speech/Audio Paper Digest 2026-04-20</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20/</link>
      <pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20/</guid>
      <description>24 speech/AI papers analyzed in total</description>
    </item>
  </channel>
</rss>
