领域适应 on 语音/音频论文速递

领域适应 on 语音/音频论文速递 https://nanless.github.io/audio-paper-digest-blog/tags/%E9%A2%86%E5%9F%9F%E9%80%82%E5%BA%94/ Recent content in 领域适应 on 语音/音频论文速递 Hugo zh-cn Wed, 29 Apr 2026 00:00:00 +0000 A Generalization Strategy for Speech Quality Prediction: From Domain-Specific to Unified Datasets https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-generalization-strategy-for-speech-quality/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-generalization-strategy-for-speech-quality/ 语音质量评估 | 6.5/10 A Robust Multi-Scale Framework with Test-Time Adaptation for sEEG-Based Speech Decoding https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-robust-multi-scale-framework-with-test-time/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-robust-multi-scale-framework-with-test-time/ 语音解码 | 7.5/10 A Unsupervised Domain Adaptation Framework For Semi-Supervised Melody Extraction Using Confidence Matrix Replace and Nearest Neighbour Supervision https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-unsupervised-domain-adaptation-framework-for/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-unsupervised-domain-adaptation-framework-for/ 音乐信息检索 | 8.0/10 AccLID: Accent-aware Language Identification for Robust Multilingual Speech Recognition https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-acclid-accent-aware-language-identification-for/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-acclid-accent-aware-language-identification-for/ 语音识别 | 7.0/10 Advancing Semi-Supervised Child Speech Recognition with Omni-Temporal Classification under Label Noise https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-advancing-semi-supervised-child-speech/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-advancing-semi-supervised-child-speech/ 语音识别 | 7.5/10 Automatic Music Mixing Using a Generative Model of Effect Embeddings https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-automatic-music-mixing-using-a-generative-model/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-automatic-music-mixing-using-a-generative-model/ 音乐生成 | 7.5/10 Bayesian Low-Rank Factorization for Robust Model Adaptation https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-bayesian-low-rank-factorization-for-robust-model/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-bayesian-low-rank-factorization-for-robust-model/ 语音识别 | 8.0/10 BEST-RQ-based Self-Supervised Learning for Whisper Domain Adaptation https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-best-rq-based-self-supervised-learning-for/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-best-rq-based-self-supervised-learning-for/ 语音识别 | 7.5/10 Beyond Mapping: Domain-Invariant Representations via Spectral Embedding of Optimal Transport Plans https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-beyond-mapping-domain-invariant-representations/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-beyond-mapping-domain-invariant-representations/ 领域适应 | 7.5/10 CCST: Cross-Modal and Consistency-Aware Self-Training for Source-Free Unsupervised Domain Adaptation in Speech Recognition https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ccst-cross-modal-and-consistency-aware-self/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ccst-cross-modal-and-consistency-aware-self/ 语音识别 | 7.5/10 Cross-Domain Contrastive Learning with Dynamic Threshold Calibration for Source Speaker Tracing https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cross-domain-contrastive-learning-with-dynamic/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cross-domain-contrastive-learning-with-dynamic/ 说话人验证 | 8.0/10 DDSC: Dynamic Dual-Signal Curriculum for Data-Efficient Acoustic Scene Classification Under Domain Shift https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ddsc-dynamic-dual-signal-curriculum-for-data/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ddsc-dynamic-dual-signal-curriculum-for-data/ 音频场景分类 | 7.0/10 DISSR: Disentangling Speech Representation for Degradation-Prior Guided Cross-Domain Speech Restoration https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dissr-disentangling-speech-representation-for/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dissr-disentangling-speech-representation-for/ 语音增强 | 7.5/10 Domain Partitioning Meets Parameter-Efficient Fine-Tuning: A Novel Method for Improved Language-Queried Audio Source Separation https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-domain-partitioning-meets-parameter-efficient/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-domain-partitioning-meets-parameter-efficient/ 音频分离 | 7.5/10 Domain-Aware Scheduling for ASR Fine-Tuning https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-domain-aware-scheduling-for-asr-fine-tuning/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-domain-aware-scheduling-for-asr-fine-tuning/ 语音识别 | 6.5/10 Domain-Invariant Representation Learning of Bird Sounds https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-domain-invariant-representation-learning-of-bird/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-domain-invariant-representation-learning-of-bird/ 生物声学 | 6.5/10 Dual Contrastive Learning for Semi-Supervised Domain Adaptation in Bi-Modal Depression Recognition https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dual-contrastive-learning-for-semi-supervised/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dual-contrastive-learning-for-semi-supervised/ 语音生物标志物 | 7.0/10 Dynamic Noise-Aware Multi Lora Framework Towards Real-World Audio Deepfake Detection https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dynamic-noise-aware-multi-lora-framework-towards/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dynamic-noise-aware-multi-lora-framework-towards/ 音频深度伪造检测 | 8.0/10 Emo-TTA: Improving Test-Time Adaptation of Audio-Language Models for Speech Emotion Recognition https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emo-tta-improving-test-time-adaptation-of-audio/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emo-tta-improving-test-time-adaptation-of-audio/ 语音情感识别 | 7.0/10 Enhancing Automatic Drum Transcription with Online Dynamic Few-Shot Learning https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-enhancing-automatic-drum-transcription-with/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-enhancing-automatic-drum-transcription-with/ 音乐信息检索 | 7.0/10 FD-ARL: Feature Disentanglement with Adversarial-Reconstruction Learning for Cross-Subject Auditory Attention Decoding https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fd-arl-feature-disentanglement-with-adversarial/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fd-arl-feature-disentanglement-with-adversarial/ 听觉注意力解码 | 7.5/10 Fine-Tuning Bigvgan-V2 for Robust Musical Tuning Preservation https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fine-tuning-bigvgan-v2-for-robust-musical-tuning/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fine-tuning-bigvgan-v2-for-robust-musical-tuning/ 音乐生成 | 7.5/10 Gdiffuse: Diffusion-Based Speech Enhancement with Noise Model Guidance https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-gdiffuse-diffusion-based-speech-enhancement-with/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-gdiffuse-diffusion-based-speech-enhancement-with/ 语音增强 | 7.0/10 GLA-GRAD++: An Improved Griffin-Lim Guided Diffusion Model for Speech Synthesis https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-gla-grad-an-improved-griffin-lim-guided-diffusion/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-gla-grad-an-improved-griffin-lim-guided-diffusion/ 语音合成 | 7.5/10 GLoRIA: Gated Low-Rank Interpretable Adaptation for Dialectal ASR https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-gloria-gated-low-rank-interpretable-adaptation/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-gloria-gated-low-rank-interpretable-adaptation/ 语音识别 | 8.0/10 ICASSP 2026 - 领域适应论文列表 https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-139/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-139/ 共 2 篇 ICASSP 2026 领域适应方向论文 Improving Anomalous Sound Detection with Attribute-Aware Representation from Domain-Adaptive Pre-Training https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-improving-anomalous-sound-detection-with/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-improving-anomalous-sound-detection-with/ 音频事件检测 | 8.0/10 Inverse-Hessian Regularization for Continual Learning in ASR https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-inverse-hessian-regularization-for-continual/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-inverse-hessian-regularization-for-continual/ 语音识别 | 7.5/10 K-Function: Joint Pronunciation Transcription and Feedback for Evaluating Kids Language Function https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-k-function-joint-pronunciation-transcription-and/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-k-function-joint-pronunciation-transcription-and/ 语音识别 | 7.5/10 Learning Domain-Robust Bioacoustic Representations for Mosquito Species Classification with Contrastive Learning and Distribution Alignment https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-learning-domain-robust-bioacoustic/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-learning-domain-robust-bioacoustic/ 生物声学 | 7.5/10 Lightweight and Perceptually-Guided Voice Conversion for Electro-Laryngeal Speech https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lightweight-and-perceptually-guided-voice/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lightweight-and-perceptually-guided-voice/ 语音转换 | 7.5/10 Medical ASR Enhancement by Domain-Specific Reinforcement Fine-Tuning https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-medical-asr-enhancement-by-domain-specific/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-medical-asr-enhancement-by-domain-specific/ 语音识别 | 6.5/10 MI-Fuse: Label Fusion for Unsupervised Domain Adaptation with Closed-Source Large Audio-Language Model https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mi-fuse-label-fusion-for-unsupervised-domain/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mi-fuse-label-fusion-for-unsupervised-domain/ 语音情感识别 | 8.0/10 Optimizing Domain-Adaptive Self-Supervised Learning for Clinical Voice-Based Disease Classification https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-optimizing-domain-adaptive-self-supervised/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-optimizing-domain-adaptive-self-supervised/ 语音生物标志物 | 7.0/10 Proficiency-Aware Adaptation and Data Augmentation for Robust L2 ASR https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-proficiency-aware-adaptation-and-data/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-proficiency-aware-adaptation-and-data/ 语音识别 | 6.5/10 Ranking The Impact of Contextual Specialization in Neural Speech Enhancement https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ranking-the-impact-of-contextual-specialization/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ranking-the-impact-of-contextual-specialization/ 语音增强 | 7.5/10 SLM-TTA: A Framework for Test-Time Adaptation of Generative Spoken Language Models https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-slm-tta-a-framework-for-test-time-adaptation-of/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-slm-tta-a-framework-for-test-time-adaptation-of/ 语音识别 | 7.0/10 SONAR: Self-Distilled Continual Pre-Training for Domain Adaptive Audio Representation https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sonar-self-distilled-continual-pre-training-for/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sonar-self-distilled-continual-pre-training-for/ 音频事件检测 | 7.0/10 SSVD-O: Parameter-Efficient Fine-Tuning with Structured SVD for Speech Recognition https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ssvd-o-parameter-efficient-fine-tuning-with/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ssvd-o-parameter-efficient-fine-tuning-with/ 语音识别 | 7.0/10 Structure-Aware Diffusion Schrödinger Bridge https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-structure-aware-diffusion-schrdinger-bridge/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-structure-aware-diffusion-schrdinger-bridge/ 数据集对齐 | 7.7/10 Synthetic Data Domain Adaptation for ASR via LLM-Based Text and Phonetic Respelling Augmentation https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-synthetic-data-domain-adaptation-for-asr-via-llm/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-synthetic-data-domain-adaptation-for-asr-via-llm/ 语音识别 | 8.0/10 Teaching the Teachers: Boosting Unsupervised Domain Adaptation In Speech Recognition By Ensemble Update https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-teaching-the-teachers-boosting-unsupervised/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-teaching-the-teachers-boosting-unsupervised/ 语音识别 | 7.0/10 Test Time Adaptation for Speech Emotion Recognition https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-test-time-adaptation-for-speech-emotion/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-test-time-adaptation-for-speech-emotion/ 语音情感识别 | 7.0/10 The Impact of Audio Watermarking on Audio Anti-Spoofing Countermeasures https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-the-impact-of-audio-watermarking-on-audio-anti/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-the-impact-of-audio-watermarking-on-audio-anti/ 音频深度伪造检测 | 8.5/10 The Synergistic Role of Audio and Large Video-Language Model in Source-Free Video Domain Adaptation https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-the-synergistic-role-of-audio-and-large-video/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-the-synergistic-role-of-audio-and-large-video/ 领域适应 | 7.0/10 Towards Fair ASR for Second Language Speakers using Fairness Prompted Finetuning https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-fair-asr-for-second-language-speakers/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-fair-asr-for-second-language-speakers/ 语音识别 | 6.5/10 Variational Low-Rank Adaptation for Personalized Impaired Speech Recognition https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-variational-low-rank-adaptation-for-personalized/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-variational-low-rank-adaptation-for-personalized/ 语音识别 | 7.5/10 Vioptt: Violin Technique-Aware Transcription from Synthetic Data Augmentation https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-vioptt-violin-technique-aware-transcription-from/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-vioptt-violin-technique-aware-transcription-from/ 音乐信息检索 | 6.5/10 When Children Talk and Machines Listen: Toward an Interpretable Speech-Based Screener for Dutch Developmental Language Disorder https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-when-children-talk-and-machines-listen-toward-an/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-when-children-talk-and-machines-listen-toward-an/ 语音生物标志物 | 7.0/10 Whisper: Courtside Edition - Enhancing ASR Performance through LLM-Driven Context Generation https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-whisper-courtside-edition-enhancing-asr/ Wed, 29 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-whisper-courtside-edition-enhancing-asr/ 语音识别 | 6.5/10 Prosody as Supervision: Bridging the Non-Verbal--Verbal for Multilingual Speech Emotion Recognition https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-prosody-as-supervision-bridging-the-non-verbal/ Fri, 24 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-prosody-as-supervision-bridging-the-non-verbal/ 语音情感识别 | 8.0/10 Enhancing ASR Performance in the Medical Domain for Dravidian Languages https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-enhancing-asr-performance-in-the-medical-domain/ Thu, 23 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-enhancing-asr-performance-in-the-medical-domain/ 这篇论文旨在解决达罗毗荼语言（Telugu和Kannada）在医疗领域自动语音识别（ASR）中面临的标注数据稀缺和语言形态复杂两大挑战。其核心方法是提出一个“置信度感知训练框架”，该框架通过一个混合置信度评分机制（结合静态的感知、声学相似性、WER分数和动态的模型熵），对混合了真实与合成语音的训练数 Enhancing Speaker Verification with Whispered Speech via Post-Processing https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-enhancing-speaker-verification-with-whispered/ Thu, 23 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-enhancing-speaker-verification-with-whispered/ 1. **问题**：耳语语音因缺乏声带振动，其声学特征与正常语音差异显著，导致现有的说话人验证系统性能严重下降。这在用户为保护隐私而低语、或因疾病无法正常发声等实际场景中构成挑战。 2. **方法核心**：在预训练的说话人验证骨干网络（ReDimNet-B6）之上，添加一个轻量级的编码器-解码器结构 Tadabur: A Large-Scale Quran Audio Dataset https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-tadabur-a-large-scale-quran-audio-dataset/ Thu, 23 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-tadabur-a-large-scale-quran-audio-dataset/ 1. **问题**：现有的古兰经语音数据集在规模、诵读者多样性、音频质量和标注深度上存在严重不足，限制了古兰经ASR、诵读者识别等任务的研究进展。 2. **方法核心**：提出Tadabur数据集及其构建流水线。流水线核心是“古兰经经文对齐模块”（AAM），它结合WhisperX进行初步转录，再利用 Tadabur: A Large-Scale Quran Audio Dataset https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-tadabur-a-large-scale-quran-audio-dataset/ Wed, 22 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-tadabur-a-large-scale-quran-audio-dataset/ 本文旨在解决古兰经语音研究领域缺乏大规模、多样化、细粒度标注数据集的问题。为此，作者提出了**Tadabur**数据集及其自动化构建流水线。该流水线首先从公共平台收集音频，并利用大语言模型（Gemini）从非结构化文本中提取标准化元数据（如章节、朗诵者）。核心步骤是**Ayah Alignment Prosody as Supervision: Bridging the Non-Verbal--Verbal for Multilingual Speech Emotion Recognition https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-prosody-as-supervision-bridging-the-non-verbal/ Tue, 21 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-prosody-as-supervision-bridging-the-non-verbal/ 这篇论文旨在解决低资源多语言语音情感识别（SER）中标注数据稀缺的核心瓶颈。作者提出了一个颠覆性的范式：**将SER重新定义为无监督的“非言语到言语”迁移问题**。其核心假设是，非言语发声（如笑、哭）中蕴含的韵律情感线索比言语更纯粹、更跨语言，因此可以作为更好的监督源。为此，作者设计了**NOVA- Contextual Biasing for ASR in Speech LLM with Common Word Cues and Bias Word Position Prediction https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-contextual-biasing-for-asr-in-speech-llm-with/ Sun, 19 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-contextual-biasing-for-asr-in-speech-llm-with/ 这篇论文旨在解决语音大模型（SLLM）在识别训练数据中稀有或未见的“偏置词”时性能不佳的问题。传统方法依赖于为偏置词提供精确的音素序列（通过G2P系统生成），但这对用户有专业要求且工具兼容性差。为此， Who is Speaking or Who is Depressed? A Controlled Study of Speaker Leakage in Speech-Based Depression Detection https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-who-is-speaking-or-who-is-depressed-a-controlled/ Sun, 19 Apr 2026 00:00:00 +0000 https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-who-is-speaking-or-who-is-depressed-a-controlled/ 这篇论文的核心贡献在于系统性地揭示并量化了语音抑郁症检测模型中普遍存在的“说话人身份泄露”问题。作者指出，当前许多报告高准确率的模型，其性能可能严重依赖于对说话人身份（声纹）的记忆，而非对抑郁相关声学