<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Self-Supervised Learning on Speech/Audio Paper Digest</title>
    <link>https://nanless.github.io/audio-paper-digest-blog/tags/%E8%87%AA%E7%9B%91%E7%9D%A3%E5%AD%A6%E4%B9%A0/</link>
    <description>Recent content in Self-Supervised Learning on Speech/Audio Paper Digest</description>
    <generator>Hugo</generator>
    <language>zh-cn</language>
    <lastBuildDate>Wed, 29 Apr 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://nanless.github.io/audio-paper-digest-blog/tags/%E8%87%AA%E7%9B%91%E7%9D%A3%E5%AD%A6%E4%B9%A0/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>A Parameter-Efficient Multi-Scale Convolutional Adapter for Synthetic Speech Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-parameter-efficient-multi-scale-convolutional/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-parameter-efficient-multi-scale-convolutional/</guid>
      <description>A Parameter-Efficient Multi-Scale Convolutional Adapter for Synthetic Speech Detection</description>
    </item>
    <item>
      <title>A Study of Data Selection Strategies for Pre-Training Self-Supervised Speech Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-study-of-data-selection-strategies-for-pre/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-study-of-data-selection-strategies-for-pre/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>A Superb-Style Benchmark of Self-Supervised Speech Models for Audio Deepfake Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-superb-style-benchmark-of-self-supervised/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-superb-style-benchmark-of-self-supervised/</guid>
      <description>Audio Deepfake Detection | 7.0/10</description>
    </item>
    <item>
      <title>A Task-Aware Dual-Level Self-Supervised Learning Method for Effective Sound Event Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-task-aware-dual-level-self-supervised-learning/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-task-aware-dual-level-self-supervised-learning/</guid>
      <description>Audio Event Detection | 7.5/10</description>
    </item>
    <item>
      <title>Advanced modeling of interlanguage speech intelligibility benefit with L1-L2 multi-task learning using differentiable K-means for accent-robust discrete token-based ASR</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-advanced-modeling-of-interlanguage-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-advanced-modeling-of-interlanguage-speech/</guid>
      <description>Speech Recognition | 7.0/10</description>
    </item>
    <item>
      <title>Advancing Semi-Supervised Child Speech Recognition with Omni-Temporal Classification under Label Noise</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-advancing-semi-supervised-child-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-advancing-semi-supervised-child-speech/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>AR&amp;D: A Framework for Retrieving and Describing Concepts for Interpreting AudioLLMs</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ard-a-framework-for-retrieving-and-describing/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ard-a-framework-for-retrieving-and-describing/</guid>
      <description>Audio LLMs | 6.5/10</description>
    </item>
    <item>
      <title>Ara-BEST-RQ: Multi Dialectal Arabic SSL</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ara-best-rq-multi-dialectal-arabic-ssl/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ara-best-rq-multi-dialectal-arabic-ssl/</guid>
      <description>Speech Recognition | 6.5/10</description>
    </item>
    <item>
      <title>ARCHI-TTS: A Flow-Matching-Based Text-to-Speech Model with Self-Supervised Semantic Aligner and Accelerated Inference</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-archi-tts-a-flow-matching-based-text-to-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-archi-tts-a-flow-matching-based-text-to-speech/</guid>
      <description>Speech Synthesis | 8.0/10</description>
    </item>
    <item>
      <title>Assessing the Impact of Speaker Identity in Speech Spoofing Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-assessing-the-impact-of-speaker-identity-in/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-assessing-the-impact-of-speaker-identity-in/</guid>
      <description>Audio Deepfake Detection | 8.0/10</description>
    </item>
    <item>
      <title>Attention-Based Encoder-Decoder Target-Speaker Voice Activity Detection for Robust Speaker Diarization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-attention-based-encoder-decoder-target-speaker/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-attention-based-encoder-decoder-target-speaker/</guid>
      <description>Speaker Diarization | 8.0/10</description>
    </item>
    <item>
      <title>Automatic Music Sample Identification with Multi-Track Contrastive Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-automatic-music-sample-identification-with-multi/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-automatic-music-sample-identification-with-multi/</guid>
      <description>Audio Retrieval | 7.5/10</description>
    </item>
    <item>
      <title>AUV: Teaching Audio Universal Vector Quantization with Single Nested Codebook</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-auv-teaching-audio-universal-vector-quantization/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-auv-teaching-audio-universal-vector-quantization/</guid>
      <description>Audio Generation | 8.0/10</description>
    </item>
    <item>
      <title>Auxiliary Multi-Label Training For Improving the Robustness of Audio Deepfake Detection on AI-Processed Data</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-auxiliary-multi-label-training-for-improving-the/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-auxiliary-multi-label-training-for-improving-the/</guid>
      <description>Audio Deepfake Detection | 6.5/10</description>
    </item>
    <item>
      <title>B-GRPO: Unsupervised Speech Emotion Recognition Based on Batched-Group Relative Policy Optimization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-b-grpo-unsupervised-speech-emotion-recognition/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-b-grpo-unsupervised-speech-emotion-recognition/</guid>
      <description>Speech Emotion Recognition | 6.5/10</description>
    </item>
    <item>
      <title>BEST-RQ-based Self-Supervised Learning for Whisper Domain Adaptation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-best-rq-based-self-supervised-learning-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-best-rq-based-self-supervised-learning-for/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>BEST-STD 2.0: Balanced and Efficient Speech Tokenizer for Spoken Term Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-best-std-20-balanced-and-efficient-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-best-std-20-balanced-and-efficient-speech/</guid>
      <description>Audio Retrieval | 7.5/10</description>
    </item>
    <item>
      <title>BiRQ: Bi-Level Self-Labeling Random Quantization for Self-Supervised Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-birq-bi-level-self-labeling-random-quantization/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-birq-bi-level-self-labeling-random-quantization/</guid>
      <description>Speech Recognition | 8.0/10</description>
    </item>
    <item>
      <title>Combining SSL Speech Features, Contextual Transformers and Mamba Models for Realistic Audio Spoofing Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-combining-ssl-speech-features-contextual/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-combining-ssl-speech-features-contextual/</guid>
      <description>Audio Deepfake Detection | 7.5/10</description>
    </item>
    <item>
      <title>Connecting Layer-Wise Representation of Wavlm with Spectro-Temporal Modulation on Speaker Verification</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-connecting-layer-wise-representation-of-wavlm/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-connecting-layer-wise-representation-of-wavlm/</guid>
      <description>Speaker Verification | 6.0/10</description>
    </item>
    <item>
      <title>Content-Preserving Speech Representation Learning Via Adaptive Segment-Level Alignment</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-content-preserving-speech-representation-learning/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-content-preserving-speech-representation-learning/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Cross-Architecture Knowledge Distillation of WavLM for Lightweight Speaker Verification</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cross-architecture-knowledge-distillation-of/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cross-architecture-knowledge-distillation-of/</guid>
      <description>Speaker Verification | 8.0/10</description>
    </item>
    <item>
      <title>CTC-DID: CTC-Based Arabic Dialect Identification for Streaming Applications</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ctc-did-ctc-based-arabic-dialect-identification/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ctc-did-ctc-based-arabic-dialect-identification/</guid>
      <description>Speech Recognition | 6.5/10</description>
    </item>
    <item>
      <title>Direct Transfer of Prosody in Speech-to-speech Translation using Disentangled Speech Tokens</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-direct-transfer-of-prosody-in-speech-to-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-direct-transfer-of-prosody-in-speech-to-speech/</guid>
      <description>Speech Translation | 7.5/10</description>
    </item>
    <item>
      <title>Discrete-Continuous Fusion With Adaptive Hierarchical Features For Audio Deepfake Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-discrete-continuous-fusion-with-adaptive/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-discrete-continuous-fusion-with-adaptive/</guid>
      <description>Audio Deepfake Detection | 8.0/10</description>
    </item>
    <item>
      <title>Do Foundational Audio Encoders Understand Music Structure?</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-do-foundational-audio-encoders-understand-music/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-do-foundational-audio-encoders-understand-music/</guid>
      <description>Music Information Retrieval | 7.0/10</description>
    </item>
    <item>
      <title>Domain-Invariant Representation Learning of Bird Sounds</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-domain-invariant-representation-learning-of-bird/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-domain-invariant-representation-learning-of-bird/</guid>
      <description>Bioacoustics | 6.5/10</description>
    </item>
    <item>
      <title>Dynamic Spectrogram Analysis with Local-Aware Graph Networks for Audio Anti-Spoofing</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dynamic-spectrogram-analysis-with-local-aware/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dynamic-spectrogram-analysis-with-local-aware/</guid>
      <description>Audio Deepfake Detection | 8.5/10</description>
    </item>
    <item>
      <title>ECHO: Frequency-Aware Hierarchical Encoding for Variable-Length Signals</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-echo-frequency-aware-hierarchical-encoding-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-echo-frequency-aware-hierarchical-encoding-for/</guid>
      <description>Audio Classification | 9.5/10</description>
    </item>
    <item>
      <title>ECSA: Dual-Branch Emotion Compensation for Emotion-Consistent Speaker Anonymization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ecsa-dual-branch-emotion-compensation-for-emotion/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ecsa-dual-branch-emotion-compensation-for-emotion/</guid>
      <description>Speaker Anonymization | 8.5/10</description>
    </item>
    <item>
      <title>EdgeSpot: Efficient and High-Performance Few-Shot Model for Keyword Spotting</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-edgespot-efficient-and-high-performance-few-shot/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-edgespot-efficient-and-high-performance-few-shot/</guid>
      <description>Voice Activity Detection | 7.5/10</description>
    </item>
    <item>
      <title>Efficient Solutions for Mitigating Initialization Bias in Unsupervised Self-Adaptive Auditory Attention Decoding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-efficient-solutions-for-mitigating-initialization/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-efficient-solutions-for-mitigating-initialization/</guid>
      <description>Auditory Attention Decoding | 8.5/10</description>
    </item>
    <item>
      <title>Encoding Emotion Through Self-Supervised Eye Movement Reconstruction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-encoding-emotion-through-self-supervised-eye/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-encoding-emotion-through-self-supervised-eye/</guid>
      <description>Speech Emotion Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Enhancing Noise Robustness for Neural Speech Codecs Through Resource-Efficient Progressive Quantization Perturbation Simulation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-enhancing-noise-robustness-for-neural-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-enhancing-noise-robustness-for-neural-speech/</guid>
      <description>Speech Enhancement | 7.5/10</description>
    </item>
    <item>
      <title>Evaluating Compositional Structure in Audio Representations</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-evaluating-compositional-structure-in-audio/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-evaluating-compositional-structure-in-audio/</guid>
      <description>Model Evaluation | 7.0/10</description>
    </item>
    <item>
      <title>Exploring SSL Discrete Tokens for Multilingual Automatic Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-exploring-ssl-discrete-tokens-for-multilingual/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-exploring-ssl-discrete-tokens-for-multilingual/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Expressive Voice Conversion with Controllable Emotional Intensity</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-expressive-voice-conversion-with-controllable/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-expressive-voice-conversion-with-controllable/</guid>
      <description>Voice Conversion | 7.5/10</description>
    </item>
    <item>
      <title>Fake Speech Wild: Detecting Deepfake Speech on Social Media Platform</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fake-speech-wild-detecting-deepfake-speech-on/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fake-speech-wild-detecting-deepfake-speech-on/</guid>
      <description>Speech Deepfake Detection | 7.0/10</description>
    </item>
    <item>
      <title>Fine-Grained Frame Modeling in Multi-Head Self-Attention for Speech Deepfake Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fine-grained-frame-modeling-in-multi-head-self/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fine-grained-frame-modeling-in-multi-head-self/</guid>
      <description>Speech Deepfake Detection | 8.0/10</description>
    </item>
    <item>
      <title>FinHuBERT: Hierarchical Feature Imitating Networks for Low-Resource Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-finhubert-hierarchical-feature-imitating-networks/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-finhubert-hierarchical-feature-imitating-networks/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>From Hallucination to Articulation: Language Model-Driven Losses for Ultra Low-Bitrate Neural Speech Coding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-from-hallucination-to-articulation-language-model/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-from-hallucination-to-articulation-language-model/</guid>
      <description>Speech Synthesis | 7.5/10</description>
    </item>
    <item>
      <title>Frontend Token Enhancement for Token-Based Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-frontend-token-enhancement-for-token-based-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-frontend-token-enhancement-for-token-based-speech/</guid>
      <description>Speech Recognition | 8.0/10</description>
    </item>
    <item>
      <title>Hanui: Harnessing Distributional Discrepancies for Singing Voice Deepfake Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hanui-harnessing-distributional-discrepancies-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hanui-harnessing-distributional-discrepancies-for/</guid>
      <description>Audio Deepfake Detection | 8.0/10</description>
    </item>
    <item>
      <title>How Far Do SSL Speech Models Listen for Tone? Temporal Focus of Tone Representation under Low-Resource Transfer</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-how-far-do-ssl-speech-models-listen-for-tone/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-how-far-do-ssl-speech-models-listen-for-tone/</guid>
      <description>Speech Recognition | 6.5/10</description>
    </item>
    <item>
      <title>Hybrid Pruning: In-Situ Compression of Self-Supervised Speech Models for Speaker Verification and Anti-Spoofing</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hybrid-pruning-in-situ-compression-of-self/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hybrid-pruning-in-situ-compression-of-self/</guid>
      <description>Speaker Verification | 8.0/10</description>
    </item>
    <item>
      <title>Identifying the Minimal and Maximal Phonetic Subspace of Speech Representations</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-identifying-the-minimal-and-maximal-phonetic/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-identifying-the-minimal-and-maximal-phonetic/</guid>
      <description>Speech Recognition | 8.0/10</description>
    </item>
    <item>
      <title>Improving Anomalous Sound Detection with Attribute-Aware Representation from Domain-Adaptive Pre-Training</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-improving-anomalous-sound-detection-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-improving-anomalous-sound-detection-with/</guid>
      <description>Audio Event Detection | 8.0/10</description>
    </item>
    <item>
      <title>Improving Audio Event Recognition with Consistency Regularization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-improving-audio-event-recognition-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-improving-audio-event-recognition-with/</guid>
      <description>Audio Event Detection | 7.0/10</description>
    </item>
    <item>
      <title>Input-Adaptive Differentiable Filterbanks via Hypernetworks for Robust Speech Processing</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-input-adaptive-differentiable-filterbanks-via/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-input-adaptive-differentiable-filterbanks-via/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Is Phase Really Needed for Weakly-Supervised Dereverberation?</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-is-phase-really-needed-for-weakly-supervised/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-is-phase-really-needed-for-weakly-supervised/</guid>
      <description>Speech Enhancement | 6.0/10</description>
    </item>
    <item>
      <title>KAN We Make Models Simpler for Audio Deepfake Detection with Kolmogorov–Arnold Networks?</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-kan-we-make-models-simpler-for-audio-deepfake/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-kan-we-make-models-simpler-for-audio-deepfake/</guid>
      <description>Audio Deepfake Detection | 7.5/10</description>
    </item>
    <item>
      <title>Leveraging Audio-Visual Data to Reduce the Multilingual Gap in Self-Supervised Speech Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-leveraging-audio-visual-data-to-reduce-the/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-leveraging-audio-visual-data-to-reduce-the/</guid>
      <description>Speech Recognition | 6.0/10</description>
    </item>
    <item>
      <title>Leveraging Segment-Level Speech Representations for LLM-Based Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-leveraging-segment-level-speech-representations/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-leveraging-segment-level-speech-representations/</guid>
      <description>Speech Recognition | 7.0/10</description>
    </item>
    <item>
      <title>Lightweight and Perceptually-Guided Voice Conversion for Electro-Laryngeal Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lightweight-and-perceptually-guided-voice/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lightweight-and-perceptually-guided-voice/</guid>
      <description>Voice Conversion | 7.5/10</description>
    </item>
    <item>
      <title>Localizing Speech Deepfakes Beyond Transitions via Segment-Aware Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-localizing-speech-deepfakes-beyond-transitions/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-localizing-speech-deepfakes-beyond-transitions/</guid>
      <description>Audio Deepfake Detection | 8.0/10</description>
    </item>
    <item>
      <title>Matrix-Structured Hierarchical Convolutional Modeling for Pronunciation Assessment and Mispronunciation Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-matrix-structured-hierarchical-convolutional/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-matrix-structured-hierarchical-convolutional/</guid>
      <description>Pronunciation Assessment | 8.0/10</description>
    </item>
    <item>
      <title>Measuring Prosody Diversity in Zero-Shot TTS: A New Metric, Benchmark, and Exploration</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-measuring-prosody-diversity-in-zero-shot-tts-a/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-measuring-prosody-diversity-in-zero-shot-tts-a/</guid>
      <description>Speech Synthesis | 8.0/10</description>
    </item>
    <item>
      <title>Mind the Shift: Using Delta SSL Embeddings to Enhance Child ASR</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mind-the-shift-using-delta-ssl-embeddings-to/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mind-the-shift-using-delta-ssl-embeddings-to/</guid>
      <description>Speech Recognition | 7.0/10</description>
    </item>
    <item>
      <title>MR-FlowDPO: Multi-Reward Direct Preference Optimization for Flow-Matching Text-to-Music Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mr-flowdpo-multi-reward-direct-preference/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mr-flowdpo-multi-reward-direct-preference/</guid>
      <description>Music Generation | 7.5/10</description>
    </item>
    <item>
      <title>MT-HuBERT: Self-Supervised Mix-Training for Few-Shot Keyword Spotting in Mixed Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mt-hubert-self-supervised-mix-training-for-few/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mt-hubert-self-supervised-mix-training-for-few/</guid>
      <description>Keyword Spotting | 7.0/10</description>
    </item>
    <item>
      <title>Multi-Layer Attentive Probing Improves Transfer of Audio Representations for Bioacoustics</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multi-layer-attentive-probing-improves-transfer/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multi-layer-attentive-probing-improves-transfer/</guid>
      <description>Bioacoustics | 7.5/10</description>
    </item>
    <item>
      <title>Multi-Scale Physiologically-Motivated Alignment for Auditory Attention Decoding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multi-scale-physiologically-motivated-alignment/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multi-scale-physiologically-motivated-alignment/</guid>
      <description>Auditory Attention Decoding | 7.5/10</description>
    </item>
    <item>
      <title>Multi-View Hierarchical Hypergraph Neural Network for Automatic Stuttering Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multi-view-hierarchical-hypergraph-neural-network/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multi-view-hierarchical-hypergraph-neural-network/</guid>
      <description>Speech biomarkers | 7.5/10</description>
    </item>
    <item>
      <title>On deepfake voice detection - It’s all in the presentation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-on-deepfake-voice-detection-its-all-in-the/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-on-deepfake-voice-detection-its-all-in-the/</guid>
      <description>Audio deepfake detection | 8.0/10</description>
    </item>
    <item>
      <title>Online Register For Dual-Mode Self-Supervised Speech Models: Mitigating the Lack of Future Context</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-online-register-for-dual-mode-self-supervised/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-online-register-for-dual-mode-self-supervised/</guid>
      <description>Speech recognition | 6.5/10</description>
    </item>
    <item>
      <title>Optimizing Domain-Adaptive Self-Supervised Learning for Clinical Voice-Based Disease Classification</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-optimizing-domain-adaptive-self-supervised/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-optimizing-domain-adaptive-self-supervised/</guid>
      <description>Speech biomarkers | 7.0/10</description>
    </item>
    <item>
      <title>Optimizing Speech Language Models for Acoustic Consistency</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-optimizing-speech-language-models-for-acoustic/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-optimizing-speech-language-models-for-acoustic/</guid>
      <description>Speech synthesis | 8.0/10</description>
    </item>
    <item>
      <title>Phonological Tokenizer: Prosody-Aware Phonetic Token Via Multi-Objective Fine-Tuning with Differentiable K-Means</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-phonological-tokenizer-prosody-aware-phonetic/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-phonological-tokenizer-prosody-aware-phonetic/</guid>
      <description>Speech representation learning | 8.0/10</description>
    </item>
    <item>
      <title>Polynomial Mixing for Efficient Self-Supervised Speech Encoders</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-polynomial-mixing-for-efficient-self-supervised/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-polynomial-mixing-for-efficient-self-supervised/</guid>
      <description>Speech recognition | 8.0/10</description>
    </item>
    <item>
      <title>Position-Invariant Fine-Tuning Of Speech Enhancement Models With Self-Supervised Speech Representations</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-position-invariant-fine-tuning-of-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-position-invariant-fine-tuning-of-speech/</guid>
      <description>Speech enhancement | 6.5/10</description>
    </item>
    <item>
      <title>QE-XVC: Zero-Shot Cross-Lingual Voice Conversion via Query-Enhancement and Conditional Flow Matching</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-qe-xvc-zero-shot-cross-lingual-voice-conversion/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-qe-xvc-zero-shot-cross-lingual-voice-conversion/</guid>
      <description>Voice conversion | 7.5/10</description>
    </item>
    <item>
      <title>RASD-SR: A Robust Anomalous Sound Detection Framework with Score Recalibration</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-rasd-sr-a-robust-anomalous-sound-detection/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-rasd-sr-a-robust-anomalous-sound-detection/</guid>
      <description>Anomalous sound detection | 8.5/10</description>
    </item>
    <item>
      <title>Reading Between the Waves: Robust Topic Segmentation Using Inter-Sentence Audio Features</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-reading-between-the-waves-robust-topic/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-reading-between-the-waves-robust-topic/</guid>
      <description>Audio classification | 7.0/10</description>
    </item>
    <item>
      <title>Recovering Performance in Speech Emotion Recognition from Discrete Tokens Via Multi-Layer Fusion and Paralinguistic Feature Integration</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-recovering-performance-in-speech-emotion/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-recovering-performance-in-speech-emotion/</guid>
      <description>Speech emotion recognition | 6.5/10</description>
    </item>
    <item>
      <title>Representation-Based Data Quality Audits for Audio</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-representation-based-data-quality-audits-for-audio/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-representation-based-data-quality-audits-for-audio/</guid>
      <description>Datasets | 7.5/10</description>
    </item>
    <item>
      <title>Representation-Diverse Self-Supervision for Cross-Domain Bioacoustic Learning in Low-Resource Settings</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-representation-diverse-self-supervision-for-cross/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-representation-diverse-self-supervision-for-cross/</guid>
      <description>Bioacoustics | 7.0/10</description>
    </item>
    <item>
      <title>Residual Tokens Enhance Masked Autoencoders for Speech Modeling</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-residual-tokens-enhance-masked-autoencoders-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-residual-tokens-enhance-masked-autoencoders-for/</guid>
      <description>Speech synthesis | 7.0/10</description>
    </item>
    <item>
      <title>Robust Accent Identification via Voice Conversion and Non-Timbral Embeddings</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-robust-accent-identification-via-voice-conversion/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-robust-accent-identification-via-voice-conversion/</guid>
      <description>Speech recognition | 7.5/10</description>
    </item>
    <item>
      <title>Robust Deepfake Audio Detection via Multi-Level Intermediate Feature Fusion</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-robust-deepfake-audio-detection-via-multi-level/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-robust-deepfake-audio-detection-via-multi-level/</guid>
      <description>Audio deepfake detection | 7.5/10</description>
    </item>
    <item>
      <title>S-SONDO: Self-Supervised Knowledge Distillation for General Audio Foundation Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-s-sondo-self-supervised-knowledge-distillation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-s-sondo-self-supervised-knowledge-distillation/</guid>
      <description>Audio classification | 7.0/10</description>
    </item>
    <item>
      <title>SA-SSL-MOS: Self-Supervised Learning MOS Prediction with Spectral Augmentation for Generalized Multi-Rate Speech Assessment</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sa-ssl-mos-self-supervised-learning-mos/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sa-ssl-mos-self-supervised-learning-mos/</guid>
      <description>Speech quality assessment | 7.0/10</description>
    </item>
    <item>
      <title>Scaling Spoken Language Models with Syllabic Speech Tokenization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-scaling-spoken-language-models-with-syllabic/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-scaling-spoken-language-models-with-syllabic/</guid>
      <description>Speech understanding | 7.0/10</description>
    </item>
    <item>
      <title>SED: Structural Entropy Based Speech Discretization for Discrete Token-Based ASR</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sed-structural-entropy-based-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sed-structural-entropy-based-speech/</guid>
      <description>Speech recognition | 6.5/10</description>
    </item>
    <item>
      <title>Self-Supervised Note Tracking and Multi-Pitch Estimation Via Reconstruction-Based Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-self-supervised-note-tracking-and-multi-pitch/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-self-supervised-note-tracking-and-multi-pitch/</guid>
      <description>Multi-pitch estimation / note tracking | 8.5/10</description>
    </item>
    <item>
      <title>Sidon: Fast and Robust Open-Source Multilingual Speech Restoration for Large-Scale Dataset Cleansing</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sidon-fast-and-robust-open-source-multilingual/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sidon-fast-and-robust-open-source-multilingual/</guid>
      <description>Speech enhancement | 8.5/10</description>
    </item>
    <item>
      <title>SingMOS-Pro: A Comprehensive Benchmark For Singing Quality Assessment</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-singmos-pro-an-comprehensive-benchmark-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-singmos-pro-an-comprehensive-benchmark-for/</guid>
      <description>Singing voice synthesis | 7.5/10</description>
    </item>
    <item>
      <title>SONAR: Self-Distilled Continual Pre-Training for Domain Adaptive Audio Representation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sonar-self-distilled-continual-pre-training-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sonar-self-distilled-continual-pre-training-for/</guid>
      <description>Audio event detection | 7.0/10</description>
    </item>
    <item>
      <title>Sparse Autoencoders Make Audio Foundation Models More Explainable</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sparse-autoencoders-make-audio-foundation-models/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sparse-autoencoders-make-audio-foundation-models/</guid>
      <description>Model evaluation | 6.5/10</description>
    </item>
    <item>
      <title>Sparse-View Visual-Acoustic Latent Learning for Novel-View Audio Synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sparse-view-visual-acoustic-latent-learning-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sparse-view-visual-acoustic-latent-learning-for/</guid>
      <description>Spatial audio | 7.5/10</description>
    </item>
    <item>
      <title>Spatially Aware Self-Supervised Models for Multi-Channel Neural Speaker Diarization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-spatially-aware-self-supervised-models-for-multi/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-spatially-aware-self-supervised-models-for-multi/</guid>
      <description>Speaker diarization | 8.0/10</description>
    </item>
    <item>
      <title>Speech Quality-Based Localization of Low-Quality Speech and Text-to-Speech Synthesis Artefacts</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-speech-quality-based-localization-of-low-quality/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-speech-quality-based-localization-of-low-quality/</guid>
      <description>Speech quality assessment | 7.0/10</description>
    </item>
    <item>
      <title>STACodec: Semantic Token Assignment for Balancing Acoustic Fidelity and Semantic Information in Audio Codecs</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-stacodec-semantic-token-assignment-for-balancing/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-stacodec-semantic-token-assignment-for-balancing/</guid>
      <description>Speech recognition | 8.0/10</description>
    </item>
    <item>
      <title>Temporal Distillation for Music Representation Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-temporal-distillation-for-music-representation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-temporal-distillation-for-music-representation/</guid>
      <description>Music information retrieval | 7.5/10</description>
    </item>
    <item>
      <title>Temporal Graph Modeling for Speech Emotion Recognition Using LSTM-Aggregated Multigraph Networks</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-temporal-graph-modeling-for-speech-emotion/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-temporal-graph-modeling-for-speech-emotion/</guid>
      <description>Speech emotion recognition | 7.5/10</description>
    </item>
    <item>
      <title>Temporally Heterogeneous Graph Contrastive Learning for Multimodal Acoustic Event Classification</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-temporally-heterogeneous-graph-contrastive/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-temporally-heterogeneous-graph-contrastive/</guid>
      <description>Audio event detection | 8.5/10</description>
    </item>
    <item>
      <title>The Curious Case of Visual Grounding: Different Effects for Speech-and Text-Based Language Encoders</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-the-curious-case-of-visual-grounding-different/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-the-curious-case-of-visual-grounding-different/</guid>
      <description>Model evaluation | 8.0/10</description>
    </item>
    <item>
      <title>The Role of Prosodic and Lexical Cues in Turn-Taking with Self-Supervised Speech Representations</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-the-role-of-prosodic-and-lexical-cues-in-turn/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-the-role-of-prosodic-and-lexical-cues-in-turn/</guid>
      <description>Spoken dialogue systems | 7.5/10</description>
    </item>
    <item>
      <title>Timbre-Based Pretraining with Pseudo-Labels for Multi-Instrument Automatic Music Transcription</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-timbre-based-pretraining-with-pseudo-labels-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-timbre-based-pretraining-with-pseudo-labels-for/</guid>
      <description>Music information retrieval | 7.0/10</description>
    </item>
    <item>
      <title>TinyMU: A Compact Audio-Language Model for Music Understanding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tinymu-a-compact-audio-language-model-for-music/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tinymu-a-compact-audio-language-model-for-music/</guid>
      <description>Music understanding | 7.5/10</description>
    </item>
    <item>
      <title>Toward Faithful Explanations in Acoustic Anomaly Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-toward-faithful-explanations-in-acoustic-anomaly/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-toward-faithful-explanations-in-acoustic-anomaly/</guid>
      <description>Audio event detection | 7.5/10</description>
    </item>
    <item>
      <title>Towards Blind Data Cleaning: A Case Study in Music Source Separation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-blind-data-cleaning-a-case-study-in-music/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-blind-data-cleaning-a-case-study-in-music/</guid>
      <description>Music information retrieval | 7.0/10</description>
    </item>
    <item>
      <title>Towards Building Speech Large Language Models for Multitask Understanding in Low-Resource Languages</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-building-speech-large-language-models-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-building-speech-large-language-models-for/</guid>
      <description>Speech recognition | 6.5/10</description>
    </item>
    <item>
      <title>Towards Lightweight Adaptation of Speech Enhancement Models in Real-World Environments</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-lightweight-adaptation-of-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-lightweight-adaptation-of-speech/</guid>
      <description>Speech enhancement | 8.5/10</description>
    </item>
    <item>
      <title>Understanding the Strengths and Weaknesses of SSL Models for Audio Deepfake Model Attribution</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-understanding-the-strengths-and-weaknesses-of-ssl/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-understanding-the-strengths-and-weaknesses-of-ssl/</guid>
      <description>Audio deepfake detection | 7.0/10</description>
    </item>
    <item>
      <title>Unsupervised Lexicon Learning from Speech is Limited by Representations Rather than Clustering</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-unsupervised-lexicon-learning-from-speech-is/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-unsupervised-lexicon-learning-from-speech-is/</guid>
      <description>Spoken term discovery | 8.0/10</description>
    </item>
    <item>
      <title>VBx for End-to-End Neural and Clustering-Based Diarization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-vbx-for-end-to-end-neural-and-clustering-based/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-vbx-for-end-to-end-neural-and-clustering-based/</guid>
      <description>Speaker diarization | 8.5/10</description>
    </item>
    <item>
      <title>Wave-Trainer-Fit: Neural Vocoder With Trainable Prior And Fixed-Point Iteration Towards High-Quality Speech Generation From SSL Features</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-wave-trainer-fit-neural-vocoder-with-trainable/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-wave-trainer-fit-neural-vocoder-with-trainable/</guid>
      <description>Speech synthesis | 7.0/10</description>
    </item>
    <item>
      <title>WaveSP-Net: Learnable Wavelet-Domain Sparse Prompt Tuning for Speech Deepfake Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-wavesp-net-learnable-wavelet-domain-sparse-prompt/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-wavesp-net-learnable-wavelet-domain-sparse-prompt/</guid>
      <description>Speech deepfake detection | 8.0/10</description>
    </item>
    <item>
      <title>When Audio Matters: A Lightweight, Hierarchical Fusion Model for Speech and Non-Verbal Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-when-audio-matters-a-lightweight-hierarchical/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-when-audio-matters-a-lightweight-hierarchical/</guid>
      <description>Speech emotion recognition | 8.0/10</description>
    </item>
    <item>
      <title>Windowed SummaryMixing: An Efficient Fine-Tuning of Self-Supervised Learning Models for Low-Resource Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-windowed-summarymixing-an-efficient-fine-tuning/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-windowed-summarymixing-an-efficient-fine-tuning/</guid>
      <description>Speech recognition | 6.5/10</description>
    </item>
    <item>
      <title>Speech Enhancement Based on Drifting Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-speech-enhancement-based-on-drifting-models/</link>
      <pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-speech-enhancement-based-on-drifting-models/</guid>
      <description>Speech enhancement | 7.5/10</description>
    </item>
    <item>
      <title>Advancing automatic speech recognition using feature fusion with self-supervised learning features: A case study on Fearless Steps Apollo corpus</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-advancing-automatic-speech-recognition-using/</link>
      <pubDate>Mon, 27 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-advancing-automatic-speech-recognition-using/</guid>
      <description>Speech recognition | 7.0/10</description>
    </item>
    <item>
      <title>Beyond Acoustic Sparsity and Linguistic Bias: A Prompt-Free Paradigm for Mispronunciation Detection and Diagnosis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-beyond-acoustic-sparsity-and-linguistic-bias-a/</link>
      <pubDate>Mon, 27 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-beyond-acoustic-sparsity-and-linguistic-bias-a/</guid>
      <description>Mispronunciation detection | 8.5/10</description>
    </item>
    <item>
      <title>Identifying and typifying demographic unfairness in phoneme-level embeddings of self-supervised speech recognition models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-identifying-and-typifying-demographic-unfairness/</link>
      <pubDate>Mon, 27 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-identifying-and-typifying-demographic-unfairness/</guid>
      <description>Speech recognition | 7.0/10</description>
    </item>
    <item>
      <title>DiariZen Explained: A Tutorial for the Open Source State-of-the-Art Speaker Diarization Pipeline</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-diarizen-explained-a-tutorial-for-the-open-source/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-diarizen-explained-a-tutorial-for-the-open-source/</guid>
      <description>Speaker diarization | 6.5/10</description>
    </item>
    <item>
      <title>Prosody as Supervision: Bridging the Non-Verbal--Verbal for Multilingual Speech Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-prosody-as-supervision-bridging-the-non-verbal/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-prosody-as-supervision-bridging-the-non-verbal/</guid>
      <description>Speech emotion recognition | 8.0/10</description>
    </item>
    <item>
      <title>Time vs. Layer: Locating Predictive Cues for Dysarthric Speech Descriptors in wav2vec 2.0</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-time-vs-layer-locating-predictive-cues-for/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-time-vs-layer-locating-predictive-cues-for/</guid>
      <description>Speech biomarkers | 7.0/10</description>
    </item>
    <item>
      <title>Embedding-Based Intrusive Evaluation Metrics for Musical Source Separation Using MERT Representations</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-embedding-based-intrusive-evaluation-metrics-for/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-embedding-based-intrusive-evaluation-metrics-for/</guid>
      <description>1. **Problem**: The objective evaluation metrics commonly used in music source separation (MSS), such as BSS-Eval, correlate poorly with human perceptual ratings, making model evaluation inaccurate. 2. **Core method**: Two embedding-based intrusive evaluation metrics are proposed: the mean squared error between target and separated signals computed in the embedding space of a pre-trained MERT model (MSE_MERT), and a track-wise Fréchet</description>
    </item>
    <item>
      <title>SAND: The Challenge on Speech Analysis for Neurodegenerative Disease Assessment</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-sand-the-challenge-on-speech-analysis-for/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-sand-the-challenge-on-speech-analysis-for/</guid>
      <description>1. **Problem addressed**: For the early diagnosis and monitoring of neurodegenerative diseases (particularly amyotrophic lateral sclerosis, ALS), large-scale, clinically annotated speech datasets and a standardized algorithm evaluation framework are lacking. 2. **Core method**: The SAND challenge is organized and released; its core is an extended, longitudinal speech dataset of ALS patients and healthy controls (VOC-A</description>
    </item>
    <item>
      <title>Disentangling Damage from Operational Variability: A Label-Free Self-Supervised Representation Learning Framework for Output-Only Structural Damage Identification</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-disentangling-damage-from-operational-variability/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-disentangling-damage-from-operational-variability/</guid>
      <description>Targeting the core challenge in structural health monitoring that damage signals are easily masked by environmental and operational variability, this paper proposes a **label-free, self-supervised disentangled representation learning framework**. The framework adopts a dual-stream autoencoder architecture, ensures information completeness via a **time-series reconstruction loss**, and uses a **VICReg self-supervised loss** (based on baseline-period data assumed to have an unchanged damage state) to force the damage-sensitive representation (`z_dmg`) to</description>
    </item>
    <item>
      <title>Incremental learning for audio classification with Hebbian Deep Neural Networks</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-incremental-learning-for-audio-classification/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-incremental-learning-for-audio-classification/</guid>
      <description>Addressing incremental (continual) learning for audio classification, this paper proposes a biologically inspired solution. The core problem is the "catastrophic forgetting" of old knowledge when deep learning models learn new tasks. The authors are the first to combine **Hebbian learning** (an unsupervised, feedback-free learning rule based on synchronous neuron activation) with **incremental learning**, and design a **kernel plasticity** mechanism, which works by analyzing</description>
    </item>
    <item>
      <title>Neural Encoding Detection is Not All You Need for Synthetic Speech Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-neural-encoding-detection-is-not-all-you-need-for/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-neural-encoding-detection-is-not-all-you-need-for/</guid>
      <description>The core contribution of this survey is to **expose and substantiate a key misconception in current synthetic speech detection: over-reliance on "neural encoding detection"**. The paper first systematically reviews three classes of data-driven methods, based on SincNet, self-supervised learning (SSL), and neural encoding detection, and points out that the best-performing SSL models actually capture mainly the artifacts introduced by the vocoder at the waveform-generation stage, rather than</description>
    </item>
    <item>
      <title>Prosody as Supervision: Bridging the Non-Verbal--Verbal for Multilingual Speech Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-prosody-as-supervision-bridging-the-non-verbal/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-prosody-as-supervision-bridging-the-non-verbal/</guid>
      <description>This paper tackles the core bottleneck of scarce labeled data in low-resource multilingual speech emotion recognition (SER). The authors propose a disruptive paradigm: **reframing SER as an unsupervised "non-verbal to verbal" transfer problem**. The core hypothesis is that the prosodic emotional cues carried by non-verbal vocalizations (such as laughing and crying) are purer and more cross-lingual than those in speech, and can therefore serve as a better supervision source. To this end, the authors design **NOVA-</description>
    </item>
    <item>
      <title>Where Do Self-Supervised Speech Models Become Unfair?</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-where-do-self-supervised-speech-models-become/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-where-do-self-supervised-speech-models-become/</guid>
      <description>This paper investigates at which layer unfairness arises in self-supervised speech models (S3Ms). The research team applies a lightweight linear-probe method to the per-layer embeddings of several S3Ms (e.g., WavLM, Wav2Vec2, BEST-RQ, Whisper), jointly evaluating overall performance on speaker identification (SID) and automatic speech recognition (ASR) as well as performance across different speaker</description>
    </item>
    <item>
      <title>HARNESS: Lightweight Distilled Arabic Speech Foundation Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-harness-lightweight-distilled-arabic-speech/</link>
      <pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-harness-lightweight-distilled-arabic-speech/</guid>
      <description>Addressing the weak performance of general multilingual/English models on Arabic speech recognition, dialect identification, and emotion recognition, as well as the difficulty of deploying large models, this paper introduces HArnESS, an Arabic-centric family of self-supervised speech models. The authors adopt a HuBERT-style iterative self-distillation framework, first training, on large-scale Arabic-English bilingual data (about 23K hours), a 24-layer teacher model H</description>
    </item>
    <item>
      <title>Audio-Cogito: Towards Deep Audio Reasoning in Large Audio Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-audio-cogito-towards-deep-audio-reasoning-in/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-audio-cogito-towards-deep-audio-reasoning-in/</guid>
      <description>This paper addresses the shortcomings of large audio language models (LALMs) on complex audio reasoning tasks and their reliance on expensive closed-source data. The authors propose a fully open-source solution named **Audio-Cogito**, whose core is **Cogito-Pip</description>
    </item>
    <item>
      <title>On the Distillation Loss Functions of Speech VAE for Unified Reconstruction, Understanding, and Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-on-the-distillation-loss-functions-of-speech-vae/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-on-the-distillation-loss-functions-of-speech-vae/</guid>
      <description>Targeting the imbalanced performance of existing speech variational autoencoders (VAEs) across unified speech reconstruction, understanding, and generation (understanding in particular is weak), this paper systematically studies the design space of distillation loss functions. The authors explore three ways of distilling knowledge from self-supervised learning (SSL) models into the VA</description>
    </item>
    <item>
      <title>ProSDD: Learning Prosodic Representations for Speech Deepfake Detection against Expressive and Emotional Attacks</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-prosdd-learning-prosodic-representations-for/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-prosdd-learning-prosodic-representations-for/</guid>
      <description>This paper addresses the core problem that current speech deepfake detection (SDD) systems generalize poorly against expressive and emotional synthetic speech attacks. Existing methods over-rely on spoofed data and tend to learn dataset-specific artifacts rather than transferable characteristics of natural speech. To this end, the authors</description>
    </item>
    <item>
      <title>Sky-Ear: An Unmanned Aerial Vehicle-Enabled Victim Sound Detection and Localization System</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-sky-ear-an-unmanned-aerial-vehicle-enabled-victim/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-sky-ear-an-unmanned-aerial-vehicle-enabled-victim/</guid>
      <description>Addressing the occlusion and high energy consumption that vision systems suffer in UAV search-and-rescue missions, this paper proposes "Sky-Ear", an audio-driven victim detection and localization system. The core method is a two-stage processing framework built on a circular microphone array: in the "sentinel stage", the system uses a single</description>
    </item>
    <item>
      <title>UniPASE: A Generative Model for Universal Speech Enhancement with High Fidelity and Low Hallucinations</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-unipase-a-generative-model-for-universal-speech/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-unipase-a-generative-model-for-universal-speech/</guid>
      <description>This paper tackles the core tension in universal speech enhancement (USE) between the "high perceptual quality" and "low content hallucination" that generative models struggle to achieve together. The authors propose the UniPASE framework, which extends their earlier low-hallucination PASE model to handle distortions including noise, reverberation, packet loss, wind</description>
    </item>
    <item>
      <title>X-VC: Zero-shot Streaming Voice Conversion in Codec Space</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-x-vc-zero-shot-streaming-voice-conversion-in/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-x-vc-zero-shot-streaming-voice-conversion-in/</guid>
      <description>This paper addresses the core challenge in zero-shot voice conversion of jointly achieving **high-fidelity speaker transfer** and **low-latency streaming inference**. The authors propose the **X-VC** system, whose key innovation is performing, in the latent space of a pretrained neural codec (SAC), a one-step</description>
    </item>
  </channel>
</rss>
