<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Data Augmentation on Speech/Audio Paper Digest</title>
    <link>https://nanless.github.io/audio-paper-digest-blog/tags/%E6%95%B0%E6%8D%AE%E5%A2%9E%E5%BC%BA/</link>
    <description>Recent content in Data Augmentation on Speech/Audio Paper Digest</description>
    <generator>Hugo</generator>
    <language>zh-cn</language>
    <lastBuildDate>Wed, 29 Apr 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://nanless.github.io/audio-paper-digest-blog/tags/%E6%95%B0%E6%8D%AE%E5%A2%9E%E5%BC%BA/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>A Consistent Learning Depression Detection Framework Integrating Multi-View Attention</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-consistent-learning-depression-detection/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-consistent-learning-depression-detection/</guid>
      <description>Speech Biomarkers | 6.5/10</description>
    </item>
    <item>
      <title>A Framework for Controlled Multi-Speaker Audio Synthesis for Robustness Evaluation of Speaker Diarisation Systems</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-framework-for-controlled-multi-speaker-audio/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-framework-for-controlled-multi-speaker-audio/</guid>
      <description>Speaker Diarization | 7.5/10</description>
    </item>
    <item>
      <title>A Metric Learning Approach to Heart Murmur Detection from Phonocardiogram Recordings</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-metric-learning-approach-to-heart-murmur/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-metric-learning-approach-to-heart-murmur/</guid>
      <description>Audio Classification | 7.7/10</description>
    </item>
    <item>
      <title>An Unsupervised Domain Adaptation Framework For Semi-Supervised Melody Extraction Using Confidence Matrix Replace and Nearest Neighbour Supervision</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-unsupervised-domain-adaptation-framework-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-unsupervised-domain-adaptation-framework-for/</guid>
      <description>Music Information Retrieval | 8.0/10</description>
    </item>
    <item>
      <title>Addressing Gradient Misalignment in Data-Augmented Training for Robust Speech Deepfake Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-addressing-gradient-misalignment-in-data/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-addressing-gradient-misalignment-in-data/</guid>
      <description>Speech Deepfake Detection | 7.0/10</description>
    </item>
    <item>
      <title>Advancing Semi-Supervised Child Speech Recognition with Omni-Temporal Classification under Label Noise</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-advancing-semi-supervised-child-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-advancing-semi-supervised-child-speech/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>ATOM: Adaptive Token-Level Optimal Transport Mixup for Speech Translation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-atom-adaptive-token-level-optimal-transport-mixup/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-atom-adaptive-token-level-optimal-transport-mixup/</guid>
      <description>Speech Translation | 8.0/10</description>
    </item>
    <item>
      <title>Attentive Masked Self-Distillation for Respiratory Sound Classification</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-attentive-masked-self-distillation-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-attentive-masked-self-distillation-for/</guid>
      <description>Audio Classification | 7.5/10</description>
    </item>
    <item>
      <title>Automatic Music Sample Identification with Multi-Track Contrastive Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-automatic-music-sample-identification-with-multi/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-automatic-music-sample-identification-with-multi/</guid>
      <description>Audio Retrieval | 7.5/10</description>
    </item>
    <item>
      <title>Auxiliary Multi-Label Training For Improving the Robustness of Audio Deepfake Detection on AI-Processed Data</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-auxiliary-multi-label-training-for-improving-the/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-auxiliary-multi-label-training-for-improving-the/</guid>
      <description>Audio Deepfake Detection | 6.5/10</description>
    </item>
    <item>
      <title>Cardiobridge-DM: Bridging Cross-Cohort Heart Sound Synthesis via Rhythm-Aware Semi-Supervised Diffusion</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cardiobridge-dm-bridging-cross-cohort-heart-sound/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cardiobridge-dm-bridging-cross-cohort-heart-sound/</guid>
      <description>Audio Generation | 7.5/10</description>
    </item>
    <item>
      <title>Content-Preserving Speech Representation Learning Via Adaptive Segment-Level Alignment</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-content-preserving-speech-representation-learning/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-content-preserving-speech-representation-learning/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Contrastive Timbre Representations for Musical Instrument And Synthesizer Retrieval</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-contrastive-timbre-representations-for-musical/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-contrastive-timbre-representations-for-musical/</guid>
      <description>Audio Retrieval | 7.5/10</description>
    </item>
    <item>
      <title>CosyAccent: Duration-Controllable Accent Normalization using Source-Synthesis Training Data</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cosyaccent-duration-controllable-accent/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cosyaccent-duration-controllable-accent/</guid>
      <description>Voice Conversion | 7.8/10</description>
    </item>
    <item>
      <title>CTC-DID: CTC-Based Arabic Dialect Identification for Streaming Applications</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ctc-did-ctc-based-arabic-dialect-identification/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ctc-did-ctc-based-arabic-dialect-identification/</guid>
      <description>Speech Recognition | 6.5/10</description>
    </item>
    <item>
      <title>Diff-vs: Efficient Audio-Aware Diffusion U-Net for Vocals Separation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-diff-vs-efficient-audio-aware-diffusion-u-net-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-diff-vs-efficient-audio-aware-diffusion-u-net-for/</guid>
      <description>Speech Separation | 7.5/10</description>
    </item>
    <item>
      <title>Direct Simultaneous Translation Activation for Large Audio-Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-direct-simultaneous-translation-activation-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-direct-simultaneous-translation-activation-for/</guid>
      <description>Speech Translation | 6.0/10</description>
    </item>
    <item>
      <title>Disentangling Physiology from Fidelity: Latent-Guided Diffusion Models for Cross-Modal Cardiac Synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-disentangling-physiology-from-fidelity-latent/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-disentangling-physiology-from-fidelity-latent/</guid>
      <description>Audio Generation | 7.5/10</description>
    </item>
    <item>
      <title>Dissecting Performance Degradation in Audio Source Separation under Sampling Frequency Mismatch</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dissecting-performance-degradation-in-audio/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dissecting-performance-degradation-in-audio/</guid>
      <description>Music Source Separation | 7.5/10</description>
    </item>
    <item>
      <title>DiTSinger: Scaling Singing Voice Synthesis with Diffusion Transformer and Implicit Alignment</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ditsinger-scaling-singing-voice-synthesis-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ditsinger-scaling-singing-voice-synthesis-with/</guid>
      <description>Singing Voice Synthesis | 7.0/10</description>
    </item>
    <item>
      <title>Efficient Depression Detection from Speech via Language-Independent Prompt-Driven Reprogramming</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-efficient-depression-detection-from-speech-via/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-efficient-depression-detection-from-speech-via/</guid>
      <description>Speech Biomarkers | 7.5/10</description>
    </item>
    <item>
      <title>EMG-to-Speech with Fewer Channels</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emg-to-speech-with-fewer-channels/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emg-to-speech-with-fewer-channels/</guid>
      <description>Speech Synthesis | 7.5/10</description>
    </item>
    <item>
      <title>Empowering Multimodal Respiratory Sound Classification with Counterfactual Adversarial Debiasing for Out-of-Distribution Robustness</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-empowering-multimodal-respiratory-sound/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-empowering-multimodal-respiratory-sound/</guid>
      <description>Audio Classification | 7.0/10</description>
    </item>
    <item>
      <title>Enhancing Dialogue-Related Speech Tasks with Generated Spoken Dialogues</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-enhancing-dialogue-related-speech-tasks-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-enhancing-dialogue-related-speech-tasks-with/</guid>
      <description>Spoken Dialogue Systems | 6.5/10</description>
    </item>
    <item>
      <title>Enhancing Noise Robustness for Neural Speech Codecs Through Resource-Efficient Progressive Quantization Perturbation Simulation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-enhancing-noise-robustness-for-neural-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-enhancing-noise-robustness-for-neural-speech/</guid>
      <description>Speech Enhancement | 7.5/10</description>
    </item>
    <item>
      <title>Expressive Voice Conversion with Controllable Emotional Intensity</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-expressive-voice-conversion-with-controllable/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-expressive-voice-conversion-with-controllable/</guid>
      <description>Voice Conversion | 7.5/10</description>
    </item>
    <item>
      <title>Fake Speech Wild: Detecting Deepfake Speech on Social Media Platform</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fake-speech-wild-detecting-deepfake-speech-on/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fake-speech-wild-detecting-deepfake-speech-on/</guid>
      <description>Speech Deepfake Detection | 7.0/10</description>
    </item>
    <item>
      <title>FDCNet: Frequency Domain Channel Attention and Convolution for Lipreading</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fdcnet-frequency-domain-channel-attention-and/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fdcnet-frequency-domain-channel-attention-and/</guid>
      <description>Visual Speech Recognition | 8.5/10</description>
    </item>
    <item>
      <title>Fine-Tuning Bigvgan-V2 for Robust Musical Tuning Preservation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fine-tuning-bigvgan-v2-for-robust-musical-tuning/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fine-tuning-bigvgan-v2-for-robust-musical-tuning/</guid>
      <description>Music Generation | 7.5/10</description>
    </item>
    <item>
      <title>Generating Moving 3d Soundscapes with Latent Diffusion Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-generating-moving-3d-soundscapes-with-latent/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-generating-moving-3d-soundscapes-with-latent/</guid>
      <description>Spatial Audio | 7.5/10</description>
    </item>
    <item>
      <title>Improving Audio Event Recognition with Consistency Regularization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-improving-audio-event-recognition-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-improving-audio-event-recognition-with/</guid>
      <description>Audio Event Detection | 7.0/10</description>
    </item>
    <item>
      <title>Improving Binaural Distance Estimation in Reverberant Rooms Through Contrastive And Multi-Task Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-improving-binaural-distance-estimation-in/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-improving-binaural-distance-estimation-in/</guid>
      <description>Sound Source Localization | 7.0/10</description>
    </item>
    <item>
      <title>In-Sync: Adaptation of Speech Aware Large Language Models for ASR with Word level timestamp predictions</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-in-sync-adaptation-of-speech-aware-large-language/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-in-sync-adaptation-of-speech-aware-large-language/</guid>
      <description>Speech Recognition | 7.0/10</description>
    </item>
    <item>
      <title>Learning Linearity in Audio Consistency Autoencoders via Implicit Regularization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-learning-linearity-in-audio-consistency/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-learning-linearity-in-audio-consistency/</guid>
      <description>Audio Generation | 7.5/10</description>
    </item>
    <item>
      <title>Leveraging Multiple Speech Enhancers for Non-Intrusive Intelligibility Prediction for Hearing-Impaired Listeners</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-leveraging-multiple-speech-enhancers-for-non/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-leveraging-multiple-speech-enhancers-for-non/</guid>
      <description>Model Evaluation | 7.5/10</description>
    </item>
    <item>
      <title>Leveraging Text-to-Speech and Voice Conversion as Data Augmentation for Alzheimer&#39;s Disease Detection from Spontaneous Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-leveraging-text-to-speech-and-voice-conversion-as/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-leveraging-text-to-speech-and-voice-conversion-as/</guid>
      <description>Speech Biomarkers | 7.0/10</description>
    </item>
    <item>
      <title>Lingometer: On-Device Personal Speech Word Counting System</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lingometer-on-device-personal-speech-word/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lingometer-on-device-personal-speech-word/</guid>
      <description>Voice Activity Detection | 8.0/10</description>
    </item>
    <item>
      <title>Localizing Speech Deepfakes Beyond Transitions via Segment-Aware Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-localizing-speech-deepfakes-beyond-transitions/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-localizing-speech-deepfakes-beyond-transitions/</guid>
      <description>Audio Deepfake Detection | 8.0/10</description>
    </item>
    <item>
      <title>LP-CFM: Perceptual Invariance-Aware Conditional Flow Matching for Speech Modeling</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lp-cfm-perceptual-invariance-aware-conditional/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lp-cfm-perceptual-invariance-aware-conditional/</guid>
      <description>Speech Synthesis | 7.0/10</description>
    </item>
    <item>
      <title>Mitigating Intra-Speaker Variability in Diarization with Style-Controllable Speech Augmentation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mitigating-intra-speaker-variability-in/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mitigating-intra-speaker-variability-in/</guid>
      <description>Speaker Diarization | 7.0/10</description>
    </item>
    <item>
      <title>Mix2Morph: Learning Sound Morphing from Noisy Mixes</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mix2morph-learning-sound-morphing-from-noisy-mixes/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mix2morph-learning-sound-morphing-from-noisy-mixes/</guid>
      <description>Audio Generation | 7.5/10</description>
    </item>
    <item>
      <title>Multimodal Fusion-Based IPCLIP Network for Mixed Reality Surgical Assistance</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multimodal-fusion-based-ipclip-network-for-mixed/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multimodal-fusion-based-ipclip-network-for-mixed/</guid>
      <description>Multimodal Models | 6.5/10</description>
    </item>
    <item>
      <title>On deepfake voice detection - It’s all in the presentation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-on-deepfake-voice-detection-its-all-in-the/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-on-deepfake-voice-detection-its-all-in-the/</guid>
      <description>Audio Deepfake Detection | 8.0/10</description>
    </item>
    <item>
      <title>PAC: Pronunciation-Aware Contextualized Large Language Model-Based Automatic Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-pac-pronunciation-aware-contextualized-large/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-pac-pronunciation-aware-contextualized-large/</guid>
      <description>Speech Recognition | 7.0/10</description>
    </item>
    <item>
      <title>PC-MCL: Patient-Consistent Multi-Cycle Learning with Multi-Label Bias Correction for Respiratory Sound Classification</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-pc-mcl-patient-consistent-multi-cycle-learning/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-pc-mcl-patient-consistent-multi-cycle-learning/</guid>
      <description>Audio Classification | 7.5/10</description>
    </item>
    <item>
      <title>Phoneme-Level Visual Speech Recognition via Point-Visual Fusion and Language Model Reconstruction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-phoneme-level-visual-speech-recognition-via-point/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-phoneme-level-visual-speech-recognition-via-point/</guid>
      <description>Visual Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Proficiency-Aware Adaptation and Data Augmentation for Robust L2 ASR</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-proficiency-aware-adaptation-and-data/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-proficiency-aware-adaptation-and-data/</guid>
      <description>Speech Recognition | 6.5/10</description>
    </item>
    <item>
      <title>PromptSep: Generative Audio Separation Via Multimodal Prompting</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-promptsep-generative-audio-separation-via/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-promptsep-generative-audio-separation-via/</guid>
      <description>Speech Separation | 7.5/10</description>
    </item>
    <item>
      <title>Quantifying Speaker Embedding Phonological Rule Interactions in Accented Speech Synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-quantifying-speaker-embedding-phonological-rule/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-quantifying-speaker-embedding-phonological-rule/</guid>
      <description>Speech Synthesis | 7.0/10</description>
    </item>
    <item>
      <title>Refgen: Reference-Guided Synthetic Data Generation for Anomalous Sound Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-refgen-reference-guided-synthetic-data-generation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-refgen-reference-guided-synthetic-data-generation/</guid>
      <description>Audio Event Detection | 7.5/10</description>
    </item>
    <item>
      <title>Robust Accent Identification via Voice Conversion and Non-Timbral Embeddings</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-robust-accent-identification-via-voice-conversion/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-robust-accent-identification-via-voice-conversion/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>RRPO: Robust Reward Policy Optimization for LLM-Based Emotional TTS</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-rrpo-robust-reward-policy-optimization-for-llm/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-rrpo-robust-reward-policy-optimization-for-llm/</guid>
      <description>Speech Synthesis | 7.5/10</description>
    </item>
    <item>
      <title>SA-SSL-MOS: Self-Supervised Learning MOS Prediction with Spectral Augmentation for Generalized Multi-Rate Speech Assessment</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sa-ssl-mos-self-supervised-learning-mos/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sa-ssl-mos-self-supervised-learning-mos/</guid>
      <description>Speech Quality Assessment | 7.0/10</description>
    </item>
    <item>
      <title>Scaling Ambiguity: Augmenting Human Annotation in Speech Emotion Recognition with Audio-Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-scaling-ambiguity-augmenting-human-annotation-in/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-scaling-ambiguity-augmenting-human-annotation-in/</guid>
      <description>Speech Emotion Recognition | 6.5/10</description>
    </item>
    <item>
      <title>SE-DiCoW: Self-Enrolled Diarization-Conditioned Whisper</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-se-dicow-self-enrolled-diarization-conditioned/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-se-dicow-self-enrolled-diarization-conditioned/</guid>
      <description>Speech Recognition | 8.5/10</description>
    </item>
    <item>
      <title>Source Separation For A Cappella Music</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-source-separation-for-a-cappella-music/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-source-separation-for-a-cappella-music/</guid>
      <description>Speech Separation | 6.5/10</description>
    </item>
    <item>
      <title>Style Attack Disguise: When Fonts Become a Camouflage for Adversarial Intent</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-style-attack-disguise-when-fonts-become-a/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-style-attack-disguise-when-fonts-become-a/</guid>
      <description>Adversarial Examples | 7.0/10</description>
    </item>
    <item>
      <title>SynParaSpeech: Automated Synthesis of Paralinguistic Datasets for Speech Generation and Understanding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-synparaspeech-automated-synthesis-of/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-synparaspeech-automated-synthesis-of/</guid>
      <description>Speech Synthesis | 7.5/10</description>
    </item>
    <item>
      <title>Synthesized Data Selection via Score Distribution Matching for Te Reo Māori Automatic Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-synthesized-data-selection-via-score-distribution/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-synthesized-data-selection-via-score-distribution/</guid>
      <description>Speech Recognition | 8.0/10</description>
    </item>
    <item>
      <title>Synthetic Data Domain Adaptation for ASR via LLM-Based Text and Phonetic Respelling Augmentation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-synthetic-data-domain-adaptation-for-asr-via-llm/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-synthetic-data-domain-adaptation-for-asr-via-llm/</guid>
      <description>Speech Recognition | 8.0/10</description>
    </item>
    <item>
      <title>Three Seconds is Sufficient: A Multi-Pronged Framework for Model-Based Speaker Adaptation in ASR Under Data-Scarce Conditions</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-three-seconds-is-sufficient-a-multi-pronged/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-three-seconds-is-sufficient-a-multi-pronged/</guid>
      <description>Speech Recognition | 7.0/10</description>
    </item>
    <item>
      <title>Timbre-Aware Audio Difference Captioning for Anomalous Machine Sounds without Paired Training Data via Synthetic Perturbations</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-timbre-aware-audio-difference-captioning-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-timbre-aware-audio-difference-captioning-for/</guid>
      <description>Audio Classification | 7.5/10</description>
    </item>
    <item>
      <title>Tldiffgan: A Latent Diffusion-Gan Framework with Temporal Information Fusion for Anomalous Sound Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tldiffgan-a-latent-diffusion-gan-framework-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tldiffgan-a-latent-diffusion-gan-framework-with/</guid>
      <description>Audio Event Detection | 7.5/10</description>
    </item>
    <item>
      <title>Towards Blind Data Cleaning: A Case Study in Music Source Separation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-blind-data-cleaning-a-case-study-in-music/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-blind-data-cleaning-a-case-study-in-music/</guid>
      <description>Music Information Retrieval | 7.0/10</description>
    </item>
    <item>
      <title>Towards Distance-Aware Synthetic Audio Mixtures for Universal Sound Separation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-distance-aware-synthetic-audio-mixtures/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-distance-aware-synthetic-audio-mixtures/</guid>
      <description>Speech Separation | 6.5/10</description>
    </item>
    <item>
      <title>Towards Effective Negation Modeling in Joint Audio-Text Models for Music</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-effective-negation-modeling-in-joint/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-effective-negation-modeling-in-joint/</guid>
      <description>Music Understanding | 7.5/10</description>
    </item>
    <item>
      <title>Training-Free Inference-Time Scaling for Audio Source Separation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-training-free-inference-time-scaling-for-audio/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-training-free-inference-time-scaling-for-audio/</guid>
      <description>Speech Enhancement | 7.5/10</description>
    </item>
    <item>
      <title>UNMIXX: Untangling Highly Correlated Singing Voices Mixtures</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-unmixx-untangling-highly-correlated-singing/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-unmixx-untangling-highly-correlated-singing/</guid>
      <description>Speech Separation | 8.5/10</description>
    </item>
    <item>
      <title>Vioptt: Violin Technique-Aware Transcription from Synthetic Data Augmentation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-vioptt-violin-technique-aware-transcription-from/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-vioptt-violin-technique-aware-transcription-from/</guid>
      <description>Music Information Retrieval | 6.5/10</description>
    </item>
    <item>
      <title>WAV2LEV: Predicting Levenshtein Edit Operation Sequences For Fine-Grained Estimation of Automatic Speech Recognition Error</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-wav2lev-predicting-levenshtein-edit-operation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-wav2lev-predicting-levenshtein-edit-operation/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Meta-Ensemble Learning with Diverse Data Splits for Improved Respiratory Sound Classification</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-meta-ensemble-learning-with-diverse-data-splits/</link>
      <pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-meta-ensemble-learning-with-diverse-data-splits/</guid>
      <description>Audio Classification | 8.0/10</description>
    </item>
    <item>
      <title>Psychologically-Grounded Graph Modeling for Interpretable Depression Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-psychologically-grounded-graph-modeling-for/</link>
      <pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-psychologically-grounded-graph-modeling-for/</guid>
      <description>Speech Emotion Recognition | 8.0/10</description>
    </item>
    <item>
      <title>Beyond Acoustic Sparsity and Linguistic Bias: A Prompt-Free Paradigm for Mispronunciation Detection and Diagnosis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-beyond-acoustic-sparsity-and-linguistic-bias-a/</link>
      <pubDate>Mon, 27 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-beyond-acoustic-sparsity-and-linguistic-bias-a/</guid>
      <description>Mispronunciation Detection | 8.5/10</description>
    </item>
    <item>
      <title>Transformer-Based Rhythm Quantization of Performance MIDI Using Beat Annotations</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-transformer-based-rhythm-quantization-of/</link>
      <pubDate>Mon, 27 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-transformer-based-rhythm-quantization-of/</guid>
      <description>Music Information Retrieval | 8.0/10</description>
    </item>
    <item>
      <title>Enhancing ASR Performance in the Medical Domain for Dravidian Languages</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-enhancing-asr-performance-in-the-medical-domain/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-enhancing-asr-performance-in-the-medical-domain/</guid>
      <description>This paper tackles two challenges facing Dravidian languages (Telugu and Kannada) in medical-domain automatic speech recognition (ASR): scarce labeled data and complex linguistic morphology. Its core method is a "confidence-aware training framework" that applies a hybrid confidence scoring mechanism (combining static perceptual and acoustic-similarity scores and WER with dynamic model entropy) to training data mixing real and synthetic speech…</description>
    </item>
    <item>
      <title>Enhancing Speaker Verification with Whispered Speech via Post-Processing</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-enhancing-speaker-verification-with-whispered/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-enhancing-speaker-verification-with-whispered/</guid>
      <description>1. **Problem**: Whispered speech lacks vocal-fold vibration, so its acoustic characteristics differ markedly from normal speech, severely degrading existing speaker verification systems. This is a real challenge in scenarios where users whisper to protect privacy or cannot phonate normally due to illness. 2. **Core method**: On top of a pretrained speaker verification backbone (ReDimNet-B6), add a lightweight encoder-decoder structure…</description>
    </item>
    <item>
      <title>Audio Spoof Detection with GaborNet</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-audio-spoof-detection-with-gabornet/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-audio-spoof-detection-with-gabornet/</guid>
      <description>This paper addresses the frequency-leakage problem of the conventional SincNet front end in audio spoof detection, caused by truncating finite-length sinc functions. The authors propose replacing SincNet with a learnable Gabor filterbank (GaborNet) and integrate it into two advanced end-to-end detection architectures, RawNet2 and RawGAT-ST. The paper also explores LEAF (Learnable F…</description>
    </item>
    <item>
      <title>Benign Fine-Tuning Breaks Safety Alignment in Audio LLMs</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-benign-fine-tuning-breaks-safety-alignment-in/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-benign-fine-tuning-breaks-safety-alignment-in/</guid>
      <description>This paper presents the first systematic study of how fine-tuning on benign (harmless) audio data breaks the safety alignment of audio LLMs. **The problem**: does routine fine-tuning, performed by users to improve model performance, inadvertently undermine the model's safety guardrails? **Method**: the authors propose a filtering framework based on embedding-space proximity that, along semantic, acoustic, and hybrid dimensions, selectively uses benign samples lying close to harmful content in representation space…</description>
    </item>
    <item>
      <title>Environmental Sound Deepfake Detection Using Deep-Learning Framework</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-environmental-sound-deepfake-detection-using-deep/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-environmental-sound-deepfake-detection-using-deep/</guid>
      <description>For the emerging task of deepfake detection on environmental sounds (e.g., sound events and acoustic scenes), this paper proposes a systematic deep-learning framework. **Core contribution**: through extensive experiments, it systematically evaluates different spectrogram representations (Mel, CQT, Gammatone), multiple CNN architectures (ResNet, Inception, etc.), and a pretrained model (BEATs) on this task, and…</description>
    </item>
    <item>
      <title>Still Between Us? Evaluating and Improving Voice Assistant Robustness to Third-Party Interruptions</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-still-between-us-evaluating-and-improving-voice/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-still-between-us-evaluating-and-improving-voice/</guid>
      <description>This paper addresses the inability of speech language models (SLMs) to reliably distinguish the primary user from third-party interrupting speech (Third-Party Interruption, TPI) in real-world settings, which leads to failures of contextual understanding. To this end, the authors first create **TPI-Train**, a training dataset of 88k samples whose core design is "speaker-aware hard negatives",…</description>
    </item>
    <item>
      <title>SpeakerRPL v2: Robust Open-set Speaker Identification through Enhanced Few-shot Foundation Tuning and Model Fusion</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-speakerrpl-v2-robust-open-set-speaker/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-speakerrpl-v2-robust-open-set-speaker/</guid>
      <description>This paper addresses robustness in open-set speaker identification: given only a small number of enrollment samples per target speaker, the system must simultaneously identify known speakers accurately and reliably reject unknown ones. Building on their earlier SpeakerRPL V1 framework, the authors propose three key improvements:</description>
    </item>
  </channel>
</rss>
