<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Contrastive Learning on Speech/Audio Paper Digest</title>
    <link>https://nanless.github.io/audio-paper-digest-blog/tags/%E5%AF%B9%E6%AF%94%E5%AD%A6%E4%B9%A0/</link>
    <description>Recent content in Contrastive Learning on Speech/Audio Paper Digest</description>
    <generator>Hugo</generator>
    <language>zh-cn</language>
    <lastBuildDate>Wed, 29 Apr 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://nanless.github.io/audio-paper-digest-blog/tags/%E5%AF%B9%E6%AF%94%E5%AD%A6%E4%B9%A0/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>A Hybrid Convolution-Mamba Network with Tone-Octave Contrastive Learning for Stratified Semi-Supervised Singing Melody Extraction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-hybrid-convolution-mamba-network-with-tone/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-hybrid-convolution-mamba-network-with-tone/</guid>
      <description>Singing melody extraction | 7.5/10</description>
    </item>
    <item>
      <title>A LLM-Driven Acoustic Semantic Enriched Framework for Underwater Acoustic Target Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-llm-driven-acoustic-semantic-enriched-framework/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-llm-driven-acoustic-semantic-enriched-framework/</guid>
      <description>Audio classification | 7.0/10</description>
    </item>
    <item>
      <title>A Metric Learning Approach to Heart Murmur Detection from Phonocardiogram Recordings</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-metric-learning-approach-to-heart-murmur/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-metric-learning-approach-to-heart-murmur/</guid>
      <description>Audio classification | 7.7/10</description>
    </item>
    <item>
      <title>An Unsupervised Domain Adaptation Framework For Semi-Supervised Melody Extraction Using Confidence Matrix Replace and Nearest Neighbour Supervision</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-unsupervised-domain-adaptation-framework-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-unsupervised-domain-adaptation-framework-for/</guid>
      <description>Music information retrieval | 8.0/10</description>
    </item>
    <item>
      <title>ACIR-MACL: Effective Multimodal Sentiment Analysis via Attention-Based Causal Intervention Regularization and Multi-Aspect Contrastive Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-acir-macl-effective-multimodal-sentiment-analysis/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-acir-macl-effective-multimodal-sentiment-analysis/</guid>
      <description>Sentiment analysis | 7.0/10</description>
    </item>
    <item>
      <title>Adaptive Embedding Fusion with Contrastive Learning for Robust Fully Few-Shot Class-Incremental Audio Classification</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-adaptive-embedding-fusion-with-contrastive/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-adaptive-embedding-fusion-with-contrastive/</guid>
      <description>Audio classification | 7.5/10</description>
    </item>
    <item>
      <title>ADH-VA: Adaptive Directed-Hypergraph Convolution with VA Contrastive Learning for Multimodal Conversational Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-adh-va-adaptive-directed-hypergraph-convolution/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-adh-va-adaptive-directed-hypergraph-convolution/</guid>
      <description>Speech emotion recognition | 7.5/10</description>
    </item>
    <item>
      <title>ALMA-Chor: Leveraging Audio-Lyric Alignment with Mamba for Chorus Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-alma-chor-leveraging-audio-lyric-alignment-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-alma-chor-leveraging-audio-lyric-alignment-with/</guid>
      <description>Music information retrieval | 7.0/10</description>
    </item>
    <item>
      <title>An Anomaly-Aware and Audio-Enhanced Dual-Pathway Framework for Alzheimer’s Disease Progression Classification</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-an-anomaly-aware-and-audio-enhanced-dual-pathway/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-an-anomaly-aware-and-audio-enhanced-dual-pathway/</guid>
      <description>Speech biomarkers | 7.0/10</description>
    </item>
    <item>
      <title>AnimalCLAP: Taxonomy-Aware Language-Audio Pretraining for Species Recognition and Trait Inference</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-animalclap-taxonomy-aware-language-audio/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-animalclap-taxonomy-aware-language-audio/</guid>
      <description>Audio classification | 8.0/10</description>
    </item>
    <item>
      <title>ATOM: Adaptive Token-Level Optimal Transport Mixup for Speech Translation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-atom-adaptive-token-level-optimal-transport-mixup/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-atom-adaptive-token-level-optimal-transport-mixup/</guid>
      <description>Speech translation | 8.0/10</description>
    </item>
    <item>
      <title>Audio-Guided Multimodal Approach for Fine-Grained Alignment and Boundary Modeling in Active Speaker Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-audio-guided-multimodal-approach-for-fine-grained/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-audio-guided-multimodal-approach-for-fine-grained/</guid>
      <description>Speaker detection | 7.5/10</description>
    </item>
    <item>
      <title>Audio-Visual Deepfake Generation and Detection: An Exploratory Survey</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-audio-visual-deepfake-generation-and-detection-an/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-audio-visual-deepfake-generation-and-detection-an/</guid>
      <description>Audio deepfake detection | 6.5/10</description>
    </item>
    <item>
      <title>AUDIOCARDS: Structured Metadata Improves Audio Language Models for Sound Design</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-audiocards-structured-metadata-improves-audio/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-audiocards-structured-metadata-improves-audio/</guid>
      <description>Audio retrieval | 7.5/10</description>
    </item>
    <item>
      <title>Automatic Music Sample Identification with Multi-Track Contrastive Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-automatic-music-sample-identification-with-multi/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-automatic-music-sample-identification-with-multi/</guid>
      <description>Audio retrieval | 7.5/10</description>
    </item>
    <item>
      <title>BEST-STD 2.0: Balanced and Efficient Speech Tokenizer for Spoken Term Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-best-std-20-balanced-and-efficient-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-best-std-20-balanced-and-efficient-speech/</guid>
      <description>Audio retrieval | 7.5/10</description>
    </item>
    <item>
      <title>Bridging the Semantic Gap: Cross-Attentive Fusion for Joint Acoustic-Semantic Speech Quality Assessment</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-bridging-the-semantic-gap-cross-attentive-fusion/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-bridging-the-semantic-gap-cross-attentive-fusion/</guid>
      <description>Speech quality assessment | 8.5/10</description>
    </item>
    <item>
      <title>Caption and Audio-Guided Video Representation Learning with Gated Attention for Partially Relevant Video Retrieval</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-caption-and-audio-guided-video-representation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-caption-and-audio-guided-video-representation/</guid>
      <description>Video retrieval | 7.0/10</description>
    </item>
    <item>
      <title>Contrastive Timbre Representations for Musical Instrument And Synthesizer Retrieval</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-contrastive-timbre-representations-for-musical/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-contrastive-timbre-representations-for-musical/</guid>
      <description>Audio retrieval | 7.5/10</description>
    </item>
    <item>
      <title>Controllable Embedding Transformation for Mood-Guided Music Retrieval</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-controllable-embedding-transformation-for-mood/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-controllable-embedding-transformation-for-mood/</guid>
      <description>Music retrieval | 7.5/10</description>
    </item>
    <item>
      <title>CoVA: Text-Guided Composed Video Retrieval for Audio-Visual Content</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cova-text-guided-composed-video-retrieval-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cova-text-guided-composed-video-retrieval-for/</guid>
      <description>Cross-modal retrieval | 6.5/10</description>
    </item>
    <item>
      <title>Cross-Domain Contrastive Learning with Dynamic Threshold Calibration for Source Speaker Tracing</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cross-domain-contrastive-learning-with-dynamic/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cross-domain-contrastive-learning-with-dynamic/</guid>
      <description>Speaker verification | 8.0/10</description>
    </item>
    <item>
      <title>Curriculum Learning with Contrastive Loss for Lightweight Speaker Verification</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-curriculum-learning-with-contrastive-loss-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-curriculum-learning-with-contrastive-loss-for/</guid>
      <description>Speaker verification | 6.5/10</description>
    </item>
    <item>
      <title>DBFT-SD: Weakly Supervised Multimodal Detection of Sensitive Audio-Visual Content</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dbft-sd-weakly-supervised-multimodal-detection-of/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dbft-sd-weakly-supervised-multimodal-detection-of/</guid>
      <description>Audio event detection | 8.0/10</description>
    </item>
    <item>
      <title>DDSR-Net: Robust Multimodal Sentiment Analysis via Dynamic Modality Reliability Assessment</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ddsr-net-robust-multimodal-sentiment-analysis-via/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ddsr-net-robust-multimodal-sentiment-analysis-via/</guid>
      <description>Speech emotion recognition | 6.5/10</description>
    </item>
    <item>
      <title>Diffemotalk: Audio-Driven Facial Animation with Fine-Grained Emotion Control via Diffusion Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-diffemotalk-audio-driven-facial-animation-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-diffemotalk-audio-driven-facial-animation-with/</guid>
      <description>Speech emotion recognition | 7.5/10</description>
    </item>
    <item>
      <title>Disentangled Authenticity Representation for Partially Deepfake Audio Localization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-disentangled-authenticity-representation-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-disentangled-authenticity-representation-for/</guid>
      <description>Audio deepfake detection | 6.5/10</description>
    </item>
    <item>
      <title>DISSR: Disentangling Speech Representation for Degradation-Prior Guided Cross-Domain Speech Restoration</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dissr-disentangling-speech-representation-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dissr-disentangling-speech-representation-for/</guid>
      <description>Speech enhancement | 7.5/10</description>
    </item>
    <item>
      <title>DMP-TTS: Disentangled Multi-Modal Prompting for Controllable Text-to-Speech with Chained Guidance</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dmp-tts-disentangled-multi-modal-prompting-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dmp-tts-disentangled-multi-modal-prompting-for/</guid>
      <description>Speech synthesis | 7.5/10</description>
    </item>
    <item>
      <title>Domain-Invariant Representation Learning of Bird Sounds</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-domain-invariant-representation-learning-of-bird/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-domain-invariant-representation-learning-of-bird/</guid>
      <description>Bioacoustics | 6.5/10</description>
    </item>
    <item>
      <title>DPT-Net: Dual-Path Transformer Network with Hierarchical Fusion for EEG-based Envelope Reconstruction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dpt-net-dual-path-transformer-network-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dpt-net-dual-path-transformer-network-with/</guid>
      <description>Speech biomarkers | 7.0/10</description>
    </item>
    <item>
      <title>DSSR: Decoupling Salient and Subtle Representations Under Missing Modalities for Multimodal Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dssr-decoupling-salient-and-subtle/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dssr-decoupling-salient-and-subtle/</guid>
      <description>Emotion recognition | 7.5/10</description>
    </item>
    <item>
      <title>Dual Contrastive Learning for Semi-Supervised Domain Adaptation in Bi-Modal Depression Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dual-contrastive-learning-for-semi-supervised/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dual-contrastive-learning-for-semi-supervised/</guid>
      <description>Speech biomarkers | 7.0/10</description>
    </item>
    <item>
      <title>Dual Data Scaling for Robust Two-Stage User-Defined Keyword Spotting</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dual-data-scaling-for-robust-two-stage-user/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dual-data-scaling-for-robust-two-stage-user/</guid>
      <description>Voice activity detection | 7.5/10</description>
    </item>
    <item>
      <title>Dual-Perspective Multimodal Sentiment Analysis with MoE Fusion: Representation Learning via Semantic Resonance and Divergence</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dual-perspective-multimodal-sentiment-analysis/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dual-perspective-multimodal-sentiment-analysis/</guid>
      <description>Multimodal sentiment analysis | 7.0/10</description>
    </item>
    <item>
      <title>EchoRAG: A Two-Stage Framework for Audio-Text Retrieval and Temporal Grounding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-echorag-a-two-stage-framework-for-audio-text/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-echorag-a-two-stage-framework-for-audio-text/</guid>
      <description>Audio retrieval | 7.5/10</description>
    </item>
    <item>
      <title>Empowering Multimodal Respiratory Sound Classification with Counterfactual Adversarial Debiasing for Out-of-Distribution Robustness</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-empowering-multimodal-respiratory-sound/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-empowering-multimodal-respiratory-sound/</guid>
      <description>Audio classification | 7.0/10</description>
    </item>
    <item>
      <title>Face-Voice Association with Inductive Bias for Maximum Class Separation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-face-voice-association-with-inductive-bias-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-face-voice-association-with-inductive-bias-for/</guid>
      <description>Speaker verification | 7.0/10</description>
    </item>
    <item>
      <title>FUSEMOS: Perceptual Evaluation of Text-to-Music Generation with Dual-Encoder Fusion and Ranking-Aware Composite Loss</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fusemos-perceptual-evaluation-of-text-to-music/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fusemos-perceptual-evaluation-of-text-to-music/</guid>
      <description>Music generation | 7.5/10</description>
    </item>
    <item>
      <title>GLAP: General Contrastive Audio-Text Pretraining Across Domains and Languages</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-glap-general-contrastive-audio-text-pretraining/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-glap-general-contrastive-audio-text-pretraining/</guid>
      <description>Audio retrieval | 8.5/10</description>
    </item>
    <item>
      <title>GMS-CAVP: Improving Audio-Video Correspondence with Multi-Scale Contrastive and Generative Pretraining</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-gms-cavp-improving-audio-video-correspondence/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-gms-cavp-improving-audio-video-correspondence/</guid>
      <description>Audio generation | 7.5/10</description>
    </item>
    <item>
      <title>Graph-Based Emotion Consensus Perception Learning for Multimodal Emotion Recognition in Conversation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-graph-based-emotion-consensus-perception-learning/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-graph-based-emotion-consensus-perception-learning/</guid>
      <description>Multimodal emotion recognition | 7.5/10</description>
    </item>
    <item>
      <title>Graph-based Modality Alignment for Robustness in Conversational Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-graph-based-modality-alignment-for-robustness-in/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-graph-based-modality-alignment-for-robustness-in/</guid>
      <description>Speech emotion recognition | 8.0/10</description>
    </item>
    <item>
      <title>HarmoNet: Music Grounding by Short Video via Harmonic Resample and Dynamic Sparse Alignment</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-harmonet-music-grounding-by-short-video-via/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-harmonet-music-grounding-by-short-video-via/</guid>
      <description>Music retrieval | 7.0/10</description>
    </item>
    <item>
      <title>HD-PPT: Hierarchical Decoding of Content- and Prompt-Preference Tokens for Instruction-Based TTS</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hd-ppt-hierarchical-decoding-of-content-and/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hd-ppt-hierarchical-decoding-of-content-and/</guid>
      <description>Speech synthesis | 8.0/10</description>
    </item>
    <item>
      <title>Improving Binaural Distance Estimation in Reverberant Rooms Through Contrastive And Multi-Task Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-improving-binaural-distance-estimation-in/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-improving-binaural-distance-estimation-in/</guid>
      <description>Sound source localization | 7.0/10</description>
    </item>
    <item>
      <title>Inter-Dialog Contrastive Learning for Multimodal Emotion Recognition in Conversations</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-inter-dialog-contrastive-learning-for-multimodal/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-inter-dialog-contrastive-learning-for-multimodal/</guid>
      <description>Speech emotion recognition | 7.5/10</description>
    </item>
    <item>
      <title>Learning Domain-Robust Bioacoustic Representations for Mosquito Species Classification with Contrastive Learning and Distribution Alignment</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-learning-domain-robust-bioacoustic/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-learning-domain-robust-bioacoustic/</guid>
      <description>Bioacoustics | 7.5/10</description>
    </item>
    <item>
      <title>LETPAV: Lexicon-Enhanced Text with Progressive Audio-Visual Fusion for Multimodal Sentiment Analysis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-letpav-lexicon-enhanced-text-with-progressive/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-letpav-lexicon-enhanced-text-with-progressive/</guid>
      <description>Speech emotion recognition | 7.5/10</description>
    </item>
    <item>
      <title>Leveraging Whisper Embeddings For Audio-Based Lyrics Matching</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-leveraging-whisper-embeddings-for-audio-based/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-leveraging-whisper-embeddings-for-audio-based/</guid>
      <description>Music information retrieval | 7.0/10</description>
    </item>
    <item>
      <title>Lightweight and Generalizable Acoustic Scene Representations Via Contrastive Fine-Tuning and Distillation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lightweight-and-generalizable-acoustic-scene/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lightweight-and-generalizable-acoustic-scene/</guid>
      <description>Audio scene understanding | 8.0/10</description>
    </item>
    <item>
      <title>Look, Listen and Segment: Towards Weakly Supervised Audio-Visual Semantic Segmentation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-look-listen-and-segment-towards-weakly-supervised/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-look-listen-and-segment-towards-weakly-supervised/</guid>
      <description>Audio-visual | 7.0/10</description>
    </item>
    <item>
      <title>MAG: Multi-Modal Aligned Autoregressive Co-Speech Gesture Generation Without Vector Quantization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mag-multi-modal-aligned-autoregressive-co-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mag-multi-modal-aligned-autoregressive-co-speech/</guid>
      <description>Audio generation | 8.0/10</description>
    </item>
    <item>
      <title>Malefa: Multi-Granularity Learning and Effective False Alarm Suppression for Zero-Shot Keyword Spotting</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-malefa-multi-granularity-learning-and-effective/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-malefa-multi-granularity-learning-and-effective/</guid>
      <description>Zero-shot keyword spotting | 7.5/10</description>
    </item>
    <item>
      <title>MC-MRX: Reference- and Midi-Guided Music Source Extraction with Contrastive Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mc-mrx-reference-and-midi-guided-music-source/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mc-mrx-reference-and-midi-guided-music-source/</guid>
      <description>Music source extraction | 7.0/10</description>
    </item>
    <item>
      <title>Mitigating Language Prior-Induced Hallucinations via Bi-Level Contrastive Decoding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mitigating-language-prior-induced-hallucinations/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mitigating-language-prior-induced-hallucinations/</guid>
      <description>Multimodal models | 7.5/10</description>
    </item>
    <item>
      <title>Mitigating Shared-Private Branch Imbalance via Dual-Branch Rebalancing for Multimodal Sentiment Analysis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mitigating-shared-private-branch-imbalance-via/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mitigating-shared-private-branch-imbalance-via/</guid>
      <description>Multimodal models | 7.5/10</description>
    </item>
    <item>
      <title>Motionbeat: Motion-Aligned Music Representation via Embodied Contrastive Learning and Bar-Equivariant Contact-Aware Encoding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-motionbeat-motion-aligned-music-representation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-motionbeat-motion-aligned-music-representation/</guid>
      <description>Dance generation | 7.5/10</description>
    </item>
    <item>
      <title>Multi-Scale Physiologically-Motivated Alignment for Auditory Attention Decoding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multi-scale-physiologically-motivated-alignment/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multi-scale-physiologically-motivated-alignment/</guid>
      <description>Auditory attention decoding | 7.5/10</description>
    </item>
    <item>
      <title>Noise-Robust Contrastive Learning with an MFCC-Conformer for Coronary Artery Disease Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-noise-robust-contrastive-learning-with-an-mfcc/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-noise-robust-contrastive-learning-with-an-mfcc/</guid>
      <description>Audio classification | 7.0/10</description>
    </item>
    <item>
      <title>PADAM: Perceptual Audio Defect Assessment Model</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-padam-perceptual-audio-defect-assessment-model/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-padam-perceptual-audio-defect-assessment-model/</guid>
      <description>Audio classification | 7.0/10</description>
    </item>
    <item>
      <title>Prototype-Guided Cross-Modal Contrastive Learning for Continual Audio-Visual Sound Separation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-prototype-guided-cross-modal-contrastive-learning/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-prototype-guided-cross-modal-contrastive-learning/</guid>
      <description>Speech Separation | 7.5/10</description>
    </item>
    <item>
      <title>Rationale-Guided Learning for Multimodal Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-rationale-guided-learning-for-multimodal-emotion/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-rationale-guided-learning-for-multimodal-emotion/</guid>
      <description>Speech Emotion Recognition | 7.0/10</description>
    </item>
    <item>
      <title>RCAL: Reinforced Cross-Modal Alignment for Multimodal Sentiment Analysis with Sparse Visual Frames</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-rcal-reinforced-cross-modal-alignment-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-rcal-reinforced-cross-modal-alignment-for/</guid>
      <description>Multimodal Models | 8.5/10</description>
    </item>
    <item>
      <title>Representation-Based Data Quality Audits for Audio</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-representation-based-data-quality-audits-for-audio/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-representation-based-data-quality-audits-for-audio/</guid>
      <description>Datasets | 7.5/10</description>
    </item>
    <item>
      <title>Representation-Diverse Self-Supervision for Cross-Domain Bioacoustic Learning in Low-Resource Settings</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-representation-diverse-self-supervision-for-cross/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-representation-diverse-self-supervision-for-cross/</guid>
      <description>Bioacoustics | 7.0/10</description>
    </item>
    <item>
      <title>Rethinking Entity Disambiguation in Complex Modalities</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-rethinking-entity-disambiguation-in-complex/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-rethinking-entity-disambiguation-in-complex/</guid>
      <description>Entity Disambiguation | 8.0/10</description>
    </item>
    <item>
      <title>Salad-VAE: Semantic Audio Compression with Language-Audio Distillation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-salad-vae-semantic-audio-compression-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-salad-vae-semantic-audio-compression-with/</guid>
      <description>Audio Compression | 7.5/10</description>
    </item>
    <item>
      <title>Semantic-Guided Pseudo-Feature Attention Network for Audio-Visual Zero-Shot Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-semantic-guided-pseudo-feature-attention-network/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-semantic-guided-pseudo-feature-attention-network/</guid>
      <description>Audio Classification, Zero-Shot Learning | 7.0/10</description>
    </item>
    <item>
      <title>SLAP: Scalable Language-Audio Pretraining with Variable-Duration Audio and Multi-Objective Training</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-slap-scalable-language-audio-pretraining-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-slap-scalable-language-audio-pretraining-with/</guid>
      <description>Audio Retrieval | 8.0/10</description>
    </item>
    <item>
      <title>SmoothCLAP: Soft-Target Enhanced Contrastive Language-Audio Pretraining for Affective Computing</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-smoothclap-soft-target-enhanced-contrastive/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-smoothclap-soft-target-enhanced-contrastive/</guid>
      <description>Speech Emotion Recognition | 6.5/10</description>
    </item>
    <item>
      <title>SPAM: Style Prompt Adherence Metric for Prompt-Based TTS</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-spam-style-prompt-adherence-metric-for-prompt/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-spam-style-prompt-adherence-metric-for-prompt/</guid>
      <description>Speech Synthesis | 7.0/10</description>
    </item>
    <item>
      <title>Spatial-CLAP: Learning Spatially-Aware Audio–Text Embeddings for Multi-Source Conditions</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-spatial-clap-learning-spatially-aware-audiotext/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-spatial-clap-learning-spatially-aware-audiotext/</guid>
      <description>Spatial Audio | 8.5/10</description>
    </item>
    <item>
      <title>Speech Emotion Recognition based on Hierarchical Transformer with Shifted Windows</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-speech-emotion-recognition-based-on-hierarchical/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-speech-emotion-recognition-based-on-hierarchical/</guid>
      <description>Speech Emotion Recognition | 8.0/10</description>
    </item>
    <item>
      <title>SpeechCT-CLIP: Distilling Text-Image Knowledge to Speech for Voice-Native Multimodal CT Analysis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-speechct-clip-distilling-text-image-knowledge-to/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-speechct-clip-distilling-text-image-knowledge-to/</guid>
      <description>Medical AI | 7.5/10</description>
    </item>
    <item>
      <title>Style-Disentangled Diffusion for Controllable and Identity-Generalized Speech-Driven Body Motion Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-style-disentangled-diffusion-for-controllable-and/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-style-disentangled-diffusion-for-controllable-and/</guid>
      <description>Speech-Driven Motion Generation | 7.0/10</description>
    </item>
    <item>
      <title>SynaSpot: A Lightweight, Streaming Multi-modal Framework for Keyword Spotting with Audio-Text Synergy</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-synaspot-a-lightweight-streaming-multi-modal/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-synaspot-a-lightweight-streaming-multi-modal/</guid>
      <description>Keyword Spotting | 7.5/10</description>
    </item>
    <item>
      <title>Temporally Heterogeneous Graph Contrastive Learning for Multimodal Acoustic Event Classification</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-temporally-heterogeneous-graph-contrastive/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-temporally-heterogeneous-graph-contrastive/</guid>
      <description>Audio Event Detection | 8.5/10</description>
    </item>
    <item>
      <title>The Curious Case of Visual Grounding: Different Effects for Speech-and Text-Based Language Encoders</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-the-curious-case-of-visual-grounding-different/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-the-curious-case-of-visual-grounding-different/</guid>
      <description>Model Evaluation | 8.0/10</description>
    </item>
    <item>
      <title>Towards Effective Negation Modeling in Joint Audio-Text Models for Music</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-effective-negation-modeling-in-joint/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-effective-negation-modeling-in-joint/</guid>
      <description>Music Understanding | 7.5/10</description>
    </item>
    <item>
      <title>TTA: Transcribe, Translate and Alignment for Cross-Lingual Speech Representation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tta-transcribe-translate-and-alignment-for-cross/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tta-transcribe-translate-and-alignment-for-cross/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>WavLink: Compact Audio–Text Embeddings with a Global Whisper Token</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-wavlink-compact-audiotext-embeddings-with-a/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-wavlink-compact-audiotext-embeddings-with-a/</guid>
      <description>Audio Retrieval | 8.0/10</description>
    </item>
    <item>
      <title>Robust Audio-Text Retrieval via Cross-Modal Attention and Hybrid Loss</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-robust-audio-text-retrieval-via-cross-modal/</link>
      <pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-robust-audio-text-retrieval-via-cross-modal/</guid>
      <description>Audio Retrieval | 7.5/10</description>
    </item>
    <item>
      <title>ATRIE: Adaptive Tuning for Robust Inference and Emotion in Persona-Driven Speech Synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-atrie-adaptive-tuning-for-robust-inference-and/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-atrie-adaptive-tuning-for-robust-inference-and/</guid>
      <description>Speech Synthesis | 7.0/10</description>
    </item>
    <item>
      <title>Materialistic RIR: Material Conditioned Realistic RIR Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-materialistic-rir-material-conditioned-realistic/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-materialistic-rir-material-conditioned-realistic/</guid>
      <description>Audio Generation | 7.5/10</description>
    </item>
    <item>
      <title>ATIR: Towards Audio-Text Interleaved Contextual Retrieval</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-atir-towards-audio-text-interleaved-contextual/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-atir-towards-audio-text-interleaved-contextual/</guid>
      <description>This paper addresses the limitation that existing audio-text retrieval methods cannot handle queries and documents in which audio and text appear interleaved (e.g., multi-turn dialogue, mixed inputs). The authors define the Audio-Text Interleaved contextual Retrieval (ATIR) task and construct a large-scale benchmark of roughly 88k sample pairs. To address the efficiency and accuracy problems caused by redundant audio tokens when multimodal large language models (MLLMs) are applied directly, …</description>
    </item>
    <item>
      <title>Deep Supervised Contrastive Learning of Pitch Contours for Robust Pitch Accent Classification in Seoul Korean</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-deep-supervised-contrastive-learning-of-pitch/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-deep-supervised-contrastive-learning-of-pitch/</guid>
      <description>This paper tackles the problem of mapping continuously varying fundamental frequency (F0) contours to the discrete, invariant pitch-accent categories of Seoul Korean (e.g., LHLH, HHLH). Traditional methods are vulnerable to F0 measurement noise and speaker variation. The authors propose **Dual-Glob**, a deep supervised contrastive learning framework whose core is a **dual-branch (clean-view and augmented-view) encoder** that, in a shared …</description>
    </item>
    <item>
      <title>Speech/Audio Paper Digest 2026-04-22</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22/</guid>
      <description>21 speech/AI papers analyzed in total</description>
    </item>
    <item>
      <title>ProSDD: Learning Prosodic Representations for Speech Deepfake Detection against Expressive and Emotional Attacks</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-prosdd-learning-prosodic-representations-for/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-prosdd-learning-prosodic-representations-for/</guid>
      <description>This paper addresses the core problem that current speech deepfake detection (SDD) systems generalize poorly when facing expressive and emotional synthetic-speech attacks. Existing methods over-rely on spoofed data and tend to learn dataset-specific artifacts rather than transferable characteristics of natural speech. To this end, the authors …</description>
    </item>
  </channel>
</rss>
