<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>端到端 on 语音/音频论文速递</title>
    <link>https://nanless.github.io/audio-paper-digest-blog/tags/%E7%AB%AF%E5%88%B0%E7%AB%AF/</link>
    <description>Recent content in 端到端 on 语音/音频论文速递</description>
    <generator>Hugo</generator>
    <language>zh-cn</language>
    <lastBuildDate>Wed, 29 Apr 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://nanless.github.io/audio-paper-digest-blog/tags/%E7%AB%AF%E5%88%B0%E7%AB%AF/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>A Speech-Driven Paradigm for Physics-Informed Modeling of Coupled Micro-Speakers</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-speech-driven-paradigm-for-physics-informed/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-speech-driven-paradigm-for-physics-informed/</guid>
      <description>音频生成 | 7.0/10</description>
    </item>
    <item>
      <title>Adapting Diarization-Conditioned Whisper for End-to-End Multi-Talker Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-adapting-diarization-conditioned-whisper-for-end/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-adapting-diarization-conditioned-whisper-for-end/</guid>
      <description>语音识别 | 7.5/10</description>
    </item>
    <item>
      <title>Advancing LLM-Based Multi-Channel Multi-Speaker Speech Recognition with Global Cross-Channel Attention and Sentence-Ordered First-In First-Out Serialized Output Training</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-advancing-llm-based-multi-channel-multi-speaker/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-advancing-llm-based-multi-channel-multi-speaker/</guid>
      <description>语音识别 | 7.5/10</description>
    </item>
    <item>
      <title>ALMA-Chor: Leveraging Audio-Lyric Alignment with Mamba for Chorus Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-alma-chor-leveraging-audio-lyric-alignment-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-alma-chor-leveraging-audio-lyric-alignment-with/</guid>
      <description>音乐信息检索 | 7.0/10</description>
    </item>
    <item>
      <title>An End-to-End Multimodal System for Subtitle Recognition and Chinese-Japanese Translation in Short Dramas</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-an-end-to-end-multimodal-system-for-subtitle/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-an-end-to-end-multimodal-system-for-subtitle/</guid>
      <description>多模态模型 | 7.0/10</description>
    </item>
    <item>
      <title>An Envelope Separation Aided Multi-Task Learning Model for Blind Source Counting and Localization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-an-envelope-separation-aided-multi-task-learning/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-an-envelope-separation-aided-multi-task-learning/</guid>
      <description>声源定位 | 6.5/10</description>
    </item>
    <item>
      <title>Audio Deepfake Detection at the First Greeting: &#34;Hi!&#34;</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-audio-deepfake-detection-at-the-first-greeting-hi/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-audio-deepfake-detection-at-the-first-greeting-hi/</guid>
      <description>音频深度伪造检测 | 7.5/10</description>
    </item>
    <item>
      <title>Audio-to-Score Jazz Solo Transcription with the Rhythm Perceiver</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-audio-to-score-jazz-solo-transcription-with-the/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-audio-to-score-jazz-solo-transcription-with-the/</guid>
      <description>音乐信息检索 | 7.5/10</description>
    </item>
    <item>
      <title>Auditory-Inspired Transformer for Binaural Speech Enhancement and Spatial Cue Preservation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-auditory-inspired-transformer-for-binaural-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-auditory-inspired-transformer-for-binaural-speech/</guid>
      <description>语音增强 | 7.0/10</description>
    </item>
    <item>
      <title>CALM: Joint Contextual Acoustic-Linguistic Modeling for Personalization of Multi-Speaker ASR</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-calm-joint-contextual-acoustic-linguistic/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-calm-joint-contextual-acoustic-linguistic/</guid>
      <description>语音识别 | 7.5/10</description>
    </item>
    <item>
      <title>Chunk-Wise Attention Transducers for Fast and Accurate Streaming Speech-to-Text</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-chunk-wise-attention-transducers-for-fast-and/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-chunk-wise-attention-transducers-for-fast-and/</guid>
      <description>语音识别 | 7.5/10</description>
    </item>
    <item>
      <title>Chunkwise Aligners for Streaming Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-chunkwise-aligners-for-streaming-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-chunkwise-aligners-for-streaming-speech/</guid>
      <description>语音识别 | 7.5/10</description>
    </item>
    <item>
      <title>Content Anonymization for Privacy in Long-Form Audio</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-content-anonymization-for-privacy-in-long-form/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-content-anonymization-for-privacy-in-long-form/</guid>
      <description>语音匿名化 | 7.5/10</description>
    </item>
    <item>
      <title>Deep Dubbing: End-to-End Auto-Audiobook System with Text-to-Timbre and Context-Aware Instruct-TTS</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-deep-dubbing-end-to-end-auto-audiobook-system/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-deep-dubbing-end-to-end-auto-audiobook-system/</guid>
      <description>语音合成 | 7.5/10</description>
    </item>
    <item>
      <title>Direct Transfer of Prosody in Speech-to-speech Translation using Disentangled Speech Tokens</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-direct-transfer-of-prosody-in-speech-to-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-direct-transfer-of-prosody-in-speech-to-speech/</guid>
      <description>语音翻译 | 7.5/10</description>
    </item>
    <item>
      <title>Discrete-Continuous Fusion With Adaptive Hierarchical Features For Audio Deepfake Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-discrete-continuous-fusion-with-adaptive/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-discrete-continuous-fusion-with-adaptive/</guid>
      <description>音频深度伪造检测 | 8.0/10</description>
    </item>
    <item>
      <title>DSRMS-TransUnet: A Decentralized Non-Shifted Transunet for Shallow Water Acoustic Source Range Estimation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dsrms-transunet-a-decentralized-non-shifted/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dsrms-transunet-a-decentralized-non-shifted/</guid>
      <description>声源定位 | 8.0/10</description>
    </item>
    <item>
      <title>Dual-Strategy-Enhanced Conbimamba for Neural Speaker Diarization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dual-strategy-enhanced-conbimamba-for-neural/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dual-strategy-enhanced-conbimamba-for-neural/</guid>
      <description>说话人分离 | 8.0/10</description>
    </item>
    <item>
      <title>E2E-AEC: Implementing An End-To-End Neural Network Learning Approach for Acoustic Echo Cancellation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-e2e-aec-implementing-an-end-to-end-neural-network/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-e2e-aec-implementing-an-end-to-end-neural-network/</guid>
      <description>语音增强 | 7.5/10</description>
    </item>
    <item>
      <title>EEND-SAA: Enrollment-Less Main Speaker Voice Activity Detection Using Self-Attention Attractors</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-eend-saa-enrollment-less-main-speaker-voice/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-eend-saa-enrollment-less-main-speaker-voice/</guid>
      <description>语音活动检测 | 7.5/10</description>
    </item>
    <item>
      <title>Exploring SSL Discrete Tokens for Multilingual Automatic Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-exploring-ssl-discrete-tokens-for-multilingual/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-exploring-ssl-discrete-tokens-for-multilingual/</guid>
      <description>语音识别 | 7.5/10</description>
    </item>
    <item>
      <title>HAVT-IVD: Heterogeneity-Aware Cross-Modal Network for Audio-Visual Surveillance: Idling Vehicles Detection with Multichannel Audio and Multiscale Visual Cues</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-havt-ivd-heterogeneity-aware-cross-modal-network/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-havt-ivd-heterogeneity-aware-cross-modal-network/</guid>
      <description>音频事件检测 | 8.0/10</description>
    </item>
    <item>
      <title>HCGAN: Harmonic-Coupled Generative Adversarial Network for Speech Super-Resolution in Low-Bandwidth Scenarios</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hcgan-harmonic-coupled-generative-adversarial/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hcgan-harmonic-coupled-generative-adversarial/</guid>
      <description>语音增强 | 8.0/10</description>
    </item>
    <item>
      <title>HVAC-EAR: Eavesdropping Human Speech Using HVAC Systems</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hvac-ear-eavesdropping-human-speech-using-hvac/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hvac-ear-eavesdropping-human-speech-using-hvac/</guid>
      <description>音频安全 | 8.5/10</description>
    </item>
    <item>
      <title>HyFlowSE: Hybrid End-To-End Flow-Matching Speech Enhancement via Generative-Discriminative Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hyflowse-hybrid-end-to-end-flow-matching-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hyflowse-hybrid-end-to-end-flow-matching-speech/</guid>
      <description>语音增强 | 8.0/10</description>
    </item>
    <item>
      <title>Improving Contextual Asr Via Multi-Grained Fusion With Large Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-improving-contextual-asr-via-multi-grained-fusion/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-improving-contextual-asr-via-multi-grained-fusion/</guid>
      <description>语音识别 | 8.5/10</description>
    </item>
    <item>
      <title>Joint Autoregressive Modeling of Multi-Talker Overlapped Speech Recognition and Translation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-joint-autoregressive-modeling-of-multi-talker/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-joint-autoregressive-modeling-of-multi-talker/</guid>
      <description>语音识别 语音翻译 | 7.0/10</description>
    </item>
    <item>
      <title>Joint Deep Secondary Path Estimation and Adaptive Control for Active Noise Cancellation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-joint-deep-secondary-path-estimation-and-adaptive/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-joint-deep-secondary-path-estimation-and-adaptive/</guid>
      <description>语音增强 | 7.5/10</description>
    </item>
    <item>
      <title>Joint Estimation of Piano Dynamics and Metrical Structure with a Multi-Task Multi-Scale Network</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-joint-estimation-of-piano-dynamics-and-metrical/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-joint-estimation-of-piano-dynamics-and-metrical/</guid>
      <description>音乐理解 | 7.5/10</description>
    </item>
    <item>
      <title>K-Function: Joint Pronunciation Transcription and Feedback for Evaluating Kids Language Function</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-k-function-joint-pronunciation-transcription-and/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-k-function-joint-pronunciation-transcription-and/</guid>
      <description>语音识别 | 7.5/10</description>
    </item>
    <item>
      <title>Language-Infused Retrieval-Augmented CTC with Adaptive Soft-Hard Gating for Robust Code-Switching ASR</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-language-infused-retrieval-augmented-ctc-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-language-infused-retrieval-augmented-ctc-with/</guid>
      <description>语音识别 | 8.0/10</description>
    </item>
    <item>
      <title>Lattice-Guided Consistency Regularization of Dual-Mode Transducers for Automatic Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lattice-guided-consistency-regularization-of-dual/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lattice-guided-consistency-regularization-of-dual/</guid>
      <description>语音识别 | 8.0/10</description>
    </item>
    <item>
      <title>Learning to Align with Unbalanced Optimal Transport in Linguistic Knowledge Transfer for ASR</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-learning-to-align-with-unbalanced-optimal/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-learning-to-align-with-unbalanced-optimal/</guid>
      <description>语音识别 | 6.5/10</description>
    </item>
    <item>
      <title>Lightweight Implicit Neural Network for Binaural Audio Synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lightweight-implicit-neural-network-for-binaural/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lightweight-implicit-neural-network-for-binaural/</guid>
      <description>空间音频 | 7.0/10</description>
    </item>
    <item>
      <title>Lingometer: On-Device Personal Speech Word Counting System</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lingometer-on-device-personal-speech-word/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lingometer-on-device-personal-speech-word/</guid>
      <description>语音活动检测 | 8.0/10</description>
    </item>
    <item>
      <title>Low-Bandwidth High-Fidelity Speech Transmission with Generative Latent Joint Source-Channel Coding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-low-bandwidth-high-fidelity-speech-transmission/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-low-bandwidth-high-fidelity-speech-transmission/</guid>
      <description>语音增强 | 7.5/10</description>
    </item>
    <item>
      <title>MELA-TTS: Joint Transformer-Diffusion Model with Representation Alignment for Speech Synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mela-tts-joint-transformer-diffusion-model-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mela-tts-joint-transformer-diffusion-model-with/</guid>
      <description>语音合成 | 7.0/10</description>
    </item>
    <item>
      <title>Mixture of Experts for Recognizing Depression from Interview and Reading Tasks</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mixture-of-experts-for-recognizing-depression/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mixture-of-experts-for-recognizing-depression/</guid>
      <description>语音生物标志物 | 6.0/10</description>
    </item>
    <item>
      <title>MSANET: Multi-Scale Semantic Aggregation Network for Brain-Assisted Speech Enhancement in Multi-Speaker Conditions</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-msanet-multi-scale-semantic-aggregation-network/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-msanet-multi-scale-semantic-aggregation-network/</guid>
      <description>语音增强 | 7.5/10</description>
    </item>
    <item>
      <title>Musicdetr: A Position-Aware Spectral Note Detection Model for Singing Transcription</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-musicdetr-a-position-aware-spectral-note/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-musicdetr-a-position-aware-spectral-note/</guid>
      <description>歌唱语音转录 | 8.5/10</description>
    </item>
    <item>
      <title>Peeking Into the Future for Contextual Biasing</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-peeking-into-the-future-for-contextual-biasing/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-peeking-into-the-future-for-contextual-biasing/</guid>
      <description>语音识别 | 7.0/10</description>
    </item>
    <item>
      <title>Polynomial Mixing for Efficient Self-Supervised Speech Encoders</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-polynomial-mixing-for-efficient-self-supervised/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-polynomial-mixing-for-efficient-self-supervised/</guid>
      <description>语音识别 | 8.0/10</description>
    </item>
    <item>
      <title>Purification Before Fusion: Toward Mask-Free Speech Enhancement for Robust Audio-Visual Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-purification-before-fusion-toward-mask-free/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-purification-before-fusion-toward-mask-free/</guid>
      <description>语音识别 | 7.5/10</description>
    </item>
    <item>
      <title>QFOCUS: Controllable Synthesis for Automated Speech Stress Editing to Deliver Human-Like Emphatic Intent</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-qfocus-controllable-synthesis-for-automated/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-qfocus-controllable-synthesis-for-automated/</guid>
      <description>语音合成 | 7.5/10</description>
    </item>
    <item>
      <title>Revisiting Direct Speech-to-Text Translation with Speech LLMS: Better Scaling than Cot Prompting?</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-revisiting-direct-speech-to-text-translation-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-revisiting-direct-speech-to-text-translation-with/</guid>
      <description>语音翻译 | 7.5/10</description>
    </item>
    <item>
      <title>RLBR: Reinforcement Learning with Biasing Rewards for Contextual Speech Large Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-rlbr-reinforcement-learning-with-biasing-rewards/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-rlbr-reinforcement-learning-with-biasing-rewards/</guid>
      <description>语音识别 | 8.0/10</description>
    </item>
    <item>
      <title>SAASDNet: An EEG-Based Streaming Auditory Attention Switch Decoding Network for Self-Initiated Attention Switching in Mixed Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-saasdnet-an-eeg-based-streaming-auditory/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-saasdnet-an-eeg-based-streaming-auditory/</guid>
      <description>脑机接口 | 8.0/10</description>
    </item>
    <item>
      <title>Scaling Multi-Talker ASR with Speaker-Agnostic Activity Streams</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-scaling-multi-talker-asr-with-speaker-agnostic/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-scaling-multi-talker-asr-with-speaker-agnostic/</guid>
      <description>语音识别 | 8.5/10</description>
    </item>
    <item>
      <title>Semantic Anchor Transfer from Short to Long Speech in a Distillation-Based Summarization Framework</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-semantic-anchor-transfer-from-short-to-long/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-semantic-anchor-transfer-from-short-to-long/</guid>
      <description>语音摘要 | 7.5/10</description>
    </item>
    <item>
      <title>Session-Level Spoken Language Assessment with A Multimodal Foundation Model Via Multi-Target Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-session-level-spoken-language-assessment-with-a/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-session-level-spoken-language-assessment-with-a/</guid>
      <description>语音评估 | 7.5/10</description>
    </item>
    <item>
      <title>SpatialNet-Echo: Real-Time Acoustic Echo Cancellation via Integrated Narrow-Band and Cross-Band Processing</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-spatialnet-echo-real-time-acoustic-echo/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-spatialnet-echo-real-time-acoustic-echo/</guid>
      <description>语音增强 | 7.5/10</description>
    </item>
    <item>
      <title>Streaming Speech Recognition with Decoder-Only Large Language Models and Latency Optimization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-streaming-speech-recognition-with-decoder-only/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-streaming-speech-recognition-with-decoder-only/</guid>
      <description>语音识别 | 7.0/10</description>
    </item>
    <item>
      <title>StreamMark: A Deep Learning-Based Semi-Fragile Audio Watermarking for Proactive Deepfake Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-streammark-a-deep-learning-based-semi-fragile/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-streammark-a-deep-learning-based-semi-fragile/</guid>
      <description>音频深度伪造检测 | 8.0/10</description>
    </item>
    <item>
      <title>Sunac: Source-Aware Unified Neural Audio Codec</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sunac-source-aware-unified-neural-audio-codec/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sunac-source-aware-unified-neural-audio-codec/</guid>
      <description>音频生成 | 7.5/10</description>
    </item>
    <item>
      <title>T-Mimi: A Transformer-Based Mimi Decoder for Real-Time On-Phone TTS</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-t-mimi-a-transformer-based-mimi-decoder-for-real/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-t-mimi-a-transformer-based-mimi-decoder-for-real/</guid>
      <description>语音合成 | 7.0/10</description>
    </item>
    <item>
      <title>Task-Oriented Sound Privacy Preservation for Sound Event Detection Via End-to-End Adversarial Multi-Task Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-task-oriented-sound-privacy-preservation-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-task-oriented-sound-privacy-preservation-for/</guid>
      <description>音频事件检测 | 7.5/10</description>
    </item>
    <item>
      <title>TextlessRAG: End-to-End Visual Document RAG by Speech without Text</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-textlessrag-end-to-end-visual-document-rag-by/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-textlessrag-end-to-end-visual-document-rag-by/</guid>
      <description>语音问答 | 8.5/10</description>
    </item>
    <item>
      <title>Tokenchain: A Discrete Speech Chain via Semantic Token Modeling</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tokenchain-a-discrete-speech-chain-via-semantic/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tokenchain-a-discrete-speech-chain-via-semantic/</guid>
      <description>语音识别 | 7.0/10</description>
    </item>
    <item>
      <title>Toward Robust And Efficient Beat Tracking Via Beat-Aware Attention</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-toward-robust-and-efficient-beat-tracking-via/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-toward-robust-and-efficient-beat-tracking-via/</guid>
      <description>音乐理解 | 8.5/10</description>
    </item>
    <item>
      <title>Tpeformer: Temporal Patch Embedding Transformer</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tpeformer-temporal-patch-embedding-transformer/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tpeformer-temporal-patch-embedding-transformer/</guid>
      <description>语音情感识别 | 7.5/10</description>
    </item>
    <item>
      <title>Train Short, Infer Long: Speech-LLM Enables Zero-Shot Streamable Joint ASR and Diarization on Long Audio</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-train-short-infer-long-speech-llm-enables-zero/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-train-short-infer-long-speech-llm-enables-zero/</guid>
      <description>说话人分离 | 9.0/10</description>
    </item>
    <item>
      <title>Tri-Attention Fusion: Joint Temporal-Spectral and Bidirectional Modeling for Speech Spoofing Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tri-attention-fusion-joint-temporal-spectral-and/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tri-attention-fusion-joint-temporal-spectral-and/</guid>
      <description>语音伪造检测 | 7.0/10</description>
    </item>
    <item>
      <title>UJCodec: An End-to-end Unet-Style Codec for Joint Speech Compression and Enhancement</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ujcodec-an-end-to-end-unet-style-codec-for-joint/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ujcodec-an-end-to-end-unet-style-codec-for-joint/</guid>
      <description>语音增强 | 7.5/10</description>
    </item>
    <item>
      <title>UMA-SPLIT: Unimodal Aggregation for Both English and Mandarin Non-Autoregressive Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-uma-split-unimodal-aggregation-for-both-english/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-uma-split-unimodal-aggregation-for-both-english/</guid>
      <description>语音识别 | 7.5/10</description>
    </item>
    <item>
      <title>USVexplorer: Robust Detection of Ultrasonic Vocalizations with Cross Species Generalization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-usvexplorer-robust-detection-of-ultrasonic/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-usvexplorer-robust-detection-of-ultrasonic/</guid>
      <description>音频事件检测 | 8.0/10</description>
    </item>
    <item>
      <title>VBx for End-to-End Neural and Clustering-Based Diarization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-vbx-for-end-to-end-neural-and-clustering-based/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-vbx-for-end-to-end-neural-and-clustering-based/</guid>
      <description>说话人分离 | 8.5/10</description>
    </item>
    <item>
      <title>VChangeCodec: An Ultra Low-Complexity Neural Speech Codec with Built-In Voice Changer for Customized Real-Time Communication</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-vchangecodec-an-ultra-low-complexity-neural/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-vchangecodec-an-ultra-low-complexity-neural/</guid>
      <description>语音转换 语音增强 | 8.0/10</description>
    </item>
    <item>
      <title>WhisperPipe: A Resource-Efficient Streaming Architecture for Real-Time Automatic Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-whisperpipe-a-resource-efficient-streaming/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-whisperpipe-a-resource-efficient-streaming/</guid>
      <description>语音识别 | 6.5/10</description>
    </item>
    <item>
      <title>β-AVSDNET: A Novel End-To-End Neural Network Architecture For Audio-Visual Speaker Diarization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-avsdnet-a-novel-end-to-end-neural-network/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-avsdnet-a-novel-end-to-end-neural-network/</guid>
      <description>说话人分离 | 7.5/10</description>
    </item>
    <item>
      <title>Full-Duplex Interaction in Spoken Dialogue Systems: A Comprehensive Study from the ICASSP 2026 HumDial Challenge</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-full-duplex-interaction-in-spoken-dialogue/</link>
      <pubDate>Mon, 27 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-full-duplex-interaction-in-spoken-dialogue/</guid>
      <description>语音对话系统 | 6.5/10</description>
    </item>
    <item>
      <title>Reducing the Offline-Streaming Gap for Unified ASR Transducer with Consistency Regularization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-reducing-the-offline-streaming-gap-for-unified/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-reducing-the-offline-streaming-gap-for-unified/</guid>
      <description>1. **问题**：训练一个既能高精度离线转录又能低延迟流式识别的统一ASR模型极具挑战性，传统方法在低延迟下性能会急剧下降。 2. **方法核心**：提出一个统一的Transducer框架，结合分块注意力（含右上下文）和动态块卷积（DCConv）来适配两种模式。核心创新是引入了模式一致性正则化损失</description>
    </item>
    <item>
      <title>Deep Supervised Contrastive Learning of Pitch Contours for Robust Pitch Accent Classification in Seoul Korean</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-deep-supervised-contrastive-learning-of-pitch/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-deep-supervised-contrastive-learning-of-pitch/</guid>
      <description>这篇论文旨在解决将连续变化的基频（F0）曲线映射到首尔韩语中离散、不变的音高重音类别（如LHLH, HHLH）这一难题。传统方法易受F0测量噪声和说话人差异的影响。为此，作者提出了**Dual-Glob**，一个深度监督对比学习框架。其核心是通过一个**双分支（干净视图和增强视图）编码器**，在共享</description>
    </item>
    <item>
      <title>Text-To-Speech with Chain-of-Details: modeling temporal dynamics in speech generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-text-to-speech-with-chain-of-details-modeling/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-text-to-speech-with-chain-of-details-modeling/</guid>
      <description>本文针对文本转语音（TTS）任务，提出了一种名为“细节链”（Chain-of-Details, CoD）的新框架。**要解决的问题**是现有TTS方法在建模语音生成的时域动态（从粗略时序到精细声学细节的渐进过程）方面存在不足。**使用的方法**是将语音生成分解为多个时间分辨率递增的阶段，在每个阶段使</description>
    </item>
    <item>
      <title>MUSCAT: MUltilingual, SCientific ConversATion Benchmark</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-muscat-multilingual-scientific-conversation/</link>
      <pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-muscat-multilingual-scientific-conversation/</guid>
      <description>本文提出了 MUSCAT，一个用于评估多语言科学对话场景下自动语音识别（ASR）性能的新基准。数据集包含 6 组双语对话录音（共约 65 分钟，9,066 词），涉及英语与德语、土耳其语、中文、越南语的配对对话；每组对话使用 Meeting Owl 3、ReSpeaker USB 麦克风阵列和 Me</description>
    </item>
    <item>
      <title>VoxMind: An End-to-End Agentic Spoken Dialogue System</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-voxmind-an-end-to-end-agentic-spoken-dialogue/</link>
      <pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-voxmind-an-end-to-end-agentic-spoken-dialogue/</guid>
      <description>端到端语音对话模型在自然交互上进步迅速，但普遍缺乏处理复杂任务的agent能力（工具调用、规划、推理）。本文首先形式化定义了&amp;#34;端到端语音智能体&amp;#34;的四大维度——画像（Profile）、记忆（Memory）、规划（Planning）与执行（Action Execution），填补了该领域理论标准的空白。</description>
    </item>
    <item>
      <title>An Ultra-Low Latency, End-to-End Streaming Speech Synthesis Architecture via Block-Wise Generation and Depth-Wise Codec Decoding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-an-ultra-low-latency-end-to-end-streaming-speech/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-an-ultra-low-latency-end-to-end-streaming-speech/</guid>
      <description>这篇论文旨在解决实时交互式语音合成中**推理延迟高**与**声学质量（尤其是高频细节）易受损**的核心矛盾。传统流水线依赖计算密集的神经声码器进行波形重建，且基于连续回归的声学模型易导致频谱过平滑。为</description>
    </item>
    <item>
      <title>WavAlign: Enhancing Intelligence and Expressiveness in Spoken Dialogue Models via Adaptive Hybrid Post-Training</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-wavalign-enhancing-intelligence-and/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-wavalign-enhancing-intelligence-and/</guid>
      <description>这篇论文旨在解决端到端语音对话模型在智能（IQ）和表达力（EQ）上难以同时提升的核心挑战。作者发现，直接对混合文本-语音序列应用统一的偏好优化（如DPO、GRPO）会导致问题：稀疏的偏好信号被淹没在密</description>
    </item>
  </channel>
</rss>
