<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Real-Time Processing on Speech/Audio Paper Digest</title>
    <link>https://nanless.github.io/audio-paper-digest-blog/tags/%E5%AE%9E%E6%97%B6%E5%A4%84%E7%90%86/</link>
    <description>Recent content in Real-Time Processing on Speech/Audio Paper Digest</description>
    <generator>Hugo</generator>
    <language>zh-cn</language>
    <lastBuildDate>Wed, 29 Apr 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://nanless.github.io/audio-paper-digest-blog/tags/%E5%AE%9E%E6%97%B6%E5%A4%84%E7%90%86/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>A Lightweight Fourier-Based Network for Binaural Speech Enhancement with Spatial Cue Preservation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-lightweight-fourier-based-network-for-binaural/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-lightweight-fourier-based-network-for-binaural/</guid>
      <description>Speech Enhancement | 8.5/10</description>
    </item>
    <item>
      <title>A Personalized Real-Time Proactive Voice Memory Assistant</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-personalized-real-time-proactive-voice-memory/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-personalized-real-time-proactive-voice-memory/</guid>
      <description>Real-Time Processing | 7.0/10</description>
    </item>
    <item>
      <title>A Stabilized Hybrid Active Noise Control Algorithm of GFANC and FxNLMS with Online Clustering</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-stabilized-hybrid-active-noise-control/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-stabilized-hybrid-active-noise-control/</guid>
      <description>Speech Enhancement | 7.5/10</description>
    </item>
    <item>
      <title>Acoustic Feedback Cancellation in Hearing Aids Exploiting an Inertial Sensor</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-acoustic-feedback-cancellation-in-hearing-aids/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-acoustic-feedback-cancellation-in-hearing-aids/</guid>
      <description>Audio Classification | 7.0/10</description>
    </item>
    <item>
      <title>Acoustic Non-Stationarity Objective Assessment with Hard Label Criteria for Supervised Learning Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-acoustic-non-stationarity-objective-assessment/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-acoustic-non-stationarity-objective-assessment/</guid>
      <description>Audio Classification | 7.0/10</description>
    </item>
    <item>
      <title>Ailive Mixer: A Deep Learning Based Zero Latency Automatic Music Mixer for Live Music Performances</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ailive-mixer-a-deep-learning-based-zero-latency/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ailive-mixer-a-deep-learning-based-zero-latency/</guid>
      <description>Music Mixing | 7.0/10</description>
    </item>
    <item>
      <title>AR-BSNet: Towards Ultra-Low Complexity Autoregressive Target Speaker Extraction With Band-Split Modeling</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ar-bsnet-towards-ultra-low-complexity/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ar-bsnet-towards-ultra-low-complexity/</guid>
      <description>Speech Separation | 7.0/10</description>
    </item>
    <item>
      <title>ASAP: An Azimuth-Priority Strip-Based Search Approach to Planar Microphone Array DOA Estimation in 3D</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-asap-an-azimuth-priority-strip-based-search/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-asap-an-azimuth-priority-strip-based-search/</guid>
      <description>Sound Source Localization | 7.5/10</description>
    </item>
    <item>
      <title>Atomic Norm Minimization Revisited: Progressive Atom Identification And Refinement</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-atomic-norm-minimization-revisited-progressive/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-atomic-norm-minimization-revisited-progressive/</guid>
      <description>Sound Source Localization | 7.5/10</description>
    </item>
    <item>
      <title>Audio Deepfake Detection at the First Greeting: &#34;Hi!&#34;</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-audio-deepfake-detection-at-the-first-greeting-hi/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-audio-deepfake-detection-at-the-first-greeting-hi/</guid>
      <description>Audio Deepfake Detection | 7.5/10</description>
    </item>
    <item>
      <title>Constraint Optimized Multichannel Mixer-Limiter Design</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-constraint-optimized-multichannel-mixer-limiter/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-constraint-optimized-multichannel-mixer-limiter/</guid>
      <description>Multichannel | 7.0/10</description>
    </item>
    <item>
      <title>Deep Learning-Based Joint Optimization of Adaptive Feedback Cancellation and Residual Feedback Suppression for Hearing Aids</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-deep-learning-based-joint-optimization-of/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-deep-learning-based-joint-optimization-of/</guid>
      <description>Speech Enhancement | 8.0/10</description>
    </item>
    <item>
      <title>Differentiable Grouped Feedback Delay Networks for Learning Direction and Position-Dependent Late Reverberation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-differentiable-grouped-feedback-delay-networks/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-differentiable-grouped-feedback-delay-networks/</guid>
      <description>Spatial Audio | 7.5/10</description>
    </item>
    <item>
      <title>Distributed Multichannel Active Noise Control with Asynchronous Communication</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-distributed-multichannel-active-noise-control/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-distributed-multichannel-active-noise-control/</guid>
      <description>Signal Processing | 8.0/10</description>
    </item>
    <item>
      <title>Enhancing Automatic Drum Transcription with Online Dynamic Few-Shot Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-enhancing-automatic-drum-transcription-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-enhancing-automatic-drum-transcription-with/</guid>
      <description>Music Information Retrieval | 7.0/10</description>
    </item>
    <item>
      <title>Fast-ULCNet: A Fast and Ultra Low Complexity Network for Single-Channel Speech Enhancement</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fast-ulcnet-a-fast-and-ultra-low-complexity/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fast-ulcnet-a-fast-and-ultra-low-complexity/</guid>
      <description>Speech Enhancement | 7.5/10</description>
    </item>
    <item>
      <title>FastEnhancer: Speed-Optimized Streaming Neural Speech Enhancement</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fastenhancer-speed-optimized-streaming-neural/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fastenhancer-speed-optimized-streaming-neural/</guid>
      <description>Speech Enhancement | 8.5/10</description>
    </item>
    <item>
      <title>FlashFoley: Fast Interactive Sketch2audio Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-flashfoley-fast-interactive-sketch2audio/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-flashfoley-fast-interactive-sketch2audio/</guid>
      <description>Audio Generation | 7.5/10</description>
    </item>
    <item>
      <title>H-nnPBFDAF: Hierarchical Neural Network Partitioned Block Frequency Domain Adaptive Filter with Novel Block Activation Probability</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-h-nnpbfdaf-hierarchical-neural-network/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-h-nnpbfdaf-hierarchical-neural-network/</guid>
      <description>Speech Enhancement | 7.5/10</description>
    </item>
    <item>
      <title>Huí Sù: Co-constructing a Dual Feedback Apparatus</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hu-s-co-constructing-a-dual-feedback-apparatus/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hu-s-co-constructing-a-dual-feedback-apparatus/</guid>
      <description>Music Generation | 5.5/10</description>
    </item>
    <item>
      <title>ICASSP 2026 - Real-Time Processing Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-023/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-023/</guid>
      <description>1 ICASSP 2026 Real-Time Processing paper in total</description>
    </item>
    <item>
      <title>Joint Deep Secondary Path Estimation and Adaptive Control for Active Noise Cancellation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-joint-deep-secondary-path-estimation-and-adaptive/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-joint-deep-secondary-path-estimation-and-adaptive/</guid>
      <description>Speech Enhancement | 7.5/10</description>
    </item>
    <item>
      <title>Joint Estimation of Primary and Secondary Paths for Personalized Hearable Applications</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-joint-estimation-of-primary-and-secondary-paths/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-joint-estimation-of-primary-and-secondary-paths/</guid>
      <description>Active Noise Cancellation | 7.5/10</description>
    </item>
    <item>
      <title>LAFUFU: Latent Acoustic Features For Ultra-Fast Utterance Restoration</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lafufu-latent-acoustic-features-for-ultra-fast/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lafufu-latent-acoustic-features-for-ultra-fast/</guid>
      <description>Speech Enhancement | 8.0/10</description>
    </item>
    <item>
      <title>Lisa: Lightweight Yet Superb Neural Speech Coding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lisa-lightweight-yet-superb-neural-speech-coding/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lisa-lightweight-yet-superb-neural-speech-coding/</guid>
      <description>Speech Coding | 8.5/10</description>
    </item>
    <item>
      <title>Low-Frequency Harmonic Control for Speech Intelligibility in Open-Ear Headphones</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-low-frequency-harmonic-control-for-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-low-frequency-harmonic-control-for-speech/</guid>
      <description>Speech Enhancement | 6.5/10</description>
    </item>
    <item>
      <title>Low-Latency Audio Front-End Region-of-Interest Beamforming for Smart Glasses</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-low-latency-audio-front-end-region-of-interest/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-low-latency-audio-front-end-region-of-interest/</guid>
      <description>Speech Enhancement | 7.0/10</description>
    </item>
    <item>
      <title>Matching Reverberant Speech Through Learned Acoustic Embeddings</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-matching-reverberant-speech-through-learned/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-matching-reverberant-speech-through-learned/</guid>
      <description>Audio Generation | 8.0/10</description>
    </item>
    <item>
      <title>Meanflow-Accelerated Multimodal Video-to-Audio Synthesis Via One-Step Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-meanflow-accelerated-multimodal-video-to-audio/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-meanflow-accelerated-multimodal-video-to-audio/</guid>
      <description>Audio Generation | 7.5/10</description>
    </item>
    <item>
      <title>MeanFlowSE: One-Step Generative Speech Enhancement via Conditional Mean Flow</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-meanflowse-one-step-generative-speech-enhancement/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-meanflowse-one-step-generative-speech-enhancement/</guid>
      <description>Speech Enhancement | 7.5/10</description>
    </item>
    <item>
      <title>MeanSE: Efficient Generative Speech Enhancement with Mean Flows</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-meanse-efficient-generative-speech-enhancement/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-meanse-efficient-generative-speech-enhancement/</guid>
      <description>Speech Enhancement | 6.5/10</description>
    </item>
    <item>
      <title>MixGAN-based Non-blind Bandwidth Extension for Audio Codec</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mixgan-based-non-blind-bandwidth-extension-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mixgan-based-non-blind-bandwidth-extension-for/</guid>
      <description>Audio Enhancement | 8.0/10</description>
    </item>
    <item>
      <title>NCF-TTS: Enhancing Flow Matching Based Text-To-Speech with Neighborhood Consistency Flow</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ncf-tts-enhancing-flow-matching-based-text-to/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ncf-tts-enhancing-flow-matching-based-text-to/</guid>
      <description>Speech Synthesis | 8.0/10</description>
    </item>
    <item>
      <title>On The Design of Efficient Neural Methods for Geometry-Agnostic Multichannel Speech Enhancement</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-on-the-design-of-efficient-neural-methods-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-on-the-design-of-efficient-neural-methods-for/</guid>
      <description>Speech Enhancement | 6.5/10</description>
    </item>
    <item>
      <title>ParaGSE: Parallel Generative Speech Enhancement with Group-Vector-Quantization-Based Neural Speech Codec</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-paragse-parallel-generative-speech-enhancement/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-paragse-parallel-generative-speech-enhancement/</guid>
      <description>Speech Enhancement | 7.5/10</description>
    </item>
    <item>
      <title>Prosody-Guided Harmonic Attention for Phase-Coherent Neural Vocoding in the Complex Spectrum</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-prosody-guided-harmonic-attention-for-phase/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-prosody-guided-harmonic-attention-for-phase/</guid>
      <description>Speech Synthesis | 8.0/10</description>
    </item>
    <item>
      <title>PSTalker: Realistic 3D Talking Head Synthesis via a Semantic-Aware Audio-Driven Point-Based Shape</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-pstalker-realistic-3d-talking-head-synthesis-via/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-pstalker-realistic-3d-talking-head-synthesis-via/</guid>
      <description>Talking Head Synthesis | 7.5/10</description>
    </item>
    <item>
      <title>Real-Time Streaming MEL Vocoding with Generative Flow Matching</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-real-time-streaming-mel-vocoding-with-generative/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-real-time-streaming-mel-vocoding-with-generative/</guid>
      <description>Speech Synthesis | 7.5/10</description>
    </item>
    <item>
      <title>Robust Online Overdetermined Independent Vector Analysis Based on Bilinear Decomposition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-robust-online-overdetermined-independent-vector/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-robust-online-overdetermined-independent-vector/</guid>
      <description>Speech Separation | 7.0/10</description>
    </item>
    <item>
      <title>SFM-TTS: Lightweight and Rapid Speech Synthesis with Flexible Shortcut Flow Matching</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sfm-tts-lightweight-and-rapid-speech-synthesis/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sfm-tts-lightweight-and-rapid-speech-synthesis/</guid>
      <description>Speech Synthesis | 7.0/10</description>
    </item>
    <item>
      <title>Shortcut Flow Matching for Speech Enhancement: Step-Invariant Flows via Single Stage Training</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-shortcut-flow-matching-for-speech-enhancement/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-shortcut-flow-matching-for-speech-enhancement/</guid>
      <description>Speech Enhancement | 7.0/10</description>
    </item>
    <item>
      <title>Spring Reverb Emulation with Hybrid Gated Convolutional Networks and State Space Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-spring-reverb-emulation-with-hybrid-gated/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-spring-reverb-emulation-with-hybrid-gated/</guid>
      <description>Audio Generation | 7.5/10</description>
    </item>
    <item>
      <title>Stereophonic Acoustic Echo Cancellation Using an Improved Affine Projection Algorithm with Adaptive Multiple Sub-Filters</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-stereophonic-acoustic-echo-cancellation-using-an/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-stereophonic-acoustic-echo-cancellation-using-an/</guid>
      <description>Speech Enhancement | 6.0/10</description>
    </item>
    <item>
      <title>Str-DiffSep: Streamable Diffusion Model for Speech Separation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-str-diffsep-streamable-diffusion-model-for-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-str-diffsep-streamable-diffusion-model-for-speech/</guid>
      <description>Speech Separation | 7.5/10</description>
    </item>
    <item>
      <title>Stream-Voice-Anon: Enhancing Utility of Real-Time Speaker Anonymization Via Neural Audio Codec and Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-stream-voice-anon-enhancing-utility-of-real-time/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-stream-voice-anon-enhancing-utility-of-real-time/</guid>
      <description>Speaker Anonymization | 7.0/10</description>
    </item>
    <item>
      <title>Synchronous Secondary Path Modeling and Kronecker-Factorized Adaptive Algorithm for Multichannel Active Noise Control</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-synchronous-secondary-path-modeling-and-kronecker/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-synchronous-secondary-path-modeling-and-kronecker/</guid>
      <description>Active Noise Control | 7.0/10</description>
    </item>
    <item>
      <title>T-Cache: Fast Inference For Masked Generative Transformer-Based TTS Via Prompt-Aware Feature Caching</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-t-cache-fast-inference-for-masked-generative/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-t-cache-fast-inference-for-masked-generative/</guid>
      <description>Speech Synthesis | 9.0/10</description>
    </item>
    <item>
      <title>T-Mimi: A Transformer-Based Mimi Decoder for Real-Time On-Phone TTS</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-t-mimi-a-transformer-based-mimi-decoder-for-real/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-t-mimi-a-transformer-based-mimi-decoder-for-real/</guid>
      <description>Speech Synthesis | 7.0/10</description>
    </item>
    <item>
      <title>Time-Domain Synthesis of Virtual Sound Source Within Personalized Sound Zone using a Linear Loudspeaker Array</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-time-domain-synthesis-of-virtual-sound-source/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-time-domain-synthesis-of-virtual-sound-source/</guid>
      <description>Spatial Audio | 8.0/10</description>
    </item>
    <item>
      <title>Towards Real-Time Generative Speech Restoration with Flow-Matching</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-real-time-generative-speech-restoration/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-real-time-generative-speech-restoration/</guid>
      <description>Speech Enhancement | 6.0/10</description>
    </item>
    <item>
      <title>UJCodec: An End-to-end Unet-Style Codec for Joint Speech Compression and Enhancement</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ujcodec-an-end-to-end-unet-style-codec-for-joint/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ujcodec-an-end-to-end-unet-style-codec-for-joint/</guid>
      <description>Speech Enhancement | 7.5/10</description>
    </item>
    <item>
      <title>VChangeCodec: An Ultra Low-Complexity Neural Speech Codec with Built-In Voice Changer for Customized Real-Time Communication</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-vchangecodec-an-ultra-low-complexity-neural/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-vchangecodec-an-ultra-low-complexity-neural/</guid>
      <description>Voice Conversion, Speech Enhancement | 8.0/10</description>
    </item>
    <item>
      <title>WhisperPipe: A Resource-Efficient Streaming Architecture for Real-Time Automatic Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-whisperpipe-a-resource-efficient-streaming/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-whisperpipe-a-resource-efficient-streaming/</guid>
      <description>Speech Recognition | 6.5/10</description>
    </item>
    <item>
      <title>Hallo-Live: Real-Time Streaming Joint Audio-Video Avatar Generation with Asynchronous Dual-Stream and Human-Centric Preference Distillation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-hallo-live-real-time-streaming-joint-audio-video/</link>
      <pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-hallo-live-real-time-streaming-joint-audio-video/</guid>
      <description>Audio-Visual | 8.5/10</description>
    </item>
    <item>
      <title>Opening the Design Space: Two Years of Performance with Intelligent Musical Instruments</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-opening-the-design-space-two-years-of-performance/</link>
      <pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-opening-the-design-space-two-years-of-performance/</guid>
      <description>Music Generation | 6.5/10</description>
    </item>
    <item>
      <title>Predictive Directional Selective Fixed-Filter Active Noise Control for Moving Sources via a Convolutional Recurrent Neural Network</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-predictive-directional-selective-fixed-filter/</link>
      <pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-predictive-directional-selective-fixed-filter/</guid>
      <description>Sound Source Localization | 7.5/10</description>
    </item>
    <item>
      <title>RTCFake: Speech Deepfake Detection in Real-Time Communication</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-rtcfake-speech-deepfake-detection-in-real-time/</link>
      <pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-rtcfake-speech-deepfake-detection-in-real-time/</guid>
      <description>Speech Deepfake Detection | 7.0/10</description>
    </item>
    <item>
      <title>Dilated CNNs for Periodic Signal Processing: A Low-Complexity Approach</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-dilated-cnns-for-periodic-signal-processing-a-low/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-dilated-cnns-for-periodic-signal-processing-a-low/</guid>
      <description>Speech Enhancement | 6.5/10</description>
    </item>
    <item>
      <title>Full-Duplex Interaction in Spoken Dialogue Systems: A Comprehensive Study from the ICASSP 2026 HumDial Challenge</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-full-duplex-interaction-in-spoken-dialogue/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-full-duplex-interaction-in-spoken-dialogue/</guid>
      <description>Spoken Dialogue Systems | 6.5/10</description>
    </item>
    <item>
      <title>Sema: Semantic Transport for Real-Time Multimodal Agents</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-sema-semantic-transport-for-real-time-multimodal/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-sema-semantic-transport-for-real-time-multimodal/</guid>
      <description>Real-Time Processing | 6.5/10</description>
    </item>
    <item>
      <title>Before the Mic: Physical-Layer Voiceprint Anonymization with Acoustic Metamaterials</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-before-the-mic-physical-layer-voiceprint/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-before-the-mic-physical-layer-voiceprint/</guid>
      <description>Addressing the problem that, in public settings such as meetings and lectures, untrusted recording devices can leak a speaker's voiceprint with no post-hoc remedy, this paper proposes EchoMask, the first physical-layer real-time voiceprint anonymization system based on acoustic metamaterials. Its core method uses carefully designed passive acoustic structures to selectively interfere with a specific low-frequency band (300-700 Hz), which is critical for speaker identification, before the sound reaches the microphone</description>
    </item>
    <item>
      <title>Towards Streaming Target Speaker Extraction via Chunk-wise Interleaved Splicing of Autoregressive Language Model</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-towards-streaming-target-speaker-extraction-via/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-towards-streaming-target-speaker-extraction-via/</guid>
      <description>1.  **Problem addressed**: Existing target speaker extraction (TSE) methods based on generative models (e.g., diffusion or autoregressive models) rely on global context, making them hard to apply directly to real-time streaming scenarios; forcing such an adaptation severely degrades performance. 2.  **Core method**: Proposes the first autoregressive (AR) framework for streaming TSE, built around a "chunk-wise interleaved splicing" paradigm that splits the mixed speech into chunks</description>
    </item>
    <item>
      <title>BEAT: Tokenizing and Generating Symbolic Music by Uniform Temporal Steps</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-beat-tokenizing-and-generating-symbolic-music-by/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-beat-tokenizing-and-generating-symbolic-music-by/</guid>
      <description>This paper targets a problem with the mainstream event-based tokenization used in symbolic music generation: it handles temporal regularity only implicitly, forcing models to additionally learn the time grid. It proposes **BEAT**, a novel grid-based tokenization framework whose core idea is to discretize music uniformly in time into beats as the basic unit, with each pitch within each beat…</description>
    </item>
    <item>
      <title>MTR-DuplexBench: Towards a Comprehensive Evaluation of Multi-Round Conversations for Full-Duplex Speech Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-mtr-duplexbench-towards-a-comprehensive/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-mtr-duplexbench-towards-a-comprehensive/</guid>
      <description>This paper addresses a key gap in the current evaluation of full-duplex speech language models (FD-SLMs): the lack of systematic assessment of multi-round, continuous conversation ability. Existing benchmarks mostly focus on single-round interaction or specific dialogue properties (e.g., interruption), overlooking whether models consistently maintain core capabilities such as instruction following and safety across multiple rounds. To this end, the authors propose **MTR-DuplexBench**, a new multi-round full-du…</description>
    </item>
    <item>
      <title>Speech/Audio Paper Digest 2026-04-22</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22/</guid>
      <description>Analyzed 21 speech/AI papers in total</description>
    </item>
    <item>
      <title>Full-Duplex-Bench-v3: Benchmarking Tool Use for Full-Duplex Voice Agents Under Real-World Disfluency</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-full-duplex-bench-v3-benchmarking-tool-use-for/</link>
      <pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-full-duplex-bench-v3-benchmarking-tool-use-for/</guid>
      <description>Addressing the lack of realism (reliance on synthetic speech) and the simplicity of tasks (single-step calls) in current full-duplex voice agent evaluation, this paper proposes the **Full-Duplex-Bench-v3 (FDB-v3)** benchmark. Its core innovation is the use of **100 real human recordings** (annotated with five types of disfluency) and tasks across four domains that require **multi-step chained API calls**…</description>
    </item>
    <item>
      <title>MoshiRAG: Asynchronous Knowledge Retrieval for Full-Duplex Speech Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-moshirag-asynchronous-knowledge-retrieval-for/</link>
      <pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-moshirag-asynchronous-knowledge-retrieval-for/</guid>
      <description>This paper aims to fix the weak factuality of full-duplex speech language models (e.g., Moshi) without sacrificing their high interactivity. **Problem**: full-duplex models can interrupt and respond in real time, but because their training data is far smaller in scale than text corpora, their knowledge and factual accuracy are limited. **Method**: proposes MoshiRAG, a modular framework that adds a special `&amp;lt;ret&amp;gt;` retrieval trigger to…</description>
    </item>
    <item>
      <title>An Ultra-Low Latency, End-to-End Streaming Speech Synthesis Architecture via Block-Wise Generation and Depth-Wise Codec Decoding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-an-ultra-low-latency-end-to-end-streaming-speech/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-an-ultra-low-latency-end-to-end-streaming-speech/</guid>
      <description>This paper addresses the core tension in real-time interactive speech synthesis between **high inference latency** and **degraded acoustic quality (especially high-frequency detail)**. Traditional pipelines rely on computationally intensive neural vocoders for waveform reconstruction, and acoustic models based on continuous regression tend to over-smooth the spectrum. To…</description>
    </item>
    <item>
      <title>Dual-Axis Generative Reward Model Toward Semantic and Turn-taking Robustness in Interactive Spoken Dialogue Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-dual-axis-generative-reward-model-toward-semantic/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-dual-axis-generative-reward-model-toward-semantic/</guid>
      <description>This paper tackles the core challenge of achieving human-like interaction in full-duplex spoken dialogue models (SDMs). Existing automated evaluation metrics are superficial (e.g., counting behaviors or measuring timing-prediction accuracy) and cannot provide reliable reward signals for reinforcement learning, while human evaluation is costly and hard to scale. To this end, the authors pro…</description>
    </item>
    <item>
      <title>Four Decades of Digital Waveguides</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-four-decades-of-digital-waveguides/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-four-decades-of-digital-waveguides/</guid>
      <description>This paper comprehensively reviews four decades of digital waveguide physical modeling: its development, core applications, and latest advances. The central problem it addresses is how to simulate acoustic wave propagation efficiently while preserving physical accuracy, to meet the demands of real-time audio processing (e.g., virtual instruments)…</description>
    </item>
  </channel>
</rss>
