<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Posts on 语音/音频论文速递</title>
    <link>https://nanless.github.io/audio-paper-digest-blog/posts/</link>
    <description>Recent content in Posts on 语音/音频论文速递</description>
    <generator>Hugo</generator>
    <language>zh-cn</language>
    <lastBuildDate>Wed, 29 Apr 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://nanless.github.io/audio-paper-digest-blog/posts/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>3D Mesh Grid Room Impulse Responses Measured with a Linear Microphone Array and Suppression of Frame Reflections</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-3d-mesh-grid-room-impulse-responses-measured-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-3d-mesh-grid-room-impulse-responses-measured-with/</guid>
      <description>Spatial Audio | 8.3/10</description>
    </item>
    <item>
      <title>A Bayesian Approach to Singing Skill Evaluation Using Semitone Pitch Histogram and MCMC-Based Generated Quantities</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-bayesian-approach-to-singing-skill-evaluation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-bayesian-approach-to-singing-skill-evaluation/</guid>
      <description>Music Understanding | 7.0/10</description>
    </item>
    <item>
      <title>A Bimodal Approach for Detecting Fatigue Using Speech and Personal Assessments in College Students</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-bimodal-approach-for-detecting-fatigue-using/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-bimodal-approach-for-detecting-fatigue-using/</guid>
      <description>A Bimodal Approach for Detecting Fatigue Using Speech and Personal Assessments in College Students</description>
    </item>
    <item>
      <title>A Consistent Learning Depression Detection Framework Integrating Multi-View Attention</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-consistent-learning-depression-detection/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-consistent-learning-depression-detection/</guid>
      <description>Speech Biomarkers | 6.5/10</description>
    </item>
    <item>
      <title>A Data-Driven Framework for Personal Sound Zone Control Addressing Loudspeaker Nonlinearities</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-data-driven-framework-for-personal-sound-zone/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-data-driven-framework-for-personal-sound-zone/</guid>
      <description>Spatial Audio | 7.5/10</description>
    </item>
    <item>
      <title>A Dataset of Robot-Patient and Doctor-Patient Medical Dialogues for Spoken Language Processing Tasks</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-dataset-of-robot-patient-and-doctor-patient/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-dataset-of-robot-patient-and-doctor-patient/</guid>
      <description>Spoken Dialogue Systems | 7.5/10</description>
    </item>
    <item>
      <title>A Distribution Matching Approach to Neural Piano Transcription with Optimal Transport</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-distribution-matching-approach-to-neural-piano/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-distribution-matching-approach-to-neural-piano/</guid>
      <description>Music Transcription | 7.0/10</description>
    </item>
    <item>
      <title>A Dynamic Gated Cross-Attention Framework for Audio-Text Apparent Personality Analysis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-dynamic-gated-cross-attention-framework-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-dynamic-gated-cross-attention-framework-for/</guid>
      <description>Audio Classification | 7.0/10</description>
    </item>
    <item>
      <title>A Feature-Optimized Audio Watermarking Algorithm with Adaptive Embedding Strength</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-feature-optimized-audio-watermarking-algorithm/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-feature-optimized-audio-watermarking-algorithm/</guid>
      <description>Audio Security | 7.5/10</description>
    </item>
    <item>
      <title>A Framework for Controlled Multi-Speaker Audio Synthesis for Robustness Evaluation of Speaker Diarisation Systems</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-framework-for-controlled-multi-speaker-audio/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-framework-for-controlled-multi-speaker-audio/</guid>
      <description>Speaker Diarization | 7.5/10</description>
    </item>
    <item>
      <title>A Generalization Strategy for Speech Quality Prediction: From Domain-Specific to Unified Datasets</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-generalization-strategy-for-speech-quality/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-generalization-strategy-for-speech-quality/</guid>
      <description>Speech Quality Assessment | 6.5/10</description>
    </item>
    <item>
      <title>A Generative-First Neural Audio Autoencoder</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-generative-first-neural-audio-autoencoder/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-generative-first-neural-audio-autoencoder/</guid>
      <description>Music Generation | 8.5/10</description>
    </item>
    <item>
      <title>A Hybrid Convolution-Mamba Network with Tone-Octave Contrastive Learning for Stratified Semi-Supervised Singing Melody Extraction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-hybrid-convolution-mamba-network-with-tone/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-hybrid-convolution-mamba-network-with-tone/</guid>
      <description>Singing Melody Extraction | 7.5/10</description>
    </item>
    <item>
      <title>A Learning-Based Automotive Sound Field Reproduction Method Using Plane-Wave Decomposition and Multi-Position Constraint</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-learning-based-automotive-sound-field/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-learning-based-automotive-sound-field/</guid>
      <description>Spatial Audio | 7.5/10</description>
    </item>
    <item>
      <title>A Lightweight Fourier-Based Network for Binaural Speech Enhancement with Spatial Cue Preservation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-lightweight-fourier-based-network-for-binaural/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-lightweight-fourier-based-network-for-binaural/</guid>
      <description>Speech Enhancement | 8.5/10</description>
    </item>
    <item>
      <title>An LLM-Driven Acoustic Semantic Enriched Framework for Underwater Acoustic Target Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-llm-driven-acoustic-semantic-enriched-framework/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-llm-driven-acoustic-semantic-enriched-framework/</guid>
      <description>Audio Classification | 7.0/10</description>
    </item>
    <item>
      <title>A Metric Learning Approach to Heart Murmur Detection from Phonocardiogram Recordings</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-metric-learning-approach-to-heart-murmur/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-metric-learning-approach-to-heart-murmur/</guid>
      <description>Audio Classification | 7.7/10</description>
    </item>
    <item>
      <title>A New Method and Dataset for Classroom Teaching Stage Segmentation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-new-method-and-dataset-for-classroom-teaching/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-new-method-and-dataset-for-classroom-teaching/</guid>
      <description>Classroom Stage Segmentation | 6.5/10</description>
    </item>
    <item>
      <title>A Noniterative Phase Retrieval Considering the Zeros of STFT Magnitude</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-noniterative-phase-retrieval-considering-the/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-noniterative-phase-retrieval-considering-the/</guid>
      <description>Signal Processing | 7.5/10</description>
    </item>
    <item>
      <title>A Novel Monte Carlo Gradient Method Based on Meta-Learning for Effective Step-Size Selection in Active Noise Control</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-noval-monte-carlo-gradient-method-based-on-meta/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-noval-monte-carlo-gradient-method-based-on-meta/</guid>
      <description>Noise Control | 6.5/10</description>
    </item>
    <item>
      <title>A Parameter-Efficient Multi-Scale Convolutional Adapter for Synthetic Speech Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-parameter-efficient-multi-scale-convolutional/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-parameter-efficient-multi-scale-convolutional/</guid>
      <description>A Parameter-Efficient Multi-Scale Convolutional Adapter for Synthetic Speech Detection</description>
    </item>
    <item>
      <title>A Personalized Real-Time Proactive Voice Memory Assistant</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-personalized-real-time-proactive-voice-memory/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-personalized-real-time-proactive-voice-memory/</guid>
      <description>Real-Time Processing | 7.0/10</description>
    </item>
    <item>
      <title>A Robust KNN Approach for Multi-Class Laryngeal Disease Detection using MFCC Features</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-robust-knn-approach-for-multi-class-laryngeal/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-robust-knn-approach-for-multi-class-laryngeal/</guid>
      <description>Audio Classification | 7.5/10</description>
    </item>
    <item>
      <title>A Robust Multi-Scale Framework with Test-Time Adaptation for sEEG-Based Speech Decoding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-robust-multi-scale-framework-with-test-time/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-robust-multi-scale-framework-with-test-time/</guid>
      <description>Speech Decoding | 7.5/10</description>
    </item>
    <item>
      <title>A Speech-Driven Paradigm for Physics-Informed Modeling of Coupled Micro-Speakers</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-speech-driven-paradigm-for-physics-informed/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-speech-driven-paradigm-for-physics-informed/</guid>
      <description>Audio Generation | 7.0/10</description>
    </item>
    <item>
      <title>A Stabilized Hybrid Active Noise Control Algorithm of GFANC and FxNLMS with Online Clustering</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-stabilized-hybrid-active-noise-control/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-stabilized-hybrid-active-noise-control/</guid>
      <description>Speech Enhancement | 7.5/10</description>
    </item>
    <item>
      <title>A State-Dependent Markov Diffusion Process for Generative Speech Enhancement</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-state-dependent-markov-diffusion-process-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-state-dependent-markov-diffusion-process-for/</guid>
      <description>Speech Enhancement | 6.5/10</description>
    </item>
    <item>
      <title>A Study of Data Selection Strategies for Pre-Training Self-Supervised Speech Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-study-of-data-selection-strategies-for-pre/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-study-of-data-selection-strategies-for-pre/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>A Superb-Style Benchmark of Self-Supervised Speech Models for Audio Deepfake Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-superb-style-benchmark-of-self-supervised/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-superb-style-benchmark-of-self-supervised/</guid>
      <description>Audio Deepfake Detection | 7.0/10</description>
    </item>
    <item>
      <title>A Task-Aware Dual-Level Self-Supervised Learning Method for Effective Sound Event Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-task-aware-dual-level-self-supervised-learning/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-task-aware-dual-level-self-supervised-learning/</guid>
      <description>Sound Event Detection | 7.5/10</description>
    </item>
    <item>
      <title>A Text-To-Text Alignment Algorithm for Better Evaluation of Modern Speech Recognition Systems</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-text-to-text-alignment-algorithm-for-better/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-text-to-text-alignment-algorithm-for-better/</guid>
      <description>Model Evaluation | 7.5/10</description>
    </item>
    <item>
      <title>A Unified SVD-Modal Solution for Sparse Sound Field Reconstruction with Hybrid Spherical-Linear Microphone Arrays</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-unified-svd-modal-solution-for-sparse-sound/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-unified-svd-modal-solution-for-sparse-sound/</guid>
      <description>Sound Source Localization | 6.5/10</description>
    </item>
    <item>
      <title>An Unsupervised Domain Adaptation Framework For Semi-Supervised Melody Extraction Using Confidence Matrix Replace and Nearest Neighbour Supervision</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-unsupervised-domain-adaptation-framework-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-unsupervised-domain-adaptation-framework-for/</guid>
      <description>Music Information Retrieval | 8.0/10</description>
    </item>
    <item>
      <title>ACAVCaps: Enabling Large-Scale Training for Fine-Grained and Diverse Audio Understanding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-acavcaps-enabling-large-scale-training-for-fine/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-acavcaps-enabling-large-scale-training-for-fine/</guid>
      <description>Audio Classification | 8.5/10</description>
    </item>
    <item>
      <title>Accelerating Regularized Attention Kernel Regression for Spectrum Cartography</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-accelerating-regularized-attention-kernel/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-accelerating-regularized-attention-kernel/</guid>
      <description>Spectrum Cartography | 8.5/10</description>
    </item>
    <item>
      <title>AccLID: Accent-aware Language Identification for Robust Multilingual Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-acclid-accent-aware-language-identification-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-acclid-accent-aware-language-identification-for/</guid>
      <description>Speech Recognition | 7.0/10</description>
    </item>
    <item>
      <title>ACIR-MACL: Effective Multimodal Sentiment Analysis via Attention-Based Causal Intervention Regularization and Multi-Aspect Contrastive Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-acir-macl-effective-multimodal-sentiment-analysis/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-acir-macl-effective-multimodal-sentiment-analysis/</guid>
      <description>Sentiment Analysis | 7.0/10</description>
    </item>
    <item>
      <title>Acoustic and Facial Markers of Perceived Conversational Success in Spontaneous Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-acoustic-and-facial-markers-of-perceived/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-acoustic-and-facial-markers-of-perceived/</guid>
      <description>Speech Emotion Recognition | 6.0/10</description>
    </item>
    <item>
      <title>Acoustic Feedback Cancellation in Hearing Aids Exploiting an Inertial Sensor</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-acoustic-feedback-cancellation-in-hearing-aids/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-acoustic-feedback-cancellation-in-hearing-aids/</guid>
      <description>Audio Classification | 7.0/10</description>
    </item>
    <item>
      <title>Acoustic Non-Stationarity Objective Assessment with Hard Label Criteria for Supervised Learning Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-acoustic-non-stationarity-objective-assessment/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-acoustic-non-stationarity-objective-assessment/</guid>
      <description>Audio Classification | 7.0/10</description>
    </item>
    <item>
      <title>Acoustic Teleportation Via Disentangled Neural Audio Codec Representations</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-acoustic-teleportation-via-disentangled-neural/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-acoustic-teleportation-via-disentangled-neural/</guid>
      <description>Speech Enhancement | 7.0/10</description>
    </item>
    <item>
      <title>Adapting Diarization-Conditioned Whisper for End-to-End Multi-Talker Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-adapting-diarization-conditioned-whisper-for-end/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-adapting-diarization-conditioned-whisper-for-end/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Adaptive Deterministic Flow Matching for Target Speaker Extraction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-adaptive-deterministic-flow-matching-for-target/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-adaptive-deterministic-flow-matching-for-target/</guid>
      <description>Target Speaker Extraction | 8.0/10</description>
    </item>
    <item>
      <title>Adaptive Embedding Fusion with Contrastive Learning for Robust Fully Few-Shot Class-Incremental Audio Classification</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-adaptive-embedding-fusion-with-contrastive/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-adaptive-embedding-fusion-with-contrastive/</guid>
      <description>Audio Classification | 7.5/10</description>
    </item>
    <item>
      <title>Adaptive Per-Channel Energy Normalization Front-End for Robust Audio Signal Processing</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-adaptive-per-channel-energy-normalization-front/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-adaptive-per-channel-energy-normalization-front/</guid>
      <description>Audio Classification | 7.5/10</description>
    </item>
    <item>
      <title>Adaptive Rotary Steering with Joint Autoregression for Robust Extraction of Closely Moving Speakers in Dynamic Scenarios</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-adaptive-rotary-steering-with-joint/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-adaptive-rotary-steering-with-joint/</guid>
      <description>Speech Separation | 8.5/10</description>
    </item>
    <item>
      <title>Adaptive Spectral Weighting in Sagittal-Plane Sound Localization: A Reliability-Driven Approach</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-adaptive-spectral-weighting-in-sagittal-plane/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-adaptive-spectral-weighting-in-sagittal-plane/</guid>
      <description>Sound Source Localization | 6.5/10</description>
    </item>
    <item>
      <title>Adaptive Task-Incremental Learning For Underwater Acoustic Recognition Based on Mixture-of-Experts Adapter</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-adaptive-task-incremental-learning-for-underwater/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-adaptive-task-incremental-learning-for-underwater/</guid>
      <description>Underwater Acoustic Target Recognition | 7.0/10</description>
    </item>
    <item>
      <title>Addressing Gradient Misalignment in Data-Augmented Training for Robust Speech Deepfake Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-addressing-gradient-misalignment-in-data/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-addressing-gradient-misalignment-in-data/</guid>
      <description>Speech Deepfake Detection | 7.0/10</description>
    </item>
    <item>
      <title>ADH-VA: Adaptive Directed-Hypergraph Convolution with VA Contrastive Learning for Multimodal Conversational Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-adh-va-adaptive-directed-hypergraph-convolution/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-adh-va-adaptive-directed-hypergraph-convolution/</guid>
      <description>Speech Emotion Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Advanced modeling of interlanguage speech intelligibility benefit with L1-L2 multi-task learning using differentiable K-means for accent-robust discrete token-based ASR</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-advanced-modeling-of-interlanguage-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-advanced-modeling-of-interlanguage-speech/</guid>
      <description>Speech Recognition | 7.0/10</description>
    </item>
    <item>
      <title>Advancing LLM-Based Multi-Channel Multi-Speaker Speech Recognition with Global Cross-Channel Attention and Sentence-Ordered First-In First-Out Serialized Output Training</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-advancing-llm-based-multi-channel-multi-speaker/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-advancing-llm-based-multi-channel-multi-speaker/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Advancing Semi-Supervised Child Speech Recognition with Omni-Temporal Classification under Label Noise</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-advancing-semi-supervised-child-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-advancing-semi-supervised-child-speech/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Advancing Speech Summarization in Multi-Modal LLMs with Reinforcement Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-advancing-speech-summarization-in-multi-modal/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-advancing-speech-summarization-in-multi-modal/</guid>
      <description>Audio Question Answering | 7.0/10</description>
    </item>
    <item>
      <title>Advancing Speech Understanding in Speech-Aware Language Models with GRPO</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-advancing-speech-understanding-in-speech-aware/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-advancing-speech-understanding-in-speech-aware/</guid>
      <description>Speech Question Answering | 7.0/10</description>
    </item>
    <item>
      <title>Adversarial Defense via Generative Speech Enhancement Module</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-adversarial-defense-via-generative-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-adversarial-defense-via-generative-speech/</guid>
      <description>Speech Enhancement, Adversarial Defense | 7.5/10</description>
    </item>
    <item>
      <title>Adversarial Fine-Tuning on Speech Foundation Model with Vulnerable Attention Consistency Regularization for Robust Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-adversarial-fine-tuning-on-speech-foundation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-adversarial-fine-tuning-on-speech-foundation/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Adversarial Rivalry Learning for Music Classification</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-adversarial-rivalry-learning-for-music/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-adversarial-rivalry-learning-for-music/</guid>
      <description>Music Classification | 6.5/10</description>
    </item>
    <item>
      <title>Affect-Jigsaw: Integrating Core and Peripheral Emotions for Harmonious Fine-Grained Multimodal Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-affect-jigsaw-integrating-core-and-peripheral/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-affect-jigsaw-integrating-core-and-peripheral/</guid>
      <description>Speech Emotion Recognition | 8.0/10</description>
    </item>
    <item>
      <title>AFT: An Exemplar-Free Class Incremental Learning Method for Environmental Sound Classification</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-aft-an-exemplar-free-class-incremental-learning/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-aft-an-exemplar-free-class-incremental-learning/</guid>
      <description>Audio Classification | 7.0/10</description>
    </item>
    <item>
      <title>AI-Generated Music Detection in Broadcast Monitoring</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ai-generated-music-detection-in-broadcast/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ai-generated-music-detection-in-broadcast/</guid>
      <description>Audio Deepfake Detection | 7.0/10</description>
    </item>
    <item>
      <title>Ailive Mixer: A Deep Learning Based Zero Latency Automatic Music Mixer for Live Music Performances</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ailive-mixer-a-deep-learning-based-zero-latency/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ailive-mixer-a-deep-learning-based-zero-latency/</guid>
      <description>音乐混合 | 7.0/10</description>
    </item>
    <item>
      <title>AISHELL6-Whisper: A Chinese Mandarin Audio-Visual Whisper Speech Dataset with Speech Recognition Baselines</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-aishell6-whisper-a-chinese-mandarin-audio-visual/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-aishell6-whisper-a-chinese-mandarin-audio-visual/</guid>
      <description>语音识别 | 8.3/10</description>
    </item>
    <item>
      <title>Aligning Generative Speech Enhancement with Perceptual Feedback</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-aligning-generative-speech-enhancement-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-aligning-generative-speech-enhancement-with/</guid>
      <description>语音增强 | 7.5/10</description>
    </item>
    <item>
      <title>Aligning Language Models for Lyric-to-Melody Generation with Rule-Based Musical Constraints</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-aligning-language-models-for-lyric-to-melody/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-aligning-language-models-for-lyric-to-melody/</guid>
      <description>音乐生成 | 7.5/10</description>
    </item>
    <item>
      <title>ALMA-Chor: Leveraging Audio-Lyric Alignment with Mamba for Chorus Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-alma-chor-leveraging-audio-lyric-alignment-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-alma-chor-leveraging-audio-lyric-alignment-with/</guid>
      <description>音乐信息检索 | 7.0/10</description>
    </item>
    <item>
      <title>AMBER2: Dual Ambiguity-Aware Emotion Recognition Applied to Speech and Text</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-amber2-dual-ambiguity-aware-emotion-recognition/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-amber2-dual-ambiguity-aware-emotion-recognition/</guid>
      <description>语音情感识别 | 8.0/10</description>
    </item>
    <item>
      <title>AmbiDrop: Array-Agnostic Speech Enhancement Using Ambisonics Encoding and Dropout-Based Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ambidrop-array-agnostic-speech-enhancement-using/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ambidrop-array-agnostic-speech-enhancement-using/</guid>
      <description>语音增强 | 7.0/10</description>
    </item>
    <item>
      <title>AMBISONIC-DML: A Benchmark Dataset for Dynamic Higher-Order Ambisonics Music with Motion-Aligned Stems</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ambisonic-dml-a-benchmark-dataset-for-dynamic/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ambisonic-dml-a-benchmark-dataset-for-dynamic/</guid>
      <description>数据集 | 7.5/10</description>
    </item>
    <item>
      <title>An Anomaly-Aware and Audio-Enhanced Dual-Pathway Framework for Alzheimer’s Disease Progression Classification</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-an-anomaly-aware-and-audio-enhanced-dual-pathway/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-an-anomaly-aware-and-audio-enhanced-dual-pathway/</guid>
      <description>语音生物标志物 | 7.0/10</description>
    </item>
    <item>
      <title>An Audio-Visual Speech Separation Network with Joint Cross-Attention and Iterative Modeling</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-an-audio-visual-speech-separation-network-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-an-audio-visual-speech-separation-network-with/</guid>
      <description>语音分离 | 7.5/10</description>
    </item>
    <item>
      <title>An Efficient Neural Network for Modeling Human Auditory Neurograms for Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-an-efficient-neural-network-for-modeling-human/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-an-efficient-neural-network-for-modeling-human/</guid>
      <description>语音增强 | 7.0/10</description>
    </item>
    <item>
      <title>An End-to-End Multimodal System for Subtitle Recognition and Chinese-Japanese Translation in Short Dramas</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-an-end-to-end-multimodal-system-for-subtitle/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-an-end-to-end-multimodal-system-for-subtitle/</guid>
      <description>多模态模型 | 7.0/10</description>
    </item>
    <item>
      <title>An Envelope Separation Aided Multi-Task Learning Model for Blind Source Counting and Localization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-an-envelope-separation-aided-multi-task-learning/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-an-envelope-separation-aided-multi-task-learning/</guid>
      <description>声源定位 | 6.5/10</description>
    </item>
    <item>
      <title>An Event-Based Sequence Modeling Approach to Recognizing Non-Triad Chords with Oversegmentation Minimization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-an-event-based-sequence-modeling-approach-to/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-an-event-based-sequence-modeling-approach-to/</guid>
      <description>音乐信息检索 | 7.5/10</description>
    </item>
    <item>
      <title>An Unsupervised Alignment Feature Fusion System for Spoken Language-Based Dementia Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-an-unsupervised-alignment-feature-fusion-system/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-an-unsupervised-alignment-feature-fusion-system/</guid>
      <description>语音生物标志物 | 7.0/10</description>
    </item>
    <item>
      <title>A Neural Forward Filtering for Speaker-Image Separation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-aneural-forward-filtering-for-speaker-image/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-aneural-forward-filtering-for-speaker-image/</guid>
      <description>语音分离 | 7.5/10</description>
    </item>
    <item>
      <title>AnimalCLAP: Taxonomy-Aware Language-Audio Pretraining for Species Recognition and Trait Inference</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-animalclap-taxonomy-aware-language-audio/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-animalclap-taxonomy-aware-language-audio/</guid>
      <description>音频分类 | 8.0/10</description>
    </item>
    <item>
      <title>AnyAccomp: Generalizable Accompaniment Generation Via Quantized Melodic Bottleneck</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-anyaccomp-generalizable-accompaniment-generation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-anyaccomp-generalizable-accompaniment-generation/</guid>
      <description>音乐生成 | 8.0/10</description>
    </item>
    <item>
      <title>AnyRIR: Robust Non-Intrusive Room Impulse Response Estimation in the Wild</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-anyrir-robust-non-intrusive-room-impulse-response/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-anyrir-robust-non-intrusive-room-impulse-response/</guid>
      <description>空间音频 | 7.0/10</description>
    </item>
    <item>
      <title>APKD: Aligned And Paced Knowledge Distillation Towards Lightweight Heterogeneous Multimodal Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-apkd-aligned-and-paced-knowledge-distillation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-apkd-aligned-and-paced-knowledge-distillation/</guid>
      <description>情感识别 | 7.5/10</description>
    </item>
    <item>
      <title>AQUA-Bench: Beyond finding answers to knowing when there are None in Audio Question Answering</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-aqua-bench-beyond-finding-answers-to-knowing-when/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-aqua-bench-beyond-finding-answers-to-knowing-when/</guid>
      <description>音频问答 | 7.0/10</description>
    </item>
    <item>
      <title>AR-BSNet: Towards Ultra-Low Complexity Autoregressive Target Speaker Extraction With Band-Split Modeling</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ar-bsnet-towards-ultra-low-complexity/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ar-bsnet-towards-ultra-low-complexity/</guid>
      <description>语音分离 | 7.0/10</description>
    </item>
    <item>
      <title>AR&amp;D: A Framework for Retrieving and Describing Concepts for Interpreting AudioLLMs</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ard-a-framework-for-retrieving-and-describing/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ard-a-framework-for-retrieving-and-describing/</guid>
      <description>音频大模型 | 6.5/10</description>
    </item>
    <item>
      <title>Ara-BEST-RQ: Multi Dialectal Arabic SSL</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ara-best-rq-multi-dialectal-arabic-ssl/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ara-best-rq-multi-dialectal-arabic-ssl/</guid>
      <description>语音识别 | 6.5/10</description>
    </item>
    <item>
      <title>Arbitrarily Settable Frame Rate Neural Speech Codec with Content Adaptive Variable Length Segmentation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-arbitrarily-settable-frame-rate-neural-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-arbitrarily-settable-frame-rate-neural-speech/</guid>
      <description>音频生成 | 7.0/10</description>
    </item>
    <item>
      <title>ARCHI-TTS: A Flow-Matching-Based Text-to-Speech Model with Self-Supervised Semantic Aligner and Accelerated Inference</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-archi-tts-a-flow-matching-based-text-to-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-archi-tts-a-flow-matching-based-text-to-speech/</guid>
      <description>语音合成 | 8.0/10</description>
    </item>
    <item>
      <title>Are Modern Speech Enhancement Systems Vulnerable to Adversarial Attacks?</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-are-modern-speech-enhancement-systems-vulnerable/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-are-modern-speech-enhancement-systems-vulnerable/</guid>
      <description>语音增强 | 7.5/10</description>
    </item>
    <item>
      <title>ASAP: An Azimuth-Priority Strip-Based Search Approach to Planar Microphone Array DOA Estimation in 3D</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-asap-an-azimuth-priority-strip-based-search/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-asap-an-azimuth-priority-strip-based-search/</guid>
      <description>声源定位 | 7.5/10</description>
    </item>
    <item>
      <title>Assessing Identity Leakage in Talking Face Generation: Metrics and Evaluation Framework</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-assessing-identity-leakage-in-talking-face/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-assessing-identity-leakage-in-talking-face/</guid>
      <description>说话人脸生成 | 7.5/10</description>
    </item>
    <item>
      <title>Assessing the Impact of Speaker Identity in Speech Spoofing Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-assessing-the-impact-of-speaker-identity-in/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-assessing-the-impact-of-speaker-identity-in/</guid>
      <description>音频深度伪造检测 | 8.0/10</description>
    </item>
    <item>
      <title>Assessing The Perceptual Impact of Low-Altitude Aircraft Noise in Cities: An Auralization Framework Using Gaussian Beam Tracing</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-assessing-the-perceptual-impact-of-low-altitude/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-assessing-the-perceptual-impact-of-low-altitude/</guid>
      <description>音频生成 | 8.0/10</description>
    </item>
    <item>
      <title>Asynchrony-Aware Decoupled Multimodal Control for Cued Speech Video Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-asynchrony-aware-decoupled-multimodal-control-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-asynchrony-aware-decoupled-multimodal-control-for/</guid>
      <description>语音合成 | 7.5/10</description>
    </item>
    <item>
      <title>ATOM: Adaptive Token-Level Optimal Transport Mixup for Speech Translation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-atom-adaptive-token-level-optimal-transport-mixup/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-atom-adaptive-token-level-optimal-transport-mixup/</guid>
      <description>语音翻译 | 8.0/10</description>
    </item>
    <item>
      <title>Atomic Norm Minimization Revisited: Progressive Atom Identification And Refinement</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-atomic-norm-minimization-revisited-progressive/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-atomic-norm-minimization-revisited-progressive/</guid>
      <description>声源定位 | 7.5/10</description>
    </item>
    <item>
      <title>Attention-Based Encoder-Decoder Target-Speaker Voice Activity Detection for Robust Speaker Diarization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-attention-based-encoder-decoder-target-speaker/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-attention-based-encoder-decoder-target-speaker/</guid>
      <description>说话人分离 | 8.0/10</description>
    </item>
    <item>
      <title>Attention-Weighted Centered Kernel Alignment for Knowledge Distillation in Large Audio-Language Models Applied To Speech Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-attention-weighted-centered-kernel-alignment-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-attention-weighted-centered-kernel-alignment-for/</guid>
      <description>语音情感识别 | 8.0/10</description>
    </item>
    <item>
      <title>Attention2Probability: Attention-Driven Terminology Probability Estimation for Robust Speech-to-text System</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-attention2probability-attention-driven/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-attention2probability-attention-driven/</guid>
      <description>语音识别 | 7.0/10</description>
    </item>
    <item>
      <title>Attentive AV-Fusionnet: Audio-Visual Quality Prediction with Hybrid Attention</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-attentive-av-fusionnet-audio-visual-quality/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-attentive-av-fusionnet-audio-visual-quality/</guid>
      <description>音视频 | 7.0/10</description>
    </item>
    <item>
      <title>Attentive Masked Self-Distillation for Respiratory Sound Classification</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-attentive-masked-self-distillation-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-attentive-masked-self-distillation-for/</guid>
      <description>音频分类 | 7.5/10</description>
    </item>
    <item>
      <title>Auden-Voice: General-Purpose Voice Encoder for Speech and Language Understanding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-auden-voice-general-purpose-voice-encoder-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-auden-voice-general-purpose-voice-encoder-for/</guid>
      <description>语音编码器 | 7.5/10</description>
    </item>
    <item>
      <title>Audience-Aware Co-speech Gesture Generation in Public Speaking via Anticipation Tokens</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-audience-aware-co-speech-gesture-generation-in/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-audience-aware-co-speech-gesture-generation-in/</guid>
      <description>音频生成 | 8.0/10</description>
    </item>
    <item>
      <title>Audio Classification Models are Vulnerable to Filter Perturbations</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-audio-classification-models-are-vulnerable-to/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-audio-classification-models-are-vulnerable-to/</guid>
      <description>音频分类 | 7.5/10</description>
    </item>
    <item>
      <title>Audio Deepfake Detection at the First Greeting: &#34;Hi!&#34;</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-audio-deepfake-detection-at-the-first-greeting-hi/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-audio-deepfake-detection-at-the-first-greeting-hi/</guid>
      <description>音频深度伪造检测 | 7.5/10</description>
    </item>
    <item>
      <title>Audio Effect Estimation with DNN-Based Prediction and Search Algorithm</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-audio-effect-estimation-with-dnn-based-prediction/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-audio-effect-estimation-with-dnn-based-prediction/</guid>
      <description>音频效果估计 | 7.0/10</description>
    </item>
    <item>
      <title>Audio-Conditioned Diffusion LLMs for ASR and Deliberation Processing</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-audio-conditioned-diffusion-llms-for-asr-and/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-audio-conditioned-diffusion-llms-for-asr-and/</guid>
      <description>语音识别 | 7.0/10</description>
    </item>
    <item>
      <title>Audio-Guided Multimodal Approach for Fine-Grained Alignment and Boundary Modeling in Active Speaker Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-audio-guided-multimodal-approach-for-fine-grained/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-audio-guided-multimodal-approach-for-fine-grained/</guid>
      <description>说话人检测 | 7.5/10</description>
    </item>
    <item>
      <title>Audio-Text Jailbreak Attack on Large Audio-Language Models: Towards Generality and Stealthiness</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-audio-text-jailbreak-attack-on-large-audio/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-audio-text-jailbreak-attack-on-large-audio/</guid>
      <description>音频安全 | 7.0/10</description>
    </item>
    <item>
      <title>Audio-to-Score Jazz Solo Transcription with the Rhythm Perceiver</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-audio-to-score-jazz-solo-transcription-with-the/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-audio-to-score-jazz-solo-transcription-with-the/</guid>
      <description>音乐信息检索 | 7.5/10</description>
    </item>
    <item>
      <title>Audio-Visual Deepfake Generation and Detection: An Exploratory Survey</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-audio-visual-deepfake-generation-and-detection-an/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-audio-visual-deepfake-generation-and-detection-an/</guid>
      <description>音频深度伪造检测 | 6.5/10</description>
    </item>
    <item>
      <title>Audio-Visual Feature Fusion for Calibrating Relevance Scores of Video Moment Retrieval</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-audio-visual-feature-fusion-for-calibrating/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-audio-visual-feature-fusion-for-calibrating/</guid>
      <description>视频片段检索 | 7.0/10</description>
    </item>
    <item>
      <title>AUDIOCARDS: Structured Metadata Improves Audio Language Models for Sound Design</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-audiocards-structured-metadata-improves-audio/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-audiocards-structured-metadata-improves-audio/</guid>
      <description>音频检索 | 7.5/10</description>
    </item>
    <item>
      <title>AudioFuse: Unified Spectral-Temporal Learning Via A Hybrid VIT-1D CNN Architecture for Phonocardiogram Classification</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-audiofuse-unified-spectral-temporal-learning-via/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-audiofuse-unified-spectral-temporal-learning-via/</guid>
      <description>音频分类 | 7.5/10</description>
    </item>
    <item>
      <title>AudioGen-Omni: A Unified Multimodal Diffusion Transformer for Video-Synchronized Audio, Speech, and Song Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-audiogen-omni-a-unified-multimodal-diffusion/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-audiogen-omni-a-unified-multimodal-diffusion/</guid>
      <description>音频生成 | 7.5/10</description>
    </item>
    <item>
      <title>AUDIOGENIE-Reasoner: A Training-Free Multi-Agent Framework for Coarse-to-Fine Audio Deep Reasoning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-audiogenie-reasoner-a-training-free-multi-agent/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-audiogenie-reasoner-a-training-free-multi-agent/</guid>
      <description>音频问答 | 7.0/10</description>
    </item>
    <item>
      <title>Auditory Illusion Benchmark for Large Audio Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-auditory-illusion-benchmark-for-large-audio/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-auditory-illusion-benchmark-for-large-audio/</guid>
      <description>模型评估 | 7.0/10</description>
    </item>
    <item>
      <title>Auditory-Inspired Transformer for Binaural Speech Enhancement and Spatial Cue Preservation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-auditory-inspired-transformer-for-binaural-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-auditory-inspired-transformer-for-binaural-speech/</guid>
      <description>语音增强 | 7.0/10</description>
    </item>
    <item>
      <title>AURA: A Stegaformer-Based Scalable Deep Audio Watermark with Extreme Robustness</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-aura-a-stegaformer-based-scalable-deep-audio/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-aura-a-stegaformer-based-scalable-deep-audio/</guid>
      <description>音频水印 | 7.5/10</description>
    </item>
    <item>
      <title>Auto-MatchCut: An Audio-Visual Retrieval Framework for Seamless Match Cutting</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-auto-matchcut-an-audio-visual-retrieval-framework/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-auto-matchcut-an-audio-visual-retrieval-framework/</guid>
      <description>跨模态检索 | 7.0/10</description>
    </item>
    <item>
      <title>Automated Dysphagia Screening Using Noninvasive Neck Acoustic Sensing</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-automated-dysphagia-screening-using-noninvasive/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-automated-dysphagia-screening-using-noninvasive/</guid>
      <description>音频分类 | 8.0/10</description>
    </item>
    <item>
      <title>Automatic Estimation of Speaker Diarization Error Rate Based on Features of Audio Quality and Speaker Discriminability</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-automatic-estimation-of-speaker-diarization-error/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-automatic-estimation-of-speaker-diarization-error/</guid>
      <description>说话人分离 | 7.5/10</description>
    </item>
    <item>
      <title>Automatic Music Mixing Using a Generative Model of Effect Embeddings</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-automatic-music-mixing-using-a-generative-model/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-automatic-music-mixing-using-a-generative-model/</guid>
      <description>音乐生成 | 7.5/10</description>
    </item>
    <item>
      <title>Automatic Music Sample Identification with Multi-Track Contrastive Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-automatic-music-sample-identification-with-multi/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-automatic-music-sample-identification-with-multi/</guid>
      <description>音频检索 | 7.5/10</description>
    </item>
    <item>
      <title>AUV: Teaching Audio Universal Vector Quantization with Single Nested Codebook</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-auv-teaching-audio-universal-vector-quantization/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-auv-teaching-audio-universal-vector-quantization/</guid>
      <description>音频生成 | 8.0/10</description>
    </item>
    <item>
      <title>Auxiliary Multi-Label Training For Improving the Robustness of Audio Deepfake Detection on AI-Processed Data</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-auxiliary-multi-label-training-for-improving-the/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-auxiliary-multi-label-training-for-improving-the/</guid>
      <description>Audio Deepfake Detection | 6.5/10</description>
    </item>
    <item>
      <title>AVATAR: Audio-Visual Adaptive Fusion via Trained Agent Reinforcement for Multimodal Deepfake Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-avatar-audio-visual-adaptive-fusion-via-trained/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-avatar-audio-visual-adaptive-fusion-via-trained/</guid>
      <description>Audio Deepfake Detection | 7.5/10</description>
    </item>
    <item>
      <title>AVO-65: A Large-Scale Hierarchical Audio-Visual Object Dataset</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-avo-65-a-large-scale-hierarchical-audio-visual/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-avo-65-a-large-scale-hierarchical-audio-visual/</guid>
      <description>Audio-Visual | 7.0/10</description>
    </item>
    <item>
      <title>B-GRPO: Unsupervised Speech Emotion Recognition Based on Batched-Group Relative Policy Optimization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-b-grpo-unsupervised-speech-emotion-recognition/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-b-grpo-unsupervised-speech-emotion-recognition/</guid>
      <description>Speech Emotion Recognition | 6.5/10</description>
    </item>
    <item>
      <title>BACHI: Boundary-Aware Symbolic Chord Recognition Through Masked Iterative Decoding on Pop and Classical Music</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-bachi-boundary-aware-symbolic-chord-recognition/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-bachi-boundary-aware-symbolic-chord-recognition/</guid>
      <description>Music Information Retrieval | 7.5/10</description>
    </item>
    <item>
      <title>Bayesian Low-Rank Factorization for Robust Model Adaptation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-bayesian-low-rank-factorization-for-robust-model/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-bayesian-low-rank-factorization-for-robust-model/</guid>
      <description>Speech Recognition | 8.0/10</description>
    </item>
    <item>
      <title>Bayesian Signal Separation Via Plug-and-Play Diffusion-Within-Gibbs Sampling</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-bayesian-signal-separation-via-plug-and-play/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-bayesian-signal-separation-via-plug-and-play/</guid>
      <description>Speech Separation | 7.5/10</description>
    </item>
    <item>
      <title>BBPE16: UTF-16-Based Byte-Level Byte-Pair Encoding for Improved Multilingual Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-bbpe16-utf-16-based-byte-level-byte-pair-encoding/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-bbpe16-utf-16-based-byte-level-byte-pair-encoding/</guid>
      <description>Speech Recognition | 7.0/10</description>
    </item>
    <item>
      <title>Beamforming Using Virtual Microphones for Hearing Aid Applications</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-beamforming-using-virtual-microphones-for-hearing/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-beamforming-using-virtual-microphones-for-hearing/</guid>
      <description>Speech Enhancement | 7.5/10</description>
    </item>
    <item>
      <title>Beat and Downbeat Detection: A Reformulated Approach</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-beat-and-downbeat-detection-a-reformulated/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-beat-and-downbeat-detection-a-reformulated/</guid>
      <description>Music Understanding | 7.5/10</description>
    </item>
    <item>
      <title>BeatMamba: Bidirectional Selective State-Space Modeling for Efficient Beat Tracking</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-beatmamba-bidirectional-selective-state-space/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-beatmamba-bidirectional-selective-state-space/</guid>
      <description>Music Information Retrieval | 7.5/10</description>
    </item>
    <item>
      <title>Behind the Scenes: Mechanistic Interpretability of LoRA-Adapted Whisper for Speech Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-behind-the-scenes-mechanistic-interpretability-of/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-behind-the-scenes-mechanistic-interpretability-of/</guid>
      <description>Speech Emotion Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Benchmarking Humans And Machines On Complex Multilingual Speech Understanding Tasks</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-benchmarking-humans-and-machines-on-complex/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-benchmarking-humans-and-machines-on-complex/</guid>
      <description>Audio Question Answering | 7.5/10</description>
    </item>
    <item>
      <title>Benchmarking Music Autotagging with MGPHot Expert Annotations vs. Generic Tag Datasets</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-benchmarking-music-autotagging-with-mgphot-expert/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-benchmarking-music-autotagging-with-mgphot-expert/</guid>
      <description>Music Information Retrieval | 7.5/10</description>
    </item>
    <item>
      <title>BEST-RQ-based Self-Supervised Learning for Whisper Domain Adaptation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-best-rq-based-self-supervised-learning-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-best-rq-based-self-supervised-learning-for/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>BEST-STD 2.0: Balanced and Efficient Speech Tokenizer for Spoken Term Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-best-std-20-balanced-and-efficient-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-best-std-20-balanced-and-efficient-speech/</guid>
      <description>Audio Retrieval | 7.5/10</description>
    </item>
    <item>
      <title>Beyond Face Swapping: A Diffusion-Based Digital Human Benchmark for Multimodal Deepfake Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-beyond-face-swapping-a-diffusion-based-digital/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-beyond-face-swapping-a-diffusion-based-digital/</guid>
      <description>Audio Deepfake Detection | 8.1/10</description>
    </item>
    <item>
      <title>Beyond Global Emotion: Fine-Grained Emotional Speech Synthesis with Dynamic Word-Level Modulation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-beyond-global-emotion-fine-grained-emotional/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-beyond-global-emotion-fine-grained-emotional/</guid>
      <description>Speech Synthesis | 7.5/10</description>
    </item>
    <item>
      <title>Beyond Isolated Utterances: Cue-Guided Interaction for Context-Dependent Conversational Multimodal Understanding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-beyond-isolated-utterances-cue-guided-interaction/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-beyond-isolated-utterances-cue-guided-interaction/</guid>
      <description>Multimodal Models | 7.5/10</description>
    </item>
    <item>
      <title>Beyond Mapping: Domain-Invariant Representations via Spectral Embedding of Optimal Transport Plans</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-beyond-mapping-domain-invariant-representations/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-beyond-mapping-domain-invariant-representations/</guid>
      <description>Domain Adaptation | 7.5/10</description>
    </item>
    <item>
      <title>Bimodal Fusion Framework for Dynamic Facial Expression Recognition In-The-Wild</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-bimodal-fusion-framework-for-dynamic-facial/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-bimodal-fusion-framework-for-dynamic-facial/</guid>
      <description>Speech Emotion Recognition | 7.0/10</description>
    </item>
    <item>
      <title>BioSEN: A Bio-Acoustic Signal Enhancement Network for Animal Vocalizations</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-biosen-a-bio-acoustic-signal-enhancement-network/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-biosen-a-bio-acoustic-signal-enhancement-network/</guid>
      <description>Bioacoustics | 7.5/10</description>
    </item>
    <item>
      <title>BiRQ: Bi-Level Self-Labeling Random Quantization for Self-Supervised Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-birq-bi-level-self-labeling-random-quantization/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-birq-bi-level-self-labeling-random-quantization/</guid>
      <description>Speech Recognition | 8.0/10</description>
    </item>
    <item>
      <title>Bleed No More: Generative Interference Reduction for Musical Recordings</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-bleed-no-more-generative-interference-reduction/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-bleed-no-more-generative-interference-reduction/</guid>
      <description>Music Source Separation | 7.0/10</description>
    </item>
    <item>
      <title>Bloodroot: When Watermarking Turns Poisonous for Stealthy Backdoor</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-bloodroot-when-watermarking-turns-poisonous-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-bloodroot-when-watermarking-turns-poisonous-for/</guid>
      <description>Audio Security | 7.5/10</description>
    </item>
    <item>
      <title>Bone-Conduction Guided Multimodal Speech Enhancement with Conditional Diffusion Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-bone-conduction-guided-multimodal-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-bone-conduction-guided-multimodal-speech/</guid>
      <description>Speech Enhancement | 7.5/10</description>
    </item>
    <item>
      <title>Brainprint-Modulated Target Speaker Extraction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-brainprint-modulated-target-speaker-extraction/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-brainprint-modulated-target-speaker-extraction/</guid>
      <description>Speech Separation | 8.0/10</description>
    </item>
    <item>
      <title>Break-the-Beat! Controllable MIDI-to-Drum audio synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-break-the-beat-controllable-midi-to-drum-audio/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-break-the-beat-controllable-midi-to-drum-audio/</guid>
      <description>Music Generation | 7.5/10</description>
    </item>
    <item>
      <title>BridgeCode: A Dual Speech Representation Paradigm for Autoregressive Zero-Shot Text-to-Speech Synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-bridgecode-a-dual-speech-representation-paradigm/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-bridgecode-a-dual-speech-representation-paradigm/</guid>
      <description>Speech Synthesis | 8.0/10</description>
    </item>
    <item>
      <title>Bridging the Front-End and Back-End for Robust ASR via Cross-Attention-Based U-Net</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-bridging-the-front-end-and-back-end-for-robust/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-bridging-the-front-end-and-back-end-for-robust/</guid>
      <description>Speech Recognition | 7.0/10</description>
    </item>
    <item>
      <title>Bridging the Measurement–Simulation Gap in Room Acoustics with Real2Sim Diffusion</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-bridging-the-measurementsimulation-gap-in-room/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-bridging-the-measurementsimulation-gap-in-room/</guid>
      <description>Sound Source Localization | 8.5/10</description>
    </item>
    <item>
      <title>Bridging the Semantic Gap: Cross-Attentive Fusion for Joint Acoustic-Semantic Speech Quality Assessment</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-bridging-the-semantic-gap-cross-attentive-fusion/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-bridging-the-semantic-gap-cross-attentive-fusion/</guid>
      <description>Speech Quality Assessment | 8.5/10</description>
    </item>
    <item>
      <title>BSMP-SENet: Band-Split Magnitude-Phase Network for Speech Enhancement</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-bsmp-senetband-split-magnitude-phase-network-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-bsmp-senetband-split-magnitude-phase-network-for/</guid>
      <description>Speech Enhancement | 7.0/10</description>
    </item>
    <item>
      <title>CALM: Joint Contextual Acoustic-Linguistic Modeling for Personalization of Multi-Speaker ASR</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-calm-joint-contextual-acoustic-linguistic/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-calm-joint-contextual-acoustic-linguistic/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>CaMoD: Causal-Aware Modality Denoising for Multimodal Dialogue Intent Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-camod-causal-aware-modality-denoising-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-camod-causal-aware-modality-denoising-for/</guid>
      <description>Multimodal Dialogue Intent Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Can Hierarchical Cross-Modal Fusion Predict Human Perception of AI Dubbed Content?</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-can-hierarchical-cross-modal-fusion-predict-human/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-can-hierarchical-cross-modal-fusion-predict-human/</guid>
      <description>Model Evaluation | 6.0/10</description>
    </item>
    <item>
      <title>Can Large Audio Language Models Understand Audio Well? Speech, Scene and Events Understanding Benchmark for LALMs</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-can-large-audio-language-models-understand-audio/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-can-large-audio-language-models-understand-audio/</guid>
      <description>Benchmarking | 7.0/10</description>
    </item>
    <item>
      <title>Caption and Audio-Guided Video Representation Learning with Gated Attention for Partially Relevant Video Retrieval</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-caption-and-audio-guided-video-representation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-caption-and-audio-guided-video-representation/</guid>
      <description>Video Retrieval | 7.0/10</description>
    </item>
    <item>
      <title>Cardiobridge-DM: Bridging Cross-Cohort Heart Sound Synthesis via Rhythm-Aware Semi-Supervised Diffusion</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cardiobridge-dm-bridging-cross-cohort-heart-sound/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cardiobridge-dm-bridging-cross-cohort-heart-sound/</guid>
      <description>Audio Generation | 7.5/10</description>
    </item>
    <item>
      <title>CASTELLA: Long Audio Dataset with Captions and Temporal Boundaries</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-castella-long-audio-dataset-with-captions-and/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-castella-long-audio-dataset-with-captions-and/</guid>
      <description>Audio Retrieval | 8.5/10</description>
    </item>
    <item>
      <title>CCST: Cross-Modal and Consistency-Aware Self-Training for Source-Free Unsupervised Domain Adaptation in Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ccst-cross-modal-and-consistency-aware-self/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ccst-cross-modal-and-consistency-aware-self/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Chunk-Wise Attention Transducers for Fast and Accurate Streaming Speech-to-Text</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-chunk-wise-attention-transducers-for-fast-and/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-chunk-wise-attention-transducers-for-fast-and/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Chunkwise Aligners for Streaming Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-chunkwise-aligners-for-streaming-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-chunkwise-aligners-for-streaming-speech/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Class-Aware Permutation-Invariant Signal-to-Distortion Ratio for Semantic Segmentation of Sound Scene with Same-Class Sources</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-class-aware-permutation-invariant-signal-to/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-class-aware-permutation-invariant-signal-to/</guid>
      <description>Audio Scene Understanding | 7.5/10</description>
    </item>
    <item>
      <title>ClawMark: A Living-World Benchmark for Multi-Turn, Multi-Day, Multimodal Coworker Agents</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-clawmark-a-living-world-benchmark-for-multi-turn/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-clawmark-a-living-world-benchmark-for-multi-turn/</guid>
      <description>Benchmarking | 7.0/10</description>
    </item>
    <item>
      <title>Clue2Emo: A Brain-Inspired Framework for Open-Vocabulary Multimodal Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-clue2emo-a-brain-inspired-framework-for-open/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-clue2emo-a-brain-inspired-framework-for-open/</guid>
      <description>Speech Emotion Recognition | 8.5/10</description>
    </item>
    <item>
      <title>CMSA-Mamba: Hierarchical State Space Modeling for Audio-Based Depression Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cmsa-mamba-hierarchical-state-space-modeling-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cmsa-mamba-hierarchical-state-space-modeling-for/</guid>
      <description>Speech Biomarkers | 7.0/10</description>
    </item>
    <item>
      <title>Co-Initialization of Control Filter and Secondary Path via Meta-Learning for Active Noise Control</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-co-initialization-of-control-filter-and-secondary/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-co-initialization-of-control-filter-and-secondary/</guid>
      <description>Audio Security | 7.5/10</description>
    </item>
    <item>
      <title>CodecSlime: Temporal Redundancy Compression of Neural Speech Codec via Dynamic Frame Rate</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-codecslime-temporal-redundancy-compression-of/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-codecslime-temporal-redundancy-compression-of/</guid>
      <description>Speech Coding | 7.5/10</description>
    </item>
    <item>
      <title>CodeSep: Low-Bitrate Codec-Driven Speech Separation with Base-Token Disentanglement and Auxiliary-Token Serial Prediction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-codesep-low-bitrate-codec-driven-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-codesep-low-bitrate-codec-driven-speech/</guid>
      <description>Speech Separation | 7.5/10</description>
    </item>
    <item>
      <title>Combining Multi-Order Attention and Multi-Resolution Discriminator for High-Fidelity Neural Vocoder</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-combining-multi-order-attention-and-multi/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-combining-multi-order-attention-and-multi/</guid>
      <description>Speech Synthesis | 6.5/10</description>
    </item>
    <item>
      <title>Combining SSL Speech Features, Contextual Transformers and Mamba Models for Realistic Audio Spoofing Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-combining-ssl-speech-features-contextual/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-combining-ssl-speech-features-contextual/</guid>
      <description>Audio Deepfake Detection | 7.5/10</description>
    </item>
    <item>
      <title>Compression meets Sampling: LZ78-SPA for Efficient Symbolic Music Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-compression-meets-sampling-lz78-spa-for-efficient/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-compression-meets-sampling-lz78-spa-for-efficient/</guid>
      <description>Music Generation | 7.5/10</description>
    </item>
    <item>
      <title>CompSpoof: A Dataset and Joint Learning Framework for Component-Level Audio Anti-Spoofing Countermeasures</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-compspoof-a-dataset-and-joint-learning-framework/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-compspoof-a-dataset-and-joint-learning-framework/</guid>
      <description>Audio Deepfake Detection | 7.0/10</description>
    </item>
    <item>
      <title>Condition-Invariant fMRI decoding of speech intelligibility with deep state space model</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-condition-invariant-fmri-decoding-of-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-condition-invariant-fmri-decoding-of-speech/</guid>
      <description>Neural Decoding | 7.0/10</description>
    </item>
    <item>
      <title>Conditional Diffusion Models for Mental Health-Preserving Voice Conversion</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-conditional-diffusion-models-for-mental-health/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-conditional-diffusion-models-for-mental-health/</guid>
      <description>Voice Conversion | 8.0/10</description>
    </item>
    <item>
      <title>Confidence-Based Filtering for Speech Dataset Curation with Generative Speech Enhancement Using Discrete Tokens</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-confidence-based-filtering-for-speech-dataset/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-confidence-based-filtering-for-speech-dataset/</guid>
      <description>Speech Enhancement | 6.5/10</description>
    </item>
    <item>
      <title>Confidence-Guided Error Correction for Disordered Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-confidence-guided-error-correction-for-disordered/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-confidence-guided-error-correction-for-disordered/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Connecting Layer-Wise Representation of WavLM with Spectro-Temporal Modulation on Speaker Verification</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-connecting-layer-wise-representation-of-wavlm/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-connecting-layer-wise-representation-of-wavlm/</guid>
      <description>Speaker Verification | 6.0/10</description>
    </item>
    <item>
      <title>Constraint Optimized Multichannel Mixer-Limiter Design</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-constraint-optimized-multichannel-mixer-limiter/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-constraint-optimized-multichannel-mixer-limiter/</guid>
      <description>Multichannel | 7.0/10</description>
    </item>
    <item>
      <title>Constructing Composite Features for Interpretable Music-Tagging</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-constructing-composite-features-for-interpretable/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-constructing-composite-features-for-interpretable/</guid>
      <description>Music Information Retrieval | 7.5/10</description>
    </item>
    <item>
      <title>Content Anonymization for Privacy in Long-Form Audio</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-content-anonymization-for-privacy-in-long-form/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-content-anonymization-for-privacy-in-long-form/</guid>
      <description>Speech Anonymization | 7.5/10</description>
    </item>
    <item>
      <title>Content Leakage in Librispeech and its Impact on the Privacy Evaluation of Speaker Anonymization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-content-leakage-in-librispeech-and-its-impact-on/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-content-leakage-in-librispeech-and-its-impact-on/</guid>
      <description>Speech Anonymization | 7.5/10</description>
    </item>
    <item>
      <title>Content-Preserving Speech Representation Learning Via Adaptive Segment-Level Alignment</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-content-preserving-speech-representation-learning/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-content-preserving-speech-representation-learning/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Context-Aware Dynamic Graph Learning for Multimodal Emotion Recognition with Missing Modalities</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-context-aware-dynamic-graph-learning-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-context-aware-dynamic-graph-learning-for/</guid>
      <description>Speech Emotion Recognition | 8.8/10</description>
    </item>
    <item>
      <title>Contextual Biasing for ASR in Speech LLM with Common Word Cues and Bias Word Position Prediction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-contextual-biasing-for-asr-in-speech-llm-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-contextual-biasing-for-asr-in-speech-llm-with/</guid>
      <description>Speech Recognition | 7.0/10</description>
    </item>
    <item>
      <title>Continuation Method for Feedback Delay Network Modal Decomposition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-continuation-method-for-feedback-delay-network/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-continuation-method-for-feedback-delay-network/</guid>
      <description>Spatial Audio | 6.5/10</description>
    </item>
    <item>
      <title>Continuous-Token Diffusion for Speaker-Referenced TTS in Multimodal LLMs</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-continuous-token-diffusion-for-speaker-referenced/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-continuous-token-diffusion-for-speaker-referenced/</guid>
      <description>Speech Synthesis | 8.0/10</description>
    </item>
    <item>
      <title>Contrastive Timbre Representations for Musical Instrument And Synthesizer Retrieval</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-contrastive-timbre-representations-for-musical/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-contrastive-timbre-representations-for-musical/</guid>
      <description>Audio Retrieval | 7.5/10</description>
    </item>
    <item>
      <title>Controllable Embedding Transformation for Mood-Guided Music Retrieval</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-controllable-embedding-transformation-for-mood/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-controllable-embedding-transformation-for-mood/</guid>
      <description>Music Retrieval | 7.5/10</description>
    </item>
    <item>
      <title>Cooperative Multi-Agent Reinforcement Learning for Adaptive Aggregation in Semi-Supervised Federated Learning with non-IID Data</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cooperative-multi-agent-reinforcement-learning/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cooperative-multi-agent-reinforcement-learning/</guid>
      <description>Federated Learning | 7.0/10</description>
    </item>
    <item>
      <title>CosyAccent: Duration-Controllable Accent Normalization using Source-Synthesis Training Data</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cosyaccent-duration-controllable-accent/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cosyaccent-duration-controllable-accent/</guid>
      <description>Voice Conversion | 7.8/10</description>
    </item>
    <item>
      <title>Coupling Acoustic Geometry and Visual Semantics for Robust Depth Estimation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-coupling-acoustic-geometry-and-visual-semantics/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-coupling-acoustic-geometry-and-visual-semantics/</guid>
      <description>Spatial Audio | 7.5/10</description>
    </item>
    <item>
      <title>CoVA: Text-Guided Composed Video Retrieval for Audio-Visual Content</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cova-text-guided-composed-video-retrieval-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cova-text-guided-composed-video-retrieval-for/</guid>
      <description>Cross-Modal Retrieval | 6.5/10</description>
    </item>
    <item>
      <title>Cross-Architecture Knowledge Distillation of WavLM for Lightweight Speaker Verification</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cross-architecture-knowledge-distillation-of/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cross-architecture-knowledge-distillation-of/</guid>
      <description>Speaker Verification | 8.0/10</description>
    </item>
    <item>
      <title>Cross-Cultural Bias in Mel-Scale Representations: Evidence and Alternatives from Speech and Music</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cross-cultural-bias-in-mel-scale-representations/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cross-cultural-bias-in-mel-scale-representations/</guid>
      <description>Speech Recognition | 7.0/10</description>
    </item>
    <item>
      <title>Cross-Domain Contrastive Learning with Dynamic Threshold Calibration for Source Speaker Tracing</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cross-domain-contrastive-learning-with-dynamic/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cross-domain-contrastive-learning-with-dynamic/</guid>
      <description>Speaker Verification | 8.0/10</description>
    </item>
    <item>
      <title>Cross-Lingual Alzheimer’s Disease Detection with Multimodal LLMs via Speech Cue-Augmented Prompting and Instruction Tuning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cross-lingual-alzheimers-disease-detection-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cross-lingual-alzheimers-disease-detection-with/</guid>
      <description>Speech Biomarkers | 6.5/10</description>
    </item>
    <item>
      <title>Cross-Lingual F5-TTS: Towards Language-Agnostic Voice Cloning and Speech Synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cross-lingual-f5-tts-towards-language-agnostic/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cross-lingual-f5-tts-towards-language-agnostic/</guid>
      <description>Voice Cloning | 7.5/10</description>
    </item>
    <item>
      <title>Cross-Lingual Interleaving for Speech Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cross-lingual-interleaving-for-speech-language/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cross-lingual-interleaving-for-speech-language/</guid>
      <description>Speech LLMs | 7.5/10</description>
    </item>
    <item>
      <title>Cross-Linguistic Rhythmic and Spectral Feature-Based Analysis of Nyishi and Adi: Two Under-Resourced Languages of Arunachal Pradesh</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cross-linguistic-rhythmic-and-spectral-feature/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cross-linguistic-rhythmic-and-spectral-feature/</guid>
      <description>Cross-Linguistic Rhythmic and Spectral Feature-Based Analysis of Nyishi and Adi: Two Under-Resourced Languages of Arunachal Pradesh</description>
    </item>
    <item>
      <title>Cross-Modal Bottleneck Fusion for Noise Robust Audio-Visual Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cross-modal-bottleneck-fusion-for-noise-robust/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cross-modal-bottleneck-fusion-for-noise-robust/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Cross-Modal Knowledge Distillation for Speech Large Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cross-modal-knowledge-distillation-for-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cross-modal-knowledge-distillation-for-speech/</guid>
      <description>Speech LLMs | 7.0/10</description>
    </item>
    <item>
      <title>CTC-DID: CTC-Based Arabic Dialect Identification for Streaming Applications</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ctc-did-ctc-based-arabic-dialect-identification/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ctc-did-ctc-based-arabic-dialect-identification/</guid>
      <description>Speech Recognition | 6.5/10</description>
    </item>
    <item>
      <title>Curriculum Learning with Contrastive Loss for Lightweight Speaker Verification</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-curriculum-learning-with-contrastive-loss-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-curriculum-learning-with-contrastive-loss-for/</guid>
      <description>Speaker Verification | 6.5/10</description>
    </item>
    <item>
      <title>Cutscene Agent: An LLM Agent Framework for Automated 3D Cutscene Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cutscene-agent-an-llm-agent-framework-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cutscene-agent-an-llm-agent-framework-for/</guid>
      <description>Generative Models | 8.5/10</description>
    </item>
    <item>
      <title>D3PIA: A Discrete Denoising Diffusion Model for Piano Accompaniment Generation from Lead Sheet</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-d3pia-a-discrete-denoising-diffusion-model-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-d3pia-a-discrete-denoising-diffusion-model-for/</guid>
      <description>Music Generation | 7.5/10</description>
    </item>
    <item>
      <title>DAIEN-TTS: Disentangled Audio Infilling for Environment-Aware Text-to-Speech Synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-daien-tts-disentangled-audio-infilling-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-daien-tts-disentangled-audio-infilling-for/</guid>
      <description>Speech Synthesis | 8.0/10</description>
    </item>
    <item>
      <title>DAMO: A Data-Efficient Multimodal Orchestrator for Temporal Reasoning with Video LLMs</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-damo-a-data-efficient-multimodal-orchestrator-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-damo-a-data-efficient-multimodal-orchestrator-for/</guid>
      <description>Video QA | 7.0/10</description>
    </item>
    <item>
      <title>DAT-CFTNet: Speech Enhancement for Cochlear Implant Recipients using Attention-based Dual-Path Recurrent Neural Network</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dat-cftnet-speech-enhancement-for-cochlear/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dat-cftnet-speech-enhancement-for-cochlear/</guid>
      <description>Speech Enhancement | 7.0/10</description>
    </item>
    <item>
      <title>DBFT-SD: Weakly Supervised Multimodal Detection of Sensitive Audio-Visual Content</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dbft-sd-weakly-supervised-multimodal-detection-of/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dbft-sd-weakly-supervised-multimodal-detection-of/</guid>
      <description>Audio Event Detection | 8.0/10</description>
    </item>
    <item>
      <title>DDSC: Dynamic Dual-Signal Curriculum for Data-Efficient Acoustic Scene Classification Under Domain Shift</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ddsc-dynamic-dual-signal-curriculum-for-data/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ddsc-dynamic-dual-signal-curriculum-for-data/</guid>
      <description>Acoustic Scene Classification | 7.0/10</description>
    </item>
    <item>
      <title>DDSR-Net: Robust Multimodal Sentiment Analysis via Dynamic Modality Reliability Assessment</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ddsr-net-robust-multimodal-sentiment-analysis-via/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ddsr-net-robust-multimodal-sentiment-analysis-via/</guid>
      <description>Speech Emotion Recognition | 6.5/10</description>
    </item>
    <item>
      <title>DECAF: Dynamic Envelope Context-Aware Fusion for Speech-Envelope Reconstruction from EEG</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-decaf-dynamic-envelope-context-aware-fusion-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-decaf-dynamic-envelope-context-aware-fusion-for/</guid>
      <description>Speech Enhancement | 7.0/10</description>
    </item>
    <item>
      <title>Decoder-Only Conformer with Modality-Aware Sparse Mixtures of Experts for ASR</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-decoder-only-conformer-with-modality-aware-sparse/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-decoder-only-conformer-with-modality-aware-sparse/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Decorrelation-Enhanced Multiband Subband Adaptive Filtering for RIR Tracking in Sound Field Control</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-decorrelation-enhanced-multiband-subband-adaptive/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-decorrelation-enhanced-multiband-subband-adaptive/</guid>
      <description>Spatial Audio | 7.0/10</description>
    </item>
    <item>
      <title>Deep Dubbing: End-to-End Auto-Audiobook System with Text-to-Timbre and Context-Aware Instruct-TTS</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-deep-dubbing-end-to-end-auto-audiobook-system/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-deep-dubbing-end-to-end-auto-audiobook-system/</guid>
      <description>Speech Synthesis | 7.5/10</description>
    </item>
    <item>
      <title>Deep Learning-Based Joint Optimization of Adaptive Feedback Cancellation and Residual Feedback Suppression for Hearing Aids</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-deep-learning-based-joint-optimization-of/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-deep-learning-based-joint-optimization-of/</guid>
      <description>Speech Enhancement | 8.0/10</description>
    </item>
    <item>
      <title>Deep Spatial Clue Informed Ambisonic Encoding for Irregular Microphone Arrays</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-deep-spatial-clue-informed-ambisonic-encoding-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-deep-spatial-clue-informed-ambisonic-encoding-for/</guid>
      <description>Spatial Audio | 7.0/10</description>
    </item>
    <item>
      <title>DeepAQ: A Perceptual Audio Quality Metric Based on Foundational Models and Weakly Supervised Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-deepaq-a-perceptual-audio-quality-metric-based-on/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-deepaq-a-perceptual-audio-quality-metric-based-on/</guid>
      <description>Audio Quality Assessment | 7.5/10</description>
    </item>
    <item>
      <title>Denoising Of Stochastic Ray Tracing Room Impulse Responses</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-denoising-of-stochastic-ray-tracing-room-impulse/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-denoising-of-stochastic-ray-tracing-room-impulse/</guid>
      <description>Spatial Audio | 7.5/10</description>
    </item>
    <item>
      <title>DepthTalk: Few-Shot Talking Head Generation with Depth-Aware 3D Gaussian Field Motion</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-depthtalk-few-shot-talking-head-generation-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-depthtalk-few-shot-talking-head-generation-with/</guid>
      <description>Talking Head Generation | 7.0/10</description>
    </item>
    <item>
      <title>Detecting and Attributing Synthetic Spanish Speech: The HISPASpoof Dataset</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-detecting-and-attributing-synthetic-spanish/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-detecting-and-attributing-synthetic-spanish/</guid>
      <description>Speech Spoofing Detection | 7.5/10</description>
    </item>
    <item>
      <title>DGSDNet: Dual-Graph Spectral Diffusion Network for Incomplete Multimodal Emotion Recognition in Conversations</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dgsdnet-dual-graph-spectral-diffusion-network-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dgsdnet-dual-graph-spectral-diffusion-network-for/</guid>
      <description>Speech Emotion Recognition | 8.0/10</description>
    </item>
    <item>
      <title>Diff-vs: Efficient Audio-Aware Diffusion U-Net for Vocals Separation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-diff-vs-efficient-audio-aware-diffusion-u-net-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-diff-vs-efficient-audio-aware-diffusion-u-net-for/</guid>
      <description>Speech Separation | 7.5/10</description>
    </item>
    <item>
      <title>DiffEmoTalk: Audio-Driven Facial Animation with Fine-Grained Emotion Control via Diffusion Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-diffemotalk-audio-driven-facial-animation-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-diffemotalk-audio-driven-facial-animation-with/</guid>
      <description>Speech Emotion Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Differentiable Grouped Feedback Delay Networks for Learning Direction and Position-Dependent Late Reverberation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-differentiable-grouped-feedback-delay-networks/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-differentiable-grouped-feedback-delay-networks/</guid>
      <description>Spatial Audio | 7.5/10</description>
    </item>
    <item>
      <title>Differentiable Pulsetable Synthesis for Wind Instrument Modeling</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-differentiable-pulsetable-synthesis-for-wind/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-differentiable-pulsetable-synthesis-for-wind/</guid>
      <description>Music Generation | 7.5/10</description>
    </item>
    <item>
      <title>Diffusion Timbre Transfer via Mutual Information Guided Inpainting</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-diffusion-timbre-transfer-via-mutual-information/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-diffusion-timbre-transfer-via-mutual-information/</guid>
      <description>Music Generation | 7.5/10</description>
    </item>
    <item>
      <title>Direct Preference Optimization For Speech Autoregressive Diffusion Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-direct-preference-optimization-for-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-direct-preference-optimization-for-speech/</guid>
      <description>Speech Synthesis | 7.5/10</description>
    </item>
    <item>
      <title>Direct Simultaneous Translation Activation for Large Audio-Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-direct-simultaneous-translation-activation-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-direct-simultaneous-translation-activation-for/</guid>
      <description>Speech Translation | 6.0/10</description>
    </item>
    <item>
      <title>Direct Transfer of Prosody in Speech-to-speech Translation using Disentangled Speech Tokens</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-direct-transfer-of-prosody-in-speech-to-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-direct-transfer-of-prosody-in-speech-to-speech/</guid>
      <description>Speech Translation | 7.5/10</description>
    </item>
    <item>
      <title>Directly Trained Spiking Neural Networks with Adaptive Phase Coding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-directly-trained-spiking-neural-networks-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-directly-trained-spiking-neural-networks-with/</guid>
      <description>Audio Classification | 7.0/10</description>
    </item>
    <item>
      <title>DisContSE: Single-Step Diffusion Speech Enhancement based on Joint Discrete and Continuous Embeddings</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-discontse-single-step-diffusion-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-discontse-single-step-diffusion-speech/</guid>
      <description>Speech Enhancement | 8.5/10</description>
    </item>
    <item>
      <title>Discrete Diffusion for Generative Modeling of Text-Aligned Speech Tokens</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-discrete-diffusion-for-generative-modeling-of/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-discrete-diffusion-for-generative-modeling-of/</guid>
      <description>Speech Synthesis | 7.5/10</description>
    </item>
    <item>
      <title>Discrete-Continuous Fusion With Adaptive Hierarchical Features For Audio Deepfake Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-discrete-continuous-fusion-with-adaptive/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-discrete-continuous-fusion-with-adaptive/</guid>
      <description>Audio Deepfake Detection | 8.0/10</description>
    </item>
    <item>
      <title>Disentangled Authenticity Representation for Partially Deepfake Audio Localization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-disentangled-authenticity-representation-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-disentangled-authenticity-representation-for/</guid>
      <description>Audio Deepfake Detection | 6.5/10</description>
    </item>
    <item>
      <title>Disentangling Physiology from Fidelity: Latent-Guided Diffusion Models for Cross-Modal Cardiac Synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-disentangling-physiology-from-fidelity-latent/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-disentangling-physiology-from-fidelity-latent/</guid>
      <description>Audio Generation | 7.5/10</description>
    </item>
    <item>
      <title>Dissecting Performance Degradation in Audio Source Separation under Sampling Frequency Mismatch</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dissecting-performance-degradation-in-audio/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dissecting-performance-degradation-in-audio/</guid>
      <description>Music Source Separation | 7.5/10</description>
    </item>
    <item>
      <title>DISSR: Disentangling Speech Representation for Degradation-Prior Guided Cross-Domain Speech Restoration</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dissr-disentangling-speech-representation-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dissr-disentangling-speech-representation-for/</guid>
      <description>Speech Enhancement | 7.5/10</description>
    </item>
    <item>
      <title>Distilling Attention Knowledge for Speaker Verification</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-distilling-attention-knowledge-for-speaker/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-distilling-attention-knowledge-for-speaker/</guid>
      <description>Speaker Verification | 8.0/10</description>
    </item>
    <item>
      <title>Distributed Multichannel Active Noise Control with Asynchronous Communication</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-distributed-multichannel-active-noise-control/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-distributed-multichannel-active-noise-control/</guid>
      <description>Signal Processing | 8.0/10</description>
    </item>
    <item>
      <title>DiTSE: High-Fidelity Generative Speech Enhancement via Latent Diffusion Transformers</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ditse-high-fidelity-generative-speech-enhancement/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ditse-high-fidelity-generative-speech-enhancement/</guid>
      <description>Speech Enhancement | 8.5/10</description>
    </item>
    <item>
      <title>DiTSinger: Scaling Singing Voice Synthesis with Diffusion Transformer and Implicit Alignment</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ditsinger-scaling-singing-voice-synthesis-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ditsinger-scaling-singing-voice-synthesis-with/</guid>
      <description>Singing Voice Synthesis | 7.0/10</description>
    </item>
    <item>
      <title>Diverse and Few-Step Audio Captioning via Flow Matching</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-diverse-and-few-step-audio-captioning-via-flow/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-diverse-and-few-step-audio-captioning-via-flow/</guid>
      <description>Audio Captioning | 6.5/10</description>
    </item>
    <item>
      <title>DMP-TTS: Disentangled Multi-Modal Prompting for Controllable Text-to-Speech with Chained Guidance</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dmp-tts-disentangled-multi-modal-prompting-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dmp-tts-disentangled-multi-modal-prompting-for/</guid>
      <description>Speech Synthesis | 7.5/10</description>
    </item>
    <item>
      <title>Do Bias Benchmarks Generalise? Evidence from Voice-Based Evaluation of Gender Bias in SpeechLLMs</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-do-bias-benchmarks-generalise-evidence-from-voice/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-do-bias-benchmarks-generalise-evidence-from-voice/</guid>
      <description>Model Evaluation | 8.0/10</description>
    </item>
    <item>
      <title>Do Foundational Audio Encoders Understand Music Structure?</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-do-foundational-audio-encoders-understand-music/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-do-foundational-audio-encoders-understand-music/</guid>
      <description>Music Information Retrieval | 7.0/10</description>
    </item>
    <item>
      <title>Do Speech LLMs Learn Crossmodal Embedding Spaces?</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-do-speech-llms-learn-crossmodal-embedding-spaces/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-do-speech-llms-learn-crossmodal-embedding-spaces/</guid>
      <description>音频检索 | 6.5/10</description>
    </item>
    <item>
      <title>Do We Need EMA for Diffusion-Based Speech Enhancement? Toward A Magnitude-Preserving Network Architecture</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-do-we-need-ema-for-diffusion-based-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-do-we-need-ema-for-diffusion-based-speech/</guid>
      <description>语音增强 | 7.5/10</description>
    </item>
    <item>
      <title>Do we really need self-attention for streaming automatic speech recognition?</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-do-we-really-need-self-attention-for-streaming/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-do-we-really-need-self-attention-for-streaming/</guid>
      <description>语音识别 | 7.5/10</description>
    </item>
    <item>
      <title>Do You Hear What I Mean? Quantifying the Instruction-Perception Gap in Instruction-Guided Expressive Text-to-Speech Systems</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-do-you-hear-what-i-mean-quantifying-the/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-do-you-hear-what-i-mean-quantifying-the/</guid>
      <description>语音合成 | 8.0/10</description>
    </item>
    <item>
      <title>Does the Pre-Training of an Embedding Influence its Encoding of Age?</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-does-the-pre-training-of-an-embedding-influence/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-does-the-pre-training-of-an-embedding-influence/</guid>
      <description>语音生物标志物 | 7.0/10</description>
    </item>
    <item>
      <title>DOMA: Leveraging Diffusion Language Models with Adaptive Prior for Intent Classification and Slot Filling</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-doma-leveraging-diffusion-language-models-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-doma-leveraging-diffusion-language-models-with/</guid>
      <description>语音对话系统 | 8.5/10</description>
    </item>
    <item>
      <title>Domain Partitioning Meets Parameter-Efficient Fine-Tuning: A Novel Method for Improved Language-Queried Audio Source Separation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-domain-partitioning-meets-parameter-efficient/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-domain-partitioning-meets-parameter-efficient/</guid>
      <description>音频分离 | 7.5/10</description>
    </item>
    <item>
      <title>Domain-Aware Scheduling for ASR Fine-Tuning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-domain-aware-scheduling-for-asr-fine-tuning/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-domain-aware-scheduling-for-asr-fine-tuning/</guid>
      <description>语音识别 | 6.5/10</description>
    </item>
    <item>
      <title>Domain-Invariant Representation Learning of Bird Sounds</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-domain-invariant-representation-learning-of-bird/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-domain-invariant-representation-learning-of-bird/</guid>
      <description>生物声学 | 6.5/10</description>
    </item>
    <item>
      <title>DPO-Regularized Regression for Age Prediction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dpo-regularized-regression-for-age-prediction/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dpo-regularized-regression-for-age-prediction/</guid>
      <description>说话人识别 | 7.5/10</description>
    </item>
    <item>
      <title>DPT-Net: Dual-Path Transformer Network with Hierarchical Fusion for EEG-based Envelope Reconstruction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dpt-net-dual-path-transformer-network-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dpt-net-dual-path-transformer-network-with/</guid>
      <description>语音生物标志物 | 7.0/10</description>
    </item>
    <item>
      <title>DSpAST: Disentangled Representations for Spatial Audio Reasoning with Large Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dspast-disentangled-representations-for-spatial/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dspast-disentangled-representations-for-spatial/</guid>
      <description>音频问答 | 8.0/10</description>
    </item>
    <item>
      <title>DSRMS-TransUnet: A Decentralized Non-Shifted Transunet for Shallow Water Acoustic Source Range Estimation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dsrms-transunet-a-decentralized-non-shifted/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dsrms-transunet-a-decentralized-non-shifted/</guid>
      <description>声源定位 | 8.0/10</description>
    </item>
    <item>
      <title>DSSR: Decoupling Salient and Subtle Representations Under Missing Modalities for Multimodal Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dssr-decoupling-salient-and-subtle/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dssr-decoupling-salient-and-subtle/</guid>
      <description>情感识别 | 7.5/10</description>
    </item>
    <item>
      <title>Dual Contrastive Learning for Semi-Supervised Domain Adaptation in Bi-Modal Depression Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dual-contrastive-learning-for-semi-supervised/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dual-contrastive-learning-for-semi-supervised/</guid>
      <description>语音生物标志物 | 7.0/10</description>
    </item>
    <item>
      <title>Dual Data Scaling for Robust Two-Stage User-Defined Keyword Spotting</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dual-data-scaling-for-robust-two-stage-user/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dual-data-scaling-for-robust-two-stage-user/</guid>
      <description>语音活动检测 | 7.5/10</description>
    </item>
    <item>
      <title>Dual-Perspective Multimodal Sentiment Analysis with MoE Fusion: Representation Learning via Semantic Resonance and Divergence</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dual-perspective-multimodal-sentiment-analysis/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dual-perspective-multimodal-sentiment-analysis/</guid>
      <description>多模态情感分析 | 7.0/10</description>
    </item>
    <item>
      <title>Dual-Strategy-Enhanced ConBiMamba for Neural Speaker Diarization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dual-strategy-enhanced-conbimamba-for-neural/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dual-strategy-enhanced-conbimamba-for-neural/</guid>
      <description>说话人分离 | 8.0/10</description>
    </item>
    <item>
      <title>Dynamic Balanced Cross-Modal Attention with Gated Sequence Restoration: Towards Robust Multimodal Sentiment Analysis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dynamic-balanced-cross-modal-attention-with-gated/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dynamic-balanced-cross-modal-attention-with-gated/</guid>
      <description>跨模态 | 7.5/10</description>
    </item>
    <item>
      <title>Dynamic Noise-Aware Multi-LoRA Framework Towards Real-World Audio Deepfake Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dynamic-noise-aware-multi-lora-framework-towards/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dynamic-noise-aware-multi-lora-framework-towards/</guid>
      <description>音频深度伪造检测 | 8.0/10</description>
    </item>
    <item>
      <title>Dynamic Spectrogram Analysis with Local-Aware Graph Networks for Audio Anti-Spoofing</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dynamic-spectrogram-analysis-with-local-aware/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dynamic-spectrogram-analysis-with-local-aware/</guid>
      <description>音频深度伪造检测 | 8.5/10</description>
    </item>
    <item>
      <title>Dynamically Slimmable Speech Enhancement Network with Metric-Guided Training</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dynamically-slimmable-speech-enhancement-network/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dynamically-slimmable-speech-enhancement-network/</guid>
      <description>语音增强 | 7.5/10</description>
    </item>
    <item>
      <title>E2E-AEC: Implementing An End-To-End Neural Network Learning Approach for Acoustic Echo Cancellation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-e2e-aec-implementing-an-end-to-end-neural-network/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-e2e-aec-implementing-an-end-to-end-neural-network/</guid>
      <description>语音增强 | 7.5/10</description>
    </item>
    <item>
      <title>Easy Turn: Integrating Acoustic and Linguistic Modalities for Robust Turn-Taking in Full-Duplex Spoken Dialogue Systems</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-easy-turn-integrating-acoustic-and-linguistic/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-easy-turn-integrating-acoustic-and-linguistic/</guid>
      <description>语音对话系统 | 7.0/10</description>
    </item>
    <item>
      <title>ECHO: Frequency-Aware Hierarchical Encoding for Variable-Length Signals</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-echo-frequency-aware-hierarchical-encoding-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-echo-frequency-aware-hierarchical-encoding-for/</guid>
      <description>音频分类 | 9.5/10</description>
    </item>
    <item>
      <title>EchoFake: A Replay-Aware Dataset For Practical Speech Deepfake Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-echofake-a-replay-aware-dataset-for-practical/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-echofake-a-replay-aware-dataset-for-practical/</guid>
      <description>音频深度伪造检测 | 8.5/10</description>
    </item>
    <item>
      <title>EchoRAG: A Two-Stage Framework for Audio-Text Retrieval and Temporal Grounding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-echorag-a-two-stage-framework-for-audio-text/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-echorag-a-two-stage-framework-for-audio-text/</guid>
      <description>音频检索 | 7.5/10</description>
    </item>
    <item>
      <title>ECSA: Dual-Branch Emotion Compensation for Emotion-Consistent Speaker Anonymization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ecsa-dual-branch-emotion-compensation-for-emotion/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ecsa-dual-branch-emotion-compensation-for-emotion/</guid>
      <description>语音匿名化 | 8.5/10</description>
    </item>
    <item>
      <title>EdgeSpot: Efficient and High-Performance Few-Shot Model for Keyword Spotting</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-edgespot-efficient-and-high-performance-few-shot/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-edgespot-efficient-and-high-performance-few-shot/</guid>
      <description>语音活动检测 | 7.5/10</description>
    </item>
    <item>
      <title>EEG and Eye-Tracking Driven Dynamic Target Speaker Extraction with Spontaneous Attention Switching</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-eeg-and-eye-tracking-driven-dynamic-target/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-eeg-and-eye-tracking-driven-dynamic-target/</guid>
      <description>语音分离 | 7.0/10</description>
    </item>
    <item>
      <title>EEND-SAA: Enrollment-Less Main Speaker Voice Activity Detection Using Self-Attention Attractors</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-eend-saa-enrollment-less-main-speaker-voice/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-eend-saa-enrollment-less-main-speaker-voice/</guid>
      <description>语音活动检测 | 7.5/10</description>
    </item>
    <item>
      <title>Efficient Audio-Visual Inference Via Token Clustering And Modality Fusion</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-efficient-audio-visual-inference-via-token/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-efficient-audio-visual-inference-via-token/</guid>
      <description>音频问答 | 7.5/10</description>
    </item>
    <item>
      <title>Efficient Depression Detection from Speech via Language-Independent Prompt-Driven Reprogramming</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-efficient-depression-detection-from-speech-via/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-efficient-depression-detection-from-speech-via/</guid>
      <description>语音生物标志物 | 7.5/10</description>
    </item>
    <item>
      <title>Efficient Solutions for Mitigating Initialization Bias in Unsupervised Self-Adaptive Auditory Attention Decoding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-efficient-solutions-for-mitigating-initialization/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-efficient-solutions-for-mitigating-initialization/</guid>
      <description>听觉注意解码 | 8.5/10</description>
    </item>
    <item>
      <title>EMG-to-Speech with Fewer Channels</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emg-to-speech-with-fewer-channels/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emg-to-speech-with-fewer-channels/</guid>
      <description>语音合成 | 7.5/10</description>
    </item>
    <item>
      <title>Emilia-NV: A Non-Verbal Speech Dataset with Word-Level Annotation for Human-Like Speech Modeling</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emilia-nv-a-non-verbal-speech-dataset-with-word/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emilia-nv-a-non-verbal-speech-dataset-with-word/</guid>
      <description>语音识别 | 7.5/10</description>
    </item>
    <item>
      <title>Emo-TTA: Improving Test-Time Adaptation of Audio-Language Models for Speech Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emo-tta-improving-test-time-adaptation-of-audio/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emo-tta-improving-test-time-adaptation-of-audio/</guid>
      <description>语音情感识别 | 7.0/10</description>
    </item>
    <item>
      <title>EMORL-TTS: Reinforcement Learning for Fine-Grained Emotion Control in LLM-based TTS</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emorl-tts-reinforcement-learning-for-fine-grained/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emorl-tts-reinforcement-learning-for-fine-grained/</guid>
      <description>语音合成 | 8.5/10</description>
    </item>
    <item>
      <title>EmoShift: Lightweight Activation Steering for Enhanced Emotion-Aware Speech Synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emoshift-lightweight-activation-steering-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emoshift-lightweight-activation-steering-for/</guid>
      <description>语音合成 | 7.0/10</description>
    </item>
    <item>
      <title>Emotion-Aligned Generation in Diffusion Text to Speech Models Via Preference-Guided Optimization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emotion-aligned-generation-in-diffusion-text-to/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emotion-aligned-generation-in-diffusion-text-to/</guid>
      <description>语音合成 | 8.0/10</description>
    </item>
    <item>
      <title>Emotional Damage: Investigating Safety Vulnerabilities of Large Audio-Language Models Under Speaker Emotional Variations</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emotional-damage-investigating-safety/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emotional-damage-investigating-safety/</guid>
      <description>音频安全 | 7.5/10</description>
    </item>
    <item>
      <title>Emotional Dimension Control in Language Model-Based Text-To-Speech: Spanning a Broad Spectrum of Human Emotions</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emotional-dimension-control-in-language-model/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emotional-dimension-control-in-language-model/</guid>
      <description>语音合成 | 7.5/10</description>
    </item>
    <item>
      <title>EmoTri-RL: Emotion- and Cause-Aware Reinforcement Learning for Multi-Modal Empathetic Dialogue</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emotri-rl-emotion-and-cause-aware-reinforcement/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-emotri-rl-emotion-and-cause-aware-reinforcement/</guid>
      <description>语音情感识别 | 7.0/10</description>
    </item>
    <item>
      <title>Empowering Multimodal Respiratory Sound Classification with Counterfactual Adversarial Debiasing for Out-of-Distribution Robustness</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-empowering-multimodal-respiratory-sound/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-empowering-multimodal-respiratory-sound/</guid>
      <description>音频分类 | 7.0/10</description>
    </item>
    <item>
      <title>Enabling Multi-Species Bird Classification on Low-Power Bioacoustic Loggers</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-enabling-multi-species-bird-classification-on-low/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-enabling-multi-species-bird-classification-on-low/</guid>
      <description>生物声学 | 8.0/10</description>
    </item>
    <item>
      <title>Encoding Emotion Through Self-Supervised Eye Movement Reconstruction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-encoding-emotion-through-self-supervised-eye/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-encoding-emotion-through-self-supervised-eye/</guid>
      <description>语音情感识别 | 7.5/10</description>
    </item>
    <item>
      <title>Enhanced Generative Machine Listener</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-enhanced-generative-machine-listener/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-enhanced-generative-machine-listener/</guid>
      <description>音频分类 | 7.0/10</description>
    </item>
    <item>
      <title>Enhancing Audio Question-Answering Performance Through Log-Likelihood Guided Reward Functions</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-enhancing-audio-question-answering-performance/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-enhancing-audio-question-answering-performance/</guid>
      <description>音频问答 | 8.5/10</description>
    </item>
    <item>
      <title>Enhancing Automatic Drum Transcription with Online Dynamic Few-Shot Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-enhancing-automatic-drum-transcription-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-enhancing-automatic-drum-transcription-with/</guid>
      <description>音乐信息检索 | 7.0/10</description>
    </item>
    <item>
      <title>Enhancing Dialogue-Related Speech Tasks with Generated Spoken Dialogues</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-enhancing-dialogue-related-speech-tasks-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-enhancing-dialogue-related-speech-tasks-with/</guid>
      <description>语音对话系统 | 6.5/10</description>
    </item>
    <item>
      <title>Enhancing Noise Robustness for Neural Speech Codecs Through Resource-Efficient Progressive Quantization Perturbation Simulation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-enhancing-noise-robustness-for-neural-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-enhancing-noise-robustness-for-neural-speech/</guid>
      <description>语音增强 | 7.5/10</description>
    </item>
    <item>
      <title>Enhancing Speaker Verification with w2v-BERT 2.0 and Knowledge Distillation Guided Structured Pruning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-enhancing-speaker-verification-with-w2v-bert-20/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-enhancing-speaker-verification-with-w2v-bert-20/</guid>
      <description>说话人验证 | 7.5/10</description>
    </item>
    <item>
      <title>Enhancing Speech Intelligibility Prediction for Hearing Aids with Complementary Speech Foundation Model Representations</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-enhancing-speech-intelligibility-prediction-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-enhancing-speech-intelligibility-prediction-for/</guid>
      <description>语音增强 | 7.5/10</description>
    </item>
    <item>
      <title>Entropy-Guided GRVQ for Ultra-Low Bitrate Neural Speech Codec</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-entropy-guided-grvq-for-ultra-low-bitrate-neural/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-entropy-guided-grvq-for-ultra-low-bitrate-neural/</guid>
      <description>语音合成 | 7.5/10</description>
    </item>
    <item>
      <title>Equipping Large Language Model with Directional Speech Understanding Capabilities</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-equipping-large-language-model-with-directional/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-equipping-large-language-model-with-directional/</guid>
      <description>语音识别 语音翻译 | 7.0/10</description>
    </item>
    <item>
      <title>Erasing Your Voice Before it’s Heard: Training-Free Speaker Unlearning for Zero-Shot Text-to-Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-erasing-your-voice-before-its-heard-training-free/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-erasing-your-voice-before-its-heard-training-free/</guid>
      <description>语音合成 | 7.5/10</description>
    </item>
    <item>
      <title>Estimating Hand-Related Features from Speech Using Machine Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-estimating-hand-related-features-from-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-estimating-hand-related-features-from-speech/</guid>
      <description>语音生物标志物 | 5.0/10</description>
    </item>
    <item>
      <title>Estimating Respiratory Effort from Nocturnal Breathing Sounds for Obstructive Sleep Apnoea Screening</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-estimating-respiratory-effort-from-nocturnal/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-estimating-respiratory-effort-from-nocturnal/</guid>
      <description>音频分类 | 6.5/10</description>
    </item>
    <item>
      <title>Etude: Piano Cover Generation with a Three-Stage Approach — Extract, Structuralize, and Decode</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-etude-piano-cover-generation-with-a-three-stage/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-etude-piano-cover-generation-with-a-three-stage/</guid>
      <description>音乐生成 | 7.0/10</description>
    </item>
    <item>
      <title>EuleroDec: A Complex-Valued RVQ-VAE for Efficient and Robust Audio Coding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-eulerodec-a-complex-valued-rvq-vae-for-efficient/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-eulerodec-a-complex-valued-rvq-vae-for-efficient/</guid>
      <description>音频生成 | 8.0/10</description>
    </item>
    <item>
      <title>Evaluating Bias in Spoken Dialogue LLMs for Real-World Decisions and Recommendations</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-evaluating-bias-in-spoken-dialogue-llms-for-real/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-evaluating-bias-in-spoken-dialogue-llms-for-real/</guid>
      <description>模型评估 | 7.0/10</description>
    </item>
    <item>
      <title>Evaluating Compositional Structure in Audio Representations</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-evaluating-compositional-structure-in-audio/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-evaluating-compositional-structure-in-audio/</guid>
      <description>模型评估 | 7.0/10</description>
    </item>
    <item>
      <title>Evaluating Disentangled Representations for Controllable Music Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-evaluating-disentangled-representations-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-evaluating-disentangled-representations-for/</guid>
      <description>音乐生成 | 7.5/10</description>
    </item>
    <item>
      <title>Evaluating Emotion Recognition in Spoken Language Models on Emotionally Incongruent Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-evaluating-emotion-recognition-in-spoken-language/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-evaluating-emotion-recognition-in-spoken-language/</guid>
      <description>语音情感识别 | 7.5/10</description>
    </item>
    <item>
      <title>Evaluating High-Resolution Piano Sustain Pedal Depth Estimation with Musically Informed Metrics</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-evaluating-high-resolution-piano-sustain-pedal/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-evaluating-high-resolution-piano-sustain-pedal/</guid>
      <description>音乐信息检索 | 8.0/10</description>
    </item>
    <item>
      <title>Evaluating Pretrained Speech Embedding Systems for Dysarthria Detection Across Heterogeneous Datasets</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-evaluating-pretrained-speech-embedding-systems/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-evaluating-pretrained-speech-embedding-systems/</guid>
      <description>语音生物标志物 | 7.5/10</description>
    </item>
    <item>
      <title>Event Classification by Physics-Informed Inpainting for Distributed Multichannel Acoustic Sensor with Partially Degraded Channels</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-event-classification-by-physics-informed/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-event-classification-by-physics-informed/</guid>
      <description>音频事件检测 | 8.0/10</description>
    </item>
    <item>
      <title>Exploring Fine-Tuning Of Large Audio Language Models For Spoken Language Understanding Under Limited Speech Data</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-exploring-fine-tuning-of-large-audio-language/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-exploring-fine-tuning-of-large-audio-language/</guid>
      <description>语音理解 | 8.0/10</description>
    </item>
    <item>
      <title>Exploring How Audio Effects Alter Emotion with Foundation Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-exploring-how-audio-effects-alter-emotion-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-exploring-how-audio-effects-alter-emotion-with/</guid>
      <description>音乐理解 | 7.0/10</description>
    </item>
    <item>
      <title>Exploring Resolution-Wise Shared Attention in Hybrid Mamba-U-Nets for Improved Cross-Corpus Speech Enhancement</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-exploring-resolution-wise-shared-attention-in/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-exploring-resolution-wise-shared-attention-in/</guid>
      <description>语音增强 | 8.0/10</description>
    </item>
    <item>
      <title>Exploring SSL Discrete Tokens for Multilingual Automatic Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-exploring-ssl-discrete-tokens-for-multilingual/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-exploring-ssl-discrete-tokens-for-multilingual/</guid>
      <description>语音识别 | 7.5/10</description>
    </item>
    <item>
      <title>Expressive Voice Conversion with Controllable Emotional Intensity</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-expressive-voice-conversion-with-controllable/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-expressive-voice-conversion-with-controllable/</guid>
      <description>语音转换 | 7.5/10</description>
    </item>
    <item>
      <title>Exterior Sound Field Estimation Based on Physics-Constrained Kernel</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-exterior-sound-field-estimation-based-on-physics/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-exterior-sound-field-estimation-based-on-physics/</guid>
      <description>空间音频 | 6.5/10</description>
    </item>
    <item>
      <title>FAC-FACodec: Controllable Zero-Shot Foreign Accent Conversion with Factorized Speech Codec</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fac-facodec-controllable-zero-shot-foreign-accent/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fac-facodec-controllable-zero-shot-foreign-accent/</guid>
      <description>语音转换 | 8.0/10</description>
    </item>
    <item>
      <title>Face-Voice Association with Inductive Bias for Maximum Class Separation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-face-voice-association-with-inductive-bias-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-face-voice-association-with-inductive-bias-for/</guid>
      <description>说话人验证 | 7.0/10</description>
    </item>
    <item>
      <title>Fake Speech Wild: Detecting Deepfake Speech on Social Media Platform</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fake-speech-wild-detecting-deepfake-speech-on/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fake-speech-wild-detecting-deepfake-speech-on/</guid>
      <description>语音伪造检测 | 7.0/10</description>
    </item>
    <item>
      <title>Fast-ULCNet: A Fast and Ultra Low Complexity Network for Single-Channel Speech Enhancement</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fast-ulcnet-a-fast-and-ultra-low-complexity/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fast-ulcnet-a-fast-and-ultra-low-complexity/</guid>
      <description>语音增强 | 7.5/10</description>
    </item>
    <item>
      <title>FastAV: Efficient Token Pruning for Audio-Visual Large Language Model Inference</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fastav-efficient-token-pruning-for-audio-visual/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fastav-efficient-token-pruning-for-audio-visual/</guid>
      <description>音频问答 | 7.0/10</description>
    </item>
    <item>
      <title>FastEnhancer: Speed-Optimized Streaming Neural Speech Enhancement</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fastenhancer-speed-optimized-streaming-neural/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fastenhancer-speed-optimized-streaming-neural/</guid>
      <description>语音增强 | 8.5/10</description>
    </item>
    <item>
      <title>FD-ARL: Feature Disentanglement with Adversarial-Reconstruction Learning for Cross-Subject Auditory Attention Decoding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fd-arl-feature-disentanglement-with-adversarial/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fd-arl-feature-disentanglement-with-adversarial/</guid>
      <description>听觉注意力解码 | 7.5/10</description>
    </item>
    <item>
      <title>FDCNet: Frequency Domain Channel Attention and Convolution for Lipreading</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fdcnet-frequency-domain-channel-attention-and/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fdcnet-frequency-domain-channel-attention-and/</guid>
      <description>视觉语音识别 | 8.5/10</description>
    </item>
    <item>
      <title>FED-PISA: Federated Voice Cloning Via Personalized Identity-Style Adaptation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fed-pisa-federated-voice-cloning-via-personalized/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fed-pisa-federated-voice-cloning-via-personalized/</guid>
      <description>语音克隆 | 8.0/10</description>
    </item>
    <item>
      <title>Feedback-Driven Retrieval-Augmented Audio Generation with Large Audio Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-feedback-driven-retrieval-augmented-audio/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-feedback-driven-retrieval-augmented-audio/</guid>
      <description>音频生成 | 6.5/10</description>
    </item>
    <item>
      <title>Few-Shot Recognition of Audio Deepfake Generators using Graph-Based Prototype Adaptation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-few-shot-recognition-of-audio-deepfake-generators/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-few-shot-recognition-of-audio-deepfake-generators/</guid>
      <description>音频深度伪造检测 | 7.5/10</description>
    </item>
    <item>
      <title>FIDIC: Fine-Grained Conversational Emotion Recognition via Individual Differences in Inertia and Contagion</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fidicfine-grained-conversational-emotion/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fidicfine-grained-conversational-emotion/</guid>
      <description>语音情感识别 | 7.5/10</description>
    </item>
    <item>
      <title>Fine-Grained Frame Modeling in Multi-Head Self-Attention for Speech Deepfake Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fine-grained-frame-modeling-in-multi-head-self/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fine-grained-frame-modeling-in-multi-head-self/</guid>
      <description>语音伪造检测 | 8.0/10</description>
    </item>
    <item>
      <title>Fine-Tuning Bigvgan-V2 for Robust Musical Tuning Preservation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fine-tuning-bigvgan-v2-for-robust-musical-tuning/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fine-tuning-bigvgan-v2-for-robust-musical-tuning/</guid>
      <description>音乐生成 | 7.5/10</description>
    </item>
    <item>
      <title>Fine-Tuning Large Audio-Language Models with LoRA for Precise Temporal Localization of Prolonged Exposure Therapy Elements</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fine-tuning-large-audio-language-models-with-lora/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fine-tuning-large-audio-language-models-with-lora/</guid>
      <description>音频事件检测 | 6.5/10</description>
    </item>
    <item>
      <title>Fine-Tuning Large Multimodal Models for Automatic Pronunciation Assessment</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fine-tuning-large-multimodal-models-for-automatic/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fine-tuning-large-multimodal-models-for-automatic/</guid>
      <description>语音评估 | 7.0/10</description>
    </item>
    <item>
      <title>FinHuBERT: Hierarchical Feature Imitating Networks for Low-Resource Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-finhubert-hierarchical-feature-imitating-networks/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-finhubert-hierarchical-feature-imitating-networks/</guid>
      <description>语音识别 | 7.5/10</description>
    </item>
    <item>
      <title>FlashFoley: Fast Interactive Sketch2audio Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-flashfoley-fast-interactive-sketch2audio/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-flashfoley-fast-interactive-sketch2audio/</guid>
      <description>音频生成 | 7.5/10</description>
    </item>
    <item>
      <title>Flexi-LoRA with Input-Adaptive Ranks: Efficient Finetuning for Speech and Reasoning Tasks</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-flexi-lora-with-input-adaptive-ranks-efficient/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-flexi-lora-with-input-adaptive-ranks-efficient/</guid>
      <description>语音识别 | 7.5/10</description>
    </item>
    <item>
      <title>Flexio: Flexible Single- and Multi-Channel Speech Separation and Enhancement</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-flexio-flexible-single-and-multi-channel-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-flexio-flexible-single-and-multi-channel-speech/</guid>
      <description>语音分离 | 8.0/10</description>
    </item>
    <item>
      <title>FlowSE-GRPO: Training Flow Matching Speech Enhancement via Online Reinforcement Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-flowse-grpo-training-flow-matching-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-flowse-grpo-training-flow-matching-speech/</guid>
      <description>语音增强 | 7.5/10</description>
    </item>
    <item>
      <title>FOCA: Multimodal Malware Classification via Hyperbolic Cross-Attention</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-foca-multimodal-malware-classification-via/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-foca-multimodal-malware-classification-via/</guid>
      <description>音频分类 | 7.5/10</description>
    </item>
    <item>
      <title>FocalCodec-Stream: Streaming Low-Bitrate Speech Coding via Causal Distillation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-focalcodec-stream-streaming-low-bitrate-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-focalcodec-stream-streaming-low-bitrate-speech/</guid>
      <description>语音编码 | 8.0/10</description>
    </item>
    <item>
      <title>FODGE: High-Fidelity Dance Generation via Full-Body Optimization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fodge-high-fidelity-dance-generation-via-full/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fodge-high-fidelity-dance-generation-via-full/</guid>
      <description>音频生成 | 6.5/10</description>
    </item>
    <item>
      <title>FoleyBench: A Benchmark for Video-to-Audio Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-foleybench-a-benchmark-for-video-to-audio-models/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-foleybench-a-benchmark-for-video-to-audio-models/</guid>
      <description>音频生成 | 7.5/10</description>
    </item>
    <item>
      <title>Forward Convolutive Prediction for Frame Online Monaural Speech Dereverberation based on Kronecker Product Decomposition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-forward-convolutive-prediction-for-frame-online/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-forward-convolutive-prediction-for-frame-online/</guid>
      <description>语音增强 | 7.5/10</description>
    </item>
    <item>
      <title>Frame-Stacked Local Transformers for Efficient Multi-Codebook Speech Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-frame-stacked-local-transformers-for-efficient/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-frame-stacked-local-transformers-for-efficient/</guid>
      <description>语音合成 | 7.5/10</description>
    </item>
    <item>
      <title>Frequency-Independent Ambisonics Upscaling Using Deep Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-frequency-independent-ambisonics-upscaling-using/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-frequency-independent-ambisonics-upscaling-using/</guid>
      <description>空间音频 | 6.5/10</description>
    </item>
    <item>
      <title>From Contrast to Commonality: Audio Commonality Captioning for Enhanced Audio-Text Cross-Modal Understanding in Multimodal LLMS</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-from-contrast-to-commonality-audio-commonality/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-from-contrast-to-commonality-audio-commonality/</guid>
      <description>音频场景理解 | 7.5/10</description>
    </item>
    <item>
      <title>From Diet to Free Lunch: Estimating Auxiliary Signal Properties Using Dynamic Pruning Masks in Speech Enhancement Networks</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-from-diet-to-free-lunch-estimating-auxiliary/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-from-diet-to-free-lunch-estimating-auxiliary/</guid>
      <description>语音增强 | 7.5/10</description>
    </item>
    <item>
      <title>From Hallucination to Articulation: Language Model-Driven Losses for Ultra Low-Bitrate Neural Speech Coding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-from-hallucination-to-articulation-language-model/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-from-hallucination-to-articulation-language-model/</guid>
      <description>语音合成 | 7.5/10</description>
    </item>
    <item>
      <title>From Human Speech to Ocean Signals: Transferring Speech Large Models for Underwater Acoustic Target Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-from-human-speech-to-ocean-signals-transferring/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-from-human-speech-to-ocean-signals-transferring/</guid>
      <description>水下声学目标识别 | 7.0/10</description>
    </item>
    <item>
      <title>Frontend Token Enhancement for Token-Based Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-frontend-token-enhancement-for-token-based-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-frontend-token-enhancement-for-token-based-speech/</guid>
      <description>语音识别 | 8.0/10</description>
    </item>
    <item>
      <title>Full Band Denoising of Room Impulse Response in the Wavelet Domain with Dictionary Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-full-band-denoising-of-room-impulse-response-in/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-full-band-denoising-of-room-impulse-response-in/</guid>
      <description>房间脉冲响应去噪 | 7.5/10</description>
    </item>
    <item>
      <title>FUN-SSL: Full-Band Layer Followed by U-Net With Narrow-Band Layers for Multiple Moving Sound Source Localization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fun-ssl-full-band-layer-followed-by-u-net-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fun-ssl-full-band-layer-followed-by-u-net-with/</guid>
      <description>声源定位 | 8.0/10</description>
    </item>
    <item>
      <title>FUSEMOS: Perceptual Evaluation of Text-to-Music Generation with Dual-Encoder Fusion and Ranking-Aware Composite Loss</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fusemos-perceptual-evaluation-of-text-to-music/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fusemos-perceptual-evaluation-of-text-to-music/</guid>
      <description>音乐生成 | 7.5/10</description>
    </item>
    <item>
      <title>Fusion of Multimodal Estimations by Extended State Hidden Markov Model: Application to Fetal Heart Rate Monitoring</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fusion-of-multimodal-estimations-by-extended/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fusion-of-multimodal-estimations-by-extended/</guid>
      <description>生物声学 | 7.0/10</description>
    </item>
    <item>
      <title>FxSearcher: Gradient-Free Text-Driven Audio Transformation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fxsearcher-gradient-free-text-driven-audio/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fxsearcher-gradient-free-text-driven-audio/</guid>
      <description>音频生成 | 7.0/10</description>
    </item>
    <item>
      <title>Game-Time: Evaluating Temporal Dynamics in Spoken Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-game-time-evaluating-temporal-dynamics-in-spoken/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-game-time-evaluating-temporal-dynamics-in-spoken/</guid>
      <description>语音对话系统 | 7.5/10</description>
    </item>
    <item>
      <title>Gdiffuse: Diffusion-Based Speech Enhancement with Noise Model Guidance</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-gdiffuse-diffusion-based-speech-enhancement-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-gdiffuse-diffusion-based-speech-enhancement-with/</guid>
      <description>语音增强 | 7.0/10</description>
    </item>
    <item>
      <title>Gelina: Unified Speech and Gesture Synthesis Via Interleaved Token Prediction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-gelina-unified-speech-and-gesture-synthesis-via/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-gelina-unified-speech-and-gesture-synthesis-via/</guid>
      <description>语音合成 | 7.0/10</description>
    </item>
    <item>
      <title>Gen-SER: When the Generative Model Meets Speech Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-gen-ser-when-the-generative-model-meets-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-gen-ser-when-the-generative-model-meets-speech/</guid>
      <description>语音情感识别 | 6.5/10</description>
    </item>
    <item>
      <title>Generalizability of Predictive and Generative Speech Enhancement Models to Pathological Speakers</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-generalizability-of-predictive-and-generative/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-generalizability-of-predictive-and-generative/</guid>
      <description>语音增强 | 7.0/10</description>
    </item>
    <item>
      <title>Generating Localized Audible Zones Using a Single-Channel Parametric Loudspeaker</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-generating-localized-audible-zones-using-a-single/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-generating-localized-audible-zones-using-a-single/</guid>
      <description>空间音频 | 6.5/10</description>
    </item>
    <item>
      <title>Generating Moving 3d Soundscapes with Latent Diffusion Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-generating-moving-3d-soundscapes-with-latent/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-generating-moving-3d-soundscapes-with-latent/</guid>
      <description>空间音频 | 7.5/10</description>
    </item>
    <item>
      <title>Generative Audio Extension and Morphing</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-generative-audio-extension-and-morphing/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-generative-audio-extension-and-morphing/</guid>
      <description>音频生成 | 7.5/10</description>
    </item>
    <item>
      <title>Generative UI as an Accessibility Bridge: Lessons from C2C E-Commerce</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-generative-ui-as-an-accessibility-bridge-lessons/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-generative-ui-as-an-accessibility-bridge-lessons/</guid>
      <description>无障碍 | 6.5/10</description>
    </item>
    <item>
      <title>GLA-GRAD&#43;&#43;: An Improved Griffin-Lim Guided Diffusion Model for Speech Synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-gla-grad-an-improved-griffin-lim-guided-diffusion/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-gla-grad-an-improved-griffin-lim-guided-diffusion/</guid>
      <description>语音合成 | 7.5/10</description>
    </item>
    <item>
      <title>GLAP: General Contrastive Audio-Text Pretraining Across Domains and Languages</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-glap-general-contrastive-audio-text-pretraining/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-glap-general-contrastive-audio-text-pretraining/</guid>
      <description>音频检索 | 8.5/10</description>
    </item>
    <item>
      <title>GLoRIA: Gated Low-Rank Interpretable Adaptation for Dialectal ASR</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-gloria-gated-low-rank-interpretable-adaptation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-gloria-gated-low-rank-interpretable-adaptation/</guid>
      <description>语音识别 | 8.0/10</description>
    </item>
    <item>
      <title>GLUE: Gradient-free Learning to Unify Experts</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-glue-gradient-free-learning-to-unify-experts/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-glue-gradient-free-learning-to-unify-experts/</guid>
      <description>迁移学习 | 6.5/10</description>
    </item>
    <item>
      <title>GMS-CAVP: Improving Audio-Video Correspondence with Multi-Scale Contrastive and Generative Pretraining</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-gms-cavp-improving-audio-video-correspondence/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-gms-cavp-improving-audio-video-correspondence/</guid>
      <description>音频生成 | 7.5/10</description>
    </item>
    <item>
      <title>Graph-Based Emotion Consensus Perception Learning for Multimodal Emotion Recognition in Conversation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-graph-based-emotion-consensus-perception-learning/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-graph-based-emotion-consensus-perception-learning/</guid>
      <description>多模态情感识别 | 7.5/10</description>
    </item>
    <item>
      <title>Graph-based Modality Alignment for Robustness in Conversational Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-graph-based-modality-alignment-for-robustness-in/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-graph-based-modality-alignment-for-robustness-in/</guid>
      <description>语音情感识别 | 8.0/10</description>
    </item>
    <item>
      <title>Graph-Biased EEG Transformers for Silent Speech Decoding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-graph-biased-eeg-transformers-for-silent-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-graph-biased-eeg-transformers-for-silent-speech/</guid>
      <description>语音生物标志物 | 6.5/10</description>
    </item>
    <item>
      <title>Grey-Box Prompt Tuning With Graph Alignment for Speech-Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-grey-box-prompt-tuning-with-graph-alignment-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-grey-box-prompt-tuning-with-graph-alignment-for/</guid>
      <description>语音识别 | 8.0/10</description>
    </item>
    <item>
      <title>GRNet: Graph Reconstruction Network for Robust Multimodal Sentiment Analysis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-grnet-graph-reconstruction-network-for-robust/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-grnet-graph-reconstruction-network-for-robust/</guid>
      <description>多模态情感分析 | 7.5/10</description>
    </item>
    <item>
      <title>Group Relative Policy Optimization for Text-to-Speech with Large Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-group-relative-policy-optimization-for-text-to/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-group-relative-policy-optimization-for-text-to/</guid>
      <description>语音合成 | 8.0/10</description>
    </item>
    <item>
      <title>Group-Sparse Gaussian Process Regression for Inhomogeneous Sound Field Estimation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-group-sparse-gaussian-process-regression-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-group-sparse-gaussian-process-regression-for/</guid>
      <description>声场估计 | 7.5/10</description>
    </item>
    <item>
      <title>H-nnPBFDAF: Hierarchical Neural Network Partitioned Block Frequency Domain Adaptive Filter with Novel Block Activation Probability</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-h-nnpbfdaf-hierarchical-neural-network/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-h-nnpbfdaf-hierarchical-neural-network/</guid>
      <description>语音增强 | 7.5/10</description>
    </item>
    <item>
      <title>Hair Noise Analysis and Mitigation for Smart Glasses Audio Captures</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hair-noise-analysis-and-mitigation-for-smart/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hair-noise-analysis-and-mitigation-for-smart/</guid>
      <description>语音增强 | 7.5/10</description>
    </item>
    <item>
      <title>Hanui: Harnessing Distributional Discrepancies for Singing Voice Deepfake Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hanui-harnessing-distributional-discrepancies-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hanui-harnessing-distributional-discrepancies-for/</guid>
      <description>音频深度伪造检测 | 8.0/10</description>
    </item>
    <item>
      <title>HarmoNet: Music Grounding by Short Video via Harmonic Resample and Dynamic Sparse Alignment</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-harmonet-music-grounding-by-short-video-via/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-harmonet-music-grounding-by-short-video-via/</guid>
      <description>音乐检索 | 7.0/10</description>
    </item>
    <item>
      <title>Hashing-Baseline: Rethinking Hashing in the Age of Pretrained Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hashing-baseline-rethinking-hashing-in-the-age-of/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hashing-baseline-rethinking-hashing-in-the-age-of/</guid>
      <description>音频检索 音频分类 | 8.0/10</description>
    </item>
    <item>
      <title>HAVT-IVD: Heterogeneity-Aware Cross-Modal Network for Audio-Visual Surveillance: Idling Vehicles Detection with Multichannel Audio and Multiscale Visual Cues</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-havt-ivd-heterogeneity-aware-cross-modal-network/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-havt-ivd-heterogeneity-aware-cross-modal-network/</guid>
      <description>音频事件检测 | 8.0/10</description>
    </item>
    <item>
      <title>HCGAN: Harmonic-Coupled Generative Adversarial Network for Speech Super-Resolution in Low-Bandwidth Scenarios</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hcgan-harmonic-coupled-generative-adversarial/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hcgan-harmonic-coupled-generative-adversarial/</guid>
      <description>语音增强 | 8.0/10</description>
    </item>
    <item>
      <title>HD-PPT: Hierarchical Decoding of Content- and Prompt-Preference Tokens for Instruction-Based TTS</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hd-ppt-hierarchical-decoding-of-content-and/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hd-ppt-hierarchical-decoding-of-content-and/</guid>
      <description>语音合成 | 8.0/10</description>
    </item>
    <item>
      <title>HergNet: A Fast Neural Surrogate Model for Sound Field Predictions Via Superposition of Plane Waves</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hergnet-a-fast-neural-surrogate-model-for-sound/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hergnet-a-fast-neural-surrogate-model-for-sound/</guid>
      <description>空间音频 | 7.0/10</description>
    </item>
    <item>
      <title>HFSQVAE: Hierarchical Vector Quantization with Residuals for Frequency-Specific Embedding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hfsqvae-hierarchical-vector-quantization-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hfsqvae-hierarchical-vector-quantization-with/</guid>
      <description>音频生成 | 7.0/10</description>
    </item>
    <item>
      <title>Hierarchical Activity Recognition and Captioning from Long-Form Audio</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hierarchical-activity-recognition-and-captioning/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hierarchical-activity-recognition-and-captioning/</guid>
      <description>音频事件检测 | 7.5/10</description>
    </item>
    <item>
      <title>Hierarchical Discrete Flow Matching For Multi-Codebook Codec-Based Text-To-Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hierarchical-discrete-flow-matching-for-multi/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hierarchical-discrete-flow-matching-for-multi/</guid>
      <description>语音合成 | 7.5/10</description>
    </item>
    <item>
      <title>Hierarchical Tokenization of Multimodal Music Data for Generative Music Retrieval</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hierarchical-tokenization-of-multimodal-music/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hierarchical-tokenization-of-multimodal-music/</guid>
      <description>音乐检索 | 7.0/10</description>
    </item>
    <item>
      <title>HiFi-HARP: A High-Fidelity 7th-Order Ambisonic Room Impulse Response Dataset</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hifi-harp-a-high-fidelity-7th-order-ambisonic/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hifi-harp-a-high-fidelity-7th-order-ambisonic/</guid>
      <description>数据集 | 7.5/10</description>
    </item>
    <item>
      <title>High-Fidelity Speech Enhancement Via Discrete Audio Tokens</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-high-fidelity-speech-enhancement-via-discrete/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-high-fidelity-speech-enhancement-via-discrete/</guid>
      <description>语音增强 | 7.5/10</description>
    </item>
    <item>
      <title>How Far Do SSL Speech Models Listen for Tone? Temporal Focus of Tone Representation under Low-Resource Transfer</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-how-far-do-ssl-speech-models-listen-for-tone/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-how-far-do-ssl-speech-models-listen-for-tone/</guid>
      <description>语音识别 | 6.5/10</description>
    </item>
    <item>
      <title>How to Label Resynthesized Audio: The Dual Role of Neural Audio Codecs in Audio Deepfake Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-how-to-label-resynthesized-audio-the-dual-role-of/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-how-to-label-resynthesized-audio-the-dual-role-of/</guid>
      <description>音频深度伪造检测 | 7.5/10</description>
    </item>
    <item>
      <title>Huí Sù: Co-constructing a Dual Feedback Apparatus</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hu-s-co-constructing-a-dual-feedback-apparatus/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hu-s-co-constructing-a-dual-feedback-apparatus/</guid>
      <description>音乐生成 | 5.5/10</description>
    </item>
    <item>
      <title>Human-1 by Josh Talks: A Full-Duplex Conversational Modeling Framework in Hindi using Real-World Conversations</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-human-1-by-josh-talks-a-full-duplex/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-human-1-by-josh-talks-a-full-duplex/</guid>
      <description>语音对话系统 | 7.5/10</description>
    </item>
    <item>
      <title>HVAC-EAR: Eavesdropping Human Speech Using HVAC Systems</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hvac-ear-eavesdropping-human-speech-using-hvac/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hvac-ear-eavesdropping-human-speech-using-hvac/</guid>
      <description>音频安全 | 8.5/10</description>
    </item>
    <item>
      <title>Hybrid Pruning: In-Situ Compression of Self-Supervised Speech Models for Speaker Verification and Anti-Spoofing</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hybrid-pruning-in-situ-compression-of-self/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hybrid-pruning-in-situ-compression-of-self/</guid>
      <description>说话人验证 | 8.0/10</description>
    </item>
    <item>
      <title>HyFlowSE: Hybrid End-To-End Flow-Matching Speech Enhancement via Generative-Discriminative Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hyflowse-hybrid-end-to-end-flow-matching-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hyflowse-hybrid-end-to-end-flow-matching-speech/</guid>
      <description>语音增强 | 8.0/10</description>
    </item>
    <item>
      <title>I-DCCRN-VAE: An Improved Deep Representation Learning Framework for Complex VAE-Based Single-Channel Speech Enhancement</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-i-dccrn-vae-an-improved-deep-representation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-i-dccrn-vae-an-improved-deep-representation/</guid>
      <description>语音增强 | 7.5/10</description>
    </item>
    <item>
      <title>IBPCodec: A Low-Bitrate Lightweight Speech Codec With Inter-Band Prediction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ibpcodec-a-low-bitrate-lightweight-speech-codec/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ibpcodec-a-low-bitrate-lightweight-speech-codec/</guid>
      <description>语音编码 | 7.0/10</description>
    </item>
    <item>
      <title>ICASSP 2026 - 主动噪声控制 论文列表</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-000/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-000/</guid>
      <description>共 1 篇 ICASSP 2026 主动噪声控制 方向论文</description>
    </item>
    <item>
      <title>ICASSP 2026 - 主动降噪 论文列表</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-001/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-001/</guid>
      <description>共 1 篇 ICASSP 2026 主动降噪 方向论文</description>
    </item>
    <item>
      <title>ICASSP 2026 - 主题建模 论文列表</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-002/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-002/</guid>
      <description>共 1 篇 ICASSP 2026 主题建模 方向论文</description>
    </item>
    <item>
      <title>ICASSP 2026 - 信号处理 论文列表</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-003/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-003/</guid>
      <description>共 2 篇 ICASSP 2026 信号处理 方向论文</description>
    </item>
    <item>
      <title>ICASSP 2026 - 关键词检测 论文列表</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-004/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-004/</guid>
      <description>共 2 篇 ICASSP 2026 关键词检测 方向论文</description>
    </item>
    <item>
      <title>ICASSP 2026 - 医疗AI 论文列表</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-005/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-005/</guid>
      <description>共 1 篇 ICASSP 2026 医疗AI 方向论文</description>
    </item>
    <item>
      <title>ICASSP 2026 - 听觉注意力解码 论文列表</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-006/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-006/</guid>
      <description>共 2 篇 ICASSP 2026 听觉注意力解码 方向论文</description>
    </item>
    <item>
      <title>ICASSP 2026 - 听觉注意解码 论文列表</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-007/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-007/</guid>
      <description>共 1 篇 ICASSP 2026 听觉注意解码 方向论文</description>
    </item>
    <item>
      <title>ICASSP 2026 - 噪声控制 论文列表</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-008/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-008/</guid>
      <description>共 1 篇 ICASSP 2026 噪声控制 方向论文</description>
    </item>
    <item>
      <title>ICASSP 2026 - 回声消除 论文列表</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-009/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-009/</guid>
      <description>共 1 篇 ICASSP 2026 回声消除 方向论文</description>
    </item>
    <item>
      <title>ICASSP 2026 - 基准测试 论文列表</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-010/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-010/</guid>
      <description>共 5 篇 ICASSP 2026 基准测试 方向论文</description>
    </item>
    <item>
      <title>ICASSP 2026 - 基频估计 论文列表</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-011/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-011/</guid>
      <description>共 1 篇 ICASSP 2026 基频估计 方向论文</description>
    </item>
    <item>
      <title>ICASSP 2026 - 声场估计 论文列表</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-012/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-012/</guid>
      <description>共 1 篇 ICASSP 2026 声场估计 方向论文</description>
    </item>
    <item>
      <title>ICASSP 2026 - 声学建模 论文列表</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-013/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-013/</guid>
      <description>共 1 篇 ICASSP 2026 声学建模 方向论文</description>
    </item>
    <item>
      <title>ICASSP 2026 - 声源定位 论文列表</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-014/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-014/</guid>
      <description>共 15 篇 ICASSP 2026 声源定位 方向论文</description>
    </item>
    <item>
      <title>ICASSP 2026 - 多模态学习 论文列表</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-015/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-015/</guid>
      <description>共 1 篇 ICASSP 2026 多模态学习 方向论文</description>
    </item>
    <item>
      <title>ICASSP 2026 - 多模态对话意图识别 论文列表</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-016/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-016/</guid>
      <description>共 1 篇 ICASSP 2026 多模态对话意图识别 方向论文</description>
    </item>
    <item>
      <title>ICASSP 2026 - 多模态情感分析 论文列表</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-017/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-017/</guid>
      <description>共 2 篇 ICASSP 2026 多模态情感分析 方向论文</description>
    </item>
    <item>
      <title>ICASSP 2026 - 多模态情感识别 论文列表</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-018/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-018/</guid>
      <description>共 2 篇 ICASSP 2026 多模态情感识别 方向论文</description>
    </item>
    <item>
      <title>ICASSP 2026 - 多模态模型 论文列表</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-019/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-019/</guid>
      <description>共 6 篇 ICASSP 2026 多模态模型 方向论文</description>
    </item>
    <item>
      <title>ICASSP 2026 - 多通道 论文列表</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-020/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-020/</guid>
      <description>共 1 篇 ICASSP 2026 多通道 方向论文</description>
    </item>
    <item>
      <title>ICASSP 2026 - 多音高估计 #音符跟踪 论文列表</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-021/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-021/</guid>
      <description>共 1 篇 ICASSP 2026 多音高估计 #音符跟踪 方向论文</description>
    </item>
    <item>
      <title>ICASSP 2026 - 实体消歧 论文列表</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-022/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-022/</guid>
      <description>共 1 篇 ICASSP 2026 实体消歧 方向论文</description>
    </item>
    <item>
      <title>ICASSP 2026 - 实时处理 论文列表</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-023/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-023/</guid>
      <description>共 1 篇 ICASSP 2026 实时处理 方向论文</description>
    </item>
    <item>
      <title>ICASSP 2026 - 对抗样本 论文列表</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-024/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-024/</guid>
      <description>共 1 篇 ICASSP 2026 对抗样本 方向论文</description>
    </item>
    <item>
      <title>ICASSP 2026 - 异常声音检测 论文列表</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-025/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-025/</guid>
      <description>共 1 篇 ICASSP 2026 异常声音检测 方向论文</description>
    </item>
    <item>
      <title>ICASSP 2026 - 情感分析 论文列表</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-026/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-026/</guid>
      <description>共 3 篇 ICASSP 2026 情感分析 方向论文</description>
    </item>
    <item>
      <title>ICASSP 2026 - 情感识别 论文列表</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-027/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-027/</guid>
      <description>共 2 篇 ICASSP 2026 情感识别 方向论文</description>
    </item>
    <item>
      <title>ICASSP 2026 - 房间脉冲响应 论文列表</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-028/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-028/</guid>
      <description>共 1 篇 ICASSP 2026 房间脉冲响应 方向论文</description>
    </item>
    <item>
      <title>ICASSP 2026 - 房间脉冲响应去噪 论文列表</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-029/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-029/</guid>
      <description>共 1 篇 ICASSP 2026 房间脉冲响应去噪 方向论文</description>
    </item>
    <item>
      <title>ICASSP 2026 - 数据集 论文列表</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-030/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-030/</guid>
      <description>共 3 篇 ICASSP 2026 数据集 方向论文</description>
    </item>
    <item>
      <title>ICASSP 2026 - 数据集对齐 论文列表</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-031/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-031/</guid>
      <description>共 1 篇 ICASSP 2026 数据集对齐 方向论文</description>
    </item>
    <item>
      <title>ICASSP 2026 - 槽填充 论文列表</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-032/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-032/</guid>
      <description>共 1 篇 ICASSP 2026 槽填充 方向论文</description>
    </item>
    <item>
      <title>ICASSP 2026 - 模型评估 论文列表</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-033/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-033/</guid>
      <description>共 16 篇 ICASSP 2026 模型评估 方向论文</description>
    </item>
    <item>
      <title>ICASSP 2026 - 歌唱旋律提取 论文列表</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-034/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-034/</guid>
      <description>共 1 篇 ICASSP 2026 歌唱旋律提取 方向论文</description>
    </item>
    <item>
      <title>ICASSP 2026 - 歌唱语音合成 论文列表</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-035/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-035/</guid>
      <description>共 5 篇 ICASSP 2026 歌唱语音合成 方向论文</description>
    </item>
    <item>
      <title>ICASSP 2026 - 歌唱语音转录 论文列表</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-036/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-036/</guid>
      <description>共 1 篇 ICASSP 2026 歌唱语音转录 方向论文</description>
    </item>
    <item>
      <title>ICASSP 2026 - 歌唱语音转换 论文列表</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-037/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-037/</guid>
      <description>共 3 篇 ICASSP 2026 歌唱语音转换 方向论文</description>
    </item>
    <item>
      <title>ICASSP 2026 - 水下声学目标识别 论文列表</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-038/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-038/</guid>
      <description>共 2 篇 ICASSP 2026 水下声学目标识别 方向论文</description>
    </item>
    <item>
      <title>ICASSP 2026 - 生物声学 论文列表</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-039/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-039/</guid>
      <description>共 12 篇 ICASSP 2026 生物声学 方向论文</description>
    </item>
    <item>
      <title>ICASSP 2026 - 目标说话人提取 论文列表</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-040/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-040/</guid>
      <description>共 1 篇 ICASSP 2026 目标说话人提取 方向论文</description>
    </item>
    <item>
      <title>ICASSP 2026 - 神经解码 论文列表</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-041/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-041/</guid>
      <description>共 1 篇 ICASSP 2026 神经解码 方向论文</description>
    </item>
    <item>
      <title>ICASSP 2026 - 空间音频 论文列表</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-042/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-042/</guid>
      <description>共 31 篇 ICASSP 2026 空间音频 方向论文</description>
    </item>
    <item>
      <title>ICASSP 2026 - 联邦学习 论文列表</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-043/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-043/</guid>
      <description>共 1 篇 ICASSP 2026 联邦学习 方向论文</description>
    </item>
    <item>
      <title>ICASSP 2026 - 脑信号编码 论文列表</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-044/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-044/</guid>
      <description>共 1 篇 ICASSP 2026 脑信号编码 方向论文</description>
    </item>
    <item>
      <title>ICASSP 2026 - 脑机接口 论文列表</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-045/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-045/</guid>
      <description>共 1 篇 ICASSP 2026 脑机接口 方向论文</description>
    </item>
    <item>
      <title>ICASSP 2026 - 舞蹈生成 论文列表</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-046/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-046/</guid>
      <description>1 ICASSP 2026 paper on Dance Generation</description>
    </item>
    <item>
      <title>ICASSP 2026 - Visual Speech Recognition Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-047/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-047/</guid>
      <description>2 ICASSP 2026 papers on Visual Speech Recognition</description>
    </item>
    <item>
      <title>ICASSP 2026 - Video-to-Audio Generation Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-048/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-048/</guid>
      <description>1 ICASSP 2026 paper on Video-to-Audio Generation</description>
    </item>
    <item>
      <title>ICASSP 2026 - Video Retrieval Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-049/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-049/</guid>
      <description>1 ICASSP 2026 paper on Video Retrieval</description>
    </item>
    <item>
      <title>ICASSP 2026 - Video Moment Retrieval Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-050/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-050/</guid>
      <description>1 ICASSP 2026 paper on Video Moment Retrieval</description>
    </item>
    <item>
      <title>ICASSP 2026 - Video Understanding Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-051/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-051/</guid>
      <description>1 ICASSP 2026 paper on Video Understanding</description>
    </item>
    <item>
      <title>ICASSP 2026 - Video Generation Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-052/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-052/</guid>
      <description>2 ICASSP 2026 papers on Video Generation</description>
    </item>
    <item>
      <title>ICASSP 2026 - Video Device Identification Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-053/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-053/</guid>
      <description>1 ICASSP 2026 paper on Video Device Identification</description>
    </item>
    <item>
      <title>ICASSP 2026 - Video Question Answering Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-054/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-054/</guid>
      <description>1 ICASSP 2026 paper on Video Question Answering</description>
    </item>
    <item>
      <title>ICASSP 2026 - Video Highlight Detection Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-055/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-055/</guid>
      <description>1 ICASSP 2026 paper on Video Highlight Detection</description>
    </item>
    <item>
      <title>ICASSP 2026 - Speech Spoofing Detection Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-056/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-056/</guid>
      <description>8 ICASSP 2026 papers on Speech Spoofing Detection</description>
    </item>
    <item>
      <title>ICASSP 2026 - Voice Cloning Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-057/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-057/</guid>
      <description>4 ICASSP 2026 papers on Voice Cloning</description>
    </item>
    <item>
      <title>ICASSP 2026 - Speech Separation Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-058/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-058/</guid>
      <description>25 ICASSP 2026 papers on Speech Separation</description>
    </item>
    <item>
      <title>ICASSP 2026 - Speech Anonymization Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-059/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-059/</guid>
      <description>10 ICASSP 2026 papers on Speech Anonymization</description>
    </item>
    <item>
      <title>ICASSP 2026 - Speech Discovery Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-060/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-060/</guid>
      <description>1 ICASSP 2026 paper on Speech Discovery</description>
    </item>
    <item>
      <title>ICASSP 2026 - Speech Synthesis Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-061/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-061/</guid>
      <description>63 ICASSP 2026 papers on Speech Synthesis</description>
    </item>
    <item>
      <title>ICASSP 2026 - Speech Enhancement #Adversarial Defense Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-063/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-063/</guid>
      <description>1 ICASSP 2026 paper on Speech Enhancement #Adversarial Defense</description>
    </item>
    <item>
      <title>ICASSP 2026 - Speech Enhancement Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-062/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-062/</guid>
      <description>75 ICASSP 2026 papers on Speech Enhancement</description>
    </item>
    <item>
      <title>ICASSP 2026 - Large Speech Models Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-064/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-064/</guid>
      <description>3 ICASSP 2026 papers on Large Speech Models</description>
    </item>
    <item>
      <title>ICASSP 2026 - Spoken Dialogue Systems Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-065/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-065/</guid>
      <description>10 ICASSP 2026 papers on Spoken Dialogue Systems</description>
    </item>
    <item>
      <title>ICASSP 2026 - Speech Emotion Recognition Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-066/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-066/</guid>
      <description>49 ICASSP 2026 papers on Speech Emotion Recognition</description>
    </item>
    <item>
      <title>ICASSP 2026 - Speech Summarization Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-067/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-067/</guid>
      <description>1 ICASSP 2026 paper on Speech Summarization</description>
    </item>
    <item>
      <title>ICASSP 2026 - Voice Activity Detection Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-068/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-068/</guid>
      <description>5 ICASSP 2026 papers on Voice Activity Detection</description>
    </item>
    <item>
      <title>ICASSP 2026 - Speech Understanding Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-069/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-069/</guid>
      <description>2 ICASSP 2026 papers on Speech Understanding</description>
    </item>
    <item>
      <title>ICASSP 2026 - Speech Generation Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-070/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-070/</guid>
      <description>1 ICASSP 2026 paper on Speech Generation</description>
    </item>
    <item>
      <title>ICASSP 2026 - Speech Biomarkers Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-071/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-071/</guid>
      <description>24 ICASSP 2026 papers on Speech Biomarkers</description>
    </item>
    <item>
      <title>ICASSP 2026 - Speech Coding Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-072/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-072/</guid>
      <description>5 ICASSP 2026 papers on Speech Coding</description>
    </item>
    <item>
      <title>ICASSP 2026 - Speech Encoders Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-073/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-073/</guid>
      <description>1 ICASSP 2026 paper on Speech Encoders</description>
    </item>
    <item>
      <title>ICASSP 2026 - Speech Translation Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-074/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-074/</guid>
      <description>8 ICASSP 2026 papers on Speech Translation</description>
    </item>
    <item>
      <title>ICASSP 2026 - Speech Representation Learning Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-075/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-075/</guid>
      <description>1 ICASSP 2026 paper on Speech Representation Learning</description>
    </item>
    <item>
      <title>ICASSP 2026 - Speech Decoding Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-076/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-076/</guid>
      <description>1 ICASSP 2026 paper on Speech Decoding</description>
    </item>
    <item>
      <title>ICASSP 2026 - Speech Assessment Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-077/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-077/</guid>
      <description>5 ICASSP 2026 papers on Speech Assessment</description>
    </item>
    <item>
      <title>ICASSP 2026 - Speech Recognition #Speech Synthesis Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-079/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-079/</guid>
      <description>1 ICASSP 2026 paper on Speech Recognition #Speech Synthesis</description>
    </item>
    <item>
      <title>ICASSP 2026 - Speech Recognition #Speech Translation Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-080/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-080/</guid>
      <description>3 ICASSP 2026 papers on Speech Recognition #Speech Translation</description>
    </item>
    <item>
      <title>ICASSP 2026 - Speech Recognition Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-078/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-078/</guid>
      <description>102 ICASSP 2026 papers on Speech Recognition</description>
    </item>
    <item>
      <title>ICASSP 2026 - Speech Quality Assessment Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-081/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-081/</guid>
      <description>8 ICASSP 2026 papers on Speech Quality Assessment</description>
    </item>
    <item>
      <title>ICASSP 2026 - Voice Conversion #Speech Enhancement Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-083/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-083/</guid>
      <description>1 ICASSP 2026 paper on Voice Conversion #Speech Enhancement</description>
    </item>
    <item>
      <title>ICASSP 2026 - Voice Conversion Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-082/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-082/</guid>
      <description>9 ICASSP 2026 papers on Voice Conversion</description>
    </item>
    <item>
      <title>ICASSP 2026 - Spoken Question Answering Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-084/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-084/</guid>
      <description>3 ICASSP 2026 papers on Spoken Question Answering</description>
    </item>
    <item>
      <title>ICASSP 2026 - Speech-Driven Motion Generation Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-085/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-085/</guid>
      <description>1 ICASSP 2026 paper on Speech-Driven Motion Generation</description>
    </item>
    <item>
      <title>ICASSP 2026 - Speaker Separation Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-086/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-086/</guid>
      <description>9 ICASSP 2026 papers on Speaker Separation</description>
    </item>
    <item>
      <title>ICASSP 2026 - Speaker Synthesis Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-087/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-087/</guid>
      <description>1 ICASSP 2026 paper on Speaker Synthesis</description>
    </item>
    <item>
      <title>ICASSP 2026 - Speaker Diarization #Speech Separation Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-089/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-089/</guid>
      <description>1 ICASSP 2026 paper on Speaker Diarization #Speech Separation</description>
    </item>
    <item>
      <title>ICASSP 2026 - Speaker Diarization Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-088/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-088/</guid>
      <description>2 ICASSP 2026 papers on Speaker Diarization</description>
    </item>
    <item>
      <title>ICASSP 2026 - Speaker Detection Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-090/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-090/</guid>
      <description>1 ICASSP 2026 paper on Speaker Detection</description>
    </item>
    <item>
      <title>ICASSP 2026 - Speaker Generation Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-091/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-091/</guid>
      <description>1 ICASSP 2026 paper on Speaker Generation</description>
    </item>
    <item>
      <title>ICASSP 2026 - Talking Face Generation Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-092/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-092/</guid>
      <description>1 ICASSP 2026 paper on Talking Face Generation</description>
    </item>
    <item>
      <title>ICASSP 2026 - Speaker Identification Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-093/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-093/</guid>
      <description>1 ICASSP 2026 paper on Speaker Identification</description>
    </item>
    <item>
      <title>ICASSP 2026 - Speaker Verification Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-094/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-094/</guid>
      <description>10 ICASSP 2026 papers on Speaker Verification</description>
    </item>
    <item>
      <title>ICASSP 2026 - Classroom Phase Segmentation Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-095/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-095/</guid>
      <description>1 ICASSP 2026 paper on Classroom Phase Segmentation</description>
    </item>
    <item>
      <title>ICASSP 2026 - Cross-Modal Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-096/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-096/</guid>
      <description>2 ICASSP 2026 papers on Cross-Modal topics</description>
    </item>
    <item>
      <title>ICASSP 2026 - Cross-Modal Retrieval Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-097/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-097/</guid>
      <description>2 ICASSP 2026 papers on Cross-Modal Retrieval</description>
    </item>
    <item>
      <title>ICASSP 2026 - Mild Cognitive Impairment Detection Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-098/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-098/</guid>
      <description>1 ICASSP 2026 paper on Mild Cognitive Impairment Detection</description>
    </item>
    <item>
      <title>ICASSP 2026 - Transfer Learning Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-099/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-099/</guid>
      <description>1 ICASSP 2026 paper on Transfer Learning</description>
    </item>
    <item>
      <title>ICASSP 2026 - Zero-Shot Keyword Spotting Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-100/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-100/</guid>
      <description>1 ICASSP 2026 paper on Zero-Shot Keyword Spotting</description>
    </item>
    <item>
      <title>ICASSP 2026 - Music Information Retrieval Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-101/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-101/</guid>
      <description>26 ICASSP 2026 papers on Music Information Retrieval</description>
    </item>
    <item>
      <title>ICASSP 2026 - Music Separation Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-102/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-102/</guid>
      <description>1 ICASSP 2026 paper on Music Separation</description>
    </item>
    <item>
      <title>ICASSP 2026 - Music Classification Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-103/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-103/</guid>
      <description>1 ICASSP 2026 paper on Music Classification</description>
    </item>
    <item>
      <title>ICASSP 2026 - Music Recommendation Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-104/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-104/</guid>
      <description>1 ICASSP 2026 paper on Music Recommendation</description>
    </item>
    <item>
      <title>ICASSP 2026 - Music Retrieval Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-105/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-105/</guid>
      <description>3 ICASSP 2026 papers on Music Retrieval</description>
    </item>
    <item>
      <title>ICASSP 2026 - Music Mixing Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-106/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-106/</guid>
      <description>1 ICASSP 2026 paper on Music Mixing</description>
    </item>
    <item>
      <title>ICASSP 2026 - Music Source Separation Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-107/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-107/</guid>
      <description>2 ICASSP 2026 papers on Music Source Separation</description>
    </item>
    <item>
      <title>ICASSP 2026 - Music Source Extraction Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-108/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-108/</guid>
      <description>1 ICASSP 2026 paper on Music Source Extraction</description>
    </item>
    <item>
      <title>ICASSP 2026 - Music Understanding Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-109/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-109/</guid>
      <description>11 ICASSP 2026 papers on Music Understanding</description>
    </item>
    <item>
      <title>ICASSP 2026 - Music Generation Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-110/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-110/</guid>
      <description>31 ICASSP 2026 papers on Music Generation</description>
    </item>
    <item>
      <title>ICASSP 2026 - Music Transcription Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-111/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-111/</guid>
      <description>1 ICASSP 2026 paper on Music Transcription</description>
    </item>
    <item>
      <title>ICASSP 2026 - Audio-Visual Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-112/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-112/</guid>
      <description>6 ICASSP 2026 papers on Audio-Visual topics</description>
    </item>
    <item>
      <title>ICASSP 2026 - Audio-Visual Instance Segmentation Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-113/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-113/</guid>
      <description>1 ICASSP 2026 paper on Audio-Visual Instance Segmentation</description>
    </item>
    <item>
      <title>ICASSP 2026 - Audio Event Detection Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-114/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-114/</guid>
      <description>21 ICASSP 2026 papers on Audio Event Detection</description>
    </item>
    <item>
      <title>ICASSP 2026 - Audio Signal Processing Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-115/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-115/</guid>
      <description>1 ICASSP 2026 paper on Audio Signal Processing</description>
    </item>
    <item>
      <title>ICASSP 2026 - Audio Separation Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-116/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-116/</guid>
      <description>1 ICASSP 2026 paper on Audio Separation</description>
    </item>
    <item>
      <title>ICASSP 2026 - Audio Classification #Zero-Shot Learning Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-118/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-118/</guid>
      <description>1 ICASSP 2026 paper on Audio Classification #Zero-Shot Learning</description>
    </item>
    <item>
      <title>ICASSP 2026 - Audio Classification Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-117/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-117/</guid>
      <description>39 ICASSP 2026 papers on Audio Classification</description>
    </item>
    <item>
      <title>ICASSP 2026 - Audio Compression Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-119/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-119/</guid>
      <description>2 ICASSP 2026 papers on Audio Compression</description>
    </item>
    <item>
      <title>ICASSP 2026 - Acoustic Scene Classification Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-120/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-120/</guid>
      <description>1 ICASSP 2026 paper on Acoustic Scene Classification</description>
    </item>
    <item>
      <title>ICASSP 2026 - Audio Scene Understanding Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-121/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-121/</guid>
      <description>3 ICASSP 2026 papers on Audio Scene Understanding</description>
    </item>
    <item>
      <title>ICASSP 2026 - Audio Enhancement Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-122/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-122/</guid>
      <description>3 ICASSP 2026 papers on Audio Enhancement</description>
    </item>
    <item>
      <title>ICASSP 2026 - Large Audio Models Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-123/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-123/</guid>
      <description>1 ICASSP 2026 paper on Large Audio Models</description>
    </item>
    <item>
      <title>ICASSP 2026 - Audio Captioning Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-124/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-124/</guid>
      <description>1 ICASSP 2026 paper on Audio Captioning</description>
    </item>
    <item>
      <title>ICASSP 2026 - Audio Security Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-125/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-125/</guid>
      <description>11 ICASSP 2026 papers on Audio Security</description>
    </item>
    <item>
      <title>ICASSP 2026 - Audio Description Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-126/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-126/</guid>
      <description>1 ICASSP 2026 paper on Audio Description</description>
    </item>
    <item>
      <title>ICASSP 2026 - Audio Effect Estimation Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-127/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-127/</guid>
      <description>1 ICASSP 2026 paper on Audio Effect Estimation</description>
    </item>
    <item>
      <title>ICASSP 2026 - Lossless Audio Coding Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-128/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-128/</guid>
      <description>1 ICASSP 2026 paper on Lossless Audio Coding</description>
    </item>
    <item>
      <title>ICASSP 2026 - Audio Retrieval #Audio Classification Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-130/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-130/</guid>
      <description>1 ICASSP 2026 paper on Audio Retrieval #Audio Classification</description>
    </item>
    <item>
      <title>ICASSP 2026 - Audio Retrieval Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-129/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-129/</guid>
      <description>共 11 篇 ICASSP 2026 音频检索 方向论文</description>
    </item>
    <item>
      <title>ICASSP 2026 - Audio Watermarking Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-131/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-131/</guid>
      <description>1 ICASSP 2026 paper on Audio Watermarking</description>
    </item>
    <item>
      <title>ICASSP 2026 - Audio Deepfake Detection Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-132/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-132/</guid>
      <description>29 ICASSP 2026 papers on Audio Deepfake Detection</description>
    </item>
    <item>
      <title>ICASSP 2026 - Audio Generation Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-133/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-133/</guid>
      <description>39 ICASSP 2026 papers on Audio Generation</description>
    </item>
    <item>
      <title>ICASSP 2026 - Audio Editing Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-134/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-134/</guid>
      <description>1 ICASSP 2026 paper on Audio Editing</description>
    </item>
    <item>
      <title>ICASSP 2026 - Audio Quality Assessment Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-135/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-135/</guid>
      <description>1 ICASSP 2026 paper on Audio Quality Assessment</description>
    </item>
    <item>
      <title>ICASSP 2026 - Audio Super-Resolution Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-136/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-136/</guid>
      <description>1 ICASSP 2026 paper on Audio Super-Resolution</description>
    </item>
    <item>
      <title>ICASSP 2026 - Audio Question Answering Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-137/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-137/</guid>
      <description>15 ICASSP 2026 papers on Audio Question Answering</description>
    </item>
    <item>
      <title>ICASSP 2026 - Pre-training Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-138/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-138/</guid>
      <description>1 ICASSP 2026 paper on Pre-training</description>
    </item>
    <item>
      <title>ICASSP 2026 - Domain Adaptation Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-139/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-139/</guid>
      <description>2 ICASSP 2026 papers on Domain Adaptation</description>
    </item>
    <item>
      <title>ICASSP 2026 Speech/Audio Papers: Detailed Analysis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-summary/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-summary/</guid>
      <description>898 ICASSP 2026 papers analyzed in total</description>
    </item>
    <item>
      <title>Identifying Birdsong Syllables without Labelled Data</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-identifying-birdsong-syllables-without-labelled/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-identifying-birdsong-syllables-without-labelled/</guid>
      <description>Bioacoustics | 7.0/10</description>
    </item>
    <item>
      <title>Identifying the Minimal and Maximal Phonetic Subspace of Speech Representations</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-identifying-the-minimal-and-maximal-phonetic/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-identifying-the-minimal-and-maximal-phonetic/</guid>
      <description>Speech Recognition | 8.0/10</description>
    </item>
    <item>
      <title>Identity Leakage Through Accent Cues in Voice Anonymisation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-identity-leakage-through-accent-cues-in-voice/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-identity-leakage-through-accent-cues-in-voice/</guid>
      <description>Voice Anonymization | 7.0/10</description>
    </item>
    <item>
      <title>Impact of Phonetics on Speaker Identity in Adversarial Voice Attack</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-impact-of-phonetics-on-speaker-identity-in/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-impact-of-phonetics-on-speaker-identity-in/</guid>
      <description>Speaker Verification | 7.0/10</description>
    </item>
    <item>
      <title>Improving Active Learning for Melody Estimation by Disentangling Uncertainties</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-improving-active-learning-for-melody-estimation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-improving-active-learning-for-melody-estimation/</guid>
      <description>Music Information Retrieval | 7.5/10</description>
    </item>
    <item>
      <title>Improving Anomalous Sound Detection with Attribute-Aware Representation from Domain-Adaptive Pre-Training</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-improving-anomalous-sound-detection-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-improving-anomalous-sound-detection-with/</guid>
      <description>Audio Event Detection | 8.0/10</description>
    </item>
    <item>
      <title>Improving Audio Event Recognition with Consistency Regularization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-improving-audio-event-recognition-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-improving-audio-event-recognition-with/</guid>
      <description>Audio Event Detection | 7.0/10</description>
    </item>
    <item>
      <title>Improving Audio Question Answering with Variational Inference</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-improving-audio-question-answering-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-improving-audio-question-answering-with/</guid>
      <description>Audio Question Answering | 7.5/10</description>
    </item>
    <item>
      <title>Improving Automatic Speech Recognition by Mitigating Distortions Introduced by Speech Enhancement Under Drone Noise</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-improving-automatic-speech-recognition-by/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-improving-automatic-speech-recognition-by/</guid>
      <description>Speech Recognition | 6.5/10</description>
    </item>
    <item>
      <title>Improving Binaural Distance Estimation in Reverberant Rooms Through Contrastive And Multi-Task Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-improving-binaural-distance-estimation-in/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-improving-binaural-distance-estimation-in/</guid>
      <description>Sound Source Localization | 7.0/10</description>
    </item>
    <item>
      <title>Improving Contextual ASR Via Multi-Grained Fusion With Large Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-improving-contextual-asr-via-multi-grained-fusion/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-improving-contextual-asr-via-multi-grained-fusion/</guid>
      <description>Speech Recognition | 8.5/10</description>
    </item>
    <item>
      <title>Improving Interpretability in Generative Multitimbral DDSP Frameworks via Semantically-Disentangled Musical Attributes</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-improving-interpretability-in-generative/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-improving-interpretability-in-generative/</guid>
      <description>Audio Generation | 7.5/10</description>
    </item>
    <item>
      <title>Improving Multimodal Brain Encoding Model with Dynamic Subject-Awareness Routing</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-improving-multimodal-brain-encoding-model-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-improving-multimodal-brain-encoding-model-with/</guid>
      <description>Brain Signal Encoding | 8.0/10</description>
    </item>
    <item>
      <title>Improving the Speaker Anonymization Evaluation’s Robustness to Target Speakers with Adversarial Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-improving-the-speaker-anonymization-evaluations/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-improving-the-speaker-anonymization-evaluations/</guid>
      <description>Voice Anonymization | 7.5/10</description>
    </item>
    <item>
      <title>In-Sync: Adaptation of Speech Aware Large Language Models for ASR with Word level timestamp predictions</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-in-sync-adaptation-of-speech-aware-large-language/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-in-sync-adaptation-of-speech-aware-large-language/</guid>
      <description>Speech Recognition | 7.0/10</description>
    </item>
    <item>
      <title>InconVAD: A Two-Stage Dual-Tower Framework for Multimodal Emotion Inconsistency Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-inconvad-a-two-stage-dual-tower-framework-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-inconvad-a-two-stage-dual-tower-framework-for/</guid>
      <description>Speech Emotion Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Incremental Learning for Audio Classification with Hebbian Deep Neural Networks</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-incremental-learning-for-audio-classification/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-incremental-learning-for-audio-classification/</guid>
      <description>Audio Classification | 7.5/10</description>
    </item>
    <item>
      <title>Independent-Component-Based Encoding Models of Brain Activity During Story Comprehension</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-independent-component-based-encoding-models-of/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-independent-component-based-encoding-models-of/</guid>
      <description>Neural Encoding | 7.5/10</description>
    </item>
    <item>
      <title>Individualize the HRTF Neural Field Using Anthropometric Parameters Weighted by Direction-Attention</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-individualize-the-hrtf-neural-field-using/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-individualize-the-hrtf-neural-field-using/</guid>
      <description>Spatial Audio | 7.0/10</description>
    </item>
    <item>
      <title>Influence of Clean Speech Characteristics on Speech Enhancement Performance</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-influence-of-clean-speech-characteristics-on/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-influence-of-clean-speech-characteristics-on/</guid>
      <description>Speech Enhancement | 8.0/10</description>
    </item>
    <item>
      <title>Influence-Aware Curation and Active Selection for Industrial and Surveillance Sound Events</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-influence-aware-curation-and-active-selection-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-influence-aware-curation-and-active-selection-for/</guid>
      <description>Audio Event Detection | 7.0/10</description>
    </item>
    <item>
      <title>Input-Adaptive Differentiable Filterbanks via Hypernetworks for Robust Speech Processing</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-input-adaptive-differentiable-filterbanks-via/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-input-adaptive-differentiable-filterbanks-via/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>InstructAudio: Unified Speech and Music Generation with Natural Language Instruction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-instructaudio-unified-speech-and-music-generation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-instructaudio-unified-speech-and-music-generation/</guid>
      <description>Speech Synthesis | 7.5/10</description>
    </item>
    <item>
      <title>Instrument Generation Through Distributional Flow Matching and Test-Time Search</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-instrument-generation-through-distributional-flow/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-instrument-generation-through-distributional-flow/</guid>
      <description>Music Generation | 7.0/10</description>
    </item>
    <item>
      <title>Int-MeanFlow: Few-Step Speech Generation with Integral Velocity Distillation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-int-meanflow-few-step-speech-generation-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-int-meanflow-few-step-speech-generation-with/</guid>
      <description>Speech Synthesis | 7.5/10</description>
    </item>
    <item>
      <title>Integrating Speaker Embeddings and LLM-Derived Semantic Representations for Streaming Speaker Diarization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-integrating-speaker-embeddings-and-llm-derived/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-integrating-speaker-embeddings-and-llm-derived/</guid>
      <description>Speaker Diarization | 6.5/10</description>
    </item>
    <item>
      <title>Inter-Dialog Contrastive Learning for Multimodal Emotion Recognition in Conversations</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-inter-dialog-contrastive-learning-for-multimodal/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-inter-dialog-contrastive-learning-for-multimodal/</guid>
      <description>Speech Emotion Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Interpretable Music Harmonic Analysis Through Multilinear Mixture of Experts</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-interpretable-music-harmonic-analysis-through/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-interpretable-music-harmonic-analysis-through/</guid>
      <description>Music Understanding | 7.5/10</description>
    </item>
    <item>
      <title>Interval-Aware Retrieval Framework For Speech-Based Automatic Alzheimer’s Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-interval-aware-retrieval-framework-for-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-interval-aware-retrieval-framework-for-speech/</guid>
      <description>Speech Biomarkers | 8.5/10</description>
    </item>
    <item>
      <title>Inverse-Hessian Regularization for Continual Learning in ASR</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-inverse-hessian-regularization-for-continual/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-inverse-hessian-regularization-for-continual/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Investigating Modality Contribution in Audio LLMs for Music</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-investigating-modality-contribution-in-audio-llms/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-investigating-modality-contribution-in-audio-llms/</guid>
      <description>Model Evaluation | 6.5/10</description>
    </item>
    <item>
      <title>Investigating The Effect Of Sentence-Level Syntactic Structure On Information Loss In The Human Auditory System</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-investigating-the-effect-of-sentence-level/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-investigating-the-effect-of-sentence-level/</guid>
      <description>Speech Recognition | 7.0/10</description>
    </item>
    <item>
      <title>Is Phase Really Needed for Weakly-Supervised Dereverberation?</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-is-phase-really-needed-for-weakly-supervised/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-is-phase-really-needed-for-weakly-supervised/</guid>
      <description>Speech Enhancement | 6.0/10</description>
    </item>
    <item>
      <title>It Is Personal: The Importance of Personalization for Recognizing Self-Reported Emotion</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-it-is-personal-the-importance-of-personalization/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-it-is-personal-the-importance-of-personalization/</guid>
      <description>Speech Emotion Recognition | 8.0/10</description>
    </item>
    <item>
      <title>Joint Autoregressive Modeling of Multi-Talker Overlapped Speech Recognition and Translation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-joint-autoregressive-modeling-of-multi-talker/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-joint-autoregressive-modeling-of-multi-talker/</guid>
      <description>Speech Recognition / Speech Translation | 7.0/10</description>
    </item>
    <item>
      <title>Joint Deep Secondary Path Estimation and Adaptive Control for Active Noise Cancellation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-joint-deep-secondary-path-estimation-and-adaptive/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-joint-deep-secondary-path-estimation-and-adaptive/</guid>
      <description>Speech Enhancement | 7.5/10</description>
    </item>
    <item>
      <title>Joint Estimation of Piano Dynamics and Metrical Structure with a Multi-Task Multi-Scale Network</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-joint-estimation-of-piano-dynamics-and-metrical/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-joint-estimation-of-piano-dynamics-and-metrical/</guid>
      <description>Music Understanding | 7.5/10</description>
    </item>
    <item>
      <title>Joint Estimation of Primary and Secondary Paths for Personalized Hearable Applications</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-joint-estimation-of-primary-and-secondary-paths/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-joint-estimation-of-primary-and-secondary-paths/</guid>
      <description>Active Noise Cancellation | 7.5/10</description>
    </item>
    <item>
      <title>Joint Multichannel Acoustic Feedback Cancellation and Speaker Extraction via Kalman Filter and Deep Non-Linear Spatial Filter</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-joint-multichannel-acoustic-feedback-cancellation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-joint-multichannel-acoustic-feedback-cancellation/</guid>
      <description>Speech Enhancement | 7.0/10</description>
    </item>
    <item>
      <title>K-Function: Joint Pronunciation Transcription and Feedback for Evaluating Kids Language Function</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-k-function-joint-pronunciation-transcription-and/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-k-function-joint-pronunciation-transcription-and/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>KAN We Make Models Simpler for Audio Deepfake Detection with Kolmogorov–Arnold Networks?</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-kan-we-make-models-simpler-for-audio-deepfake/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-kan-we-make-models-simpler-for-audio-deepfake/</guid>
      <description>Audio Deepfake Detection | 7.5/10</description>
    </item>
    <item>
      <title>Keeping Models Listening: Segment- and time-aware attention rescaling at decoding time</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-keeping-models-listening-segment-and-time-aware/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-keeping-models-listening-segment-and-time-aware/</guid>
      <description>Audio Question Answering | 7.5/10</description>
    </item>
    <item>
      <title>Korean aegyo speech shows systematic F1 increase to signal childlike qualities</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-korean-aegyo-speech-shows-systematic-f1-increase/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-korean-aegyo-speech-shows-systematic-f1-increase/</guid>
      <description>Speech Emotion Recognition | 6.0/10</description>
    </item>
    <item>
      <title>KSDIFF: Keyframe-Augmented Speech-Aware Dual-Path Diffusion for Facial Animation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ksdiff-keyframe-augmented-speech-aware-dual-path/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ksdiff-keyframe-augmented-speech-aware-dual-path/</guid>
      <description>Audio Generation | 7.5/10</description>
    </item>
    <item>
      <title>LAFUFU: Latent Acoustic Features For Ultra-Fast Utterance Restoration</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lafufu-latent-acoustic-features-for-ultra-fast/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lafufu-latent-acoustic-features-for-ultra-fast/</guid>
      <description>Speech Enhancement | 8.0/10</description>
    </item>
    <item>
      <title>LAMB: LLM-Based Audio Captioning with Modality Gap Bridging Via Cauchy-Schwarz Divergence</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lamb-llm-based-audio-captioning-with-modality-gap/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lamb-llm-based-audio-captioning-with-modality-gap/</guid>
      <description>Audio Captioning | 7.0/10</description>
    </item>
    <item>
      <title>Language-Infused Retrieval-Augmented CTC with Adaptive Soft-Hard Gating for Robust Code-Switching ASR</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-language-infused-retrieval-augmented-ctc-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-language-infused-retrieval-augmented-ctc-with/</guid>
      <description>Speech Recognition | 8.0/10</description>
    </item>
    <item>
      <title>Lattice-Guided Consistency Regularization of Dual-Mode Transducers for Automatic Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lattice-guided-consistency-regularization-of-dual/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lattice-guided-consistency-regularization-of-dual/</guid>
      <description>Speech Recognition | 8.0/10</description>
    </item>
    <item>
      <title>Learnable Mel-Frontend for Robust Underwater Acoustic Target Detection under Non-Target Interference</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-learnable-mel-frontend-for-robust-underwater/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-learnable-mel-frontend-for-robust-underwater/</guid>
      <description>Audio Classification | 6.5/10</description>
    </item>
    <item>
      <title>Learning Domain-Robust Bioacoustic Representations for Mosquito Species Classification with Contrastive Learning and Distribution Alignment</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-learning-domain-robust-bioacoustic/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-learning-domain-robust-bioacoustic/</guid>
      <description>Bioacoustics | 7.5/10</description>
    </item>
    <item>
      <title>Learning Linearity in Audio Consistency Autoencoders via Implicit Regularization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-learning-linearity-in-audio-consistency/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-learning-linearity-in-audio-consistency/</guid>
      <description>Audio Generation | 7.5/10</description>
    </item>
    <item>
      <title>Learning Piezoelectric Hysteresis in In-Ear MEMS Loudspeakers from Acoustic Measurements</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-learning-piezoelectric-hysteresis-in-in-ear-mems/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-learning-piezoelectric-hysteresis-in-in-ear-mems/</guid>
      <description>Audio Signal Processing | 7.0/10</description>
    </item>
    <item>
      <title>Learning to Align with Unbalanced Optimal Transport in Linguistic Knowledge Transfer for ASR</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-learning-to-align-with-unbalanced-optimal/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-learning-to-align-with-unbalanced-optimal/</guid>
      <description>Speech Recognition | 6.5/10</description>
    </item>
    <item>
      <title>Learning Vocal-Tract Area And Radiation With A Physics-Informed Webster Model</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-learning-vocal-tract-area-and-radiation-with-a/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-learning-vocal-tract-area-and-radiation-with-a/</guid>
      <description>Singing Voice Synthesis | 7.0/10</description>
    </item>
    <item>
      <title>Learning What to Hear: Boosting Sound-Source Association for Robust Audiovisual Instance Segmentation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-learning-what-to-hear-boosting-sound-source/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-learning-what-to-hear-boosting-sound-source/</guid>
      <description>Audio-Visual Instance Segmentation | 7.5/10</description>
    </item>
    <item>
      <title>LenslessMic: Audio Encryption and Authentication via Lensless Computational Imaging</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lenslessmic-audio-encryption-and-authentication/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lenslessmic-audio-encryption-and-authentication/</guid>
      <description>Audio Security | 7.5/10</description>
    </item>
    <item>
      <title>LESS: Large Language Model Enhanced Semi-Supervised Learning for Speech Foundational Models Using in-the-wild Data</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-less-large-language-model-enhanced-semi/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-less-large-language-model-enhanced-semi/</guid>
      <description>Speech Recognition, Speech Translation | 7.5/10</description>
    </item>
    <item>
      <title>LETPAV: Lexicon-Enhanced Text with Progressive Audio-Visual Fusion for Multimodal Sentiment Analysis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-letpav-lexicon-enhanced-text-with-progressive/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-letpav-lexicon-enhanced-text-with-progressive/</guid>
      <description>Speech Emotion Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Leveraging Audio-Visual Data to Reduce the Multilingual Gap in Self-Supervised Speech Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-leveraging-audio-visual-data-to-reduce-the/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-leveraging-audio-visual-data-to-reduce-the/</guid>
      <description>Speech Recognition | 6.0/10</description>
    </item>
    <item>
      <title>Leveraging Diffusion U-Net Features for Predominant Instrument Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-leveraging-diffusion-u-net-features-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-leveraging-diffusion-u-net-features-for/</guid>
      <description>Music Information Retrieval | 8.0/10</description>
    </item>
    <item>
      <title>Leveraging Large Multimodal Models for Audio-Video Deepfake Detection: A Pilot Study</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-leveraging-large-multimodal-models-for-audio/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-leveraging-large-multimodal-models-for-audio/</guid>
      <description>Audio Deepfake Detection | 7.0/10</description>
    </item>
    <item>
      <title>Leveraging Large Speech Language Models as Evaluators for Expressive Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-leveraging-large-speech-language-models-as/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-leveraging-large-speech-language-models-as/</guid>
      <description>Speech Emotion Recognition | 6.5/10</description>
    </item>
    <item>
      <title>Leveraging Multiple Speech Enhancers for Non-Intrusive Intelligibility Prediction for Hearing-Impaired Listeners</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-leveraging-multiple-speech-enhancers-for-non/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-leveraging-multiple-speech-enhancers-for-non/</guid>
      <description>Model Evaluation | 7.5/10</description>
    </item>
    <item>
      <title>Leveraging prediction entropy for Automatic prompt weighting in Zero-Shot Audio-Language Classification</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-leveraging-prediction-entropy-for-automatic/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-leveraging-prediction-entropy-for-automatic/</guid>
      <description>Audio Classification | 7.5/10</description>
    </item>
    <item>
      <title>Leveraging Segment-Level Speech Representations for LLM-Based Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-leveraging-segment-level-speech-representations/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-leveraging-segment-level-speech-representations/</guid>
      <description>Speech Recognition | 7.0/10</description>
    </item>
    <item>
      <title>Leveraging Text-to-Speech and Voice Conversion as Data Augmentation for Alzheimer&#39;s Disease Detection from Spontaneous Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-leveraging-text-to-speech-and-voice-conversion-as/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-leveraging-text-to-speech-and-voice-conversion-as/</guid>
      <description>Speech Biomarkers | 7.0/10</description>
    </item>
    <item>
      <title>Leveraging Whisper Embeddings For Audio-Based Lyrics Matching</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-leveraging-whisper-embeddings-for-audio-based/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-leveraging-whisper-embeddings-for-audio-based/</guid>
      <description>Music Information Retrieval | 7.0/10</description>
    </item>
    <item>
      <title>Lightweight and Generalizable Acoustic Scene Representations Via Contrastive Fine-Tuning and Distillation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lightweight-and-generalizable-acoustic-scene/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lightweight-and-generalizable-acoustic-scene/</guid>
      <description>Acoustic Scene Understanding | 8.0/10</description>
    </item>
    <item>
      <title>Lightweight and Perceptually-Guided Voice Conversion for Electro-Laryngeal Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lightweight-and-perceptually-guided-voice/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lightweight-and-perceptually-guided-voice/</guid>
      <description>Voice Conversion | 7.5/10</description>
    </item>
    <item>
      <title>Lightweight Implicit Neural Network for Binaural Audio Synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lightweight-implicit-neural-network-for-binaural/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lightweight-implicit-neural-network-for-binaural/</guid>
      <description>Spatial Audio | 7.0/10</description>
    </item>
    <item>
      <title>Lightweight Phoneme-Conditioned Bandwidth Extension for Body-Conducted Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lightweight-phoneme-conditioned-bandwidth/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lightweight-phoneme-conditioned-bandwidth/</guid>
      <description>Speech Enhancement | 7.5/10</description>
    </item>
    <item>
      <title>Lingometer: On-Device Personal Speech Word Counting System</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lingometer-on-device-personal-speech-word/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lingometer-on-device-personal-speech-word/</guid>
      <description>Voice Activity Detection | 8.0/10</description>
    </item>
    <item>
      <title>Linguard: Authenticating Speech Recordings Using Speech Recognition and Watermark</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-linguard-authenticating-speech-recordings-using/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-linguard-authenticating-speech-recordings-using/</guid>
      <description>Audio Security | 6.5/10</description>
    </item>
    <item>
      <title>LipsAM: Lipschitz-Continuous Amplitude Modifier for Audio Signal Processing and its Application to Plug-And-Play Dereverberation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lipsam-lipschitz-continuous-amplitude-modifier/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lipsam-lipschitz-continuous-amplitude-modifier/</guid>
      <description>Speech Enhancement | 7.5/10</description>
    </item>
    <item>
      <title>Lisa: Lightweight Yet Superb Neural Speech Coding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lisa-lightweight-yet-superb-neural-speech-coding/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lisa-lightweight-yet-superb-neural-speech-coding/</guid>
      <description>Speech Coding | 8.5/10</description>
    </item>
    <item>
      <title>Listen, But Don&#39;t Leak: Sensitive Data Protection for Privacy Aware Automatic Speech Recognition with Acoustic Triggers</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-listen-but-dont-leak-sensitive-data-protection/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-listen-but-dont-leak-sensitive-data-protection/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>LLAC: Learned Lossless Audio Codec</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-llac-learned-lossless-audio-codec/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-llac-learned-lossless-audio-codec/</guid>
      <description>Lossless Audio Coding | 7.5/10</description>
    </item>
    <item>
      <title>LLM-Based Post-ASR Error Correction for Disordered Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-llm-based-post-asr-error-correction-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-llm-based-post-asr-error-correction-for/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Localizing Speech Deepfakes Beyond Transitions via Segment-Aware Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-localizing-speech-deepfakes-beyond-transitions/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-localizing-speech-deepfakes-beyond-transitions/</guid>
      <description>Audio Deepfake Detection | 8.0/10</description>
    </item>
    <item>
      <title>LongSpeech: A Scalable Benchmark for Transcription, Translation and Understanding in Long Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-longspeech-a-scalable-benchmark-for-transcription/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-longspeech-a-scalable-benchmark-for-transcription/</guid>
      <description>Benchmarking | 7.8/10</description>
    </item>
    <item>
      <title>Look, Listen and Segment: Towards Weakly Supervised Audio-Visual Semantic Segmentation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-look-listen-and-segment-towards-weakly-supervised/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-look-listen-and-segment-towards-weakly-supervised/</guid>
      <description>Audio-Visual | 7.0/10</description>
    </item>
    <item>
      <title>Loose Coupling of Spectral and Spatial Models for Multi-Channel Diarization and Enhancement of Meetings in Dynamic Environments</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-loose-coupling-of-spectral-and-spatial-models-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-loose-coupling-of-spectral-and-spatial-models-for/</guid>
      <description>Speaker Diarization, Speech Separation | 7.2/10</description>
    </item>
    <item>
      <title>LOTUSDIS: A Thai Far-Field Meeting Corpus for Robust Conversational ASR</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lotusdis-a-thai-far-field-meeting-corpus-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lotusdis-a-thai-far-field-meeting-corpus-for/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Low-Bandwidth High-Fidelity Speech Transmission with Generative Latent Joint Source-Channel Coding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-low-bandwidth-high-fidelity-speech-transmission/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-low-bandwidth-high-fidelity-speech-transmission/</guid>
      <description>Speech Enhancement | 7.5/10</description>
    </item>
    <item>
      <title>Low-Frequency Harmonic Control for Speech Intelligibility in Open-Ear Headphones</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-low-frequency-harmonic-control-for-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-low-frequency-harmonic-control-for-speech/</guid>
      <description>Speech Enhancement | 6.5/10</description>
    </item>
    <item>
      <title>Low-Latency Audio Front-End Region-of-Interest Beamforming for Smart Glasses</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-low-latency-audio-front-end-region-of-interest/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-low-latency-audio-front-end-region-of-interest/</guid>
      <description>Speech Enhancement | 7.0/10</description>
    </item>
    <item>
      <title>Low-Resource Guidance for Controllable Latent Audio Diffusion</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-low-resource-guidance-for-controllable-latent/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-low-resource-guidance-for-controllable-latent/</guid>
      <description>Music Generation | 8.5/10</description>
    </item>
    <item>
      <title>Low-Resource Speech-Based Early Alzheimers Detection via Cross-Lingual and Few-Shot Transfer Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-low-resource-speech-based-early-alzheimers/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-low-resource-speech-based-early-alzheimers/</guid>
      <description>Speech Biomarkers | 7.5/10</description>
    </item>
    <item>
      <title>LP-CFM: Perceptual Invariance-Aware Conditional Flow Matching for Speech Modeling</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lp-cfm-perceptual-invariance-aware-conditional/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lp-cfm-perceptual-invariance-aware-conditional/</guid>
      <description>Speech Synthesis | 7.0/10</description>
    </item>
    <item>
      <title>MAG: Multi-Modal Aligned Autoregressive Co-Speech Gesture Generation Without Vector Quantization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mag-multi-modal-aligned-autoregressive-co-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mag-multi-modal-aligned-autoregressive-co-speech/</guid>
      <description>Audio Generation | 8.0/10</description>
    </item>
    <item>
      <title>MAGE: A Coarse-to-Fine Speech Enhancer with Masked Generative Model</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mage-a-coarse-to-fine-speech-enhancer-with-masked/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mage-a-coarse-to-fine-speech-enhancer-with-masked/</guid>
      <description>Speech Enhancement | 8.0/10</description>
    </item>
    <item>
      <title>Malefa: Multi-Granularity Learning and Effective False Alarm Suppression for Zero-Shot Keyword Spotting</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-malefa-multi-granularity-learning-and-effective/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-malefa-multi-granularity-learning-and-effective/</guid>
      <description>Zero-Shot Keyword Spotting | 7.5/10</description>
    </item>
    <item>
      <title>Mambaformer: State-Space Augmented Self-Attention with Downup Sampling for Monaural Speech Enhancement</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mambaformer-state-space-augmented-self-attention/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mambaformer-state-space-augmented-self-attention/</guid>
      <description>Speech Enhancement | 7.0/10</description>
    </item>
    <item>
      <title>Marco-Voice: A Unified Framework for Expressive Speech Synthesis with Voice Cloning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-marco-voice-a-unified-framework-for-expressive/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-marco-voice-a-unified-framework-for-expressive/</guid>
      <description>Speech Synthesis | 8.0/10</description>
    </item>
    <item>
      <title>MaskVCT: Masked Voice Codec Transformer for Zero-Shot Voice Conversion with Increased Controllability via Multiple Guidances</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-maskvct-masked-voice-codec-transformer-for-zero/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-maskvct-masked-voice-codec-transformer-for-zero/</guid>
      <description>Voice Conversion | 6.5/10</description>
    </item>
    <item>
      <title>Matching Reverberant Speech Through Learned Acoustic Embeddings</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-matching-reverberant-speech-through-learned/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-matching-reverberant-speech-through-learned/</guid>
      <description>Audio Generation | 8.0/10</description>
    </item>
    <item>
      <title>Matrix-Structured Hierarchical Convolutional Modeling for Pronunciation Assessment and Mispronunciation Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-matrix-structured-hierarchical-convolutional/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-matrix-structured-hierarchical-convolutional/</guid>
      <description>Speech Assessment | 8.0/10</description>
    </item>
    <item>
      <title>Maximum Likelihood Measurement Noise Estimation for Block-Time Domain Kalman Filters</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-maximum-likelihood-measurement-noise-estimation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-maximum-likelihood-measurement-noise-estimation/</guid>
      <description>Echo Cancellation | 7.0/10</description>
    </item>
    <item>
      <title>MC-MRX: Reference- and Midi-Guided Music Source Extraction with Contrastive Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mc-mrx-reference-and-midi-guided-music-source/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mc-mrx-reference-and-midi-guided-music-source/</guid>
      <description>Music Source Extraction | 7.0/10</description>
    </item>
    <item>
      <title>MCF: Text LLMS for Multimodal Emotional Causality</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mcf-text-llms-for-multimodal-emotional-causality/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mcf-text-llms-for-multimodal-emotional-causality/</guid>
      <description>Sentiment Analysis | 8.0/10</description>
    </item>
    <item>
      <title>MCI-OTFusion: A Multimodal Model for MCI Detection and Cognitive Score Prediction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mci-otfusion-a-multimodal-model-for-mci-detection/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mci-otfusion-a-multimodal-model-for-mci-detection/</guid>
      <description>Mild Cognitive Impairment Detection | 6.5/10</description>
    </item>
    <item>
      <title>Meanflow-Accelerated Multimodal Video-to-Audio Synthesis Via One-Step Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-meanflow-accelerated-multimodal-video-to-audio/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-meanflow-accelerated-multimodal-video-to-audio/</guid>
      <description>Audio Generation | 7.5/10</description>
    </item>
    <item>
      <title>MeanFlowSE: One-Step Generative Speech Enhancement via Conditional Mean Flow</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-meanflowse-one-step-generative-speech-enhancement/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-meanflowse-one-step-generative-speech-enhancement/</guid>
      <description>Speech Enhancement | 7.5/10</description>
    </item>
    <item>
      <title>MeanSE: Efficient Generative Speech Enhancement with Mean Flows</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-meanse-efficient-generative-speech-enhancement/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-meanse-efficient-generative-speech-enhancement/</guid>
      <description>Speech Enhancement | 6.5/10</description>
    </item>
    <item>
      <title>MeanVC: Lightweight and Streaming Zero-Shot Voice Conversion via Mean Flows</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-meanvc-lightweight-and-streaming-zero-shot-voice/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-meanvc-lightweight-and-streaming-zero-shot-voice/</guid>
      <description>Voice Conversion | 7.5/10</description>
    </item>
    <item>
      <title>MeanVoiceFlow: One-Step Nonparallel Voice Conversion with Mean Flows</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-meanvoiceflow-one-step-nonparallel-voice/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-meanvoiceflow-one-step-nonparallel-voice/</guid>
      <description>Voice Conversion | 7.0/10</description>
    </item>
    <item>
      <title>Measuring Prosody Diversity in Zero-Shot TTS: A New Metric, Benchmark, and Exploration</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-measuring-prosody-diversity-in-zero-shot-tts-a/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-measuring-prosody-diversity-in-zero-shot-tts-a/</guid>
      <description>Speech Synthesis | 8.0/10</description>
    </item>
    <item>
      <title>MECap-R1: Emotion-Aware Policy with Reinforcement Learning for Multimodal Emotion Captioning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mecap-r1-emotion-aware-policy-with-reinforcement/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mecap-r1-emotion-aware-policy-with-reinforcement/</guid>
      <description>Speech Emotion Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Medical ASR Enhancement by Domain-Specific Reinforcement Fine-Tuning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-medical-asr-enhancement-by-domain-specific/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-medical-asr-enhancement-by-domain-specific/</guid>
      <description>Speech Recognition | 6.5/10</description>
    </item>
    <item>
      <title>MELA-TTS: Joint Transformer-Diffusion Model with Representation Alignment for Speech Synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mela-tts-joint-transformer-diffusion-model-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mela-tts-joint-transformer-diffusion-model-with/</guid>
      <description>Speech Synthesis | 7.0/10</description>
    </item>
    <item>
      <title>Melos: Sentence-To-Section Training with Multi-Task Learning for LLM-Driven Song Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-melos-sentence-to-section-training-with-multi/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-melos-sentence-to-section-training-with-multi/</guid>
      <description>Music Generation | 6.5/10</description>
    </item>
    <item>
      <title>Membership Inference Attack against Music Diffusion Models via Generative Manifold Perturbation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-membership-inference-attack-against-music/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-membership-inference-attack-against-music/</guid>
      <description>Audio Security | 7.5/10</description>
    </item>
    <item>
      <title>MFF-RVRDI: Multimodal Fusion Framework for Robust Video Recording Device Identification</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mff-rvrdi-multimodal-fusion-framework-for-robust/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mff-rvrdi-multimodal-fusion-framework-for-robust/</guid>
      <description>Video Device Identification | 7.5/10</description>
    </item>
    <item>
      <title>MI-Fuse: Label Fusion for Unsupervised Domain Adaptation with Closed-Source Large Audio-Language Model</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mi-fuse-label-fusion-for-unsupervised-domain/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mi-fuse-label-fusion-for-unsupervised-domain/</guid>
      <description>Speech Emotion Recognition | 8.0/10</description>
    </item>
    <item>
      <title>Microphone-Less Measurement of Three-Dimensional Radiating Impulse Response of Sound Source using Spherical Harmonic-Domain Acousto-Optic Tomography</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-microphone-less-measurement-of-three-dimensional/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-microphone-less-measurement-of-three-dimensional/</guid>
      <description>Sound Source Localization | 7.0/10</description>
    </item>
    <item>
      <title>MIDI-LLaMA: An Instruction-Following Multimodal LLM for Symbolic Music Understanding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-midi-llama-an-instruction-following-multimodal/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-midi-llama-an-instruction-following-multimodal/</guid>
      <description>Music Understanding | 7.5/10</description>
    </item>
    <item>
      <title>Mind the Shift: Using Delta SSL Embeddings to Enhance Child ASR</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mind-the-shift-using-delta-ssl-embeddings-to/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mind-the-shift-using-delta-ssl-embeddings-to/</guid>
      <description>Speech Recognition | 7.0/10</description>
    </item>
    <item>
      <title>Mind Your [m]S, Cross Your [t]S: a Large-Scale Phonetic Analysis of Speech Reproduction in Modern Speech Generators</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mind-your-ms-cross-your-ts-a-large-scale-phonetic/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mind-your-ms-cross-your-ts-a-large-scale-phonetic/</guid>
      <description>Speech Deepfake Detection | 7.0/10</description>
    </item>
    <item>
      <title>MirrorTalk: Forging Personalized Avatars Via Disentangled Style and Hierarchical Motion Control</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mirrortalk-forging-personalized-avatars-via/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mirrortalk-forging-personalized-avatars-via/</guid>
      <description>Speech Synthesis | 7.0/10</description>
    </item>
    <item>
      <title>Mispronunciation Detection and Diagnosis Without Model Training: A Retrieval-Based Approach</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mispronunciation-detection-and-diagnosis-without/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mispronunciation-detection-and-diagnosis-without/</guid>
      <description>语音评估 | 8.0/10</description>
    </item>
    <item>
      <title>Mitigating Attention Sinks and Massive Activations in Audio-Visual Speech Recognition with LLMs</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mitigating-attention-sinks-and-massive/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mitigating-attention-sinks-and-massive/</guid>
      <description>语音识别 | 7.0/10</description>
    </item>
    <item>
      <title>Mitigating Data Replication in Text-to-Audio Generative Diffusion Models Through Anti-Memorization Guidance</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mitigating-data-replication-in-text-to-audio/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mitigating-data-replication-in-text-to-audio/</guid>
      <description>音频生成 | 7.5/10</description>
    </item>
    <item>
      <title>Mitigating Intra-Speaker Variability in Diarization with Style-Controllable Speech Augmentation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mitigating-intra-speaker-variability-in/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mitigating-intra-speaker-variability-in/</guid>
      <description>说话人日志 | 7.0/10</description>
    </item>
    <item>
      <title>Mitigating Language Prior-Induced Hallucinations via Bi-Level Contrastive Decoding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mitigating-language-prior-induced-hallucinations/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mitigating-language-prior-induced-hallucinations/</guid>
      <description>多模态模型 | 7.5/10</description>
    </item>
    <item>
      <title>Mitigating Shared-Private Branch Imbalance via Dual-Branch Rebalancing for Multimodal Sentiment Analysis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mitigating-shared-private-branch-imbalance-via/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mitigating-shared-private-branch-imbalance-via/</guid>
      <description>多模态模型 | 7.5/10</description>
    </item>
    <item>
      <title>Mix2Morph: Learning Sound Morphing from Noisy Mixes</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mix2morph-learning-sound-morphing-from-noisy-mixes/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mix2morph-learning-sound-morphing-from-noisy-mixes/</guid>
      <description>音频生成 | 7.5/10</description>
    </item>
    <item>
      <title>MixGAN-based Non-blind Bandwidth Extension for Audio Codec</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mixgan-based-non-blind-bandwidth-extension-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mixgan-based-non-blind-bandwidth-extension-for/</guid>
      <description>音频增强 | 8.0/10</description>
    </item>
    <item>
      <title>Mixture of Experts for Recognizing Depression from Interview and Reading Tasks</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mixture-of-experts-for-recognizing-depression/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mixture-of-experts-for-recognizing-depression/</guid>
      <description>语音生物标志物 | 6.0/10</description>
    </item>
    <item>
      <title>Mixture To Beamformed Mixture: Leveraging Beamformed Mixture As Weak-Supervision for Speech Enhancement and Noise-Robust ASR</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mixture-to-beamformed-mixture-leveraging/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mixture-to-beamformed-mixture-leveraging/</guid>
      <description>语音增强 | 8.0/10</description>
    </item>
    <item>
      <title>Mixture-of-Experts Based Soft-Label Learning for Multi-Label Speech Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mixture-of-experts-based-soft-label-learning-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mixture-of-experts-based-soft-label-learning-for/</guid>
      <description>语音情感识别 | 7.5/10</description>
    </item>
    <item>
      <title>Mixture-of-Experts Framework for Field-of-View Enhanced Signal-Dependent Binauralization of Moving Talkers</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mixture-of-experts-framework-for-field-of-view/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mixture-of-experts-framework-for-field-of-view/</guid>
      <description>空间音频 | 6.5/10</description>
    </item>
    <item>
      <title>Mixtures of Lightweight Articulatory Experts for Multilingual ASR</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mixtures-of-lightweight-articulatory-experts-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mixtures-of-lightweight-articulatory-experts-for/</guid>
      <description>语音识别 | 7.0/10</description>
    </item>
    <item>
      <title>ML-SAN: Multi-Level Speaker-Adaptive Network for Emotion Recognition in Conversations</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ml-san-multi-level-speaker-adaptive-network-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ml-san-multi-level-speaker-adaptive-network-for/</guid>
      <description>语音情感识别 | 8.0/10</description>
    </item>
    <item>
      <title>MMAudioSep: Taming Video-to-Audio Generative Model Towards Video/Text-Queried Sound Separation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mmaudiosep-taming-video-to-audio-generative-model/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mmaudiosep-taming-video-to-audio-generative-model/</guid>
      <description>语音分离 | 8.0/10</description>
    </item>
    <item>
      <title>MMEB-V3: Measuring the Performance Gaps of Omni-Modality Embedding Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mmeb-v3-measuring-the-performance-gaps-of-omni/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mmeb-v3-measuring-the-performance-gaps-of-omni/</guid>
      <description>基准测试 | 7.5/10</description>
    </item>
    <item>
      <title>MNV-17: A High-Quality Performative Mandarin Dataset for Nonverbal Vocalization Recognition in Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mnv-17-a-high-quality-performative-mandarin/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mnv-17-a-high-quality-performative-mandarin/</guid>
      <description>语音识别 | 7.5/10</description>
    </item>
    <item>
      <title>Modeling Both Intra- And Inter-Utterance Variability for Conversational Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-modeling-both-intra-and-inter-utterance/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-modeling-both-intra-and-inter-utterance/</guid>
      <description>语音情感识别 | 6.5/10</description>
    </item>
    <item>
      <title>Modeling Inter-Segment Relationships in Speech for Dementia Detection with Audio Spectrogram Transformers and Graph Attention Networks</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-modeling-inter-segment-relationships-in-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-modeling-inter-segment-relationships-in-speech/</guid>
      <description>语音生物标志物 | 7.0/10</description>
    </item>
    <item>
      <title>Modeling Strategies For Speech Enhancement in The Latent Space of a Neural Audio Codec</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-modeling-strategies-for-speech-enhancement-in-the/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-modeling-strategies-for-speech-enhancement-in-the/</guid>
      <description>语音增强 | 8.0/10</description>
    </item>
    <item>
      <title>Monitoring exposure-length variations in submarine power cables using distributed fiber-optic sensing</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-monitoring-exposure-length-variations-in/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-monitoring-exposure-length-variations-in/</guid>
      <description>音频事件检测 | 6.5/10</description>
    </item>
    <item>
      <title>More Than a Shortcut: A Hyperbolic Approach to Early-Exit Networks</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-more-than-a-shortcut-a-hyperbolic-approach-to/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-more-than-a-shortcut-a-hyperbolic-approach-to/</guid>
      <description>音频事件检测 | 8.0/10</description>
    </item>
    <item>
      <title>Motionbeat: Motion-Aligned Music Representation via Embodied Contrastive Learning and Bar-Equivariant Contact-Aware Encoding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-motionbeat-motion-aligned-music-representation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-motionbeat-motion-aligned-music-representation/</guid>
      <description>舞蹈生成 | 7.5/10</description>
    </item>
    <item>
      <title>MR-FlowDPO: Multi-Reward Direct Preference Optimization for Flow-Matching Text-to-Music Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mr-flowdpo-multi-reward-direct-preference/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mr-flowdpo-multi-reward-direct-preference/</guid>
      <description>音乐生成 | 7.5/10</description>
    </item>
    <item>
      <title>MSANET: Multi-Scale Semantic Aggregation Network for Brain-Assisted Speech Enhancement in Multi-Speaker Conditions</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-msanet-multi-scale-semantic-aggregation-network/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-msanet-multi-scale-semantic-aggregation-network/</guid>
      <description>语音增强 | 7.5/10</description>
    </item>
    <item>
      <title>MSCT: Differential Cross-Modal Attention for Deepfake Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-msct-differential-cross-modal-attention-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-msct-differential-cross-modal-attention-for/</guid>
      <description>音频深度伪造检测 | 6.5/10</description>
    </item>
    <item>
      <title>MSF-SER: Enriching Acoustic Modeling with Multi-Granularity Semantics for Speech Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-msf-ser-enriching-acoustic-modeling-with-multi/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-msf-ser-enriching-acoustic-modeling-with-multi/</guid>
      <description>语音情感识别 | 7.5/10</description>
    </item>
    <item>
      <title>MT-HuBERT: Self-Supervised Mix-Training for Few-Shot Keyword Spotting in Mixed Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mt-hubert-self-supervised-mix-training-for-few/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mt-hubert-self-supervised-mix-training-for-few/</guid>
      <description>关键词检测 | 7.0/10</description>
    </item>
    <item>
      <title>MTP-S2UT: Enhancing Speech-to-Speech Translation Quality with Multi-Token Prediction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mtp-s2ut-enhancing-speech-to-speech-translation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mtp-s2ut-enhancing-speech-to-speech-translation/</guid>
      <description>语音翻译 | 8.5/10</description>
    </item>
    <item>
      <title>Multi-Channel Speech Enhancement for Cocktail Party Speech Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multi-channel-speech-enhancement-for-cocktail/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multi-channel-speech-enhancement-for-cocktail/</guid>
      <description>语音情感识别 | 7.5/10</description>
    </item>
    <item>
      <title>Multi-Layer Attentive Probing Improves Transfer of Audio Representations for Bioacoustics</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multi-layer-attentive-probing-improves-transfer/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multi-layer-attentive-probing-improves-transfer/</guid>
      <description>生物声学 | 7.5/10</description>
    </item>
    <item>
      <title>Multi-Scale Physiologically-Motivated Alignment for Auditory Attention Decoding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multi-scale-physiologically-motivated-alignment/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multi-scale-physiologically-motivated-alignment/</guid>
      <description>听觉注意力解码 | 7.5/10</description>
    </item>
    <item>
      <title>Multi-Task Learning For Speech Quality Assessment Using ASR-Derived Entropy Features</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multi-task-learning-for-speech-quality-assessment/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multi-task-learning-for-speech-quality-assessment/</guid>
      <description>语音质量评估 | 7.5/10</description>
    </item>
    <item>
      <title>Multi-Task Transformer for Explainable Speech Deepfake Detection via Formant Modeling</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multi-task-transformer-for-explainable-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multi-task-transformer-for-explainable-speech/</guid>
      <description>语音伪造检测 | 7.5/10</description>
    </item>
    <item>
      <title>Multi-View Hierarchical Hypergraph Neural Network for Automatic Stuttering Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multi-view-hierarchical-hypergraph-neural-network/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multi-view-hierarchical-hypergraph-neural-network/</guid>
      <description>语音生物标志物 | 7.5/10</description>
    </item>
    <item>
      <title>Multilingual Supervised Pretraining with LM-Assisted Decoding for Visual Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multilingual-supervised-pretraining-with-lm/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multilingual-supervised-pretraining-with-lm/</guid>
      <description>语音识别 | 6.5/10</description>
    </item>
    <item>
      <title>Multimodal Co-Training with Subtractive Unlabeled-Benefit Bounds</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multimodal-co-training-with-subtractive-unlabeled/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multimodal-co-training-with-subtractive-unlabeled/</guid>
      <description>多模态学习 | 6.0/10</description>
    </item>
    <item>
      <title>Multimodal Fusion-Based IPCLIP Network for Mixed Reality Surgical Assistance</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multimodal-fusion-based-ipclip-network-for-mixed/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multimodal-fusion-based-ipclip-network-for-mixed/</guid>
      <description>多模态模型 | 6.5/10</description>
    </item>
    <item>
      <title>Multimodal LLMs as Expert Speech Annotators: Acoustic Macro-Descriptors for Parkinson&#39;s Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multimodal-llms-as-expert-speech-annotators/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multimodal-llms-as-expert-speech-annotators/</guid>
      <description>语音生物标志物 | 6.5/10</description>
    </item>
    <item>
      <title>Multimodal Room Impulse Response Generation Through Latent Rectified Flow Matching</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multimodal-room-impulse-response-generation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multimodal-room-impulse-response-generation/</guid>
      <description>音频生成 | 7.5/10</description>
    </item>
    <item>
      <title>Multimodal Self-Attention Network with Temporal Alignment for Audio-Visual Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multimodal-self-attention-network-with-temporal/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multimodal-self-attention-network-with-temporal/</guid>
      <description>语音情感识别 | 8.0/10</description>
    </item>
    <item>
      <title>Multimodal Transformer with Multiperspective Training for Predicting Self-Expression Skills from Video Interview</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multimodal-transformer-with-multiperspective/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multimodal-transformer-with-multiperspective/</guid>
      <description>多模态模型 | 7.0/10</description>
    </item>
    <item>
      <title>Multimodal Variational Graph Network for Multimodal Sentiment Analysis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multimodal-variational-graph-network-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multimodal-variational-graph-network-for/</guid>
      <description>语音情感识别 | 7.5/10</description>
    </item>
    <item>
      <title>MuseTok: Symbolic Music Tokenization for Generation and Semantic Understanding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-musetok-symbolic-music-tokenization-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-musetok-symbolic-music-tokenization-for/</guid>
      <description>音乐生成 | 8.5/10</description>
    </item>
    <item>
      <title>MusicDETR: A Position-Aware Spectral Note Detection Model for Singing Transcription</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-musicdetr-a-position-aware-spectral-note/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-musicdetr-a-position-aware-spectral-note/</guid>
      <description>歌唱语音转录 | 8.5/10</description>
    </item>
    <item>
      <title>MusiCRS: Benchmarking Audio-Centric Conversational Recommendation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-musicrs-benchmarking-audio-centric-conversational/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-musicrs-benchmarking-audio-centric-conversational/</guid>
      <description>音乐推荐 | 7.5/10</description>
    </item>
    <item>
      <title>Mutual Forcing: Dual-Mode Self-Evolution for Fast Autoregressive Audio-Video Character Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mutual-forcing-dual-mode-self-evolution-for-fast/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mutual-forcing-dual-mode-self-evolution-for-fast/</guid>
      <description>音频生成 | 7.5/10</description>
    </item>
    <item>
      <title>Natural Language to Spatial Audio Parameters: Lightweight Deterministic Rendering for Creative Authoring</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-natural-language-to-spatial-audio-parameters/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-natural-language-to-spatial-audio-parameters/</guid>
      <description>空间音频 | 7.5/10</description>
    </item>
    <item>
      <title>NCF-TTS: Enhancing Flow Matching Based Text-To-Speech with Neighborhood Consistency Flow</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ncf-tts-enhancing-flow-matching-based-text-to/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ncf-tts-enhancing-flow-matching-based-text-to/</guid>
      <description>语音合成 | 8.0/10</description>
    </item>
    <item>
      <title>Nemotron 3 Nano Omni: Efficient and Open Multimodal Intelligence</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-nemotron-3-nano-omni-efficient-and-open/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-nemotron-3-nano-omni-efficient-and-open/</guid>
      <description>多模态模型 | 8.5/10</description>
    </item>
    <item>
      <title>Neural Network-Based Time-Frequency-Bin-Wise Linear Combination of Beamformers for Underdetermined Target Source Extraction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-neural-network-based-time-frequency-bin-wise/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-neural-network-based-time-frequency-bin-wise/</guid>
      <description>语音分离 | 7.0/10</description>
    </item>
    <item>
      <title>Neuromamba: Adaptive Frequency Filtering with a Pyramid Mamba for sEEG-driven Speech Synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-neuromamba-adaptive-frequency-filtering-with-a/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-neuromamba-adaptive-frequency-filtering-with-a/</guid>
      <description>语音合成 | 8.0/10</description>
    </item>
    <item>
      <title>NeuroSIFT: A Biologically-Inspired Framework with Explicit Signal-Noise Separation for Robust Multimodal Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-neurosift-a-biologically-inspired-framework-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-neurosift-a-biologically-inspired-framework-with/</guid>
      <description>多模态情感识别 | 8.0/10</description>
    </item>
    <item>
      <title>nGPT as a Scalable Architecture for Speech Recognition and Translation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ngpt-as-a-scalable-architecture-for-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ngpt-as-a-scalable-architecture-for-speech/</guid>
      <description>语音识别 | 7.5/10</description>
    </item>
    <item>
      <title>No Verifiable Reward for Prosody: Toward Preference-Guided Prosody Learning in TTS</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-no-verifiable-reward-for-prosody-toward/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-no-verifiable-reward-for-prosody-toward/</guid>
      <description>语音合成 | 8.0/10</description>
    </item>
    <item>
      <title>Noise-Robust AV-ASR Using Visual Features both in the Whisper Encoder and Decoder</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-noise-robust-av-asr-using-visual-features-both-in/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-noise-robust-av-asr-using-visual-features-both-in/</guid>
      <description>语音识别 | 8.0/10</description>
    </item>
    <item>
      <title>Noise-Robust Contrastive Learning with an MFCC-Conformer for Coronary Artery Disease Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-noise-robust-contrastive-learning-with-an-mfcc/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-noise-robust-contrastive-learning-with-an-mfcc/</guid>
      <description>音频分类 | 7.0/10</description>
    </item>
    <item>
      <title>Noise-to-Notes: Diffusion-Based Generation and Refinement for Automatic Drum Transcription</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-noise-to-notes-diffusion-based-generation-and/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-noise-to-notes-diffusion-based-generation-and/</guid>
      <description>音乐信息检索 | 8.0/10</description>
    </item>
    <item>
      <title>Non-Line-of-Sight Vehicle Detection via Audio-Visual Fusion</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-non-line-of-sight-vehicle-detection-via-audio/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-non-line-of-sight-vehicle-detection-via-audio/</guid>
      <description>音频分类 | 8.0/10</description>
    </item>
    <item>
      <title>Obstructive Sleep Apnea Endotype Prediction During Wakefulness Using Voice Biomarkers</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-obstructive-sleep-apnea-endotype-prediction/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-obstructive-sleep-apnea-endotype-prediction/</guid>
      <description>语音生物标志物 | 6.5/10</description>
    </item>
    <item>
      <title>Off-The-Grid Multi-Pitch Estimation Using Optimal Transport</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-off-the-grid-multi-pitch-estimation-using-optimal/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-off-the-grid-multi-pitch-estimation-using-optimal/</guid>
      <description>音乐信息检索 | 7.5/10</description>
    </item>
    <item>
      <title>OMNI-AVSR: Towards Unified Multimodal Speech Recognition With Large Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-omni-avsr-towards-unified-multimodal-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-omni-avsr-towards-unified-multimodal-speech/</guid>
      <description>语音识别 | 8.5/10</description>
    </item>
    <item>
      <title>On deepfake voice detection - It’s all in the presentation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-on-deepfake-voice-detection-its-all-in-the/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-on-deepfake-voice-detection-its-all-in-the/</guid>
      <description>音频深度伪造检测 | 8.0/10</description>
    </item>
    <item>
      <title>On The Design of Efficient Neural Methods for Geometry-Agnostic Multichannel Speech Enhancement</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-on-the-design-of-efficient-neural-methods-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-on-the-design-of-efficient-neural-methods-for/</guid>
      <description>语音增强 | 6.5/10</description>
    </item>
    <item>
      <title>On the Design of Higher-Order Time-Intensity Microphone Arrays for Panoramic Audio Recording and Reproduction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-on-the-design-of-higher-order-time-intensity/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-on-the-design-of-higher-order-time-intensity/</guid>
      <description>Spatial Audio | 7.0/10</description>
    </item>
    <item>
      <title>One Model–Three Tasks: Discovering a Shared Winning Ticket for Low-Complexity Audio Intelligence</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-one-modelthree-tasks-discovering-a-shared-winning/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-one-modelthree-tasks-discovering-a-shared-winning/</guid>
      <description>Audio Classification | 7.5/10</description>
    </item>
    <item>
      <title>Online Register For Dual-Mode Self-Supervised Speech Models: Mitigating the Lack of Future Context</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-online-register-for-dual-mode-self-supervised/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-online-register-for-dual-mode-self-supervised/</guid>
      <description>Speech Recognition | 6.5/10</description>
    </item>
    <item>
      <title>Optimizing Domain-Adaptive Self-Supervised Learning for Clinical Voice-Based Disease Classification</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-optimizing-domain-adaptive-self-supervised/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-optimizing-domain-adaptive-self-supervised/</guid>
      <description>Speech Biomarkers | 7.0/10</description>
    </item>
    <item>
      <title>Optimizing Speech Language Models for Acoustic Consistency</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-optimizing-speech-language-models-for-acoustic/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-optimizing-speech-language-models-for-acoustic/</guid>
      <description>Speech Synthesis | 8.0/10</description>
    </item>
    <item>
      <title>OV-INSTRUCTTTS: Towards Open-Vocabulary Instruct Text-to-Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ov-instructtts-towards-open-vocabulary-instruct/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ov-instructtts-towards-open-vocabulary-instruct/</guid>
      <description>Speech Synthesis | 8.0/10</description>
    </item>
    <item>
      <title>PAC: Pronunciation-Aware Contextualized Large Language Model-Based Automatic Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-pac-pronunciation-aware-contextualized-large/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-pac-pronunciation-aware-contextualized-large/</guid>
      <description>Speech Recognition | 7.0/10</description>
    </item>
    <item>
      <title>PADAM: Perceptual Audio Defect Assessment Model</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-padam-perceptual-audio-defect-assessment-model/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-padam-perceptual-audio-defect-assessment-model/</guid>
      <description>Audio Classification | 7.0/10</description>
    </item>
    <item>
      <title>ParaGSE: Parallel Generative Speech Enhancement with Group-Vector-Quantization-Based Neural Speech Codec</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-paragse-parallel-generative-speech-enhancement/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-paragse-parallel-generative-speech-enhancement/</guid>
      <description>Speech Enhancement | 7.5/10</description>
    </item>
    <item>
      <title>Parametric Neural Amp Modeling with Active Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-parametric-neural-amp-modeling-with-active/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-parametric-neural-amp-modeling-with-active/</guid>
      <description>Audio Generation | 8.0/10</description>
    </item>
    <item>
      <title>PC-MCL: Patient-Consistent Multi-Cycle Learning with Multi-Label Bias Correction for Respiratory Sound Classification</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-pc-mcl-patient-consistent-multi-cycle-learning/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-pc-mcl-patient-consistent-multi-cycle-learning/</guid>
      <description>Audio Classification | 7.5/10</description>
    </item>
    <item>
      <title>Peeking Into the Future for Contextual Biasing</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-peeking-into-the-future-for-contextual-biasing/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-peeking-into-the-future-for-contextual-biasing/</guid>
      <description>Speech Recognition | 7.0/10</description>
    </item>
    <item>
      <title>Perceptual Loss Optimized HRTF Personalization in Spherical Harmonic Domain</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-perceptual-loss-optimized-hrtf-personalization-in/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-perceptual-loss-optimized-hrtf-personalization-in/</guid>
      <description>Spatial Audio | 7.0/10</description>
    </item>
    <item>
      <title>Perceptual Quality Assessment for Stylized Talking Heads</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-perceptual-quality-assessment-for-stylized/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-perceptual-quality-assessment-for-stylized/</guid>
      <description>Model Evaluation | 7.5/10</description>
    </item>
    <item>
      <title>PerformSinger: Multimodal Singing Voice Synthesis Leveraging Synchronized Lip Cues from Singing Performance Videos</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-performsinger-multimodal-singing-voice-synthesis/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-performsinger-multimodal-singing-voice-synthesis/</guid>
      <description>Singing Voice Synthesis | 4.5/10</description>
    </item>
    <item>
      <title>Personal Sound Zones with Flexible Bright Zone Control</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-personal-sound-zones-with-flexible-bright-zone/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-personal-sound-zones-with-flexible-bright-zone/</guid>
      <description>Spatial Audio | 7.5/10</description>
    </item>
    <item>
      <title>PersonaPlex: Voice and Role Control for Full Duplex Conversational Speech Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-personaplex-voice-and-role-control-for-full/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-personaplex-voice-and-role-control-for-full/</guid>
      <description>Spoken Dialogue Systems | 8.5/10</description>
    </item>
    <item>
      <title>PFluxTTS: Hybrid Flow-Matching TTS with Robust Cross-Lingual Voice Cloning and Inference-Time Model Fusion</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-pfluxtts-hybrid-flow-matching-tts-with-robust/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-pfluxtts-hybrid-flow-matching-tts-with-robust/</guid>
      <description>Speech Synthesis | 7.0/10</description>
    </item>
    <item>
      <title>PG-SE: Predictive Acceleration and Correction for Generative Speech Enhancement</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-pg-se-predictive-acceleration-and-correction-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-pg-se-predictive-acceleration-and-correction-for/</guid>
      <description>Speech Enhancement | 7.5/10</description>
    </item>
    <item>
      <title>Phase-Retrieval-Based Physics-Informed Neural Networks For Acoustic Magnitude Field Reconstruction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-phase-retrieval-based-physics-informed-neural/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-phase-retrieval-based-physics-informed-neural/</guid>
      <description>Sound Source Localization | 7.0/10</description>
    </item>
    <item>
      <title>Phase-Space Signal Processing of Acoustic Data for Advanced Manufacturing In-Situ Monitoring</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-phase-space-signal-processing-of-acoustic-data/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-phase-space-signal-processing-of-acoustic-data/</guid>
      <description>Audio Event Detection | 7.0/10</description>
    </item>
    <item>
      <title>PhoenixDSR: Phoneme-Guided and LLM-Enhanced Dysarthric Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-phoenixdsr-phoneme-guided-and-llm-enhanced/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-phoenixdsr-phoneme-guided-and-llm-enhanced/</guid>
      <description>Speech Recognition | 7.0/10</description>
    </item>
    <item>
      <title>Phoneme-Level Visual Speech Recognition via Point-Visual Fusion and Language Model Reconstruction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-phoneme-level-visual-speech-recognition-via-point/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-phoneme-level-visual-speech-recognition-via-point/</guid>
      <description>Visual Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Phonological Tokenizer: Prosody-Aware Phonetic Token Via Multi-Objective Fine-Tuning with Differentiable K-Means</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-phonological-tokenizer-prosody-aware-phonetic/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-phonological-tokenizer-prosody-aware-phonetic/</guid>
      <description>Speech Representation Learning | 8.0/10</description>
    </item>
    <item>
      <title>Phrased: Phrase Dictionary Biasing for Speech Translation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-phrased-phrase-dictionary-biasing-for-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-phrased-phrase-dictionary-biasing-for-speech/</guid>
      <description>Speech Translation | 7.5/10</description>
    </item>
    <item>
      <title>Physics-Informed Neural Networks for Ocean Acoustic Field Reconstruction and Source Localization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-physics-informed-neural-networks-for-ocean/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-physics-informed-neural-networks-for-ocean/</guid>
      <description>Sound Source Localization | 7.5/10</description>
    </item>
    <item>
      <title>Pianoroll-Event: A Novel Score Representation for Symbolic Music</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-pianoroll-event-a-novel-score-representation-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-pianoroll-event-a-novel-score-representation-for/</guid>
      <description>Music Generation | 6.5/10</description>
    </item>
    <item>
      <title>PICOAUDIO2: Temporal Controllable Text-to-Audio Generation with Natural Language Description</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-picoaudio2-temporal-controllable-text-to-audio/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-picoaudio2-temporal-controllable-text-to-audio/</guid>
      <description>Audio Generation | 7.5/10</description>
    </item>
    <item>
      <title>Plug-and-Play Emotion Graphs for Compositional Prompting in Zero-Shot Speech Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-plug-and-play-emotion-graphs-for-compositional/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-plug-and-play-emotion-graphs-for-compositional/</guid>
      <description>Speech Emotion Recognition | 7.0/10</description>
    </item>
    <item>
      <title>Poly-SVC: Polyphony-Aware Singing Voice Conversion with Harmonic Modeling</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-poly-svc-polyphony-aware-singing-voice-conversion/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-poly-svc-polyphony-aware-singing-voice-conversion/</guid>
      <description>Singing Voice Conversion | 6.5/10</description>
    </item>
    <item>
      <title>Polynomial Mixing for Efficient Self-Supervised Speech Encoders</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-polynomial-mixing-for-efficient-self-supervised/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-polynomial-mixing-for-efficient-self-supervised/</guid>
      <description>Speech Recognition | 8.0/10</description>
    </item>
    <item>
      <title>Position-Invariant Fine-Tuning Of Speech Enhancement Models With Self-Supervised Speech Representations</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-position-invariant-fine-tuning-of-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-position-invariant-fine-tuning-of-speech/</guid>
      <description>Speech Enhancement | 6.5/10</description>
    </item>
    <item>
      <title>Praxy Voice: Voice-Prompt Recovery &#43; BUPS for Commercial-Class Indic TTS from a Frozen Non-Indic Base at Zero Commercial-Training-Data Cost</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-praxy-voice-voice-prompt-recovery-bups-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-praxy-voice-voice-prompt-recovery-bups-for/</guid>
      <description>Speech Synthesis | 8.0/10</description>
    </item>
    <item>
      <title>Principled Coarse-Grained Acceptance For Speculative Decoding In Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-principled-coarse-grained-acceptance-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-principled-coarse-grained-acceptance-for/</guid>
      <description>Speech Synthesis | 7.5/10</description>
    </item>
    <item>
      <title>PRoADS: Provably Secure And Robust Audio Diffusion Steganography With Latent Optimization And Backward Euler Inversion</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-proads-provably-secure-and-robust-audio-diffusion/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-proads-provably-secure-and-robust-audio-diffusion/</guid>
      <description>Audio Security | 6.5/10</description>
    </item>
    <item>
      <title>Probing the Hidden Talent of ASR foundation models for L2 English Oral Assessment</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-probing-the-hidden-talent-of-asr-foundation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-probing-the-hidden-talent-of-asr-foundation/</guid>
      <description>Pre-training | 7.5/10</description>
    </item>
    <item>
      <title>Probing Whisper for Dysarthric Speech in Detection and Assessment</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-probing-whisper-for-dysarthric-speech-in/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-probing-whisper-for-dysarthric-speech-in/</guid>
      <description>Speech Biomarkers | 6.5/10</description>
    </item>
    <item>
      <title>Production-Scale Dynamic Vocabulary ASR Biasing with Word-Level FST and Robust Training</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-production-scale-dynamic-vocabulary-asr-biasing/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-production-scale-dynamic-vocabulary-asr-biasing/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Proficiency-Aware Adaptation and Data Augmentation for Robust L2 ASR</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-proficiency-aware-adaptation-and-data/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-proficiency-aware-adaptation-and-data/</guid>
      <description>Speech Recognition | 6.5/10</description>
    </item>
    <item>
      <title>Prompt-Guided Mixture-of-Experts for Robust Multimodal Sentiment Analysis with Missing Modalities</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-prompt-guided-mixture-of-experts-for-robust/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-prompt-guided-mixture-of-experts-for-robust/</guid>
      <description>Speech Emotion Recognition | 8.5/10</description>
    </item>
    <item>
      <title>PromptSep: Generative Audio Separation Via Multimodal Prompting</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-promptsep-generative-audio-separation-via/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-promptsep-generative-audio-separation-via/</guid>
      <description>Speech Separation | 7.5/10</description>
    </item>
    <item>
      <title>Prosody-Guided Harmonic Attention for Phase-Coherent Neural Vocoding in the Complex Spectrum</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-prosody-guided-harmonic-attention-for-phase/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-prosody-guided-harmonic-attention-for-phase/</guid>
      <description>Speech Synthesis | 8.0/10</description>
    </item>
    <item>
      <title>PROST-LLM: Progressively Enhancing the Speech-to-Speech Translation Capability in LLMs</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-prost-llm-progressively-enhancing-the-speech-to/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-prost-llm-progressively-enhancing-the-speech-to/</guid>
      <description>Speech Translation | 7.5/10</description>
    </item>
    <item>
      <title>Prototype-Guided Cross-Modal Contrastive Learning for Continual Audio-Visual Sound Separation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-prototype-guided-cross-modal-contrastive-learning/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-prototype-guided-cross-modal-contrastive-learning/</guid>
      <description>Speech Separation | 7.5/10</description>
    </item>
    <item>
      <title>PRSA: Preventing Malicious Speaker Recognition and Speech Synthesis Simultaneously with Adversarial Examples</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-prsa-preventing-malicious-speaker-recognition-and/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-prsa-preventing-malicious-speaker-recognition-and/</guid>
      <description>Speech Anonymization | 7.0/10</description>
    </item>
    <item>
      <title>PSP: An Interpretable Per-Dimension Accent Benchmark for Indic Text-to-Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-psp-an-interpretable-per-dimension-accent/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-psp-an-interpretable-per-dimension-accent/</guid>
      <description>Benchmarking | 7.5/10</description>
    </item>
    <item>
      <title>PSTalker: Realistic 3D Talking Head Synthesis via a Semantic-Aware Audio-Driven Point-Based Shape</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-pstalker-realistic-3d-talking-head-synthesis-via/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-pstalker-realistic-3d-talking-head-synthesis-via/</guid>
      <description>Talking Head Synthesis | 7.5/10</description>
    </item>
    <item>
      <title>Purification Before Fusion: Toward Mask-Free Speech Enhancement for Robust Audio-Visual Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-purification-before-fusion-toward-mask-free/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-purification-before-fusion-toward-mask-free/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Qastanet: A DNN-Based Quality Metric for Spatial Audio</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-qastanet-a-dnn-based-quality-metric-for-spatial/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-qastanet-a-dnn-based-quality-metric-for-spatial/</guid>
      <description>Spatial Audio | 7.5/10</description>
    </item>
    <item>
      <title>QE-XVC: Zero-Shot Cross-Lingual Voice Conversion via Query-Enhancement and Conditional Flow Matching</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-qe-xvc-zero-shot-cross-lingual-voice-conversion/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-qe-xvc-zero-shot-cross-lingual-voice-conversion/</guid>
      <description>Voice Conversion | 7.5/10</description>
    </item>
    <item>
      <title>QFOCUS: Controllable Synthesis for Automated Speech Stress Editing to Deliver Human-Like Emphatic Intent</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-qfocus-controllable-synthesis-for-automated/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-qfocus-controllable-synthesis-for-automated/</guid>
      <description>Speech Synthesis | 7.5/10</description>
    </item>
    <item>
      <title>Quality Assessment of Noisy and Enhanced Speech with Limited Data: UWB-NTIS System for Voicemos 2024</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-quality-assessment-of-noisy-and-enhanced-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-quality-assessment-of-noisy-and-enhanced-speech/</guid>
      <description>Speech Quality Assessment | 7.0/10</description>
    </item>
    <item>
      <title>Quantifying Speaker Embedding Phonological Rule Interactions in Accented Speech Synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-quantifying-speaker-embedding-phonological-rule/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-quantifying-speaker-embedding-phonological-rule/</guid>
      <description>Speech Synthesis | 7.0/10</description>
    </item>
    <item>
      <title>Random Matrix-Driven Graph Representation Learning For Bioacoustic Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-random-matrix-driven-graph-representation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-random-matrix-driven-graph-representation/</guid>
      <description>Bioacoustics | 7.5/10</description>
    </item>
    <item>
      <title>Ranking The Impact of Contextual Specialization in Neural Speech Enhancement</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ranking-the-impact-of-contextual-specialization/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ranking-the-impact-of-contextual-specialization/</guid>
      <description>Speech Enhancement | 7.5/10</description>
    </item>
    <item>
      <title>RAP: Real-Time Audio-Driven Portrait Animation with Video Diffusion Transformer</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-rap-real-time-audio-driven-portrait-animation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-rap-real-time-audio-driven-portrait-animation/</guid>
      <description>Audio-Visual | 7.0/10</description>
    </item>
    <item>
      <title>RAS: a Reliability Oriented Metric for Automatic Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ras-a-reliability-oriented-metric-for-automatic/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ras-a-reliability-oriented-metric-for-automatic/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>RASD-SR: A Robust Anomalous Sound Detection Framework with Score Recalibration</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-rasd-sr-a-robust-anomalous-sound-detection/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-rasd-sr-a-robust-anomalous-sound-detection/</guid>
      <description>Anomalous Sound Detection | 8.5/10</description>
    </item>
    <item>
      <title>Rationale-Guided Learning for Multimodal Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-rationale-guided-learning-for-multimodal-emotion/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-rationale-guided-learning-for-multimodal-emotion/</guid>
      <description>Speech Emotion Recognition | 7.0/10</description>
    </item>
    <item>
      <title>RCAL: Reinforced Cross-Modal Alignment for Multimodal Sentiment Analysis with Sparse Visual Frames</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-rcal-reinforced-cross-modal-alignment-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-rcal-reinforced-cross-modal-alignment-for/</guid>
      <description>Multimodal Models | 8.5/10</description>
    </item>
    <item>
      <title>Reading Between the Waves: Robust Topic Segmentation Using Inter-Sentence Audio Features</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-reading-between-the-waves-robust-topic/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-reading-between-the-waves-robust-topic/</guid>
      <description>Audio Classification | 7.0/10</description>
    </item>
    <item>
      <title>Real-Time Streaming MEL Vocoding with Generative Flow Matching</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-real-time-streaming-mel-vocoding-with-generative/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-real-time-streaming-mel-vocoding-with-generative/</guid>
      <description>Speech Synthesis | 7.5/10</description>
    </item>
    <item>
      <title>Reasoning Driven Captions to Assist Noise Robust Speech Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-reasoning-driven-captions-to-assist-noise-robust/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-reasoning-driven-captions-to-assist-noise-robust/</guid>
      <description>Speech Emotion Recognition | 7.0/10</description>
    </item>
    <item>
      <title>ReCoM: Realistic Co-Speech Motion Generation with Recurrent Embedded Transformer</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-recom-realistic-co-speech-motion-generation-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-recom-realistic-co-speech-motion-generation-with/</guid>
      <description>Audio Generation | 7.0/10</description>
    </item>
    <item>
      <title>Reconstruction of Spherical Sound Source Radiation Characteristics with Graph Signal Processing</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-reconstruction-of-spherical-sound-source/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-reconstruction-of-spherical-sound-source/</guid>
      <description>空间音频 | 7.5/10</description>
    </item>
    <item>
      <title>Recovering Performance in Speech Emotion Recognition from Discrete Tokens Via Multi-Layer Fusion and Paralinguistic Feature Integration</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-recovering-performance-in-speech-emotion/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-recovering-performance-in-speech-emotion/</guid>
      <description>语音情感识别 | 6.5/10</description>
    </item>
    <item>
      <title>Reducing Prompt Sensitivity in LLM-Based Speech Recognition Through Learnable Projection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-reducing-prompt-sensitivity-in-llm-based-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-reducing-prompt-sensitivity-in-llm-based-speech/</guid>
      <description>语音识别 | 7.0/10</description>
    </item>
    <item>
      <title>Reference Microphone Selection for Guided Source Separation Based on The Normalized L-P Norm</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-reference-microphone-selection-for-guided-source/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-reference-microphone-selection-for-guided-source/</guid>
      <description>语音增强 | 7.0/10</description>
    </item>
    <item>
      <title>Reference-Aware SFM Layers for Intrusive Intelligibility Prediction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-reference-aware-sfm-layers-for-intrusive/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-reference-aware-sfm-layers-for-intrusive/</guid>
      <description>语音评估 | 7.5/10</description>
    </item>
    <item>
      <title>Refgen: Reference-Guided Synthetic Data Generation for Anomalous Sound Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-refgen-reference-guided-synthetic-data-generation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-refgen-reference-guided-synthetic-data-generation/</guid>
      <description>音频事件检测 | 7.5/10</description>
    </item>
    <item>
      <title>Regularized Inverse Filter Design for Rigid Spherical Microphone Array Processing: Laplace- And Time-Domain Representations</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-regularized-inverse-filter-design-for-rigid/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-regularized-inverse-filter-design-for-rigid/</guid>
      <description>空间音频 | 8.0/10</description>
    </item>
    <item>
      <title>Relative Time Intervals Representation For Word-Level Timestamping With Masked Training</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-relative-time-intervals-representation-for-word/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-relative-time-intervals-representation-for-word/</guid>
      <description>语音识别 | 8.0/10</description>
    </item>
    <item>
      <title>Reliable AI via Age-Balanced Validation: Fair Model Selection for Parkinson’s Detection from Voice</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-reliable-ai-via-age-balanced-validation-fair/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-reliable-ai-via-age-balanced-validation-fair/</guid>
      <description>语音生物标志物 | 7.5/10</description>
    </item>
    <item>
      <title>Representation-Based Data Quality Audits for Audio</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-representation-based-data-quality-audits-for-audio/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-representation-based-data-quality-audits-for-audio/</guid>
      <description>数据集 | 7.5/10</description>
    </item>
    <item>
      <title>Representation-Diverse Self-Supervision for Cross-Domain Bioacoustic Learning in Low-Resource Settings</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-representation-diverse-self-supervision-for-cross/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-representation-diverse-self-supervision-for-cross/</guid>
      <description>生物声学 | 7.0/10</description>
    </item>
    <item>
      <title>Residual Tokens Enhance Masked Autoencoders for Speech Modeling</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-residual-tokens-enhance-masked-autoencoders-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-residual-tokens-enhance-masked-autoencoders-for/</guid>
      <description>语音合成 | 7.0/10</description>
    </item>
    <item>
      <title>Respire-Mamba C-UNet: Consistency-Trained Autoencoder for High-Fidelity Respiratory Sound Compression</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-respire-mamba-c-unet-consistency-trained/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-respire-mamba-c-unet-consistency-trained/</guid>
      <description>音频压缩 | 7.0/10</description>
    </item>
    <item>
      <title>Rethinking Entity Disambiguation in Complex Modalities</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-rethinking-entity-disambiguation-in-complex/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-rethinking-entity-disambiguation-in-complex/</guid>
      <description>实体消歧 | 8.0/10</description>
    </item>
    <item>
      <title>Rethinking Music Captioning with Music Metadata LLMs</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-rethinking-music-captioning-with-music-metadata/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-rethinking-music-captioning-with-music-metadata/</guid>
      <description>音乐理解 | 7.0/10</description>
    </item>
    <item>
      <title>Retrieval-Based Speculative Decoding For Autoregressive Speech Synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-retrieval-based-speculative-decoding-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-retrieval-based-speculative-decoding-for/</guid>
      <description>语音合成 | 7.0/10</description>
    </item>
    <item>
      <title>Revisiting Direct Speech-to-Text Translation with Speech LLMs: Better Scaling than CoT Prompting?</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-revisiting-direct-speech-to-text-translation-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-revisiting-direct-speech-to-text-translation-with/</guid>
      <description>语音翻译 | 7.5/10</description>
    </item>
    <item>
      <title>RFM-Editing: Rectified Flow Matching for Text-Guided Audio Editing</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-rfm-editing-rectified-flow-matching-for-text/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-rfm-editing-rectified-flow-matching-for-text/</guid>
      <description>音频编辑 | 7.5/10</description>
    </item>
    <item>
      <title>RHO-PERFECT: Correlation Ceiling for Subjective Evaluation Datasets</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-rho-perfect-correlation-ceiling-for-subjective/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-rho-perfect-correlation-ceiling-for-subjective/</guid>
      <description>模型评估 | 7.5/10</description>
    </item>
    <item>
      <title>RIR-Former: Coordinate-Guided Transformer for Continuous Reconstruction of Room Impulse Responses</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-rir-former-coordinate-guided-transformer-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-rir-former-coordinate-guided-transformer-for/</guid>
      <description>房间脉冲响应 | 7.0/10</description>
    </item>
    <item>
      <title>RLBR: Reinforcement Learning with Biasing Rewards for Contextual Speech Large Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-rlbr-reinforcement-learning-with-biasing-rewards/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-rlbr-reinforcement-learning-with-biasing-rewards/</guid>
      <description>语音识别 | 8.0/10</description>
    </item>
    <item>
      <title>RMODGDF: A Robust STFT-Derived Feature for Musical Instrument Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-rmodgdf-a-robust-stft-derived-feature-for-musical/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-rmodgdf-a-robust-stft-derived-feature-for-musical/</guid>
      <description>音乐信息检索 | 7.0/10</description>
    </item>
    <item>
      <title>Robust Accent Identification via Voice Conversion and Non-Timbral Embeddings</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-robust-accent-identification-via-voice-conversion/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-robust-accent-identification-via-voice-conversion/</guid>
      <description>语音识别 | 7.5/10</description>
    </item>
    <item>
      <title>Robust and Lightweight F0 Estimation Through Mid-Level Fusion of DSP-Informed Features</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-robust-and-lightweight-f0-estimation-through-mid/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-robust-and-lightweight-f0-estimation-through-mid/</guid>
      <description>基频估计 | 8.0/10</description>
    </item>
    <item>
      <title>Robust Deepfake Audio Detection via Multi-Level Intermediate Feature Fusion</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-robust-deepfake-audio-detection-via-multi-level/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-robust-deepfake-audio-detection-via-multi-level/</guid>
      <description>音频深度伪造检测 | 7.5/10</description>
    </item>
    <item>
      <title>Robust Online Overdetermined Independent Vector Analysis Based on Bilinear Decomposition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-robust-online-overdetermined-independent-vector/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-robust-online-overdetermined-independent-vector/</guid>
      <description>语音分离 | 7.0/10</description>
    </item>
    <item>
      <title>RoCo: Robust Code for Fast and Effective Proactive Defense against Voice Cloning Attack</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-roco-robust-code-for-fast-and-effective-proactive/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-roco-robust-code-for-fast-and-effective-proactive/</guid>
      <description>音频安全 | 7.5/10</description>
    </item>
    <item>
      <title>RRPO: Robust Reward Policy Optimization for LLM-Based Emotional TTS</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-rrpo-robust-reward-policy-optimization-for-llm/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-rrpo-robust-reward-policy-optimization-for-llm/</guid>
      <description>语音合成 | 7.5/10</description>
    </item>
    <item>
      <title>S-PRESSO: Ultra Low Bitrate Sound Effect Compression with Diffusion Autoencoders and Offline Quantization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-s-presso-ultra-low-bitrate-sound-effect/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-s-presso-ultra-low-bitrate-sound-effect/</guid>
      <description>音频生成 | 7.5/10</description>
    </item>
    <item>
      <title>S-SONDO: Self-Supervised Knowledge Distillation for General Audio Foundation Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-s-sondo-self-supervised-knowledge-distillation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-s-sondo-self-supervised-knowledge-distillation/</guid>
      <description>音频分类 | 7.0/10</description>
    </item>
    <item>
      <title>S2Voice: Style-Aware Autoregressive Modeling with Enhanced Conditioning for Singing Style Conversion</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-s2voice-style-aware-autoregressive-modeling-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-s2voice-style-aware-autoregressive-modeling-with/</guid>
      <description>歌唱语音转换 | 7.0/10</description>
    </item>
    <item>
      <title>SA-SSL-MOS: Self-Supervised Learning MOS Prediction with Spectral Augmentation for Generalized Multi-Rate Speech Assessment</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sa-ssl-mos-self-supervised-learning-mos/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sa-ssl-mos-self-supervised-learning-mos/</guid>
      <description>语音质量评估 | 7.0/10</description>
    </item>
    <item>
      <title>SAASDNet: An EEG-Based Streaming Auditory Attention Switch Decoding Network for Self-Initiated Attention Switching in Mixed Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-saasdnet-an-eeg-based-streaming-auditory/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-saasdnet-an-eeg-based-streaming-auditory/</guid>
      <description>脑机接口 | 8.0/10</description>
    </item>
    <item>
      <title>SAGA-SR: Semantically and Acoustically Guided Audio Super-Resolution</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-saga-sr-semantically-and-acoustically-guided/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-saga-sr-semantically-and-acoustically-guided/</guid>
      <description>音频增强 | 7.5/10</description>
    </item>
    <item>
      <title>Salad-VAE: Semantic Audio Compression with Language-Audio Distillation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-salad-vae-semantic-audio-compression-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-salad-vae-semantic-audio-compression-with/</guid>
      <description>音频压缩 | 7.5/10</description>
    </item>
    <item>
      <title>Sampling-Rate-Agnostic Speech Super-Resolution Based on Gaussian Process Dynamical Systems with Deep Kernel Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sampling-rate-agnostic-speech-super-resolution/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sampling-rate-agnostic-speech-super-resolution/</guid>
      <description>语音增强 | 6.5/10</description>
    </item>
    <item>
      <title>SAUNA: Song-Level Audio &amp; User-Listening Data Neural Alignment</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sauna-song-level-audio-user-listening-data-neural/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sauna-song-level-audio-user-listening-data-neural/</guid>
      <description>音乐信息检索 | 7.0/10</description>
    </item>
    <item>
      <title>SAVGBench: Benchmarking Spatially Aligned Audio-Video Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-savgbench-benchmarking-spatially-aligned-audio/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-savgbench-benchmarking-spatially-aligned-audio/</guid>
      <description>基准测试 | 7.5/10</description>
    </item>
    <item>
      <title>Scalable Evaluation for Audio Identification Via Synthetic Latent Fingerprint Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-scalable-evaluation-for-audio-identification-via/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-scalable-evaluation-for-audio-identification-via/</guid>
      <description>音频检索 | 7.0/10</description>
    </item>
    <item>
      <title>Scaling Ambiguity: Augmenting Human Annotation in Speech Emotion Recognition with Audio-Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-scaling-ambiguity-augmenting-human-annotation-in/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-scaling-ambiguity-augmenting-human-annotation-in/</guid>
      <description>语音情感识别 | 6.5/10</description>
    </item>
    <item>
      <title>Scaling Multi-Talker ASR with Speaker-Agnostic Activity Streams</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-scaling-multi-talker-asr-with-speaker-agnostic/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-scaling-multi-talker-asr-with-speaker-agnostic/</guid>
      <description>语音识别 | 8.5/10</description>
    </item>
    <item>
      <title>Scaling Spoken Language Models with Syllabic Speech Tokenization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-scaling-spoken-language-models-with-syllabic/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-scaling-spoken-language-models-with-syllabic/</guid>
      <description>语音理解 | 7.0/10</description>
    </item>
    <item>
      <title>SceneRAG: Scene-Level Retrieval-Augmented Generation for Video Understanding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-scenerag-scene-level-retrieval-augmented/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-scenerag-scene-level-retrieval-augmented/</guid>
      <description>视频理解 | 7.5/10</description>
    </item>
    <item>
      <title>SE-DiCoW: Self-Enrolled Diarization-Conditioned Whisper</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-se-dicow-self-enrolled-diarization-conditioned/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-se-dicow-self-enrolled-diarization-conditioned/</guid>
      <description>语音识别 | 8.5/10</description>
    </item>
    <item>
      <title>Secondary Source Placement for Sound Field Control Based on Ising Model</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-secondary-source-placement-for-sound-field/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-secondary-source-placement-for-sound-field/</guid>
      <description>空间音频 | 6.0/10</description>
    </item>
    <item>
      <title>SED: Structural Entropy Based Speech Discretization for Discrete Token-Based ASR</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sed-structural-entropy-based-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sed-structural-entropy-based-speech/</guid>
      <description>语音识别 | 6.5/10</description>
    </item>
    <item>
      <title>Segmentwise Pruning in Audio-Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-segmentwise-pruning-in-audio-language-models/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-segmentwise-pruning-in-audio-language-models/</guid>
      <description>音频问答 | 7.0/10</description>
    </item>
    <item>
      <title>SELD-MOHA: A Fine-Tuning Method with the Mixture of Heterogeneous Adapters for Sound Event Localization and Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-seld-moha-a-fine-tuning-method-with-the-mixture/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-seld-moha-a-fine-tuning-method-with-the-mixture/</guid>
      <description>音频事件检测 | 7.0/10</description>
    </item>
    <item>
      <title>Selective Hub Fusion with Modality-Heterogeneous Experts for Multimodal Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-selective-hub-fusion-with-modality-heterogeneous/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-selective-hub-fusion-with-modality-heterogeneous/</guid>
      <description>多模态模型 | 6.5/10</description>
    </item>
    <item>
      <title>Self-Supervised Note Tracking and Multi-Pitch Estimation Via Reconstruction-Based Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-self-supervised-note-tracking-and-multi-pitch/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-self-supervised-note-tracking-and-multi-pitch/</guid>
      <description>多音高估计 音符跟踪 | 8.5/10</description>
    </item>
    <item>
      <title>Semantic Anchor Transfer from Short to Long Speech in a Distillation-Based Summarization Framework</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-semantic-anchor-transfer-from-short-to-long/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-semantic-anchor-transfer-from-short-to-long/</guid>
      <description>语音摘要 | 7.5/10</description>
    </item>
    <item>
      <title>Semantic-Guided Pseudo-Feature Attention Network for Audio-Visual Zero-Shot Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-semantic-guided-pseudo-feature-attention-network/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-semantic-guided-pseudo-feature-attention-network/</guid>
      <description>音频分类 零样本学习 | 7.0/10</description>
    </item>
    <item>
      <title>SEP-ST: Incorporating Speech Entity Prompt Into Large Language Models for Speech Translation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sep-st-incorporating-speech-entity-prompt-into/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sep-st-incorporating-speech-entity-prompt-into/</guid>
      <description>语音翻译 | 7.5/10</description>
    </item>
    <item>
      <title>Separate This, and All of These Things Around It: Music Source Separation Via Hyperellipsoidal Queries</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-separate-this-and-all-of-these-things-around-it/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-separate-this-and-all-of-these-things-around-it/</guid>
      <description>音乐分离 | 7.0/10</description>
    </item>
    <item>
      <title>Sequence-Level Unsupervised Training in Speech Recognition: A Theoretical Study</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sequence-level-unsupervised-training-in-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sequence-level-unsupervised-training-in-speech/</guid>
      <description>语音识别 | 6.5/10</description>
    </item>
    <item>
      <title>Sequential and Simultaneous Optimization of Microphone Array Geometry and Region-of-Interest Beamforming</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sequential-and-simultaneous-optimization-of/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sequential-and-simultaneous-optimization-of/</guid>
      <description>声源定位 | 7.5/10</description>
    </item>
    <item>
      <title>Session-Level Spoken Language Assessment with A Multimodal Foundation Model Via Multi-Target Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-session-level-spoken-language-assessment-with-a/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-session-level-spoken-language-assessment-with-a/</guid>
      <description>语音评估 | 7.5/10</description>
    </item>
    <item>
      <title>SFM-TTS: Lightweight and Rapid Speech Synthesis with Flexible Shortcut Flow Matching</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sfm-tts-lightweight-and-rapid-speech-synthesis/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sfm-tts-lightweight-and-rapid-speech-synthesis/</guid>
      <description>语音合成 | 7.0/10</description>
    </item>
    <item>
      <title>Shared Representation Learning for Reference-Guided Targeted Sound Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-shared-representation-learning-for-reference/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-shared-representation-learning-for-reference/</guid>
      <description>音频事件检测 | 8.5/10</description>
    </item>
    <item>
      <title>Shortcut Flow Matching for Speech Enhancement: Step-Invariant Flows via Single Stage Training</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-shortcut-flow-matching-for-speech-enhancement/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-shortcut-flow-matching-for-speech-enhancement/</guid>
      <description>语音增强 | 7.0/10</description>
    </item>
    <item>
      <title>Sidon: Fast and Robust Open-Source Multilingual Speech Restoration for Large-Scale Dataset Cleansing</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sidon-fast-and-robust-open-source-multilingual/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sidon-fast-and-robust-open-source-multilingual/</guid>
      <description>语音增强 | 8.5/10</description>
    </item>
    <item>
      <title>SightSound-R1: Cross-Modal Reasoning Distillation from Vision to Audio Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sightsound-r1-cross-modal-reasoning-distillation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sightsound-r1-cross-modal-reasoning-distillation/</guid>
      <description>音频问答 | 7.5/10</description>
    </item>
    <item>
      <title>Sing What You Fit: A Perception-Based Dataset and Benchmark for Vocal-Song Suitability Analysis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sing-what-you-fit-a-perception-based-dataset-and/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sing-what-you-fit-a-perception-based-dataset-and/</guid>
      <description>音乐信息检索 | 7.0/10</description>
    </item>
    <item>
      <title>Sing2Song: An Accompaniment Generation System Based on Solo Singing</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sing2song-an-accompaniment-generation-system/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sing2song-an-accompaniment-generation-system/</guid>
      <description>音乐生成 | 7.5/10</description>
    </item>
    <item>
      <title>Single-Microphone Audio Point Source Discriminative Localization from Reverberation Late Tail Estimation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-single-microphone-audio-point-source/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-single-microphone-audio-point-source/</guid>
      <description>说话人分离 | 7.0/10</description>
    </item>
    <item>
      <title>Single-Step Controllable Music Bandwidth Extension with Flow Matching</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-single-step-controllable-music-bandwidth/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-single-step-controllable-music-bandwidth/</guid>
      <description>音乐信息检索 | 7.0/10</description>
    </item>
    <item>
      <title>SingMOS-Pro: A Comprehensive Benchmark for Singing Quality Assessment</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-singmos-pro-an-comprehensive-benchmark-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-singmos-pro-an-comprehensive-benchmark-for/</guid>
      <description>歌唱语音合成 | 7.5/10</description>
    </item>
    <item>
      <title>SIREN: Spatially-Informed Reconstruction of Binaural Audio with Vision</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-siren-spatially-informed-reconstruction-of/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-siren-spatially-informed-reconstruction-of/</guid>
      <description>空间音频 | 7.0/10</description>
    </item>
    <item>
      <title>SIRUP: A Diffusion-Based Virtual Upmixer of Steering Vectors for Highly-Directive Spatialization with First-Order Ambisonics</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sirup-a-diffusion-based-virtual-upmixer-of/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sirup-a-diffusion-based-virtual-upmixer-of/</guid>
      <description>声源定位 | 7.0/10</description>
    </item>
    <item>
      <title>SLAP: Scalable Language-Audio Pretraining with Variable-Duration Audio and Multi-Objective Training</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-slap-scalable-language-audio-pretraining-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-slap-scalable-language-audio-pretraining-with/</guid>
      <description>音频检索 | 8.0/10</description>
    </item>
    <item>
      <title>SLM-SS: Speech Language Model for Generative Speech Separation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-slm-ss-speech-language-model-for-generative/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-slm-ss-speech-language-model-for-generative/</guid>
      <description>语音分离 | 7.5/10</description>
    </item>
    <item>
      <title>SLM-TTA: A Framework for Test-Time Adaptation of Generative Spoken Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-slm-tta-a-framework-for-test-time-adaptation-of/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-slm-tta-a-framework-for-test-time-adaptation-of/</guid>
      <description>语音识别 | 7.0/10</description>
    </item>
    <item>
      <title>Slot Filling as a Reasoning Task for SpeechLLMs</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-slot-filling-as-a-reasoning-task-for-speechllms/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-slot-filling-as-a-reasoning-task-for-speechllms/</guid>
      <description>槽填充 | 6.5/10</description>
    </item>
    <item>
      <title>SmoothCLAP: Soft-Target Enhanced Contrastive Language-Audio Pretraining for Affective Computing</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-smoothclap-soft-target-enhanced-contrastive/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-smoothclap-soft-target-enhanced-contrastive/</guid>
      <description>语音情感识别 | 6.5/10</description>
    </item>
    <item>
      <title>Snore Sound Classification Based on Physiological Features and Adaptive Loss Function</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-snore-sound-classification-based-on-physiological/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-snore-sound-classification-based-on-physiological/</guid>
      <description>音频分类 | 6.5/10</description>
    </item>
    <item>
      <title>Solving the Helmholtz Equation Via Physics-Informed Neural Networks with an Adaptive Weighting Strategy</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-solving-the-helmholtz-equation-via-physics/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-solving-the-helmholtz-equation-via-physics/</guid>
      <description>声学建模 | 6.5/10</description>
    </item>
    <item>
      <title>SONAR: Self-Distilled Continual Pre-Training for Domain Adaptive Audio Representation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sonar-self-distilled-continual-pre-training-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sonar-self-distilled-continual-pre-training-for/</guid>
      <description>音频事件检测 | 7.0/10</description>
    </item>
    <item>
      <title>SoundCompass: Navigating Target Sound Extraction with Effective Directional Clue Integration in Complex Acoustic Scenes</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-soundcompass-navigating-target-sound-extraction/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-soundcompass-navigating-target-sound-extraction/</guid>
      <description>语音分离 | 7.5/10</description>
    </item>
    <item>
      <title>Sounding Highlights: Dual-Pathway Audio Encoders for Audio-Visual Video Highlight Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sounding-highlights-dual-pathway-audio-encoders/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sounding-highlights-dual-pathway-audio-encoders/</guid>
      <description>视频高光检测 | 8.5/10</description>
    </item>
    <item>
      <title>Sounds that Shape: Audio-Driven 3D Mesh Generation with Attribute-Decoupled Score Distillation Sampling</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sounds-that-shape-audio-driven-3d-mesh-generation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sounds-that-shape-audio-driven-3d-mesh-generation/</guid>
      <description>音频生成 | 7.0/10</description>
    </item>
    <item>
      <title>Source Separation For A Cappella Music</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-source-separation-for-a-cappella-music/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-source-separation-for-a-cappella-music/</guid>
      <description>语音分离 | 6.5/10</description>
    </item>
    <item>
      <title>SP-MCQA: Evaluating Intelligibility of TTS Beyond the Word Level</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sp-mcqa-evaluating-intelligibility-of-tts-beyond/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sp-mcqa-evaluating-intelligibility-of-tts-beyond/</guid>
      <description>语音合成 | 7.0/10</description>
    </item>
    <item>
      <title>SPADE: Structured Pruning and Adaptive Distillation for Efficient LLM-TTS</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-spade-structured-pruning-and-adaptive/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-spade-structured-pruning-and-adaptive/</guid>
      <description>语音合成 | 7.5/10</description>
    </item>
    <item>
      <title>SPAM: Style Prompt Adherence Metric for Prompt-Based TTS</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-spam-style-prompt-adherence-metric-for-prompt/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-spam-style-prompt-adherence-metric-for-prompt/</guid>
      <description>语音合成 | 7.0/10</description>
    </item>
    <item>
      <title>Sparse Autoencoders Make Audio Foundation Models More Explainable</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sparse-autoencoders-make-audio-foundation-models/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sparse-autoencoders-make-audio-foundation-models/</guid>
      <description>模型评估 | 6.5/10</description>
    </item>
    <item>
      <title>Sparse-View Visual-Acoustic Latent Learning for Novel-View Audio Synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sparse-view-visual-acoustic-latent-learning-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sparse-view-visual-acoustic-latent-learning-for/</guid>
      <description>空间音频 | 7.5/10</description>
    </item>
    <item>
      <title>Spatial Covariance Matrix Reconstruction for Speech Enhancement in Reverberant Multi-Source Environments</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-spatial-covariance-matrix-reconstruction-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-spatial-covariance-matrix-reconstruction-for/</guid>
      <description>语音增强 | 7.5/10</description>
    </item>
    <item>
      <title>Spatial-CLAP: Learning Spatially-Aware Audio–Text Embeddings for Multi-Source Conditions</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-spatial-clap-learning-spatially-aware-audiotext/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-spatial-clap-learning-spatially-aware-audiotext/</guid>
      <description>空间音频 | 8.5/10</description>
    </item>
    <item>
      <title>Spatially Aware Self-Supervised Models for Multi-Channel Neural Speaker Diarization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-spatially-aware-self-supervised-models-for-multi/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-spatially-aware-self-supervised-models-for-multi/</guid>
      <description>说话人分离 | 8.0/10</description>
    </item>
    <item>
      <title>SpatialNet-Echo: Real-Time Acoustic Echo Cancellation via Integrated Narrow-Band and Cross-Band Processing</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-spatialnet-echo-real-time-acoustic-echo/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-spatialnet-echo-real-time-acoustic-echo/</guid>
      <description>语音增强 | 7.5/10</description>
    </item>
    <item>
      <title>Speaker Anonymisation for Speech-Based Suicide Risk Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-speaker-anonymisation-for-speech-based-suicide/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-speaker-anonymisation-for-speech-based-suicide/</guid>
      <description>语音匿名化 | 7.5/10</description>
    </item>
    <item>
      <title>Speaking Clearly: A Simplified Whisper-Based Codec for Low-Bitrate Speech Coding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-speaking-clearly-a-simplified-whisper-based-codec/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-speaking-clearly-a-simplified-whisper-based-codec/</guid>
      <description>语音编码 | 7.5/10</description>
    </item>
    <item>
      <title>Spectral or Spatial? Leveraging Both for Speaker Extraction in Challenging Data Conditions</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-spectral-or-spatial-leveraging-both-for-speaker/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-spectral-or-spatial-leveraging-both-for-speaker/</guid>
      <description>语音分离 | 7.0/10</description>
    </item>
    <item>
      <title>Spectrogram Event Based Feature Representation for Generalizable Automatic Music Transcription</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-spectrogram-event-based-feature-representation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-spectrogram-event-based-feature-representation/</guid>
      <description>音乐信息检索 | 7.5/10</description>
    </item>
    <item>
      <title>Speech Emotion Recognition based on Hierarchical Transformer with Shifted Windows</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-speech-emotion-recognition-based-on-hierarchical/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-speech-emotion-recognition-based-on-hierarchical/</guid>
      <description>语音情感识别 | 8.0/10</description>
    </item>
    <item>
      <title>Speech Quality-Based Localization of Low-Quality Speech and Text-to-Speech Synthesis Artefacts</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-speech-quality-based-localization-of-low-quality/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-speech-quality-based-localization-of-low-quality/</guid>
      <description>语音质量评估 | 7.0/10</description>
    </item>
    <item>
      <title>SpeechCT-CLIP: Distilling Text-Image Knowledge to Speech for Voice-Native Multimodal CT Analysis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-speechct-clip-distilling-text-image-knowledge-to/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-speechct-clip-distilling-text-image-knowledge-to/</guid>
      <description>医疗AI | 7.5/10</description>
    </item>
    <item>
      <title>SpeechMapper: Speech-To-Text Embedding Projector for LLMs</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-speechmapper-speech-to-text-embedding-projector/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-speechmapper-speech-to-text-embedding-projector/</guid>
      <description>语音大模型 | 7.0/10</description>
    </item>
    <item>
      <title>Spike-Driven Low-Power Speech Bandwidth Extension</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-spike-driven-low-power-speech-bandwidth-extension/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-spike-driven-low-power-speech-bandwidth-extension/</guid>
      <description>语音增强 | 8.0/10</description>
    </item>
    <item>
      <title>Spiking Attention Network: A Hybrid Neuromorphic Approach to Underwater Acoustic Localization and Zero-Shot Adaptation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-spiking-attention-network-a-hybrid-neuromorphic/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-spiking-attention-network-a-hybrid-neuromorphic/</guid>
      <description>声源定位 | 7.0/10</description>
    </item>
    <item>
      <title>Spiking Temporal-Enhanced Network for Zero-Shot Audio-Visual Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-spiking-temporal-enhanced-network-for-zero-shot/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-spiking-temporal-enhanced-network-for-zero-shot/</guid>
      <description>音频分类 | 7.0/10</description>
    </item>
    <item>
      <title>Spring Reverb Emulation with Hybrid Gated Convolutional Networks and State Space Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-spring-reverb-emulation-with-hybrid-gated/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-spring-reverb-emulation-with-hybrid-gated/</guid>
      <description>音频生成 | 7.5/10</description>
    </item>
    <item>
      <title>SSVD-O: Parameter-Efficient Fine-Tuning with Structured SVD for Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ssvd-o-parameter-efficient-fine-tuning-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ssvd-o-parameter-efficient-fine-tuning-with/</guid>
      <description>语音识别 | 7.0/10</description>
    </item>
    <item>
      <title>ST-HNTM: Joint Speech-Text Neural Topic Modeling on the Hypersphere</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-st-hntm-joint-speech-text-neural-topic-modeling/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-st-hntm-joint-speech-text-neural-topic-modeling/</guid>
      <description>主题建模 | 7.0/10</description>
    </item>
    <item>
      <title>STACodec: Semantic Token Assignment for Balancing Acoustic Fidelity and Semantic Information in Audio Codecs</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-stacodec-semantic-token-assignment-for-balancing/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-stacodec-semantic-token-assignment-for-balancing/</guid>
      <description>语音识别 | 8.0/10</description>
    </item>
    <item>
      <title>Staged Diffusion with Hybrid Mixture-of-Experts (MOE) for Multimodal Sentiment Analysis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-staged-diffusion-with-hybrid-mixture-of-experts/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-staged-diffusion-with-hybrid-mixture-of-experts/</guid>
      <description>语音情感识别 | 8.0/10</description>
    </item>
    <item>
      <title>Stemphonic: All-At-Once Flexible Multi-Stem Music Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-stemphonic-all-at-once-flexible-multi-stem-music/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-stemphonic-all-at-once-flexible-multi-stem-music/</guid>
      <description>音乐生成 | 7.7/10</description>
    </item>
    <item>
      <title>Step-Audio-R1.5 Technical Report</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-step-audio-r15-technical-report/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-step-audio-r15-technical-report/</guid>
      <description>语音对话系统 | 8.0/10</description>
    </item>
    <item>
      <title>StereoFoley: Object-Aware Stereo Audio Generation from Video</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-stereofoley-object-aware-stereo-audio-generation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-stereofoley-object-aware-stereo-audio-generation/</guid>
      <description>音频生成 | 7.5/10</description>
    </item>
    <item>
      <title>Stereophonic Acoustic Echo Cancellation Using an Improved Affine Projection Algorithm with Adaptive Multiple Sub-Filters</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-stereophonic-acoustic-echo-cancellation-using-an/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-stereophonic-acoustic-echo-cancellation-using-an/</guid>
      <description>语音增强 | 6.0/10</description>
    </item>
    <item>
      <title>Still Thinking or Stopped Talking? Dialogue Silence Intention Classification Using Multimodal Large Language Model</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-still-thinking-or-stopped-talking-dialogue/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-still-thinking-or-stopped-talking-dialogue/</guid>
      <description>语音对话系统 | 6.5/10</description>
    </item>
    <item>
      <title>Str-DiffSep: Streamable Diffusion Model for Speech Separation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-str-diffsep-streamable-diffusion-model-for-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-str-diffsep-streamable-diffusion-model-for-speech/</guid>
      <description>语音分离 | 7.5/10</description>
    </item>
    <item>
      <title>Stream-Voice-Anon: Enhancing Utility of Real-Time Speaker Anonymization Via Neural Audio Codec and Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-stream-voice-anon-enhancing-utility-of-real-time/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-stream-voice-anon-enhancing-utility-of-real-time/</guid>
      <description>语音匿名化 | 7.0/10</description>
    </item>
    <item>
      <title>Streaming Speech Recognition with Decoder-Only Large Language Models and Latency Optimization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-streaming-speech-recognition-with-decoder-only/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-streaming-speech-recognition-with-decoder-only/</guid>
      <description>语音识别 | 7.0/10</description>
    </item>
    <item>
      <title>StreamingBench: Assessing the Gap for MLLMs to Achieve Streaming Video Understanding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-streamingbench-assessing-the-gap-for-mllms-to/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-streamingbench-assessing-the-gap-for-mllms-to/</guid>
      <description>基准测试 | 7.5/10</description>
    </item>
    <item>
      <title>StreamMark: A Deep Learning-Based Semi-Fragile Audio Watermarking for Proactive Deepfake Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-streammark-a-deep-learning-based-semi-fragile/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-streammark-a-deep-learning-based-semi-fragile/</guid>
      <description>音频深度伪造检测 | 8.0/10</description>
    </item>
    <item>
      <title>Stress Prediction from Temporal Emotion Trajectories in Clinical Patient-Physician Conversations</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-stress-prediction-from-temporal-emotion/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-stress-prediction-from-temporal-emotion/</guid>
      <description>语音情感识别 | 7.0/10</description>
    </item>
    <item>
      <title>Structure-Aware Diffusion Schrödinger Bridge</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-structure-aware-diffusion-schrdinger-bridge/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-structure-aware-diffusion-schrdinger-bridge/</guid>
      <description>数据集对齐 | 7.7/10</description>
    </item>
    <item>
      <title>StyHarmo: Efficient Style-Specific Video Generation with Music Synchronization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-styharmo-efficient-style-specific-video/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-styharmo-efficient-style-specific-video/</guid>
      <description>视频生成 | 6.5/10</description>
    </item>
    <item>
      <title>Style Attack Disguise: When Fonts Become a Camouflage for Adversarial Intent</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-style-attack-disguise-when-fonts-become-a/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-style-attack-disguise-when-fonts-become-a/</guid>
      <description>对抗样本 | 7.0/10</description>
    </item>
    <item>
      <title>Style-Disentangled Diffusion for Controllable and Identity-Generalized Speech-Driven Body Motion Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-style-disentangled-diffusion-for-controllable-and/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-style-disentangled-diffusion-for-controllable-and/</guid>
      <description>语音驱动动作生成 | 7.0/10</description>
    </item>
    <item>
      <title>StyleBench: Evaluating Speech Language Models on Conversational Speaking Style Control</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-stylebench-evaluating-speech-language-models-on/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-stylebench-evaluating-speech-language-models-on/</guid>
      <description>基准测试 | 8.5/10</description>
    </item>
    <item>
      <title>StylePitcher: Generating Style-Following and Expressive Pitch Curves for Versatile Singing Tasks</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-stylepitcher-generating-style-following-and/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-stylepitcher-generating-style-following-and/</guid>
      <description>歌唱语音合成 | 7.5/10</description>
    </item>
    <item>
      <title>Subgraph Localization in the Subbands for Partially Spoofed Speech Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-subgraph-localization-in-the-subbands-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-subgraph-localization-in-the-subbands-for/</guid>
      <description>音频深度伪造检测 | 8.0/10</description>
    </item>
    <item>
      <title>Subsequence SDTW: Differentiable Alignment with Flexible Boundary Conditions</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-subsequence-sdtw-differentiable-alignment-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-subsequence-sdtw-differentiable-alignment-with/</guid>
      <description>音乐信息检索 | 8.0/10</description>
    </item>
    <item>
      <title>Subspace Hybrid Adaptive Filtering for Phonocardiogram Signal Denoising</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-subspace-hybrid-adaptive-filtering-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-subspace-hybrid-adaptive-filtering-for/</guid>
      <description>音频增强 | 7.0/10</description>
    </item>
    <item>
      <title>Sunac: Source-Aware Unified Neural Audio Codec</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sunac-source-aware-unified-neural-audio-codec/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sunac-source-aware-unified-neural-audio-codec/</guid>
      <description>音频生成 | 7.5/10</description>
    </item>
    <item>
      <title>SURE: Synergistic Uncertainty-Aware Reasoning for Multimodal Emotion Recognition in Conversations</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sure-synergistic-uncertainty-aware-reasoning-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sure-synergistic-uncertainty-aware-reasoning-for/</guid>
      <description>语音情感识别 | 7.5/10</description>
    </item>
    <item>
      <title>SwitchCodec: Adaptive Residual-Expert Sparse Quantization for High-Fidelity Neural Audio Coding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-switchcodec-adaptive-residual-expert-sparse/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-switchcodec-adaptive-residual-expert-sparse/</guid>
      <description>音频生成 | 8.5/10</description>
    </item>
    <item>
      <title>Symphony Rendering: Midi and Composer-Conditioned Auto Orchestration with Flow-Matching Transformers</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-symphony-rendering-midi-and-composer-conditioned/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-symphony-rendering-midi-and-composer-conditioned/</guid>
      <description>音乐生成 | 7.0/10</description>
    </item>
    <item>
      <title>SymphonyGen: 3D Hierarchical Orchestral Generation with Controllable Harmony Skeleton</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-symphonygen-3d-hierarchical-orchestral-generation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-symphonygen-3d-hierarchical-orchestral-generation/</guid>
      <description>音乐生成 | 7.5/10</description>
    </item>
    <item>
      <title>SynaSpot: A Lightweight, Streaming Multi-modal Framework for Keyword Spotting with Audio-Text Synergy</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-synaspot-a-lightweight-streaming-multi-modal/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-synaspot-a-lightweight-streaming-multi-modal/</guid>
      <description>关键词检测 | 7.5/10</description>
    </item>
    <item>
      <title>Synchronous Secondary Path Modeling and Kronecker-Factorized Adaptive Algorithm for Multichannel Active Noise Control</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-synchronous-secondary-path-modeling-and-kronecker/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-synchronous-secondary-path-modeling-and-kronecker/</guid>
      <description>主动噪声控制 | 7.0/10</description>
    </item>
    <item>
      <title>Syncspeech: Efficient and Low-Latency Text-to-Speech Based on Temporal Masked Transformer</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-syncspeech-efficient-and-low-latency-text-to/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-syncspeech-efficient-and-low-latency-text-to/</guid>
      <description>语音合成 | 7.5/10</description>
    </item>
    <item>
      <title>SynParaSpeech: Automated Synthesis of Paralinguistic Datasets for Speech Generation and Understanding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-synparaspeech-automated-synthesis-of/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-synparaspeech-automated-synthesis-of/</guid>
      <description>语音合成 | 7.5/10</description>
    </item>
    <item>
      <title>Synthcloner: Synthesizer-Style Audio Transfer via Factorized Codec with ADSR Envelope Control</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-synthcloner-synthesizer-style-audio-transfer-via/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-synthcloner-synthesizer-style-audio-transfer-via/</guid>
      <description>音频生成 | 8.5/10</description>
    </item>
    <item>
      <title>Synthesized Data Selection via Score Distribution Matching for Te Reo Māori Automatic Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-synthesized-data-selection-via-score-distribution/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-synthesized-data-selection-via-score-distribution/</guid>
      <description>语音识别 | 8.0/10</description>
    </item>
    <item>
      <title>Synthetic Data Domain Adaptation for ASR via LLM-Based Text and Phonetic Respelling Augmentation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-synthetic-data-domain-adaptation-for-asr-via-llm/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-synthetic-data-domain-adaptation-for-asr-via-llm/</guid>
      <description>语音识别 | 8.0/10</description>
    </item>
    <item>
      <title>Synthetic yet Striking? Assessing Vocal Charisma in TTS via Perceptual and Algorithmic Measures</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-synthetic-yet-striking-assessing-vocal-charisma/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-synthetic-yet-striking-assessing-vocal-charisma/</guid>
      <description>语音合成 | 7.5/10</description>
    </item>
    <item>
      <title>T-Cache: Fast Inference For Masked Generative Transformer-Based TTS Via Prompt-Aware Feature Caching</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-t-cache-fast-inference-for-masked-generative/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-t-cache-fast-inference-for-masked-generative/</guid>
      <description>语音合成 | 9.0/10</description>
    </item>
    <item>
      <title>T-Mimi: A Transformer-Based Mimi Decoder for Real-Time On-Phone TTS</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-t-mimi-a-transformer-based-mimi-decoder-for-real/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-t-mimi-a-transformer-based-mimi-decoder-for-real/</guid>
      <description>语音合成 | 7.0/10</description>
    </item>
    <item>
      <title>TAG: Structured Temporal Audio Generation via LLM-Guided Manual Scription and Control</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tag-structured-temporal-audio-generation-via-llm/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tag-structured-temporal-audio-generation-via-llm/</guid>
      <description>音频生成 | 7.5/10</description>
    </item>
    <item>
      <title>TAGARELA - A Portuguese Speech Dataset from Podcasts</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tagarela-a-portuguese-speech-dataset-from-podcasts/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tagarela-a-portuguese-speech-dataset-from-podcasts/</guid>
      <description>语音识别 语音合成 | 7.0/10</description>
    </item>
    <item>
      <title>Taming Audio VAEs via Target-KL Regularization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-taming-audio-vaes-via-target-kl-regularization/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-taming-audio-vaes-via-target-kl-regularization/</guid>
      <description>音频生成 | 6.5/10</description>
    </item>
    <item>
      <title>Target Speaker Anonymization in Multi-Speaker Recordings</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-target-speaker-anonymization-in-multi-speaker/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-target-speaker-anonymization-in-multi-speaker/</guid>
      <description>语音匿名化 | 7.6/10</description>
    </item>
    <item>
      <title>Target-Speaker LLM-ASR with Speaker-Aware Speech Encoder</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-target-speaker-llm-asr-with-speaker-aware-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-target-speaker-llm-asr-with-speaker-aware-speech/</guid>
      <description>语音识别 | 8.8/10</description>
    </item>
    <item>
      <title>Task Vector in TTS: Toward Emotionally Expressive Dialectal Speech Synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-task-vector-in-tts-toward-emotionally-expressive/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-task-vector-in-tts-toward-emotionally-expressive/</guid>
      <description>语音合成 | 7.0/10</description>
    </item>
    <item>
      <title>Task-Oriented Sound Privacy Preservation for Sound Event Detection Via End-to-End Adversarial Multi-Task Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-task-oriented-sound-privacy-preservation-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-task-oriented-sound-privacy-preservation-for/</guid>
      <description>音频事件检测 | 7.5/10</description>
    </item>
    <item>
      <title>TASU: Text-only Alignment for Speech Understanding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tasu-text-only-alignment-for-speech-understanding/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tasu-text-only-alignment-for-speech-understanding/</guid>
      <description>语音识别 | 7.0/10</description>
    </item>
    <item>
      <title>TAU: A Benchmark for Cultural Sound Understanding Beyond Semantics</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tau-a-benchmark-for-cultural-sound-understanding/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tau-a-benchmark-for-cultural-sound-understanding/</guid>
      <description>音频问答 | 7.5/10</description>
    </item>
    <item>
      <title>Teacher-Guided Pseudo Supervision and Cross-Modal Alignment for Audio-Visual Video Parsing</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-teacher-guided-pseudo-supervision-and-cross-modal/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-teacher-guided-pseudo-supervision-and-cross-modal/</guid>
      <description>音视频 | 7.0/10</description>
    </item>
    <item>
      <title>Teaching Audio Models to Reason: A Unified Framework for Source- and Layer-Wise Distillation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-teaching-audio-models-to-reason-a-unified/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-teaching-audio-models-to-reason-a-unified/</guid>
      <description>音频问答 | 7.0/10</description>
    </item>
    <item>
      <title>Teaching the Teachers: Boosting Unsupervised Domain Adaptation In Speech Recognition By Ensemble Update</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-teaching-the-teachers-boosting-unsupervised/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-teaching-the-teachers-boosting-unsupervised/</guid>
      <description>语音识别 | 7.0/10</description>
    </item>
    <item>
      <title>Temporal Distillation for Music Representation Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-temporal-distillation-for-music-representation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-temporal-distillation-for-music-representation/</guid>
      <description>音乐信息检索 | 7.5/10</description>
    </item>
    <item>
      <title>Temporal Graph Modeling for Speech Emotion Recognition Using LSTM-Aggregated Multigraph Networks</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-temporal-graph-modeling-for-speech-emotion/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-temporal-graph-modeling-for-speech-emotion/</guid>
      <description>语音情感识别 | 7.5/10</description>
    </item>
    <item>
      <title>Temporal-Spatial Decouple Before Act: Disentangled Representation Learning for Multimodal Sentiment Analysis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-temporal-spatial-decouple-before-act-disentangled/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-temporal-spatial-decouple-before-act-disentangled/</guid>
      <description>情感分析 | 7.5/10</description>
    </item>
    <item>
      <title>Temporally Heterogeneous Graph Contrastive Learning for Multimodal Acoustic Event Classification</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-temporally-heterogeneous-graph-contrastive/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-temporally-heterogeneous-graph-contrastive/</guid>
      <description>音频事件检测 | 8.5/10</description>
    </item>
    <item>
      <title>Test Time Adaptation for Speech Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-test-time-adaptation-for-speech-emotion/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-test-time-adaptation-for-speech-emotion/</guid>
      <description>语音情感识别 | 7.0/10</description>
    </item>
    <item>
      <title>Test-Time Scaling for Auditory Cognition in Audio Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-test-time-scaling-for-auditory-cognition-in-audio/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-test-time-scaling-for-auditory-cognition-in-audio/</guid>
      <description>音频问答 | 7.0/10</description>
    </item>
    <item>
      <title>Testing The Efficient Coding Hypothesis Beyond Humans: The Auditory Kernels of Bat Vocalizations</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-testing-the-efficient-coding-hypothesis-beyond/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-testing-the-efficient-coding-hypothesis-beyond/</guid>
      <description>生物声学 | 7.5/10</description>
    </item>
    <item>
      <title>Text2midi-InferAlign: Improving Symbolic Music Generation with Inference-Time Alignment</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-text2midi-inferalign-improving-symbolic-music/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-text2midi-inferalign-improving-symbolic-music/</guid>
      <description>音乐生成 | 7.5/10</description>
    </item>
    <item>
      <title>Text2Move: Text-To-Moving Sound Generation via Trajectory Prediction and Temporal Alignment</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-text2move-text-to-moving-sound-generation-via/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-text2move-text-to-moving-sound-generation-via/</guid>
      <description>空间音频 | 8.0/10</description>
    </item>
    <item>
      <title>TextlessRAG: End-to-End Visual Document RAG by Speech without Text</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-textlessrag-end-to-end-visual-document-rag-by/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-textlessrag-end-to-end-visual-document-rag-by/</guid>
      <description>语音问答 | 8.5/10</description>
    </item>
    <item>
      <title>The 3rd Clarity Prediction Challenge: A Machine Learning Challenge for Hearing aid Speech Intelligibility Prediction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-the-3rd-clarity-prediction-challenge-a-machine/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-the-3rd-clarity-prediction-challenge-a-machine/</guid>
      <description>语音增强 | 7.5/10</description>
    </item>
    <item>
      <title>The Curious Case of Visual Grounding: Different Effects for Speech-and Text-Based Language Encoders</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-the-curious-case-of-visual-grounding-different/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-the-curious-case-of-visual-grounding-different/</guid>
      <description>模型评估 | 8.0/10</description>
    </item>
    <item>
      <title>The Impact of Audio Watermarking on Audio Anti-Spoofing Countermeasures</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-the-impact-of-audio-watermarking-on-audio-anti/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-the-impact-of-audio-watermarking-on-audio-anti/</guid>
      <description>音频深度伪造检测 | 8.5/10</description>
    </item>
    <item>
      <title>The Muse Benchmark: Probing Music Perception and Auditory Relational Reasoning in Audio LLMs</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-the-muse-benchmark-probing-music-perception-and/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-the-muse-benchmark-probing-music-perception-and/</guid>
      <description>音乐理解 | 8.5/10</description>
    </item>
    <item>
      <title>The Role of Prosodic and Lexical Cues in Turn-Taking with Self-Supervised Speech Representations</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-the-role-of-prosodic-and-lexical-cues-in-turn/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-the-role-of-prosodic-and-lexical-cues-in-turn/</guid>
      <description>语音对话系统 | 7.5/10</description>
    </item>
    <item>
      <title>The Singing Voice Conversion Challenge 2025: From Singer Identity Conversion to Singing Style Conversion</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-the-singing-voice-conversion-challenge-2025-from/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-the-singing-voice-conversion-challenge-2025-from/</guid>
      <description>歌唱语音转换 | 7.0/10</description>
    </item>
    <item>
      <title>The Structured Output Benchmark: A Multi-Source Benchmark for Evaluating Structured Output Quality in Large Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-the-structured-output-benchmark-a-multi-source/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-the-structured-output-benchmark-a-multi-source/</guid>
      <description>基准测试 | 7.0/10</description>
    </item>
    <item>
      <title>The Synergistic Role of Audio and Large Video-Language Model in Source-Free Video Domain Adaptation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-the-synergistic-role-of-audio-and-large-video/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-the-synergistic-role-of-audio-and-large-video/</guid>
      <description>领域适应 | 7.0/10</description>
    </item>
    <item>
      <title>Theory and Application of Circular Relative Harmonic Coefficients</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-theory-and-application-of-circular-relative/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-theory-and-application-of-circular-relative/</guid>
      <description>声源定位 | 7.5/10</description>
    </item>
    <item>
      <title>Thinking While Listening: Simple Test Time Scaling for Audio Classification</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-thinking-while-listening-simple-test-time-scaling/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-thinking-while-listening-simple-test-time-scaling/</guid>
      <description>音频分类 | 6.5/10</description>
    </item>
    <item>
      <title>Three Seconds is Sufficient: A Multi-Pronged Framework for Model-Based Speaker Adaptation in ASR Under Data-Scarce Conditions</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-three-seconds-is-sufficient-a-multi-pronged/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-three-seconds-is-sufficient-a-multi-pronged/</guid>
      <description>语音识别 | 7.0/10</description>
    </item>
    <item>
      <title>TICL: Text-Embedding KNN for Speech in-Context Learning Unlocks Speech Recognition Abilities of Large Multimodal Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ticl-text-embedding-knn-for-speech-in-context/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ticl-text-embedding-knn-for-speech-in-context/</guid>
      <description>语音识别 | 7.5/10</description>
    </item>
    <item>
      <title>Timbre-Aware Audio Difference Captioning for Anomalous Machine Sounds without Paired Training Data via Synthetic Perturbations</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-timbre-aware-audio-difference-captioning-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-timbre-aware-audio-difference-captioning-for/</guid>
      <description>音频分类 | 7.5/10</description>
    </item>
    <item>
      <title>Timbre-Based Pretraining with Pseudo-Labels for Multi-Instrument Automatic Music Transcription</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-timbre-based-pretraining-with-pseudo-labels-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-timbre-based-pretraining-with-pseudo-labels-for/</guid>
      <description>音乐信息检索 | 7.0/10</description>
    </item>
    <item>
      <title>Time vs. Layer: Locating Predictive Cues for Dysarthric Speech Descriptors in Wav2vec 2.0</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-time-vs-layer-locating-predictive-cues-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-time-vs-layer-locating-predictive-cues-for/</guid>
      <description>语音质量评估 | 7.5/10</description>
    </item>
    <item>
      <title>Time-Domain Synthesis of Virtual Sound Source Within Personalized Sound Zone using a Linear Loudspeaker Array</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-time-domain-synthesis-of-virtual-sound-source/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-time-domain-synthesis-of-virtual-sound-source/</guid>
      <description>空间音频 | 8.0/10</description>
    </item>
    <item>
      <title>Time-Shifted Token Scheduling for Symbolic Music Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-time-shifted-token-scheduling-for-symbolic-music/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-time-shifted-token-scheduling-for-symbolic-music/</guid>
      <description>音乐生成 | 8.5/10</description>
    </item>
    <item>
      <title>TinyMU: A Compact Audio-Language Model for Music Understanding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tinymu-a-compact-audio-language-model-for-music/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tinymu-a-compact-audio-language-model-for-music/</guid>
      <description>音乐理解 | 7.5/10</description>
    </item>
    <item>
      <title>Tldiffgan: A Latent Diffusion-Gan Framework with Temporal Information Fusion for Anomalous Sound Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tldiffgan-a-latent-diffusion-gan-framework-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tldiffgan-a-latent-diffusion-gan-framework-with/</guid>
      <description>音频事件检测 | 7.5/10</description>
    </item>
    <item>
      <title>TMD-TTS: A Unified Tibetan Multi-Dialect Text-to-Speech Framework for Ü-Tsang, Amdo and Kham Speech Dataset Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tmd-tts-a-unified-tibetan-multi-dialect-text-to/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tmd-tts-a-unified-tibetan-multi-dialect-text-to/</guid>
      <description>语音合成 | 7.5/10</description>
    </item>
    <item>
      <title>Tokenchain: A Discrete Speech Chain via Semantic Token Modeling</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tokenchain-a-discrete-speech-chain-via-semantic/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tokenchain-a-discrete-speech-chain-via-semantic/</guid>
      <description>语音识别 | 7.0/10</description>
    </item>
    <item>
      <title>Toward Faithful Explanations in Acoustic Anomaly Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-toward-faithful-explanations-in-acoustic-anomaly/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-toward-faithful-explanations-in-acoustic-anomaly/</guid>
      <description>音频事件检测 | 7.5/10</description>
    </item>
    <item>
      <title>Toward Robust And Efficient Beat Tracking Via Beat-Aware Attention</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-toward-robust-and-efficient-beat-tracking-via/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-toward-robust-and-efficient-beat-tracking-via/</guid>
      <description>音乐理解 | 8.5/10</description>
    </item>
    <item>
      <title>Towards Blind Data Cleaning: A Case Study in Music Source Separation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-blind-data-cleaning-a-case-study-in-music/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-blind-data-cleaning-a-case-study-in-music/</guid>
      <description>音乐信息检索 | 7.0/10</description>
    </item>
    <item>
      <title>Towards Building Speech Large Language Models for Multitask Understanding in Low-Resource Languages</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-building-speech-large-language-models-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-building-speech-large-language-models-for/</guid>
      <description>语音识别 | 6.5/10</description>
    </item>
    <item>
      <title>Towards Data Drift Monitoring for Speech Deepfake Detection in the Context of MLOps</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-data-drift-monitoring-for-speech-deepfake/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-data-drift-monitoring-for-speech-deepfake/</guid>
      <description>音频深度伪造检测 | 7.0/10</description>
    </item>
    <item>
      <title>Towards Distance-Aware Synthetic Audio Mixtures for Universal Sound Separation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-distance-aware-synthetic-audio-mixtures/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-distance-aware-synthetic-audio-mixtures/</guid>
      <description>语音分离 | 6.5/10</description>
    </item>
    <item>
      <title>Towards Effective Negation Modeling in Joint Audio-Text Models for Music</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-effective-negation-modeling-in-joint/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-effective-negation-modeling-in-joint/</guid>
      <description>音乐理解 | 7.5/10</description>
    </item>
    <item>
      <title>Towards Evaluating Generative Audio: Insights from Neural Audio Codec Embedding Distances</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-evaluating-generative-audio-insights-from/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-evaluating-generative-audio-insights-from/</guid>
      <description>模型评估 | 6.5/10</description>
    </item>
    <item>
      <title>Towards Fair ASR for Second Language Speakers using Fairness Prompted Finetuning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-fair-asr-for-second-language-speakers/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-fair-asr-for-second-language-speakers/</guid>
      <description>语音识别 | 6.5/10</description>
    </item>
    <item>
      <title>Towards Lightweight Adaptation of Speech Enhancement Models in Real-World Environments</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-lightweight-adaptation-of-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-lightweight-adaptation-of-speech/</guid>
      <description>语音增强 | 8.5/10</description>
    </item>
    <item>
      <title>Towards Multi-View Hierarchical Video-to-Piano Generation with MIDI Guidance</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-multi-view-hierarchical-video-to-piano/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-multi-view-hierarchical-video-to-piano/</guid>
      <description>音乐生成 | 7.0/10</description>
    </item>
    <item>
      <title>Towards Orthographically-Informed Evaluation of Speech Recognition Systems for Indian Languages</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-orthographically-informed-evaluation-of/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-orthographically-informed-evaluation-of/</guid>
      <description>语音识别 | 7.0/10</description>
    </item>
    <item>
      <title>Towards Real-Time Generative Speech Restoration with Flow-Matching</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-real-time-generative-speech-restoration/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-real-time-generative-speech-restoration/</guid>
      <description>语音增强 | 6.0/10</description>
    </item>
    <item>
      <title>Towards Robust Dysarthric Speech Recognition: LLM-Agent Post-ASR Correction Beyond WER</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-robust-dysarthric-speech-recognition-llm/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-robust-dysarthric-speech-recognition-llm/</guid>
      <description>语音识别 | 9.0/10</description>
    </item>
    <item>
      <title>Tpeformer: Temporal Patch Embedding Transformer</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tpeformer-temporal-patch-embedding-transformer/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tpeformer-temporal-patch-embedding-transformer/</guid>
      <description>语音情感识别 | 7.5/10</description>
    </item>
    <item>
      <title>Train Short, Infer Long: Speech-LLM Enables Zero-Shot Streamable Joint ASR and Diarization on Long Audio</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-train-short-infer-long-speech-llm-enables-zero/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-train-short-infer-long-speech-llm-enables-zero/</guid>
      <description>说话人分离 | 9.0/10</description>
    </item>
    <item>
      <title>Training Dynamics-Aware Multi-Factor Curriculum Learning for Target Speaker Extraction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-training-dynamics-aware-multi-factor-curriculum/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-training-dynamics-aware-multi-factor-curriculum/</guid>
      <description>语音分离 | 7.0/10</description>
    </item>
    <item>
      <title>Training Flow Matching Models with Reliable Labels via Self-Purification</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-training-flow-matching-models-with-reliable/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-training-flow-matching-models-with-reliable/</guid>
      <description>语音合成 | 7.5/10</description>
    </item>
    <item>
      <title>Training-Free Inference-Time Scaling for Audio Source Separation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-training-free-inference-time-scaling-for-audio/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-training-free-inference-time-scaling-for-audio/</guid>
      <description>语音增强 | 7.5/10</description>
    </item>
    <item>
      <title>Training-Free Multimodal Guidance for Video to Audio Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-training-free-multimodal-guidance-for-video-to/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-training-free-multimodal-guidance-for-video-to/</guid>
      <description>音频生成 | 8.0/10</description>
    </item>
    <item>
      <title>Transfer Learning for Paediatric Sleep Apnoea Detection using Physiology-Guided Acoustic Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-transfer-learning-for-paediatric-sleep-apnoea/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-transfer-learning-for-paediatric-sleep-apnoea/</guid>
      <description>音频分类 | 7.0/10</description>
    </item>
    <item>
      <title>Transferable Audio Lottery Tickets: Gradient Accumulation for Extreme Sparsity</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-transferable-audio-lottery-tickets-gradient/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-transferable-audio-lottery-tickets-gradient/</guid>
      <description>音频分类 | 7.0/10</description>
    </item>
    <item>
      <title>Tri-Attention Fusion: Joint Temporal-Spectral and Bidirectional Modeling for Speech Spoofing Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tri-attention-fusion-joint-temporal-spectral-and/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tri-attention-fusion-joint-temporal-spectral-and/</guid>
      <description>语音伪造检测 | 7.0/10</description>
    </item>
    <item>
      <title>Triad: Tri-Head with Auxiliary Duplicating Permutation Invariant Training for Multi-Task Sound Event Localization and Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-triad-tri-head-with-auxiliary-duplicating/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-triad-tri-head-with-auxiliary-duplicating/</guid>
      <description>音频事件检测 | 7.5/10</description>
    </item>
    <item>
      <title>Triage Knowledge Distillation for Speaker Verification</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-triage-knowledge-distillation-for-speaker/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-triage-knowledge-distillation-for-speaker/</guid>
      <description>说话人验证 | 7.5/10</description>
    </item>
    <item>
      <title>TTA: Transcribe, Translate and Alignment for Cross-Lingual Speech Representation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tta-transcribe-translate-and-alignment-for-cross/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tta-transcribe-translate-and-alignment-for-cross/</guid>
      <description>语音识别 | 7.5/10</description>
    </item>
    <item>
      <title>TVP-UNet: Threshold Variance Penalty U-Net for Voice Activity Detection in Dysarthric Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tvp-unet-threshold-variance-penalty-u-net-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tvp-unet-threshold-variance-penalty-u-net-for/</guid>
      <description>语音活动检测 | 7.0/10</description>
    </item>
    <item>
      <title>Two-Stage Language Model Framework for Acoustic Echo Cancellation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-two-stage-language-model-framework-for-acoustic/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-two-stage-language-model-framework-for-acoustic/</guid>
      <description>语音增强 | 7.5/10</description>
    </item>
    <item>
      <title>UJCodec: An End-to-end Unet-Style Codec for Joint Speech Compression and Enhancement</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ujcodec-an-end-to-end-unet-style-codec-for-joint/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ujcodec-an-end-to-end-unet-style-codec-for-joint/</guid>
      <description>语音增强 | 7.5/10</description>
    </item>
    <item>
      <title>UMA-SPLIT: Unimodal Aggregation for Both English and Mandarin Non-Autoregressive Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-uma-split-unimodal-aggregation-for-both-english/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-uma-split-unimodal-aggregation-for-both-english/</guid>
      <description>语音识别 | 7.5/10</description>
    </item>
    <item>
      <title>UMV: A Mixture-Of-Experts Vision Transformer with Multi-Spectrogram Fusion for Underwater Ship Noise Classification</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-umv-a-mixture-of-experts-vision-transformer-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-umv-a-mixture-of-experts-vision-transformer-with/</guid>
      <description>音频分类 | 7.5/10</description>
    </item>
    <item>
      <title>Uncertainty-Aware 3D Emotional Talking Face Synthesis with Emotion Prior Distillation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-uncertainty-aware-3d-emotional-talking-face/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-uncertainty-aware-3d-emotional-talking-face/</guid>
      <description>音视频 | 8.0/10</description>
    </item>
    <item>
      <title>Understanding Textual Capability Degradation in Speech LLMS via Parameter Importance Analysis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-understanding-textual-capability-degradation-in/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-understanding-textual-capability-degradation-in/</guid>
      <description>语音问答 | 7.5/10</description>
    </item>
    <item>
      <title>Understanding the Strengths and Weaknesses of SSL Models for Audio Deepfake Model Attribution</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-understanding-the-strengths-and-weaknesses-of-ssl/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-understanding-the-strengths-and-weaknesses-of-ssl/</guid>
      <description>音频深度伪造检测 | 7.0/10</description>
    </item>
    <item>
      <title>UNet-Based Fusion and Exponential Moving Average Adaptation for Noise-Robust Speaker Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-unet-based-fusion-and-exponential-moving-average/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-unet-based-fusion-and-exponential-moving-average/</guid>
      <description>说话人验证 | 7.5/10</description>
    </item>
    <item>
      <title>Universr: Unified and Versatile Audio Super-Resolution Via Vocoder-Free Flow Matching</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-universr-unified-and-versatile-audio-super/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-universr-unified-and-versatile-audio-super/</guid>
      <description>音频超分辨率 | 8.0/10</description>
    </item>
    <item>
      <title>UNMIXX: Untangling Highly Correlated Singing Voices Mixtures</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-unmixx-untangling-highly-correlated-singing/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-unmixx-untangling-highly-correlated-singing/</guid>
      <description>语音分离 | 8.5/10</description>
    </item>
    <item>
      <title>Unrequited Emotions: Investigating the Gaps in Motivation and Practice in Speech Emotion Recognition Research</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-unrequited-emotions-investigating-the-gaps-in/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-unrequited-emotions-investigating-the-gaps-in/</guid>
      <description>语音情感识别 | 8.0/10</description>
    </item>
    <item>
      <title>Unseen but Not Unknown: Using Dataset Concealment to Robustly Evaluate Speech Quality Estimation Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-unseen-but-not-unknown-using-dataset-concealment/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-unseen-but-not-unknown-using-dataset-concealment/</guid>
      <description>语音质量评估 | 8.3/10</description>
    </item>
    <item>
      <title>Unsupervised Discovery and Analysis of the Vocal Repertoires and Patterns of Select Corvid Species</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-unsupervised-discovery-and-analysis-of-the-vocal/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-unsupervised-discovery-and-analysis-of-the-vocal/</guid>
      <description>生物声学 | 7.5/10</description>
    </item>
    <item>
      <title>Unsupervised Lexicon Learning from Speech is Limited by Representations Rather than Clustering</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-unsupervised-lexicon-learning-from-speech-is/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-unsupervised-lexicon-learning-from-speech-is/</guid>
      <description>语音发现 | 8.0/10</description>
    </item>
    <item>
      <title>USVexplorer: Robust Detection of Ultrasonic Vocalizations with Cross Species Generalization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-usvexplorer-robust-detection-of-ultrasonic/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-usvexplorer-robust-detection-of-ultrasonic/</guid>
      <description>音频事件检测 | 8.0/10</description>
    </item>
    <item>
      <title>UTI-LLM: A Personalized Articulatory-Speech Therapy Assistance System Based on Multimodal Large Language Model</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-uti-llm-a-personalized-articulatory-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-uti-llm-a-personalized-articulatory-speech/</guid>
      <description>语音对话系统 | 7.5/10</description>
    </item>
    <item>
      <title>Utilizing Information Theoretic Approach to Study Cochlear Neural Degeneration</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-utilizing-information-theoretic-approach-to-study/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-utilizing-information-theoretic-approach-to-study/</guid>
      <description>生物声学 | 6.5/10</description>
    </item>
    <item>
      <title>UVT-LM: Unifying Visual and Tactile Perception with Language Model</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-uvt-lm-unifying-visual-and-tactile-perception/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-uvt-lm-unifying-visual-and-tactile-perception/</guid>
      <description>跨模态 | 7.0/10</description>
    </item>
    <item>
      <title>V2A-DPO: Omni-Preference Optimization for Video-To-Audio Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-v2a-dpo-omni-preference-optimization-for-video-to/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-v2a-dpo-omni-preference-optimization-for-video-to/</guid>
      <description>视频到音频生成 | 7.5/10</description>
    </item>
    <item>
      <title>Variational Low-Rank Adaptation for Personalized Impaired Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-variational-low-rank-adaptation-for-personalized/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-variational-low-rank-adaptation-for-personalized/</guid>
      <description>语音识别 | 7.5/10</description>
    </item>
    <item>
      <title>VBx for End-to-End Neural and Clustering-Based Diarization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-vbx-for-end-to-end-neural-and-clustering-based/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-vbx-for-end-to-end-neural-and-clustering-based/</guid>
      <description>说话人分离 | 8.5/10</description>
    </item>
    <item>
      <title>VChangeCodec: An Ultra Low-Complexity Neural Speech Codec with Built-In Voice Changer for Customized Real-Time Communication</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-vchangecodec-an-ultra-low-complexity-neural/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-vchangecodec-an-ultra-low-complexity-neural/</guid>
      <description>语音转换 语音增强 | 8.0/10</description>
    </item>
    <item>
      <title>Via Score to Performance: Efficient Human-Controllable Long Song Generation with Bar-Level Symbolic Notation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-via-score-to-performance-efficient-human/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-via-score-to-performance-efficient-human/</guid>
      <description>音乐生成 | 7.5/10</description>
    </item>
    <item>
      <title>Vib2Sound: Separation Of Multimodal Sound Sources</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-vib2sound-separation-of-multimodal-sound-sources/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-vib2sound-separation-of-multimodal-sound-sources/</guid>
      <description>语音分离 | 6.5/10</description>
    </item>
    <item>
      <title>Vioptt: Violin Technique-Aware Transcription from Synthetic Data Augmentation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-vioptt-violin-technique-aware-transcription-from/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-vioptt-violin-technique-aware-transcription-from/</guid>
      <description>音乐信息检索 | 6.5/10</description>
    </item>
    <item>
      <title>Virtual Consistency for Audio Editing</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-virtual-consistency-for-audio-editing/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-virtual-consistency-for-audio-editing/</guid>
      <description>音乐生成 | 8.0/10</description>
    </item>
    <item>
      <title>Visual Keys to Symphonies: Latent Diffusion for Multi-Scene Video-to-Music Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-visual-keys-to-symphonies-latent-diffusion-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-visual-keys-to-symphonies-latent-diffusion-for/</guid>
      <description>音乐生成 | 7.5/10</description>
    </item>
    <item>
      <title>ViTex: Visual Texture Control for Multi-Track Symbolic Music Generation via Discrete Diffusion Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-vitex-visual-texture-control-for-multi-track/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-vitex-visual-texture-control-for-multi-track/</guid>
      <description>音乐生成 | 7.0/10</description>
    </item>
    <item>
      <title>VividTalker: A Modular Framework for Expressive 3D Talking Avatars with Controllable Gaze and Blink</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-vividtalker-a-modular-framework-for-expressive-3d/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-vividtalker-a-modular-framework-for-expressive-3d/</guid>
      <description>语音合成 | 7.5/10</description>
    </item>
    <item>
      <title>VM-UNSSOR: Unsupervised Neural Speech Separation Enhanced by Higher-SNR Virtual Microphone Arrays</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-vm-unssor-unsupervised-neural-speech-separation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-vm-unssor-unsupervised-neural-speech-separation/</guid>
      <description>语音分离 | 7.5/10</description>
    </item>
    <item>
      <title>VMSP: Video-to-Music Generation with Two-Stage Alignment and Synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-vmsp-video-to-music-generation-with-two-stage/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-vmsp-video-to-music-generation-with-two-stage/</guid>
      <description>音乐生成 | 7.0/10</description>
    </item>
    <item>
      <title>Vocalnet-M2: Advancing Low-Latency Spoken Language Modeling via Integrated Multi-Codebook Tokenization and Multi-Token Prediction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-vocalnet-m2-advancing-low-latency-spoken-language/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-vocalnet-m2-advancing-low-latency-spoken-language/</guid>
      <description>语音对话系统 | 7.5/10</description>
    </item>
    <item>
      <title>Voting-Based Pitch Estimation with Temporal and Frequential Alignment and Correlation Aware Selection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-voting-based-pitch-estimation-with-temporal-and/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-voting-based-pitch-estimation-with-temporal-and/</guid>
      <description>语音识别 | 8.0/10</description>
    </item>
    <item>
      <title>VoxMorph: Scalable Zero-Shot Voice Identity Morphing via Disentangled Embeddings</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-voxmorph-scalable-zero-shot-voice-identity/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-voxmorph-scalable-zero-shot-voice-identity/</guid>
      <description>语音克隆 | 9.0/10</description>
    </item>
    <item>
      <title>VoXtream: Full-Stream Text-To-Speech With Extremely Low Latency</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-voxtream-full-stream-text-to-speech-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-voxtream-full-stream-text-to-speech-with/</guid>
      <description>语音合成 | 8.5/10</description>
    </item>
    <item>
      <title>VT-Heads: Voice Cloning and Talking Head Generation from Text Based on V-DiT</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-vt-heads-voice-cloning-and-talking-head/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-vt-heads-voice-cloning-and-talking-head/</guid>
      <description>视频生成 | 6.5/10</description>
    </item>
    <item>
      <title>Walking Through Uncertainty: An Empirical Study of Uncertainty Estimation for Audio-Aware Large Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-walking-through-uncertainty-an-empirical-study-of/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-walking-through-uncertainty-an-empirical-study-of/</guid>
      <description>音频问答 | 7.5/10</description>
    </item>
    <item>
      <title>WAV2LEV: Predicting Levenshtein Edit Operation Sequences For Fine-Grained Estimation of Automatic Speech Recognition Error</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-wav2lev-predicting-levenshtein-edit-operation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-wav2lev-predicting-levenshtein-edit-operation/</guid>
      <description>语音识别 | 7.5/10</description>
    </item>
    <item>
      <title>Wave-Trainer-Fit: Neural Vocoder With Trainable Prior And Fixed-Point Iteration Towards High-Quality Speech Generation From SSL Features</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-wave-trainer-fit-neural-vocoder-with-trainable/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-wave-trainer-fit-neural-vocoder-with-trainable/</guid>
      <description>语音合成 | 7.0/10</description>
    </item>
    <item>
      <title>Wavenext 2: Convnext-Based Fast Neural Vocoders with Residual Denoising and Sub-Modeling for Gan And Diffusion Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-wavenext-2-convnext-based-fast-neural-vocoders/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-wavenext-2-convnext-based-fast-neural-vocoders/</guid>
      <description>语音合成 | 9.0/10</description>
    </item>
    <item>
      <title>WaveSP-Net: Learnable Wavelet-Domain Sparse Prompt Tuning for Speech Deepfake Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-wavesp-net-learnable-wavelet-domain-sparse-prompt/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-wavesp-net-learnable-wavelet-domain-sparse-prompt/</guid>
      <description>语音伪造检测 | 8.0/10</description>
    </item>
    <item>
      <title>WaveSpikeNet: A Wavelet-Spiking Fusion Architecture for Audio Classification on Edge Devices</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-wavespikenet-a-wavelet-spiking-fusion/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-wavespikenet-a-wavelet-spiking-fusion/</guid>
      <description>音频分类 | 7.5/10</description>
    </item>
    <item>
      <title>WavLink: Compact Audio–Text Embeddings with a Global Whisper Token</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-wavlink-compact-audiotext-embeddings-with-a/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-wavlink-compact-audiotext-embeddings-with-a/</guid>
      <description>音频检索 | 8.0/10</description>
    </item>
    <item>
      <title>What the student learns in knowledge distillation: A subspace view and evidence on Convolutional Recurrent Network</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-what-the-student-learns-in-knowledge-distillation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-what-the-student-learns-in-knowledge-distillation/</guid>
      <description>语音增强 | 6.5/10</description>
    </item>
    <item>
      <title>When Audio Matters: A Lightweight, Hierarchical Fusion Model for Speech and Non-Verbal Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-when-audio-matters-a-lightweight-hierarchical/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-when-audio-matters-a-lightweight-hierarchical/</guid>
      <description>语音情感识别 | 8.0/10</description>
    </item>
    <item>
      <title>When Children Talk and Machines Listen: Toward an Interpretable Speech-Based Screener for Dutch Developmental Language Disorder</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-when-children-talk-and-machines-listen-toward-an/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-when-children-talk-and-machines-listen-toward-an/</guid>
      <description>Speech Biomarkers | 7.0/10</description>
    </item>
    <item>
      <title>When Noise Lowers the Loss: Rethinking Likelihood-Based Evaluation in Music Large Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-when-noise-lowers-the-loss-rethinking-likelihood/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-when-noise-lowers-the-loss-rethinking-likelihood/</guid>
      <description>Music Generation | 7.0/10</description>
    </item>
    <item>
      <title>When Silence Matters: The Impact of Irrelevant Audio on Text Reasoning in Large Audio-Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-when-silence-matters-the-impact-of-irrelevant/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-when-silence-matters-the-impact-of-irrelevant/</guid>
      <description>Model Evaluation | 7.0/10</description>
    </item>
    <item>
      <title>When Voice Matters: A Controlled Study of Audio LLM Behavior in Clinical Decision-Making</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-when-voice-matters-a-controlled-study-of-audio/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-when-voice-matters-a-controlled-study-of-audio/</guid>
      <description>Model Evaluation | 7.0/10</description>
    </item>
    <item>
      <title>Whisper-FEST: Single-Channel Far-Field Enhanced Speech-to-text without Parallel Data</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-whisper-fest-single-channel-far-field-enhanced/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-whisper-fest-single-channel-far-field-enhanced/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Whisper-MLA: Reducing GPU Memory Consumption of ASR Models Based on MHA2MLA Conversion</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-whisper-mla-reducing-gpu-memory-consumption-of/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-whisper-mla-reducing-gpu-memory-consumption-of/</guid>
      <description>Speech Recognition | 7.0/10</description>
    </item>
    <item>
      <title>Whisper-QF: Leveraging Dual Cross-Attention Q-Former for Speech Emotion Recognition With Multi-Task Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-whisper-qf-leveraging-dual-cross-attention-q/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-whisper-qf-leveraging-dual-cross-attention-q/</guid>
      <description>Speech Emotion Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Whisper: Courtside Edition - Enhancing ASR Performance through LLM-Driven Context Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-whisper-courtside-edition-enhancing-asr/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-whisper-courtside-edition-enhancing-asr/</guid>
      <description>Speech Recognition | 6.5/10</description>
    </item>
    <item>
      <title>WhisperPipe: A Resource-Efficient Streaming Architecture for Real-Time Automatic Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-whisperpipe-a-resource-efficient-streaming/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-whisperpipe-a-resource-efficient-streaming/</guid>
      <description>Speech Recognition | 6.5/10</description>
    </item>
    <item>
      <title>Why Do Speech Language Models Fail to Generate Semantically Coherent Outputs? A Modality Evolving Perspective</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-why-do-speech-language-models-fail-to-generate/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-why-do-speech-language-models-fail-to-generate/</guid>
      <description>Speech Generation | 7.0/10</description>
    </item>
    <item>
      <title>Windowed SummaryMixing: An Efficient Fine-Tuning of Self-Supervised Learning Models for Low-Resource Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-windowed-summarymixing-an-efficient-fine-tuning/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-windowed-summarymixing-an-efficient-fine-tuning/</guid>
      <description>Speech Recognition | 6.5/10</description>
    </item>
    <item>
      <title>Z-Scores: A Metric for Linguistically Assessing Disfluency Removal</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-z-scores-a-metric-for-linguistically-assessing/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-z-scores-a-metric-for-linguistically-assessing/</guid>
      <description>Model Evaluation | 6.5/10</description>
    </item>
    <item>
      <title>ZK-VSA: Zero-Knowledge Verifiable Speaker Anonymization Leveraging Phase Vocoder with Time-Scale Modification</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-zk-vsa-zero-knowledge-verifiable-speaker/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-zk-vsa-zero-knowledge-verifiable-speaker/</guid>
      <description>Speaker Anonymization | 7.5/10</description>
    </item>
    <item>
      <title>ZSV2C-MLLM: Zero-Shot Visual Voice Cloning Via Multimodal Large Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-zsv2c-mllm-zero-shot-visual-voice-cloning-via/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-zsv2c-mllm-zero-shot-visual-voice-cloning-via/</guid>
      <description>Voice Cloning | 6.5/10</description>
    </item>
    <item>
      <title>β-AVSDNET: A Novel End-To-End Neural Network Architecture For Audio-Visual Speaker Diarization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-avsdnet-a-novel-end-to-end-neural-network/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-avsdnet-a-novel-end-to-end-neural-network/</guid>
      <description>Speaker Diarization | 7.5/10</description>
    </item>
    <item>
      <title>Speech/Audio Paper Digest 2026-04-29</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29/</guid>
      <description>29 speech/AI papers analyzed in total</description>
    </item>
    <item>
      <title>A Functorial Formulation of Neighborhood Aggregating Deep Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-a-functorial-formulation-of-neighborhood/</link>
      <pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-a-functorial-formulation-of-neighborhood/</guid>
      <description>Theoretical Analysis | 6.5/10</description>
    </item>
    <item>
      <title>All That Glitters Is Not Audio: Rethinking Text Priors and Audio Reliance in Audio-Language Evaluation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-all-that-glitters-is-not-audio-rethinking-text/</link>
      <pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-all-that-glitters-is-not-audio-rethinking-text/</guid>
      <description>Audio Question Answering | 6.5/10</description>
    </item>
    <item>
      <title>An event-based sequence modeling approach to recognizing non-triad chords with oversegmentation minimization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-an-event-based-sequence-modeling-approach-to/</link>
      <pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-an-event-based-sequence-modeling-approach-to/</guid>
      <description>Music Understanding | 7.5/10</description>
    </item>
    <item>
      <title>CineAGI: Character-Consistent Movie Creation through LLM-Orchestrated Multi-Modal Generation and Cross-Scene Integration</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-cineagi-character-consistent-movie-creation/</link>
      <pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-cineagi-character-consistent-movie-creation/</guid>
      <description>Cross-Modal | 8.0/10</description>
    </item>
    <item>
      <title>Come Together: Analyzing Popular Songs Through Statistical Embeddings</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-come-together-analyzing-popular-songs-through/</link>
      <pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-come-together-analyzing-popular-songs-through/</guid>
      <description>Music Information Retrieval | 6.5/10</description>
    </item>
    <item>
      <title>Comparison of sEMG Encoding Accuracy Across Speech Modes Using Articulatory and Phoneme Features</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-comparison-of-semg-encoding-accuracy-across/</link>
      <pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-comparison-of-semg-encoding-accuracy-across/</guid>
      <description>Speech Biomarkers | 8.0/10</description>
    </item>
    <item>
      <title>Explainable AI in Speaker Recognition -- Making Latent Representations Understandable</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-explainable-ai-in-speaker-recognition-making/</link>
      <pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-explainable-ai-in-speaker-recognition-making/</guid>
      <description>Speaker Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Hallo-Live: Real-Time Streaming Joint Audio-Video Avatar Generation with Asynchronous Dual-Stream and Human-Centric Preference Distillation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-hallo-live-real-time-streaming-joint-audio-video/</link>
      <pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-hallo-live-real-time-streaming-joint-audio-video/</guid>
      <description>Audio-Visual | 8.5/10</description>
    </item>
    <item>
      <title>HeadRouter: Dynamic Head-Weight Routing for Task-Adaptive Audio Token Pruning in Large Audio Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-headrouter-dynamic-head-weight-routing-for-task/</link>
      <pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-headrouter-dynamic-head-weight-routing-for-task/</guid>
      <description>Large Audio Models | 8.0/10</description>
    </item>
    <item>
      <title>Latent-Hysteresis Graph ODEs: Modeling Coupled Topology-Feature Evolution via Continuous Phase Transitions</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-latent-hysteresis-graph-odes-modeling-coupled/</link>
      <pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-latent-hysteresis-graph-odes-modeling-coupled/</guid>
      <description>Graph Neural Networks | 8.0/10</description>
    </item>
    <item>
      <title>Listening with Time: Precise Temporal Awareness for Long-Form Audio Understanding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-listening-with-time-precise-temporal-awareness/</link>
      <pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-listening-with-time-precise-temporal-awareness/</guid>
      <description>Audio Scene Understanding | 8.0/10</description>
    </item>
    <item>
      <title>MAGIC-TTS: Fine-Grained Controllable Speech Synthesis with Explicit Local Duration and Pause Control</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-magic-tts-fine-grained-controllable-speech/</link>
      <pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-magic-tts-fine-grained-controllable-speech/</guid>
      <description>Speech Synthesis | 7.0/10</description>
    </item>
    <item>
      <title>Meta-Ensemble Learning with Diverse Data Splits for Improved Respiratory Sound Classification</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-meta-ensemble-learning-with-diverse-data-splits/</link>
      <pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-meta-ensemble-learning-with-diverse-data-splits/</guid>
      <description>Audio Classification | 8.0/10</description>
    </item>
    <item>
      <title>Opening the Design Space: Two Years of Performance with Intelligent Musical Instruments</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-opening-the-design-space-two-years-of-performance/</link>
      <pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-opening-the-design-space-two-years-of-performance/</guid>
      <description>Music Generation | 6.5/10</description>
    </item>
    <item>
      <title>Predictive Directional Selective Fixed-Filter Active Noise Control for Moving Sources via a Convolutional Recurrent Neural Network</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-predictive-directional-selective-fixed-filter/</link>
      <pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-predictive-directional-selective-fixed-filter/</guid>
      <description>Sound Source Localization | 7.5/10</description>
    </item>
    <item>
      <title>Psychologically-Grounded Graph Modeling for Interpretable Depression Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-psychologically-grounded-graph-modeling-for/</link>
      <pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-psychologically-grounded-graph-modeling-for/</guid>
      <description>Speech Emotion Recognition | 8.0/10</description>
    </item>
    <item>
      <title>RAS: a Reliability Oriented Metric for Automatic Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-ras-a-reliability-oriented-metric-for-automatic/</link>
      <pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-ras-a-reliability-oriented-metric-for-automatic/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Robust Audio-Text Retrieval via Cross-Modal Attention and Hybrid Loss</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-robust-audio-text-retrieval-via-cross-modal/</link>
      <pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-robust-audio-text-retrieval-via-cross-modal/</guid>
      <description>Audio Retrieval | 7.5/10</description>
    </item>
    <item>
      <title>RTCFake: Speech Deepfake Detection in Real-Time Communication</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-rtcfake-speech-deepfake-detection-in-real-time/</link>
      <pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-rtcfake-speech-deepfake-detection-in-real-time/</guid>
      <description>Speech Deepfake Detection | 7.0/10</description>
    </item>
    <item>
      <title>Scaling Properties of Continuous Diffusion Spoken Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-scaling-properties-of-continuous-diffusion-spoken/</link>
      <pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-scaling-properties-of-continuous-diffusion-spoken/</guid>
      <description>Speech Generation | 8.0/10</description>
    </item>
    <item>
      <title>Spectro-Temporal Modulation Representation Framework for Human-Imitated Speech Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-spectro-temporal-modulation-representation/</link>
      <pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-spectro-temporal-modulation-representation/</guid>
      <description>Speech Deepfake Detection | 6.5/10</description>
    </item>
    <item>
      <title>Speech Enhancement Based on Drifting Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-speech-enhancement-based-on-drifting-models/</link>
      <pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-speech-enhancement-based-on-drifting-models/</guid>
      <description>Speech Enhancement | 7.5/10</description>
    </item>
    <item>
      <title>Talker-T2AV: Joint Talking Audio-Video Generation with Autoregressive Diffusion Modeling</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-talker-t2av-joint-talking-audio-video-generation/</link>
      <pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-talker-t2av-joint-talking-audio-video-generation/</guid>
      <description>Speech Synthesis | 7.5/10</description>
    </item>
    <item>
      <title>TTS-PRISM: A Perceptual Reasoning and Interpretable Speech Model for Fine-Grained Diagnosis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-tts-prism-a-perceptual-reasoning-and/</link>
      <pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-tts-prism-a-perceptual-reasoning-and/</guid>
      <description>Speech Synthesis Evaluation | 7.0/10</description>
    </item>
    <item>
      <title>Speech/Audio Paper Digest 2026-04-28</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28/</link>
      <pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28/</guid>
      <description>24 speech/AI papers analyzed in total</description>
    </item>
    <item>
      <title>Advancing automatic speech recognition using feature fusion with self-supervised learning features: A case study on Fearless Steps Apollo corpus</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-advancing-automatic-speech-recognition-using/</link>
      <pubDate>Mon, 27 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-advancing-automatic-speech-recognition-using/</guid>
      <description>Speech Recognition | 7.0/10</description>
    </item>
    <item>
      <title>Audio Effect Estimation with DNN-Based Prediction and Search Algorithm</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-audio-effect-estimation-with-dnn-based-prediction/</link>
      <pubDate>Mon, 27 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-audio-effect-estimation-with-dnn-based-prediction/</guid>
      <description>Music Understanding | 8.0/10</description>
    </item>
    <item>
      <title>Audio Video Verbal Analysis (AVVA) for Capturing Classroom Dialogues</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-audio-video-verbal-analysis-avva-for-capturing/</link>
      <pubDate>Mon, 27 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-audio-video-verbal-analysis-avva-for-capturing/</guid>
      <description>Audio Question Answering | 6.0/10</description>
    </item>
    <item>
      <title>Beyond Acoustic Sparsity and Linguistic Bias: A Prompt-Free Paradigm for Mispronunciation Detection and Diagnosis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-beyond-acoustic-sparsity-and-linguistic-bias-a/</link>
      <pubDate>Mon, 27 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-beyond-acoustic-sparsity-and-linguistic-bias-a/</guid>
      <description>Mispronunciation Detection | 8.5/10</description>
    </item>
    <item>
      <title>DM-ASR: Diarization-aware Multi-speaker ASR with Large Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-dm-asr-diarization-aware-multi-speaker-asr-with/</link>
      <pubDate>Mon, 27 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-dm-asr-diarization-aware-multi-speaker-asr-with/</guid>
      <description>Speaker Recognition | 8.0/10</description>
    </item>
    <item>
      <title>Earable Platform with Integrated Simultaneous EEG Sensing and Auditory Stimulation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-earable-platform-with-integrated-simultaneous-eeg/</link>
      <pubDate>Mon, 27 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-earable-platform-with-integrated-simultaneous-eeg/</guid>
      <description>Audio Event Detection | 5.5/10</description>
    </item>
    <item>
      <title>Full-Duplex Interaction in Spoken Dialogue Systems: A Comprehensive Study from the ICASSP 2026 HumDial Challenge</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-full-duplex-interaction-in-spoken-dialogue/</link>
      <pubDate>Mon, 27 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-full-duplex-interaction-in-spoken-dialogue/</guid>
      <description>Spoken Dialogue Systems | 6.5/10</description>
    </item>
    <item>
      <title>Identifying and typifying demographic unfairness in phoneme-level embeddings of self-supervised speech recognition models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-identifying-and-typifying-demographic-unfairness/</link>
      <pubDate>Mon, 27 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-identifying-and-typifying-demographic-unfairness/</guid>
      <description>Speech Recognition | 7.0/10</description>
    </item>
    <item>
      <title>Listening with Time: Precise Temporal Awareness for Long-Form Audio Understanding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-listening-with-time-precise-temporal-awareness/</link>
      <pubDate>Mon, 27 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-listening-with-time-precise-temporal-awareness/</guid>
      <description>Audio Scene Understanding | 8.0/10</description>
    </item>
    <item>
      <title>Spectrographic Portamento Gradient Analysis: A Quantitative Method for Historical Cello Recordings with Application to Beethoven&#39;s Piano and Cello Sonatas, 1930--2012</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-spectrographic-portamento-gradient-analysis-a/</link>
      <pubDate>Mon, 27 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-spectrographic-portamento-gradient-analysis-a/</guid>
      <description>Music Information Retrieval | 7.5/10</description>
    </item>
    <item>
      <title>Transformer-Based Rhythm Quantization of Performance MIDI Using Beat Annotations</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-transformer-based-rhythm-quantization-of/</link>
      <pubDate>Mon, 27 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-transformer-based-rhythm-quantization-of/</guid>
      <description>Music Information Retrieval | 8.0/10</description>
    </item>
    <item>
      <title>TTS-PRISM: A Perceptual Reasoning and Interpretable Speech Model for Fine-Grained Diagnosis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-tts-prism-a-perceptual-reasoning-and/</link>
      <pubDate>Mon, 27 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-tts-prism-a-perceptual-reasoning-and/</guid>
      <description>Speech Quality Assessment | 7.5/10</description>
    </item>
    <item>
      <title>UniSonate: A Unified Model for Speech, Music, and Sound Effect Generation with Text Instructions</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-unisonate-a-unified-model-for-speech-music-and/</link>
      <pubDate>Mon, 27 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-unisonate-a-unified-model-for-speech-music-and/</guid>
      <description>Audio Generation | 8.5/10</description>
    </item>
    <item>
      <title>Speech/Audio Paper Digest 2026-04-27</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27/</link>
      <pubDate>Mon, 27 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27/</guid>
      <description>13 speech/AI papers analyzed in total</description>
    </item>
    <item>
      <title>MAGIC-TTS: Fine-Grained Controllable Speech Synthesis with Explicit Local Duration and Pause Control</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-25-magic-tts-fine-grained-controllable-speech/</link>
      <pubDate>Sat, 25 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-25-magic-tts-fine-grained-controllable-speech/</guid>
      <description>Speech Synthesis | 7.5/10</description>
    </item>
    <item>
      <title>MOMO: A framework for seamless physical, verbal, and graphical robot skill learning and adaptation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-25-momo-a-framework-for-seamless-physical-verbal-and/</link>
      <pubDate>Sat, 25 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-25-momo-a-framework-for-seamless-physical-verbal-and/</guid>
      <description>Robot Skill Learning | 7.5/10</description>
    </item>
    <item>
      <title>Speech/Audio Paper Digest 2026-04-25</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-25/</link>
      <pubDate>Sat, 25 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-25/</guid>
      <description>2 speech/AI papers analyzed in total</description>
    </item>
    <item>
      <title>&#34;This Wasn&#39;t Made for Me&#34;: Recentering User Experience and Emotional Impact in the Evaluation of ASR Bias</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-this-wasnt-made-for-me-recentering-user/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-this-wasnt-made-for-me-recentering-user/</guid>
      <description>Speech Recognition | 7.0/10</description>
    </item>
    <item>
      <title>ATRIE: Adaptive Tuning for Robust Inference and Emotion in Persona-Driven Speech Synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-atrie-adaptive-tuning-for-robust-inference-and/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-atrie-adaptive-tuning-for-robust-inference-and/</guid>
      <description>Speech Synthesis | 7.0/10</description>
    </item>
    <item>
      <title>AUDITA: A New Dataset to Audit Humans vs. AI Skill at Audio QA</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-audita-a-new-dataset-to-audit-humans-vs-ai-skill/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-audita-a-new-dataset-to-audit-humans-vs-ai-skill/</guid>
      <description>Audio Question Answering | 6.5/10</description>
    </item>
    <item>
      <title>Beyond Rules: Towards Basso Continuo Personal Style Identification</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-beyond-rules-towards-basso-continuo-personal/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-beyond-rules-towards-basso-continuo-personal/</guid>
      <description>Music Understanding | 7.0/10</description>
    </item>
    <item>
      <title>DiariZen Explained: A Tutorial for the Open Source State-of-the-Art Speaker Diarization Pipeline</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-diarizen-explained-a-tutorial-for-the-open-source/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-diarizen-explained-a-tutorial-for-the-open-source/</guid>
      <description>Speaker Diarization | 6.5/10</description>
    </item>
    <item>
      <title>Dilated CNNs for Periodic Signal Processing: A Low-Complexity Approach</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-dilated-cnns-for-periodic-signal-processing-a-low/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-dilated-cnns-for-periodic-signal-processing-a-low/</guid>
      <description>Speech Enhancement | 6.5/10</description>
    </item>
    <item>
      <title>Do LLM Decoders Listen Fairly? Benchmarking How Language Model Priors Shape Bias in Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-do-llm-decoders-listen-fairly-benchmarking-how/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-do-llm-decoders-listen-fairly-benchmarking-how/</guid>
      <description>语音识别 | 7.5/10</description>
    </item>
    <item>
      <title>Evaluation of Automatic Speech Recognition Using Generative Large Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-evaluation-of-automatic-speech-recognition-using/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-evaluation-of-automatic-speech-recognition-using/</guid>
      <description>语音识别 | 7.5/10</description>
    </item>
    <item>
      <title>Full-Duplex Interaction in Spoken Dialogue Systems: A Comprehensive Study from the ICASSP 2026 HumDial Challenge</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-full-duplex-interaction-in-spoken-dialogue/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-full-duplex-interaction-in-spoken-dialogue/</guid>
      <description>语音对话系统 | 6.5/10</description>
    </item>
    <item>
      <title>Hierarchical Policy Optimization for Simultaneous Translation of Unbounded Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-hierarchical-policy-optimization-for-simultaneous/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-hierarchical-policy-optimization-for-simultaneous/</guid>
      <description>语音翻译 | 7.5/10</description>
    </item>
    <item>
      <title>Low-Rank Adaptation Redux for Large Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-low-rank-adaptation-redux-for-large-models/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-low-rank-adaptation-redux-for-large-models/</guid>
      <description>大语言模型 | 5.5/10</description>
    </item>
    <item>
      <title>MAGIC-TTS: Fine-Grained Controllable Speech Synthesis with Explicit Local Duration and Pause Control</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-magic-tts-fine-grained-controllable-speech/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-magic-tts-fine-grained-controllable-speech/</guid>
      <description>语音合成 | 7.5/10</description>
    </item>
    <item>
      <title>Materialistic RIR: Material Conditioned Realistic RIR Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-materialistic-rir-material-conditioned-realistic/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-materialistic-rir-material-conditioned-realistic/</guid>
      <description>音频生成 | 7.5/10</description>
    </item>
    <item>
      <title>MER 2026: From Discriminative Emotion Recognition to Generative Emotion Understanding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-mer-2026-from-discriminative-emotion-recognition/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-mer-2026-from-discriminative-emotion-recognition/</guid>
      <description>语音情感识别 | 6.0/10</description>
    </item>
    <item>
      <title>Misinformation Span Detection in Videos via Audio Transcripts</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-misinformation-span-detection-in-videos-via-audio/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-misinformation-span-detection-in-videos-via-audio/</guid>
      <description>音频安全 | 7.5/10</description>
    </item>
    <item>
      <title>Phonological Subspace Collapse Is Aetiology-Specific and Cross-Lingually Stable: Evidence from 3,374 Speakers</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-phonological-subspace-collapse-is-aetiology/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-phonological-subspace-collapse-is-aetiology/</guid>
      <description>Phonological Subspace Collapse Is Aetiology-Specific and Cross-Lingually Stable: Evidence from 3,374 Speakers</description>
    </item>
    <item>
      <title>Preferences of a Voice-First Nation: Large-Scale Pairwise Evaluation and Preference Analysis for TTS in Indian Languages</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-preferences-of-a-voice-first-nation-large-scale/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-preferences-of-a-voice-first-nation-large-scale/</guid>
      <description>语音合成 | 7.5/10</description>
    </item>
    <item>
      <title>Prosody as Supervision: Bridging the Non-Verbal--Verbal for Multilingual Speech Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-prosody-as-supervision-bridging-the-non-verbal/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-prosody-as-supervision-bridging-the-non-verbal/</guid>
      <description>语音情感识别 | 8.0/10</description>
    </item>
    <item>
      <title>Sema: Semantic Transport for Real-Time Multimodal Agents</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-sema-semantic-transport-for-real-time-multimodal/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-sema-semantic-transport-for-real-time-multimodal/</guid>
      <description>实时处理 | 6.5/10</description>
    </item>
    <item>
      <title>Time vs. Layer: Locating Predictive Cues for Dysarthric Speech Descriptors in wav2vec 2.0</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-time-vs-layer-locating-predictive-cues-for/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-time-vs-layer-locating-predictive-cues-for/</guid>
      <description>语音生物标志物 | 7.0/10</description>
    </item>
    <item>
      <title>Video-Robin: Autoregressive Diffusion Planning for Intent-Grounded Video-to-Music Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-video-robin-autoregressive-diffusion-planning-for/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-video-robin-autoregressive-diffusion-planning-for/</guid>
      <description>音乐生成 | 7.0/10</description>
    </item>
    <item>
      <title>语音/音频论文速递 2026-04-24</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24/</guid>
      <description>共分析 21 篇语音/AI 论文</description>
    </item>
    <item>
      <title>Aligning Stuttered-Speech Research with End-User Needs: Scoping Review, Survey, and Guidelines</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-aligning-stuttered-speech-research-with-end-user/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-aligning-stuttered-speech-research-with-end-user/</guid>
      <description>1.  **问题**：当前口吃语音技术研究与口吃者（PWS）及言语语言病理学家（SLP）的实际需求存在系统性脱节，研究重点、任务定义和评估方法未能充分以用户为中心。 2.  **方法核心**：通过两部分结合分析：1）对228篇相关论文进行范围综述，提出研究任务分类法并分析研究现状；2）对70名利益相</description>
    </item>
    <item>
      <title>ATIR: Towards Audio-Text Interleaved Contextual Retrieval</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-atir-towards-audio-text-interleaved-contextual/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-atir-towards-audio-text-interleaved-contextual/</guid>
      <description>这篇论文旨在解决现有音频-文本检索方法无法处理查询和文档中音频与文本交错出现（如多轮对话、混合输入）的局限性。为此，作者定义了音频-文本交错上下文检索（ATIR）任务，并构建了一个包含约8.8万对样本的大规模基准。为解决直接应用多模态大语言模型（MLLM）时音频token冗余导致的效率和精度问题，论</description>
    </item>
    <item>
      <title>Before the Mic: Physical-Layer Voiceprint Anonymization with Acoustic Metamaterials</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-before-the-mic-physical-layer-voiceprint/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-before-the-mic-physical-layer-voiceprint/</guid>
      <description>这篇论文针对在公共场景（如会议、演讲）中，不可信录音设备可能导致声纹泄露且事后无法补救的问题，提出了EchoMask——首个基于声学超材料的物理层实时声纹匿名化系统。其核心方法是在声音到达麦克风前，通过精心设计的被动声学结构对特定低频段（300-700Hz）进行选择性干扰，该频段对说话人识别至关重要</description>
    </item>
    <item>
      <title>Centering Ecological Goals in Automated Identification of Individual Animals</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-centering-ecological-goals-in-automated/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-centering-ecological-goals-in-automated/</guid>
      <description>这篇论文旨在解决一个关键问题：为什么近年来在动物个体自动识别（基于图像或声音）上报告的高准确率算法，却很少转化为生态学实践中的常规工具？其方法核心是提出一个“以生态目标为中心”的评估与部署框架，强调自动化识别的有用性取决于其服务的具体生态问题、可用数据以及错误类型带来的实际后果。与以往主要关注算法准</description>
    </item>
    <item>
      <title>CoInteract: Physically-Consistent Human-Object Interaction Video Synthesis via Spatially-Structured Co-Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-cointeract-physically-consistent-human-object/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-cointeract-physically-consistent-human-object/</guid>
      <description>1. **问题**：现有视频扩散模型在生成人机交互（HOI）视频时，常出现手/脸结构崩溃和人机物理穿透等问题，根源在于模型缺乏对3D空间关系和交互结构的理解。 2. **方法核心**：提出CoInteract框架，核心是“空间结构化协同生成”范式。在一个共享的DiT骨干中联合训练RGB外观流和辅助的</description>
    </item>
    <item>
      <title>Deep Hierarchical Knowledge Loss for Fault Intensity Diagnosis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-deep-hierarchical-knowledge-loss-for-fault/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-deep-hierarchical-knowledge-loss-for-fault/</guid>
      <description>1.  **要解决什么问题**：传统故障强度诊断方法将各类故障视为独立标签，忽略了物理状态之间固有的层次依赖关系（如“空化”是“初期空化”、“稳定空化”等的父类），这限制了模型的性能和鲁棒性。 2.  **方法核心是什么**：提出一个名为DHK的通用框架，其核心是设计两个新的损失函数：**层次树损失</description>
    </item>
    <item>
      <title>Embedding-Based Intrusive Evaluation Metrics for Musical Source Separation Using MERT Representations</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-embedding-based-intrusive-evaluation-metrics-for/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-embedding-based-intrusive-evaluation-metrics-for/</guid>
      <description>1. **问题**：音乐源分离（MSS）领域常用的客观评估指标（BSS-Eval）与人类感知评分相关性较低，导致模型评估不够准确。 2. **方法核心**：提出两种基于嵌入的侵入式评估指标：在预训练MERT模型的嵌入空间上计算目标与分离信号的均方误差（MSE_MERT）和一种逐曲目的Fréchet音</description>
    </item>
    <item>
      <title>Enhancing ASR Performance in the Medical Domain for Dravidian Languages</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-enhancing-asr-performance-in-the-medical-domain/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-enhancing-asr-performance-in-the-medical-domain/</guid>
      <description>这篇论文旨在解决达罗毗荼语言（Telugu和Kannada）在医疗领域自动语音识别（ASR）中面临的标注数据稀缺和语言形态复杂两大挑战。其核心方法是提出一个“置信度感知训练框架”，该框架通过一个混合置信度评分机制（结合静态的感知、声学相似性、WER分数和动态的模型熵），对混合了真实与合成语音的训练数</description>
    </item>
    <item>
      <title>Enhancing Speaker Verification with Whispered Speech via Post-Processing</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-enhancing-speaker-verification-with-whispered/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-enhancing-speaker-verification-with-whispered/</guid>
      <description>1. **问题**：耳语语音因缺乏声带振动，其声学特征与正常语音差异显著，导致现有的说话人验证系统性能严重下降。这在用户为保护隐私而低语、或因疾病无法正常发声等实际场景中构成挑战。 2. **方法核心**：在预训练的说话人验证骨干网络（ReDimNet-B6）之上，添加一个轻量级的编码器-解码器结构</description>
    </item>
    <item>
      <title>Environmental Sound Deepfake Detection Using Deep-Learning Framework</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-environmental-sound-deepfake-detection-using-deep/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-environmental-sound-deepfake-detection-using-deep/</guid>
      <description>1.  **问题**：针对环境声音（包括声音场景和声音事件）的深度伪造检测（ESDD）任务，现有研究不足，且尚不清楚声音场景与声音事件的伪造检测是否需要不同模型。 2.  **方法核心**：提出一个深度学习框架，核心是采用预训练的音频模型（BEATs）作为特征提取器，并结合一种三阶段训练策略（包含对</description>
    </item>
    <item>
      <title>Explicit Dropout: Deterministic Regularization for Transformer Architectures</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-explicit-dropout-deterministic-regularization-for/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-explicit-dropout-deterministic-regularization-for/</guid>
      <description>这篇论文旨在解决传统Dropout方法依赖随机掩码、正则化效果不透明且难以精确控制的问题。其核心方法是提出一种确定性公式，将Dropout重新表述为一个可直接加入训练损失函数的显式正则化项，并推导出了适用于Transformer架构中注意力机制（Q、K、V）和前馈网络的正则化表达式。与已有方法相比，</description>
    </item>
    <item>
      <title>FastTurn: Unifying Acoustic and Streaming Semantic Cues for Low-Latency and Robust Turn Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-fastturn-unifying-acoustic-and-streaming-semantic/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-fastturn-unifying-acoustic-and-streaming-semantic/</guid>
      <description>这篇论文针对全双工语音对话系统中需要低延迟、高精度判断用户是否结束发言（轮次检测）的难题，提出了FastTurn统一框架。其核心方法是将流式CTC解码提供的快速部分语义信息，与Conformer编码器提取的声学特征，通过适配器输入给大语言模型（LLM）进行推理，并最终融合声学与语义特征进行轮次预测。</description>
    </item>
    <item>
      <title>FLiP: Towards understanding and interpreting multimodal multilingual sentence embeddings</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-flip-towards-understanding-and-interpreting/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-flip-towards-understanding-and-interpreting/</guid>
      <description>这篇论文旨在解决对多语言、多模态句子嵌入（如SONAR, LaBSE）的可解释性问题。核心方法是提出一种称为因子化线性投影（FLiP）的模型，通过将嵌入向量线性投影到词汇表空间来提取关键词，以此作为理解嵌入内容的代理任务。与之前非因子化的线性探测方法（如LiP）和SpLiCE相比，FLiP在关键词提</description>
    </item>
    <item>
      <title>Indic-CodecFake meets SATYAM: Towards Detecting Neural Audio Codec Synthesized Speech Deepfakes in Indic Languages</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-indic-codecfake-meets-satyam-towards-detecting/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-indic-codecfake-meets-satyam-towards-detecting/</guid>
      <description>1.  **问题**：现有针对基于神经音频编解码器的语音深度伪造（CodecFake）检测的研究主要集中在英语和中文，对于语言多样性极高的印度语言缺乏大规模的基准数据集和有效的检测方法。 2.  **方法**：作者构建了首个大规模印度语言CodecFake数据集（ICF），并提出了一个名为SATYA</description>
    </item>
    <item>
      <title>MOMO: A framework for seamless physical, verbal, and graphical robot skill learning and adaptation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-momo-a-framework-for-seamless-physical-verbal-and/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-momo-a-framework-for-seamless-physical-verbal-and/</guid>
      <description>1. **问题**：工业机器人需要频繁适应新任务和环境，但现有技能调整方法（如手动重编程）对非专家用户不友好，且单一交互模态无法高效处理所有类型的调整需求。 2. **方法核心**：提出MOMO框架，集成三种互补交互模态：动觉接触（用于精确空间修正）、自然语言（用于高层语义修改）和图形界面（用于参数</description>
    </item>
    <item>
      <title>MoVE: Translating Laughter and Tears via Mixture of Vocalization Experts in Speech-to-Speech Translation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-move-translating-laughter-and-tears-via-mixture/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-move-translating-laughter-and-tears-via-mixture/</guid>
      <description>这篇论文旨在解决语音到语音翻译（S2ST）系统普遍丢失源语音中非语言声音（如笑声、哭声）和情感信息的问题，这严重影响了跨语言交流的自然度和准确性。为此，作者提出了三项核心贡献：首先，设计了一个可扩展的自动化数据合成管道，用于生成大规模、高质量的英中富有表现力S2ST平行语料，克服了训练数据稀缺的瓶颈</description>
    </item>
    <item>
      <title>ONOTE: Benchmarking Omnimodal Notation Processing for Expert-level Music Intelligence</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-onote-benchmarking-omnimodal-notation-processing/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-onote-benchmarking-omnimodal-notation-processing/</guid>
      <description>1.  **问题**：当前多模态大模型在音乐符号处理（Omnimodal Notation Processing, ONP）领域存在严重缺陷：研究碎片化、模型存在严重的符号偏差（偏向五线谱）、且普遍依赖不可靠的“LLM-as-a-Judge”评估方法，掩盖了模型在音乐理论推理上的系统性失败。 2. </description>
    </item>
    <item>
      <title>Qwen3.5-Omni Technical Report</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-qwen35-omni-technical-report/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-qwen35-omni-technical-report/</guid>
      <description>这篇论文介绍了Qwen3.5-Omni，一个支持文本、图像、音频和音频-视频输入的全模态大语言模型。为解决现有模型在实时交互、跨模态推理和工具使用上的不足，其核心方法是采用“Thinker-Talker”架构，并引入混合专家（MoE）设计以提升效率。与前代相比，主要创新在于：1）模型规模扩展至数千亿</description>
    </item>
    <item>
      <title>Reducing the Offline-Streaming Gap for Unified ASR Transducer with Consistency Regularization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-reducing-the-offline-streaming-gap-for-unified/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-reducing-the-offline-streaming-gap-for-unified/</guid>
      <description>1. **问题**：训练一个既能高精度离线转录又能低延迟流式识别的统一ASR模型极具挑战性，传统方法在低延迟下性能会急剧下降。 2. **方法核心**：提出一个统一的Transducer框架，结合分块注意力（含右上下文）和动态块卷积（DCConv）来适配两种模式。核心创新是引入了模式一致性正则化损失</description>
    </item>
    <item>
      <title>SAND: The Challenge on Speech Analysis for Neurodegenerative Disease Assessment</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-sand-the-challenge-on-speech-analysis-for/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-sand-the-challenge-on-speech-analysis-for/</guid>
      <description>1.  **解决的问题**：针对神经退行性疾病（特别是肌萎缩侧索硬化症ALS）的早期诊断和监测，缺乏大规模、有临床标注的语音数据集，以及标准化的算法评估框架。 2.  **方法核心**：构建并发布了名为SAND的挑战赛，其核心是提供一个扩展的、包含纵向数据的ALS患者与健康对照语音数据集（VOC-A</description>
    </item>
    <item>
      <title>Self-Noise Reduction for Capacitive Sensors via Photoelectric DC Servo: Application to Condenser Microphones</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-self-noise-reduction-for-capacitive-sensors-via/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-self-noise-reduction-for-capacitive-sensors-via/</guid>
      <description>1. **问题**：电容式传感器（如ECM麦克风）的自噪声主要源于前置放大器中用于建立直流偏置的门极电阻（Rm）的热噪声。该电阻同时决定了噪声的低通截止频率和信号的高通截止频率，形成了一个难以调和的噪声-带宽权衡。 2. **方法核心**：提出PDS-Amp（光电直流伺服放大器），用基于外部光电效应</description>
    </item>
    <item>
      <title>SpeechParaling-Bench: A Comprehensive Benchmark for Paralinguistic-Aware Speech Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-speechparaling-bench-a-comprehensive-benchmark/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-speechparaling-bench-a-comprehensive-benchmark/</guid>
      <description>1.  **问题**：现有大型音频语言模型在副语言（如情绪、语气、音色）生成与理解能力上的评估存在特征覆盖不全、评估方法主观且不可扩展的问题。 2.  **方法**：提出了SpeechParaling-Bench，一个包含1000余个中英平行语音查询、覆盖超过100个细粒度副语言特征的综合基准。基准</description>
    </item>
    <item>
      <title>Tadabur: A Large-Scale Quran Audio Dataset</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-tadabur-a-large-scale-quran-audio-dataset/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-tadabur-a-large-scale-quran-audio-dataset/</guid>
      <description>1. **问题**：现有的古兰经语音数据集在规模、诵读者多样性、音频质量和标注深度上存在严重不足，限制了古兰经ASR、诵读者识别等任务的研究进展。 2. **方法核心**：提出Tadabur数据集及其构建流水线。流水线核心是“古兰经经文对齐模块”（AAM），它结合WhisperX进行初步转录，再利用</description>
    </item>
    <item>
      <title>Text-To-Speech with Chain-of-Details: modeling temporal dynamics in speech generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-text-to-speech-with-chain-of-details-modeling/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-text-to-speech-with-chain-of-details-modeling/</guid>
      <description>1. **问题**：现有基于离散token的TTS模型，其“粗到细”的生成范式主要体现在从语义token到声学token的转换，而对语音固有的时间动态（temporal dynamics）缺乏显式建模。 2. **方法核心**：提出Chain-of-Details (CoD)框架，将语音生成分解为多</description>
    </item>
    <item>
      <title>Towards Streaming Target Speaker Extraction via Chunk-wise Interleaved Splicing of Autoregressive Language Model</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-towards-streaming-target-speaker-extraction-via/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-towards-streaming-target-speaker-extraction-via/</guid>
      <description>1.  **要解决什么问题**：现有基于生成模型（如扩散模型、自回归模型）的目标说话人提取（TSE）方法依赖全局上下文，难以直接用于实时流式场景，强行适配会导致性能严重下降。 2.  **方法核心是什么**：提出首个面向流式TSE的自回归（AR）框架，核心是“分块交错拼接范式”。该范式将混合语音分块</description>
    </item>
    <item>
      <title>Utterance-Level Methods for Identifying Reliable ASR-Output for Child Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-utterance-level-methods-for-identifying-reliable/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-utterance-level-methods-for-identifying-reliable/</guid>
      <description>1.  **要解决什么问题**：儿童语音自动识别（ASR）错误率高，影响语言学习、阅读辅助等应用。传统置信度估计方法在噪声大、模式多变的儿童语音上可能失效。需要一种在转录后（utterance级别）自动识别哪些ASR输出是可靠的方法，以减少人工审核负担。 2.  **方法核心是什么**：提出两种基于</description>
    </item>
    <item>
      <title>X-VC: Zero-shot Streaming Voice Conversion in Codec Space</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-x-vc-zero-shot-streaming-voice-conversion-in/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-x-vc-zero-shot-streaming-voice-conversion-in/</guid>
      <description>1. **问题**：零样本语音转换需要同时实现高质量的说话人特征迁移和低延迟的流式推理，这是一个尚未很好解决的挑战。 2. **方法核心**：提出X-VC系统，在预训练的SAC语音编解码器的潜在空间中进行一步转换。核心是一个双条件声学转换器，它联合处理源语音的编解码器潜在表示和目标参考语音的帧级梅尔</description>
    </item>
    <item>
      <title>语音/音频论文速递 2026-04-23</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23/</guid>
      <description>共分析 27 篇语音/AI 论文</description>
    </item>
    <item>
      <title>APRVOS: 1st Place Winner of 5th PVUW MeViS-Audio Track</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-aprvos-1st-place-winner-of-5th-pvuw-mevis-audio/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-aprvos-1st-place-winner-of-5th-pvuw-mevis-audio/</guid>
      <description>这篇论文报告了APRVOS系统，一个专为MeViS-Audio（音频条件下的指代视频对象分割）任务设计的冠军方案。**要解决的问题**是传统文本指代分割模型无法直接处理包含噪声、不完整且可能描述视频中不存在物体的语音输入。**采用的方法**是一个四阶段流水线：首先使用VibeVoice-ASR将语音</description>
    </item>
    <item>
      <title>ATRIE: Adaptive Tuning for Robust Inference and Emotion in Persona-Driven Speech Synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-atrie-adaptive-tuning-for-robust-inference-and/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-atrie-adaptive-tuning-for-robust-inference-and/</guid>
      <description>本文针对现有语音合成系统在生成角色驱动、情感丰富的语音时难以同时保持角色身份一致性和情感表达准确性的问题，提出了ATRIE框架。其核心是**Persona-Prosody Dual-Track (P2-DT) 架构**，将语音生成解耦为静态的**音色轨道**（通过标量量化保持身份锚点）和动态的**韵</description>
    </item>
    <item>
      <title>Audio Spoof Detection with GaborNet</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-audio-spoof-detection-with-gabornet/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-audio-spoof-detection-with-gabornet/</guid>
      <description>本论文旨在解决传统SincNet前端在音频伪造检测中因有限长度sinc函数截断导致的频率泄漏问题。作者提出使用可学习的Gabor滤波器组（GaborNet）替代SincNet，并将其集成到两种先进的端到端检测架构RawNet2和RawGAT-ST中。同时，论文探索了将LEAF（Learnable F</description>
    </item>
    <item>
      <title>BEAT: Tokenizing and Generating Symbolic Music by Uniform Temporal Steps</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-beat-tokenizing-and-generating-symbolic-music-by/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-beat-tokenizing-and-generating-symbolic-music-by/</guid>
      <description>本文针对符号音乐生成中主流的事件序列（event-based）tokenization方法隐含处理时间规律、导致模型需额外学习时间网格的问题，提出了一种名为**BEAT**的新型网格化tokenization框架。其核心思想是将音乐在时间上均匀离散化为“拍”（beat）作为基本单位，将每拍内每个音高</description>
    </item>
    <item>
      <title>Benign Fine-Tuning Breaks Safety Alignment in Audio LLMs</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-benign-fine-tuning-breaks-safety-alignment-in/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-benign-fine-tuning-breaks-safety-alignment-in/</guid>
      <description>这篇论文首次系统研究了良性（无害）音频数据微调对音频大模型安全对齐的破坏作用。**要解决的问题**是：用户出于提升模型性能目的进行的常规微调，是否会无意中破坏模型的安全防护？**方法**上，作者提出了一个基于嵌入空间邻近度的过滤框架，从语义、声学及混合维度，选择性地用与有害内容在表示空间上相近的良性</description>
    </item>
    <item>
      <title>Comparison of sEMG Encoding Accuracy Across Speech Modes Using Articulatory and Phoneme Features</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-comparison-of-semg-encoding-accuracy-across/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-comparison-of-semg-encoding-accuracy-across/</guid>
      <description>这篇论文旨在为无声言语接口（SSI）选择更优的中间表示目标。研究系统比较了发音特征（SPARC）和传统的音素独热编码，在预测表面肌电（sEMG）信号包络上的表现。核心发现是：1）在出声、默语和次发声三种模式下，SPARC特征的编码准确性均显著优于音素特征；2）出声和默语模式的编码性能相当，次发声模式</description>
    </item>
    <item>
      <title>Deep Supervised Contrastive Learning of Pitch Contours for Robust Pitch Accent Classification in Seoul Korean</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-deep-supervised-contrastive-learning-of-pitch/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-deep-supervised-contrastive-learning-of-pitch/</guid>
      <description>这篇论文旨在解决将连续变化的基频（F0）曲线映射到首尔韩语中离散、不变的音高重音类别（如LHLH, HHLH）这一难题。传统方法易受F0测量噪声和说话人差异的影响。为此，作者提出了**Dual-Glob**，一个深度监督对比学习框架。其核心是通过一个**双分支（干净视图和增强视图）编码器**，在共享</description>
    </item>
    <item>
      <title>Detecting Hallucinations in SpeechLLMs at Inference Time Using Attention Maps</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-detecting-hallucinations-in-speechllms-at/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-detecting-hallucinations-in-speechllms-at/</guid>
      <description>本文旨在解决语音大模型（SpeechLLMs）在推理时产生的“幻觉”问题，即生成与输入音频不符的流畅文本。现有方法依赖昂贵的黄金标准输出，而文本LLM的方法无法捕捉音频特有信号。为此，作者提出了四个基于注意力图的轻量级指标（AudioRatio, AudioConsistency, AudioEnt</description>
    </item>
    <item>
      <title>Disentangling Damage from Operational Variability: A Label-Free Self-Supervised Representation Learning Framework for Output-Only Structural Damage Identification</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-disentangling-damage-from-operational-variability/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-disentangling-damage-from-operational-variability/</guid>
      <description>本文针对结构健康监测中损伤信号易被环境与操作变异掩盖的核心挑战，提出了一种**无标签、自监督的解缠表示学习框架**。该框架采用双流自编码器架构，通过**时间序列重构损失**确保信息完整性，并利用**VICReg自监督损失**（基于假设损伤状态不变的基线期数据）强制损伤敏感表征（`z_dmg`）对操作</description>
    </item>
    <item>
      <title>Environmental Sound Deepfake Detection Using Deep-Learning Framework</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-environmental-sound-deepfake-detection-using-deep/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-environmental-sound-deepfake-detection-using-deep/</guid>
      <description>本文针对环境声音（如声音事件、声音场景）的深度伪造检测这一新兴任务，提出了一个系统的深度学习框架。**核心贡献**在于通过大量实验，系统评估了不同频谱图（MEL, CQT, Gammatone）、多种CNN架构（ResNet, Inception等）以及预训练模型（BEATs）在该任务上的表现，并验</description>
    </item>
    <item>
      <title>HalluAudio: A Comprehensive Benchmark for Hallucination Detection in Large Audio-Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-halluaudio-a-comprehensive-benchmark-for/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-halluaudio-a-comprehensive-benchmark-for/</guid>
      <description>This paper tackles the lack of systematic evaluation tools for the pervasive "hallucination" problem in large audio-language models (LALMs), i.e., generating content that contradicts the audio evidence. The authors build and release **HalluAudio**, the first large-scale, multi-domain (speech, environmental sound, music), multi-task (binary classification, multiple choice, attribute verification, open-ended generation) human-verified benchmark for audio hallucination detection, containing over…</description>
    </item>
    <item>
      <title>MoVE: Translating Laughter and Tears via Mixture of Vocalization Experts in Speech-to-Speech Translation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-move-translating-laughter-and-tears-via-mixture/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-move-translating-laughter-and-tears-via-mixture/</guid>
      <description>MoVE: Translating Laughter and Tears via Mixture of Vocalization Experts in Speech-to-Speech Translation</description>
    </item>
    <item>
      <title>MTR-DuplexBench: Towards a Comprehensive Evaluation of Multi-Round Conversations for Full-Duplex Speech Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-mtr-duplexbench-towards-a-comprehensive/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-mtr-duplexbench-towards-a-comprehensive/</guid>
      <description>This paper addresses a key gap in the evaluation of current full-duplex speech language models (FD-SLMs): the lack of systematic assessment of multi-round, continuous conversation ability. Existing benchmarks mostly target single-round interaction or specific conversational behaviors (e.g., interruptions), overlooking whether models maintain core capabilities such as instruction following and safety consistently across rounds. The authors propose **MTR-DuplexBench**, a new multi-round full-duplex…</description>
    </item>
    <item>
      <title>NVBench: A Benchmark for Speech Synthesis with Non-Verbal Vocalizations</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-nvbench-a-benchmark-for-speech-synthesis-with-non/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-nvbench-a-benchmark-for-speech-synthesis-with-non/</guid>
      <description>This paper addresses a key but overlooked problem in text-to-speech (TTS): how to standardize the evaluation of a system's ability to generate non-verbal vocalizations (NVVs, e.g., laughter, sighs). The authors propose **NVBench**, a bilingual (English/Chinese) benchmark built on a **unified taxonomy of 45 NVV classes**. Its core methodology includes: 1) a high-quality, balanced evaluation data… of 50 examples per class, 4,500 in total…</description>
    </item>
    <item>
      <title>Qwen3.5-Omni Technical Report</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-qwen35-omni-technical-report/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-qwen35-omni-technical-report/</guid>
      <description>This technical report presents Qwen3.5-Omni, an omni-modal large language model that unifies understanding and generation across text, images, audio, and audio-visual content. **Problem addressed**: the limitations of existing models in real-time interaction, cross-modal reasoning, and autonomous agent behavior. **Approach**: a "Thinker-Talker" architecture with several key innovations: 1) both the Thinker and the Talker adopt hybrid atten…</description>
    </item>
    <item>
      <title>Reducing the Offline-Streaming Gap for Unified ASR Transducer with Consistency Regularization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-reducing-the-offline-streaming-gap-for-unified/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-reducing-the-offline-streaming-gap-for-unified/</guid>
      <description>This paper addresses the challenge of training a single automatic speech recognition (ASR) model that efficiently supports both high-accuracy offline transcription and low-latency streaming recognition. Existing unified models degrade noticeably in low-latency streaming mode. The authors propose a unified RNN-Transducer (RNNT) framework whose core combines **chunk-restricted attention with right context** and **dynamic chunk convolution (DCCo…</description>
    </item>
    <item>
      <title>Tadabur: A Large-Scale Quran Audio Dataset</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-tadabur-a-large-scale-quran-audio-dataset/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-tadabur-a-large-scale-quran-audio-dataset/</guid>
      <description>This paper addresses the lack of large-scale, diverse, finely annotated datasets for Quranic speech research. The authors present the **Tadabur** dataset and its automated construction pipeline. The pipeline first collects audio from public platforms and uses a large language model (Gemini) to extract standardized metadata (e.g., surah, reciter) from unstructured text. Its core step is **Ayah Alignment…</description>
    </item>
    <item>
      <title>Text-To-Speech with Chain-of-Details: modeling temporal dynamics in speech generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-text-to-speech-with-chain-of-details-modeling/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-text-to-speech-with-chain-of-details-modeling/</guid>
      <description>For text-to-speech (TTS), this paper proposes a new framework called Chain-of-Details (CoD). **Problem addressed**: existing TTS methods fall short in modeling the temporal dynamics of speech generation, the progressive process from coarse timing to fine acoustic detail. **Method**: decompose speech generation into multiple stages of increasing temporal resolution, and at each stage use…</description>
    </item>
    <item>
      <title>Towards Streaming Target Speaker Extraction via Chunk-wise Interleaved Splicing of Autoregressive Language Model</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-towards-streaming-target-speaker-extraction-via/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-towards-streaming-target-speaker-extraction-via/</guid>
      <description>This paper addresses the core problem that generative target speaker extraction (TSE) models degrade severely in streaming real-time applications because they depend on global context. The authors present the first streaming TSE framework built on an autoregressive language model (LauraGPT). Its key innovation is a "chunk-wise interleaved splicing paradigm": by interleaving mixture-audio chunks with the corresponding discrete-token chunks of the target speech as model input, it strictly guarantees that inference…</description>
    </item>
    <item>
      <title>UAF: A Unified Audio Front-end LLM for Full-Duplex Speech Interaction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-uaf-a-unified-audio-front-end-llm-for-full-duplex/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-uaf-a-unified-audio-front-end-llm-for-full-duplex/</guid>
      <description>**Core contribution**: this paper proposes the first unified audio front-end LLM (UAF) designed specifically for full-duplex speech interaction. Breaking the traditional cascaded front-end paradigm, it unifies voice activity detection (VAD), speaker recognition (SR), automatic speech recognition (ASR), turn detection (TD), and question answering (QA) into a single autoregressive sequence-prediction problem.  **Key method**: the model adopts an "audio…</description>
    </item>
    <item>
      <title>Voice of India: A Large-Scale Benchmark for Real-World Speech Recognition in India</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-voice-of-india-a-large-scale-benchmark-for-real/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-voice-of-india-a-large-scale-benchmark-for-real/</guid>
      <description>This paper addresses the core problems that existing Indic ASR benchmarks do not reflect real-world conditions and use unfair evaluation methods. The authors build the large-scale "Voice of India" benchmark, sourced from unscripted phone conversations of 36,000 speakers, covering 15 major Indian languages and 139 regional clusters, 536 hours in total. The key innovation is a spelling-variant-aware…</description>
    </item>
    <item>
      <title>语音/音频论文速递 2026-04-22</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22/</guid>
      <description>21 speech/AI papers analyzed in total</description>
    </item>
    <item>
      <title>A novel LSTM music generator based on the fractional time-frequency feature extraction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-a-novel-lstm-music-generator-based-on-the/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-a-novel-lstm-music-generator-based-on-the/</guid>
      <description>This paper proposes a novel AI music generation system based on the fractional Fourier transform (FrFT) and long short-term memory (LSTM) networks. **Core goal**: use the FrFT to extract richer music-signal features in the fractional domain (a rotated representation of the time-frequency plane) than conventional time- or frequency-domain methods, addressing plain LSTMs' weakness in capturing music's complex time-frequency structure. **Key method**: the input music signal undergoes an FrFT transform…</description>
    </item>
    <item>
      <title>A state-space representation of the boundary integral equation for room acoustic modelling</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-a-state-space-representation-of-the-boundary/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-a-state-space-representation-of-the-boundary/</guid>
      <description>This paper addresses the problem that traditional room-acoustics modeling approaches (e.g., boundary element methods, delay networks, geometrical acoustics) are mutually independent and lack a unified theoretical basis. The authors propose a new framework called **Boundary Integral Operator State Space (BIOSS)**. Its core is to reformulate the boundary integral equation describing the sound field as a state-space model, in which **the state is the sound-pressure distribution function on the room boundary**, and the system dynamics are governed by *…</description>
    </item>
    <item>
      <title>Aligning Language Models for Lyric-to-Melody Generation with Rule-Based Musical Constraints</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-aligning-language-models-for-lyric-to-melody/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-aligning-language-models-for-lyric-to-melody/</guid>
      <description>This paper addresses the "constraint violation" problem in lyric-to-melody generation with large language models: models trained via supervised fine-tuning (SFT) often produce musically infeasible output (e.g., odd rhythms, out-of-range pitches). The **core contribution** is an automated alignment framework based on rule-based constraints that needs no human annotation. The **key method** has three steps: first, apply SFT to the pretrained LLM to obtain basic generation ability; second, us…</description>
    </item>
    <item>
      <title>Anonymization, Not Elimination: Utility-Preserved Speech Anonymization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-anonymization-not-elimination-utility-preserved/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-anonymization-not-elimination-utility-preserved/</guid>
      <description>Targeting the core tension in speech-data privacy between "privacy leakage" and "utility loss", this paper proposes a novel two-stage framework. First, to address the limited identity diversity and poor controllability of speaker anonymization (protecting "who is speaking"), it introduces a flow-matching-based speaker-embedding anonymizer (F3-VA) that generates diverse new identities well separated from the original speaker. Second, to address content anonymization (protecting "what…</description>
    </item>
    <item>
      <title>ArtifactNet: Detecting AI-Generated Music via Forensic Residual Physics</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-artifactnet-detecting-ai-generated-music-via/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-artifactnet-detecting-ai-generated-music-via/</guid>
      <description>This paper addresses the poor generalization that plagues AI-generated-music detection. Current mainstream methods (e.g., CLAM, SpecTTTra) learn the acoustic characteristics of AI music and degrade sharply on unseen generators. The authors propose a core hypothesis: today's mainstream AI music generators (e.g., Suno, Udio) all rely on the residual vector quantization of neural audio codecs (e.g., EnCodec)…</description>
    </item>
    <item>
      <title>Audio-Cogito: Towards Deep Audio Reasoning in Large Audio Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-audio-cogito-towards-deep-audio-reasoning-in/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-audio-cogito-towards-deep-audio-reasoning-in/</guid>
      <description>This paper addresses large audio language models' (LALMs) weak performance and opaque reasoning on complex audio reasoning tasks. The **core contribution** is a fully open-source solution named **Audio-Cogito**, centered on a four-stage automated data-construction pipeline, **Cogito-Pipe**, for generating high-quality, diverse audio chain-of-thought (CoT) da…</description>
    </item>
    <item>
      <title>Audio-DeepThinker: Progressive Reasoning-Aware Reinforcement Learning for High-Quality Chain-of-Thought Emergence in Audio Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-audio-deepthinker-progressive-reasoning-aware/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-audio-deepthinker-progressive-reasoning-aware/</guid>
      <description>This paper addresses large audio language models' (LALMs) lack of explicit, high-quality reasoning. Existing approaches are either limited by the quality of supervised data or use coarse rewards, yielding chains of thought that are well-formed but lack acoustic grounding. The authors propose the **Audio-DeepThinker** framework with three core contributions: 1) a **hybrid reasoning-similarity reward** combining LLM-based eval…</description>
    </item>
    <item>
      <title>AVRT: Audio-Visual Reasoning Transfer through Single-Modality Teachers</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-avrt-audio-visual-reasoning-transfer-through/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-avrt-audio-visual-reasoning-transfer-through/</guid>
      <description>This paper addresses the core challenge that multimodal LLMs lack high-quality training data for joint audio-visual reasoning. The **core contribution** is the AVRT framework, which synthesizes multimodal reasoning data by composing the abilities of single-modality expert models. The **key method** has two steps: 1) **data generation**: use a dedicated vision teacher (Kimi-VL-Thinking) and an audio teacher (Audio Flami…</description>
    </item>
    <item>
      <title>Benign Fine-Tuning Breaks Safety Alignment in Audio LLMs</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-benign-fine-tuning-breaks-safety-alignment-in/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-benign-fine-tuning-breaks-safety-alignment-in/</guid>
      <description>This paper presents the first systematic study of **how fine-tuning on benign audio data breaks the safety alignment of audio LLMs**. The core question: does fine-tuning a model on entirely harmless audio data for performance gains inadvertently weaken its ability to refuse harmful instructions? The authors propose an **embedding-space-proximity filtering framework** that computes the distance between benign and harmful audio in the model's internal space or an external reference encoder's space…</description>
    </item>
    <item>
      <title>BhashaSutra: A Task-Centric Unified Survey of Indian NLP Datasets, Corpora, and Resources</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-bhashasutra-a-task-centric-unified-survey-of/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-bhashasutra-a-task-centric-unified-survey-of/</guid>
      <description>This paper addresses the pain point that Indian-language NLP resources are scattered and lack a unified overview. The authors present the first task-centric unified taxonomy, systematically organizing and consolidating over 200 datasets, 50 benchmarks, and 100+ models, tools, and systems, covering everything from core language processing (e.g., tokenization, POS tagging) to text classification, generation and translation, information retrieval, speech and multimodality, and even socio-cultural tasks (…</description>
    </item>
    <item>
      <title>ClariCodec: Optimising Neural Speech Codes for 200bps Communication using Reinforcement Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-claricodec-optimising-neural-speech-codes-for/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-claricodec-optimising-neural-speech-codes-for/</guid>
      <description>For ultra-low-bitrate (200 bps) scenarios such as satellite and underwater communication, where conventional neural speech codecs sacrifice intelligibility by optimizing for reconstruction quality, this paper proposes ClariCodec. Its core method recasts the encoder's quantization as a stochastic policy and fine-tunes the encoder with reinforcement learning (RL) using word error rate (WER) as the reward signal, while freezing the decoder and the rest of the acoustic reconstruction pipeline. Experiments show…</description>
    </item>
    <item>
      <title>Coexisting Tempo Traditions in Beethoven&#39;s Piano and Cello Sonatas: A K-means Clustering Analysis of Recorded Performances, 1930-2012</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-coexisting-tempo-traditions-in-beethovens-piano/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-coexisting-tempo-traditions-in-beethovens-piano/</guid>
      <description>This paper challenges the single regression model commonly used in empirical studies of music performance, which tends to portray historical tempo change as a unidirectional, uniform process. The authors argue that such models obscure the coexistence of multiple performance traditions. The study applies K-means clustering to bar-by-bar tempo data from over one hundred movement recordings, made between 1930 and 2012, of Beethoven's five sonatas for piano and cello (Op. 5, 69, 102)…</description>
    </item>
    <item>
      <title>FLiP: Towards understanding and interpreting multimodal multilingual sentence embeddings</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-flip-towards-understanding-and-interpreting/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-flip-towards-understanding-and-interpreting/</guid>
      <description>This paper presents **FLiP**, a **factorized linear projection model** for **understanding and interpreting** multilingual, multimodal sentence-embedding spaces (e.g., SONAR, LaBSE, Gemini). The core idea is to reduce embedding-space interpretation to a **linear keyword-extraction task**: a simple linear projection recovers, from a sentence-embedding vector, the words that make up the sentence. Experiments show that well-trained…</description>
    </item>
    <item>
      <title>FreezeEmpath: Efficient Training for Empathetic Spoken Chatbots with Frozen LLMs</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-freezeempath-efficient-training-for-empathetic/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-freezeempath-efficient-training-for-empathetic/</guid>
      <description>This paper tackles three challenges in training empathetic spoken chatbots: **scarce empathetic speech data, weak model generalization, and degradation of the LLM's general abilities caused by fine-tuning**. The authors propose **FreezeEmpath**, an efficient end-to-end training framework. Its core method **freezes the base LLM** and adopts a **semantic-emotion decoupled encoding strategy**, using an independent semantic adapter and an emotion extractor to draw, from…</description>
    </item>
    <item>
      <title>From Reactive to Proactive: Assessing the Proactivity of Voice Agents via ProVoice-Bench</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-from-reactive-to-proactive-assessing-the/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-from-reactive-to-proactive-assessing-the/</guid>
      <description>This paper addresses the fact that existing voice-agent benchmarks focus mainly on reactive responses and neglect proactive perception and intervention. The authors propose **ProVoice-Bench**, the first benchmark framework dedicated to evaluating proactive voice agents. Through a multi-stage data-synthesis pipeline covering digital-state construction, scenario synthesis, dialogue generation, acoustic simulation, and conversation assembly, it builds 1,182 high-quality…</description>
    </item>
    <item>
      <title>Hard to Be Heard: Phoneme-Level ASR Analysis of Phonologically Complex, Low-Resource Endangered Languages</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-hard-to-be-heard-phoneme-level-asr-analysis-of/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-hard-to-be-heard-phoneme-level-asr-analysis-of/</guid>
      <description>This paper establishes the first speech recognition (ASR) benchmarks for Archi and Rutul, two phonologically highly complex, severely under-resourced endangered East Caucasian languages. The authors consolidate and standardize existing linguistic documentation to create speech-text datasets of roughly 50 minutes and 1 hour 20 minutes. They evaluate several frontier ASR models (wav2vec2, Whisper, Qwen2-Audi…</description>
    </item>
    <item>
      <title>HCFD: A Benchmark for Audio Deepfake Detection in Healthcare</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-hcfd-a-benchmark-for-audio-deepfake-detection-in/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-hcfd-a-benchmark-for-audio-deepfake-detection-in/</guid>
      <description>For detecting deepfake speech generated by neural audio codecs in healthcare, this paper introduces a new research task (HCFD) and a benchmark dataset (HCFK). The study finds that existing deepfake detectors trained on healthy speech degrade significantly on pathological speech. The paper first verifies that pretrained audio models (e.g., PaSST) cope better with the variability introduced by pathological speech. More importantly, it proposes…</description>
    </item>
    <item>
      <title>ICLAD: In-Context Learning with Comparison-Guidance for Audio Deepfake Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-iclad-in-context-learning-with-comparison/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-iclad-in-context-learning-with-comparison/</guid>
      <description>Targeting the poor in-the-wild generalization of audio deepfake detectors, this paper proposes a new paradigm named ICLAD. The framework leverages the in-context learning ability of audio language models (ALMs) to achieve training-free rapid adaptation. Its core is a novel **pairwise comparison reasoning** strategy: in an offline stage, the ALM is guided to generate, for each sample, both a "real" and a "fake" …</description>
    </item>
    <item>
      <title>Incremental learning for audio classification with Hebbian Deep Neural Networks</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-incremental-learning-for-audio-classification/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-incremental-learning-for-audio-classification/</guid>
      <description>This paper proposes a biologically inspired solution to incremental (continual) learning for audio classification, targeting deep models' "catastrophic forgetting" of old knowledge when learning new tasks. The authors are the first to combine **Hebbian learning** (an unsupervised, feedback-free learning rule based on synchronized neuron activation) with **incremental learning**, and design a **kernel plasticity** mechanism. The mechanism analyzes the train…</description>
    </item>
    <item>
      <title>Latent Fourier Transform</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-latent-fourier-transform/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-latent-fourier-transform/</guid>
      <description>This paper addresses existing music generation models' difficulty in precisely controlling musical patterns at **arbitrary time scales**. The authors propose the **Latent Fourier Transform (LatentFT)** framework, whose core is to apply the discrete Fourier transform to the **sequence of latent vectors** produced by a diffusion autoencoder, yielding a "latent spectrum". Randomly masking frequencies of the latent spectrum during training forces the decod…</description>
    </item>
    <item>
      <title>LLM-Codec: Neural Audio Codec Meets Language Model Objectives</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-llm-codec-neural-audio-codec-meets-language-model/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-llm-codec-neural-audio-codec-meets-language-model/</guid>
      <description>This paper addresses a fundamental mismatch in speech language models (SLMs): neural audio codecs are optimized for waveform reconstruction, while language models are optimized for sequence prediction, and this objective mismatch yields discrete speech tokens with high entropy that are hard to predict. The authors propose the LLM-Codec training framework, which, without changing the codec or language-model architectures, introduces two language-model-oriented regularization objectives…</description>
    </item>
    <item>
      <title>MimicLM: Zero-Shot Voice Imitation through Autoregressive Modeling of Pseudo-Parallel Speech Corpora</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-mimiclm-zero-shot-voice-imitation-through/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-mimiclm-zero-shot-voice-imitation-through/</guid>
      <description>This paper addresses the core bottleneck of scarce high-quality parallel training data for zero-shot voice imitation. Traditional approaches either rely on complex disentanglement architectures or use synthetic speech as the training target, capping output quality at the synthesis system's ability. The authors propose a new framework, **MimicLM**, whose key innovation is a **"role-swapping" data-construction strategy**: TTS-generated speech serves as the **tr…</description>
    </item>
    <item>
      <title>MINT-Bench: A Comprehensive Multilingual Benchmark for Instruction-Following Text-to-Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-mint-bench-a-comprehensive-multilingual-benchmark/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-mint-bench-a-comprehensive-multilingual-benchmark/</guid>
      <description>This paper addresses the lack of systematic evaluation tools for instruction-following text-to-speech (TTS). Current evaluation suffers from incomplete coverage, coarse diagnostic granularity, and weak multilingual support. The authors propose **MINT-Bench**, a comprehensive multilingual benchmark. Its core methodology includes: 1) a **hierarchical multi-axis taxonomy** built on 10 atomic acoustic attributes, systematically organizing, from simple to complex (…</description>
    </item>
    <item>
      <title>MoVE: Translating Laughter and Tears via Mixture of Vocalization Experts in Speech-to-Speech Translation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-move-translating-laughter-and-tears-via-mixture/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-move-translating-laughter-and-tears-via-mixture/</guid>
      <description>This paper addresses the pervasive absence of non-verbal vocalizations (e.g., laughter, crying) and emotional prosody in speech-to-speech translation (S2ST) systems, which severely limits the naturalness and pragmatic accuracy of cross-lingual communication. The authors make three contributions: 1) a **scalable expressive data-synthesis pipeline** that automatically generates high-quality, emotion-annotated S2ST training pairs, overcoming the data-scarcity bottleneck; 2) **MoVE (Mixture of Voc…</description>
    </item>
    <item>
      <title>Neural Encoding Detection is Not All You Need for Synthetic Speech Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-neural-encoding-detection-is-not-all-you-need-for/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-neural-encoding-detection-is-not-all-you-need-for/</guid>
      <description>The core contribution of this survey is to **expose and substantiate a key misconception in current synthetic speech detection: over-reliance on "neural encoding detection"**. The paper first systematically reviews three families of data-driven methods based on SincNet, self-supervised learning (SSL), and neural encoding detection, and shows that today's best-performing SSL models mainly capture artifacts introduced by the vocoder during waveform generation, rather than…</description>
    </item>
    <item>
      <title>NIM4-ASR: Towards Efficient, Robust, and Customizable Real-Time LLM-Based ASR</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-nim4-asr-towards-efficient-robust-and/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-nim4-asr-towards-efficient-robust-and/</guid>
      <description>This paper presents NIM4-ASR, a production-oriented framework for efficient, robust, and customizable real-time speech recognition. It targets three challenges of existing LLM-based ASR in real deployments: 1) severe performance loss in lightweight models (limited down-scalability); 2) hallucinations under acoustically challenging conditions; 3) the lack of a production-ready hotword customization mechanism. The authors propose a principled multi-sta…</description>
    </item>
    <item>
      <title>Omni-Embed-Audio: Leveraging Multimodal LLMs for Robust Audio-Text Retrieval</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-omni-embed-audio-leveraging-multimodal-llms-for/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-omni-embed-audio-leveraging-multimodal-llms-for/</guid>
      <description>This paper addresses the performance drop of current audio-text retrieval models on **real, diverse user queries**. The authors note that existing benchmarks (e.g., AudioCaps, Clotho) rely on descriptive caption-style queries, far removed from real-world search behavior, which is short and varied (questions, commands, keywords, exclusion queries). The paper makes two core contributions: 1) **Omni…</description>
    </item>
    <item>
      <title>Prosody as Supervision: Bridging the Non-Verbal--Verbal for Multilingual Speech Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-prosody-as-supervision-bridging-the-non-verbal/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-prosody-as-supervision-bridging-the-non-verbal/</guid>
      <description>This paper addresses the core bottleneck of scarce labeled data in low-resource multilingual speech emotion recognition (SER). The authors propose a disruptive paradigm: **reframing SER as an unsupervised "non-verbal-to-verbal" transfer problem**. The core hypothesis is that the prosodic emotional cues carried by non-verbal vocalizations (e.g., laughing, crying) are purer and more cross-lingual than those in speech, and can therefore serve as a better supervision source. To this end, the authors design **NOVA-…</description>
    </item>
    <item>
      <title>SELF-EMO: Emotional Self-Evolution from Recognition to Consistent Expression</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-self-emo-emotional-self-evolution-from/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-self-emo-emotional-self-evolution-from/</guid>
      <description>This paper addresses how emotion recognition in conversation (ERC) and emotional expression are constrained by scarce, static high-quality labeled data. The **core contribution** is a psychologically motivated self-evolution framework, **SELF-EMO**. The **key method** builds a role-playing self-play paradigm in which the model acts simultaneously as an "emotion recognizer" and a "dialogue responder", driven by a "generate-filter-reuse" da…</description>
    </item>
    <item>
      <title>Still Between Us? Evaluating and Improving Voice Assistant Robustness to Third-Party Interruptions</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-still-between-us-evaluating-and-improving-voice/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-still-between-us-evaluating-and-improving-voice/</guid>
      <description>This paper addresses speech language models' (SLMs) inability, in real settings, to distinguish the primary user from third-party interruptions (TPI), which breaks contextual understanding. The authors first create **TPI-Train**, a training dataset of 88,000 samples whose core design is "speaker-aware hard negatives", …</description>
    </item>
    <item>
      <title>VIBE: Voice-Induced open-ended Bias Evaluation for Large Audio-Language Models via Real-World Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-vibe-voice-induced-open-ended-bias-evaluation-for/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-vibe-voice-induced-open-ended-bias-evaluation-for/</guid>
      <description>This paper addresses the insufficient evaluation of social bias in large audio-language models (LALMs) on open-ended generation tasks. Existing benchmarks mostly rely on synthetic speech and multiple-choice questions (MCQs), failing to capture the stereotypes models express naturally in real interactions. The authors propose the **VIBE** framework, which feeds **real human voice recordings** to the model and uses **open-ended generation tasks** (e.g., story writing, personalized recommendation…</description>
    </item>
    <item>
      <title>Video-Robin: Autoregressive Diffusion Planning for Intent-Grounded Video-to-Music Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-video-robin-autoregressive-diffusion-planning-for/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-video-robin-autoregressive-diffusion-planning-for/</guid>
      <description>Addressing existing video-to-music (V2M) generation models' lack of fine-grained control over creator intent such as style and theme, this paper proposes Video-Robin, a text-prompt-conditioned video scoring framework. Its core method decouples generation into two stages: first, a multimodal autoregressive planning head (AR-Head) integrates video frames and text prompts via a semantic language model, finite scalar quantization (FSQ), and residual…</description>
    </item>
    <item>
      <title>VoxSafeBench: Not Just What Is Said, but Who, How, and Where</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-voxsafebench-not-just-what-is-said-but-who-how/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-voxsafebench-not-just-what-is-said-but-who-how/</guid>
      <description>This paper addresses the incomplete, shallow social-alignment evaluation of current speech language models (SLMs). Existing benchmarks either cover only basic audio understanding or study single risks in isolation, unable to distinguish whether a model fails because it "does not understand" or because it "applies understanding in the wrong place". The authors propose **VoxSafeBench**, the first benchmark to jointly evaluate SLMs across the three social-alignment dimensions of **safety, fairness, and privacy**…</description>
    </item>
    <item>
      <title>Where Do Self-Supervised Speech Models Become Unfair?</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-where-do-self-supervised-speech-models-become/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-where-do-self-supervised-speech-models-become/</guid>
      <description>This paper investigates at which layer unfairness arises in self-supervised speech models (S3Ms). Using a lightweight linear-probing approach on the per-layer embeddings of several S3Ms (e.g., WavLM, Wav2Vec2, BEST-RQ, Whisper), the team jointly evaluates overall performance on speaker identification (SID) and automatic speech recognition (ASR), as well as performance across different speaker grou…</description>
    </item>
    <item>
      <title>语音/音频论文速递 2026-04-21</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21/</guid>
      <description>34 speech/AI papers analyzed in total</description>
    </item>
    <item>
      <title>ActorMind: Emulating Human Actor Reasoning for Speech Role-Playing</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-actormind-emulating-human-actor-reasoning-for/</link>
      <pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-actormind-emulating-human-actor-reasoning-for/</guid>
      <description>This paper addresses the fact that existing role-playing research is confined to the text modality, neglecting the speech modality that dominates everyday communication. The authors first **define the "speech role-playing" task**, requiring a model to generate spontaneous responses with personalized vocal traits (e.g., specific emotion, intonation) given a character, scene, and dialogue history. They then build **ActorMindBench**, a benchmark based on 《老…</description>
    </item>
    <item>
      <title>ArtifactNet: Detecting AI-Generated Music via Forensic Residual Physics</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-artifactnet-detecting-ai-generated-music-via/</link>
      <pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-artifactnet-detecting-ai-generated-music-via/</guid>
      <description>This paper addresses poor generalization and low parameter efficiency in AI-generated-music detection. The authors propose a new framework, **ArtifactNet**, whose key innovation is to **reframe the problem as "forensic physics"**: directly extracting and analyzing the physical traces (residuals) that neural audio codecs inevitably leave in generated audio. The method uses a lightweight **Bounded-mas…</description>
    </item>
    <item>
      <title>AST: Adaptive, Seamless, and Training-Free Precise Speech Editing</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-ast-adaptive-seamless-and-training-free-precise/</link>
      <pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-ast-adaptive-seamless-and-training-free-precise/</guid>
      <description>Addressing existing speech-editing methods' reliance on task-specific training and poor temporal consistency in unedited regions, this paper proposes AST (Adaptive, Seamless, and Training-free), a precise speech-editing framework built on pretrained TTS models of the AM-FM (autoregressive-flow matching) paradigm. AST first uses an inverse Euler ODE solver to invert the original speech into the latent sp…</description>
    </item>
    <item>
      <title>Beyond Monologue: Interactive Talking-Listening Avatar Generation with Conversational Audio Context-Aware Kernels</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-beyond-monologue-interactive-talking-listening/</link>
      <pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-beyond-monologue-interactive-talking-listening/</guid>
      <description>This paper tackles the core challenge of moving from one-way "monologue" avatar generation to natural "full-duplex" interactive generation. **Core problem**: existing methods are either stiff in their reactions due to strict frame alignment, or break lip sync by introducing global attention. **Key method**: a unified attention architecture based on multi-head Gaussian kernels (MHGK), which assigns different attention heads Gaussian receptive fields ranging from narrow to wide, enabling…</description>
    </item>
    <item>
      <title>BlasBench: An Open Benchmark for Irish Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-blasbench-an-open-benchmark-for-irish-speech/</link>
      <pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-blasbench-an-open-benchmark-for-irish-speech/</guid>
      <description>This paper addresses the lack of a unified, reliable evaluation standard for Irish speech recognition (ASR). Existing work and benchmarks either ignore Irish-specific text conventions (e.g., preserving fada diacritics, initial consonant mutations) or use differing datasets and normalization methods, making results incomparable. The authors propose **BlasBench**, an open evaluation framework centered on an **Iri…</description>
    </item>
    <item>
      <title>Discrete Token Modeling for Multi-Stem Music Source Separation with Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-discrete-token-modeling-for-multi-stem-music/</link>
      <pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-discrete-token-modeling-for-multi-stem-music/</guid>
      <description>本文提出了一种用于多轨音乐源分离的生成式框架，其核心创新在于将分离任务重新定义为**条件离散令牌生成**问题。传统方法直接在时频域估计连续信号，而本文方法首先利用**HCodec**神经音频编解码器将音频波形转换为离散的声学与语义令牌序列。然后，一个基于**Conformer**的条件编码器从混合音</description>
    </item>
    <item>
      <title>Elucidating the SNR-t Bias of Diffusion Probabilistic Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-elucidating-the-snr-t-bias-of-diffusion/</link>
      <pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-elucidating-the-snr-t-bias-of-diffusion/</guid>
      <description>这篇论文的核心贡献是识别并系统分析了扩散概率模型（DPMs）中一个基础性问题——信噪比-时间步（SNR-t）偏差。该偏差指推理时去噪样本的实际SNR与其所分配时间步t所理论对应的SNR不匹配，这种错位源于训练时的严格耦合在推理时被累积误差打破。作者通过详实的实验（滑动窗口测试、前向与反向过程对比）揭</description>
    </item>
    <item>
      <title>Full-Duplex-Bench-v3: Benchmarking Tool Use for Full-Duplex Voice Agents Under Real-World Disfluency</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-full-duplex-bench-v3-benchmarking-tool-use-for/</link>
      <pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-full-duplex-bench-v3-benchmarking-tool-use-for/</guid>
      <description>这篇论文针对当前全双工语音代理评估缺乏真实性（依赖合成语音）和任务简单性（单步调用）的问题，提出了**Full-Duplex-Bench-v3 (FDB-v3)** 基准。该基准的核心创新在于使用**100条真实人类录音**（含五种不流畅性注释），在四个任务域中设计了需要**多步API链式调用**的</description>
    </item>
    <item>
      <title>Generalizable Audio-Visual Navigation via Binaural Difference Attention and Action Transition Prediction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-generalizable-audio-visual-navigation-via/</link>
      <pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-generalizable-audio-visual-navigation-via/</guid>
      <description>本文旨在解决音频-视觉导航（AVN）智能体在未见环境和未闻声音类别下泛化能力差的核心问题。作者指出，现有方法性能下降主要源于两个因素：一是音频表征混淆了语义与空间信息，导致对未闻声音定位不准；二是强化学习策略过拟合于训练环境的动态和布局。为此，本文提出了一个名为BDATP的即插即用框架。在感知层面</description>
    </item>
    <item>
      <title>HARNESS: Lightweight Distilled Arabic Speech Foundation Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-harness-lightweight-distilled-arabic-speech/</link>
      <pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-harness-lightweight-distilled-arabic-speech/</guid>
      <description>这篇论文针对阿拉伯语语音识别、方言识别和情感识别中通用多语言/英语模型性能不足、且大模型难以部署的问题，提出了 HArnESS——一个以阿拉伯语为中心的自监督语音模型家族。作者采用 HuBERT 风格的迭代自蒸馏框架，先在大规模阿拉伯语-英语双语数据（约 23K 小时）上训练 24 层的教师模型 H</description>
    </item>
    <item>
      <title>Hierarchical Codec Diffusion for Video-to-Speech Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-hierarchical-codec-diffusion-for-video-to-speech/</link>
      <pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-hierarchical-codec-diffusion-for-video-to-speech/</guid>
      <description>本论文针对 Video-to-Speech（VTS）生成中视觉-语音模态信息不对称的问题，提出现有方法忽略了语音从粗粒度语义到细粒度韵律的层次结构，导致视觉条件无法与语音表示精准对齐。为此，作者提出 HiCoDiT（Hierarchical Codec Diffusion Transformer），</description>
    </item>
    <item>
      <title>Interactive ASR: Towards Human-Like Interaction and Semantic Coherence Evaluation for Agentic Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-interactive-asr-towards-human-like-interaction/</link>
      <pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-interactive-asr-towards-human-like-interaction/</guid>
      <description>这篇论文针对传统ASR的两大盲区——WER指标对语义错误不敏感、以及系统无法通过自然交互进行纠错——提出了Interactive ASR框架。首先，作者引入S²ER（Sentence-level Semantic Error Rate），利用LLM-as-a-Judge二元判断识别结果与参考文本是否</description>
    </item>
    <item>
      <title>Joint-Centric Dual Contrastive Alignment with Structure-Preserving and Information-Balanced Regularization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-joint-centric-dual-contrastive-alignment-with/</link>
      <pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-joint-centric-dual-contrastive-alignment-with/</guid>
      <description>这篇论文旨在解决音频-文本多模态表示学习中的一个关键挑战：如何在低资源、长序列且模态维度严重不平衡（音频高维、文本低维）的情况下，实现有效的跨模态对齐，同时保留各自的特异性信息。为此，作者提出了HILBERT框架。该方法首先利用冻结的预训练音频（如HuBERT）和文本（如T5）编码器提取片段级特征，</description>
    </item>
    <item>
      <title>MoshiRAG: Asynchronous Knowledge Retrieval for Full-Duplex Speech Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-moshirag-asynchronous-knowledge-retrieval-for/</link>
      <pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-moshirag-asynchronous-knowledge-retrieval-for/</guid>
      <description>本文旨在解决全双工语音语言模型（如Moshi）事实性不足的核心问题，同时不牺牲其高交互性。**问题**：全双工模型能实时打断和回应，但因训练数据规模远小于文本，其知识储备和事实准确性较弱。**方法**：提出了MoshiRAG，一个模块化框架。它在Moshi模型中引入一个特殊的`&amp;lt;ret&amp;gt;`检索触发令</description>
    </item>
    <item>
      <title>MUSCAT: MUltilingual, SCientific ConversATion Benchmark</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-muscat-multilingual-scientific-conversation/</link>
      <pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-muscat-multilingual-scientific-conversation/</guid>
      <description>本文提出了 MUSCAT，一个用于评估多语言科学对话场景下自动语音识别（ASR）性能的新基准。数据集包含 6 组双语对话录音（共约 65 分钟，9,066 词），涉及英语与德语、土耳其语、中文、越南语的配对对话；每组对话使用 Meeting Owl 3、ReSpeaker USB 麦克风阵列和 Me</description>
    </item>
    <item>
      <title>NaijaS2ST: A Multi-Accent Benchmark for Speech-to-Speech Translation in Low-Resource Nigerian Languages</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-naijas2st-a-multi-accent-benchmark-for-speech-to/</link>
      <pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-naijas2st-a-multi-accent-benchmark-for-speech-to/</guid>
      <description>这篇论文旨在解决非洲低资源语言在语音翻译（S2ST和S2TT）研究中面临的高质量、多口音平行语音数据严重匮乏的核心瓶颈。为此，作者构建了**NaijaS2ST**数据集，涵盖豪萨语、伊博语、约鲁巴语和尼日利亚皮钦语与英语的平行语音，每种语言约50小时，捕获了真实的说话者与口音多样性。基于此数据集，论</description>
    </item>
    <item>
      <title>NVBench: A Benchmark for Speech Synthesis with Non-Verbal Vocalizations</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-nvbench-a-benchmark-for-speech-synthesis-with-non/</link>
      <pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-nvbench-a-benchmark-for-speech-synthesis-with-non/</guid>
      <description>本文旨在解决语音合成（TTS）领域中非语言声音（NVV，如笑声、叹息、哭泣）缺乏标准化评估框架的问题。为此，作者提出了NVBench，一个双语（英/中）基准测试。其核心方法包括：1）设计了一个涵盖45种NVV类型的统一分类法；2）构建了一个类型均衡的高质量双语评估数据集；3）提出了一套多轴评估协议，</description>
    </item>
    <item>
      <title>PS-TTS: Phonetic Synchronization in Text-to-Speech for Achieving Natural Automated Dubbing</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-ps-tts-phonetic-synchronization-in-text-to-speech/</link>
      <pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-ps-tts-phonetic-synchronization-in-text-to-speech/</guid>
      <description>这篇论文旨在解决自动配音（AD）中目标语音与源语音在时长和唇形上的同步难题。其核心贡献是提出了一套两阶段的文本改写方法，并集成到TTS系统中：首先通过语言模型进行**等时性**改写，确保目标语音时长匹配源语音；其次引入**音素同步（PS）**，使用动态时间规整（DTW）和从训练数据中学习的元音距离，</description>
    </item>
    <item>
      <title>Qwen3.5-Omni Technical Report</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-qwen35-omni-technical-report/</link>
      <pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-qwen35-omni-technical-report/</guid>
      <description>Qwen3.5-Omni 是一个旨在统一理解、推理、生成与行动的全模态大语言模型。它**解决**了现有模型在实时交互、长上下文音视频处理、流式语音生成稳定性以及多语言支持等方面的局限性。**方法上**，它基于Thinker-Talker架构，引入了Hybrid MoE以提升效率，采用显式时间戳替代稀</description>
    </item>
    <item>
      <title>Spatial-Aware Conditioned Fusion for Audio-Visual Navigation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-spatial-aware-conditioned-fusion-for-audio-visual/</link>
      <pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-spatial-aware-conditioned-fusion-for-audio-visual/</guid>
      <description>本论文针对音频-视觉导航（AVN）中目标空间意图模糊、视觉特征缺乏听觉条件引导两大问题，提出了 Spatial-Aware Conditioned Fusion（SACF）框架。该框架首先设计了 Spatially Discretized Localization Descriptor（SDLD），</description>
    </item>
    <item>
      <title>Temporal Contrastive Decoding: A Training-Free Method for Large Audio-Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-temporal-contrastive-decoding-a-training-free/</link>
      <pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-temporal-contrastive-decoding-a-training-free/</guid>
      <description>统一的大型音频-语言模型（LALMs）在自回归解码时存在“时间平滑偏差”：短暂、瞬态的声学线索（如电话铃声、乐器拨弦）容易被语言先验和时间上平滑的上下文所淹没，导致生成结果缺乏音频特异性。本文提出 Temporal Contrastive Decoding (TCD)，一种完全免训练、仅在推理时生效</description>
    </item>
    <item>
      <title>The Acoustic Camouflage Phenomenon: Re-evaluating Speech Features for Financial Risk Prediction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-the-acoustic-camouflage-phenomenon-re-evaluating/</link>
      <pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-the-acoustic-camouflage-phenomenon-re-evaluating/</guid>
      <description>本研究探讨了在企业财报电话会议中，副语言声学特征（音高、抖动、停顿等）对预测灾难性股价下跌的效用。作者基于MAEC数据集，提取了两种模态的特征：文本端使用FinBERT计算脚本化开场白与即兴Q&amp;amp;A之间的情感极性差异（Sentiment Delta），音频端提取临床语音压力标记的方差特征（音高方差、抖</description>
    </item>
    <item>
      <title>TinyMU: A Compact Audio-Language Model for Music Understanding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-tinymu-a-compact-audio-language-model-for-music/</link>
      <pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-tinymu-a-compact-audio-language-model-for-music/</guid>
      <description>本文针对现有大型音频语言模型（LALM）参数庞大（数十亿级）、训练推理成本高、难以部署在边缘设备的问题，提出了 TinyMU——一个仅有 229M 参数的紧凑音乐语言模型。为此，作者构建了 MusicSkills-3.5M 数据集，包含 350 万个涵盖多选、二元判断和开放式格式的音乐问答样本，结合</description>
    </item>
    <item>
      <title>VoxMind: An End-to-End Agentic Spoken Dialogue System</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-voxmind-an-end-to-end-agentic-spoken-dialogue/</link>
      <pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-voxmind-an-end-to-end-agentic-spoken-dialogue/</guid>
      <description>端到端语音对话模型在自然交互上进步迅速，但普遍缺乏处理复杂任务的agent能力（工具调用、规划、推理）。本文首先形式化定义了&amp;#34;端到端语音智能体&amp;#34;的四大维度——画像（Profile）、记忆（Memory）、规划（Planning）与执行（Action Execution），填补了该领域理论标准的空白。</description>
    </item>
    <item>
      <title>语音/音频论文速递 2026-04-20</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20/</link>
      <pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20/</guid>
      <description>共分析 24 篇语音/AI 论文</description>
    </item>
    <item>
      <title>A Manual Bar-by-Bar Tempo Measurement Protocol for Polyphonic Chamber Music Recordings: Design, Validation, and Application to Beethoven&#39;s Piano and Cello Sonatas</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-a-manual-bar-by-bar-tempo-measurement-protocol/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-a-manual-bar-by-bar-tempo-measurement-protocol/</guid>
      <description>本文旨在解决现有自动化节拍提取工具在分析历史复调室内乐录音（特别是贝多芬钢琴与大提琴奏鸣曲）时出现的系统性失败问题。作者与一名VLSI工程师合作，设计并验证了一套形式化的手动逐小节速度测量协议。该协议</description>
    </item>
    <item>
      <title>Adaptive Test-Time Scaling for Zero-Shot Respiratory Audio Classification</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-adaptive-test-time-scaling-for-zero-shot/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-adaptive-test-time-scaling-for-zero-shot/</guid>
      <description>本文旨在解决零样本呼吸音频分类中“一刀切”的推理计算浪费问题。为此，提出了TRIAGE框架，这是一个三层自适应推理管道：第一层（Tier-L）进行快速的标签-文本相似度匹配；若置信度不足则升级至第二层</description>
    </item>
    <item>
      <title>An Ultra-Low Latency, End-to-End Streaming Speech Synthesis Architecture via Block-Wise Generation and Depth-Wise Codec Decoding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-an-ultra-low-latency-end-to-end-streaming-speech/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-an-ultra-low-latency-end-to-end-streaming-speech/</guid>
      <description>这篇论文旨在解决实时交互式语音合成中**推理延迟高**与**声学质量（尤其是高频细节）易受损**的核心矛盾。传统流水线依赖计算密集的神经声码器进行波形重建，且基于连续回归的声学模型易导致频谱过平滑。为</description>
    </item>
    <item>
      <title>Audio Source Separation in Reverberant Environments using β-divergence based Nonnegative Factorization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-audio-source-separation-in-reverberant/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-audio-source-separation-in-reverberant/</guid>
      <description>本文针对混响环境下的多通道音频源分离问题，提出了一种基于β-散度非负因子分解的参数估计新方法。传统方法依赖期望最大化（EM）算法估计源频谱方差和空间协方差矩阵，本文则利用包含源频谱先验信息的基矩阵（可</description>
    </item>
    <item>
      <title>Audio-Cogito: Towards Deep Audio Reasoning in Large Audio Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-audio-cogito-towards-deep-audio-reasoning-in/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-audio-cogito-towards-deep-audio-reasoning-in/</guid>
      <description>这篇论文旨在解决大型音频语言模型（LALMs）在复杂音频推理任务上能力不足且依赖昂贵闭源数据的问题。作者提出了一个名为**Audio-Cogito**的全开源解决方案，其核心是**Cogito-Pip</description>
    </item>
    <item>
      <title>AVID: A Benchmark for Omni-Modal Audio-Visual Inconsistency Understanding via Agent-Driven Construction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-avid-a-benchmark-for-omni-modal-audio-visual/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-avid-a-benchmark-for-omni-modal-audio-visual/</guid>
      <description>这篇论文旨在解决当前全模态大模型在音视频不一致性理解能力上缺乏系统性评估的问题。现有基准要么只关注音视频对齐事件，要么局限于检测深度伪造中的低级伪影，无法评估模型对长视频中语义级矛盾的理解。为此，作者</description>
    </item>
    <item>
      <title>Beyond Transcription: Unified Audio Schema for Perception-Aware AudioLLMs</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-beyond-transcription-unified-audio-schema-for/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-beyond-transcription-unified-audio-schema-for/</guid>
      <description>这篇论文旨在解决当前音频大语言模型（AudioLLMs）在细粒度声学感知任务上表现不佳的核心问题。作者指出，主流的以自动语音识别（ASR）为中心的训练范式，通过将音频映射到纯文本转录，系统性地丢弃了副</description>
    </item>
    <item>
      <title>ClariCodec: Optimising Neural Speech Codes for 200bps Communication using Reinforcement Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-claricodec-optimising-neural-speech-codes-for/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-claricodec-optimising-neural-speech-codes-for/</guid>
      <description>这篇论文旨在解决卫星、水下等极端带宽受限场景下（如200bps）语音通信清晰度严重下降的问题。传统编解码器以波形重建为目标，在超低比特率下会将宝贵的比特分配给不必要的声学细节，而非核心语义信息。为此，</description>
    </item>
    <item>
      <title>Classical Machine Learning Baselines for Deepfake Audio Detection on the Fake-or-Real Dataset</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-classical-machine-learning-baselines-for-deepfake/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-classical-machine-learning-baselines-for-deepfake/</guid>
      <description>本文旨在解决深度伪造音频检测领域缺乏透明、可解释基线的问题。研究团队采用经典机器学习方法，在Fake-or-Real (FoR) 数据集上构建了一个完整的检测流程。他们从高保真（44.1 kHz）和电</description>
    </item>
    <item>
      <title>Comparison of window shapes and lengths in short-time feature extraction for classification of heart sound signals</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-comparison-of-window-shapes-and-lengths-in-short/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-comparison-of-window-shapes-and-lengths-in-short/</guid>
      <description>本文针对心音信号（PCG）分类任务中，因信号非平稳性而采用滑动窗口分段提取特征时，窗函数形状和长度选择缺乏系统性研究的问题，进行了一项实验性评估。作者使用双向长短期记忆网络（biL</description>
    </item>
    <item>
      <title>Contextual Biasing for ASR in Speech LLM with Common Word Cues and Bias Word Position Prediction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-contextual-biasing-for-asr-in-speech-llm-with/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-contextual-biasing-for-asr-in-speech-llm-with/</guid>
      <description>这篇论文旨在解决语音大模型（SLLM）在识别训练数据中稀有或未见的“偏置词”时性能不佳的问题。传统方法依赖于为偏置词提供精确的音素序列（通过G2P系统生成），但这对用户有专业要求且工具兼容性差。为此，</description>
    </item>
    <item>
      <title>ControlFoley: Unified and Controllable Video-to-Audio Generation with Cross-Modal Conflict Handling</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-controlfoley-unified-and-controllable-video-to/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-controlfoley-unified-and-controllable-video-to/</guid>
      <description>本文提出了ControlFoley，一个统一且可控的视频到音频生成框架，旨在解决现有方法在跨模态冲突下文本控制力弱、以及参考音频控制中音色与时间信息纠缠的问题。其核心贡献包括：1）提出联合视觉编码范式</description>
    </item>
    <item>
      <title>CoSyncDiT: Cognitive Synchronous Diffusion Transformer for Movie Dubbing</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-cosyncdit-cognitive-synchronous-diffusion/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-cosyncdit-cognitive-synchronous-diffusion/</guid>
      <description>本文针对电影配音（视觉语音克隆）中音色保真度与唇形同步难以兼得的痛点，提出了一种基于流匹配的认知同步扩散Transformer（CoSyncDiT）框架。该方法受专业配音员认知过程启发，将噪声到语音的</description>
    </item>
    <item>
      <title>Diffusion Language Models for Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-diffusion-language-models-for-speech-recognition/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-diffusion-language-models-for-speech-recognition/</guid>
      <description>这篇论文探索了将扩散语言模型（DLM）应用于自动语音识别（ASR）任务的新方法。其核心目标是利用扩散模型的双向注意和并行生成能力，来提升基于传统编码器（如CTC）生成的ASR候选假设的准确性。论文主要</description>
    </item>
    <item>
      <title>Dual-Axis Generative Reward Model Toward Semantic and Turn-taking Robustness in Interactive Spoken Dialogue Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-dual-axis-generative-reward-model-toward-semantic/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-dual-axis-generative-reward-model-toward-semantic/</guid>
      <description>本文旨在解决全双工语音对话模型（SDMs）实现类人交互的核心挑战。现有自动化评估指标流于表面（如统计行为或预测时机准确率），无法为强化学习提供可靠的奖励信号，而人工评估成本高昂且难以扩展。为此，作者提</description>
    </item>
    <item>
      <title>Elastic Net Regularization and Gabor Dictionary for Classification of Heart Sound Signals using Deep Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-elastic-net-regularization-and-gabor-dictionary/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-elastic-net-regularization-and-gabor-dictionary/</guid>
      <description>本文旨在解决心音信号（PCG）的多分类问题，以辅助心血管疾病的自动诊断。核心贡献在于提出了一套结合**优化Gabor字典**和**弹性网络正则化**的特征提取框架，并与**CNN-LSTM深度学习网络</description>
    </item>
    <item>
      <title>Enhancing time-frequency resolution with optimal transport and barycentric fusion of multiple spectrogram</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-enhancing-time-frequency-resolution-with-optimal/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-enhancing-time-frequency-resolution-with-optimal/</guid>
      <description>**核心问题**：短时傅里叶变换（STFT）生成的谱图受制于不确定性原理，无法同时获得优异的时间和频率分辨率。传统融合方法（如几何平均）要求输入谱图网格对齐，且性能有限。 **核心方法**：本文提出一</description>
    </item>
    <item>
      <title>Few-Shot and Pseudo-Label Guided Speech Quality Evaluation with Large Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-few-shot-and-pseudo-label-guided-speech-quality/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-few-shot-and-pseudo-label-guided-speech-quality/</guid>
      <description>本文旨在解决非侵入式语音质量评估在标注数据有限场景下的性能瓶颈。作者提出了GatherMOS框架，其核心是将大语言模型（如GPT-5）作为一个元评估器，通过精心设计的文本提示，融合多类异构信号：包括手</description>
    </item>
    <item>
      <title>Four Decades of Digital Waveguides</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-four-decades-of-digital-waveguides/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-four-decades-of-digital-waveguides/</guid>
      <description>这篇论文旨在全面回顾数字波导物理建模技术自诞生以来四十年的发展历程、核心应用与最新进展。它要解决的核心问题是，如何在保证物理模拟准确性的同时，实现声波传播模拟的高效计算，以满足实时音频处理（如虚拟乐器</description>
    </item>
    <item>
      <title>From Reactive to Proactive: Assessing the Proactivity of Voice Agents via ProVoice-Bench</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-from-reactive-to-proactive-assessing-the/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-from-reactive-to-proactive-assessing-the/</guid>
      <description>本文旨在解决当前语音代理评估中过度关注被动响应，而忽视其主动交互能力的问题。为此，作者提出了首个专门评估主动语音代理的基准测试框架 **ProVoice-Bench**。该框架包含四个新颖的任务，用以</description>
    </item>
    <item>
      <title>Geo2Sound: A Scalable Geo-Aligned Framework for Soundscape Generation from Satellite Imagery</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-geo2sound-a-scalable-geo-aligned-framework-for/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-geo2sound-a-scalable-geo-aligned-framework-for/</guid>
      <description>这篇论文提出了一个名为 **Geo2Sound** 的新任务和框架，旨在从卫星图像生成地理上一致且逼真的声音景观。**要解决的问题**是现有图像到音频模型在处理自上而下的卫星视图时面临三大挑战：缺乏结</description>
    </item>
    <item>
      <title>Hijacking Large Audio-Language Models via Context-Agnostic and Imperceptible Auditory Prompt Injection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-hijacking-large-audio-language-models-via-context/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-hijacking-large-audio-language-models-via-context/</guid>
      <description>这篇论文揭示了针对音频大语言模型（LALM）的一种新型安全威胁：**上下文无关且不可感知的音频提示注入攻击**。攻击者仅需篡改输入音频数据（如会议录音、音乐片段），即可在用户不知情的情况下，劫持模型行</description>
    </item>
    <item>
      <title>Listen, Pause, and Reason: Toward Perception-Grounded Hybrid Reasoning for Audio Understanding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-listen-pause-and-reason-toward-perception/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-listen-pause-and-reason-toward-perception/</guid>
      <description>本文旨在解决大型音频语言模型在复杂音频场景中因感知错误导致的推理失败问题。受听觉场景分析启发，作者提出了一个感知接地的混合推理框架。首先，他们构建了一个名为PAQA的新数据集，通过层次化解耦策略（区分</description>
    </item>
    <item>
      <title>Listening Deepfake Detection: A New Perspective Beyond Speaking-Centric Forgery Analysis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-listening-deepfake-detection-a-new-perspective/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-listening-deepfake-detection-a-new-perspective/</guid>
      <description>本文首次提出了“聆听深度伪造检测”这一新任务，旨在识别视频中人物在倾听状态下（非说话时）的伪造反应，弥补了现有研究主要集中于“说话”场景的不足。为解决此任务数据稀缺的问题，作者构建了首个专门数据集Li</description>
    </item>
    <item>
      <title>MoshiRAG: Asynchronous Knowledge Retrieval for Full-Duplex Speech Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-moshirag-asynchronous-knowledge-retrieval-for/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-moshirag-asynchronous-knowledge-retrieval-for/</guid>
      <description>本文提出了MoshiRAG，这是首个集成检索增强生成功能的全双工语音语言模型。**要解决的问题**是全双工语音模型在保持实时交互性的同时，事实准确性不足的挑战。**核心方法**是基于Moshi模型，设</description>
    </item>
    <item>
      <title>On the Distillation Loss Functions of Speech VAE for Unified Reconstruction, Understanding, and Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-on-the-distillation-loss-functions-of-speech-vae/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-on-the-distillation-loss-functions-of-speech-vae/</guid>
      <description>本文针对现有语音变分自编码器（VAE）在统一语音重建、理解和生成任务上表现不平衡的问题（尤其是理解能力差），系统性地研究了蒸馏损失函数的设计空间。作者探索了三种将自监督学习（SSL）模型知识蒸馏到VA</description>
    </item>
    <item>
      <title>ProSDD: Learning Prosodic Representations for Speech Deepfake Detection against Expressive and Emotional Attacks</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-prosdd-learning-prosodic-representations-for/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-prosdd-learning-prosodic-representations-for/</guid>
      <description>这篇论文旨在解决当前语音深度伪造检测（SDD）系统在面对富有表现力和情感的合成语音攻击时泛化能力不足的核心问题。现有方法过度依赖伪造数据，容易学习数据集特定的伪影，而非自然语音的可迁移特征。为此，作者</description>
    </item>
    <item>
      <title>Room compensation for loudspeaker reproduction using a supporting source</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-room-compensation-for-loudspeaker-reproduction/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-room-compensation-for-loudspeaker-reproduction/</guid>
      <description>本文针对传统房间补偿技术仅能修正频谱（音色）而无法控制空间感知（如距离感）的局限，提出了一种创新的补偿方法。该方法通过引入一个延迟的、经过频谱滤波的辅助扬声器，选择性地向房间的混响声场中添加能量，从而</description>
    </item>
    <item>
      <title>Sky-Ear: An Unmanned Aerial Vehicle-Enabled Victim Sound Detection and Localization System</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-sky-ear-an-unmanned-aerial-vehicle-enabled-victim/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-sky-ear-an-unmanned-aerial-vehicle-enabled-victim/</guid>
      <description>本文针对无人机搜救任务中视觉系统受遮蔽、能耗高的问题，提出了一个名为“Sky-Ear”的音频驱动受害者检测与定位系统。核心方法是设计了一个基于环形麦克风阵列的两阶段处理框架：在“哨兵阶段”，系统利用单</description>
    </item>
    <item>
      <title>SpeakerRPL v2: Robust Open-set Speaker Identification through Enhanced Few-shot Foundation Tuning and Model Fusion</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-speakerrpl-v2-robust-open-set-speaker/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-speakerrpl-v2-robust-open-set-speaker/</guid>
      <description>本文旨在解决开放集说话人识别中的鲁棒性问题，即系统在仅有少量目标说话人注册样本的情况下，需同时准确识别已知说话人并可靠拒识未知说话人。作者在先前SpeakerRPL V1框架基础上提出了三项关键改进：</description>
    </item>
    <item>
      <title>SpotSound: Enhancing Large Audio-Language Models with Fine-Grained Temporal Grounding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-spotsound-enhancing-large-audio-language-models/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-spotsound-enhancing-large-audio-language-models/</guid>
      <description>本文旨在解决大型音频语言模型在**细粒度音频事件时间定位**上的不足。现有模型因训练数据缺乏精确时间戳、基准测试过于简单，导致在长音频中定位短暂事件（“大海捞针”）时表现不可靠。为此，作者提出了**S</description>
    </item>
    <item>
      <title>StreamMark: A Deep Learning-Based Semi-Fragile Audio Watermarking for Proactive Deepfake Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-streammark-a-deep-learning-based-semi-fragile/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-streammark-a-deep-learning-based-semi-fragile/</guid>
      <description>本文针对生成式AI带来的音频深度伪造威胁，提出了一种名为StreamMark的主动防御框架。该框架是一种基于深度学习的半脆弱音频水印系统，其核心创新在于重新定义了水印的目标：不是追求对所有变换的绝对鲁</description>
    </item>
    <item>
      <title>TokenSE: a Mamba-based discrete token speech enhancement framework for cochlear implants</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-tokense-a-mamba-based-discrete-token-speech/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-tokense-a-mamba-based-discrete-token-speech/</guid>
      <description>本文针对人工耳蜗用户在噪声和混响环境下语音理解困难的问题，提出了一种名为TokenSE的语音增强框架。该框架的核心创新在于将语音增强任务从传统的时频域或波形域转换到神经音频编解码器的离散令牌空间中进行</description>
    </item>
    <item>
      <title>Tora3: Trajectory-Guided Audio-Video Generation with Physical Coherence</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-tora3-trajectory-guided-audio-video-generation/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-tora3-trajectory-guided-audio-video-generation/</guid>
      <description>This paper targets problems in existing audio-video (AV) generation models: unrealistic motion, desynchronization between sound and motion events, and mismatch between sound intensity and motion intensity. It proposes the Tora3 framework, whose core innovation is to treat object trajectories as a shared kinematic prior linking the visual and auditory modalities…</description>
    </item>
    <item>
      <title>Towards Fine-grained Temporal Perception: Post-Training Large Audio-Language Models with Audio-Side Time Prompt</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-towards-fine-grained-temporal-perception-post/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-towards-fine-grained-temporal-perception-post/</guid>
      <description>This paper addresses the weakness of large audio-language models (LALMs) in fine-grained temporal perception, such as precisely localizing the onset and offset of sound events. The authors propose the TimePro-RL framework, built on a two-step strategy: first, an Audio-Side Time Prompt (AS…</description>
    </item>
    <item>
      <title>Transformer Based Machine Fault Detection From Audio Input</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-transformer-based-machine-fault-detection-from/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-transformer-based-machine-fault-detection-from/</guid>
      <description>This paper investigates the potential advantages of Transformer-based architectures over conventional convolutional neural networks (CNNs) for machine fault detection from audio. The problem addressed is that the inductive biases inherent to CNNs on spectrograms, such as locality and translation invariance, may not…</description>
    </item>
    <item>
      <title>UniPASE: A Generative Model for Universal Speech Enhancement with High Fidelity and Low Hallucinations</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-unipase-a-generative-model-for-universal-speech/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-unipase-a-generative-model-for-universal-speech/</guid>
      <description>This paper tackles the core tension in universal speech enhancement (USE): generative models struggle to achieve high perceptual quality and low content hallucination at the same time. The authors propose the UniPASE framework, which extends their earlier low-hallucination PASE model to handle distortions including noise, reverberation, packet loss, wind…</description>
    </item>
    <item>
      <title>VoxEffects: A Speech-Oriented Audio Effects Dataset and Benchmark</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-voxeffects-a-speech-oriented-audio-effects/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-voxeffects-a-speech-oriented-audio-effects/</guid>
      <description>This paper addresses a fundamental but overlooked problem in speech processing: systematically identifying the post-processing effects a speech recording has undergone, along with their parameters. In practice, nearly all speech is processed with effects such as denoising and compression, yet existing datasets lack precise annotations of this kind, hindering related research. To this end, the authors…</description>
    </item>
    <item>
      <title>VoxSafeBench: Not Just What Is Said, but Who, How, and Where</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-voxsafebench-not-just-what-is-said-but-who-how/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-voxsafebench-not-just-what-is-said-but-who-how/</guid>
      <description>This paper addresses a key issue: when speech language models (SLMs) enter multi-user shared environments, safety alignment based on text content alone is insufficient, because audio context such as speaker identity, paralinguistic cues, and acoustic scene can fundamentally change the nature of a request. To this end, the authors propose…</description>
    </item>
    <item>
      <title>WavAlign: Enhancing Intelligence and Expressiveness in Spoken Dialogue Models via Adaptive Hybrid Post-Training</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-wavalign-enhancing-intelligence-and/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-wavalign-enhancing-intelligence-and/</guid>
      <description>This paper addresses the core challenge that end-to-end spoken dialogue models struggle to improve intelligence (IQ) and expressiveness (EQ) simultaneously. The authors find that directly applying unified preference optimization (e.g., DPO, GRPO) to mixed text-speech sequences is problematic: sparse preference signals get drowned in the dense…</description>
    </item>
    <item>
      <title>Who is Speaking or Who is Depressed? A Controlled Study of Speaker Leakage in Speech-Based Depression Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-who-is-speaking-or-who-is-depressed-a-controlled/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-who-is-speaking-or-who-is-depressed-a-controlled/</guid>
      <description>The core contribution of this paper is to systematically expose and quantify the pervasive "speaker identity leakage" problem in speech-based depression detection. The authors show that many models reporting high accuracy may rely heavily on memorizing speaker identity (voiceprints) rather than on depression-related acoustic…</description>
    </item>
    <item>
      <title>Why Your Tokenizer Fails in Information Fusion: A Timing-Aware Pre-Quantization Fusion for Video-Enhanced Audio Tokenization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-why-your-tokenizer-fails-in-information-fusion-a/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-why-your-tokenizer-fails-in-information-fusion-a/</guid>
      <description>This paper examines a core tension in end-to-end audio language models: fusing visual information into the audio tokenizer typically improves understanding but degrades reconstruction quality. Through systematic experiments, the authors establish three key findings: the fusion position (pre- vs. post-quantization) is crucial; in discrete…</description>
    </item>
    <item>
      <title>X-VC: Zero-shot Streaming Voice Conversion in Codec Space</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-x-vc-zero-shot-streaming-voice-conversion-in/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-x-vc-zero-shot-streaming-voice-conversion-in/</guid>
      <description>This paper tackles the core challenge in zero-shot voice conversion: achieving high-fidelity speaker transfer and low-latency streaming inference at the same time. The authors propose the X-VC system, whose core innovation is to perform, in the latent space of a pretrained neural codec (SAC), a one-step…</description>
    </item>
    <item>
      <title>语音/音频论文速递 2026-04-19</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19/</guid>
      <description>42 speech/AI papers analyzed in total</description>
    </item>
    <item>
      <title>语音/音频论文速递 2026-04-18</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-18/</link>
      <pubDate>Sat, 18 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-18/</guid>
      <description>39 speech/AI papers analyzed in total</description>
    </item>
  </channel>
</rss>
