<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Model Evaluation on Speech/Audio Paper Digest</title>
    <link>https://nanless.github.io/audio-paper-digest-blog/tags/%E6%A8%A1%E5%9E%8B%E8%AF%84%E4%BC%B0/</link>
    <description>Recent content in Model Evaluation on Speech/Audio Paper Digest</description>
    <generator>Hugo</generator>
    <language>zh-cn</language>
    <lastBuildDate>Wed, 29 Apr 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://nanless.github.io/audio-paper-digest-blog/tags/%E6%A8%A1%E5%9E%8B%E8%AF%84%E4%BC%B0/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>A Bayesian Approach to Singing Skill Evaluation Using Semitone Pitch Histogram and MCMC-Based Generated Quantities</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-bayesian-approach-to-singing-skill-evaluation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-bayesian-approach-to-singing-skill-evaluation/</guid>
      <description>Music Understanding | 7.0/10</description>
    </item>
    <item>
      <title>A Dataset of Robot-Patient and Doctor-Patient Medical Dialogues for Spoken Language Processing Tasks</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-dataset-of-robot-patient-and-doctor-patient/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-dataset-of-robot-patient-and-doctor-patient/</guid>
      <description>Spoken Dialogue Systems | 7.5/10</description>
    </item>
    <item>
      <title>A Superb-Style Benchmark of Self-Supervised Speech Models for Audio Deepfake Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-superb-style-benchmark-of-self-supervised/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-superb-style-benchmark-of-self-supervised/</guid>
      <description>Audio Deepfake Detection | 7.0/10</description>
    </item>
    <item>
      <title>A Text-To-Text Alignment Algorithm for Better Evaluation of Modern Speech Recognition Systems</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-text-to-text-alignment-algorithm-for-better/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-text-to-text-alignment-algorithm-for-better/</guid>
      <description>Model Evaluation | 7.5/10</description>
    </item>
    <item>
      <title>Acoustic Non-Stationarity Objective Assessment with Hard Label Criteria for Supervised Learning Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-acoustic-non-stationarity-objective-assessment/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-acoustic-non-stationarity-objective-assessment/</guid>
      <description>Audio Classification | 7.0/10</description>
    </item>
    <item>
      <title>Adaptive Spectral Weighting in Sagittal-Plane Sound Localization: A Reliability-Driven Approach</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-adaptive-spectral-weighting-in-sagittal-plane/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-adaptive-spectral-weighting-in-sagittal-plane/</guid>
      <description>Sound Source Localization | 6.5/10</description>
    </item>
    <item>
      <title>Aligning Generative Speech Enhancement with Perceptual Feedback</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-aligning-generative-speech-enhancement-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-aligning-generative-speech-enhancement-with/</guid>
      <description>Speech Enhancement | 7.5/10</description>
    </item>
    <item>
      <title>AR&amp;D: A Framework for Retrieving and Describing Concepts for Interpreting AudioLLMs</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ard-a-framework-for-retrieving-and-describing/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ard-a-framework-for-retrieving-and-describing/</guid>
      <description>Large Audio Models | 6.5/10</description>
    </item>
    <item>
      <title>Assessing Identity Leakage in Talking Face Generation: Metrics and Evaluation Framework</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-assessing-identity-leakage-in-talking-face/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-assessing-identity-leakage-in-talking-face/</guid>
      <description>Talking Face Generation | 7.5/10</description>
    </item>
    <item>
      <title>Attention-Based Encoder-Decoder Target-Speaker Voice Activity Detection for Robust Speaker Diarization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-attention-based-encoder-decoder-target-speaker/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-attention-based-encoder-decoder-target-speaker/</guid>
      <description>Speaker Diarization | 8.0/10</description>
    </item>
    <item>
      <title>Attentive AV-Fusionnet: Audio-Visual Quality Prediction with Hybrid Attention</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-attentive-av-fusionnet-audio-visual-quality/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-attentive-av-fusionnet-audio-visual-quality/</guid>
      <description>Audio-Visual | 7.0/10</description>
    </item>
    <item>
      <title>Auditory Illusion Benchmark for Large Audio Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-auditory-illusion-benchmark-for-large-audio/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-auditory-illusion-benchmark-for-large-audio/</guid>
      <description>Model Evaluation | 7.0/10</description>
    </item>
    <item>
      <title>Automatic Estimation of Speaker Diarization Error Rate Based on Features of Audio Quality and Speaker Discriminability</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-automatic-estimation-of-speaker-diarization-error/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-automatic-estimation-of-speaker-diarization-error/</guid>
      <description>Speaker Diarization | 7.5/10</description>
    </item>
    <item>
      <title>AVO-65: A Large-Scale Hierarchical Audio-Visual Object Dataset</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-avo-65-a-large-scale-hierarchical-audio-visual/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-avo-65-a-large-scale-hierarchical-audio-visual/</guid>
      <description>Audio-Visual | 7.0/10</description>
    </item>
    <item>
      <title>Benchmarking Humans And Machines On Complex Multilingual Speech Understanding Tasks</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-benchmarking-humans-and-machines-on-complex/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-benchmarking-humans-and-machines-on-complex/</guid>
      <description>Audio Question Answering | 7.5/10</description>
    </item>
    <item>
      <title>Benchmarking Music Autotagging with MGPHot Expert Annotations vs. Generic Tag Datasets</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-benchmarking-music-autotagging-with-mgphot-expert/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-benchmarking-music-autotagging-with-mgphot-expert/</guid>
      <description>Music Information Retrieval | 7.5/10</description>
    </item>
    <item>
      <title>Break-the-Beat! Controllable MIDI-to-Drum Audio Synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-break-the-beat-controllable-midi-to-drum-audio/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-break-the-beat-controllable-midi-to-drum-audio/</guid>
      <description>Music Generation | 7.5/10</description>
    </item>
    <item>
      <title>BridgeCode: A Dual Speech Representation Paradigm for Autoregressive Zero-Shot Text-to-Speech Synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-bridgecode-a-dual-speech-representation-paradigm/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-bridgecode-a-dual-speech-representation-paradigm/</guid>
      <description>Speech Synthesis | 8.0/10</description>
    </item>
    <item>
      <title>BSMP-SENet: Band-Split Magnitude-Phase Network for Speech Enhancement</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-bsmp-senetband-split-magnitude-phase-network-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-bsmp-senetband-split-magnitude-phase-network-for/</guid>
      <description>Speech Enhancement | 7.0/10</description>
    </item>
    <item>
      <title>Can Hierarchical Cross-Modal Fusion Predict Human Perception of AI Dubbed Content?</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-can-hierarchical-cross-modal-fusion-predict-human/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-can-hierarchical-cross-modal-fusion-predict-human/</guid>
      <description>Model Evaluation | 6.0/10</description>
    </item>
    <item>
      <title>ClawMark: A Living-World Benchmark for Multi-Turn, Multi-Day, Multimodal Coworker Agents</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-clawmark-a-living-world-benchmark-for-multi-turn/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-clawmark-a-living-world-benchmark-for-multi-turn/</guid>
      <description>Benchmarking | 7.0/10</description>
    </item>
    <item>
      <title>Combining Multi-Order Attention and Multi-Resolution Discriminator for High-Fidelity Neural Vocoder</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-combining-multi-order-attention-and-multi/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-combining-multi-order-attention-and-multi/</guid>
      <description>Speech Synthesis | 6.5/10</description>
    </item>
    <item>
      <title>Content Leakage in Librispeech and its Impact on the Privacy Evaluation of Speaker Anonymization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-content-leakage-in-librispeech-and-its-impact-on/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-content-leakage-in-librispeech-and-its-impact-on/</guid>
      <description>Speech Anonymization | 7.5/10</description>
    </item>
    <item>
      <title>Cutscene Agent: An LLM Agent Framework for Automated 3D Cutscene Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cutscene-agent-an-llm-agent-framework-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cutscene-agent-an-llm-agent-framework-for/</guid>
      <description>Generative Models | 8.5/10</description>
    </item>
    <item>
      <title>DISSR: Disentangling Speech Representation for Degradation-Prior Guided Cross-Domain Speech Restoration</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dissr-disentangling-speech-representation-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dissr-disentangling-speech-representation-for/</guid>
      <description>Speech Enhancement | 7.5/10</description>
    </item>
    <item>
      <title>Do Bias Benchmarks Generalise? Evidence from Voice-Based Evaluation of Gender Bias in Speechllms</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-do-bias-benchmarks-generalise-evidence-from-voice/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-do-bias-benchmarks-generalise-evidence-from-voice/</guid>
      <description>Model Evaluation | 8.0/10</description>
    </item>
    <item>
      <title>Do Speech LLMs Learn Crossmodal Embedding Spaces?</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-do-speech-llms-learn-crossmodal-embedding-spaces/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-do-speech-llms-learn-crossmodal-embedding-spaces/</guid>
      <description>Audio Retrieval | 6.5/10</description>
    </item>
    <item>
      <title>Do You Hear What I Mean? Quantifying the Instruction-Perception Gap in Instruction-Guided Expressive Text-to-Speech Systems</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-do-you-hear-what-i-mean-quantifying-the/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-do-you-hear-what-i-mean-quantifying-the/</guid>
      <description>Speech Synthesis | 8.0/10</description>
    </item>
    <item>
      <title>DSRMS-TransUnet: A Decentralized Non-Shifted Transunet for Shallow Water Acoustic Source Range Estimation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dsrms-transunet-a-decentralized-non-shifted/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-dsrms-transunet-a-decentralized-non-shifted/</guid>
      <description>Sound Source Localization | 8.0/10</description>
    </item>
    <item>
      <title>Efficient Audio-Visual Inference Via Token Clustering And Modality Fusion</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-efficient-audio-visual-inference-via-token/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-efficient-audio-visual-inference-via-token/</guid>
      <description>Audio Question Answering | 7.5/10</description>
    </item>
    <item>
      <title>Enhancing Speech Intelligibility Prediction for Hearing Aids with Complementary Speech Foundation Model Representations</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-enhancing-speech-intelligibility-prediction-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-enhancing-speech-intelligibility-prediction-for/</guid>
      <description>Speech Enhancement | 7.5/10</description>
    </item>
    <item>
      <title>Evaluating Bias in Spoken Dialogue LLMs for Real-World Decisions and Recommendations</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-evaluating-bias-in-spoken-dialogue-llms-for-real/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-evaluating-bias-in-spoken-dialogue-llms-for-real/</guid>
      <description>Model Evaluation | 7.0/10</description>
    </item>
    <item>
      <title>Evaluating Compositional Structure in Audio Representations</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-evaluating-compositional-structure-in-audio/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-evaluating-compositional-structure-in-audio/</guid>
      <description>Model Evaluation | 7.0/10</description>
    </item>
    <item>
      <title>Evaluating Disentangled Representations for Controllable Music Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-evaluating-disentangled-representations-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-evaluating-disentangled-representations-for/</guid>
      <description>Music Generation | 7.5/10</description>
    </item>
    <item>
      <title>Evaluating Emotion Recognition in Spoken Language Models on Emotionally Incongruent Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-evaluating-emotion-recognition-in-spoken-language/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-evaluating-emotion-recognition-in-spoken-language/</guid>
      <description>Speech Emotion Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Evaluating High-Resolution Piano Sustain Pedal Depth Estimation with Musically Informed Metrics</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-evaluating-high-resolution-piano-sustain-pedal/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-evaluating-high-resolution-piano-sustain-pedal/</guid>
      <description>Music Information Retrieval | 8.0/10</description>
    </item>
    <item>
      <title>Evaluating Pretrained Speech Embedding Systems for Dysarthria Detection Across Heterogeneous Datasets</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-evaluating-pretrained-speech-embedding-systems/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-evaluating-pretrained-speech-embedding-systems/</guid>
      <description>Speech Biomarkers | 7.5/10</description>
    </item>
    <item>
      <title>Exploring How Audio Effects Alter Emotion with Foundation Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-exploring-how-audio-effects-alter-emotion-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-exploring-how-audio-effects-alter-emotion-with/</guid>
      <description>Music Understanding | 7.0/10</description>
    </item>
    <item>
      <title>Fine-Grained Frame Modeling in Multi-Head Self-Attention for Speech Deepfake Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fine-grained-frame-modeling-in-multi-head-self/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fine-grained-frame-modeling-in-multi-head-self/</guid>
      <description>Speech Deepfake Detection | 8.0/10</description>
    </item>
    <item>
      <title>FUSEMOS: Perceptual Evaluation of Text-to-Music Generation with Dual-Encoder Fusion and Ranking-Aware Composite Loss</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fusemos-perceptual-evaluation-of-text-to-music/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fusemos-perceptual-evaluation-of-text-to-music/</guid>
      <description>Music Generation | 7.5/10</description>
    </item>
    <item>
      <title>Game-Time: Evaluating Temporal Dynamics in Spoken Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-game-time-evaluating-temporal-dynamics-in-spoken/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-game-time-evaluating-temporal-dynamics-in-spoken/</guid>
      <description>Spoken Dialogue Systems | 7.5/10</description>
    </item>
    <item>
      <title>Hashing-Baseline: Rethinking Hashing in the Age of Pretrained Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hashing-baseline-rethinking-hashing-in-the-age-of/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hashing-baseline-rethinking-hashing-in-the-age-of/</guid>
      <description>Audio Retrieval, Audio Classification | 8.0/10</description>
    </item>
    <item>
      <title>HD-PPT: Hierarchical Decoding of Content- and Prompt-Preference Tokens for Instruction-Based TTS</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hd-ppt-hierarchical-decoding-of-content-and/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hd-ppt-hierarchical-decoding-of-content-and/</guid>
      <description>Speech Synthesis | 8.0/10</description>
    </item>
    <item>
      <title>How to Label Resynthesized Audio: The Dual Role of Neural Audio Codecs in Audio Deepfake Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-how-to-label-resynthesized-audio-the-dual-role-of/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-how-to-label-resynthesized-audio-the-dual-role-of/</guid>
      <description>Audio Deepfake Detection | 7.5/10</description>
    </item>
    <item>
      <title>ICASSP 2026 - Model Evaluation Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-033/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-033/</guid>
      <description>A total of 16 ICASSP 2026 papers in the Model Evaluation area</description>
    </item>
    <item>
      <title>Identifying the Minimal and Maximal Phonetic Subspace of Speech Representations</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-identifying-the-minimal-and-maximal-phonetic/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-identifying-the-minimal-and-maximal-phonetic/</guid>
      <description>Speech Recognition | 8.0/10</description>
    </item>
    <item>
      <title>Identity Leakage Through Accent Cues in Voice Anonymisation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-identity-leakage-through-accent-cues-in-voice/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-identity-leakage-through-accent-cues-in-voice/</guid>
      <description>Speech Anonymization | 7.0/10</description>
    </item>
    <item>
      <title>Improving the Speaker Anonymization Evaluation’s Robustness to Target Speakers with Adversarial Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-improving-the-speaker-anonymization-evaluations/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-improving-the-speaker-anonymization-evaluations/</guid>
      <description>Speech Anonymization | 7.5/10</description>
    </item>
    <item>
      <title>Integrating Speaker Embeddings and LLM-Derived Semantic Representations for Streaming Speaker Diarization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-integrating-speaker-embeddings-and-llm-derived/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-integrating-speaker-embeddings-and-llm-derived/</guid>
      <description>Speaker Diarization | 6.5/10</description>
    </item>
    <item>
      <title>Interpretable Music Harmonic Analysis Through Multilinear Mixture of Experts</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-interpretable-music-harmonic-analysis-through/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-interpretable-music-harmonic-analysis-through/</guid>
      <description>Music Understanding | 7.5/10</description>
    </item>
    <item>
      <title>Investigating Modality Contribution in Audio LLMs for Music</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-investigating-modality-contribution-in-audio-llms/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-investigating-modality-contribution-in-audio-llms/</guid>
      <description>Model Evaluation | 6.5/10</description>
    </item>
    <item>
      <title>Investigating The Effect Of Sentence-Level Syntactic Structure On Information Loss In The Human Auditory System</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-investigating-the-effect-of-sentence-level/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-investigating-the-effect-of-sentence-level/</guid>
      <description>Speech Recognition | 7.0/10</description>
    </item>
    <item>
      <title>Learning Linearity in Audio Consistency Autoencoders via Implicit Regularization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-learning-linearity-in-audio-consistency/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-learning-linearity-in-audio-consistency/</guid>
      <description>Audio Generation | 7.5/10</description>
    </item>
    <item>
      <title>Leveraging Large Speech Language Models as Evaluators for Expressive Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-leveraging-large-speech-language-models-as/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-leveraging-large-speech-language-models-as/</guid>
      <description>Speech Emotion Recognition | 6.5/10</description>
    </item>
    <item>
      <title>Leveraging Multiple Speech Enhancers for Non-Intrusive Intelligibility Prediction for Hearing-Impaired Listeners</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-leveraging-multiple-speech-enhancers-for-non/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-leveraging-multiple-speech-enhancers-for-non/</guid>
      <description>Model Evaluation | 7.5/10</description>
    </item>
    <item>
      <title>Leveraging Prediction Entropy for Automatic Prompt Weighting in Zero-Shot Audio-Language Classification</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-leveraging-prediction-entropy-for-automatic/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-leveraging-prediction-entropy-for-automatic/</guid>
      <description>Audio Classification | 7.5/10</description>
    </item>
    <item>
      <title>Lingometer: On-Device Personal Speech Word Counting System</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lingometer-on-device-personal-speech-word/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lingometer-on-device-personal-speech-word/</guid>
      <description>Voice Activity Detection | 8.0/10</description>
    </item>
    <item>
      <title>LLAC: Learned Lossless Audio Codec</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-llac-learned-lossless-audio-codec/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-llac-learned-lossless-audio-codec/</guid>
      <description>Lossless Audio Coding | 7.5/10</description>
    </item>
    <item>
      <title>Measuring Prosody Diversity in Zero-Shot TTS: A New Metric, Benchmark, and Exploration</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-measuring-prosody-diversity-in-zero-shot-tts-a/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-measuring-prosody-diversity-in-zero-shot-tts-a/</guid>
      <description>Speech Synthesis | 8.0/10</description>
    </item>
    <item>
      <title>Mitigating Language Prior-Induced Hallucinations via Bi-Level Contrastive Decoding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mitigating-language-prior-induced-hallucinations/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mitigating-language-prior-induced-hallucinations/</guid>
      <description>Multimodal Models | 7.5/10</description>
    </item>
    <item>
      <title>Mix2Morph: Learning Sound Morphing from Noisy Mixes</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mix2morph-learning-sound-morphing-from-noisy-mixes/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mix2morph-learning-sound-morphing-from-noisy-mixes/</guid>
      <description>Audio Generation | 7.5/10</description>
    </item>
    <item>
      <title>Mixture-of-Experts Based Soft-Label Learning for Multi-Label Speech Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mixture-of-experts-based-soft-label-learning-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mixture-of-experts-based-soft-label-learning-for/</guid>
      <description>Speech Emotion Recognition | 7.5/10</description>
    </item>
    <item>
      <title>MMEB-V3: Measuring the Performance Gaps of Omni-Modality Embedding Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mmeb-v3-measuring-the-performance-gaps-of-omni/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mmeb-v3-measuring-the-performance-gaps-of-omni/</guid>
      <description>Benchmarking | 7.5/10</description>
    </item>
    <item>
      <title>MR-FlowDPO: Multi-Reward Direct Preference Optimization for Flow-Matching Text-to-Music Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mr-flowdpo-multi-reward-direct-preference/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mr-flowdpo-multi-reward-direct-preference/</guid>
      <description>Music Generation | 7.5/10</description>
    </item>
    <item>
      <title>Multi-Layer Attentive Probing Improves Transfer of Audio Representations for Bioacoustics</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multi-layer-attentive-probing-improves-transfer/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multi-layer-attentive-probing-improves-transfer/</guid>
      <description>Bioacoustics | 7.5/10</description>
    </item>
    <item>
      <title>Nemotron 3 Nano Omni: Efficient and Open Multimodal Intelligence</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-nemotron-3-nano-omni-efficient-and-open/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-nemotron-3-nano-omni-efficient-and-open/</guid>
      <description>Multimodal Models | 8.5/10</description>
    </item>
    <item>
      <title>Optimizing Speech Language Models for Acoustic Consistency</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-optimizing-speech-language-models-for-acoustic/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-optimizing-speech-language-models-for-acoustic/</guid>
      <description>Speech Synthesis | 8.0/10</description>
    </item>
    <item>
      <title>Perceptual Quality Assessment for Stylized Talking Heads</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-perceptual-quality-assessment-for-stylized/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-perceptual-quality-assessment-for-stylized/</guid>
      <description>Model Evaluation | 7.5/10</description>
    </item>
    <item>
      <title>Pianoroll-Event: A Novel Score Representation for Symbolic Music</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-pianoroll-event-a-novel-score-representation-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-pianoroll-event-a-novel-score-representation-for/</guid>
      <description>Music Generation | 6.5/10</description>
    </item>
    <item>
      <title>Probing Whisper for Dysarthric Speech in Detection and Assessment</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-probing-whisper-for-dysarthric-speech-in/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-probing-whisper-for-dysarthric-speech-in/</guid>
      <description>Speech Biomarkers | 6.5/10</description>
    </item>
    <item>
      <title>PSP: An Interpretable Per-Dimension Accent Benchmark for Indic Text-to-Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-psp-an-interpretable-per-dimension-accent/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-psp-an-interpretable-per-dimension-accent/</guid>
      <description>Benchmarking | 7.5/10</description>
    </item>
    <item>
      <title>Qastanet: A DNN-Based Quality Metric for Spatial Audio</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-qastanet-a-dnn-based-quality-metric-for-spatial/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-qastanet-a-dnn-based-quality-metric-for-spatial/</guid>
      <description>Spatial Audio | 7.5/10</description>
    </item>
    <item>
      <title>RAS: A Reliability-Oriented Metric for Automatic Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ras-a-reliability-oriented-metric-for-automatic/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ras-a-reliability-oriented-metric-for-automatic/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Reference-Aware SFM Layers for Intrusive Intelligibility Prediction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-reference-aware-sfm-layers-for-intrusive/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-reference-aware-sfm-layers-for-intrusive/</guid>
      <description>Speech Assessment | 7.5/10</description>
    </item>
    <item>
      <title>Reliable AI via Age-Balanced Validation: Fair Model Selection for Parkinson’s Detection from Voice</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-reliable-ai-via-age-balanced-validation-fair/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-reliable-ai-via-age-balanced-validation-fair/</guid>
      <description>Speech Biomarkers | 7.5/10</description>
    </item>
    <item>
      <title>RHO-PERFECT: Correlation Ceiling for Subjective Evaluation Datasets</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-rho-perfect-correlation-ceiling-for-subjective/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-rho-perfect-correlation-ceiling-for-subjective/</guid>
      <description>Model Evaluation | 7.5/10</description>
    </item>
    <item>
      <title>Scalable Evaluation for Audio Identification Via Synthetic Latent Fingerprint Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-scalable-evaluation-for-audio-identification-via/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-scalable-evaluation-for-audio-identification-via/</guid>
      <description>Audio Retrieval | 7.0/10</description>
    </item>
    <item>
      <title>Segmentwise Pruning in Audio-Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-segmentwise-pruning-in-audio-language-models/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-segmentwise-pruning-in-audio-language-models/</guid>
      <description>Audio Question Answering | 7.0/10</description>
    </item>
    <item>
      <title>SFM-TTS: Lightweight and Rapid Speech Synthesis with Flexible Shortcut Flow Matching</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sfm-tts-lightweight-and-rapid-speech-synthesis/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sfm-tts-lightweight-and-rapid-speech-synthesis/</guid>
      <description>Speech Synthesis | 7.0/10</description>
    </item>
    <item>
      <title>Sing What You Fit: A Perception-Based Dataset and Benchmark for Vocal-Song Suitability Analysis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sing-what-you-fit-a-perception-based-dataset-and/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sing-what-you-fit-a-perception-based-dataset-and/</guid>
      <description>Music Information Retrieval | 7.0/10</description>
    </item>
    <item>
      <title>SingMOS-Pro: A Comprehensive Benchmark for Singing Quality Assessment</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-singmos-pro-an-comprehensive-benchmark-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-singmos-pro-an-comprehensive-benchmark-for/</guid>
      <description>Singing Voice Synthesis | 7.5/10</description>
    </item>
    <item>
      <title>SP-MCQA: Evaluating Intelligibility of TTS Beyond the Word Level</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sp-mcqa-evaluating-intelligibility-of-tts-beyond/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sp-mcqa-evaluating-intelligibility-of-tts-beyond/</guid>
      <description>Speech Synthesis | 7.0/10</description>
    </item>
    <item>
      <title>SPADE: Structured Pruning and Adaptive Distillation for Efficient LLM-TTS</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-spade-structured-pruning-and-adaptive/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-spade-structured-pruning-and-adaptive/</guid>
      <description>Speech Synthesis | 7.5/10</description>
    </item>
    <item>
      <title>SPAM: Style Prompt Adherence Metric for Prompt-Based TTS</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-spam-style-prompt-adherence-metric-for-prompt/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-spam-style-prompt-adherence-metric-for-prompt/</guid>
      <description>Speech Synthesis | 7.0/10</description>
    </item>
    <item>
      <title>Sparse Autoencoders Make Audio Foundation Models More Explainable</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sparse-autoencoders-make-audio-foundation-models/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sparse-autoencoders-make-audio-foundation-models/</guid>
      <description>Model Evaluation | 6.5/10</description>
    </item>
    <item>
      <title>Speech Quality-Based Localization of Low-Quality Speech and Text-to-Speech Synthesis Artefacts</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-speech-quality-based-localization-of-low-quality/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-speech-quality-based-localization-of-low-quality/</guid>
      <description>Speech Quality Assessment | 7.0/10</description>
    </item>
    <item>
      <title>Step-Audio-R1.5 Technical Report</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-step-audio-r15-technical-report/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-step-audio-r15-technical-report/</guid>
      <description>Spoken Dialogue Systems | 8.0/10</description>
    </item>
    <item>
      <title>StreamingBench: Assessing the Gap for MLLMs to Achieve Streaming Video Understanding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-streamingbench-assessing-the-gap-for-mllms-to/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-streamingbench-assessing-the-gap-for-mllms-to/</guid>
      <description>Benchmarking | 7.5/10</description>
    </item>
    <item>
      <title>StyleBench: Evaluating Speech Language Models on Conversational Speaking Style Control</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-stylebench-evaluating-speech-language-models-on/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-stylebench-evaluating-speech-language-models-on/</guid>
      <description>Benchmarking | 8.5/10</description>
    </item>
    <item>
      <title>SwitchCodec: Adaptive Residual-Expert Sparse Quantization for High-Fidelity Neural Audio Coding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-switchcodec-adaptive-residual-expert-sparse/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-switchcodec-adaptive-residual-expert-sparse/</guid>
      <description>Audio Generation | 8.5/10</description>
    </item>
    <item>
      <title>Symphony Rendering: MIDI and Composer-Conditioned Auto Orchestration with Flow-Matching Transformers</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-symphony-rendering-midi-and-composer-conditioned/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-symphony-rendering-midi-and-composer-conditioned/</guid>
      <description>Music Generation | 7.0/10</description>
    </item>
    <item>
      <title>Synthetic yet Striking? Assessing Vocal Charisma in TTS via Perceptual and Algorithmic Measures</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-synthetic-yet-striking-assessing-vocal-charisma/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-synthetic-yet-striking-assessing-vocal-charisma/</guid>
      <description>Speech Synthesis | 7.5/10</description>
    </item>
    <item>
      <title>TAU: A Benchmark for Cultural Sound Understanding Beyond Semantics</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tau-a-benchmark-for-cultural-sound-understanding/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tau-a-benchmark-for-cultural-sound-understanding/</guid>
      <description>Audio Question Answering | 7.5/10</description>
    </item>
    <item>
      <title>Test-Time Scaling for Auditory Cognition in Audio Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-test-time-scaling-for-auditory-cognition-in-audio/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-test-time-scaling-for-auditory-cognition-in-audio/</guid>
      <description>Audio Question Answering | 7.0/10</description>
    </item>
    <item>
      <title>The 3rd Clarity Prediction Challenge: A Machine Learning Challenge for Hearing Aid Speech Intelligibility Prediction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-the-3rd-clarity-prediction-challenge-a-machine/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-the-3rd-clarity-prediction-challenge-a-machine/</guid>
      <description>Speech Enhancement | 7.5/10</description>
    </item>
    <item>
      <title>The Muse Benchmark: Probing Music Perception and Auditory Relational Reasoning in Audio LLMs</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-the-muse-benchmark-probing-music-perception-and/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-the-muse-benchmark-probing-music-perception-and/</guid>
      <description>Music Understanding | 8.5/10</description>
    </item>
    <item>
      <title>The Structured Output Benchmark: A Multi-Source Benchmark for Evaluating Structured Output Quality in Large Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-the-structured-output-benchmark-a-multi-source/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-the-structured-output-benchmark-a-multi-source/</guid>
      <description>Benchmarking | 7.0/10</description>
    </item>
    <item>
      <title>Towards Evaluating Generative Audio: Insights from Neural Audio Codec Embedding Distances</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-evaluating-generative-audio-insights-from/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-evaluating-generative-audio-insights-from/</guid>
      <description>Model Evaluation | 6.5/10</description>
    </item>
    <item>
      <title>Towards Robust Dysarthric Speech Recognition: LLM-Agent Post-ASR Correction Beyond WER</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-robust-dysarthric-speech-recognition-llm/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-robust-dysarthric-speech-recognition-llm/</guid>
      <description>Speech Recognition | 9.0/10</description>
    </item>
    <item>
      <title>Triad: Tri-Head with Auxiliary Duplicating Permutation Invariant Training for Multi-Task Sound Event Localization and Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-triad-tri-head-with-auxiliary-duplicating/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-triad-tri-head-with-auxiliary-duplicating/</guid>
      <description>Audio Event Detection | 7.5/10</description>
    </item>
    <item>
      <title>TTA: Transcribe, Translate and Alignment for Cross-Lingual Speech Representation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tta-transcribe-translate-and-alignment-for-cross/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tta-transcribe-translate-and-alignment-for-cross/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Universr: Unified and Versatile Audio Super-Resolution Via Vocoder-Free Flow Matching</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-universr-unified-and-versatile-audio-super/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-universr-unified-and-versatile-audio-super/</guid>
      <description>Audio Super-Resolution | 8.0/10</description>
    </item>
    <item>
      <title>Unseen but Not Unknown: Using Dataset Concealment to Robustly Evaluate Speech Quality Estimation Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-unseen-but-not-unknown-using-dataset-concealment/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-unseen-but-not-unknown-using-dataset-concealment/</guid>
      <description>Speech Quality Assessment | 8.3/10</description>
    </item>
    <item>
      <title>Utilizing Information Theoretic Approach to Study Cochlear Neural Degeneration</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-utilizing-information-theoretic-approach-to-study/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-utilizing-information-theoretic-approach-to-study/</guid>
      <description>Bioacoustics | 6.5/10</description>
    </item>
    <item>
      <title>V2A-DPO: Omni-Preference Optimization for Video-To-Audio Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-v2a-dpo-omni-preference-optimization-for-video-to/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-v2a-dpo-omni-preference-optimization-for-video-to/</guid>
      <description>Video-to-Audio Generation | 7.5/10</description>
    </item>
    <item>
      <title>Walking Through Uncertainty: An Empirical Study of Uncertainty Estimation for Audio-Aware Large Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-walking-through-uncertainty-an-empirical-study-of/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-walking-through-uncertainty-an-empirical-study-of/</guid>
      <description>Audio Question Answering | 7.5/10</description>
    </item>
    <item>
      <title>WAV2LEV: Predicting Levenshtein Edit Operation Sequences For Fine-Grained Estimation of Automatic Speech Recognition Error</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-wav2lev-predicting-levenshtein-edit-operation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-wav2lev-predicting-levenshtein-edit-operation/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>When Noise Lowers the Loss: Rethinking Likelihood-Based Evaluation in Music Large Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-when-noise-lowers-the-loss-rethinking-likelihood/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-when-noise-lowers-the-loss-rethinking-likelihood/</guid>
      <description>Music Generation | 7.0/10</description>
    </item>
    <item>
      <title>When Silence Matters: The Impact of Irrelevant Audio on Text Reasoning in Large Audio-Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-when-silence-matters-the-impact-of-irrelevant/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-when-silence-matters-the-impact-of-irrelevant/</guid>
      <description>Model Evaluation | 7.0/10</description>
    </item>
    <item>
      <title>When Voice Matters: A Controlled Study of Audio LLM Behavior in Clinical Decision-Making</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-when-voice-matters-a-controlled-study-of-audio/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-when-voice-matters-a-controlled-study-of-audio/</guid>
      <description>Model Evaluation | 7.0/10</description>
    </item>
    <item>
      <title>Why Do Speech Language Models Fail to Generate Semantically Coherent Outputs? A Modality Evolving Perspective</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-why-do-speech-language-models-fail-to-generate/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-why-do-speech-language-models-fail-to-generate/</guid>
      <description>Speech Generation | 7.0/10</description>
    </item>
    <item>
      <title>Z-Scores: A Metric for Linguistically Assessing Disfluency Removal</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-z-scores-a-metric-for-linguistically-assessing/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-z-scores-a-metric-for-linguistically-assessing/</guid>
      <description>Model Evaluation | 6.5/10</description>
    </item>
    <item>
      <title>All That Glitters Is Not Audio: Rethinking Text Priors and Audio Reliance in Audio-Language Evaluation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-all-that-glitters-is-not-audio-rethinking-text/</link>
      <pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-all-that-glitters-is-not-audio-rethinking-text/</guid>
      <description>Audio Question Answering | 6.5/10</description>
    </item>
    <item>
      <title>Comparison of sEMG Encoding Accuracy Across Speech Modes Using Articulatory and Phoneme Features</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-comparison-of-semg-encoding-accuracy-across/</link>
      <pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-comparison-of-semg-encoding-accuracy-across/</guid>
      <description>Speech Biomarkers | 8.0/10</description>
    </item>
    <item>
      <title>Explainable AI in Speaker Recognition -- Making Latent Representations Understandable</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-explainable-ai-in-speaker-recognition-making/</link>
      <pubDate>Tue, 28 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-explainable-ai-in-speaker-recognition-making/</guid>
      <description>Speaker Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Audio Video Verbal Analysis (AVVA) for Capturing Classroom Dialogues</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-audio-video-verbal-analysis-avva-for-capturing/</link>
      <pubDate>Mon, 27 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-audio-video-verbal-analysis-avva-for-capturing/</guid>
      <description>Audio Question Answering | 6.0/10</description>
    </item>
    <item>
      <title>Identifying and typifying demographic unfairness in phoneme-level embeddings of self-supervised speech recognition models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-identifying-and-typifying-demographic-unfairness/</link>
      <pubDate>Mon, 27 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-identifying-and-typifying-demographic-unfairness/</guid>
      <description>Speech Recognition | 7.0/10</description>
    </item>
    <item>
      <title>Transformer-Based Rhythm Quantization of Performance MIDI Using Beat Annotations</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-transformer-based-rhythm-quantization-of/</link>
      <pubDate>Mon, 27 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-transformer-based-rhythm-quantization-of/</guid>
      <description>Music Information Retrieval | 8.0/10</description>
    </item>
    <item>
      <title>&#34;This Wasn&#39;t Made for Me&#34;: Recentering User Experience and Emotional Impact in the Evaluation of ASR Bias</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-this-wasnt-made-for-me-recentering-user/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-this-wasnt-made-for-me-recentering-user/</guid>
      <description>Speech Recognition | 7.0/10</description>
    </item>
    <item>
      <title>AUDITA: A New Dataset to Audit Humans vs. AI Skill at Audio QA</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-audita-a-new-dataset-to-audit-humans-vs-ai-skill/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-audita-a-new-dataset-to-audit-humans-vs-ai-skill/</guid>
      <description>Audio Question Answering | 6.5/10</description>
    </item>
    <item>
      <title>Evaluation of Automatic Speech Recognition Using Generative Large Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-evaluation-of-automatic-speech-recognition-using/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-evaluation-of-automatic-speech-recognition-using/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Preferences of a Voice-First Nation: Large-Scale Pairwise Evaluation and Preference Analysis for TTS in Indian Languages</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-preferences-of-a-voice-first-nation-large-scale/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-preferences-of-a-voice-first-nation-large-scale/</guid>
      <description>Speech Synthesis | 7.5/10</description>
    </item>
    <item>
      <title>Time vs. Layer: Locating Predictive Cues for Dysarthric Speech Descriptors in wav2vec 2.0</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-time-vs-layer-locating-predictive-cues-for/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-time-vs-layer-locating-predictive-cues-for/</guid>
      <description>Speech Biomarkers | 7.0/10</description>
    </item>
    <item>
      <title>Aligning Stuttered-Speech Research with End-User Needs: Scoping Review, Survey, and Guidelines</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-aligning-stuttered-speech-research-with-end-user/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-aligning-stuttered-speech-research-with-end-user/</guid>
      <description>1. **Problem**: Current research on stuttered-speech technology is systematically disconnected from the actual needs of people who stutter (PWS) and speech-language pathologists (SLPs); research priorities, task definitions, and evaluation methods are insufficiently user-centered. 2. **Core method**: A two-part combined analysis: 1) a scoping review of 228 relevant papers, proposing a taxonomy of research tasks and analyzing the state of the field; 2) a survey of 70 stakeholders…</description>
    </item>
    <item>
      <title>Centering Ecological Goals in Automated Identification of Individual Animals</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-centering-ecological-goals-in-automated/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-centering-ecological-goals-in-automated/</guid>
      <description>This paper tackles a key question: why have the high-accuracy algorithms reported in recent years for automated identification of individual animals (from images or sound) so rarely become routine tools in ecological practice? Its core contribution is an evaluation and deployment framework that centers ecological goals, arguing that the usefulness of automated identification depends on the specific ecological question being served, the data available, and the practical consequences of different error types. Unlike prior work that focuses mainly on algorithmic accuracy…</description>
    </item>
    <item>
      <title>Embedding-Based Intrusive Evaluation Metrics for Musical Source Separation Using MERT Representations</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-embedding-based-intrusive-evaluation-metrics-for/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-embedding-based-intrusive-evaluation-metrics-for/</guid>
      <description>1. **Problem**: The objective metrics commonly used in musical source separation (MSS), such as BSS-Eval, correlate poorly with human perceptual ratings, making model evaluation unreliable. 2. **Core method**: Two embedding-based intrusive evaluation metrics are proposed: the mean squared error between target and separated signals in the embedding space of a pretrained MERT model (MSE_MERT), and a per-track Fréchet audio distance…</description>
    </item>
    <item>
      <title>FLiP: Towards understanding and interpreting multimodal multilingual sentence embeddings</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-flip-towards-understanding-and-interpreting/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-flip-towards-understanding-and-interpreting/</guid>
      <description>This paper addresses the interpretability of multilingual, multimodal sentence embeddings (e.g., SONAR, LaBSE). The core method is a model called Factorized Linear Projection (FLiP), which extracts keywords by linearly projecting an embedding vector onto the vocabulary space, treating this as a proxy task for understanding what the embedding encodes. Compared with earlier non-factorized linear probing methods (such as LiP) and SpLiCE, FLiP, on keyword extraction…</description>
    </item>
    <item>
      <title>ONOTE: Benchmarking Omnimodal Notation Processing for Expert-level Music Intelligence</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-onote-benchmarking-omnimodal-notation-processing/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-onote-benchmarking-omnimodal-notation-processing/</guid>
      <description>1. **Problem**: Current multimodal large models have serious deficiencies in omnimodal notation processing (ONP): research is fragmented, models show a strong notation bias (favoring staff notation), and evaluation commonly relies on unreliable 'LLM-as-a-Judge' methods, masking systematic failures in music-theory reasoning. 2. …</description>
    </item>
    <item>
      <title>Reducing the Offline-Streaming Gap for Unified ASR Transducer with Consistency Regularization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-reducing-the-offline-streaming-gap-for-unified/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-reducing-the-offline-streaming-gap-for-unified/</guid>
      <description>1. **Problem**: Training a unified ASR model that offers both high-accuracy offline transcription and low-latency streaming recognition is highly challenging; conventional approaches degrade sharply at low latency. 2. **Core method**: A unified Transducer framework combining chunked attention (with right context) and dynamic chunk convolution (DCConv) to support both modes. The key innovation is a mode-consistency regularization loss…</description>
    </item>
    <item>
      <title>SpeechParaling-Bench: A Comprehensive Benchmark for Paralinguistic-Aware Speech Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-speechparaling-bench-a-comprehensive-benchmark/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-speechparaling-bench-a-comprehensive-benchmark/</guid>
      <description>1. **Problem**: Existing evaluations of large audio-language models' ability to generate and understand paralinguistics (e.g., emotion, tone, timbre) suffer from incomplete feature coverage and from subjective, non-scalable evaluation methods. 2. **Method**: SpeechParaling-Bench is introduced, a comprehensive benchmark of over 1,000 parallel Chinese-English spoken queries covering more than 100 fine-grained paralinguistic attributes. The benchmark…</description>
    </item>
    <item>
      <title>Utterance-Level Methods for Identifying Reliable ASR-Output for Child Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-utterance-level-methods-for-identifying-reliable/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-utterance-level-methods-for-identifying-reliable/</guid>
      <description>1. **Problem to solve**: Automatic speech recognition (ASR) of child speech has high error rates, hurting applications such as language learning and reading assistance. Conventional confidence-estimation methods can fail on noisy, highly variable child speech. A post-transcription, utterance-level method is needed to automatically identify which ASR outputs are reliable, reducing the manual review burden. 2. **Core method**: Two approaches are proposed, based on…</description>
    </item>
    <item>
      <title>Detecting Hallucinations in SpeechLLMs at Inference Time Using Attention Maps</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-detecting-hallucinations-in-speechllms-at/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-detecting-hallucinations-in-speechllms-at/</guid>
      <description>This paper tackles the 'hallucination' problem of speech LLMs (SpeechLLMs) at inference time, i.e., generating fluent text that is unsupported by the input audio. Existing methods depend on costly gold-standard outputs, while text-LLM techniques cannot capture audio-specific signals. The authors therefore propose four lightweight attention-map-based metrics (AudioRatio, AudioConsistency, AudioEnt…</description>
    </item>
    <item>
      <title>HalluAudio: A Comprehensive Benchmark for Hallucination Detection in Large Audio-Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-halluaudio-a-comprehensive-benchmark-for/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-halluaudio-a-comprehensive-benchmark-for/</guid>
      <description>This paper addresses the lack of systematic tools for evaluating the pervasive 'hallucination' problem in large audio-language models (LALMs), i.e., generating content inconsistent with the audio evidence. The authors build and release **HalluAudio**, the first large-scale, multi-domain (speech, environmental sound, music), multi-task (binary classification, multiple choice, attribute verification, open-ended generation) human-verified benchmark for audio hallucination detection, containing more than…</description>
    </item>
    <item>
      <title>MTR-DuplexBench: Towards a Comprehensive Evaluation of Multi-Round Conversations for Full-Duplex Speech Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-mtr-duplexbench-towards-a-comprehensive/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-mtr-duplexbench-towards-a-comprehensive/</guid>
      <description>This paper targets a key gap in how full-duplex speech language models (FD-SLMs) are evaluated: the lack of systematic assessment of multi-round, continuous conversation. Existing benchmarks mostly cover single-turn interaction or specific dialogue phenomena (such as interruptions), overlooking whether models keep core capabilities such as instruction following and safety consistent across rounds. The authors propose **MTR-DuplexBench**, a new multi-round full-duplex…</description>
    </item>
    <item>
      <title>Voice of India: A Large-Scale Benchmark for Real-World Speech Recognition in India</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-voice-of-india-a-large-scale-benchmark-for-real/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-voice-of-india-a-large-scale-benchmark-for-real/</guid>
      <description>This paper addresses two core problems with existing Indic ASR benchmarks: they do not reflect real-world conditions, and their evaluation methods are unfair. The authors build the large-scale 'Voice of India' benchmark from unscripted telephone conversations of 36,000 speakers, covering 15 major Indian languages and 139 regional clusters, 536 hours in total. A key innovation is adopting a spelling-variant-aware…</description>
    </item>
    <item>
      <title>A state-space representation of the boundary integral equation for room acoustic modelling</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-a-state-space-representation-of-the-boundary/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-a-state-space-representation-of-the-boundary/</guid>
      <description>This paper addresses the fact that the main approaches to room acoustic modelling (e.g., boundary element methods, delay networks, geometrical acoustics) are mutually independent and lack a unified theoretical foundation. The authors propose a new framework called the **Boundary Integral Operator State Space (BIOSS)**. Its core idea is to reformulate the boundary integral equation describing the sound field as a state-space model in which **the state is the sound-pressure distribution over the room boundary** and the system dynamics are governed by…</description>
    </item>
    <item>
      <title>Anonymization, Not Elimination: Utility-Preserved Speech Anonymization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-anonymization-not-elimination-utility-preserved/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-anonymization-not-elimination-utility-preserved/</guid>
      <description>Addressing the core tension in speech-data privacy between privacy leakage and loss of data utility, this paper proposes a novel two-stage framework. First, to address the limited identity diversity and poor controllability of voice anonymization (protecting 'who is speaking'), it introduces a flow-matching-based speaker-embedding anonymizer (F3-VA) that generates diverse new identities well separated from the original speaker. Second, for content anonymization (protecting 'what is said'…</description>
    </item>
    <item>
      <title>Benign Fine-Tuning Breaks Safety Alignment in Audio LLMs</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-benign-fine-tuning-breaks-safety-alignment-in/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-benign-fine-tuning-breaks-safety-alignment-in/</guid>
      <description>This paper presents the first systematic study of **how fine-tuning on benign audio data breaks the safety alignment of audio LLMs**. The core question: when users fine-tune a model on entirely harmless audio data to improve performance, does this inadvertently weaken its ability to refuse harmful instructions? The authors propose a **filtering framework based on embedding-space proximity**, which computes the distance between benign and harmful audio in the embedding space of the model itself or of an external reference encoder…</description>
    </item>
    <item>
      <title>ClariCodec: Optimising Neural Speech Codes for 200bps Communication using Reinforcement Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-claricodec-optimising-neural-speech-codes-for/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-claricodec-optimising-neural-speech-codes-for/</guid>
      <description>For ultra-low-bitrate (200 bps) scenarios such as satellite and underwater communication, where conventional neural speech codecs sacrifice intelligibility in favor of reconstruction quality, this paper proposes ClariCodec. Its core method redefines the encoder's quantization process as a stochastic policy and fine-tunes the encoder with reinforcement learning (RL), using word error rate (WER) as the reward signal while freezing the decoder and the rest of the acoustic reconstruction pipeline. Experiments show…</description>
    </item>
    <item>
      <title>Coexisting Tempo Traditions in Beethoven&#39;s Piano and Cello Sonatas: A K-means Clustering Analysis of Recorded Performances, 1930-2012</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-coexisting-tempo-traditions-in-beethovens-piano/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-coexisting-tempo-traditions-in-beethovens-piano/</guid>
      <description>This paper challenges the single regression model widely used in empirical research on musical performance, which tends to portray historical tempo change as a unidirectional, uniform process. The authors argue that such models obscure the coexistence of multiple performance traditions. The study applies K-means clustering to bar-by-bar tempo data from more than one hundred movement recordings, made between 1930 and 2012, of five Beethoven sonatas for piano and cello (Op. 5, 69, 102)…</description>
    </item>
    <item>
      <title>FLiP: Towards understanding and interpreting multimodal multilingual sentence embeddings</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-flip-towards-understanding-and-interpreting/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-flip-towards-understanding-and-interpreting/</guid>
      <description>This paper presents **FLiP**, a **factorized linear projection model** for **understanding and interpreting** multilingual, multimodal sentence-embedding spaces (e.g., SONAR, LaBSE, Gemini). The core idea is to recast interpretation of the embedding space as a **linear keyword-extraction task**: a simple linear projection recovers, from a sentence embedding vector, the words that make up the sentence. Experiments show that a well-trained…</description>
    </item>
    <item>
      <title>From Reactive to Proactive: Assessing the Proactivity of Voice Agents via ProVoice-Bench</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-from-reactive-to-proactive-assessing-the/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-from-reactive-to-proactive-assessing-the/</guid>
      <description>This paper addresses the fact that existing benchmarks for voice agents focus mainly on reactive responses while overlooking proactive perception and intervention. The authors present **ProVoice-Bench**, the first benchmark framework dedicated to evaluating proactive voice agents. Using a multi-stage data-synthesis pipeline covering digital-state construction, scenario synthesis, dialogue generation, acoustic simulation, and dialogue assembly, the framework builds 1,182 high-quality…</description>
    </item>
    <item>
      <title>Hard to Be Heard: Phoneme-Level ASR Analysis of Phonologically Complex, Low-Resource Endangered Languages</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-hard-to-be-heard-phoneme-level-asr-analysis-of/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-hard-to-be-heard-phoneme-level-asr-analysis-of/</guid>
      <description>This paper establishes the first automatic speech recognition (ASR) benchmarks for Archi and Rutul, two endangered East Caucasian languages that are phonologically extremely complex and severely under-resourced. The authors consolidated and standardized existing linguistic documentation, creating speech-text datasets of roughly 50 minutes and 1 hour 20 minutes. They evaluated a range of state-of-the-art ASR models (wav2vec2, Whisper, Qwen2-Audi…</description>
    </item>
    <item>
      <title>Incremental learning for audio classification with Hebbian Deep Neural Networks</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-incremental-learning-for-audio-classification/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-incremental-learning-for-audio-classification/</guid>
      <description>For incremental (continual) learning in audio classification, this paper proposes a biologically inspired solution to the 'catastrophic forgetting' of old knowledge that deep models suffer when learning new tasks. The authors are the first to combine **Hebbian learning** (an unsupervised, feedback-free learning rule driven by coincident neuron activation) with **incremental learning**, and they design a **kernel plasticity** mechanism. This mechanism analyzes the training…</description>
    </item>
    <item>
      <title>MINT-Bench: A Comprehensive Multilingual Benchmark for Instruction-Following Text-to-Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-mint-bench-a-comprehensive-multilingual-benchmark/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-mint-bench-a-comprehensive-multilingual-benchmark/</guid>
      <description>This paper addresses the lack of systematic evaluation tools for instruction-following text-to-speech (TTS). Current evaluation suffers from incomplete coverage, coarse diagnostic granularity, and weak multilingual support. The authors propose **MINT-Bench**, a comprehensive multilingual benchmark. Its core components include: 1) a **hierarchical multi-axis taxonomy** built on 10 atomic acoustic attributes, systematically organizing instructions from simple to complex (…</description>
    </item>
    <item>
      <title>Neural Encoding Detection is Not All You Need for Synthetic Speech Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-neural-encoding-detection-is-not-all-you-need-for/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-neural-encoding-detection-is-not-all-you-need-for/</guid>
      <description>The core contribution of this survey is to **expose and substantiate a key misconception in current synthetic speech detection: the over-reliance on 'neural encoding detection'**. The paper first systematically reviews three families of data-driven methods, based on SincNet, self-supervised learning (SSL), and neural encoding detection, and argues that today's best-performing SSL models mainly capture traces introduced by the vocoder during waveform generation, rather than…</description>
    </item>
    <item>
      <title>VIBE: Voice-Induced open-ended Bias Evaluation for Large Audio-Language Models via Real-World Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-vibe-voice-induced-open-ended-bias-evaluation-for/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-vibe-voice-induced-open-ended-bias-evaluation-for/</guid>
      <description>This paper addresses the under-evaluation of social bias in large audio-language models (LALMs) on open-ended generation tasks. Existing benchmarks mostly rely on synthetic speech and multiple-choice questions (MCQs) and cannot capture the stereotypes that models naturally reveal in realistic interaction. The authors propose the **VIBE** framework, whose core is to feed **recordings of real human voices** into the model and to use **open-ended generation tasks** (e.g., story writing, personalized recommendation…</description>
    </item>
    <item>
      <title>Where Do Self-Supervised Speech Models Become Unfair?</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-where-do-self-supervised-speech-models-become/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-where-do-self-supervised-speech-models-become/</guid>
      <description>This paper investigates at which layer unfairness emerges in self-supervised speech models (S3Ms). Using a lightweight linear-probe method on the embeddings of every layer of several S3Ms (e.g., WavLM, Wav2Vec2, BEST-RQ, Whisper), the team jointly evaluates overall performance on speaker identification (SID) and automatic speech recognition (ASR) as well as performance for different speaker groups…</description>
    </item>
    <item>
      <title>Elucidating the SNR-t Bias of Diffusion Probabilistic Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-elucidating-the-snr-t-bias-of-diffusion/</link>
      <pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-elucidating-the-snr-t-bias-of-diffusion/</guid>
      <description>The core contribution of this paper is identifying and systematically analyzing a fundamental issue in diffusion probabilistic models (DPMs): the signal-to-noise-ratio vs. timestep (SNR-t) bias. The bias means that, at inference time, the actual SNR of a denoised sample does not match the SNR theoretically associated with its assigned timestep t; the mismatch arises because the strict coupling enforced during training is broken by accumulated error at inference. Through detailed experiments (sliding-window tests, comparisons of the forward and reverse processes), the authors reveal…</description>
    </item>
    <item>
      <title>Interactive ASR: Towards Human-Like Interaction and Semantic Coherence Evaluation for Agentic Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-interactive-asr-towards-human-like-interaction/</link>
      <pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-interactive-asr-towards-human-like-interaction/</guid>
      <description>Targeting two blind spots of conventional ASR, namely that the WER metric is insensitive to semantic errors and that systems cannot correct errors through natural interaction, this paper proposes the Interactive ASR framework. First, the authors introduce S²ER (Sentence-level Semantic Error Rate), which uses an LLM-as-a-Judge binary decision on whether the recognition result and the reference text…</description>
    </item>
    <item>
      <title>The Acoustic Camouflage Phenomenon: Re-evaluating Speech Features for Financial Risk Prediction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-the-acoustic-camouflage-phenomenon-re-evaluating/</link>
      <pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-the-acoustic-camouflage-phenomenon-re-evaluating/</guid>
      <description>This study examines how useful paralinguistic acoustic features (pitch, jitter, pauses, etc.) in corporate earnings calls are for predicting catastrophic stock-price drops. Working from the MAEC dataset, the authors extract features from two modalities: on the text side, FinBERT computes the sentiment-polarity difference between the scripted opening remarks and the improvised Q&amp;A (Sentiment Delta); on the audio side, they extract variance features of clinical vocal-stress markers (pitch variance, jitter…</description>
    </item>
  </channel>
</rss>
