Open-Source Tools (开源工具) on 语音/音频论文速递 (Speech/Audio Paper Digest)

Feed: https://nanless.github.io/audio-paper-digest-blog/tags/%E5%BC%80%E6%BA%90%E5%B7%A5%E5%85%B7/
Recent content in Open-Source Tools (开源工具) on 语音/音频论文速递
Generated by Hugo (source language: zh-cn)
Last updated: Wed, 29 Apr 2026 00:00:00 +0000

A Text-To-Text Alignment Algorithm for Better Evaluation of Modern Speech Recognition Systems
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-text-to-text-alignment-algorithm-for-better/
Wed, 29 Apr 2026 00:00:00 +0000
Model evaluation | 7.5/10

Constructing Composite Features for Interpretable Music-Tagging
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-constructing-composite-features-for-interpretable/
Wed, 29 Apr 2026 00:00:00 +0000
Music information retrieval | 7.5/10

Denoising Of Stochastic Ray Tracing Room Impulse Responses
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-denoising-of-stochastic-ray-tracing-room-impulse/
Wed, 29 Apr 2026 00:00:00 +0000
Spatial audio | 7.5/10

ECHO: Frequency-Aware Hierarchical Encoding for Variable-Length Signals
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-echo-frequency-aware-hierarchical-encoding-for/
Wed, 29 Apr 2026 00:00:00 +0000
Audio classification | 9.5/10

Evaluating High-Resolution Piano Sustain Pedal Depth Estimation with Musically Informed Metrics
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-evaluating-high-resolution-piano-sustain-pedal/
Wed, 29 Apr 2026 00:00:00 +0000
Music information retrieval | 8.0/10

MNV-17: A High-Quality Performative Mandarin Dataset for Nonverbal Vocalization Recognition in Speech
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mnv-17-a-high-quality-performative-mandarin/
Wed, 29 Apr 2026 00:00:00 +0000
Speech recognition | 7.5/10

Polynomial Mixing for Efficient Self-Supervised Speech Encoders
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-polynomial-mixing-for-efficient-self-supervised/
Wed, 29 Apr 2026 00:00:00 +0000
Speech recognition | 8.0/10

Praxy Voice: Voice-Prompt Recovery + BUPS for Commercial-Class Indic TTS from a Frozen Non-Indic Base at Zero Commercial-Training-Data Cost
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-praxy-voice-voice-prompt-recovery-bups-for/
Wed, 29 Apr 2026 00:00:00 +0000
Speech synthesis | 8.0/10

SA-SSL-MOS: Self-Supervised Learning MOS Prediction with Spectral Augmentation for Generalized Multi-Rate Speech Assessment
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sa-ssl-mos-self-supervised-learning-mos/
Wed, 29 Apr 2026 00:00:00 +0000
Speech quality assessment | 7.0/10

Sidon: Fast and Robust Open-Source Multilingual Speech Restoration for Large-Scale Dataset Cleansing
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sidon-fast-and-robust-open-source-multilingual/
Wed, 29 Apr 2026 00:00:00 +0000
Speech enhancement | 8.5/10

The Singing Voice Conversion Challenge 2025: From Singer Identity Conversion to Singing Style Conversion
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-the-singing-voice-conversion-challenge-2025-from/
Wed, 29 Apr 2026 00:00:00 +0000
Singing voice conversion | 7.0/10

Via Score to Performance: Efficient Human-Controllable Long Song Generation with Bar-Level Symbolic Notation
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-via-score-to-performance-efficient-human/
Wed, 29 Apr 2026 00:00:00 +0000
Music generation | 7.5/10

Z-Scores: A Metric for Linguistically Assessing Disfluency Removal
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-z-scores-a-metric-for-linguistically-assessing/
Wed, 29 Apr 2026 00:00:00 +0000
Model evaluation | 6.5/10

Opening the Design Space: Two Years of Performance with Intelligent Musical Instruments
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-opening-the-design-space-two-years-of-performance/
Tue, 28 Apr 2026 00:00:00 +0000
Music generation | 6.5/10

Audio Video Verbal Analysis (AVVA) for Capturing Classroom Dialogues
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-audio-video-verbal-analysis-avva-for-capturing/
Mon, 27 Apr 2026 00:00:00 +0000
Audio question answering | 6.0/10

TTS-PRISM: A Perceptual Reasoning and Interpretable Speech Model for Fine-Grained Diagnosis
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-tts-prism-a-perceptual-reasoning-and/
Mon, 27 Apr 2026 00:00:00 +0000
Speech quality assessment | 7.5/10

语音/音频论文速递 (Speech/Audio Paper Digest) 2026-04-27
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27/
Mon, 27 Apr 2026 00:00:00 +0000
13 speech/AI papers analyzed in total

DiariZen Explained: A Tutorial for the Open Source State-of-the-Art Speaker Diarization Pipeline
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-diarizen-explained-a-tutorial-for-the-open-source/
Fri, 24 Apr 2026 00:00:00 +0000
Speaker diarization | 6.5/10

Centering Ecological Goals in Automated Identification of Individual Animals
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-centering-ecological-goals-in-automated/
Thu, 23 Apr 2026 00:00:00 +0000
This paper addresses a key question: why have the high-accuracy algorithms reported in recent years for automated identification of individual animals (from images or sound) so rarely become routine tools in ecological practice? At its core, the paper proposes an evaluation and deployment framework "centered on ecological goals", emphasizing that the usefulness of automated identification depends on the specific ecological question it serves, the available data, and the practical consequences of different error types. Unlike prior work, which focused mainly on algorithmic accuracy, ...