Hypothesis Testing | Speech/Music/Audio Paper Digest

Speech/Music/Audio Paper Digest, 2026-06-23 | 83 papers analyzed

⚡ Today's Overview
📥 83 papers fetched → 🔬 in-depth analysis complete

🏷️ Trending Topics
Topic | Count | Distribution
#SpeechRecognition | 19 papers | ███████████████
#SpeechSynthesis | 14 papers | ██████████████
#MusicGeneration | 3 papers | ███
#SpeakerVerification | 3 papers | ███
#SpeechEnhancement | 3 papers | ███
#ContrastiveLearning | 2 papers | ██
#SelfSupervisedLearning | 2 papers | ██
#AudioWatermarking | 2 papers | ██

📊 Paper Score Leaderboard (83 papers, sorted by score, descending)
Rank | Paper | Total Score | Tier | Main Task
🥇 | CoughPhase-CLR: Designing an acoustics-informed foundat | 10.0 | Top 10% | #ContrastiveLearning
🥈 | Libretto: Giving LLM Agents a Sense of Musical Structur | 9.2 | Top 50% | #MusicGeneration
🥉 | Speaker Identity in Non-Verbal Vocalizations: Condition | 9.1 | Top 25% | #SpeakerVerification
4 | PHAST-Net: Attention-Guided, Physics-Informed Network f | 9.0 | Top 10% | #MusicInformationRetrieval
5 | Domain-incremental audio classification using domain-sp | 9.0 | Top 50% | #AudioClassification
6 | MSU-Bench: Towards Speaker-Centric Understanding in Con | 9.0 | Top 10% | -
7 | How Well Do Self-Supervised Speech Models Encode Age an | 9.0 | Top 50% | #SelfSupervisedLearning
8 | CAAD: Contrastive Audio-Aware Distillation for Efficien | 8.9 | Top 25% | #SpeechRecognition
9 | STAR-VAE: Structured Topology-Aware Regularization for | 8.8 | Top 25% | #AudioGeneration
10 | An Evaluation Framework for Text-to-Speech Voice Recons | 8.8 | Top 25% | #SpeechSynthesis
11 | An Analysis of Untrained Deep Reservoir Networks for Au | 8.8 | Top 50% | #AudioEventDetection
12 | Towards Detecting Neural Audio Codec Synthesized Heart | 8.7 | Top 50% | #SelfSupervisedLearning
13 | Bridging the Age Gap: Towards Detecting Neural Audio Co | 8.6 | Top 50% | #SpeechDeepfakeDetection
14 | ATCCaps: A Call-Sign-Aware Speech Dataset for Air Traff | 8.6 | Top 25% | #SpeechRecognition
15 | InstructFX2FX: A Multi-turn Text-to-Preset Demo for Ite | 8.6 | Top 50% | #ContrastiveLearning
16 | When EER Hides Deployment Failure: Auditing Threshold T | 8.6 | Top 25% | -
17 | CapRiCorn-1K: A Comprehensive Benchmark for Video Capti | 8.6 | Top 50% | #SpeechRecognition
18 | Compiling Differentiable Audio Graphs to Real-Time DSP | 8.5 | Top 25% | -
19 | Improving Text-to-Music Generation with Human Preferenc | 8.5 | Top 50% | #MusicGeneration
20 | Don't Listen to Me: A Lightweight, Low-Latency Mode | 8.4 | Top 50% | #SpeechEnhancement
21 | HALAS: A Human-Annotated Dataset of Hallucinations of M | 8.4 | Top 50% | #SpeechRecognition
22 | Benchmarking Large Language Models for Grapheme-to-Phon | 8.4 | Top 25% | #SpeechSynthesis
23 | Cross-lingual Retrieval-Augmented Classification for Dy | 8.4 | Top 25% | #SpeechRecognition
24 | Bagpiper-TTS: Natural Language Guided Universal Speech | 8.4 | Top 25% | #SpeechSynthesis
25 | Using Phonological-Level Wav2Vec2 for Mandarin Automati | 8.3 | Top 25% | #SpeechRecognition
26 | Word Lengthening as a Function of Utterance Position: A | 8.1 | Top 25% | #SpeechSynthesis
27 | LambdaMark: Semantic Audio Watermarking for Robustness | 8.0 | Top 25% | #AudioWatermarking
28 | OpenWER: Improving Cross-Lingual ASR Evaluation and Ena | 8.0 | Top 50% | #SpeechRecognition
29 | AudioCALM: Continuous Autoregressive Language Modeling | 7.9 | Top 25% | #SpeechSynthesis
30 | AOR-Bench: Do Large Audio Language Models Over-Refuse P | 7.9 | Top 50% | #AudioQA
31 | Gradient-Based Learning of Parametric Engine Sound Repr | 7.8 | Top 50% | #ParameterEfficientFineTuning
32 | Toward Open-Set Speaker Attribute Prediction with Keywo | 7.8 | Top 25% | #MultimodalModels
33 | Time-Frequency Weighted Losses for Phoneme Reconstructi | 7.8 | Top 25% | #SpeechEnhancement
34 | An implicitization-based solution to the minimal 4s/6r | 7.8 | Top 50% | -
35 | CORTIS: Text-Only Adaptation of Spoken Language Models | 7.7 | Top 50% | #SpeechRecognition
36 | What Do Neural Networks Learn for TDOA Estimation? A Cr | 7.7 | Top 50% | #SoundSourceLocalization
37 | Kiwano: A Cutting-Edge Open-Source Toolkit for Speaker | 7.6 | Top 50% | #SpeakerVerification
38 | Learning to Evade: Adaptive Attacks on Audio Watermarki | 7.6 | Top 50% | #AudioWatermarking
39 | Bagpiper-Edit: Zero-Shot Open-Ended Audio Editing via R | 7.6 | Top 25% | #SpeechSynthesis
40 | From Text Metrics to Model Internals: A Study of Whispe | 7.5 | Top 50% | #SpeechRecognition
41 | Bridging Self-Supervised Learning and Speech Enhancemen | 7.5 | Top 25% | #SpeechEnhancement
42 | Integrating Facial Generation into Full-Duplex Spoken D | 7.5 | Top 25% | -
43 | ESPnet3: Infrastructure for Scalable Speech and Audio R | 7.5 | Top 25% | #SpeechRecognition
44 | On the Effect of Segmentation Width and Cluster Size on | 7.4 | Top 25% | #SpeechSynthesis
45 | The Anatomy of the CTC Oracle Gap: Acoustic Exhaustion | 7.3 | Top 50% | #SpeechRecognition
46 | FlowTTS-GRPO: Online Reinforcement Learning with Multi- | 7.2 | Top 50% | -
47 | DisSpeech: Low-Resource Controllable Mandarin Stuttered | 7.2 | Top 25% | #SpeechSynthesis
48 | SDP-Codec: A Speaker-Decoupled Speech Codec with Pitch | 7.2 | Top 50% | #SpeechCoding
49 | Synthesizing the Lombard Effect: Multi-Level Control of | 7.2 | Top 50% | #SpeechSynthesis
50 | Scaling Audio Models Efficiently: A Joint Study of Comp | 7.2 | Top 50% | #SpeechRecognition
51 | Online Predictive Coding for Dual-Mode Self-Supervised | 7.2 | Top 50% | #SpeechRecognition
52 | Exploiting Neural Audio Codec Latents for Adversarial A | 7.2 | Top 50% | #GAN
53 | Audio Editing in the Era of Foundation Models: A Survey | 7.0 | Top 25% | -
54 | Adding Robust Code-Switching Capabilities to High Perfo | 7.0 | Top 50% | #SpeechRecognition
55 | Unlocking In-Context Learning in Audio-Language Models | 7.0 | Top 50% | #FederatedLearning
56 | Backdoor Attacks on Speech Emotion Recognition via TTS- | 7.0 | Top 50% | #SpeechEmotionRecognition
57 | LK Jam: System Architecture and Implementation of a Rea | 7.0 | Top 50% | #MusicGeneration
58 | An Acoustic Landmark Database of the English Lexicon vi | 6.9 | Top 50% | #SpeechSynthesis
59 | Learning from Audio-Dependency Errors: Data Curation St | 6.9 | Top 50% | #AudioQA
60 | The Watermark Shortcut: How Provenance Marking Sabotage | 6.8 | Top 50% | #DataAugmentation
61 | LISE : Listenable Interpretable Speaker Embeddings | 6.8 | Top 50% | #SpeakerVerification
62 | PIVOTSBench: Evaluating Fine-Grained Interpersonal Rela | 6.8 | Top 50% | #Benchmarking
63 | AugCodec: A Low-Bitrate Disentangled Neural Speech Code | 6.7 | Top 50% | #DataAugmentation
64 | Vaani Benchmark V1.0: An Inclusive Multimodal Benchmark | 6.7 | Top 50% | #SpeechRecognition
65 | Physics-Informed Neural Operator for Speech Production | 6.7 | Top 50% | #SpeechSynthesis
66 | Streaming T5-based Text-to-Speech Synthesis with Limite | 6.7 | Top 25% | #SpeechSynthesis
67 | ProsoCodec: Prosody-Oriented Speech Codec for Voice Con | 6.6 | Top 50% | #VoiceConversion
68 | Beyond ROC-AUC: Operating-Point Performance Reporting f | 6.6 | Top 50% | -
69 | ISCSLP 2026 CoT-TTS Challenge: Chain-of-Thought Reasoni | 6.6 | Top 50% | #SpeechSynthesis
70 | A DDSP Framework for Adaptive Room Equalization | 6.5 | Top 50% | #AdaptiveFiltering
71 | EmoInstruct-TTS: Dual-Path Instruction-Guided Emotional | 6.5 | Top 50% | -
72 | Interleaved Speech Language Models Latently Work In Tex | 6.4 | Top 50% | #SpeechRecognition
73 | DSSCNet: A Transfer Learning Framework for Cross-Corpus | 6.3 | Top 50% | #TransferLearning
74 | Sea-Scan: High-Accuracy, ML-based Dark Vessel Detection | 6.3 | Top 50% | -
75 | Catching Lies Without Sending the Video: Privacy-Preser | 6.2 | Top 50% | #MultimodalModels
76 | MindAlign: Decoding Inner Speech from fMRI Signals via | 5.8 | Top 50% | #SpeechRecognition
77 | Acoustic Landmark Detector based on Conformer and HuBER | 5.5 | Top 50% | #SpeechRecognition
78 | Explainable AI in Speaker Recognition – Attention Map | 5.5 | Top 50% | #SpeakerRecognition
79 | Imitation Learning for Elder-Facing Speech Synthesis | 5.5 | Top 50% | #SpeechSynthesis
80 | Improving Engine Sound Analysis in Hot-Test Environment | 4.9 | Bottom 50% | #AudioDenoising
81 | Direct Raw Audio Signal Processing via Reservoir Comput | 4.5 | Bottom 50% | #SpeechRecognition
82 | A Generalized Formalism of Auto-Regressive Decoding for | 4.1 | Bottom 50% | #AutoregressiveModels
83 | Noise-Driven Instrument Based on Coherent Quantum and S | 3.8 | Bottom 50% | -

📋 Paper List
🥇 CoughPhase-CLR: Designing an acoustics-informed foundation model for coughing sound classification
10.0/10 | Novelty 2/2 | Rigor 1.5/1.5 | Experiments 1.5/1.5 | Clarity 1/1 | Impact 1.5/1.5 | Open source 1.3/1.5 | Reproducibility 0.5/0.5 | Engineering 1.0/1.5 ...