Speech/Music/Audio Paper Digest

Speech/Music/Audio Paper Digest 2026-06-30

35 papers analyzed
⚡ Today's overview
📥 35 papers fetched → 🔬 in-depth analysis complete
🏷️ Popular topics
Topic | Count | Distribution
#speech-recognition | 10 | ██████████
#speech-synthesis | 4 | ████
#self-supervised-learning | 2 | ██
#speech-coding | 2 | ██
#music-generation | 1 | █
#audio-event-detection | 1 | █
#speech-separation | 1 | █
#dataset | 1 | █
📊 Paper score leaderboard (35 papers, sorted by score, descending)
Rank | Paper | Total score | Tier | Main task
🥇 | Preference-ASR: A Preference-Aware Test Set for Benchma | 9.5 | Top 10% | #speech-recognition
🥈 | LeVo 2: Stable and Melodious Song Generation via Hierar | 9.4 | Top 10% | #music-generation
🥉 | VIB-AVSR: Variational Information Bottleneck for Noise- | 9.0 | Top 10% | #speech-recognition
4 | Two kinds of robustness are not the same: disentangling | 8.9 | Top 25% | #audio-event-detection
5 | DialogPII: A multilingual dataset of synthetic dialog t | 8.9 | Top 25% | #speech-recognition
6 | GigaSpeechBench: A Real-World Multilingual Speech-to-Te | 8.7 | Top 50% | #speech-recognition
7 | SICAGE: Speaker-Independent Culture-Aware Gesture Gener | 8.7 | Top 25% | #speech-synthesis
8 | How to Leverage Synthetic Speech for LLM-Based ASR Syst | 8.7 | Top 50% | #speech-recognition
9 | Position-Aware Target Speaker Extraction for Long-Form | 8.5 | Top 25% | #speech-recognition
10 | wav2VOT: Automatic estimation of voice onset time, clos | 8.5 | Top 25% | #self-supervised-learning
11 | Improving Large-Scale Weakly Supervised ASR by Filterin | 8.4 | Top 25% | -
12 | Agent-Computer Observation Interfaces Enable Dynamic Co | 8.4 | Top 10% | #speech-recognition
13 | DTM-Codec: Dynamic Token Masking for VFR Speech Coding | 8.1 | Top 25% | #speech-coding
14 | TF-MoE: Time-Frequency Mixture-of-Experts for Efficient | 8.1 | Top 25% | #speech-separation
15 | Underwater Source Detection and Classification for Sign | 7.8 | Top 25% | #dataset
16 | AMR: Adaptive Modality Routing for Multimodal Polyglot | 7.8 | Top 25% | #speaker-recognition
17 | FacePlex: Full-Duplex Joint Speech-Facial Motion Genera | 7.8 | Top 25% | #speech-synthesis
18 | VeRe-Flow: Guiding Flow Matching toward Clean Speech vi | 7.7 | Top 25% | #speech-enhancement
19 | CTC-Seeded Token Edit Refinement for Non-Autoregressive | 7.7 | Top 25% | #speech-recognition
20 | Evaluation of Head-Related Transfer Functions Across Fi | 7.6 | Top 25% | #spatial-audio
21 | Semi-Supervised Sound Event Detection with Conditional | 7.6 | Top 25% | #contrastive-learning
22 | OLIVE: View-Augmented Latent Prediction with Waveform R | 7.5 | Top 50% | #speech-recognition
23 | EchoHawk: A Reproducible Acoustic Pipeline for Drone De | 7.5 | Top 25% | -
24 | LoRA-Tuned Large Language Models for Dementia Detection | 7.5 | Top 50% | #parameter-efficient-fine-tuning
25 | MeloDISinger: Melody-Aware & Duration-Preserving Si | 7.4 | Top 50% | #speech-synthesis
26 | Child-Centric Voice Anonymization in Single and Multi-S | 7.2 | Top 50% | #voice-anonymization
27 | SIGMA: Saliency-Guided Sparse Mask Attacks for Speech E | 7.1 | Top 50% | #speech-emotion-recognition
28 | Effective Depth in Joint Source-Channel Coding: An Impl | 7.0 | Top 50% | #speech-coding
29 | SIMAX: A Scalable and Interpretable Framework for Multi | 6.6 | Bottom 50% | #speech-synthesis
30 | Clustering Unsupervised Representations as Defense agai | 6.5 | Top 50% | #self-supervised-learning
31 | Comparing Human and Automatic Recognition of Dutch Dysa | 6.5 | Top 50% | #speech-recognition
32 | Predicting Timbre Traits for Interpretable Assessment o | 6.1 | Top 50% | #audio-generation
33 | TRACE: Temporal Relationship-Aware Conversational Entra | 5.9 | Top 50% | -
34 | Proteus: Automated Adversarial Robustness Testing for A | 5.3 | Bottom 50% | #data-augmentation
35 | Rehearsed Multi-Agent Live Product Demonstrations with | 5.3 | Bottom 50% | #multimodal-models
📋 Paper list
🥇 Preference-ASR: A Preference-Aware Test Set for Benchmarking ASR in the Era of Speech LLMs
9.5/10 | Novelty 1.5/2 | Rigor 1.2/1.5 | Experiments 1.2/1.5 | Clarity 1/1 | Impact 1.3/1.5 | Open source 1.4/1.5 | Reproducibility 0.5/0.5 | Engineering 1.4/1.5 ...

Speech/Music/Audio Paper Digest 2026-06-29

16 papers analyzed
⚡ Today's overview
📥 16 papers fetched → 🔬 in-depth analysis complete
🏷️ Popular topics
Topic | Count | Distribution
#speech-recognition | 4 | ████
#speech-synthesis | 2 | ██
#speaker-recognition | 2 | ██
#speech-quality-assessment | 1 | █
#data-augmentation | 1 | █
#speech-emotion-recognition | 1 | █
#multimodal-models | 1 | █
#speech-enhancement | 1 | █
📊 Paper score leaderboard (16 papers, sorted by score, descending)
Rank | Paper | Total score | Tier | Main task
🥇 | Screening Matters: A Comparative Study of Conventional | 8.4 | Top 25% | #speech-quality-assessment
🥈 | From General-Purpose Audio Tagging to Spatially Grounde | 8.3 | Top 50% | #data-augmentation
🥉 | HPRO: Hierarchical Progressive Reward Optimization via | 8.2 | Top 50% | #speech-synthesis
4 | Learning from Annotation Uncertainty: Entropy-Aware Cur | 7.4 | Top 50% | #speech-emotion-recognition
5 | MER-R1: Multimodal Emotion Reasoning via Slow-Fast Thin | 7.4 | Top 25% | #multimodal-models
6 | A Comparison of Fusion Techniques for Multi-Modal Human | 7.3 | Top 50% | -
7 | Do Speech Emphasis Models Generalize across Languages a | 7.0 | Top 25% | #speech-recognition
8 | Advancing Speaker-Based Vocal Effort Classification wit | 6.8 | Top 50% | #speech-enhancement
9 | HybridCodec: Modeling Discrete and Continuous Represent | 6.5 | Top 50% | #speech-synthesis
10 | Grammar-Guided Hierarchical Parsing for Long-form Audio | 6.2 | Top 50% | #audio-event-detection
11 | Room for Error: Large-Scale Simulation of Over-the-Air | 6.2 | Top 50% | #speech-recognition
12 | What Was That Again? Certified Robustness for Automatic | 6.2 | Top 50% | -
13 | Dialogue to Detection: A Multimodal Hybrid NLP Pipeline | 6.0 | Bottom 50% | #speaker-recognition
14 | From Black-Box to Clinical Insight: A Multi-Stage Expla | 6.0 | Top 50% | #speech-recognition
15 | DG^VoiC: Speaker Clustering for Fraud Investigation und | 5.7 | Top 50% | #speaker-recognition
16 | A Survey of Automated Presentation Coaching: Systems, M | 5.4 | Bottom 50% | #speech-recognition
📋 Paper list
🥇 Screening Matters: A Comparative Study of Conventional and Crowdsourced Listening Tests
8.4/10 | Novelty 1.4/2 | Rigor 1.3/1.5 | Experiments 1.2/1.5 | Clarity 1/1 | Impact 1.2/1.5 | Open source 0.5/1.5 | Reproducibility 0.5/0.5 | Engineering 1.3/1.5 ...

Speech/Music/Audio Paper Digest 2026-06-26

22 papers analyzed
⚡ Today's overview
📥 22 papers fetched → 🔬 in-depth analysis complete
🏷️ Popular topics
Topic | Count | Distribution
#speech-recognition | 3 | ███
#speech-quality-assessment | 2 | ██
#speech-synthesis | 2 | ██
#diffusion-models | 1 | █
#singing-assessment | 1 | █
#audio-codec | 1 | █
#audio-event-detection | 1 | █
#audio-separation | 1 | █
📊 Paper score leaderboard (21 papers, sorted by score, descending)
Rank | Paper | Total score | Tier | Main task
🥇 | DNSMOS-C: Improving End-to-end Speech Quality Models vi | 9.3 | Top 50% | #speech-quality-assessment
🥈 | UnityShots: Memory-Driven Multi-Shot Audio-Video Genera | 8.9 | Top 25% | #diffusion-models
🥉 | Listening Like a Judge: A Music-Aware Framework for Aut | 8.8 | Top 25% | #singing-assessment
4 | Elastic Time: Dynamic Frame Rate Bottlenecks for Neural | 8.3 | Top 50% | #audio-codec
5 | Soroll-IA: A Weakly Labeled Audio Dataset for Real-Worl | 8.3 | Top 25% | #audio-event-detection
6 | A Large-Scale Database and Predictive Model of Listener | 8.1 | Top 25% | #speech-quality-assessment
7 | SamaVaani: Auditing and Debiasing Multilingual Clinical | 7.8 | Top 25% | #speech-recognition
8 | CodecSep: Prompt-Driven Universal Sound Separation on N | 7.7 | Top 25% | #audio-separation
9 | VoiceTTA: Enhancing Zero-Shot Text-to-Speech via Reinfo | 7.6 | Top 50% | #speech-synthesis
10 | What We are Missing in Multimodal LLM Evaluation? | 7.0 | Top 50% | -
11 | RedVox: Safety and Fairness Gaps in Speech Models Acros | 6.8 | Top 50% | #benchmark
12 | WQ-Fusion: Dynamic Gated Attention for Cross-Domain Aud | 6.7 | Top 50% | #audio-classification
13 | Thinking While Speaking: Inference-Time Knowledge Trans | 6.7 | Bottom 50% | #knowledge-distillation
14 | When Does Quality-Aware Multimodal Fusion Matter? A Lea | 6.6 | Top 50% | #speech-emotion-recognition
15 | voxmap-studio: An open-source speaker diarization annot | 6.5 | Top 50% | #speaker-diarization
16 | FBK's Long-form SpeechLLMs for IWSLT 2026 Instructi | 6.5 | Top 50% | #speech-recognition
17 | wav2tok 2.0: Scalable Audio Tokenization Maintaining Ex | 6.4 | Top 50% | #speech-retrieval
18 | Generative AI and Copyright Infringement: A Legal-Techn | 6.0 | Top 50% | #music-generation
19 | Closing the Quality Gap in Low-Resource Text-to-Speech: | 6.0 | Bottom 50% | #speech-synthesis
20 | Neural Speaker Diarization via Multilingual Training: E | 5.5 | Top 50% | #speech-separation
21 | Low Resource Multimodal Translation of Nepali Spoken Wo | 5.3 | Bottom 50% | #speech-recognition
22 | Phonetic and semantic analyses of spoken corpora of Bei | N/A | - | -
📋 Paper list
🥇 DNSMOS-C: Improving End-to-end Speech Quality Models via Contrastive Learning
9.3/10 | Novelty 1.3/2 | Rigor 1.2/1.5 | Experiments 1.4/1.5 | Clarity 1/1 | Impact 1.3/1.5 | Open source 1.2/1.5 | Reproducibility 0.5/0.5 | Engineering 1.4/1.5 ...

Speech/Music/Audio Paper Digest 2026-06-25

27 papers analyzed
⚡ Today's overview
📥 27 papers fetched → 🔬 in-depth analysis complete
🏷️ Popular topics
Topic | Count | Distribution
#speech-recognition | 6 | ██████
#speech-synthesis | 5 | █████
#speech-enhancement | 2 | ██
#music-generation | 1 | █
#speech-translation | 1 | █
#speech-spoofing-detection | 1 | █
#self-supervised-learning | 1 | █
#end-to-end | 1 | █
📊 Paper score leaderboard (27 papers, sorted by score, descending)
Rank | Paper | Total score | Tier | Main task
🥇 | Fully Differentiable Neural Forced Alignment via Soft D | 8.3 | Top 25% | -
🥈 | Attractive and Repulsive Pattern Control in Sequence Ge | 8.1 | Top 25% | #music-generation
🥉 | STEB: A Speech-to-Speech Translation Expressiveness Ben | 7.8 | Top 50% | #speech-translation
4 | Supervised Post-training of Speech Foundation Models fo | 7.6 | Top 50% | #speech-spoofing-detection
5 | Joint Residual Reweighting for Classifier Free Guidance | 7.5 | Top 50% | #speech-synthesis
6 | Velocity Prediction in Automatic Guitar Transcription | 7.5 | Top 25% | -
7 | SE-AGCNet: An End-to-End Framework for Joint Speech Enh | 7.4 | Top 50% | #speech-enhancement
8 | MJEPA: A Simple and Scalable Joint-Embedding Predictive | 7.4 | Top 25% | #self-supervised-learning
9 | Sarashina2.2-TTS: Tackling Kanji Polyphony in Japanese | 7.3 | Top 50% | #speech-synthesis
10 | One Model, Many Latencies: Universal Speech Enhancement | 7.2 | Top 50% | #speech-enhancement
11 | From Sounds to Scenes: A Benchmark for Evaluating Conte | 7.2 | Top 50% | #speech-recognition
12 | Wan-Streamer v0.1: End-to-end Real-time Interactive Fou | 7.2 | Top 25% | #speech-synthesis
13 | Does Translation-Enhanced Speech Encoder Pre-training A | 7.1 | Top 50% | #speech-recognition
14 | Adaptive Oscillatory Inductive Bias for Modeling Sharp | 7.0 | Top 50% | #speech-synthesis
15 | End-to-End Voice Intent Recognition for Spontaneous Hum | 7.0 | Top 50% | #end-to-end
16 | Real-Time Voice AI Hears but Does Not Listen | 7.0 | Top 50% | -
17 | FoleySet: A Multi-Level Human-Annotated Foley Sound Dat | 7.0 | Top 50% | #audio-classification
18 | EmotionAI: A Privacy-Preserving Computational Intellige | 6.9 | Top 50% | #speech-emotion-recognition
19 | Frequency-Aware Self-Supervised Music Representation Le | 6.8 | Top 50% | #music-information-retrieval
20 | BCoughBench: Benchmarking Respiratory Acoustic Foundati | 6.7 | Top 50% | #benchmark
21 | SpeechEQ: Benchmarking Emotional Intelligence Quotient | 6.7 | Top 25% | #spoken-dialogue-systems
22 | Graph-Based Phonetic Error Correction of Noisy ASR | 6.7 | Top 50% | #speech-recognition
23 | What Does a Pathological Speech Assessment Model Know a | 6.4 | Top 50% | #speech-intelligibility-assessment
24 | Phoneme-Level Mispronunciation Screening in Polish-Spea | 6.2 | Top 50% | #speech-recognition
25 | Error-Aware TF-IDF Retrieval-Augmented Generation for A | 6.1 | Top 50% | #speech-recognition
26 | Evaluating Japanese Dialect Robustness Across Speech an | 5.8 | Top 50% | #speech-recognition
27 | CrossAccent-TTS: Cross-Lingual Accent-Intensity Control | 5.5 | Top 50% | #speech-synthesis
📋 Paper list
🥇 Fully Differentiable Neural Forced Alignment via Soft Dynamic Programming
8.3/10 | Novelty 1.4/2 | Rigor 1.3/1.5 | Experiments 1.0/1.5 | Clarity 1/1 | Impact 1.1/1.5 | Open source 1.2/1.5 | Reproducibility 0.5/0.5 | Engineering 0.8/1.5 ...

Speech/Music/Audio Paper Digest 2026-06-24

39 papers analyzed
⚡ Today's overview
📥 39 papers fetched → 🔬 in-depth analysis complete
🏷️ Popular topics
Topic | Count | Distribution
#speech-recognition | 6 | ██████
#speech-enhancement | 6 | ██████
#speech-synthesis | 2 | ██
#multimodal-models | 2 | ██
#music-generation | 2 | ██
#signal-processing-fundamentals | 2 | ██
#audio-deepfake-detection | 1 | █
#contrastive-learning | 1 | █
📊 Paper score leaderboard (39 papers, sorted by score, descending)
Rank | Paper | Total score | Tier | Main task
🥇 | ZONOS2 Technical Report | 10.0 | Top 25% | #speech-synthesis
🥈 | Layer-wise Probing of wav2vec 2.0 and Whisper for Conso | 9.5 | Top 50% | #speech-recognition
🥉 | CN-NewsTTS Bench: a target-level automatic benchmark fo | 9.2 | Top 10% | #speech-synthesis
4 | BanglaFake: Constructing and Evaluating a Specialized B | 9.0 | Bottom 50% | #audio-deepfake-detection
5 | Data Scale, Not Latency, Shapes Cross-Lingual Encoder T | 9.0 | Top 25% | #speech-recognition
6 | Breaking Shortcut Learning for Cross-Trial EEG-Guided T | 8.6 | Top 50% | #contrastive-learning
7 | AVOC: Enhancing Hour-Level Audio-Video Understanding in | 8.4 | Top 25% | #multimodal-models
8 | SphereVBx: Spherical Variational Bayes Clustering for S | 8.3 | Top 50% | #unsupervised-learning
9 | ParaPairAudioBench: Paralinguistic Pairwise Audio Bench | 8.2 | Top 50% | #speech-quality-assessment
10 | video-SALMONN-R\(^3\): Learning to ReWatch, ReAsk, and Re | 8.2 | Top 10% | #multimodal-models
11 | Audio-visual Contrastive Alignment for Diffusion-based | 8.1 | Top 25% | #speech-enhancement
12 | Perceptual Evaluation of Higher-Order Ambisonic Codecs | 8.0 | Top 50% | #audio-coding
13 | DTT-BSR+: A Generative-Regression Cascade for Music Sou | 8.0 | Top 25% | #generative-adversarial-networks
14 | Heterogeneous 2D/1D Signal Representation Fusion for Un | 7.6 | Top 50% | -
15 | Selective Capability Unlearning in End-to-End Spoken La | 7.6 | Top 25% | -
16 | A Multi-Stage Separation-and-Classification Framework G | 7.5 | Top 50% | #audio-classification
17 | Progressive Alignment Objectives for Aligner-Encoder ba | 7.5 | Top 25% | #speech-recognition
18 | Comparative Reasoning: Making an Audio Language Model B | 7.5 | Top 25% | #speech-emotion-recognition
19 | VieSpeaker: A Large-Scale Vietnamese Speaker Recognitio | 7.5 | Top 25% | #speaker-recognition
20 | Suppressing spectral edge effects in Schroeder Harmonic | 7.3 | Top 50% | #speech-enhancement
21 | Real-Time Interactive Music Generation via Data-Free St | 7.1 | Top 50% | #music-generation
22 | A Methodology for Characterizing Underwater Radiated No | 7.0 | Top 50% | #signal-processing-fundamentals
23 | A Fusion-Aware Two-Stage Framework for Mispronunciation | 7.0 | Top 25% | #speech-recognition
24 | Neuromorphic Speech Enhancement with Dual-Branch Spikin | 7.0 | Top 50% | #speech-enhancement
25 | NeuroSonic: Conditional Flow Matching for EEG-to-Speech | 7.0 | Top 50% | #speech-generation
26 | The effect of micro-changes in the pluck trajectory on | 6.8 | Top 50% | #signal-processing-fundamentals
27 | Evaluation of Headrest-Integrated Loudspeakers for Enha | 6.8 | Top 50% | -
28 | Statistical validation and full-sphere extension of a B | 6.7 | Top 50% | #audio-quality-assessment
29 | Beyond U-Net: A Latent-Representation-Aligned Skip-Free | 6.6 | Top 50% | #speech-enhancement
30 | Measuring User's Mental Models of Speech Translatio | 6.6 | Top 50% | #speech-translation
31 | Audio–Image Alignment as a Continued-Pretraining Stage | 6.2 | Top 50% | #speech-recognition
32 | Poster: Exploring the Limits of Audio-Based Detection o | 6.2 | Top 50% | -
33 | Joint Learning of Covariance Estimation and White Noise | 5.8 | Top 50% | #speech-enhancement
34 | Sonus Health: Calibrated Heart-Murmur Detection from Sm | 5.7 | Top 50% | #audio-event-detection
35 | Autoencoder based optimized SSL representations: Comple | 5.5 | Top 50% | #speech-recognition
36 | It's Complicated: On the Design and Evaluation of A | 5.5 | Top 50% | #large-language-models
37 | Digital Revival: Acoustic Documentation and Digital Rea | 5.3 | Bottom 50% | #music-generation
38 | Aligning MusicLLM with Emotion using Instruction Tuning | 4.9 | Bottom 50% | #music-emotion-recognition
39 | A Variational-Flow Analysis of StoRM under Noise-Power | 4.4 | Top 50% | #speech-enhancement
📋 Paper list
🥇 ZONOS2 Technical Report
10.0/10 | Novelty 1.5/2 | Rigor 1.3/1.5 | Experiments 1.5/1.5 | Clarity 1/1 | Impact 1.4/1.5 | Open source 1.5/1.5 | Reproducibility 0.5/0.5 | Engineering 1.3/1.5 ...

Speech/Music/Audio Paper Digest 2026-06-23

83 papers analyzed
⚡ Today's overview
📥 83 papers fetched → 🔬 in-depth analysis complete
🏷️ Popular topics
Topic | Count | Distribution
#speech-recognition | 19 | ███████████████
#speech-synthesis | 14 | ██████████████
#music-generation | 3 | ███
#speaker-verification | 3 | ███
#speech-enhancement | 3 | ███
#contrastive-learning | 2 | ██
#self-supervised-learning | 2 | ██
#audio-watermarking | 2 | ██
📊 Paper score leaderboard (83 papers, sorted by score, descending)
Rank | Paper | Total score | Tier | Main task
🥇 | CoughPhase-CLR: Designing an acoustics-informed foundat | 10.0 | Top 10% | #contrastive-learning
🥈 | Libretto: Giving LLM Agents a Sense of Musical Structur | 9.2 | Top 50% | #music-generation
🥉 | Speaker Identity in Non-Verbal Vocalizations: Condition | 9.1 | Top 25% | #speaker-verification
4 | PHAST-Net: Attention-Guided, Physics-Informed Network f | 9.0 | Top 10% | #music-information-retrieval
5 | Domain-incremental audio classification using domain-sp | 9.0 | Top 50% | #audio-classification
6 | MSU-Bench: Towards Speaker-Centric Understanding in Con | 9.0 | Top 10% | -
7 | How Well Do Self-Supervised Speech Models Encode Age an | 9.0 | Top 50% | #self-supervised-learning
8 | CAAD: Contrastive Audio-Aware Distillation for Efficien | 8.9 | Top 25% | #speech-recognition
9 | STAR-VAE: Structured Topology-Aware Regularization for | 8.8 | Top 25% | #audio-generation
10 | An Evaluation Framework for Text-to-Speech Voice Recons | 8.8 | Top 25% | #speech-synthesis
11 | An Analysis of Untrained Deep Reservoir Networks for Au | 8.8 | Top 50% | #audio-event-detection
12 | Towards Detecting Neural Audio Codec Synthesized Heart | 8.7 | Top 50% | #self-supervised-learning
13 | Bridging the Age Gap: Towards Detecting Neural Audio Co | 8.6 | Top 50% | #speech-spoofing-detection
14 | ATCCaps: A Call-Sign-Aware Speech Dataset for Air Traff | 8.6 | Top 25% | #speech-recognition
15 | InstructFX2FX: A Multi-turn Text-to-Preset Demo for Ite | 8.6 | Top 50% | #contrastive-learning
16 | When EER Hides Deployment Failure: Auditing Threshold T | 8.6 | Top 25% | -
17 | CapRiCorn-1K: A Comprehensive Benchmark for Video Capti | 8.6 | Top 50% | #speech-recognition
18 | Compiling Differentiable Audio Graphs to Real-Time DSP | 8.5 | Top 25% | -
19 | Improving Text-to-Music Generation with Human Preferenc | 8.5 | Top 50% | #music-generation
20 | Don't Listen to Me: A Lightweight, Low-Latency Mode | 8.4 | Top 50% | #speech-enhancement
21 | HALAS: A Human-Annotated Dataset of Hallucinations of M | 8.4 | Top 50% | #speech-recognition
22 | Benchmarking Large Language Models for Grapheme-to-Phon | 8.4 | Top 25% | #speech-synthesis
23 | Cross-lingual Retrieval-Augmented Classification for Dy | 8.4 | Top 25% | #speech-recognition
24 | Bagpiper-TTS: Natural Language Guided Universal Speech | 8.4 | Top 25% | #speech-synthesis
25 | Using Phonological-Level Wav2Vec2 for Mandarin Automati | 8.3 | Top 25% | #speech-recognition
26 | Word Lengthening as a Function of Utterance Position: A | 8.1 | Top 25% | #speech-synthesis
27 | LambdaMark: Semantic Audio Watermarking for Robustness | 8.0 | Top 25% | #audio-watermarking
28 | OpenWER: Improving Cross-Lingual ASR Evaluation and Ena | 8.0 | Top 50% | #speech-recognition
29 | AudioCALM: Continuous Autoregressive Language Modeling | 7.9 | Top 25% | #speech-synthesis
30 | AOR-Bench: Do Large Audio Language Models Over-Refuse P | 7.9 | Top 50% | #audio-question-answering
31 | Gradient-Based Learning of Parametric Engine Sound Repr | 7.8 | Top 50% | #parameter-efficient-fine-tuning
32 | Toward Open-Set Speaker Attribute Prediction with Keywo | 7.8 | Top 25% | #multimodal-models
33 | Time-Frequency Weighted Losses for Phoneme Reconstructi | 7.8 | Top 25% | #speech-enhancement
34 | An implicitization-based solution to the minimal 4s/6r | 7.8 | Top 50% | -
35 | CORTIS: Text-Only Adaptation of Spoken Language Models | 7.7 | Top 50% | #speech-recognition
36 | What Do Neural Networks Learn for TDOA Estimation? A Cr | 7.7 | Top 50% | #sound-source-localization
37 | Kiwano: A Cutting-Edge Open-Source Toolkit for Speaker | 7.6 | Top 50% | #speaker-verification
38 | Learning to Evade: Adaptive Attacks on Audio Watermarki | 7.6 | Top 50% | #audio-watermarking
39 | Bagpiper-Edit: Zero-Shot Open-Ended Audio Editing via R | 7.6 | Top 25% | #speech-synthesis
40 | From Text Metrics to Model Internals: A Study of Whispe | 7.5 | Top 50% | #speech-recognition
41 | Bridging Self-Supervised Learning and Speech Enhancemen | 7.5 | Top 25% | #speech-enhancement
42 | Integrating Facial Generation into Full-Duplex Spoken D | 7.5 | Top 25% | -
43 | ESPnet3: Infrastructure for Scalable Speech and Audio R | 7.5 | Top 25% | #speech-recognition
44 | On the Effect of Segmentation Width and Cluster Size on | 7.4 | Top 25% | #speech-synthesis
45 | The Anatomy of the CTC Oracle Gap: Acoustic Exhaustion | 7.3 | Top 50% | #speech-recognition
46 | FlowTTS-GRPO: Online Reinforcement Learning with Multi- | 7.2 | Top 50% | -
47 | DisSpeech: Low-Resource Controllable Mandarin Stuttered | 7.2 | Top 25% | #speech-synthesis
48 | SDP-Codec: A Speaker-Decoupled Speech Codec with Pitch | 7.2 | Top 50% | #speech-coding
49 | Synthesizing the Lombard Effect: Multi-Level Control of | 7.2 | Top 50% | #speech-synthesis
50 | Scaling Audio Models Efficiently: A Joint Study of Comp | 7.2 | Top 50% | #speech-recognition
51 | Online Predictive Coding for Dual-Mode Self-Supervised | 7.2 | Top 50% | #speech-recognition
52 | Exploiting Neural Audio Codec Latents for Adversarial A | 7.2 | Top 50% | #generative-adversarial-networks
53 | Audio Editing in the Era of Foundation Models: A Survey | 7.0 | Top 25% | -
54 | Adding Robust Code-Switching Capabilities to High Perfo | 7.0 | Top 50% | #speech-recognition
55 | Unlocking In-Context Learning in Audio-Language Models | 7.0 | Top 50% | #federated-learning
56 | Backdoor Attacks on Speech Emotion Recognition via TTS- | 7.0 | Top 50% | #speech-emotion-recognition
57 | LK Jam: System Architecture and Implementation of a Rea | 7.0 | Top 50% | #music-generation
58 | An Acoustic Landmark Database of the English Lexicon vi | 6.9 | Top 50% | #speech-synthesis
59 | Learning from Audio-Dependency Errors: Data Curation St | 6.9 | Top 50% | #audio-question-answering
60 | The Watermark Shortcut: How Provenance Marking Sabotage | 6.8 | Top 50% | #data-augmentation
61 | LISE : Listenable Interpretable Speaker Embeddings | 6.8 | Top 50% | #speaker-verification
62 | PIVOTSBench: Evaluating Fine-Grained Interpersonal Rela | 6.8 | Top 50% | #benchmark
63 | AugCodec: A Low-Bitrate Disentangled Neural Speech Code | 6.7 | Top 50% | #data-augmentation
64 | Vaani Benchmark V1.0: An Inclusive Multimodal Benchmark | 6.7 | Top 50% | #speech-recognition
65 | Physics-Informed Neural Operator for Speech Production | 6.7 | Top 50% | #speech-synthesis
66 | Streaming T5-based Text-to-Speech Synthesis with Limite | 6.7 | Top 25% | #speech-synthesis
67 | ProsoCodec: Prosody-Oriented Speech Codec for Voice Con | 6.6 | Top 50% | #voice-conversion
68 | Beyond ROC-AUC: Operating-Point Performance Reporting f | 6.6 | Top 50% | -
69 | ISCSLP 2026 CoT-TTS Challenge: Chain-of-Thought Reasoni | 6.6 | Top 50% | #speech-synthesis
70 | A DDSP Framework for Adaptive Room Equalization | 6.5 | Top 50% | #adaptive-filtering
71 | EmoInstruct-TTS: Dual-Path Instruction-Guided Emotional | 6.5 | Top 50% | -
72 | Interleaved Speech Language Models Latently Work In Tex | 6.4 | Top 50% | #speech-recognition
73 | DSSCNet: A Transfer Learning Framework for Cross-Corpus | 6.3 | Top 50% | #transfer-learning
74 | Sea-Scan: High-Accuracy, ML-based Dark Vessel Detection | 6.3 | Top 50% | -
75 | Catching Lies Without Sending the Video: Privacy-Preser | 6.2 | Top 50% | #multimodal-models
76 | MindAlign: Decoding Inner Speech from fMRI Signals via | 5.8 | Top 50% | #speech-recognition
77 | Acoustic Landmark Detector based on Conformer and HuBER | 5.5 | Top 50% | #speech-recognition
78 | Explainable AI in Speaker Recognition – Attention Map | 5.5 | Top 50% | #speaker-recognition
79 | Imitation Learning for Elder-Facing Speech Synthesis | 5.5 | Top 50% | #speech-synthesis
80 | Improving Engine Sound Analysis in Hot-Test Environment | 4.9 | Bottom 50% | #audio-denoising
81 | Direct Raw Audio Signal Processing via Reservoir Comput | 4.5 | Bottom 50% | #speech-recognition
82 | A Generalized Formalism of Auto-Regressive Decoding for | 4.1 | Bottom 50% | #autoregressive-models
83 | Noise-Driven Instrument Based on Coherent Quantum and S | 3.8 | Bottom 50% | -
📋 Paper list
🥇 CoughPhase-CLR: Designing an acoustics-informed foundation model for coughing sound classification
10.0/10 | Novelty 2/2 | Rigor 1.5/1.5 | Experiments 1.5/1.5 | Clarity 1/1 | Impact 1.5/1.5 | Open source 1.3/1.5 | Reproducibility 0.5/0.5 | Engineering 1.0/1.5 ...

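Speech/Music/Audio Paper Digest 2026-06-22

1 paper analyzed
⚡ Today's overview
📥 1 paper fetched → 🔬 in-depth analysis complete
🏷️ Popular topics
Topic | Count | Distribution
#music-generation | 1 | █
📊 Paper score leaderboard (1 paper, sorted by score, descending)
Rank | Paper | Total score | Tier | Main task
🥇 | Co-policy: Responsive Human-Robot Co-Creation for Music | 8.5 | Top 50% | #music-generation
📋 Paper list
🥇 Co-policy: Responsive Human-Robot Co-Creation for Musical Performances
8.5/10 | Novelty 1.5/2 | Rigor 1.2/1.5 | Experiments 1.1/1.5 | Clarity 1/1 | Impact 1.0/1.5 | Open source 1.2/1.5 | Reproducibility 0.5/0.5 | Engineering 1.0/1.5 ...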

Speech/Music/Audio Paper Digest 2026-06-19

40 papers analyzed
⚡ Today's overview
📥 40 papers fetched → 🔬 in-depth analysis complete
🏷️ Popular topics
Topic | Count | Distribution
#speech-synthesis | 10 | ██████████
#speech-recognition | 8 | ████████
#voice-conversion | 2 | ██
#speech-enhancement | 2 | ██
#self-supervised-learning | 2 | ██
#speaker-verification | 1 | █
#model-compression | 1 | █
#multimodal-models | 1 | █
📊 Paper score leaderboard (40 papers, sorted by score, descending)
Rank | Paper | Total score | Tier | Main task
🥇 | FlowEdit: Associative Memory for Lifelong Pronunciation | 10.0 | Top 25% | #speech-synthesis
🥈 | Low-Burden Data Augmentation for Dysarthric ASR via Zer | 8.7 | Top 25% | #speech-recognition
🥉 | S-JEPA : Soft Clustering Anchors for Self-Supervised Sp | 8.7 | Top 25% | #speech-recognition
4 | Personalized Keyword Spotting for User-Defined Keywords | 8.6 | Top 25% | #speaker-verification
5 | FlowFake: Liquid Networks for Audio Deepfake Detection | 8.5 | Top 25% | #model-compression
6 | Systematic Study of Dysarthric Speech Recognition: Spec | 8.3 | Top 50% | #speech-recognition
7 | PerceptionDLM: Parallel Region Perception with Multimod | 8.1 | Top 25% | #multimodal-models
8 | RIVET: Robust Idempotent Voice Attribute Editing | 8.0 | Top 50% | #voice-conversion
9 | Repurposing a Speech Classifier for Guided Diffusion-Ba | 7.9 | Top 50% | #speech-synthesis
10 | Exploring Feature Extraction Technique Parameters for A | 7.9 | Top 50% | #audio-event-detection
11 | Transcript-Free Flow-Matching Text-to-Speech via Speech | 7.7 | Top 25% | #speech-synthesis
12 | How Do Instructions Shape Speech? Cross-Attention Attri | 7.7 | Top 50% | #speech-synthesis
13 | Hybrid Diffusion Transformer for Instruction-Guided Aud | 7.6 | Top 50% | #Transformer
14 | Improving Code-Switching ASR with Code-Mixing Guided Sy | 7.6 | Top 25% | #speech-recognition
15 | PolSeT: Polish Semantics of Timbre Dataset | 7.5 | Bottom 50% | -
16 | IHBench: Evaluating Post-Interruption Recovery in Voice | 7.5 | Top 25% | #spoken-dialogue-systems
17 | A Survey of Full-Duplex Spoken Dialogue Systems: Archit | 7.4 | Top 50% | #speech-synthesis
18 | PhysDrift: Bridging the Embodiment Gap in Humanoid Co-S | 7.4 | Top 50% | #speech-synthesis
19 | PrefSQA: Pairwise Preference Prediction for Speech Qual | 7.3 | Top 50% | #speech-quality-assessment
20 | Latency-Configurable Streaming Speech Enhancement via A | 7.2 | Top 50% | #speech-enhancement
21 | A Comparative Study of Pretrained Transformer Models fo | 7.2 | Top 50% | #speech-recognition
22 | Pitch Spelling Jazz Lead Sheets, Solo Transcriptions, C | 7.2 | Top 50% | -
23 | Stuttering Classification and Segmentation with Attenti | 7.0 | Top 50% | -
24 | Time-Unconditional Generative Speech Enhancement via Au | 7.0 | Top 25% | #speech-enhancement
25 | Investigating Human-Model Discrepancies in Speech Quali | 6.9 | Top 25% | #speech-synthesis
26 | Prismriver: Formalization of Music Theory and Algorithm | 6.9 | Top 50% | -
27 | NEST: Narrative Event Structures in Time for Long Video | 6.8 | Top 50% | -
28 | Cross-Dataset, Age, and Gender Generalization: A Compre | 6.7 | Top 50% | #speech-recognition
29 | Exploring Pre-training Benefits on Phoneme Addition thr | 6.7 | Top 50% | -
30 | Analyzing Language and Geographical Variation in Speech | 6.5 | Top 50% | #speech-recognition
31 | Improving End-to-End Speech Recognition for Dysarthric | 6.5 | Top 50% | #speech-recognition
32 | Segment-Level Mandarin Chinese Speech-Based Cognitive I | 6.5 | Top 50% | #contrastive-learning
33 | Light-weight Pronunciation Assessment via Discrete Spee | 6.4 | Top 50% | #self-supervised-learning
34 | ReNikud: Audio-Supervised Hebrew Grapheme-to-Phoneme Co | 6.2 | Top 50% | #speech-synthesis
35 | Zero-VC: Zero-Lookahead Streaming Voice Conversion via | 6.1 | Top 50% | #voice-conversion
36 | MixProLAP: Mixture-Induced Uncertainty Modeling for Pro | 5.7 | Top 50% | #audio-retrieval
37 | MaineCoon: Pursuing A Real-Time Audio-Visual Social Wor | 5.7 | Top 50% | #speech-synthesis
38 | Leveraging systems' non-linearity to tackle the sca | 5.5 | Bottom 50% | #data-augmentation
39 | Interpreting Content and Speaker Characteristics in Fac | 5.0 | Bottom 50% | #speech-synthesis
40 | Beyond Speaker Independence: Evaluating Cross-Lingual A | 4.9 | Bottom 50% | #self-supervised-learning
📋 Paper list
🥇 FlowEdit: Associative Memory for Lifelong Pronunciation Adaptation in Flow-Matching TTS
10.0/10 | Novelty 2/2 | Rigor 1.5/1.5 | Experiments 1.5/1.5 | Clarity 1/1 | Impact 1.5/1.5 | Open source 1.5/1.5 | Reproducibility 0.5/0.5 | Engineering 1.5/1.5 ...

Speech/Music/Audio Paper Digest 2026-06-18

36 papers analyzed
⚡ Today's overview
📥 36 papers fetched → 🔬 in-depth analysis complete
🏷️ Popular topics
Topic | Count | Distribution
#speech-recognition | 7 | ███████
#multimodal-models | 5 | █████
#speech-synthesis | 5 | █████
#spatial-audio | 1 | █
#music-generation | 1 | █
#model-evaluation | 1 | █
#sound-source-localization | 1 | █
#audio-generation | 1 | █
📊 Paper score leaderboard (36 papers, sorted by score, descending)
Rank | Paper | Total score | Tier | Main task
🥇 | IndicContextEval: A Benchmark for Evaluating Context Ut | 9.5 | Top 25% | #speech-recognition
🥈 | Native Active Perception as Reasoning for Omni-Modal Un | 9.1 | Top 10% | #speech-recognition
🥉 | Who Wins the Conflict? Mechanistic Interpretability of | 8.8 | Top 25% | #multimodal-models
4 | Generalised Transcoding Framework for Arbitrary Spatial | 8.7 | Top 50% | #spatial-audio
5 | Closing the Loop: PID Feedback Control for Interpretabl | 8.7 | Top 50% | #music-generation
6 | GRIDEX: Grid-Grounded Forensic Explanations for Deepfak | 8.6 | Top 50% | #speech-synthesis
7 | Continuous-Speech Parkinson's Disease Detection Usi | 8.3 | Top 25% | -
8 | Mitigating Scoring Errors and Compensating for Nonverba | 8.0 | Top 25% | #multimodal-models
9 | A Survey of Methods for the Discretization of Phonograp | 8.0 | Top 50% | -
10 | Adaptive Speech-to-Spike Encoding for Spiking Neural Ne | 8.0 | Top 25% | -
11 | MagpieTTS-LF: Inference-Time Long-Form Speech Generatio | 7.9 | Top 25% | #speech-synthesis
12 | Beyond AHI: An Interpretable Causal-Discovery-Guided Fr | 7.9 | Top 25% | -
13 | Evaluating Dynamic Range Compressor Models Using Contro | 7.8 | Top 50% | #model-evaluation
14 | NeuralMUSIC: A Hybrid Neural-Subspace Framework for Rob | 7.8 | Top 50% | #sound-source-localization
15 | Fair Cognitive Impairment Detection Through Unlearning | 7.7 | Top 25% | #multimodal-models
16 | Audio-to-Audio via Diffusion Warm Initialization | 7.6 | Top 25% | #audio-generation
17 | FineCombo-TTS: Collaborative and Precise Controllable S | 7.6 | Top 25% | #speech-synthesis
18 | Constraining to Generalize: Subspace Tuning for Few-sho | 7.5 | Top 25% | #audio-classification
19 | Learning Robust Pair Confidence for Multimodal Emotion- | 7.5 | Top 50% | #multimodal-models
20 | Montreal Forced Aligner and the state of speech-to-text | 7.5 | Top 25% | #speech-recognition
21 | Scoring Backends Matter More Than Pooling: A Systematic | 7.4 | Top 50% | -
22 | Reliable Neural-Codec Text-to-Speech by ASR Self-Verifi | 7.4 | Top 50% | #speech-synthesis
23 | Reference-Driven Multi-Speaker Audio Scene Generation f | 7.3 | Top 50% | #speech-synthesis
24 | QC-GAN: A Parameter-Efficient Quaternion Conformer GAN | 7.1 | Top 50% | #speech-enhancement
25 | Augmenting Dysarthric Speech Severity Assessment with M | 7.0 | Top 50% | #speech-quality-assessment
26 | Continuous Audio Thinking for Large Audio Language Mode | 6.9 | Top 50% | -
27 | Human-AI Coevolution Dynamics: A Formal Theory of Socia | 6.7 | Top 50% | -
28 | DASH: Dual-View Self-Distillation with Multi-Layer Hidd | 6.6 | Top 50% | #speech-recognition
29 | Reference-Based Recursive Least-Squares Mitigation of R | 6.6 | Top 50% | -
30 | Responsible ASR: Overcoming Challenges of Foundational | 6.5 | Top 50% | #speech-recognition
31 | Risk Stratification for ICU Delirium using Pervasive Am | 6.5 | Top 50% | #multimodal-models
32 | ThinkDeception: A Progressive Reinforcement Learning Fr | 6.3 | Top 50% | #reinforcement-learning
33 | EMORSION: Examining the Impact of Audio Parameters on E | 6.0 | Top 50% | -
34 | Speech-Driven End-to-End Language Discrimination toward | 5.8 | Top 50% | #speech-recognition
35 | Low-resource Language Discrimination Towards Chinese Di | 5.5 | Top 50% | #speech-recognition
36 | SingFox: A Multi-Lingual Singfake Detection Corpus | 5.4 | Bottom 50% | #speech-spoofing-detection
📋 Paper list
🥇 IndicContextEval: A Benchmark for Evaluating Context Utilisation in Audio Large Language Models Across 8 Indic Languages
9.5/10 | Novelty 1.5/2 | Rigor 1.3/1.5 | Experiments 1.5/1.5 | Clarity 1/1 | Impact 1.0/1.5 | Open source 1.4/1.5 | Reproducibility 0.5/0.5 | Engineering 1.3/1.5 ...

Speech/Music/Audio Paper Digest 2026-06-17

35 papers analyzed
⚡ Today's overview
📥 35 papers fetched → 🔬 in-depth analysis complete
🏷️ Popular topics
Topic | Count | Distribution
#speech-recognition | 9 | █████████
#speech-synthesis | 4 | ████
#audio-classification | 3 | ███
#speech-enhancement | 2 | ██
#multimodal-models | 2 | ██
#reinforcement-learning | 1 | █
#voice-activity-detection | 1 | █
#speaker-verification | 1 | █
📊 Paper score leaderboard (35 papers, sorted by score, descending)
Rank | Paper | Total score | Tier | Main task
🥇 | One-Step Token-to-Waveform Generation with MeanFlow in | 9.3 | Top 10% | #speech-synthesis
🥈 | Synergizing Zero-Shot Cross-Lingual Alzheimer Detection | 9.1 | Top 25% | -
🥉 | When Multiple Scripts Matter: Evaluating ASR in Clinica | 9.1 | Top 10% | #speech-recognition
4 | Grounding Spoken LLMs in Multi-Speaker Audio via Diariz | 8.5 | Top 25% | #speech-recognition
5 | ELSA: Acoustic Event-Level Semantic Alignment for Fine- | 8.5 | Top 25% | -
6 | A 399uW 114.3 dB DR Companding Readout ASIC for MEMS Mi | 8.2 | Top 25% | -
7 | Are you speaking my languages? On spoken language adher | 8.0 | Bottom 50% | #speech-recognition
8 | From Signals to Patterns: Non-Invasive Tuberculosis Det | 7.9 | Top 25% | -
9 | Next-Turn: Duration-Aware Streaming Endpoint Detection | 7.9 | Top 50% | #speech-synthesis
10 | Decision-Driven Geosteering Under Uncertainty: A Unifie | 7.8 | Top 50% | #reinforcement-learning
11 | Perceptual compensation for tonal context in self-super | 7.7 | Top 50% | #speech-recognition
12 | JoyAI-VL-Interaction: Real-Time Vision-Language Interac | 7.7 | Top 50% | #speech-synthesis
13 | PhASE-Flow: Phonetic-Conditioned Acoustic Flow Matching | 7.6 | Top 25% | #speech-enhancement
14 | Non-Autoregressive Minimum Bayes' Risk Decoding for | 7.6 | Top 25% | -
15 | SpeechDx: A Multi-Task Benchmark for Clinical Speech AI | 7.6 | Top 25% | #speech-recognition
16 | Vibrato Expression Control for Singing Voice Conversion | 7.5 | Top 25% | -
17 | Improving low-resource ASR using bilingual fine-tuning | 7.5 | Top 50% | #speech-recognition
18 | Turning music identification into a neural forward pass | 7.4 | Top 50% | #audio-classification
19 | Direction of arrival estimation from distant microphone | 7.3 | Top 50% | #voice-activity-detection
20 | DeSRPA: Decoupled Speech Role-Playing Agent via Inferen | 7.3 | Top 50% | #speech-synthesis
21 | L-Proto: Language-Aware Episodic Prototypical Training | 7.1 | Top 50% | #speaker-verification
22 | Single frequency filtering based multi-speaker directio | 7.0 | Top 50% | #speech-enhancement
23 | MLLP-VRAIN UPV system for the IWSLT 2026 Simultaneous S | 6.9 | Top 50% | #speech-recognition
24 | Reading between the Lines: Leveraging Large Language Mo | 6.8 | Top 50% | #speech-emotion-recognition
25 | A Closer Look at Failure Modes in Temporal Understandin | 6.6 | Top 50% | #multimodal-models
26 | MVEB: Massive Video Embedding Benchmark | 6.5 | Top 50% | #benchmark
27 | Transductive Zero-Shot Audio Classification with Audio- | 6.4 | Top 50% | #audio-classification
28 | A Neuromorphic Trigger for Efficient Audio Event Detect | 6.2 | Top 50% | #audio-event-detection
29 | Learning task-specific subspaces via interventional pos | 6.2 | Top 50% | #self-supervised-learning
30 | Embedded Machine Learning for Microcontroller-Class Edg | 6.0 | Top 50% | -
31 | Descriptor: Certus Caliber Classification Gunshot Datas | 5.9 | Top 50% | #audio-classification
32 | AI-based Cognitive-linguistic Features for Dementia Ass | 5.8 | Top 50% | #speech-recognition
33 | An Analysis of the Effectiveness of Synthetic Speech Da | 5.7 | Top 50% | #speech-recognition
34 | OlfactProfile: Profile-Conditioned Odor Prediction from | 5.6 | Top 50% | #multimodal-models
35 | Intelligibility of Speech in Noise: Investigating Contr | 5.5 | Top 50% | -
📋 Paper list
🥇 One-Step Token-to-Waveform Generation with MeanFlow in Latent Space
9.3/10 | Novelty 1.4/2 | Rigor 1.2/1.5 | Experiments 1.1/1.5 | Clarity 1/1 | Impact 1.2/1.5 | Open source 1.5/1.5 | Reproducibility 0.5/0.5 | Engineering 1.4/1.5 ...