📄 Speech/Music/Audio Paper Digest

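The latest speech, music, and audio AI papers are fetched automatically from arXiv and Hugging Face every day and published after in-depth AI analysis.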

Speech/Music/Audio Paper Digest 2026-07-24

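18 papers analyzed in total.

⚡ Today's overview: 📥 18 papers fetched → 🔬 in-depth analysis complete

🏷️ Trending topics
Topic | Papers | Distribution
#speech-recognition | 4 | ████
#speech-interaction | 2 | ██
#speech-emotion-recognition | 2 | ██
#multimodal-models | 1 | █
#datasets | 1 | █
#speech-deepfake-detection | 1 | █
#speech-separation | 1 | █
#speech-synthesis | 1 | █

📊 Paper score leaderboard (18 papers, sorted by score in descending order)
Rank | Paper | Score | Tier | Document type | Main task
🥇 | DONDO: Open w2v-BERT Speech-Recognition Base Models for | 8.1 | Top 25% | System technical report | #speech-recognition
🥈 | Designed Vocalizations Dataset: Sound-Designed Human an | 7.9 | Top 25% | Dataset and benchmark | #voice-conversion
🥉 | VibeVoice-ASR-BitNet Technical Report | 7.8 | Top 25% | System technical report | #speech-recognition
4 | Faster IndexTTS-2: Accelerating and Streaming Autoregre | 7.6 | Top 25% | System technical report | #speech-synthesis
5 | From Read Speech to Spoken Digits: A Task-Specific Eval | 7.5 | Top 25% | Applied research | #speech-recognition
6 | Instruct-FD: Can Your Full-Duplex Speech System Follow | 7.2 | Top 50% | Dataset and benchmark | #speech-interaction
7 | OPOD: On-Policy Omni Distillation | 7.1 | Top 50% | Methods research | #multimodal-models
8 | X\(^3\)-OPD: Distilling Reasoning into Large Audio-Langua | 7.1 | Top 50% | Methods research | #audio-understanding
9 | Toward Interpretable Speech Deepfake Detection using Ar | 7.0 | Top 50% | Methods research | #speech-deepfake-detection
10 | Toward Generalizable Cognitive Impairment Detection wit | 7.0 | Top 50% | Methods research | #speech-emotion-recognition
11 | Safeguards for Speech2Speech LLM-Assistants: A Case Stu | 6.5 | Top 50% | System technical report | #speech-interaction
12 | Investigating Codec-Internal Latent Audio Watermarking | 6.4 | Top 50% | System technical report | #audio-watermarking
13 | TF-MossFormer: Integrating Convolution Gated Local-Glob | 6.3 | Top 50% | Model report | #speech-separation
14 | Phonetic forced alignment for low-resource language var | 6.2 | Top 50% | Methods research | #speech-recognition
15 | SCoPE: Shift-Aware Speaker-Conditioned Priors for Emoti | 6.0 | Top 50% | Methods research | #speech-emotion-recognition
16 | Word meaning co-determines vowel-inherent spectral chan | 5.9 | Top 50% | Methods research | #speech-attribute-recognition
17 | An Evaluation Framework for Structured Audio Captions V | 5.3 | Bottom 50% | Dataset and benchmark | #datasets
18 | Improving the performance of an ASV system using hybrid | 5.0 | Bottom 50% | Methods research | #speaker-verification

📋 Paper list
🥇 DONDO: Open w2v-BERT Speech-Recognition Base Models for African Languages
8.1/10 | Novelty 1.2/2 | Rigor 1/1.5 | Experiments 1/1.5 | Clarity 0.8/1 | Impact 1.3/1.5 | Open source 1.2/1.5 | Reproducibility 0.3/0.5 | Engineering 1.3/1.5 ...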

Speech/Music/Audio Paper Digest 2026-07-23

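21 papers analyzed in total.

⚡ Today's overview: 📥 21 papers fetched → 🔬 in-depth analysis complete

🏷️ Trending topics
Topic | Papers | Distribution
#audio-understanding | 5 | █████
#speech-interaction | 2 | ██
#wake-word-detection | 2 | ██
#music-generation | 2 | ██
#audio-event-detection | 2 | ██
#Transformer | 1 | █
#large-language-models | 1 | █
#speech-synthesis | 1 | █

📊 Paper score leaderboard (21 papers, sorted by score in descending order)
Rank | Paper | Score | Tier | Document type | Main task
🥇 | Ultra-Compact CNN Architectures for Tropical Bird Audio | 9.3 | Top 10% | System technical report | #audio-event-detection
🥈 | Multimodal Speaker Verification as a Threat to Speaker | 9.2 | Top 10% | Methods research | #speaker-verification
🥉 | A Diagnostic Evaluation Framework for AI-Generated Cove | 8.0 | Top 25% | Dataset and benchmark | #audio-quality-assessment
4 | RIME: Enabling Large-Scale Agentic Post-Production | 7.6 | Top 25% | Dataset and benchmark | #large-language-models
5 | Efficient Chain-of-Modality Reasoning via Progressive C | 7.6 | Top 25% | Methods research | #speech-interaction
6 | Learning the Arabic Dialect Continuum as a Continuous S | 7.5 | Top 25% | Methods research | #Transformer
7 | Layer-Wise Decision Fusion for Fake Audio Detection Usi | 7.5 | Top 25% | Methods research | #audio-understanding
8 | SimulS2ST-Omni: Data-Efficient Streaming Speech-to-Spee | 7.3 | Top 50% | System technical report | #speech-translation
9 | Scalable Keyword Spotting via Modular Network Expansion | 7.1 | Top 50% | Methods research | #wake-word-detection
10 | StellarTTS: Sparse Temporal Embedding for Low-Latency a | 7.0 | Top 50% | System technical report | #speech-synthesis
11 | OmniReasoner: Thinking with Long Audio-Video via Native | 6.9 | Top 50% | Methods research | #audio-visual-understanding
12 | RPPNet: Perceptually-Grouped Rhythm-Pitch Primitives fo | 6.9 | Top 50% | Methods research | #music-generation
13 | Audio-Zero: Label-Free Self-Evolution for Fine-Grained | 6.8 | Top 50% | Methods research | #audio-understanding
14 | Pushing the Frontier of Full-Song Generation: Hierarchi | 6.8 | Top 50% | System technical report | #music-generation
15 | CAPS: A Cascaded Reconstruction Model to Power Saving i | 6.6 | Top 50% | System technical report | #speech-enhancement
16 | Validating the Single Item Kawaii Measure | 6.4 | Top 50% | Methods research | #audio-understanding
17 | Cross-Subject Semantic Decoding with Shared-Space Align | 6.3 | Top 50% | Methods research | #speech-interaction
18 | Improved Monitoring of Honey bee Colony Strength via Au | 6.3 | Top 50% | Applied research | #audio-event-detection
19 | The Giant Hippocampus: From Structural Monoculture to a | 6.0 | Top 50% | Theoretical research | #audio-understanding
20 | Cumsum-Composable Phase Transport for Low-Cost Streamin | 5.9 | Top 50% | System technical report | #wake-word-detection
21 | Black-Box Optimization for Identifying and Inverting Au | 4.0 | Bottom 50% | Methods research | #audio-understanding

📋 Paper list
🥇 Ultra-Compact CNN Architectures for Tropical Bird Audio Detection on Microcontrollers
9.3/10 | Novelty 1.3/2 | Rigor 1.1/1.5 | Experiments 1.3/1.5 | Clarity 1/1 | Impact 1.1/1.5 | Open source 1.5/1.5 | Reproducibility 0.5/0.5 | Engineering 1.5/1.5 ...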

Speech/Music/Audio Paper Digest 2026-07-22

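20 papers analyzed in total.

⚡ Today's overview: 📥 20 papers fetched → 🔬 in-depth analysis complete

🏷️ Trending topics
Topic | Papers | Distribution
#speech-recognition | 5 | █████
#speech-synthesis | 3 | ███
#audio-classification | 2 | ██
#benchmarking | 1 | █
#speech-interaction | 1 | █
#speech-separation | 1 | █
#speech-enhancement | 1 | █
#speech-emotion-recognition | 1 | █

📊 Paper score leaderboard (20 papers, sorted by score in descending order)
Rank | Paper | Score | Tier | Document type | Main task
🥇 | Content is What Remains: Invariant Speech Tokenization | 9.2 | Top 10% | Methods research | #speech-coding
🥈 | Fusion Embedding: A Unified Embedding Space for Text, I | 8.6 | Top 25% | System technical report | #audio-retrieval
🥉 | End-to-End Markov State Sequence Learning for Auditory | 8.3 | Top 25% | Methods research | #speech-interaction
4 | Staged Depth-Pruning Distillation of a Flow-Matching Te | 7.9 | Top 25% | System technical report | #speech-synthesis
5 | Constrained CTC Decoding for Efficient Diacritic Restor | 7.7 | Top 25% | Methods research | #speech-recognition
6 | Fretiq: Browser-Native Electric Guitar String Classific | 7.5 | Top 25% | System technical report | #audio-classification
7 | MeetingToM: Evaluating Multimodal LLMs on Theory-of-Min | 7.2 | Top 50% | Dataset and benchmark | #benchmarking
8 | Transcription Policy as a Latent Variable: Activating C | 7.1 | Top 50% | Methods research | #speech-recognition
9 | Benchmarking Human and Automatic Speech Recognition of | 7.0 | Top 50% | System technical report | #speech-recognition
10 | A Situational Speech Synthesizer for Yoruba: System Des | 6.7 | Top 50% | System technical report | #speech-synthesis
11 | From a Multilingual Streaming ASR Backbone to Kenyan-La | 6.5 | Top 50% | System technical report | #speech-recognition
12 | Towards Array-Invariant Speech Enhancement via Geometry | 6.3 | Top 50% | Methods research | #speech-enhancement
13 | Comparing Spectrogram Front-Ends for Abnormal Heart-Sou | 5.7 | Top 50% | Methods research | #audio-classification
14 | EmoEUS: Uncertainty Supervision for Multimodal Emotion | 5.6 | Top 50% | Methods research | #speech-emotion-recognition
15 | Summary of DCASE 2026 Task 5: Audio-Dependent Question | 5.4 | Bottom 50% | Dataset and benchmark | #audio-understanding
16 | Towards a reproducible cross-venue method for quantifyi | 5.4 | Bottom 50% | Methods research | #audio-quality-assessment
17 | CS-ETS: Chaos-Inspired Samba-Based EMG-To-Speech Synthe | 5.3 | Bottom 50% | Methods research | #speech-synthesis
18 | Addressing Limited Data in Auditory Attention Decoding | 5.1 | Bottom 50% | Applied research | #speech-separation
19 | What the Waveform Knows: Transparent-first Speech and A | 4.8 | Bottom 50% | System technical report | #speech-recognition
20 | Teleportation Game: Quantum Teleportation in Multi-Agen | 4.4 | Bottom 50% | System technical report | #music-generation

📋 Paper list
🥇 Content is What Remains: Invariant Speech Tokenization from Parallel Utterances
9.2/10 | Novelty 1.5/2 | Rigor 1.5/1.5 | Experiments 1.3/1.5 | Clarity 0.9/1 | Impact 1.2/1.5 | Open source 1.2/1.5 | Reproducibility 0.3/0.5 | Engineering 1.3/1.5 ...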

Speech/Music/Audio Paper Digest 2026-07-21

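34 papers analyzed in total.

⚡ Today's overview: 📥 34 papers fetched → 🔬 in-depth analysis complete

🏷️ Trending topics
Topic | Papers | Distribution
#speech-emotion-recognition | 3 | ███
#audio-understanding | 3 | ███
#speech-deepfake-detection | 2 | ██
#speech-translation | 2 | ██
#speaker-verification | 2 | ██
#audio-event-detection | 2 | ██
#benchmarking | 1 | █
#multimodal-models | 1 | █

📊 Paper score leaderboard (34 papers, sorted by score in descending order)
Rank | Paper | Score | Tier | Document type | Main task
🥇 | HARP: Harmonic-Aware Residual Partitioning for Neural A | 9.6 | Top 10% | Methods research | #audio-coding
🥈 | SALMONN-2: Advancing General-Purpose Hearing Abilities | 9.4 | Top 10% | Model report | #audio-understanding
🥉 | Pseudo-label distillation for discriminative anomalous | 9.0 | Top 10% | Methods research | #audio-event-detection
4 | ESCUCHA: A Spanish Speech Benchmark for Heterogeneous A | 8.8 | Top 25% | Dataset and benchmark | #benchmarking
5 | RealDESED: A Real-World Domestic Sound Event Detection | 7.9 | Top 25% | Dataset and benchmark | #audio-event-detection
6 | FlowSonic: Stable Zero-Shot Music Editing via High-Orde | 7.9 | Top 25% | Methods research | #music-generation
7 | Time-Frequency Consistency Learning for Robust Speech D | 7.9 | Top 25% | Methods research | #speech-deepfake-detection
8 | AMECxSV: Adaptive Metadata-Driven Embedding-Fusion Cali | 7.8 | Top 25% | Methods research | #speaker-verification
9 | X-Translator: A Real-Time Multilingual Speaker-Aware Sp | 7.8 | Top 25% | System technical report | #speech-translation
10 | Dense-Sparse Dynamic Time Warping for Customizing Piano | 7.8 | Top 25% | System technical report | #music-source-separation
11 | Do Speech Tokens Leak Voiceprints? Speaker Inversion At | 7.7 | Top 25% | Methods research | #speaker-verification
12 | Is One Score Enough? Assessing Singing Quality of Songs | 7.6 | Top 25% | Methods research | #music-understanding
13 | FlashRT: Agent Harness for Guiding Agents to Deploy Rea | 7.5 | Top 25% | System technical report | #audio-visual-generation
14 | AI_LectureNote: A Retrospective Pilot Study of a Post-A | 7.2 | Top 50% | System technical report | #speech-recognition
15 | Should Missing Modalities Always Be Necessary to Repair | 7.0 | Top 50% | Methods research | #multimodal-models
16 | Re-Sonance: A Dysarthric Asynchronous Real-Time Speech | 6.9 | Top 50% | System technical report | #voice-conversion
17 | NABEATs: Noise-Aware Audio Representation Learning | 6.7 | Top 50% | Methods research | #audio-understanding
18 | When to Use Extra Context: Evidence-Grounded Terminolog | 6.7 | Top 50% | System technical report | #speech-translation
19 | How Reliable Are Multimodal Signals of Conversational S | 6.6 | Top 50% | Methods research | #robustness
20 | SSTMark: Robust Training-Free Semantic-Level Speech Wat | 6.5 | Top 50% | System technical report | #audio-watermarking
21 | The tttAI System for the TSA-ASR Task of the SmartGlass | 6.5 | Top 50% | System technical report | #speaker-diarization
22 | Audio Cross Verification Using Dual Alignment Likelihoo | 6.5 | Top 50% | Methods research | #audio-deepfake-detection
23 | Component-Level Ensemble Fusion for Speech and Environm | 6.4 | Top 50% | System technical report | #speech-deepfake-detection
24 | Adaptive Momentum Enhanced Distributed Multichannel Act | 6.3 | Top 50% | Applied research | #audio-understanding
25 | Robust Summarization of Doctor-Patient Conversations: T | 6.3 | Top 50% | System technical report | #speech-interaction
26 | An Audio Language Model-Based Voice Concept Bottleneck | 6.2 | Top 50% | Applied research | #speech-quality-assessment
27 | FillGauss: Fine-Grained Filling-Aware Impact Sound Gene | 6.2 | Top 50% | Methods research | #audio-generation
28 | Harness TTS: Towards Context-Aware Expressive Speech Sy | 6.2 | Top 50% | Methods research | #speech-synthesis
29 | Modeling turn-taking with distant viewing: investigatin | 6.2 | Top 50% | System technical report | #audio-visual
30 | Efficient Audio-Visual Event Recognition via Knowledge | 5.8 | Top 50% | Methods research | #audio-visual-understanding
31 | Multi-Level Privacy-Preserving Dementia Detection from | 5.5 | Top 50% | Methods research | #speech-attribute-recognition
32 | Explainable Lightweight Compact Deep Models for Speech | 5.4 | Bottom 50% | Methods research | #speech-emotion-recognition
33 | Team RAS in 11th ABAW Competition: Multimodal Ambivalen | 5.3 | Bottom 50% | System technical report | #speech-emotion-recognition
34 | EII-SCL: Harnessing Emotional Inertia for Multimodal Em | 5.2 | Bottom 50% | Methods research | #speech-emotion-recognition

📋 Paper list
🥇 HARP: Harmonic-Aware Residual Partitioning for Neural Audio Codecs
9.6/10 | Novelty 1.4/2 | Rigor 1.3/1.5 | Experiments 1.4/1.5 | Clarity 1/1 | Impact 1.2/1.5 | Open source 1.5/1.5 | Reproducibility 0.5/0.5 | Engineering 1.3/1.5 ...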

Speech/Music/Audio Paper Digest 2026-07-20

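15 papers analyzed in total.

⚡ Today's overview: 📥 15 papers fetched → 🔬 in-depth analysis complete

🏷️ Trending topics
Topic | Papers | Distribution
#audio-visual-understanding | 3 | ███
#benchmarking | 2 | ██
#speech-recognition | 2 | ██
#autoregressive-models | 1 | █
#speech-interaction | 1 | █
#speech-synthesis | 1 | █
#speech-quality-assessment | 1 | █
#speaker-verification | 1 | █

📊 Paper score leaderboard (15 papers, sorted by score in descending order)
Rank | Paper | Score | Tier | Document type | Main task
🥇 | StemFX: Learning Mixing Style Representations via Autor | 9.6 | Top 10% | Methods research | #autoregressive-models
🥈 | A Geometry-Limited Identification Floor and Its Consequ | 8.8 | Top 25% | Methods research | #speaker-verification
🥉 | Proof-Carrying Multimodal Timelines: Finite-Trace Modal | 8.6 | Top 25% | System technical report | #benchmarking
4 | A Study of Parallelizable Alternatives to Dynamic Time | 8.1 | Top 25% | System technical report | #benchmarking
5 | Estimating the Reliability of Dynamic Time Warping Alig | 7.6 | Top 25% | Methods research | #music-understanding
6 | Controlling Implicit Shortcut Reliance in L2 Spoken Eng | 7.5 | Top 25% | Methods research | #speech-quality-assessment
7 | Segmental DTW: A Parallelizable Alternative to Dynamic | 7.0 | Top 50% | Methods research | #audio-retrieval
8 | AuEmoChat: Authentic Emotion Understanding and Renderin | 6.9 | Top 50% | Methods research | #speech-synthesis
9 | Constrained Hebbian Learning Supports Efficient Represe | 6.7 | Top 50% | Methods research | #audio-visual-understanding
10 | SpeechGuard: Online Defense against Backdoor Attacks on | 6.0 | Top 50% | Methods research | #speech-recognition
11 | Audio-Visual Flamingo: Open Audio-Visual Intelligence f | 6.0 | Top 50% | System technical report | #audio-visual-understanding
12 | AV-JEPA: Extending LeJEPA to Audio-Visual Self-Supervis | 5.7 | Top 50% | Methods research | #audio-visual-understanding
13 | Data-driven Video Codec with Implicit Neural Representa | 5.3 | Bottom 50% | System technical report | #audio-coding
14 | AnovaX: A Local, Multi-Agent Voice Assistant with LLM P | 4.8 | Bottom 50% | System technical report | #speech-interaction
15 | Natural Backdoor Attacks on Speech Recognition Models | 3.5 | Bottom 50% | Methods research | #speech-recognition

📋 Paper list
🥇 StemFX: Learning Mixing Style Representations via Autoregressive FX Chain Prediction on Source-Separated Stems
9.6/10 | Novelty 1.8/2 | Rigor 1.2/1.5 | Experiments 1/1.5 | Clarity 0.8/1 | Impact 1.5/1.5 | Open source 1.5/1.5 | Reproducibility 0.3/0.5 | Engineering 1.5/1.5 ...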

Speech/Music/Audio Paper Digest 2026-07-17

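15 papers analyzed in total.

⚡ Today's overview: 📥 15 papers fetched → 🔬 in-depth analysis complete

🏷️ Trending topics
Topic | Papers | Distribution
#music-generation | 3 | ███
#multimodal-models | 2 | ██
#speech-synthesis | 2 | ██
#speech-deepfake-detection | 1 | █
#speech-separation | 1 | █
#audio-visual-understanding | 1 | █
#audio-visual-generation | 1 | █
#audio-event-detection | 1 | █

📊 Paper score leaderboard (15 papers, sorted by score in descending order)
Rank | Paper | Score | Tier | Document type | Main task
🥇 | Can Tokens Compete? Token Representations against Super | 8.3 | Top 25% | System technical report | #audio-event-detection
🥈 | SLT 2026 REAL-TSE Challenge: Real-world Target Speaker | 8.1 | Top 25% | System technical report | #speech-separation
🥉 | MIDI-RAE-JEPA: Hierarchical Representation Learning and | 7.9 | Top 25% | System technical report | #music-generation
4 | RW-Voice-EQ Bench: A Real World Benchmark for Evaluatin | 7.9 | Top 25% | Dataset and benchmark | #speech-synthesis
5 | Dialogs: a studio-quality expressive conversational Rus | 7.8 | Top 25% | Dataset and benchmark | #speech-synthesis
6 | WanSong v1.0 Technical Report | 7.6 | Top 25% | System technical report | #music-generation
7 | InCarEmo: A Multimodal Dataset for In-Cabin Emotion Rec | 7.3 | Top 50% | Dataset and benchmark | #multimodal-models
8 | What does the model actually see? Evaluation protocols | 7.2 | Top 50% | Methods research | #audio-quality-assessment
9 | SceneBind: Binding What and Where Across Vision, Audio | 6.6 | Top 50% | Methods research | #audio-visual-understanding
10 | ITGPT: A Transformer Based Architecture for the Generat | 6.5 | Top 50% | System technical report | #music-generation
11 | AlphaWiSE: Adaptive Weight Interpolation for Continual | 6.4 | Top 50% | Methods research | #audio-retrieval
12 | MultiRef-Compass: Towards Comprehensive Evaluation of M | 6.3 | Top 50% | Dataset and benchmark | #audio-visual-generation
13 | Large Audio Language Models for Spoofing-Aware Speaker | 6.2 | Top 50% | Methods research | #speech-deepfake-detection
14 | Stop Thinking, Start Looking: Efficient Post-Training f | 5.6 | Top 50% | Methods research | #multimodal-models
15 | Video = World + Event Stream | 4.9 | Bottom 50% | System technical report | #audio-understanding

📋 Paper list
🥇 Can Tokens Compete? Token Representations against Supervised CNN Backbones for BirdCLEF+ 2026
8.3/10 | Novelty 1.2/2 | Rigor 1/1.5 | Experiments 1.3/1.5 | Clarity 1/1 | Impact 1/1.5 | Open source 1/1.5 | Reproducibility 0.3/0.5 | Engineering 1.5/1.5 ...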

Speech/Music/Audio Paper Digest 2026-07-16

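20 papers analyzed in total.

⚡ Today's overview: 📥 20 papers fetched → 🔬 in-depth analysis complete

🏷️ Trending topics
Topic | Papers | Distribution
#audio-understanding | 3 | ███
#sound-source-localization | 2 | ██
#music-understanding | 2 | ██
#audio-classification | 2 | ██
#audio-generation | 2 | ██
#speech-emotion-recognition | 1 | █
#speech-translation | 1 | █
#speech-quality-assessment | 1 | █

📊 Paper score leaderboard (20 papers, sorted by score in descending order)
Rank | Paper | Score | Tier | Document type | Main task
🥇 | AVSCap: Orchestrating Audio-Visual Synergy for Omni-mod | 9.2 | Top 10% | Methods research | #audio-visual-understanding
🥈 | MetaPerch: Learning from metadata for bioacoustics foun | 9.0 | Top 10% | Methods research | #audio-classification
🥉 | Auditing Protocol-Level Shortcuts in Large Audio Langua | 8.2 | Top 25% | System technical report | #speech-quality-assessment
4 | Self-supervised Speech Comparison for L2 Phone, Rhythm, | 7.7 | Top 25% | Methods research | #audio-understanding
5 | Efficient Text-to-Audio Generation via Pruning | 7.6 | Top 25% | Methods research | #audio-generation
6 | From Prediction to Collaboration: Interactive Symbolic | 7.5 | Top 25% | System technical report | #music-understanding
7 | Live Gurbani Tracking: A Benchmark and Reference System | 7.4 | Top 50% | System technical report | #audio-captioning
8 | Music-to-Dance Generation via Atomic Movements | 7.4 | Top 50% | Methods research | #music-generation
9 | Improving Text-to-Audio Instruction Following via Fine- | 7.2 | Top 50% | Methods research | #audio-generation
10 | Cover First, Disagree Softly: Rethinking Mismatch-First | 6.7 | Top 50% | Methods research | #audio-event-detection
11 | Rethinking Speech Foundation Model Fine-tuning: Better | 6.7 | Top 50% | Methods research | #speech-emotion-recognition
12 | VIP-MINGLE: A Corpus for Videoconference and In-Person | 6.5 | Top 50% | Dataset and benchmark | #audio-understanding
13 | A Hybrid Mamba for Audio-Visual Navigation | 6.3 | Top 50% | Methods research | #sound-source-localization
14 | Greedy Volume Maximization of Gradient Embeddings for L | 6.3 | Top 50% | Methods research | #audio-classification
15 | From Continuous Deployment to Queryable Dataset: Teraby | 6.1 | Top 50% | System technical report | #audio-understanding
16 | Adapting a Diffusion-Based Music Synthesis Model to Hum | 6.0 | Top 50% | Methods research | #voice-conversion
17 | Genre Bias or Aesthetic Perception? Identifying and Mit | 6.0 | Top 50% | Methods research | #music-understanding
18 | Do LLMs Need Architectural Changes for Simultaneous Spe | 5.7 | Top 50% | Methods research | #speech-translation
19 | Bring Music The Horizon: Music-Driven 360\(^\circ\) Video | 5.3 | Bottom 50% | System technical report | #audio-visual-generation
20 | Task-Oriented Sensing and Covert Transmissions for Coll | 4.9 | Bottom 50% | Methods research | #sound-source-localization

📋 Paper list
🥇 AVSCap: Orchestrating Audio-Visual Synergy for Omni-modal Video Captioning
9.2/10 | Novelty 1.5/2 | Rigor 1.3/1.5 | Experiments 1.5/1.5 | Clarity 0.9/1 | Impact 1.2/1.5 | Open source 1/1.5 | Reproducibility 0.3/0.5 | Engineering 1.5/1.5 ...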

Speech/Music/Audio Paper Digest 2026-07-15

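25 papers analyzed in total.

⚡ Today's overview: 📥 25 papers fetched → 🔬 in-depth analysis complete

🏷️ Trending topics
Topic | Papers | Distribution
#music-understanding | 3 | ███
#sound-source-localization | 2 | ██
#speech-deepfake-detection | 2 | ██
#speech-synthesis | 2 | ██
#speech-enhancement | 2 | ██
#speech-recognition | 2 | ██
#speaker-diarization | 2 | ██
#audio-event-detection | 2 | ██

📊 Paper score leaderboard (25 papers, sorted by score in descending order)
Rank | Paper | Score | Tier | Document type | Main task
🥇 | ChartGenEval: Corruption-Tested Multi-Dimensional Feedb | 8.8 | Top 25% | Methods research | #music-generation
🥈 | Contrasting statistical patterns in melodic and molecul | 8.7 | Top 25% | Methods research | #music-understanding
🥉 | Open-Source Intelligence and Music Information Retrieva | 7.9 | Top 25% | Applied research | #music-understanding
4 | HSEmotion Team at the 11th ABAW Challenge: Multi-Task L | 7.9 | Top 25% | System technical report | #audio-visual
5 | Low-Latency Neural Models for Real-Time Music Enhanceme | 7.7 | Top 25% | System technical report | #music-source-separation
6 | Do We Really Need Multimodal Emotion Language Models La | 7.4 | Top 50% | Methods research | #speech-emotion-recognition
7 | ZipL-Dialog: Memory-Efficient Long-Form Spoken Dialog S | 7.3 | Top 50% | System technical report | #speech-synthesis
8 | The Sound of Absence: Audio-Language Embedding Models S | 7.1 | Top 50% | System technical report | #audio-retrieval
9 | Real-time Generation of Listener Nodding via Prediction | 6.9 | Top 50% | Methods research | #speech-interaction
10 | Spatial-Frequency Cued Generative Fixed-Filter Active N | 6.9 | Top 50% | Methods research | #sound-source-localization
11 | UD-ASD: A Unified Diffusion Model for Anomalous Sound D | 6.6 | Top 50% | Methods research | #audio-event-detection
12 | Investigating the Integration of Spatial Information in | 6.6 | Top 50% | Methods research | #speaker-diarization
13 | Segregate, Refine, Integrate: Decomposing Multimodal Fu | 6.5 | Top 50% | Methods research | #audio-event-detection
14 | AutoSIFT: Automatic Style Sifting for Controllable Spee | 6.5 | Top 50% | Methods research | #speech-synthesis
15 | Listen first: Output-based multi-microphone speech enha | 6.4 | Top 50% | Methods research | #speech-enhancement
16 | Neural Morphing: Sequence-Optimized Token-Level Morphin | 6.4 | Top 50% | System technical report | #audio-coding
17 | Hybrid Continual Learning for Low-Resource Australian A | 6.3 | Top 50% | Methods research | #speech-recognition
18 | Explainable-by-Design Audio Deepfake Detection via Wien | 6.1 | Top 50% | Methods research | #speech-deepfake-detection
19 | Traceback Translators Against Forgetting in Continual F | 6.0 | Top 50% | Methods research | #speech-deepfake-detection
20 | Automated Synthesis of Facial Mechanisms for Conversati | 5.9 | Top 50% | System technical report | #audio-understanding
21 | PolarBM: Complex-valued Boltzmann Machine for Modeling | 5.8 | Top 50% | Methods research | #speech-enhancement
22 | Audio-Native Speech Recognition with a Frozen Discrete- | 5.7 | Top 50% | Methods research | #speech-recognition
23 | What is a Musical Scale? Regularity and Convention in t | 5.6 | Top 50% | Theoretical research | #music-understanding
24 | DOA Estimation from One-Bit Magnitude-Only Measurements | 5.1 | Bottom 50% | Methods research | #sound-source-localization
25 | Audio Diarization: A New Paradigm for Exploring Audio R | 4.5 | Bottom 50% | Methods research | #speaker-diarization

📋 Paper list
🥇 ChartGenEval: Corruption-Tested Multi-Dimensional Feedback for Rhythm-Game Chart Generation
8.8/10 | Novelty 1.7/2 | Rigor 1.3/1.5 | Experiments 1.1/1.5 | Clarity 0.8/1 | Impact 0.6/1.5 | Open source 1.5/1.5 | Reproducibility 0.5/0.5 | Engineering 1.3/1.5 ...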

Speech/Music/Audio Paper Digest 2026-07-14

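53 papers analyzed in total.

⚡ Today's overview: 📥 53 papers fetched → 🔬 in-depth analysis complete

🏷️ Trending topics
Topic | Papers | Distribution
#speech-recognition | 5 | █████
#music-generation | 5 | █████
#audio-understanding | 5 | █████
#audio-generation | 4 | ████
#multimodal-models | 3 | ███
#speech-deepfake-detection | 3 | ███
#speech-separation | 3 | ███
#speech-quality-assessment | 3 | ███

📊 Paper score leaderboard (53 papers, sorted by score in descending order)
Rank | Paper | Score | Tier | Document type | Main task
🥇 | Simple Features and Honest Calibration for Ambivalence | 9.0 | Top 10% | System technical report | #model-ensembling
🥈 | PC-Mix: Partial-Component Audio Spoofing Detection unde | 8.9 | Top 25% | Dataset and benchmark | #audio-deepfake-detection
🥉 | BeatEdit: Symbolic Music Generation as Explicit Editing | 8.9 | Top 25% | Methods research | #music-generation
4 | CHARM: Charge Calibration and Acoustic Rescue for LLM-b | 8.8 | Top 25% | Methods research | #prompt-learning
5 | FdAudio: MeanFlow-Anchored Fréchet-Distance Post-Traini | 8.6 | Top 25% | Methods research | #audio-generation
6 | Evaluating SSL and ViViT Architectures for Cross-Corpus | 8.3 | Top 25% | System technical report | #speech-quality-assessment
7 | ECHOv2: Two-Level Band-Splitting Representation Learnin | 8.2 | Top 25% | Methods research | #audio-event-detection
8 | GigaAM Multilingual: Foundation Model for Underrepresen | 8.1 | Top 25% | System technical report | #speech-recognition
9 | Evidence Subspace Projection: Measuring How Much Eviden | 8.1 | Top 25% | Methods research | #speech-deepfake-detection
10 | VoxENES 2026: Benchmarking Generalization of Speech Spo | 8.1 | Top 25% | Dataset and benchmark | #speech-deepfake-detection
11 | WaveNet-Style Guitar Amplifier Model Pruning for Real-T | 8.0 | Top 25% | System technical report | #audio-generation
12 | TabPFN beyond Tabular Data: Calibration and Accuracy on | 7.9 | Top 25% | Applied research | #audio-classification
13 | ARIMA: Reconstruction-Grounded Predictive Representatio | 7.7 | Top 25% | Methods research | #self-supervised-learning
14 | Qwen-Audio-VAE Technical Report | 7.7 | Top 25% | System technical report | #audio-coding
15 | Local Multimodal Music Alignment from Global Supervisio | 7.6 | Top 25% | Methods research | #contrastive-learning
16 | MeloBottleneck: Self-Supervised Melody Skeleton Extract | 7.5 | Top 25% | Methods research | #music-understanding
17 | Dance to Music Generation leveraging Pre-training with | 7.5 | Top 25% | Methods research | #music-generation
18 | GigaChat Audio: Time-aware Large Audio Language Model | 7.4 | Top 50% | System technical report | #audio-understanding
19 | Difference-Driven Gating: Adaptive Feature Fusion for U | 7.4 | Top 50% | Methods research | #speech-separation
20 | BackgroundMellow: A Multi-Modal Cohesive Framework for | 7.4 | Top 50% | System technical report | #audio-generation
21 | Qwen-Music Technical Report | 7.4 | Top 50% | System technical report | #music-generation
22 | CoFi-Lite: Pushing the Limits of Ultra-Lightweight Spee | 7.3 | Top 50% | Methods research | #speech-enhancement
23 | MusicMark: A Robust Generative Watermarking Framework f | 7.3 | Top 50% | Methods research | #audio-watermarking
24 | Unified Gradient Projection: Language-Balanced Continua | 7.2 | Top 50% | Methods research | #speech-recognition
25 | Data Augmentation for L2 English Speaking Assessment us | 7.0 | Top 50% | Methods research | #speech-quality-assessment
26 | A Production-Oriented Framework for Evaluation of SFX G | 6.9 | Top 50% | System technical report | #audio-generation
27 | Learn2Chat: Rethinking Dyadic Talking Heads via Interac | 6.8 | Top 50% | Methods research | #audio-visual-generation
28 | Tight-Frame Reconstruction for Acoustic Intensity Estim | 6.8 | Top 50% | Theoretical research | #sound-source-localization
29 | The SonicAGI System for the REAL-TSE Challenge | 6.8 | Top 50% | System technical report | #speech-separation
30 | Anysynth:Zero-Shot Instrument Cloning via In-Context Le | 6.8 | Top 50% | Methods research | #music-generation
31 | Where Speech Enhancement Hurts Recognition: An Inferenc | 6.7 | Top 50% | Methods research | #speech-recognition
32 | Teaching Speech Enhancement Models to Sing: Domain Adap | 6.7 | Top 50% | Methods research | #music-source-separation
33 | What You Train Is What You Get: Gender Bias, Training C | 6.6 | Top 50% | Applied research | #speech-deepfake-detection
34 | Listen to the Features: Voice Anonymization Driven by C | 6.5 | Top 50% | Methods research | #voice-cloning
35 | Efficiently Adapting Spoken Language Models for the Sin | 6.5 | Top 50% | System technical report | #speech-interaction
36 | Which Languages Transfer Best to Warlpiri? A Similarity | 6.5 | Top 50% | Applied research | #speech-recognition
37 | Encoder-Side Neuron Identification and Amplification fo | 6.4 | Top 50% | Methods research | #audio-understanding
38 | Breaking the Quality–Intelligibility Trade-off in Stre | 6.3 | Top 50% | Methods research | #speech-separation
39 | An Objective Intelligibility Metric Evaluation on Spani | 6.2 | Top 50% | Dataset and benchmark | #speech-quality-assessment
40 | Hearing Like Humans? Sound Symbolism and Perceptual Ali | 6.1 | Top 50% | Methods research | #multimodal-models
41 | Anamnesis: An Open-Source Platform for Large-Scale Back | 6.1 | Top 50% | System technical report | #prompt-learning
42 | LOGOS: A Living Logic for AI Agent Teams That Evolve Wi | 6.1 | Top 50% | System technical report | #multimodal-models
43 | Verifier-Guided Twelve-Tone Composition: A Generate-Ver | 6.0 | Top 50% | System technical report | #music-generation
44 | MRUF: Multi-granularity Routing with Uncertainty-Aware | 5.9 | Top 50% | Methods research | #multimodal-models
45 | Omni-Decision: A Progressive Evidence-State Agent Syste | 5.9 | Top 50% | System technical report | #audio-understanding
46 | Graph Representation of RaagBase: A Unique Dataset for | 5.7 | Top 50% | Dataset and benchmark | #music-understanding
47 | Synchronized Three-Dimensional Vocal-Tract Motion for S | 5.7 | Top 50% | System technical report | #speech-synthesis
48 | LightMem-Ego: Your AI Memory for Everyday Life | 5.6 | Top 50% | System technical report | #streaming
49 | Casting Everything to Online API Services? A Survey of | 5.4 | Bottom 50% | Survey | #speech-recognition
50 | A Closed-Form Noise-Sensitivity Asymmetry for Causal Br | 5.3 | Bottom 50% | Theoretical research | #audio-understanding
51 | Semantic Sampling via Learnable Observation Front Ends | 5.1 | Bottom 50% | Methods research | #audio-understanding
52 | Transcript-Free Lightweight Detection of Alzheimer’s Di | 4.9 | Bottom 50% | Methods research | #speech-attribute-recognition
53 | Perceived Annoyance in Multi-source Electric Vehicle AV | 3.5 | Bottom 50% | Applied research | #audio-quality-assessment

📋 Paper list
🥇 Simple Features and Honest Calibration for Ambivalence and Hesitancy Recognition in Video
9.0/10 | Novelty 1.2/2 | Rigor 1.4/1.5 | Experiments 1.5/1.5 | Clarity 0.9/1 | Impact 0.5/1.5 | Open source 1.5/1.5 | Reproducibility 0.5/0.5 | Engineering 1.5/1.5 ...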

Speech/Music/Audio Paper Digest 2026-07-13

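14 papers analyzed in total.

⚡ Today's overview: 📥 14 papers fetched → 🔬 in-depth analysis complete

🏷️ Trending topics
Topic | Papers | Distribution
#speech-recognition | 2 | ██
#speech-synthesis | 2 | ██
#music-generation | 2 | ██
#audio-visual-understanding | 2 | ██
#audio-understanding | 1 | █
#multimodal-models | 1 | █
#audio-visual-speech-recognition | 1 | █
#speech-separation | 1 | █

📊 Paper score leaderboard (14 papers, sorted by score in descending order)
Rank | Paper | Score | Tier | Document type | Main task
🥇 | Tokenizer Transplantation: Mitigating Autoregressive Co | 8.8 | Top 25% | Methods research | #speech-recognition
🥈 | Phone Segmentation and Recognition through Phonological | 7.7 | Top 25% | Methods research | #speech-recognition
🥉 | FreyaTTS Technical Report | 7.7 | Top 25% | System technical report | #speech-synthesis
4 | ReGen: Hierarchical Multi-Prompt Representation Generat | 7.5 | Top 25% | Methods research | #speech-synthesis
5 | Clean2FX: Label-conditioned modeling for clean-to-effec | 7.3 | Top 50% | System technical report | #audio-understanding
6 | Event-Based Token Sequences for Audio-Conditioned Music | 7.2 | Top 50% | Methods research | #music-generation
7 | Dual-BEATs: Unlocking Zero-Shot Stereo Audio Perception | 7.1 | Top 50% | Methods research | #multimodal-models
8 | Optimal Transport-based Semantic Alignment for LLM-base | 6.9 | Top 50% | Methods research | #audio-visual-speech-recognition
9 | Technical Report for MERL’s Real-TSE Challenge Submissi | 6.6 | Top 50% | System technical report | #speech-separation
10 | SVF-CR: Synchronized Visual-Facial Cross-Refinement for | 6.4 | Top 50% | Methods research | #audio-visual-understanding
11 | Beyond Time Shifts: Adapting Omni-LLM as a Reference-Fr | 6.0 | Top 50% | Methods research | #audio-visual-understanding
12 | Wan-Dancer: A Hierarchical Framework for Minute-scale C | 5.6 | Top 50% | Methods research | #music-generation
13 | Tonnetz-Driven Graph Wedgelet for Harmonic Complexity R | 5.3 | Bottom 50% | Methods research | #music-understanding
14 | Immersive Social Interaction with VR and LLM-Assisted H | 4.7 | Bottom 50% | System technical report | #speech-interaction

📋 Paper list
🥇 Tokenizer Transplantation: Mitigating Autoregressive Collapse in Edge-Efficient Bengali ASR
8.8/10 | Novelty 1.5/2 | Rigor 1.3/1.5 | Experiments 0.8/1.5 | Clarity 1/1 | Impact 1.2/1.5 | Open source 1.2/1.5 | Reproducibility 0.3/0.5 | Engineering 1.5/1.5 ...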