Speech/Music/Audio Paper Digest

Speech/Music/Audio Paper Digest 2026-07-11

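1 paper analyzed in total.

⚡ Today's overview
📥 1 paper fetched → 🔬 In-depth analysis complete

🏷️ Popular topics
| Topic | Count | Distribution |
| #Audio event detection | 1 | █ |

📊 Paper score ranking (1 paper, sorted by score in descending order)
| Rank | Paper | Total score | Tier | Document type | Main task |
| 🥇 | HeadRoom: Lightweight, Edge-deployable Pipeline for Ada | 7.2 | Top 50% | System/technical report | #Audio event detection |

📋 Paper list
🥇 HeadRoom: Lightweight, Edge-deployable Pipeline for Adaptive Notification Routing
7.2/10 | Novelty 1.3/2 | Rigor 1.1/1.5 | Experiments 0.8/1.5 | Clarity 1/1 | Impact 0.5/1.5 | Open source 1.2/1.5 | Reproducibility 0.1/0.5 | Engineering 1.2/1.5 ...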

Speech/Music/Audio Paper Digest 2026-07-10

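19 papers analyzed in total.

⚡ Today's overview
📥 19 papers fetched → 🔬 In-depth analysis complete

🏷️ Popular topics
| Topic | Count | Distribution |
| #Speech recognition | 4 | ████ |
| #Music transcription | 2 | ██ |
| #Speech quality assessment | 2 | ██ |
| #Multimodal models | 2 | ██ |
| #Music generation | 1 | █ |
| #Audio event detection | 1 | █ |
| #Speech separation | 1 | █ |
| #Speech emotion recognition | 1 | █ |

📊 Paper score ranking (19 papers, sorted by score in descending order)
| Rank | Paper | Total score | Tier | Document type | Main task |
| 🥇 | A Quantized Native Runtime for On-Device Semantic Audio | 8.4 | Top 25% | System/technical report | #Music generation |
| 🥈 | MuScriptor: An Open Model for Multi-Instrument Music Tr | 8.3 | Top 25% | System/technical report | #Music transcription |
| 🥉 | A Self-Supervised Approach for Minimal-Annotation Hydro | 8.3 | Top 25% | System/technical report | #Audio event detection |
| 4 | COALA: Robust Contextualized Speech-augmented Language | 8.2 | Top 25% | Methods research | #Speech recognition |
| 5 | PS4: Proxy-Supervised Joint Training for Real Target Sp | 8.0 | Top 25% | System/technical report | #Speech separation |
| 6 | MulTTiPop: A Multitrack Transcription Dataset for Pop M | 7.7 | Top 25% | Dataset and benchmark | #Music transcription |
| 7 | SHAP-Weighted Cross-Modal Expert Fusion for Emotion and | 7.7 | Top 25% | Methods research | #Speech emotion recognition |
| 8 | When Synthetic Speech Is All You Have: Better Call GRPO | 7.7 | Top 25% | Methods research | #Speech recognition |
| 9 | Structural Bottlenecks on Frequency Representation in E | 7.6 | Top 25% | Methods research | #Audio generation |
| 10 | A Reliability Assessment of LALM Audio Judges for Full- | 7.1 | Top 50% | System/technical report | #Speech quality assessment |
| 11 | Inverse-designed meta processing units for multi-task n | 6.9 | Top 50% | System/technical report | #Audio understanding |
| 12 | Multimodal Unlearning Across Vision, Language, Video, a | 6.9 | Top 50% | Survey | #Multimodal models |
| 13 | Best-of-$N$ TTS Evaluation is Confounded by ASR Family | 6.7 | Top 50% | Methods research | #Speech quality assessment |
| 14 | Why Do You Say It Like That? A Phoneme-Level Framework | 6.5 | Top 50% | Methods research | #Speech deepfake detection |
| 15 | It Takes Few to TANGO: A Quantized Distributed Model fo | 6.5 | Top 50% | System/technical report | #Speech enhancement |
| 16 | On the Role of Conversational Timing in Synthetic Train | 6.4 | Top 50% | Methods research | #Speech recognition |
| 17 | Diarization-Guided Qwen-ASR Adaptation for Multilingual | 5.7 | Top 50% | System/technical report | #Speech recognition |
| 18 | Multimodal Digital Biomarker for Asthma: Complementary | 5.3 | Bottom 50% | Applied research | #Multimodal models |
| 19 | Vidu S1: A Real-Time Interactive Video Generation Model | 5.2 | Bottom 50% | System/technical report | #Audio-visual interaction |

📋 Paper list
🥇 A Quantized Native Runtime for On-Device Semantic Audio Generation
8.4/10 | Novelty 1.3/2 | Rigor 1.3/1.5 | Experiments 1/1.5 | Clarity 0.8/1 | Impact 1.2/1.5 | Open source 1/1.5 | Reproducibility 0.3/0.5 | Engineering 1.5/1.5 ...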

Speech/Music/Audio Paper Digest 2026-07-09

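13 papers analyzed in total.

⚡ Today's overview
📥 13 papers fetched → 🔬 In-depth analysis complete

🏷️ Popular topics
| Topic | Count | Distribution |
| #Speech recognition | 4 | ████ |
| #Music understanding | 2 | ██ |
| #Benchmarking | 1 | █ |
| #Voice interaction | 1 | █ |
| #Speech emotion recognition | 1 | █ |
| #Voice activity detection | 1 | █ |
| #Music generation | 1 | █ |
| #Speaker verification | 1 | █ |

📊 Paper score ranking (13 papers, sorted by score in descending order)
| Rank | Paper | Total score | Tier | Main task |
| 🥇 | MMGenre: Benchmarking Singing Voice Synthesis across Mu | 8.3 | Top 25% | #Benchmarking |
| 🥈 | Decoupling Conversational Dynamics in Full-Duplex Spoke | 8.2 | Top 25% | #Voice interaction |
| 🥉 | MADB: A Large-Scale Music Aesthetics Dataset with Profe | 8.1 | Top 25% | #Music understanding |
| 4 | Gradient-Based Speech-to-Text Alignment for Any ASR Mod | 7.3 | Top 50% | #Speech recognition |
| 5 | UBG-Net: An Uncertainty-aware Bayesian Gating Network f | 7.1 | Top 50% | #Speech recognition |
| 6 | Compress the Cache, Not the Speech Embedding: KV Compre | 7.0 | Top 50% | #Speech recognition |
| 7 | Audio Sentiment Analysis via Distillation and Cross-Mod | 6.9 | Top 50% | #Speech emotion recognition |
| 8 | Multimodal Voice Activity Projection for Turn-Taking in | 6.7 | Top 50% | #Voice activity detection |
| 9 | Extending Xenakis: From Architectural Geometry to Sonif | 5.6 | Top 50% | #Music generation |
| 10 | Text-Independent Speaker Verification Using Discrete Au | 5.2 | Bottom 50% | #Speaker verification |
| 11 | Transformer-based segmentation of prosodic boundaries i | 4.0 | Bottom 50% | #Speech recognition |
| 12 | Rag Classification of Tagore Songs using Symbolic Music | 3.0 | Bottom 50% | #Music understanding |
| 13 | EscFOA: Enhancing Spatial Learning for Visually Impaire | 2.8 | Bottom 50% | #Education |

📋 Paper list
🥇 MMGenre: Benchmarking Singing Voice Synthesis across Multiple Musical Genres
8.3/10 | Novelty 1.5/2 | Rigor 1.2/1.5 | Experiments 1.1/1.5 | Clarity 1/1 | Impact 0.8/1.5 | Open source 1.5/1.5 | Reproducibility 0.2/0.5 | Engineering 1/1.5 ...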

Speech/Music/Audio Paper Digest 2026-07-08

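26 papers analyzed in total.

⚡ Today's overview
📥 26 papers fetched → 🔬 In-depth analysis complete

🏷️ Popular topics
| Topic | Count | Distribution |
| #Speech attribute recognition | 3 | ███ |
| #Audio classification | 3 | ███ |
| #Speech synthesis | 3 | ███ |
| #Speech recognition | 3 | ███ |
| #Sound source localization | 2 | ██ |
| #Music generation | 2 | ██ |
| #Voice interaction | 1 | █ |
| #Audio event detection | 1 | █ |

📊 Paper score ranking (26 papers, sorted by score in descending order)
| Rank | Paper | Total score | Tier | Main task |
| 🥇 | Hierarchical Acoustic-Semantic Modeling: Modality Separ | 9.2 | Top 10% | #Voice interaction |
| 🥈 | Propose and Attend: Training-free MLLM Grounding Confid | 8.2 | Top 25% | #Audio event detection |
| 🥉 | Music I Care About: Automated Multimodal Benchmarking o | 7.8 | Top 25% | #Music understanding |
| 4 | Escaping the Procrustean Bed: Groupwise Orthogonal Conn | 7.8 | Top 25% | #Speech attribute recognition |
| 5 | TriA Pipeline: A Large-Scale Automatic Audio Annotation | 7.4 | Top 50% | #Audio classification |
| 6 | InsideSSL: Understanding Self-Supervised Speech Represe | 7.4 | Top 50% | #Speech attribute recognition |
| 7 | Precise Video-to-Audio Generation with Cross-Modal Alig | 7.4 | Top 50% | #Audio-visual generation |
| 8 | WordVoice: Explicit and Decoupled Multi-Dimensional Wor | 7.2 | Top 50% | #Speech synthesis |
| 9 | ForestIR: Physics-Informed Forest Sound Simulation for | 7.2 | Top 50% | #Sound source localization |
| 10 | Uncovering Latent Depression Severity for Binary Depres | 7.0 | Top 50% | #Audio-visual understanding |
| 11 | Determinantal point process sampling for bioacoustic ac | 6.9 | Top 50% | #Audio classification |
| 12 | From Sinhala to Dhivehi: Cross-Lingual Transfer Learnin | 6.6 | Top 50% | #Speech recognition |
| 13 | Goodbye Equal Error Rate, Hello Local Information Discl | 6.5 | Top 50% | #Voice conversion |
| 14 | BlueMagpie-TTS: A Token-Efficient Tokenizer, Language M | 6.5 | Top 50% | #Speech synthesis |
| 15 | Fréchet Distance Loss on Speech Representations for Tex | 6.5 | Top 50% | #Speech synthesis |
| 16 | NAVER LABS System Re-implementation for the IWSLT 2026 | 6.4 | Top 50% | #Speech translation |
| 17 | Few-Shot Class-Incremental Audio Classification Using P | 6.3 | Top 50% | #Audio classification |
| 18 | Gemma 4 Technical Report | 6.2 | Top 50% | #Speech recognition |
| 19 | Revisiting the Relation Between Language Model Perplexi | 6.0 | Top 50% | #Speech recognition |
| 20 | Multimodal Video-to-Music Recommendation via Semantic R | 5.4 | Bottom 50% | #Music retrieval |
| 21 | Designing Maintainable Hybrid Generative Systems: A Qua | 5.3 | Bottom 50% | #Music generation |
| 22 | Learning-based Physics-Constrained Neural Kernel for So | 5.2 | Bottom 50% | #Sound source localization |
| 23 | Distributed Multichannel Wiener Filtering for Topology- | 5.1 | Bottom 50% | #Speech enhancement |
| 24 | Flow Matching-Based Speech Source Separation with Best- | 4.9 | Bottom 50% | #Speech separation |
| 25 | Umm… With Transformers? Insights from Filled Pause Us | 4.8 | Bottom 50% | #Speech attribute recognition |
| 26 | From Textural Counterpoint to Feature Encoding: A Multi | 2.1 | Bottom 50% | #Music generation |

📋 Paper list
🥇 Hierarchical Acoustic-Semantic Modeling: Modality Separation and Semantic Coherence for Full-Duplex SLMs
9.2/10 | Novelty 1.8/2 | Rigor 1.2/1.5 | Experiments 1.1/1.5 | Clarity 0.8/1 | Impact 1.3/1.5 | Open source 1.2/1.5 | Reproducibility 0.3/0.5 | Engineering 1.5/1.5 ...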

Speech/Music/Audio Paper Digest 2026-07-07

58 papers analyzed in total.

⚡ Today's overview
📥 58 papers fetched → 🔬 In-depth analysis complete

🏷️ Popular topics
| Topic | Count | Distribution |
| #Speech recognition | 11 | ███████████ |
| #Speech deepfake detection | 5 | █████ |
| #Audio understanding | 4 | ████ |
| #Voice interaction | 3 | ███ |
| #Audio event detection | 3 | ███ |
| #Voice conversion | 3 | ███ |
| #Audio-visual understanding | 3 | ███ |
| #Speech synthesis | 3 | ███ |

📊 Paper score ranking (58 papers, sorted by score in descending order)
| Rank | Paper | Total score | Tier | Main task |
| 🥇 | Doppelganger: Sound Effects and Their Synthetic Twins | 9.1 | Top 10% | #Audio retrieval |
| 🥈 | SPEARBench: A Benchmark for Naturalness Evaluation in S | 8.9 | Top 25% | #Voice interaction |
| 🥉 | Metronome: Bound the Cache, Keep the Beat for Real-Time | 8.7 | Top 25% | #Voice interaction |
| 4 | Auto-AEG: Scalable Data Construction for Open-Vocabular | 8.3 | Top 25% | #Audio event detection |
| 5 | RABBiT: Rapidly adaptive BOLD foundation model via brai | 8.1 | Top 25% | #Audio understanding |
| 6 | TRACE-EVC: Text-Guided Relative Affective Control for Z | 8.0 | Top 25% | #Voice conversion |
| 7 | Parallelized Autoregressive Decoding for Omni-Modal Den | 8.0 | Top 25% | #Audio-visual understanding |
| 8 | Speaker-Disentangled Chunk-Wise Regression for Syllabic | 7.9 | Top 25% | #Speech coding |
| 9 | Speaker-Aware Temporal Aggregation Strategies on Segmen | 7.9 | Top 25% | #Speech attribute recognition |
| 10 | REDDIT: Correcting Model-Generated Timestamp Drift in A | 7.8 | Top 25% | #Speech recognition |
| 11 | Deriving Benchmarking Datasets from Long-Form Recording | 7.7 | Top 25% | #Benchmarking |
| 12 | ProPS: Prompted Profile Synthesis for Natural Language- | 7.6 | Top 25% | #Speech synthesis |
| 13 | DELTA-TTS: Adapting Autoregressive Model into Diffusion | 7.5 | Top 25% | #Speech synthesis |
| 14 | TokAN: Accent Normalization Using Self-Supervised Speec | 7.5 | Top 25% | #Voice conversion |
| 15 | Listen, Think, Transcribe: Continuous Latent Test-Time | 7.5 | Top 25% | #Speech recognition |
| 16 | $C^3$ASD: Multi-Level Consistency-Driven Representation | 7.5 | Top 25% | #Audio-visual understanding |
| 17 | Training-Free Model Selection and Domain-Aware Score Ca | 7.3 | Top 50% | #Audio event detection |
| 18 | CHILDES-Aligned: A Curated Children's Speech Datase | 7.2 | Top 50% | #Speech recognition |
| 19 | Taste-aware music retrieval from audio embeddings | 6.9 | Top 50% | #Music retrieval |
| 20 | Lights, Camera, Carbon: Architectural Scaling Laws for | 6.9 | Top 50% | #Audio-visual generation |
| 21 | Unified Audio Intelligence Without Regressing on Text I | 6.8 | Top 50% | #Audio interaction |
| 22 | Ranking the Impact of Contextual Specialization in Neur | 6.7 | Top 50% | #Speech enhancement |
| 23 | SynSFX: Multi-Model Sound Effects Synthesis Dataset for | 6.5 | Top 50% | #Audio deepfake detection |
| 24 | Evaluating the Effect of Linguistic Relatedness on Cros | 6.5 | Top 50% | #Speech recognition |
| 25 | MOSAIC: Interpretable Multi-Token Cross-Attention of Bi | 6.3 | Top 50% | #Speech deepfake detection |
| 26 | CARD: Cross-component Audio Representation Distillation | 6.3 | Top 50% | #Audio captioning |
| 27 | Probing Low-Level Acoustic Attribute Encoding in CLAP A | 6.2 | Top 50% | #Audio understanding |
| 28 | Trajectory Variance: An Unsupervised Measure of Developm | 6.2 | Top 50% | #Audio understanding |
| 29 | Adaptive Diversity-Uncertainty Active Learning with Red | 6.2 | Top 50% | #Audio event detection |
| 30 | Adaptive Loss Balancing for Multi-Task Bioacoustic Clas | 6.1 | Top 50% | #Audio classification |
| 31 | An Intervention-Based Framework for Shortcut Diagnosis | 6.1 | Top 50% | #Speech deepfake detection |
| 32 | QuaSR: Quality-Aware Sample Reweighting for Pacific Ind | 6.0 | Top 50% | #Speech recognition |
| 33 | CaReCoS: A Spectrogram based Visual Benchmark for Cardi | 6.0 | Top 50% | #Audio understanding |
| 34 | Open-Set Source Tracing as Compositional Factors via St | 6.0 | Top 50% | #Speech deepfake detection |
| 35 | Context-Aware ASR for Mandarin Technical Lectures | 6.0 | Top 50% | #Speech recognition |
| 36 | Streaming Neural Speech Codecs through Time-Invariant R | 6.0 | Top 50% | #Speech coding |
| 37 | Physiological Noise Augmentation Improves Non-Invasive | 6.0 | Top 50% | #Speech recognition |
| 38 | DuplexChat: Constructing Speaker-Separated Full-Duplex | 5.9 | Top 50% | #Voice interaction |
| 39 | Noisy Environment Adaptation of Neural Speech Codec via | 5.9 | Top 50% | #Speech enhancement |
| 40 | NouveauVoice: Generating Novel Pseudo Speakers for Voic | 5.9 | Top 50% | #Voice conversion |
| 41 | OmniFocus: Query-Guided Modality-Balanced Token Compres | 5.9 | Top 50% | #Audio-visual question answering |
| 42 | Jointly Improving Dialect Identification and ASR in Ind | 5.8 | Top 50% | #Speech recognition |
| 43 | S-DiverSe: Spanish Diverse Speech | 5.8 | Top 50% | #Speech recognition |
| 44 | Towards Robust Uncertainty-Aware Speaker Modeling | 5.7 | Top 50% | #Speaker verification |
| 45 | Towards Language-Agnostic Speech Inversion | 5.6 | Top 50% | #Speech attribute recognition |
| 46 | Layer-wise Cross-Lingual Depression Detection from Spee | 5.5 | Top 50% | #Speech emotion recognition |
| 47 | Wan-Streamer v0.2: Higher Resolution, Same Latency | 5.4 | Bottom 50% | #Audio-visual interaction |
| 48 | Mixture-Constrained Max Pooling Improves Separation-Bas | 5.3 | Bottom 50% | #Audio classification |
| 49 | Reinforcement Learning for Data-Efficient Code-Switched | 5.3 | Bottom 50% | #Speech recognition |
| 50 | Physics-Informed Direction-of-Arrival Estimation Over D | 5.3 | Bottom 50% | #Sound source localization |
| 51 | Sampling Bias Compensation for Robust Evaluation of Aud | 4.9 | Bottom 50% | #Audio classification |
| 52 | UniSkip-Mamba: A Frequency-Aware State Space Model for | 4.8 | Bottom 50% | #Audio-visual understanding |
| 53 | Progressive Refinement: An Iterative Pseudo-Labeling Ap | 4.6 | Bottom 50% | #Speech recognition |
| 54 | Weakly Guided and Autoregressive Beamformer Parameteriz | 4.3 | Bottom 50% | #Speech separation |
| 55 | DETECT-3B-Omni is Agnostic of Content and Demographics | 4.2 | Bottom 50% | #Speech deepfake detection |
| 56 | Towards Digital Preservation of Efik: TTS for a Low-Res | 4.0 | Bottom 50% | #Speech synthesis |
| 57 | Quantum-Inspired Harmonic Decision Models: A Computatio | 2.3 | Bottom 50% | #Music generation |
| 58 | Information-Geometric Superposed Vowel Evaluation: Part | 1.9 | Bottom 50% | #Speech deepfake detection |

📋 Paper list
🥇 Doppelganger: Sound Effects and Their Synthetic Twins
9.1/10 | Novelty 1.5/2 | Rigor 1.4/1.5 | Experiments 1.4/1.5 | Clarity 0.9/1 | Impact 0.8/1.5 | Open source 1.5/1.5 | Reproducibility 0.5/0.5 | Engineering 1.1/1.5 ...

Speech/Music/Audio Paper Digest 2026-07-06

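1 paper analyzed in total.

⚡ Today's overview
📥 1 paper fetched → 🔬 In-depth analysis complete

🏷️ Popular topics
| Topic | Count | Distribution |
| #Audio-visual interaction | 1 | █ |

📊 Paper score ranking (1 paper, sorted by score in descending order)
| Rank | Paper | Total score | Tier | Main task |
| 🥇 | VisionAId: An Offline-First Multimodal Android Assistan | 6.6 | Top 50% | #Audio-visual interaction |

📋 Paper list
🥇 VisionAId: An Offline-First Multimodal Android Assistant for People with Visual Impairment, Featuring Personalized Object Retrieval
6.6/10 | Novelty 1.2/2 | Rigor 1/1.5 | Experiments 0.5/1.5 | Clarity 0.8/1 | Impact 0.3/1.5 | Open source 1/1.5 | Reproducibility 0.3/0.5 | Engineering 1.5/1.5 ...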

ICML 2026 Speech/Audio Papers: Detailed Analysis

137 ICML 2026 papers analyzed in total.

🎯 Task categories
Audio-visual understanding (18), Audio-visual generation (10), Audio classification (9), Audio understanding (8), Music generation (8), Speech synthesis (8), Audio-visual question answering (8), Speech recognition (5), Speech deepfake detection (4), Voice interaction (4), Speech enhancement (4), Speech coding (4), Multimodal models (3), Audio deepfake detection (3), Audio separation (2), Spatial audio (2), Audio coding (2), Audio restoration (2), Speech attribute recognition (2), Audio generation (2)

⚡ Conference overview
📥 ICML 2026 accepted 6341 papers → 🔍 Keyword + LLM filtering selected 137 papers related to audio/speech/music → 🔬 In-depth analysis complete

🏷️ Popular topics
| Topic | Count | Distribution |
| #Audio-visual understanding | 18 | ██████████████████ |
| #Audio-visual generation | 10 | ██████████ |
| #Audio classification | 9 | █████████ |
| #Audio understanding | 8 | ████████ |
| #Music generation | 8 | ████████ |
| #Speech synthesis | 8 | ████████ |
| #Audio-visual question answering | 8 | ████████ |
| #Speech recognition | 5 | █████ |
| #Speech deepfake detection | 4 | ████ |
| #Voice interaction | 4 | ████ |
| #Speech enhancement | 4 | ████ |
| #Speech coding | 4 | ████ |
| #Multimodal models | 3 | ███ |
| #Audio deepfake detection | 3 | ███ |
| #Audio separation | 2 | ██ |

📊 Paper score ranking (137 papers, sorted by score in descending order)
| Rank | Paper | Score | Tier | Main task |
| 🥇 | TimeChat-Captioner: Scripting Multi-Scene Videos with T | 9.4 | Top 10% | #Audio-visual understanding |
| 🥈 | Joint Enhancement and Classification using Coupled Diff | 9.3 | Top 10% | #Speech recognition |
| 🥉 | Learning Tight Rejection Boundaries without Negatives f | 9.3 | Top 10% | #Speech deepfake detection |
| 4 | AVTrack: Audio-Visual Tracking in Human-centric Complex | 9.3 | Top 10% | #Audio-visual understanding |
| 5 | A Semantically Consistent Dataset for Data-Efficient Qu | 9.2 | Top 10% | #Audio separation |
| 6 | SAM Audio: Segment Anything in Audio | 9.2 | Top 10% | #Audio separation |
| 7 | MECAT: A Multi-Experts Constructed Benchmark for Fine-G | 9.1 | Top 10% | #Audio understanding |
| 8 | $\tau$-Voice: Benchmarking Full-Duplex Voice Agents on | 9.1 | Top 10% | #Voice interaction |
| 9 | PhaseCoder: Microphone Geometry-Agnostic Spatial Audio | 8.7 | Top 25% | #Spatial audio |
| 10 | BAT: Better Audio Transformer Guided by Convex Gated Pr | 8.6 | Top 25% | #Audio classification |
| 11 | SPEAR: A Unified SSL Framework for Learning Speech and | 8.4 | Top 25% | #Audio understanding |
| 12 | Dual-View Predictive Diffusion: Lightweight Speech Enha | 8.4 | Top 25% | #Speech enhancement |
| 13 | Unlocking Cross-Modal Biosignal Synthesis: A Temporally | 8.3 | Top 25% | - |
| 14 | CoLA: Cross-Modal Low-rank Adaptation for Multimodal Do | 8.3 | Top 25% | #Audio-visual understanding |
| 15 | Speech-Audio Compositional Attacks on Multimodal LLMs a | 8.3 | Top 25% | #Audio understanding |
| 16 | MoST: Mixing Speech and Text with Modality-Aware Mixtur | 8.2 | Top 25% | - |
| 17 | IVQ: Structured and Lightweight Vector Quantization via | 8.2 | Top 25% | #Audio coding |
| 18 | Spherical Procrustes Alignment for Reliable Medical Aud | 8.2 | Top 25% | #Audio classification |
| 19 | Attend to Anything: Foundation Model for Unified Human | 8.2 | Top 25% | #Audio-visual understanding |
| 20 | VocSim: A Training-free Benchmark for Zero-shot Content | 8.2 | Top 25% | #Audio retrieval |
| 21 | JAEGER: Joint 3D Audio-Visual Grounding and Reasoning i | 8.1 | Top 25% | #Sound source localization |
| 22 | LALM-as-a-Judge: Benchmarking Large Audio-Language Mode | 8.1 | Top 25% | #Voice interaction |
| 23 | Pianist Transformer: Towards Expressive Piano Performan | 8.1 | Top 25% | #Music generation |
| 24 | Simultaneous Speech-to-Speech Translation Without Align | 8.0 | Top 25% | #Speech translation |
| 25 | PHALAR: Phasors for Learned Musical Audio Representatio | 8.0 | Top 25% | #Music generation |
| 26 | Optimality of FSQ Tokens for Continuous Diffusion for C | 8.0 | Top 25% | #Speech synthesis |
| 27 | SonicMaster: Towards Controllable All-in-One Music Rest | 8.0 | Top 25% | #Audio restoration |
| 28 | Do Audio LLMs Listen or Read? Analyzing and Mitigating | 8.0 | Top 25% | #Speech attribute recognition |
| 29 | Multiple Choice Learning of Low-Rank Adapters for Langu | 8.0 | Top 25% | #Multimodal models |
| 30 | Bridging the Stability-Expressivity Gap: Synthetic Data | 8.0 | Top 25% | #Speech synthesis |
| 31 | FutureOmni: Evaluating Future Forecasting from Omni-Mod | 8.0 | Top 25% | #Audio-visual question answering |
| 32 | Acoustic Interference: A New Paradigm Weaponizing Acous | 8.0 | Top 25% | #Audio understanding |
| 33 | ReGen: Hierarchical Multi-Prompt Representation Generat | 8.0 | Top 25% | #Speech coding |
| 34 | DiscoForcing: A Unified Framework for Real-Time Audio-D | 8.0 | Top 25% | #Music generation |
| 35 | DreamID-Omni: Unified Framework for Controllable Human- | 8.0 | Top 25% | #Audio-visual generation |
| 36 | AgentSteerTTS: A Multi-Agent Closed-Loop Framework for | 7.9 | Top 25% | #Speech synthesis |
| 37 | STAR-VAE: Structured Topology-Aware Regularization for | 7.9 | Top 25% | #Audio generation |
| 38 | HyperPotter: Spell the Charm of High-Order Interactions | 7.9 | Top 25% | #Audio deepfake detection |
| 39 | T2AV-Compass: Towards Unified Evaluation for Text-to-Au | 7.9 | Top 25% | #Audio-visual generation |
| 40 | Decoupling The “What” and “Where” With Polar Coordinate | 7.8 | Top 25% | #Music generation |
| 41 | V-LynX: Token Interface Alignment for Video+X LLMs | 7.8 | Top 25% | #Audio-visual question answering |
| 42 | Ariadne’s Thread of LipSync: Unraveling Forgeries via I | 7.8 | Top 25% | #Audio-visual understanding |
| 43 | SONAR: Spectral-Contrastive Audio Residuals for General | 7.8 | Top 25% | #Speech deepfake detection |
| 44 | TMD-Bench: A Multi-Level Evaluation Paradigm for Music– | 7.7 | Top 25% | #Audio-visual generation |
| 45 | AudioMosaic: Contrastive Masked Audio Representation Le | 7.7 | Top 25% | #Audio classification |
| 46 | BFCL Audio: An Audio Function Calling Evaluation for La | 7.7 | Top 25% | #Voice interaction |
| 47 | SALSA-V: Shortcut-Augmented Long-form Synchronized Audi | 7.6 | Top 25% | #Audio-visual generation |
| 48 | BEAT: Tokenizing and Generating Symbolic Music by Unifo | 7.6 | Top 25% | #Music generation |
| 49 | From Inpainting to Editing: Unlocking Robust Mask-Free | 7.6 | Top 25% | #Diffusion models |
| 50 | Hearing Without Noticing? Attention-Aware Stealthy Blac | 7.6 | Top 25% | #Speech recognition |
| 51 | AVGen-Bench: A Task-Driven Benchmark for Multi-Granular | 7.6 | Top 25% | #Audio-visual generation |
| 52 | Alethia: a Foundational Encoder for Voice Deepfakes | 7.6 | Top 25% | #Speech deepfake detection |
| 53 | AG-REPA: Causal Layer Selection for Representation Alig | 7.6 | Top 25% | #Speech synthesis |
| 54 | AVI-Bench: Toward Human-like Audio-Visual Intelligence | 7.6 | Top 25% | #Audio-visual understanding |
| 55 | Two-dimensional quantization for geometry-aware audio c | 7.6 | Top 25% | #Speech coding |
| 56 | Abstraction Induces the Brain Alignment of Language and | 7.5 | Top 25% | #Speech coding |
| 57 | Self-Guidance: Enhancing Neural Codecs via Decoder Mani | 7.5 | Top 25% | #Speech coding |
| 58 | OmniVideo-R1: Reinforcing Audio-visual Reasoning with Q | 7.5 | Top 25% | #Audio-visual question answering |
| 59 | Listening Through the Noise: Cauchy-Driven Diffusion Br | 7.4 | Top 50% | #Audio restoration |
| 60 | MoshiRAG: Asynchronous Knowledge Retrieval for Full-Dup | 7.4 | Top 50% | - |
| 61 | Omni-Perception Policy Optimization for Multimodal Emot | 7.4 | Top 50% | #Audio-visual understanding |
| 62 | video-SALMONN S: Memory-Enhanced Streaming Audio-Visual | 7.3 | Top 50% | #Audio-visual question answering |
| 63 | Group Cognition Learning: Making Everything Better Thro | 7.3 | Top 50% | #Audio-visual understanding |
| 64 | REST: Diffusion-based Real-time End-to-end Streaming Ta | 7.3 | Top 50% | #Audio-visual generation |
| 65 | PhoStream: Benchmarking Real-World Streaming for Omnimo | 7.3 | Top 50% | #Audio-visual question answering |
| 66 | ProactiveLLM: Learning Active Interaction for Streaming | 7.2 | Top 50% | #Speech recognition |
| 67 | Stream RAG: Instant and Accurate Spoken Dialogue System | 7.2 | Top 50% | #Streaming processing |
| 68 | Probing Cross-modal Information Hubs in Audio-Visual LL | 7.2 | Top 50% | #Audio-visual understanding |
| 69 | Efficient Multi-modal Dataset Distillation via Analytic | 7.2 | Top 50% | #Contrastive learning |
| 70 | Self-Supervised Flow Matching for Scalable Multi-Modal | 7.2 | Top 50% | #Audio-visual generation |
| 71 | CoCoEmo: Composable and Controllable Human-Like Emotion | 7.1 | Top 50% | #Speech synthesis |
| 72 | Scaling Transformers for End-to-End Discrete Audio Toke | 7.1 | Top 50% | #Audio coding |
| 73 | Query-Based Asymmetric Modeling with Decoupled Input–Ou | 7.1 | Top 50% | #Speech enhancement |
| 74 | OmniSIFT: Modality-Asymmetric Token Compression for Eff | 7.1 | Top 50% | #Audio-visual question answering |
| 75 | Sparse Autoencoders for Interpretable Emotion Control i | 7.0 | Top 50% | #Speech synthesis |
| 76 | The Silent Thought: Modeling Internal Cognition in Full | 7.0 | Top 50% | #Knowledge distillation |
| 77 | Hidden in Plain Tokens: Simply Robust, Gradient-Free Wa | 7.0 | Top 50% | #Audio watermarking |
| 78 | Efficient Distributed MLLM Training with Cornstarch | 7.0 | Top 50% | #Audio-visual understanding |
| 79 | Reasoning LLM Improves Speaker Recognition in Long-form | 7.0 | Top 50% | - |
| 80 | Real-World Unsupervised Models Generalize to Predict Br | 6.9 | Top 50% | #Model evaluation |
| 81 | From Talking to Singing: A New Challenge for Audio-Visu | 6.9 | Top 50% | #Audio-visual understanding |
| 82 | OmniShow: Unifying Multimodal Conditions for Human-Obje | 6.9 | Top 50% | #Audio-visual generation |
| 83 | E-VAds: An E-commerce Short Videos Understanding Benchm | 6.9 | Top 50% | #Audio-visual question answering |
| 84 | STARCaster: Spatio-Temporal AutoRegressive Video Diffus | 6.8 | Top 50% | #Audio-visual generation |
| 85 | Zero-Shot Rankability: Revealing Latent Ordinal Structu | 6.8 | Top 50% | #Audio-visual understanding |
| 86 | An Exterior Method for Nonnegative Matrix Factorization | 6.8 | Top 50% | #Audio classification |
| 87 | FoeGlass: Simple In-Context Learning Is Enough for Red | 6.8 | Top 50% | #Speech deepfake detection |
| 88 | Native Active Perception as Reasoning for Omni-Modal Un | 6.8 | Top 50% | #Audio-visual understanding |
| 89 | Unlocking Speech–Text Compositional Powers: Instruction | 6.7 | Top 50% | #Voice interaction |
| 90 | UltraLIF: Fully Differentiable Spiking Neural Networks | 6.7 | Top 50% | #Audio classification |
| 91 | Towards Streaming Synchronized Spatial Audio Generation | 6.6 | Top 50% | #Audio-visual generation |
| 92 | TextME: Bridging Unseen Modalities Through Text Descrip | 6.6 | Top 50% | - |
| 93 | Evaluating and Rewarding LALMs for Expressive Role-Play | 6.6 | Top 50% | #Speech synthesis |
| 94 | PADS-TAL: Padding-Annealed Diffusion Sampling in Text-A | 6.6 | Top 50% | #Music generation |
| 95 | ADEPT: RL-Aligned Agentic Decoding of Emotion via Evide | 6.5 | Top 50% | #Speech emotion recognition |
| 96 | Universal Algorithm-Implicit Learning | 6.5 | Top 50% | #Audio classification |
| 97 | SARSteer: Safeguarding Large Audio Language Models via | 6.5 | Top 50% | - |
| 98 | MetaPerch: Learning from metadata for bioacoustics foun | 6.5 | Top 50% | #Audio classification |
| 99 | Polyphonia: Zero-Shot Timbre Transfer in Polyphonic Mus | 6.5 | Top 50% | #Music generation |
| 100 | CMI-RewardBench: Evaluating Music Reward Models with Co | 6.4 | Top 50% | #Music generation |
| 101 | Multimodal Fact-Level Attribution for Verifiable Reason | 6.4 | Top 50% | #Audio understanding |
| 102 | MedMosaic: A Challenging Large Scale Benchmark of Diver | 6.4 | Top 50% | #Audio understanding |
| 103 | INFER: Learning Implicit Neural Frequency Response Fiel | 6.4 | Top 50% | #Spatial audio |
| 104 | Characterizing the Predictive Impact of Modalities with | 6.4 | Top 50% | - |
| 105 | PCRNet: Phase-aware Complex Refinement Network for EEG- | 6.4 | Top 50% | #Real-time processing |
| 106 | OmniFit: Bridging Modalities via Layer-Adaptive Token C | 6.3 | Top 50% | #Audio-visual understanding |
| 107 | EchoingPixels: Aliasing-Resistant Joint Token Reduction | 6.3 | Top 50% | #Audio-visual understanding |
| 108 | Quaternion Self-Attention with Shared Scores | 6.3 | Top 50% | #Speech enhancement |
| 109 | LightAVSeg: Lightweight Audio-Visual Segmentation | 6.3 | Top 50% | #Model compression |
| 110 | SURF: Separation via Unsupervised Remixing Flow | 6.2 | Top 50% | #Speech separation |
| 111 | Neural-Inspired Modeling of Auditory Selection and Comp | 6.2 | Top 50% | #Audio-visual speech separation |
| 112 | AuTAgent: A Reinforcement Learning Framework for Tool-A | 6.2 | Top 50% | #Audio understanding |
| 113 | Multimodal Latent Language Modeling with Next-Token Dif | 6.1 | Top 50% | #Speech synthesis |
| 114 | FakeWorld 1.0: An Omni-modal Benchmark for Fake Media a | 6.1 | Top 50% | #Interpretability |
| 115 | ConsMSA: Semantic Distribution Consistency Learning for | 6.1 | Top 50% | #Multimodal models |
| 116 | MusicDET: Zero-Shot AI-Generated Music Detection | 6.1 | Top 50% | #Audio deepfake detection |
| 117 | Convex Low-resource Accent-Robust Language Detection in | 6.0 | Top 50% | #Speech recognition |
| 118 | NeuroCLUS: A Foundation Model with Functional Clusterin | 6.0 | Top 50% | #Speech recognition |
| 119 | Sparse Tokens Suffice: Jailbreaking Audio Language Mode | 5.9 | Top 50% | #Model pruning |
| 120 | Scaling Behavior in Model Fine-tuning for Audio DeepFak | 5.9 | Top 50% | #Audio deepfake detection |
| 121 | Bioacoustic Geolocation: Species Sounds as Geographic S | 5.8 | Top 50% | #Audio understanding |
| 122 | AudioChat: Unified Audio Storytelling, Editing, and Und | 5.8 | Top 50% | #Audio generation |
| 123 | Omni-Diffusion: Unified Multimodal Understanding and Ge | 5.8 | Top 50% | - |
| 124 | Robust Signal Enhancement via Fractional Detail Views a | 5.7 | Top 50% | #Speech enhancement |
| 125 | Multimodal Fusion via Self-Consistent Task-Gradient Fie | 5.5 | Top 50% | #Robustness |
| 126 | NAACA: Training-Free NeuroAuditory Attentive Cognitive | 5.5 | Top 50% | #Audio event detection |
| 127 | Language Model Augmented Semi-Supervised Statistical In | 5.4 | Bottom 50% | #Speech attribute recognition |
| 128 | MER-DG: Modality-Entropy Regularization for Multimodal | 5.4 | Bottom 50% | #Audio-visual understanding |
| 129 | Towards Understanding Modality Interaction in Multimoda | 5.3 | Bottom 50% | #Audio-visual understanding |
| 130 | Stable Spectral Copula Alignment for Robust Multimodal | 5.2 | Bottom 50% | #Robustness |
| 131 | Multimodal Meta-Verifier with Explicit Structured Recal | 5.2 | Bottom 50% | #Multimodal models |
| 132 | WaveSSM: Multiscale State-Space Models for Non-stationa | 4.8 | Bottom 50% | #Audio classification |
| 133 | Efficient, Property-Aligned Fan-Out Retrieval via RL-Co | 4.7 | Bottom 50% | #Music retrieval |
| 134 | VIBE: Disentangling Social Dynamics via Kinematics-Info | 4.6 | Bottom 50% | - |
| 135 | UniFLoW: Universal Multi-Modal Federated LoRA Fine-Tuni | 4.4 | Bottom 50% | #Audio-visual question answering |
| 136 | Rethinking Attention in Spiking Transformers: Overcomin | 3.6 | Bottom 50% | #Audio classification |
| 137 | PRIM: Cooperative Dynamic Token Compression for Efficien | 3.6 | Bottom 50% | #Audio-visual understanding |

📋 Paper list
🥇 TimeChat-Captioner: Scripting Multi-Scene Videos with Time-Aware and Structural Audio-Visual Captions 🔥
9.4/10 | Top 10% | #Audio-visual understanding | Novelty 1.7/2 | Rigor 1.3/1.5 | Experiments 1.4/1.5 | Impact 0.9/1.5 | Open source 1.5/1.5 ...

Speech/Music/Audio Paper Digest 2026-07-03

31 papers analyzed in total.

⚡ Today's overview
📥 31 papers fetched → 🔬 In-depth analysis complete

🏷️ Popular topics
| Topic | Count | Distribution |
| #Audio classification | 4 | ████ |
| #Sound source localization | 4 | ████ |
| #Speech recognition | 4 | ████ |
| #Voice interaction | 3 | ███ |
| #Speech synthesis | 3 | ███ |
| #Audio-visual understanding | 2 | ██ |
| #Speech enhancement | 2 | ██ |
| #Music understanding | 1 | █ |

📊 Paper score ranking (31 papers, sorted by score in descending order)
| Rank | Paper | Total score | Tier | Main task |
| 🥇 | Unlocking Speech-Text Compositional Powers: Instruction | 8.5 | Top 25% | #Voice interaction |
| 🥈 | Decomposer: Learning to Decompile Symbolic Music to Pro | 8.4 | Top 25% | #Music understanding |
| 🥉 | A global predicted-fMRI drive signal from TRIBE does no | 7.7 | Top 25% | #Audio-visual understanding |
| 4 | Cross Domain Few-Shot Class-Incremental Audio Classific | 7.4 | Top 50% | #Audio classification |
| 5 | Self-Supervised Test-Time Tuning for Packet Loss Concea | 7.4 | Top 50% | #Audio restoration |
| 6 | Reasoning LLM Improves Speaker Recognition in Long-form | 7.2 | Top 50% | #Audio-visual understanding |
| 7 | SelectTSL: Prompt-Guided Selective Target Sound Localiz | 7.1 | Top 50% | #Sound source localization |
| 8 | Enhancing Acoustic-to-Articulatory Inversion with Multi | 7.0 | Top 50% | #Voice interaction |
| 9 | TurnNat: Automatic Evaluation of Turn-Taking Naturalnes | 7.0 | Top 50% | #Voice interaction |
| 10 | Audio-Based Understanding of Audiobook Narration Appeal | 6.9 | Top 50% | #Speech attribute recognition |
| 11 | H-SAGE: Holistic Speaker-Aware Guided Experts for MoE-b | 6.9 | Top 50% | #Speech recognition |
| 12 | An Efficient vLLM-Based Inference Pipeline for Unified | 6.8 | Top 50% | #Speech synthesis |
| 13 | Few-Shot Open-Set Audio Classification Using Attention | 6.8 | Top 50% | #Audio classification |
| 14 | Beyond Words: Towards Effective Modeling of Non-Verbal | 6.4 | Top 50% | #Speech recognition |
| 15 | LMPAN: A Lightweight Multi-Path Alignment Network for J | 6.2 | Top 50% | #Speech enhancement |
| 16 | NAVER LABS Europe Submission to the Instruction-followi | 6.2 | Top 50% | #Speech translation |
| 17 | Pmeta-TLA: Backdoor Attacks for Speech Classification M | 6.0 | Top 50% | #Wake-word detection |
| 18 | Neural Audio Codec with Adjustable Token Temporal Resol | 5.8 | Top 50% | - |
| 19 | SPARCLE: SPeaker-aware Aligned Representations via Cont | 5.8 | Top 50% | #Speech synthesis |
| 20 | Speaker head orientation estimation with a single micro | 5.8 | Top 50% | #Sound source localization |
| 21 | Towards a Phonology-Informed Evaluation of Multilingual | 5.7 | Top 50% | #Speech quality assessment |
| 22 | Rethinking Speech-LLM Integration for ASR: Effective Jo | 5.6 | Top 50% | #Speech recognition |
| 23 | RT-Tango: Real-Time Distributed Binaural Speech Enhance | 5.5 | Top 50% | #Speech enhancement |
| 24 | Quantifying the Uncertainty of Blindly Estimated Room E | 5.2 | Bottom 50% | #Audio retrieval |
| 25 | CNN Models for Microphone Array Covariance Matrix Upsam | 5.0 | Bottom 50% | #Sound source localization |
| 26 | A Multi-Branch Hierarchy-Aware Framework for Heterogene | 4.9 | Bottom 50% | #Audio classification |
| 27 | From Monolingual to Multilingual: Evaluating Mamba for | 4.8 | Bottom 50% | #Speech recognition |
| 28 | DRL-CLBA: A Clean Label Backdoor Attack for Speech Clas | 4.7 | Bottom 50% | #Audio classification |
| 29 | Spatial Speech Perception Systems: A Survey of Sound So | 4.1 | Bottom 50% | #Sound source localization |
| 30 | UT-AISTimprt submission for ICME 2026 Grand Challenge o | 4.1 | Bottom 50% | #Music generation |
| 31 | Using embeddings to predict spoken word duration and pi | 4.0 | Bottom 50% | #Speech synthesis |

📋 Paper list
🥇 Unlocking Speech-Text Compositional Powers: Instruction-Following Speech Language Models without Instruction Tuning
8.5/10 | Novelty 1.6/2 | Rigor 1.3/1.5 | Experiments 1.3/1.5 | Clarity 0.8/1 | Impact 1.2/1.5 | Open source 1.1/1.5 | Reproducibility 0.4/0.5 | Engineering 0.8/1.5 ...

Speech/Music/Audio Paper Digest 2026-07-02

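16 papers analyzed in total.

⚡ Today's overview
📥 16 papers fetched → 🔬 In-depth analysis complete

🏷️ Popular topics
| Topic | Count | Distribution |
| #Audio understanding | 3 | ███ |
| #Speaker verification | 2 | ██ |
| #Speech synthesis | 2 | ██ |
| #Speech recognition | 1 | █ |
| #Audio-visual understanding | 1 | █ |
| #Speech enhancement | 1 | █ |
| #Speech emotion recognition | 1 | █ |
| #Music generation | 1 | █ |

📊 Paper score ranking (16 papers, sorted by score in descending order)
| Rank | Paper | Total score | Tier | Main task |
| 🥇 | NPUsper: Eliminating Redundant Computation for Real-Tim | 9.0 | Top 10% | #Speech recognition |
| 🥈 | AV-SyncBench: Decoupled Benchmarking of Temporal and Se | 8.5 | Top 25% | #Audio-visual understanding |
| 🥉 | ORCA: Open-ended Response Correctness Assessment for Au | 7.9 | Top 25% | #Audio understanding |
| 4 | AmbiDrop: Ambisonics-Based Array-Agnostic Neural Speech | 7.5 | Top 25% | #Speech enhancement |
| 5 | From Objectives to Applications: Aligning Architectural | 7.5 | Top 25% | #Audio understanding |
| 6 | Positive-Incentive Noise Predictor for Adversarial Puri | 7.4 | Top 50% | #Speaker verification |
| 7 | Automatic Detection of Stress from Speech in the Trier | 7.4 | Top 50% | #Speech emotion recognition |
| 8 | Enhancing Flow Matching with A Unified Guidance Framewo | 7.1 | Top 50% | #Speech synthesis |
| 9 | MG-RWKV: Multi-Grained Context-Aware RWKV for Temporal | 6.9 | Top 50% | - |
| 10 | A Text-Steerable Instrument for Sketching Procedural So | 6.8 | Top 50% | #Music generation |
| 11 | A Geometric Perspective on Composable Emotion Steering | 6.6 | Top 50% | #Speech synthesis |
| 12 | Do Multimodal Large Language Models Need Reasoning to C | 6.5 | Top 50% | #Speech attribute recognition |
| 13 | Evaluating Pretrained Music Embeddings for Cross-Perfor | 5.8 | Top 50% | #Music retrieval |
| 14 | Disentangling Speaker and Language Effects in Cross-Lin | 5.6 | Top 50% | #Speaker verification |
| 15 | Adaptive Perturbation Selection for Contrastive Audio D | 5.3 | Bottom 50% | #Audio understanding |
| 16 | Speech Playground: An Interactive Tool for Speech Analy | 4.1 | Bottom 50% | - |

📋 Paper list
🥇 NPUsper: Eliminating Redundant Computation for Real-Time Whisper on Mobile NPUs
9.0/10 | Novelty 1.4/2 | Rigor 1.4/1.5 | Experiments 1.0/1.5 | Clarity 0.8/1 | Impact 1.2/1.5 | Open source 1.2/1.5 | Reproducibility 0.5/0.5 | Engineering 1.5/1.5 ...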

Speech/Music/Audio Paper Digest 2026-07-01

35 papers analyzed in total.

⚡ Today's overview
📥 35 papers fetched → 🔬 In-depth analysis complete

🏷️ Popular topics
| Topic | Count | Distribution |
| #Speech recognition | 8 | ████████ |
| #Speech synthesis | 7 | ███████ |
| #Self-supervised learning | 2 | ██ |
| #Audio classification | 2 | ██ |
| #Generative models | 2 | ██ |
| #Speech emotion recognition | 2 | ██ |
| #Datasets | 1 | █ |
| #Knowledge distillation | 1 | █ |

📊 Paper score ranking (35 papers, sorted by score in descending order)
| Rank | Paper | Total score | Tier | Main task |
| 🥇 | Dilemmadata: On the Interoperability of Heterogeneous R | 10.0 | Top 50% | #Datasets |
| 🥈 | SwiftAudio: Data-Efficient Caption-Only Distillation fo | 10.0 | Top 50% | #Knowledge distillation |
| 🥉 | Attacking UTMOS: Probing the Robustness of a Speech Qua | 8.6 | Top 25% | #Speech quality assessment |
| 4 | Enhancing BEST-RQ Pseudo-Label Quality through Online R | 8.6 | Top 50% | #Speech recognition |
| 5 | Linguistic Bias Mitigation for Spoofing Detection via G | 8.6 | Top 25% | #Self-supervised learning |
| 6 | Building an ASR Solution for Training and Assessing Chi | 8.5 | Top 50% | #Speech recognition |
| 7 | Beyond Cross-Reconstruction: Probing-Based Disentanglem | 8.1 | Top 50% | #Speech coding |
| 8 | MuseBench: Benchmarking Intent-Level Audiovisual Arts U | 7.9 | Top 50% | #Speech synthesis |
| 9 | Detecting Audio Deepfakes on the Edge: Lightweight SSL-B | 7.7 | Top 25% | - |
| 10 | Beyond Binary Instrument QA: Probing Instrument Groundi | 7.6 | Top 25% | #Audio classification |
| 11 | SyncCache: Exploiting Asymmetric Dynamics for Fast Audi | 7.5 | Top 25% | #Speech synthesis |
| 12 | Probing-Guided Layer Selection from Self-Supervised Spe | 7.5 | Top 25% | #Ensemble learning |
| 13 | A First Exploration of Neuromorphic OT-CFM for Multi-Sp | 7.5 | Top 25% | #Generative models |
| 14 | LuxEmo: Expressive Text-to-Speech Corpus for Luxembourg | 7.5 | Top 25% | #Speech synthesis |
| 15 | A Fair and Transparent Framework for Speech-Based Depre | 7.4 | Top 50% | #Speech emotion recognition |
| 16 | ALM2Vec: Learning Audio Embeddings for Universal Audio | 7.4 | Top 50% | #Audio retrieval |
| 17 | ASR-Agnostic Multimodal Spectrotemporal Modeling for Ea | 7.4 | Top 50% | #Multimodal models |
| 18 | UniSAE: Unified Speech Attribute Editing on Speaker, Em | 7.3 | Top 50% | #Speech synthesis |
| 19 | Tone-Conditioned Curriculum Learning for Low-Resource B | 7.3 | Top 50% | #Speech recognition |
| 20 | What Counts as an Error? Dual-Reference Benchmarking fo | 7.3 | Top 50% | #Speech recognition |
| 21 | Is Natural Always Appropriate? Investigating Naturalnes | 7.2 | Top 25% | #Speech synthesis |
| 22 | FlexiSLM: A Dynamic and Controllable Frame Rate Spoken | 7.2 | Top 25% | #Speech synthesis |
| 23 | ZEBRA: Zero-Shot Entropy-Regularized Prompt Learning fo | 7.1 | Top 50% | #Audio classification |
| 24 | Preserving Speech-to-Text LLM Capabilities in Speech-to | 7.0 | Top 50% | #Speech recognition |
| 25 | Listening Between the Lines: Joint Learning of ASR Embe | 7.0 | Top 50% | #Data augmentation |
| 26 | BEST-RQ-2: Contextualize-Then-Predict, a Two-Step Appro | 6.9 | Top 50% | #Speech recognition |
| 27 | Improving multichannel speech enhancement through accur | 6.8 | Top 50% | #Speech enhancement |
| 28 | Amplifying Membership Signal Through Chained Regenerati | 6.6 | Top 50% | #Generative models |
| 29 | AVTok: 1D Unified Tokenization for Holistic Audio-Video | 6.5 | Top 25% | #Speech synthesis |
| 30 | LOPA: Enhancing Spoken Language Assessment via Latent O | 6.2 | Top 50% | #Low-resource |
| 31 | Adapting Foundation ASR Models to Dysarthric Speech: A | 6.2 | Top 50% | #Speech recognition |
| 32 | How Bilingual Are SSL Speech Models? Cross-Lingual Prob | 5.8 | Top 50% | #Self-supervised learning |
| 33 | Gated Multi-Graph Fusion via Graph Attention Networks f | 5.2 | Bottom 50% | #Speech emotion recognition |
| 34 | Building a Multimodal Dataset of Academic Paper for Key | 5.2 | Bottom 50% | #Speech recognition |
| 35 | Reference-Based Prosody and Rhythm Evaluation for Spoke | 4.7 | Bottom 50% | #Spoken dialogue systems |

📋 Paper list
🥇 Dilemmadata: On the Interoperability of Heterogeneous Roman Numeral Datasets
10.0/10 | Novelty 2/2 | Rigor 1.5/1.5 | Experiments 1.5/1.5 | Clarity 1/1 | Impact 1.5/1.5 | Open source 1.5/1.5 | Reproducibility 0.5/0.5 | Engineering 1.5/1.5 ...