语音/音频论文速递 2026-04-19
语音/音频论文速递 2026-04-19 共分析 42 篇论文 ⚡ 今日概览 📥 抓取 42 篇 → 🔬 深度分析完成 🏷️ 热门方向 方向 数量 分布 #音频理解 12篇 ████████████ #基准测试 10篇 ██████████ #音频大模型 9篇 █████████ #多模态模型 7篇 ███████ #信号处理 6篇 ██████ #强化学习 6篇 ██████ #自监督学习 6篇 ██████ #大语言模型 5篇 █████ 📊 论文评分排行榜(42 篇,按分数降序) 排名 论文 评分 🥇 ControlFoley: Unified and Controllable Video-to-Audio G 9.2分 🥈 ClariCodec: Optimising Neural Speech Codes for 200bps C 9.0分 🥉 X-VC: Zero-shot Streaming Voice Conversion in Codec Spa 9.0分 4 Why Your Tokenizer Fails in Information Fusion: A Timin 9.0分 5 Hijacking Large Audio-Language Models via Context-Agnos 8.8分 6 UniPASE: A Generative Model for Universal Speech Enhanc 8.5分 7 VoxSafeBench: Not Just What Is Said, but Who, How, and 8.5分 8 Who is Speaking or Who is Depressed? A Controlled Study 8.5分 9 ProSDD: Learning Prosodic Representations for Speech De 8.5分 10 MoshiRAG: Asynchronous Knowledge Retrieval for Full-Dup 8.5分 11 Four Decades of Digital Waveguides 8.5分 12 Audio-Cogito: Towards Deep Audio Reasoning in Large Aud 8.5分 13 An Ultra-Low Latency, End-to-End Streaming Speech Synth 8.5分 14 Listen, Pause, and Reason: Toward Perception-Grounded H 8.5分 15 Geo2Sound: A Scalable Geo-Aligned Framework for Soundsc 8.5分 16 SpotSound: Enhancing Large Audio-Language Models with F 8.5分 17 Beyond Transcription: Unified Audio Schema for Percepti 8.5分 18 CoSyncDiT: Cognitive Synchronous Diffusion Transformer 8.5分 19 Diffusion Language Models for Speech Recognition 8.5分 20 WavAlign: Enhancing Intelligence and Expressiveness in 8.5分 21 AVID: A Benchmark for Omni-Modal Audio-Visual Inconsist 8.5分 22 SpeakerRPL v2: Robust Open-set Speaker Identification t 8.3分 23 Towards Fine-grained Temporal Perception: Post-Training 8.3分 24 Room compensation for loudspeaker reproduction using a 8.2分 25 StreamMark: A Deep Learning-Based Semi-Fragile Audio Wa 8.2分 26 From Reactive to Proactive: Assessing the Proactivity o 8.2分 27 Elastic Net Regularization and Gabor Dictionary for Cla 8.2分 28 Sky-Ear: An Unmanned Aerial Vehicle-Enabled Victim Soun 8.0分 29 Contextual Biasing for ASR in Speech LLM with Common Wo 8.0分 30 A Manual Bar-by-Bar Tempo Measurement Protocol for Poly 7.8分 31 Classical Machine Learning Baselines for Deepfake Audio 7.8分 32 Adaptive Test-Time Scaling for Zero-Shot Respiratory Au 7.8分 33 Dual-Axis Generative Reward Model Toward Semantic and T 7.8分 34 Tora3: Trajectory-Guided Audio-Video Generation with Ph 7.8分 35 Few-Shot and Pseudo-Label Guided Speech Quality Evaluat 7.5分 36 VoxEffects: A Speech-Oriented Audio Effects Dataset and 7.5分 37 TokenSE: a Mamba-based discrete token speech enhancemen 7.5分 38 Audio Source Separation in Reverberant Environments usi 7.5分 39 On the Distillation Loss Functions of Speech VAE for Un 7.5分 40 Listening Deepfake Detection: A New Perspective Beyond 7.5分 41 Comparison of window shapes and lengths in short-time f 6.5分 42 Transformer Based Machine Fault Detection From Audio In 6.5分 📋 论文列表 🥇 ControlFoley: Unified and Controllable Video-to-Audio Generation with Cross-Modal Conflict Handling 🔥 9.2分 | #音频生成 #多模态模型 #扩散模型 #基准测试 | arxiv ...