Speech/Music/Audio Paper Digest

Speech/Music/Audio Paper Digest 2026-06-16

62 papers analyzed

⚡ Today's overview
📥 62 papers fetched → 🔬 in-depth analysis complete

🏷️ Popular topics
Topic | Count | Distribution
#speech-recognition | 9 | █████████
#speech-synthesis | 6 | ██████
#multimodal-models | 5 | █████
#self-supervised-learning | 4 | ████
#audio-generation | 3 | ███
#generative-models | 2 | ██
#speech-generation | 2 | ██
#music-information-retrieval | 2 | ██

📊 Paper score leaderboard (62 papers, sorted by score, descending)
Rank | Paper | Score | Tier | Main task
🥇 | TuneJury: An Open Metric for Improving Music Generation | 9.7 | top 25% | #multimodal-models
🥈 | Acoustic, VOC, and Multimodal Stress Source Localizatio | 9.7 | top 50% | #sound-source-localization
🥉 | VoxWatermark: A Large-Scale Benchmark for Audio Waterma | 9.4 | top 50% | #robustness
4 | Phonetically Explainable Speech Deepfake Detection | 9.0 | top 50% | #speech-spoofing-detection
5 | FreeSonic: Training-Free Temporal-Aware Decoupled Atten | 9.0 | top 25% | #audio-generation
6 | MambAdapter: Lightweight Mamba-Based Adapters for Param | 8.9 | top 25% | #speech-recognition
7 | XAI-Grounded Explanation Generation for Speech Deepfake | 8.9 | top 25% | #multimodal-models
8 | Unified Audio Generation and Editing via Joint Conditio | 8.7 | top 25% | #audio-generation
9 | AdaTT: Text-Guided Instrument Timbre Transfer with Targ | 8.7 | top 25% | #audio-generation
10 | DuraMark: Duration-Embedded Watermarking in LLM-based T | 8.7 | top 25% | #generative-models
11 | When the Same Musical Knowledge Forgets Differently: A | 8.6 | top 10% | -
12 | Probing Low Frame Rate Degradation in Neural Audio Code | 8.6 | top 25% | #speech-generation
13 | Rhythm of the Deep: A Computational-Linguistic Test of | 8.5 | top 25% | #self-supervised-learning
14 | Beyond Artifacts: Towards Generalizable Synthetic Song | 8.4 | top 25% | #music-information-retrieval
15 | Acoustic Prompting via Stage-wise Modulation for Few-Sh | 8.3 | top 50% | #audio-classification
16 | ArtNet: A JEPA-Like Articulatory Predictive Framework f | 8.3 | top 50% | #speech-recognition
17 | MatchLM2Lite: A Scalable MLLM-to-Lite Framework for Rep | 8.3 | top 25% | #audio-classification
18 | Bridging the SEA Gap: An Initial Benchmark for Neural A | 8.2 | top 25% | #speech-synthesis
19 | An Empirical Study on Learning Latent Representations f | 8.2 | bottom 50% | #speech-synthesis
20 | From Physics to Representation: Audio Learning with Syn | 8.2 | top 25% | #self-supervised-learning
21 | An Asymmetric Formula for Interval Consonance and its R | 8.0 | top 25% | #music-information-retrieval
22 | Universal adaptive beamforming: A Bayesian approach | 8.0 | top 50% | #adaptive-filtering
23 | Learning Input-Channel Permutation Equivariance for Mul | 7.9 | top 50% | #music-source-separation
24 | Stabilizing Short Duration Speaker Verification through | 7.9 | top 50% | #speaker-verification
25 | AUDEDIT: Inversion-Free Text-Guided Editing with Pretra | 7.8 | top 25% | #generative-models
26 | Interpretable and Frugal Learning Systems Employing Mul | 7.8 | top 25% | -
27 | MuVAP: Multimodal Multiparty Voice Activity Projection | 7.8 | top 25% | #spoken-dialogue-systems
28 | Dynamic Prosody Prediction in LLM-based TTS for Improvi | 7.6 | top 25% | #speech-synthesis
29 | Scaling Human and G2P Supervision for Robust Phonetic T | 7.6 | top 25% | #speech-recognition
30 | SPRI: SVD-Partitioned Residual Initialization for Data- | 7.6 | top 25% | #speech-translation
31 | CraBERT: Efficient Phoneme Encoder Pre-Training via Cas | 7.5 | top 50% | #speech-synthesis
32 | Pixel-TTS: Image based Text Rendering for Robust Text-t | 7.5 | top 50% | #speech-synthesis
33 | AP-GRPO: Anchor-Gated Phonetic Alignment with Policy Op | 7.4 | top 50% | #speech-recognition
34 | Spectro-Temporal Interference Confounds Phase Encoding | 7.4 | top 50% | #self-supervised-learning
35 | Teacher-Student Structure for Domain Adaptation in Ense | 7.4 | top 50% | #multimodal-models
36 | SciText2Eq: Assessing LLMs for Explainable Equation Gen | 7.3 | top 50% | #large-language-models
37 | Confidence Score Guided Incremental and Speaker Adaptiv | 7.2 | top 50% | #speech-recognition
38 | Geometrically Constrained Decentralized Independent Vec | 7.2 | top 50% | #speech-separation
39 | Dual-Granularity Orthogonal Disentanglement for General | 7.2 | top 50% | #curriculum-learning
40 | Data-Driven Decoding of Russell's Circumplex Model | 7.2 | top 50% | #speech-emotion-recognition
41 | Connecting Speech to Words through Images | 7.1 | top 50% | #unsupervised-learning
42 | Bridging the Usability Gap: Lessons from Interpreting S | 7.1 | top 50% | #speech-translation
43 | TMASC: Transmasculine Attitude and Speech Corpus | 7.0 | top 50% | -
44 | MUNI: Multimodal Unified Latent Diffusion for Coherent | 6.9 | top 50% | #speech-generation
45 | Decoding while Adapting: Zero-Shot Online Speaker Adapt | 6.8 | top 50% | #speech-recognition
46 | Joycent: Diffusion-based Accent TTS without Accented Ph | 6.8 | top 50% | #speech-synthesis
47 | Semi-Supervised Speech Confidence Detection using Pseud | 6.8 | top 50% | -
48 | Robust Spoofed Speech Detection via Temporal Pyramid Mo | 6.7 | top 50% | #audio-deepfake-detection
49 | From Awareness to Adherence: Bridging the Context Gap i | 6.7 | top 50% | #speech-recognition
50 | ArtBoost: Synthetic Articulatory Data Augmentation for | 6.5 | top 50% | #speech-recognition
51 | DDPO-VC: Speaker De-Identification via Diffusion Denois | 6.5 | top 50% | #voice-conversion
52 | NVMOS: Non-Verbal Vocalization Quality Assessment in Sp | 6.2 | top 50% | #self-supervised-learning
53 | Unifying Acoustic Features and Text with Multimodal LLM | 6.2 | top 50% | #multimodal-models
54 | ROMPAR: Morphological Completion and Demographic Unlear | 6.2 | top 50% | #speech-recognition
55 | EChO-Agent: Evidence Chain Orchestration Agent for Audi | 6.1 | top 50% | #audio-question-answering
56 | Beyond Classification: A Cough Regression Benchmark for | 6.0 | top 50% | #audio-event-detection
57 | Towards Robust Generative Speech Enhancement Using Vect | 5.9 | top 50% | #speech-enhancement
58 | Fast When, Careful Who: Dual-Process Multiparty Turn-Ta | 5.9 | top 50% | #voice-activity-detection
59 | MAF: Multimodal Adaptive Few-shot Prompting for Sentime | 5.9 | top 50% | #multimodal-models
60 | An auscultation location specific study on the relation | 5.8 | top 50% | -
61 | Closed-Loop Triplet Synergistic Generation for Long-For | 5.5 | top 50% | -
62 | LLM-Based Synthetic Ground Truth Generation for Audio-B | 5.3 | bottom 50% | #data-augmentation

📋 Paper list
🥇 TuneJury: An Open Metric for Improving Music Generation Preference Alignment
9.7/10 | innovation 1.5/2 | rigor 1.3/1.5 | experiments 1.4/1.5 | clarity 1.0/1 | impact 1.5/1.5 | open source 1.5/1.5 | reproducibility 0.5/0.5 | engineering 1.0/1.5
...
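The listing does not say how a paper's total score is derived from its eight sub-scores. The numbers shown are consistent with a plain sum that is capped at 10, and the short Python sketch below checks that reading against two entries. The cap, the function name `total_score`, and the English dictionary keys are assumptions introduced here for illustration, not something the digest documents.

```python
# Sub-scores copied from the listing for two top-ranked papers.
# The keys are illustrative English labels, not identifiers used by the digest.
TUNEJURY = {  # 2026-06-16, listed total 9.7
    "innovation": 1.5, "rigor": 1.3, "experiments": 1.4, "clarity": 1.0,
    "impact": 1.5, "open_source": 1.5, "reproducibility": 0.5, "engineering": 1.0,
}
SPEECHLLM_L2 = {  # 2026-06-09, listed total 10.0
    "innovation": 2.0, "rigor": 1.5, "experiments": 1.5, "clarity": 1.0,
    "impact": 1.5, "open_source": 1.0, "reproducibility": 0.5, "engineering": 1.5,
}


def total_score(sub_scores, cap=10.0):
    """Sum the sub-scores, round to one decimal, and cap the result.

    The cap is an assumption inferred from the listed totals.
    """
    return min(cap, round(sum(sub_scores.values()), 1))


print(total_score(TUNEJURY))      # plain sum of the eight sub-scores
print(total_score(SPEECHLLM_L2))  # raw sum exceeds 10, so the cap applies
```

The sub-score maxima add up to 11 rather than 10, which is why a cap is needed for the second entry under this reading.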

Speech/Music/Audio Paper Digest 2026-06-15

26 papers analyzed

⚡ Today's overview
📥 26 papers fetched → 🔬 in-depth analysis complete

🏷️ Popular topics
Topic | Count | Distribution
#speech-recognition | 4 | ████
#speech-synthesis | 4 | ████
#speaker-recognition | 3 | ███
#data-augmentation | 2 | ██
#audio-question-answering | 2 | ██
#speech-enhancement | 1 | █
#music-information-retrieval | 1 | █
#reinforcement-learning | 1 | █

📊 Paper score leaderboard (26 papers, sorted by score, descending)
Rank | Paper | Score | Tier | Main task
🥇 | Listening with Attention: Entropy-Guided Explainability | 9.6 | top 25% | #speech-recognition
🥈 | MaskedFOP: Polyglot Speaker Identification under Missin | 9.2 | top 25% | #speaker-recognition
🥉 | HIDVAS: A Hearing Instrument Dataset in Various Acousti | 9.0 | top 25% | #speech-enhancement
4 | BayLing-Duplex: Native Full-Duplex Speech Dialogue with | 9.0 | top 10% | #speech-synthesis
5 | Moonlight in Latent Space: Chirality and Structural Cor | 8.7 | top 50% | #music-information-retrieval
6 | Who Spoke When in Multi-Conversation: Target Speaker Ta | 8.6 | top 50% | #speaker-recognition
7 | Learning to Hear Hesitation: Continual Learning for Dis | 8.3 | top 25% | #speech-recognition
8 | The Holistic Storage of Verb+Up Phrases in Text-based a | 8.2 | top 50% | #speech-recognition
9 | OmniVideo-100K: A Dataset for Audio-Visual Reasoning th | 8.2 | top 50% | #data-augmentation
10 | Orchestra-o1: Omnimodal Agent Orchestration | 8.1 | top 50% | #reinforcement-learning
11 | Unsupervised Approaches for Global Prosodic Embedding E | 7.8 | top 25% | #speech-synthesis
12 | Instantaneous Pitch Estimation via Wave-U-Net-Based Fun | 7.7 | top 25% | #data-augmentation
13 | A Deep Zero-Inflated Model of North Atlantic Right Whal | 7.6 | top 50% | #probabilistic-graphical-models
14 | FAConformer: Frequency-Aware Convolutional Transformer | 7.5 | top 25% | #Transformer
15 | From Self-Supervised Speech Models to Mixture-of-Expert | 7.5 | top 50% | #self-supervised-learning
16 | The Perceived Fragility of Explanations in Audio Models | 7.5 | top 25% | -
17 | A Multi-Domain Feature Fusion Framework for Generalizab | 7.4 | top 50% | #multimodal-models
18 | AudioDER: A Deduplication-Enhanced Reasoning Dataset fo | 7.3 | top 50% | #audio-question-answering
19 | Beyond task performance: Decoding bioacoustic embedding | 7.1 | top 50% | -
20 | Explainable and Trustworthy Speech Emotion Recognition | 7.0 | top 50% | #speech-emotion-recognition
21 | FoleyGenEx: Unified Video-to-Audio Generation with Mult | 7.0 | top 50% | #speech-synthesis
22 | Spatio-Temporal Audio Language Modeling for Dynamic Sou | 6.9 | top 25% | #audio-question-answering
23 | Mask, Sample, Revise: A Revisable CTMC Inference Stack | 6.8 | top 25% | #speech-synthesis
24 | MoDiCoL: A Modular Diagnostic Continual Learning Datase | 6.5 | top 50% | #speech-recognition
25 | Multimodal Speaker Identification in Classroom Environm | 6.0 | top 50% | #speaker-recognition
26 | Efficiency-Performance Trade-offs in Neural Speaker Dia | 5.1 | bottom 50% | #speaker-diarization

📋 Paper list
🥇 Listening with Attention: Entropy-Guided Explainability for Transformer-Based Audio Models
9.6/10 | innovation 1.5/2 | rigor 1.4/1.5 | experiments 1.5/1.5 | clarity 1/1 | impact 1.5/1.5 | open source 1.0/1.5 | reproducibility 0.5/0.5 | engineering 1.2/1.5
...

Speech/Music/Audio Paper Digest 2026-06-12

27 papers analyzed

⚡ Today's overview
📥 27 papers fetched → 🔬 in-depth analysis complete

🏷️ Popular topics
Topic | Count | Distribution
#speech-synthesis | 6 | ██████
#speech-recognition | 4 | ████
#audio-classification | 2 | ██
#speech-translation | 2 | ██
#speech-enhancement | 2 | ██
#audio-generation | 1 | █
#multimodal-models | 1 | █
#speaker-recognition | 1 | █

📊 Paper score leaderboard (27 papers, sorted by score, descending)
Rank | Paper | Score | Tier | Main task
🥇 | Self-Guidance: Enhancing Neural Codecs via Decoder Mani | 9.7 | top 25% | #speech-synthesis
🥈 | Ontology Memory-Augmented ASR Correction for Long Text- | 9.6 | top 25% | #speech-recognition
🥉 | Emo-LiPO: Listwise Preference Optimization for Fine-Gra | 9.3 | top 50% | #speech-synthesis
4 | AudioX-Turbo: A Unified Framework for Efficient Anythin | 9.0 | top 10% | #audio-generation
5 | M*: A Modular, Extensible, Serving System for Multimoda | 8.9 | top 25% | #multimodal-models
6 | Decoding Insect Song: A Multitask Semisupervised Orthop | 8.7 | top 50% | #audio-classification
7 | Missing-Token Prompted Reliability-Aware Fusion for Rob | 8.6 | top 25% | #speaker-recognition
8 | Leveraging Audio-LLMs to Filter Speech-to-Speech Traini | 8.4 | top 25% | #speech-translation
9 | Endpoint Anticipation for Low-Latency Spoken Dialogue | 8.2 | top 25% | #multi-task-learning
10 | A Dual-Mode Faust-to-CLAP Compilation System | 8.1 | top 50% | -
11 | PRISM: Prosody-Integrated Multi-Agent Reasoning Framewo | 8.1 | top 25% | #speech-synthesis
12 | Positional Encoding in the Context of Memristor-Based A | 8.0 | top 50% | #speech-recognition
13 | From Tokens to Faces: Investigating Discrete Speech Rep | 7.9 | top 25% | #speech-synthesis
14 | Low-Latency Real-Time Audio Game Commentary System via | 7.9 | top 25% | #speech-synthesis
15 | MiniMax Sparse Attention | 7.7 | top 25% | #efficient-inference
16 | BASENet: Band-Adapted Speech Enhancement Network with C | 7.5 | top 50% | #speech-enhancement
17 | Dolph2Vec: Self-Supervised Representations of Dolphin V | 7.2 | top 50% | #audio-classification
18 | Balancing ASR and diarization in end-to-end LLMs for mu | 7.1 | top 50% | #speech-recognition
19 | NaturalFlow: Reducing Disruptive Pauses for Natural Spe | 7.0 | top 50% | #speech-translation
20 | Adaptive Turn-Taking for Real-time Multi-Party Voice Ag | 6.7 | bottom 50% | #data-augmentation
21 | Predicting Cognitive Load from Speech and Interaction D | 6.7 | top 50% | #speech-emotion-recognition
22 | PiDA: Phonetically-Informed Data Augmentation for Robus | 6.5 | top 50% | -
23 | Generating Training Targets for Real-World Speech Enhan | 6.4 | top 50% | #speech-enhancement
24 | Towards Personalized Federated Learning for Dysarthric | 6.2 | top 50% | #speech-recognition
25 | The Moving Drone: Negotiating Agency Between the Voice | 6.0 | top 50% | -
26 | Generative Modeling of Bach-Style Symbolic Music: A Com | 5.7 | top 50% | #music-generation
27 | Vocal Identity Under Siege by AI Voice Cloning Technolo | 3.2 | top 50% | #speech-synthesis

📋 Paper list
🥇 Self-Guidance: Enhancing Neural Codecs via Decoder Manifold Alignment
9.7/10 | innovation 1.6/2 | rigor 1.2/1.5 | experiments 1.5/1.5 | clarity 1/1 | impact 1.2/1.5 | open source 1.4/1.5 | reproducibility 0.5/0.5 | engineering 1.3/1.5
...

Speech/Music/Audio Paper Digest 2026-06-11

36 papers analyzed

⚡ Today's overview
📥 36 papers fetched → 🔬 in-depth analysis complete

🏷️ Popular topics
Topic | Count | Distribution
#speech-recognition | 7 | ███████
#speech-synthesis | 7 | ███████
#benchmarking | 2 | ██
#music-information-retrieval | 2 | ██
#speech-emotion-recognition | 2 | ██
#low-resource | 1 | █
#audio-question-answering | 1 | █
#audio-quality-assessment | 1 | █

📊 Paper score leaderboard (36 papers, sorted by score, descending)
Rank | Paper | Score | Tier | Main task
🥇 | Massive Open-Vocabulary Keyword Spotting | 9.8 | top 50% | #speech-recognition
🥈 | Tight Boundary Prediction in Speaker Diarization Using | 9.6 | top 25% | #low-resource
🥉 | RAIL: Rethinking Auditory Intelligence in Large Audio-L | 9.6 | top 10% | #audio-question-answering
4 | Quality Adaptive Angular Margin Learning for Respirator | 9.5 | top 50% | #audio-quality-assessment
5 | CS-YODAS: A Mined Dataset of In-the-Wild Code-Switched | 9.2 | top 50% | #multilingual
6 | Gumbel-BEARD: Automatic Layer Selection for Self-Superv | 9.1 | top 25% | #speech-recognition
7 | PianoKontext: Expressive Performance Rendering from Dea | 9.1 | top 50% | #music-generation
8 | Benchmarking Neural Speech Compression from a Rate-Dist | 9.0 | top 25% | #benchmarking
9 | Fast-SDE: Efficient Single-Microphone Sound Source Dist | 8.8 | top 50% | -
10 | Evaluating Bias in Phoneme-Based Automatic Speech Recog | 8.8 | top 50% | #speech-recognition
11 | Real-Time Language Model Jamming: A Case Study for Live | 8.7 | top 25% | #music-information-retrieval
12 | HALO: Half-Frame-Rate Adaptive Learnable Operator for L | 8.4 | top 50% | #speech-enhancement
13 | The Dynamics of Human and AI-Generated Language: How Se | 8.1 | top 25% | #speech-synthesis
14 | UR-BERT: Scaling Text Encoders for Massively Multilingu | 8.1 | top 25% | #speech-synthesis
15 | SARA: A Dual-Stream VAE for High-Fidelity Speech Genera | 7.9 | top 25% | #speech-synthesis
16 | SpAArSIST: Sparsified AASIST for Efficient and Reliable | 7.7 | top 50% | #model-compression
17 | Interpreting and Steering a Text-to-Speech Language Mod | 7.7 | top 25% | #speech-synthesis
18 | Which Speech Representation Better Matches Text-Native | 7.5 | top 50% | #speech-recognition
19 | MA-DLE: Speech-based Automatic Depression Level Estimat | 7.5 | top 25% | #speech-emotion-recognition
20 | The Hidden Cost of Pairwise Verification in Synthetic S | 7.5 | top 50% | #speech-synthesis
21 | Sensitivity Analysis of Generative Spatial Audio Metric | 7.2 | top 50% | #audio-generation
22 | Snapping Matters: Context-Aware Onset Refinement for Au | 7.1 | top 25% | #music-information-retrieval
23 | Feature-Aligned Speech Watermarking for Robustness to R | 7.1 | top 25% | #robustness
24 | Context-Aware Multimodal Claim Verification in Spoken D | 7.1 | top 50% | #multimodal-models
25 | Afrispeech Semantics: Evaluating Audio Semantic Reasoni | 7.0 | top 50% | #datasets
26 | Lung-SRAD: Spectral-Aware Regularized Audio DASS with D | 6.8 | top 50% | #contrastive-learning
27 | Lip Forcing: Few-Step Autoregressive Diffusion for Real | 6.8 | top 50% | #speech-synthesis
28 | Frozen Multimodal Embeddings for Personality and Cognit | 6.7 | top 50% | #speech-emotion-recognition
29 | Fast Speech Foundation Model Distillation Using Interle | 6.6 | top 50% | #knowledge-distillation
30 | Steering Where to Listen: Instruction-Based Activation | 6.5 | top 50% | -
31 | Pretrained self-supervised speech models can recognize | 6.5 | top 50% | #speech-recognition
32 | Towards Data-free and Training-free Compression for Spe | 6.4 | top 50% | #speech-recognition
33 | Additive Noise, Shift Recovery, and Signed Signals in t | 6.1 | top 50% | #signal-processing-fundamentals
34 | I Understand How You Feel: Enhancing Deeper Emotional S | 5.8 | top 50% | #speech-recognition
35 | Overcoming State Inertia in Full-Duplex Spoken Language | 5.5 | top 50% | #benchmarking
36 | BadRobot: Jailbreaking Embodied LLM Agents in the Physi | 5.2 | bottom 50% | #speech-synthesis

📋 Paper list
🥇 Massive Open-Vocabulary Keyword Spotting
9.8/10 | innovation 1.6/2 | rigor 1.5/1.5 | experiments 1.5/1.5 | clarity 1/1 | impact 0.7/1.5 | open source 1.5/1.5 | reproducibility 0.5/0.5 | engineering 1.5/1.5
...

Speech/Music/Audio Paper Digest 2026-06-10

45 papers analyzed

⚡ Today's overview
📥 45 papers fetched → 🔬 in-depth analysis complete

🏷️ Popular topics
Topic | Count | Distribution
#speech-recognition | 13 | █████████████
#data-augmentation | 3 | ███
#self-supervised-learning | 2 | ██
#speech-synthesis | 2 | ██
#multimodal-models | 1 | █
#spoken-dialogue-systems | 1 | █
#speech-generation | 1 | █
#parameter-efficient-fine-tuning | 1 | █

📊 Paper score leaderboard (45 papers, sorted by score, descending)
Rank | Paper | Score | Tier | Main task
🥇 | ViP-VL: Vietnamese Self-supervised Speech Pretraining M | 9.7 | top 25% | #speech-recognition
🥈 | Spatial-Omni: Spatial Audio Understanding Integration i | 9.4 | top 25% | #multimodal-models
🥉 | Multi-Faceted Interactivity Alignment in Full-Duplex Sp | 9.3 | top 25% | #spoken-dialogue-systems
4 | OmniCap-IF: Benchmarking and Improving Instruction Foll | 9.1 | top 25% | #speech-generation
5 | RAT: Reference-Augmented Training for ASV Anti-Spoofing | 8.8 | top 25% | #data-augmentation
6 | Recovering the Zipfian Distribution in Unsupervised Ter | 8.7 | top 50% | #self-supervised-learning
7 | LLM can Read Spectrogram: Encoder-free Speech-Language | 8.6 | top 25% | #speech-recognition
8 | ParaBridge: Bridging Paralinguistic Perception and Dial | 8.6 | top 25% | #parameter-efficient-fine-tuning
9 | Time-frequency localization of bird calls in dense soun | 8.5 | top 25% | #signal-processing-fundamentals
10 | Ethical and Technical Limits of Deepfake Speech Dataset | 8.4 | top 25% | -
11 | Speech Meets ELF: Audio Conditional Continuous-Target D | 8.3 | top 25% | #speech-recognition
12 | DeRA-MOS: Optimizing Text-to-Music Evaluation via Decou | 8.2 | top 25% | #music-evaluation
13 | Anchoring the Unknown: Open-Set Model Attribution via P | 8.0 | top 25% | #multilingual
14 | ANCHOR: Autoregressive Non-intrusive Chunk-Ordered Refi | 8.0 | top 25% | #speech-quality-assessment
15 | ContextCodec: Content-Focused Context Guidance for Ultr | 7.9 | top 25% | #speech-coding
16 | GlobeAudio: A Multilingual Multicultural Benchmark for | 7.9 | top 25% | #speech-recognition
17 | Dual-Branch Gated Fusion for Open-Set Audio Deepfake So | 7.8 | top 25% | #audio-deepfake-detection
18 | Data Journalist Agent: Transforming Data into Verifiabl | 7.7 | top 25% | -
19 | GC-LoRA: Gated Convolutional LoRA for Parameter-Efficie | 7.6 | top 25% | #speech-recognition
20 | What Do Deepfake Speech Detectors Actually Hear? | 7.6 | top 25% | -
21 | KFC-KWS: Keyframe Fusion with CTC for User-Defined Keyw | 7.6 | top 25% | #keyword-spotting
22 | Entropy-Aware Domain-Routed Mixture-of-Experts Speech-L | 7.5 | top 25% | #speech-recognition
23 | Linguistically Augmented Audio Speech Data (LinguAS) | 7.5 | bottom 50% | #speech-spoofing-detection
24 | AudioProcessBench: Benchmark for Identifying Process Er | 7.5 | top 50% | -
25 | Cross-Modal Knowledge Distillation without Paired Data: | 7.5 | top 50% | #speech-recognition
26 | AuRA: Internalizing Audio Understanding into LLMs as Lo | 7.5 | top 25% | #spoken-question-answering
27 | TRADE: Transducer-Augmented Decoder for Speech LLM | 7.4 | top 25% | #speech-recognition
28 | Inside the Latent Flow: Causal Deciphering of Attention | 7.3 | top 50% | #speech-separation
29 | Optimality of FSQ Tokens for Continuous Diffusion for C | 7.3 | top 50% | #speech-synthesis
30 | Speech Encoder Fusion for LLM-based Automatic Speech Re | 7.2 | bottom 50% | #speech-recognition
31 | Enhancing Multilingual LLM-based ASR with Mixture of Ex | 7.0 | top 50% | -
32 | Phoneme-First Prediction for LLM-Based Speech Recogniti | 6.9 | top 50% | #speech-recognition
33 | Profy: Interpretable Visualization of Expertise-Depende | 6.9 | top 50% | #music-information-retrieval
34 | Optimizing 2D Input Representations and Sub-phase Fusio | 6.8 | top 50% | #data-augmentation
35 | SSL-GMMVC: Interpretable Voice Conversion via Locally L | 6.8 | top 50% | #voice-conversion
36 | Deploying Speech-Driven 3D Facial Animation in Unreal E | 6.6 | top 50% | #speech-synthesis
37 | RespiraMFM: A Multimodal Foundation Model with Contrast | 6.5 | top 50% | #contrastive-learning
38 | From Senses to Decisions: The Information Flow of Audit | 6.5 | top 50% | #speech-recognition
39 | Speaker Group Encoding in Self-supervised Speech Recogn | 6.5 | top 50% | #speech-recognition
40 | Towards Robust Arabic Speech Emotion Recognition with D | 6.4 | top 50% | #speech-emotion-recognition
41 | Multilingual Word-Level Forced Alignment with Self-Supe | 6.3 | top 50% | #self-supervised-learning
42 | Overview of ESDD2: Environment-Aware Speech and Sound D | 6.3 | top 50% | #data-augmentation
43 | Towards Deep Contextual Reasoning from Broad Descriptio | 6.2 | top 50% | #speech-recognition
44 | A Lightweight Dual-Factor Acoustic Authentication Syste | 6.0 | top 50% | #speaker-verification
45 | Automated Pronunciation Evaluation for Korean Toddler S | 6.0 | top 50% | #speaker-diarization

📋 Paper list
🥇 ViP-VL: Vietnamese Self-supervised Speech Pretraining Model with Vector-Quantization Learning
9.7/10 | innovation 1.5/2 | rigor 1.3/1.5 | experiments 1.3/1.5 | clarity 1/1 | impact 1.1/1.5 | open source 1.5/1.5 | reproducibility 0.5/0.5 | engineering 1.5/1.5
...

Speech/Music/Audio Paper Digest 2026-06-09

48 papers analyzed

⚡ Today's overview
📥 48 papers fetched → 🔬 in-depth analysis complete

🏷️ Popular topics
Topic | Count | Distribution
#speech-synthesis | 10 | ██████████
#speech-recognition | 9 | █████████
#self-supervised-learning | 3 | ███
#multimodal-models | 3 | ███
#speech-enhancement | 2 | ██
#audio-generation | 2 | ██
#speaker-verification | 2 | ██
#large-language-models | 1 | █

📊 Paper score leaderboard (48 papers, sorted by score, descending)
Rank | Paper | Score | Tier | Main task
🥇 | A Finetuned SpeechLLM for Joint Multi-Granular L2 Asses | 10.0 | top 25% | #large-language-models
🥈 | G-MaP-SE: Guided Speech Enhancement via GMM-Based Prior | 9.3 | top 50% | #speech-enhancement
🥉 | HoliDubber: Holistic Video Dubbing for Complex Acoustic | 9.0 | top 10% | #speech-synthesis
4 | Probing Token Spaces under Generator Shift in AI-Genera | 9.0 | top 10% | #audio-coding
5 | A Comparative Study of Pre-trained Speech Encoders and | 8.9 | top 50% | #self-supervised-learning
6 | AVI-Bench: Toward Human-like Audio-Visual Intelligence | 8.8 | top 25% | #speech-recognition
7 | Liberating LLM Capabilities in Full-Duplex Speech Model | 8.7 | top 25% | #multimodal-models
8 | MeCo: One-Step MeanFlow-based Corrector for Multi-Chann | 8.4 | top 25% | #speech-separation
9 | Your U-Net Dereverberation Model is Secretly an RIR Enc | 8.3 | top 50% | #contrastive-learning
10 | Predictive Fixed-Filter Active Noise Control (PFANC) Us | 8.3 | top 25% | -
11 | TLDR: Compressing Audio Tokens for Efficient Autoregres | 8.2 | top 25% | #speech-synthesis
12 | Subtitle-Aligned Fine-Tuning of Whisper for Swiss Germa | 8.2 | top 25% | #speech-recognition
13 | Discovering Functionally Selective Brain Regions with a | 8.2 | top 25% | #multimodal-models
14 | Parameter-Efficient Continual Learning for Automatic Sp | 8.1 | top 25% | #speech-recognition
15 | OmniMem: Perturbation-aware Memory Compression for Stre | 8.0 | top 25% | #efficient-inference
16 | OpenBibleTTS: Large-Scale Speech Resources and TTS Mode | 8.0 | top 25% | #speech-synthesis
17 | FlashTTS: Fast Streaming TTS with MTP Acceleration and | 7.9 | top 25% | #speech-synthesis
18 | Multi-View Speech Representation Learning for Parkinson | 7.9 | top 50% | #self-supervised-learning
19 | Is Text All You Need? Text as a Universal Information B | 7.6 | top 50% | #speech-recognition
20 | End-to-End Training for Discrete Token LLM based TTS Sy | 7.6 | top 50% | #speech-synthesis
21 | Conan-embedding-v3: Fusing Modality-Specific Models for | 7.6 | top 25% | #audio-retrieval
22 | Cross-Modal Masking for Robust Silent Speech Synthesis | 7.5 | top 50% | #speech-synthesis
23 | Rethinking Depth: A study of the Recursive-Transformer | 7.5 | top 25% | #speech-recognition
24 | What Makes Synthetic Speech Sound Sarcastic? A Prosody- | 7.5 | top 25% | #speech-synthesis
25 | FXplorer: A Map-Based Interface for Exploratory Audio E | 7.5 | top 25% | #audio-generation
26 | Assessing the Energy and Carbon Emissions of Neural Spe | 7.4 | top 50% | #speaker-verification
27 | Exploring the Scale and Diversity of Speech Anti-spoofi | 7.4 | top 50% | #data-augmentation
28 | From A to B to A: Palindromic Zero-Shot Voice Conversio | 7.3 | top 50% | -
29 | A study on the impact of region specific data on the pe | 7.2 | top 50% | #speech-recognition
30 | Speaker-Invariant Representation Learning for Spoofing | 7.1 | top 25% | #adversarial-training
31 | BareWave: Waveform-Native Flow-Matching Text-to-Speech | 7.0 | top 50% | #speech-synthesis
32 | SMC-ITA: Sequential Monte Carlo Inference-Time Alignmen | 7.0 | top 50% | #audio-generation
33 | Quality-Diversity Search in Sound Generation: Investiga | 7.0 | top 50% | -
34 | Can LLMs understand LilyPond? A benchmark for symbolic | 7.0 | top 50% | #music-generation
35 | NüshuVoice: Reviving the Voice of Endangered Nüshu with | 7.0 | top 50% | #speech-synthesis
36 | Factors affecting ASR performance: A study using state | 6.9 | top 50% | #speech-recognition
37 | MeanVC 2: Robust Low-Latency Streaming Zero-Shot Voice | 6.9 | top 50% | #voice-conversion
38 | Few-shot Class-variable Incremental Audio Classificatio | 6.9 | top 50% | #audio-classification
39 | A Hierarchical Feature Engineering Framework for Automa | 6.8 | top 50% | -
40 | Fast and Robust On-Device Speaker Diarization: Relative | 6.6 | top 50% | #speaker-separation
41 | On Low-Bit Quantization Errors in Speaker Verification: | 6.6 | top 50% | #speaker-verification
42 | Paediatric-HGNN: A Hybrid Heterogeneous Graph Neural Ne | 6.5 | bottom 50% | #speech-synthesis
43 | TinyGiantALM: A Compact Audio-Language Model for Intent | 6.4 | top 50% | #multimodal-models
44 | Overcoming Decoder Inconsistencies in Whisper for Dravi | 6.2 | bottom 50% | #speech-recognition
45 | Bridging Traditional Explainability Methods and Multimo | 5.4 | bottom 50% | #speech-recognition
46 | Sound Field Interpolation Using Physics-Informed Extrem | 5.3 | bottom 50% | #speech-enhancement
47 | A Comparison of SSL-Based Feature Extractors and Back-E | 5.0 | bottom 50% | #self-supervised-learning
48 | AeroSpectra Sentinel: An Auditable LLM Prompt-Chaining | 4.5 | bottom 50% | #audio-event-detection

📋 Paper list
🥇 A Finetuned SpeechLLM for Joint Multi-Granular L2 Assessment and Natural-Language Rationales
10.0/10 | innovation 2.0/2 | rigor 1.5/1.5 | experiments 1.5/1.5 | clarity 1/1 | impact 1.5/1.5 | open source 1.0/1.5 | reproducibility 0.5/0.5 | engineering 1.5/1.5
...

Speech/Music/Audio Paper Digest 2026-06-08

38 papers analyzed

⚡ Today's overview
📥 38 papers fetched → 🔬 in-depth analysis complete

🏷️ Popular topics
Topic | Count | Distribution
#speech-synthesis | 7 | ███████
#speech-recognition | 6 | ██████
#audio-generation | 3 | ███
#data-augmentation | 3 | ███
#multimodal-models | 3 | ███
#speech-emotion-recognition | 2 | ██
#music-generation | 2 | ██
#music-information-retrieval | 1 | █

📊 Paper score leaderboard (38 papers, sorted by score, descending)
Rank | Paper | Score | Tier | Main task
🥇 | Audio-Oscar: A Multi-Agent System for Complex Audio Sce | 9.9 | top 10% | #audio-generation
🥈 | Assessing True Generalisability of Audio-Visual Speech | 9.5 | top 10% | #speech-recognition
🥉 | VoxCPM2 Technical Report | 9.5 | top 50% | #speech-synthesis
4 | Beyond Semantic Dominance: Cognitive Affective Reasonin | 9.2 | top 10% | #speech-synthesis
5 | Hearing the Unspoken: Language Model Priors for Acousti | 9.2 | top 25% | #speech-recognition
6 | dots.tts Technical Report | 9.0 | top 25% | #speech-synthesis
7 | How Far Can Chord-Symbol Time-Series Adaptation Carry G | 8.8 | top 50% | #music-information-retrieval
8 | Where Rectified Flows Leak: Characterising Membership S | 8.7 | top 25% | #audio-generation
9 | BiEAR: A Human Auditory-Inspired Adaptive Binaural Fron | 8.5 | top 25% | #sound-source-localization
10 | Mitigating Proxy-to-Wild Domain Gap in Deepfake Speech | 8.4 | top 25% | #data-augmentation
11 | Multilingual Multi-Speaker Unit Vocoders: A Systematic | 8.4 | top 25% | #speech-synthesis
12 | Geometric Second-Order Feature Correlation Learning for | 7.9 | top 50% | #speech-emotion-recognition
13 | Whisper Hallucination Detection and Mitigation via Hidd | 7.9 | top 50% | #speech-recognition
14 | Acoustic Cue Alignment in Audio Language Models for Spe | 7.8 | top 50% | #speech-emotion-recognition
15 | Towards Unified Song Generation and Singing Voice Conve | 7.7 | top 25% | #speech-synthesis
16 | Phonetic Error Analysis of Raw Waveform Acoustic Models | 7.6 | top 50% | #speech-recognition
17 | SEAM: Shortcut-Aware Real-Time Detection of Scripted vs | 7.5 | top 25% | #speech-enhancement
18 | DirectAudioEdit: Inversion-Free Text-Guided Audio Editi | 7.5 | top 25% | #diffusion-models
19 | MMAE: A Massive Multitask Audio Editing Benchmark | 7.5 | top 50% | #speech-editing
20 | Leveraging Soft Distributions of SSL-Derived Discrete S | 7.4 | top 50% | #speech-recognition
21 | MyGardenBird: A Machine-Learning-Ready Bird Sound Datas | 7.2 | top 50% | #audio-event-detection
22 | FIGMA: Towards FIne-Grained Music retrievAl | 7.2 | top 50% | #contrastive-learning
23 | KIT's Submission to Cross-Lingual Voice Cloning in | 7.2 | top 50% | #speech-synthesis
24 | Contrastive Training with LLM-generated Near-Misses for | 7.1 | top 50% | #speech-recognition
25 | A Large-Scale Per-Speaker Analysis of Re-identification | 7.1 | top 50% | #speech-anonymization
26 | SVHighlights: Towards Extremely Long Sport Video Highli | 7.0 | top 50% | #multimodal-models
27 | TargetSEC: Plug-and-Play In-the-Wild Speech Emotion Con | 6.8 | top 50% | #voice-conversion
28 | Making the Most of Limited Data: Score-Aware Training f | 6.7 | top 50% | #music-generation
29 | IRAF: Interference-Resilient Adaptive Fusion for Noise- | 6.5 | top 50% | #spoken-dialogue-systems
30 | Towards Event-Robust Acoustic Scene Classification | 6.5 | top 50% | #data-augmentation
31 | FSC-Net: Integrating Fast Fourier Convolutions and Prog | 6.4 | top 50% | #audio-quality-assessment
32 | Watch, Remember, Reason: Human-View Video Understanding | 6.4 | top 50% | #multimodal-models
33 | Hierarchical Semantic-Constrained Heterogeneous Graph f | 6.2 | top 50% | #multimodal-models
34 | Audio Imitator: Controlling Timbre and Tempo in Video2A | 6.0 | top 50% | #audio-generation
35 | HybridCodec: Fast Dual-Stream, Semantically Enhanced Ne | 5.7 | top 50% | #speech-synthesis
36 | SpectCount: Spectrotemporal Counting via Synthetic Sign | 5.5 | top 50% | #data-augmentation
37 | Entropy as a Structural Prior: How a Log-Barrier on DiT | 4.2 | bottom 50% | #music-generation
38 | VISA: A Visual Information Strengthened Audio-Reasoning | 3.9 | top 50% | #audio-question-answering

📋 Paper list
🥇 Audio-Oscar: A Multi-Agent System for Complex Audio Scene Generation, Orchestration, and Refinement
9.9/10 | innovation 1.6/2 | rigor 1.3/1.5 | experiments 1.2/1.5 | clarity 1/1 | impact 1.4/1.5 | open source 1.5/1.5 | reproducibility 0.5/0.5 | engineering 1.4/1.5
...

Speech/Music/Audio Paper Digest 2026-06-05

47 papers analyzed

⚡ Today's overview
📥 47 papers fetched → 🔬 in-depth analysis complete

🏷️ Popular topics
Topic | Count | Distribution
#speech-recognition | 11 | ███████████
#speech-synthesis | 6 | ██████
#speech-emotion-recognition | 3 | ███
#large-language-models | 2 | ██
#speech-enhancement | 2 | ██
#speaker-recognition | 2 | ██
#streaming-processing | 1 | █
#audio-coding | 1 | █

📊 Paper score leaderboard (47 papers, sorted by score, descending)
Rank | Paper | Score | Tier | Main task
🥇 | Audio Interaction Model | 9.8 | top 50% | #streaming-processing
🥈 | USAD 2.0: Scaling Representation Distillation for Unive | 9.0 | top 25% | #audio-coding
🥉 | M2S-AVSR: Modality-aware Multi-view Self-supervised Rep | 9.0 | top 25% | #speech-recognition
4 | Vortex: Efficient and Programmable Sparse Attention Ser | 8.9 | top 25% | #large-language-models
5 | UniVoice: A Unified Model for Speech and Singing Voice | 8.7 | top 25% | #speech-synthesis
6 | Ouvia: A User-centered Framework for Measuring Usabilit | 8.6 | top 25% | #speech-translation
7 | Age-Aware Adapter Tuning for Children's Speech Reco | 8.4 | top 25% | #speech-recognition
8 | MCBench: A Multicontext Safety Assessment Benchmark for | 8.4 | bottom 50% | #speech-recognition
9 | SuperMemory-VQA: An Egocentric Visual Question-Answerin | 8.4 | top 25% | #benchmarking
10 | GLASS: GRPO-Trained LoRA for Acoustic Style Steering in | 8.2 | top 25% | #speech-synthesis
11 | A Model of Multi-turn Human Persuadability Using Probab | 8.2 | top 50% | -
12 | Learning Emotion-discriminative Representations for Zer | 8.1 | top 25% | #speech-emotion-recognition
13 | FORTE: FOL-guided Optimal Refinement for Text-audio rEt | 8.1 | top 25% | #parameter-efficient-fine-tuning
14 | FiLM-Based Speaker Conditioning of a SpeechLLM for Path | 8.0 | top 50% | #speech-recognition
15 | Task-Vector Arithmetic for Emotional Expressivity Contr | 7.9 | top 25% | #speech-synthesis
16 | An Ultra-Low-Bitrate Neural Speech Codec with Plain-to- | 7.7 | top 25% | #speech-synthesis
17 | Exploring LLMs for South Asian Music Understanding and | 7.7 | top 50% | #music-generation
18 | SB-RF: Schrödinger Bridge Rectified Flow for One-Step R | 7.6 | top 25% | #speech-enhancement
19 | nnAudio 2: Overcoming Dynamic Compilation Barriers and | 7.5 | top 50% | #open-source-tools
20 | Beyond Waveform Robustness: Robust Feature-Vocoder Adve | 7.5 | top 25% | #speech-recognition
21 | FoeGlass: Simple In-Context Learning Is Enough for Red | 7.5 | top 25% | #audio-generation
22 | ProSarc: Prosody-Aware Sarcasm Recognition Framework vi | 7.5 | top 25% | #speech-emotion-recognition
23 | Probing Spatial Structure in Pretrained Audio Represent | 7.4 | top 25% | -
24 | Forgive or forget: Understanding the context of hate in | 7.4 | top 50% | #audio-retrieval
25 | SpeechJBB: Probing Safety Alignment and Comprehension i | 7.3 | top 25% | #speech-recognition
26 | VoCodec: A Low-bitrate Streamable Neural Speech Codec w | 7.2 | top 50% | #speech-coding
27 | F3-Tokenizer: Taming Audio Autoencoder Latents for Unde | 7.2 | top 25% | #speech-synthesis
28 | Beyond WER: A Paired Acoustic Stress Test for Ambient C | 7.1 | top 50% | #speech-recognition
29 | InfoShield: Privacy-Preserving Speech Representations f | 7.1 | top 50% | -
30 | Multi-task Learning is Not Enough: Representational Ent | 6.9 | top 50% | #speech-recognition
31 | Sound Effects Dataset Unification With the Universal Ca | 6.9 | top 50% | #audio-classification
32 | To Be Multimodal or Not to Be: Query-Adaptive Audio-Vis | 6.8 | top 50% | #speaker-recognition
33 | SHALA-LLM: Smartly Handling Ambiguous Labels in Alignin | 6.8 | top 50% | #speech-emotion-recognition
34 | SagnacAssisted Enhanced OTDR for Distributed Acoustic S | 6.6 | top 50% | #signal-processing-fundamentals
35 | Domain-Aware Mispronunciation Detection and Diagnosis U | 6.6 | top 50% | #graph-neural-networks
36 | CoSTA: Cognitive-State-Conditioned TTS Data Augmentatio | 6.5 | top 50% | #speech-synthesis
37 | Beyond Text Following: Repairable Arbitration Reversals | 6.4 | top 50% | #audio-question-answering
38 | Enhancing Audio Captioning with Auxiliary AudioSet Sema | 6.3 | top 50% | -
39 | Do speech foundation models perceive speaker similarity | 6.3 | top 50% | #speaker-recognition
40 | Efficient Punctuation Restoration via Weighted Lookahea | 6.3 | top 50% | #large-language-models
41 | Automatic Labelling of Speech Translation Errors | 6.1 | top 50% | #speech-recognition
42 | Towards Truly Multilingual ASR: Generalizing Code-Switc | 5.9 | top 50% | #speech-recognition
43 | An ERP Study on Recursive Locative Processing in Mandar | 5.9 | top 50% | -
44 | Multilingual Detection of Alzheimer's Disease from | 5.7 | bottom 50% | #transfer-learning
45 | DBHN-Net: Dual-Branch Hybrid Neural Network For Low-Com | 5.4 | top 25% | #speech-enhancement
46 | Beyond Generative Decoding: Discriminative Hidden-State | 5.3 | top 50% | #multimodal-models
47 | Revisiting Lexicon Evaluation in Unsupervised Word Disc | 1.0 | top 25% | #speech-recognition

📋 Paper list
🥇 Audio Interaction Model
9.8/10 | innovation 1.5/2 | rigor 1.3/1.5 | experiments 1.4/1.5 | clarity 1.0/1 | impact 1.5/1.5 | open source 1.1/1.5 | reproducibility 0.5/0.5 | engineering 1.5/1.5
...

Speech/Music/Audio Paper Digest 2026-06-04

22 papers analyzed

⚡ Today's overview
📥 22 papers fetched → 🔬 in-depth analysis complete

🏷️ Popular topics
Topic | Count | Distribution
#speech-recognition | 3 | ███
#audio-classification | 2 | ██
#audio-generation | 2 | ██
#speech-enhancement | 2 | ██
#multimodal-models | 1 | █
#speech-coding | 1 | █
#spatial-audio | 1 | █
#music-generation | 1 | █

📊 Paper score leaderboard (22 papers, sorted by score, descending)
Rank | Paper | Score | Tier | Main task
🥇 | Multilingual Long-Form Speech Instruction Following: KI | 10.0 | top 10% | #speech-recognition
🥈 | Drift-Augmented Scoring: Text-Derived Noise Robustness | 10.0 | top 25% | #audio-classification
🥉 | DetectZoo: A Unified Toolkit for AI-Generated Content D | 9.3 | top 25% | #multimodal-models
4 | CleanCodec: Efficient and Robust Speech Tokenization vi | 8.8 | top 25% | #speech-coding
5 | Read What You Hear: Reference-Free Hypotheses Evaluatio | 8.6 | top 25% | #speech-recognition
6 | UAT: Unified Audio-Text Diffusion for Audio Generation, | 8.5 | top 25% | #audio-generation
7 | Flow-HOA: Generative Joint Optimization for Ambisonics | 7.9 | top 25% | #spatial-audio
8 | Test-Time Compute Scaling for ASR with Depth-Conditione | 7.8 | top 25% | #speech-recognition
9 | Channel-Oriented Design for EEG-to-Music Reconstruction | 7.7 | top 25% | #music-generation
10 | Entity Binding Failures in Speech LLM Reasoning: Diagno | 7.5 | top 25% | #spoken-question-answering
11 | Video2LoRA: Parametric Video Internalization for Vision | 7.5 | top 50% | #parameter-efficient-fine-tuning
12 | Feasibility of Time-Domain DNN-Based Speech Enhancement | 7.2 | top 50% | #speech-enhancement
13 | Differentiable Articulatory Copy-Synthesis of Biphonic | 7.1 | top 50% | #audio-generation
14 | The Differentiable Auditory Loop (DAL): An ML Framework | 7.1 | top 50% | #speech-enhancement
15 | Masked Wavelet Scattering Transform Neural Field for So | 6.7 | top 50% | #audio-quality-assessment
16 | SHB-AE: Spherical harmonic beamforming based Ambisonics | 6.7 | top 50% | #audio-coding
17 | SURF: Separation via Unsupervised Remixing Flow | 6.4 | top 25% | #unsupervised-learning
18 | Gauss Circle Lattices with Geometric Convolutions for S | 6.0 | top 50% | -
19 | Plan First, Judge Later, Run Better: A DMAIC-Inspired A | 5.8 | top 50% | #industrial-applications
20 | Representation Matters in Randomized Smoothing for Audi | 5.7 | top 50% | #audio-classification
21 | Neural Radiated-Noise Fields for Unmanned Underwater Ve | 5.1 | top 50% | -
22 | A Second-Order Cepstral Signature of Contact-Vibration | 4.8 | bottom 50% | #signal-processing-fundamentals

📋 Paper list
🥇 Multilingual Long-Form Speech Instruction Following: KIT's Submission to IWSLT 2026
10.0/10 | innovation 1.5/2 | rigor 1.2/1.5 | experiments 1.3/1.5 | clarity 1/1 | impact 1.5/1.5 | open source 1.5/1.5 | reproducibility 0.5/0.5 | engineering 1.5/1.5
...

Speech/Music/Audio Paper Digest 2026-06-03

40 papers analyzed

⚡ Today's overview
📥 40 papers fetched → 🔬 in-depth analysis complete

🏷️ Popular topics
Topic | Count | Distribution
#speech-synthesis | 7 | ███████
#speech-recognition | 7 | ███████
#music-generation | 3 | ███
#audio-generation | 2 | ██
#speech-enhancement | 2 | ██
#multimodal-models | 2 | ██
#speech-emotion-recognition | 2 | ██
#speech-translation | 2 | ██

📊 Paper score leaderboard (40 papers, sorted by score, descending)
Rank | Paper | Score | Tier | Main task
🥇 | AnyAudio-Judge: A Dynamic Rubric-Based Benchmark and Ev | 10.0 | top 10% | #speech-synthesis
🥈 | Cosmos 3: Omnimodal World Models for Physical AI | 10.0 | top 10% | #audio-generation
🥉 | WavTTS: Towards High-Quality Zero-Shot TTS via Direct R | 9.2 | top 25% | #speech-synthesis
4 | CoughSense: Five-Class Respiratory Disease Classificati | 9.1 | top 25% | #data-augmentation
5 | SoulX-Transcriber: A Robust End-to-End Framework for Mu | 8.8 | top 50% | #speech-recognition
6 | SVHalluc: Benchmarking Speech-Vision Hallucination in A | 8.7 | top 25% | #speech-recognition
7 | Benchmarking Speech-to-Speech Translation Models | 8.7 | top 25% | #speech-synthesis
8 | The DeepSpeak-Agentic Dataset | 8.7 | top 50% | #speech-synthesis
9 | EntangleCodec: A Unified Discrete Audio Tokenizer via S | 8.6 | top 10% | #speech-synthesis
10 | SketchSong: Hierarchical Song Generation with Sketch Pl | 8.6 | top 25% | #music-generation
11 | SegTune: Structured and Fine-Grained Control for Song G | 8.5 | top 25% | #music-generation
12 | Exploiting Noise Inseparability for Weakly-Supervised D | 8.5 | top 50% | #speech-enhancement
13 | A Comparison of Generative and Discriminative Methods f | 8.3 | top 25% | #speech-enhancement
14 | FSA-GRPO: Teaching Auditory LLMs to Use Few-shot Demons | 8.1 | top 50% | #speech-recognition
15 | Tonal parsimony in chord-sequence analysis: combining m | 8.1 | top 25% | #music-information-retrieval
16 | Efficient ASR Training with Conversations that Never Ha | 8.0 | top 50% | #speech-recognition
17 | LiveBand: Live Accompaniment Generation in the Audio Do | 8.0 | top 25% | #music-generation
18 | Sandboxed Coding Agents are Competitive Omni-modal Task | 7.9 | top 25% | #reinforcement-learning
19 | OmniHalluc-L: Counterfactual Benchmarking and Modality- | 7.8 | top 25% | #multimodal-models
20 | BaltiVoice: A Speech Corpus and Fine-tuned Whisper ASR | 7.8 | top 25% | #speech-recognition
21 | Speech Emotion Recognition using Attention-based LSTM-N | 7.5 | top 50% | #speech-emotion-recognition
22 | SpeakerCard-1M: An Evidence-Grounded Speaker Card Corpu | 7.4 | top 25% | #speaker-verification
23 | C2GA: A Class-Controllable Generative Augmentation Fram | 7.3 | top 50% | #audio-classification
24 | AlignAtt4LLM: Fast AlignAtt for Decoder-Only LLMs at IW | 7.3 | top 50% | #speech-translation
25 | Before Fusion, Ask What to Keep: Contextual Calibration | 7.2 | top 50% | #speech-emotion-recognition
26 | Diffusion-Based Heart Sound Generation: Evaluation with | 7.1 | top 50% | #speech-synthesis
27 | SiamCTC: Learning Speech Representations through Monoto | 7.0 | top 50% | #speech-recognition
28 | Foley-Omni: A Unified Multimodal Generation Model from | 7.0 | top 25% | #audio-generation
29 | Inference-Time Scaling for Joint Audio-Video Generation | 6.9 | top 50% | #speech-synthesis
30 | Breaking the Pair: Evaluating Dyadic Interaction via Sp | 6.9 | top 50% | -
31 | Localizing broadband noise sources using the Loève spec | 6.9 | top 50% | #sound-source-localization
32 | A Pocket Offline Model for Simultaneous Speech Translat | 6.8 | top 50% | #speech-translation
33 | Stable Hybrid Cross-Attention Fusion for Audio-Visual E | 6.7 | bottom 50% | #self-supervised-learning
34 | A Training-Efficient Transformer-Based Anti-Spoofing Ne | 6.7 | bottom 50% | #Transformer
35 | MoDAl: Self-Supervised Neural Modality Discovery via De | 6.6 | top 25% | #self-supervised-learning
36 | Audio Spotforming via Post-Filtering Using Cross-Array | 6.6 | top 50% | #wiener-filtering
37 | Logit Distillation on Manifolds: Mapping by Learning | 6.5 | top 50% | #speech-recognition
38 | Domain-Agnostic Incremental Learning for Sound Classifi | 6.1 | top 50% | -
39 | Wavelet as Tokenizer: Preliminary Results on a Shared W | 5.4 | bottom 50% | #multimodal-models
40 | In-the-Loop Training of Deep Feedback Cancellation for | 5.3 | top 50% | #adaptive-filtering

📋 Paper list
🥇 AnyAudio-Judge: A Dynamic Rubric-Based Benchmark and Evaluator for Audio Instruction Following
10.0/10 | innovation 1.8/2 | rigor 1.4/1.5 | experiments 1.5/1.5 | clarity 1/1 | impact 1.4/1.5 | open source 1.3/1.5 | reproducibility 0.5/0.5 | engineering 1.3/1.5
...