Paper Digest | Speech/Music/Audio Paper Digest

Polyphonia: Training-Free Context-Aware Music Editing with Acoustic-Informed Attention Calibration

✅ 7.0/10 | Top 50% | arXiv ← Back to 2026-05-23 Speech/Music/Audio Paper Digest

Position: Beyond Text The Text-Centric Bias in Foundation Models Must Be Revisited for a Speech-First Future

✅ 6.8/10 | Top 50% | arXiv

Position: Towards Responsible Evaluation for Text-to-Speech

📝 2.6/10 | Bottom 50% | arXiv

PRIM: Cooperative Dynamic Token Compression for Efficient Large Multimodal Models

📝 4.8/10 | Bottom 50% | arXiv

ProactiveLLM: Learning Active Interaction for Streaming Large Language Models

✅ 6.0/10 | Top 50% | arXiv

Probing Cross-modal Information Hubs in Audio-Visual LLMs

📝 5.5/10 | Top 50% | arXiv

Quaternion Self-Attention with Shared Scores

✅ 7.0/10 | Top 50% | arXiv

Query-Based Asymmetric Modeling with Decoupled Input–Output Rates for Speech Restoration

✅ 7.0/10 | Top 50% | arXiv

Real-World Unsupervised Models Generalize to Predict Brain Responses to Out-of-Distribution Stimuli

✅ 7.8/10 | Top 25% | arXiv

Reasoning LLM Improves Speaker Recognition in Long-form TV Dramas

✅ 7.0/10 | Top 50% | arXiv