Pianist Transformer: Towards Expressive Piano Performance Rendering via Scalable Self-Supervised Pre-Training

📄 Pianist Transformer: Towards Expressive Piano Performance Rendering via Scalable Self-Supervised Pre-Training ✅ 7.8/10 | Top 25% | arxiv ← Back to 2026-05-23 Speech/Music/Audio Paper Digest

2026-05-23 · Updated 2026-06-19 · 1 min · 22 words

Polyphonia: Training-Free Context-Aware Music Editing with Acoustic-Informed Attention Calibration

📄 Polyphonia: Training-Free Context-Aware Music Editing with Acoustic-Informed Attention Calibration ✅ 7.0/10 | Top 50% | arxiv

2026-05-23 · Updated 2026-06-19 · 1 min · 20 words

Position: *Beyond Text* The Text-Centric Bias in Foundation Models Must Be Revisited for a Speech-First Future

📄 Position: Beyond Text The Text-Centric Bias in Foundation Models Must Be Revisited for a Speech-First Future ✅ 6.8/10 | Top 50% | arxiv

2026-05-23 · Updated 2026-06-19 · 1 min · 27 words

Position: Towards Responsible Evaluation for Text-to-Speech

📄 Position: Towards Responsible Evaluation for Text-to-Speech 📝 2.6/10 | Bottom 50% | arxiv

2026-05-23 · Updated 2026-06-19 · 1 min · 17 words

PRIM: Cooperative Dynamic Token Compression for Efficient Large Multimodal Models

📄 PRIM: Cooperative Dynamic Token Compression for Efficient Large Multimodal Models 📝 4.8/10 | Bottom 50% | arxiv

2026-05-23 · Updated 2026-06-19 · 1 min · 20 words

ProactiveLLM: Learning Active Interaction for Streaming Large Language Models

📄 ProactiveLLM: Learning Active Interaction for Streaming Large Language Models ✅ 6.0/10 | Top 50% | arxiv

2026-05-23 · Updated 2026-06-19 · 1 min · 20 words

Probing Cross-modal Information Hubs in Audio-Visual LLMs

📄 Probing Cross-modal Information Hubs in Audio-Visual LLMs 📝 5.5/10 | Top 50% | arxiv

2026-05-23 · Updated 2026-06-19 · 1 min · 18 words

Quaternion Self-Attention with Shared Scores

📄 Quaternion Self-Attention with Shared Scores ✅ 7.0/10 | Top 50% | arxiv

2026-05-23 · Updated 2026-06-19 · 1 min · 16 words

Query-Based Asymmetric Modeling with Decoupled Input–Output Rates for Speech Restoration

📄 Query-Based Asymmetric Modeling with Decoupled Input–Output Rates for Speech Restoration ✅ 7.0/10 | Top 50% | arxiv

2026-05-23 · Updated 2026-06-19 · 1 min · 21 words

Real-World Unsupervised Models Generalize to Predict Brain Responses to Out-of-Distribution Stimuli

📄 Real-World Unsupervised Models Generalize to Predict Brain Responses to Out-of-Distribution Stimuli ✅ 7.8/10 | Top 25% | arxiv

2026-05-23 · Updated 2026-06-19 · 1 min · 22 words