STARCaster: Spatio-Temporal AutoRegressive Video Diffusion for Identity- and View-Aware Talking Portraits

📝 3.5/10 | bottom 50% | arxiv ← Back to the 2026-05-23 Speech/Music/Audio Paper Digest

2026-05-23 · Updated 2026-06-19 · 1 min · 22 words

Stream RAG: Instant and Accurate Spoken Dialogue Systems with Streaming Tool Usage

✅ 7.5/10 | top 25% | arxiv

2026-05-23 · Updated 2026-06-19 · 1 min · 23 words

T2AV-Compass: Towards Unified Evaluation for Text-to-Audio-Video Generation

✅ 6.5/10 | top 50% | arxiv

2026-05-23 · Updated 2026-06-19 · 1 min · 18 words

tau-Voice: Benchmarking Full-Duplex Voice Agents on Real-World Domains

2026-05-23 · Updated 2026-06-19 · 0 min · 0 words

TextME: Bridging Unseen Modalities Through Text Descriptions

✅ 7.0/10 | top 50% | arxiv

2026-05-23 · Updated 2026-06-19 · 1 min · 18 words

The Silent Thought: Modeling Internal Cognition in Full-Duplex Spoken Dialogue Models via Latent Reasoning

✅ 7.0/10 | top 50% | arxiv

2026-05-23 · Updated 2026-06-19 · 1 min · 25 words

TMD-Bench: A Multi-Level Evaluation Paradigm for Music–Dance Co-Generation

✅ 7.0/10 | top 50% | arxiv

2026-05-23 · Updated 2026-06-19 · 1 min · 19 words

Towards Understanding Modality Interaction in Multimodal Language Models via Partial Information Decomposition

📝 4.5/10 | bottom 50% | arxiv

2026-05-23 · Updated 2026-06-19 · 1 min · 23 words

Two-dimensional quantization for geometry-aware audio coding

✅ 6.5/10 | top 50% | arxiv

2026-05-23 · Updated 2026-06-19 · 1 min · 17 words

Unlocking Speech–Text Compositional Powers: Instruction-Following Speech Language Models without Instruction Tuning

📝 5.8/10 | top 50% | arxiv

2026-05-23 · Updated 2026-06-19 · 1 min · 22 words