Dual-View Predictive Diffusion: Lightweight Speech Enhancement via Spectrogram-Image Synergy

📄 Dual-View Predictive Diffusion: Lightweight Speech Enhancement via Spectrogram-Image Synergy ✅ 7.5/10 | 前25% | arxiv ← 返回 2026-05-23 语音/音乐/音频论文速递

2026-05-23 · 更新于 2026-06-19 · 1 min · 20 words

E-VAds: An E-commerce Short Videos Understanding Benchmark for MLLMs

📄 E-VAds: An E-commerce Short Videos Understanding Benchmark for MLLMs 🔥 8.0/10 | 前25% | arxiv ← 返回 2026-05-23 语音/音乐/音频论文速递

2026-05-23 · 更新于 2026-06-19 · 1 min · 20 words

EchoingPixels: Aliasing-Resistant Joint Token Reduction for Audio-Visual LLMs

📄 EchoingPixels: Aliasing-Resistant Joint Token Reduction for Audio-Visual LLMs ✅ 7.5/10 | 前25% | arxiv ← 返回 2026-05-23 语音/音乐/音频论文速递

2026-05-23 · 更新于 2026-06-19 · 1 min · 19 words

Evaluating and Rewarding LALMs for Expressive Role-Play TTS via Mean Continuation Log-Probability

📄 Evaluating and Rewarding LALMs for Expressive Role-Play TTS via Mean Continuation Log-Probability ✅ 6.0/10 | 前50% | arxiv ← 返回 2026-05-23 语音/音乐/音频论文速递

2026-05-23 · 更新于 2026-06-19 · 1 min · 23 words

FakeWorld 1.0: An Omni modal Benchmark for Fake Media and Content

📄 FakeWorld 1.0: An Omni modal Benchmark for Fake Media and Content 📝 3.5/10 | 后50% | arxiv ← 返回 2026-05-23 语音/音乐/音频论文速递

2026-05-23 · 更新于 2026-06-19 · 1 min · 22 words

FoeGlass: When Simple In-Context Learning Is Enough for Red Teaming Audio Deepfake Detectors

📄 FoeGlass: When Simple In-Context Learning Is Enough for Red Teaming Audio Deepfake Detectors 🔥 8.0/10 | 前25% | arxiv ← 返回 2026-05-23 语音/音乐/音频论文速递

2026-05-23 · 更新于 2026-06-19 · 1 min · 24 words

From Inpainting to Editing: Unlocking Robust Mask-Free Visual Dubbing via Generative Bootstrapping

📄 From Inpainting to Editing: Unlocking Robust Mask-Free Visual Dubbing via Generative Bootstrapping 📝 4.3/10 | 后50% | arxiv ← 返回 2026-05-23 语音/音乐/音频论文速递

2026-05-23 · 更新于 2026-06-19 · 1 min · 23 words

From Talking to Singing: A New Challenge for Audio-Visual Deepfake Detection

📄 From Talking to Singing: A New Challenge for Audio-Visual Deepfake Detection ✅ 6.8/10 | 前50% | arxiv ← 返回 2026-05-23 语音/音乐/音频论文速递

2026-05-23 · 更新于 2026-06-19 · 1 min · 22 words

FutureOmni: Evaluating Future Forecasting from Omni-Modal Context for Multimodal LLMs

📄 FutureOmni: Evaluating Future Forecasting from Omni-Modal Context for Multimodal LLMs ✅ 6.5/10 | 前50% | arxiv ← 返回 2026-05-23 语音/音乐/音频论文速递

2026-05-23 · 更新于 2026-06-19 · 1 min · 21 words

Group Cognition Learning: Making Everything Better Through Controlled Two-Stage Agents Collaboration

📄 Group Cognition Learning: Making Everything Better Through Controlled Two-Stage Agents Collaboration ✅ 6.5/10 | 前50% | arxiv ← 返回 2026-05-23 语音/音乐/音频论文速递

2026-05-23 · 更新于 2026-06-19 · 1 min · 22 words