Dual-View Predictive Diffusion: Lightweight Speech Enhancement via Spectrogram-Image Synergy
📄 Dual-View Predictive Diffusion: Lightweight Speech Enhancement via Spectrogram-Image Synergy ✅ 7.5/10 | top 25% | arXiv ← Back to the 2026-05-23 Speech/Music/Audio Paper Digest
📄 E-VAds: An E-commerce Short Videos Understanding Benchmark for MLLMs 🔥 8.0/10 | top 25% | arXiv
📄 EchoingPixels: Aliasing-Resistant Joint Token Reduction for Audio-Visual LLMs ✅ 7.5/10 | top 25% | arXiv
📄 Evaluating and Rewarding LALMs for Expressive Role-Play TTS via Mean Continuation Log-Probability ✅ 6.0/10 | top 50% | arXiv
📄 FakeWorld 1.0: An Omni modal Benchmark for Fake Media and Content 📝 3.5/10 | bottom 50% | arXiv
📄 FoeGlass: When Simple In-Context Learning Is Enough for Red Teaming Audio Deepfake Detectors 🔥 8.0/10 | top 25% | arXiv
📄 From Inpainting to Editing: Unlocking Robust Mask-Free Visual Dubbing via Generative Bootstrapping 📝 4.3/10 | bottom 50% | arXiv
📄 From Talking to Singing: A New Challenge for Audio-Visual Deepfake Detection ✅ 6.8/10 | top 50% | arXiv
📄 FutureOmni: Evaluating Future Forecasting from Omni-Modal Context for Multimodal LLMs ✅ 6.5/10 | top 50% | arXiv
📄 Group Cognition Learning: Making Everything Better Through Controlled Two-Stage Agents Collaboration ✅ 6.5/10 | top 50% | arXiv