Do Audio LLMs Listen or Read? Analyzing and Mitigating Paralinguistic Failures with VoxParadox
📄 Do Audio LLMs Listen or Read? Analyzing and Mitigating Paralinguistic Failures with VoxParadox ✅ 6.9/10 | top 50% | arXiv ← Back to 2026-05-23 Speech/Music/Audio Paper Digest
📄 DreamID-Omni: Unified Framework for Controllable Human-Centric Audio-Video Generation ✅ 7.8/10 | top 25% | arXiv
📄 Dual-View Predictive Diffusion: Lightweight Speech Enhancement via Spectrogram-Image Synergy ✅ 7.5/10 | top 25% | arXiv
📄 E-VAds: An E-commerce Short Videos Understanding Benchmark for MLLMs 🔥 8.0/10 | top 25% | arXiv
📄 EchoingPixels: Aliasing-Resistant Joint Token Reduction for Audio-Visual LLMs ✅ 7.5/10 | top 25% | arXiv
📄 Evaluating and Rewarding LALMs for Expressive Role-Play TTS via Mean Continuation Log-Probability ✅ 6.0/10 | top 50% | arXiv
📄 FakeWorld 1.0: An Omni modal Benchmark for Fake Media and Content 📝 3.5/10 | bottom 50% | arXiv
📄 FoeGlass: When Simple In-Context Learning Is Enough for Red Teaming Audio Deepfake Detectors 🔥 8.0/10 | top 25% | arXiv
📄 From Inpainting to Editing: Unlocking Robust Mask-Free Visual Dubbing via Generative Bootstrapping 📝 4.3/10 | bottom 50% | arXiv
📄 From Talking to Singing: A New Challenge for Audio-Visual Deepfake Detection ✅ 6.8/10 | top 50% | arXiv