FutureOmni: Evaluating Future Forecasting from Omni-Modal Context for Multimodal LLMs

📄 ✅ 6.5/10 | top 50% | arXiv

2026-05-23 · updated 2026-06-19 · 1 min · 21 words

Group Cognition Learning: Making Everything Better Through Controlled Two-Stage Agents Collaboration

📄 ✅ 6.5/10 | top 50% | arXiv

2026-05-23 · updated 2026-06-19 · 1 min · 22 words

Hearing Without Noticing? Attention-Aware Stealthy Black-box Adversarial Audio Attacks

📄 ✅ 6.5/10 | top 50% | arXiv

2026-05-23 · updated 2026-06-19 · 1 min · 20 words

Hidden in Plain Tokens: Simply Robust, Gradient-Free Watermark for Synthetic Audio

📄 ✅ 7.5/10 | top 25% | arXiv

2026-05-23 · updated 2026-06-19 · 1 min · 22 words

HyperPotter: Spell the Charm of High-Order Interactions in Audio Deepfake Detection

📄 ✅ 6.5/10 | top 50% | arXiv

2026-05-23 · updated 2026-06-19 · 1 min · 22 words

INFER: Learning Implicit Neural Frequency Response Fields for Confined Acoustic Environments

📄 🔥 8.5/10 | top 25% | arXiv

2026-05-23 · updated 2026-06-19 · 1 min · 22 words

IVQ: Structured and Lightweight Vector Quantization via Binary Hierarchical Composition Inspired by IChing

📄 📝 3.2/10 | bottom 50% | arXiv

2026-05-23 · updated 2026-06-19 · 1 min · 24 words

JAEGER: Joint 3D Audio-Visual Grounding and Reasoning in Simulated Physical Environments

📄 ✅ 7.0/10 | top 50% | arXiv

2026-05-23 · updated 2026-06-19 · 1 min · 22 words

Joint Enhancement and Classification using Coupled Diffusion Models of Signals and Logits

📄 ✅ 6.5/10 | top 50% | arXiv

2026-05-23 · updated 2026-06-19 · 1 min · 23 words

LALM-as-a-Judge: Benchmarking Large Audio-Language Models for Safety Evaluation in Multi-Turn Spoken Dialogues

📄 📝 3.5/10 | bottom 50% | arXiv

2026-05-23 · updated 2026-06-19 · 1 min · 23 words