LightAVSeg: Lightweight Audio-Visual Segmentation
📄 LightAVSeg: Lightweight Audio-Visual Segmentation ✅ 7.5/10 | top 25% | arXiv ← Back to 2026-05-23 Speech/Music/Audio Paper Digest
📄 Listening Through the Noise: Cauchy-Driven Diffusion Bridges for Robust Gastrointestinal Auscultation and Clinical Benchmarking ✅ 7.5/10 | top 25% | arXiv
📄 Long Grounded Thoughts: Synthesizing Grounded Visual Problems and Distilling Reasoning Chains at Scale ✅ 7.5/10 | top 25% | arXiv
📄 LynX: Token Interface Alignment for Video+X LLMs #Video #LLMs #Token #Interface #Alignment #multimodal-integration #manifold-alignment #unimodal-data ✅ 7.5/10 | top 25% | arXiv
📄 MECAT: A Multi-Experts Constructed Benchmark for Fine-Grained Audio Understanding Tasks ✅ 7.2/10 | top 50% | arXiv
📄 MedMosaic: A Challenging Large Scale Benchmark of Diverse Medical Audio ✅ 7.5/10 | top 25% | arXiv
📄 MetaBio: Learning from metadata for bioacoustics foundation models ✅ 6.5/10 | top 50% | arXiv
📄 MFCL Audio: An Audio Function Calling Evaluation for Large Language Models 📝 3.0/10 | bottom 50% | arXiv
📄 MoshiRAG: Asynchronous Knowledge Retrieval for Full-Duplex Speech Language Models 📝 3.8/10 | bottom 50% | arXiv
📄 MoST: Mixing Speech and Text with Modality-Aware Mixture of Experts ✅ 7.5/10 | top 25% | arXiv