ClawMark: A Living-World Benchmark for Multi-Turn, Multi-Day, Multimodal Coworker Agents

📄 ClawMark: A Living-World Benchmark for Multi-Turn, Multi-Day, Multimodal Coworker Agents #基准测试 #模型评估 #多模态模型 #大语言模型 #动态环境 ✅ 7.0/10 | 前25% | #基准测试 | #模型评估 | #多模态模型 #大语言模型 | arxiv 学术质量 6.5/7 | 选题价值 2.0/2 | 复现加成 0.5 | 置信度 高 👥 作者与机构 第一作者:Fanqing Meng (Evolvent AI, National University of Singapore) - 根据论文附录,其有*号标记为共同贡献者。 通讯作者:Mengkang Hu†, Michael Qizhe Shieh† (Evolvent AI, National University of Singapore) - 根据论文附录,其有†号标记为通讯作者。 作者列表:Fanqing Meng (Evolvent AI, National University of Singapore), Lingxiao Du (National University of Singapore), Zijian Wu (National University of Singapore), Guanzheng Chen (National University of Singapore), Xiangyan Liu (National University of Singapore), Jiaqi Liao (Independent Researcher), Chonghe Jiang (Massachusetts Institute of Technology), Zhenglin Wan (National University of Singapore), Jiawei Gu (University of Washington), Pengfei Zhou (National University of Singapore), Rui Huang (The University of Hong Kong), Ziqi Zhao (The Hong Kong Polytechnic University), Shengyuan Ding (Fudan University), Ailing Yu (Independent Researcher), Bo Peng (Shanghai Jiao Tong University), Bowei Xia (University of Electronic Science and Technology of China), Hao Sun (Peking University), Haotian Liang (University of Science and Technology of China), Ji Xie (Zhejiang University), Jiajun Chen (National University of Singapore), Jiajun Song (Renmin University of China), Liu Yang (The Hong Kong Polytechnic University), Ming Xu (National University of Singapore), Qionglin Qiu (Hunan University), Runhao Fu (Anhui University), Shengfang Zhai (National University of Singapore), Shijian Wang (Southeast University), Tengfei Ma (The Chinese University of Hong Kong), Tianyi Wu (National University of Singapore), Weiyang Jin (The University of Hong Kong), Yan Wang (Tongji University), Yang Dai (National University of Singapore), Yao Lai (The University of Hong Kong), Youwei Shu (National University of Singapore), Yue Liu (National University of Singapore), Yunzhuo Hao (Zhejiang University), Yuwei Niu (Peking University), Jinkai Huang (Evolvent AI, National University of Singapore), Jiayuan Zhuo (Evolvent AI, National University of Singapore), Zhennan Shen (The Hong Kong University of Science and Technology), Linyu Wu (National University of Singapore), Cihang Xie (University of California, Santa Cruz), Yuyin Zhou (University of California, Santa Cruz), Jiaheng Zhang (National University of Singapore), Zeyu Zheng (University of California, Berkeley), Mengkang Hu (Evolvent AI, National University of Singapore), Michael Qizhe Shieh (Evolvent AI, National University of Singapore)。 💡 毒舌点评 亮点:提出了一个设计极其严谨、评估维度(多天、动态环境、全模态)全面且完全杜绝“LLM当裁判”评分模糊性的智能体基准测试,填补了重要空白。短板:作为基准测试,其本身不产出新的模型或算法,对推动模型能力提升的作用是间接的;且100个任务的规模对于构建稳健的排行榜可能稍显不足。 ...

2026-04-29

语音/音频论文速递 2026-04-29

语音/音频论文速递 2026-04-29 共分析 29 篇论文 ⚡ 今日概览 📥 抓取 29 篇 → 🔬 深度分析完成 🏷️ 热门方向 方向 数量 分布 #基准测试 4篇 ████ #多模态模型 3篇 ███ #语音情感识别 3篇 ███ #语音识别 3篇 ███ #语音对话系统 2篇 ██ #音乐生成 2篇 ██ #生成模型 1篇 █ #频谱测绘 1篇 █ 📊 论文评分排行榜(28 篇,按分数降序) 排名 论文 评分 分档 主任务 🥇 Cutscene Agent: An LLM Agent Framework for Automated 3D 8.5分 前25% #生成模型 🥈 Accelerating Regularized Attention Kernel Regression fo 8.5分 前25% #频谱测绘 🥉 Nemotron 3 Nano Omni: Efficient and Open Multimodal Int 8.5分 前25% #多模态模型 4. Step-Audio-R1.5 Technical Report 8.0分 前25% #语音对话系统 5. Praxy Voice: Voice-Prompt Recovery + BUPS for Commercia 8.0分 前25% #语音合成 6. ML-SAN: Multi-Level Speaker-Adaptive Network for Emotio 8.0分 前25% #语音情感识别 7. Unrequited Emotions: Investigating the Gaps in Motivati 8.0分 前25% #语音情感识别 8. UNet-Based Fusion and Exponential Moving Average Adapta 7.5分 前25% #说话人验证 9. Walking Through Uncertainty: An Empirical Study of Unce 7.5分 前25% #音频问答 10. ASAP: An Azimuth-Priority Strip-Based Search Approach t 7.5分 前25% #声源定位 11. Mutual Forcing: Dual-Mode Self-Evolution for Fast Autor 7.5分 前25% #音频生成 12. SymphonyGen: 3D Hierarchical Orchestral Generation with 7.5分 前25% #音乐生成 13. PSP: An Interpretable Per-Dimension Accent Benchmark fo 7.5分 前25% #基准测试 14. RAS: a Reliability Oriented Metric for Automatic Speech 7.5分 前25% #语音识别 15. Robust Accent Identification via Voice Conversion and N 7.5分 前25% #语音识别 16. Independent-Component-Based Encoding Models of Brain Ac 7.5分 前25% #神经编码 17. Beyond Isolated Utterances: Cue-Guided Interaction for 7.5分 前25% #多模态模型 18. Mitigating Shared-Private Branch Imbalance via Dual-Bra 7.5分 前25% #多模态模型 19. MMEB-V3: Measuring the Performance Gaps of Omni-Modalit 7.5分 前25% #基准测试 20. Human-1 by Josh Talks: A Full-Duplex Conversational Mod 7.5分 前50% #语音对话系统 21. ClawMark: A Living-World Benchmark for Multi-Turn, Mult 7.0分 前25% #基准测试 22. The Structured Output Benchmark: A Multi-Source Benchma 7.0分 前25% #基准测试 23. WhisperPipe: A Resource-Efficient Streaming Architectur 6.5分 前50% #语音识别 24. S-SONDO: Self-Supervised Knowledge Distillation for Gen 6.5分 前25% #音频分类 25. Monitoring exposure-length variations in submarine powe 6.5分 前50% #音频事件检测 26. Generative UI as an Accessibility Bridge: Lessons from 6.5分 前50% #无障碍 27. Korean aegyo speech shows systematic F1 increase to sig 6.0分 前50% #语音情感识别 28. Huí Sù: Co-constructing a Dual Feedback Apparatus 5.5分 后50% #音乐生成 29 Cross-Linguistic Rhythmic and Spectral Feature-Based An N/A - - 📋 论文列表 🥇 Cutscene Agent: An LLM Agent Framework for Automated 3D Cutscene Generation 🔥 8.5/10 | 前25% | #生成模型 | #大语言模型 | #多模态 #模型评估 | arxiv ...

2026-04-29