Benchmarking | Speech/Music/Audio Paper Digest

AVI-Bench: Toward Human-like Audio-Visual Intelligence of Omni-MLLMs

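📄 AVI-Bench: Toward Human-like Audio-Visual Intelligence of Omni-MLLMs #SpeechRecognition #MultimodalModels #Benchmarking 8.8/10 | Novelty 1.7/2 | Rigor 1.2/1.5 | Experiments 1.4/1.5 | Clarity 1/1 | Impact 1.3/1.5 | Open source 0.5/1.5 | Reproducibility 0.5/0.5 | Engineering 1.2/1.5 🔥 8.8/10 | Top 25% | #SpeechRecognition | #MultimodalModels | #Benchmarking | arxiv
👥 Authors and affiliations. Authors: Yaoting Wang, Ziyi Zhang, Wenming Tu, Shaoxuan Xu, Wenjie Du, Cheng Liang, Weijun Wang, Yuanchao Li, Guangyao Li, Hao Fei, Yuanchun Li, Henghui Ding†, Yunxin Liu. Affiliations: the paper does not list every author's affiliation explicitly, but the project website is fudancvl.github.io, which suggests a possible link to Fudan University's Vision and Learning Lab. ...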
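On every card on this page, the total equals the sum of the eight sub-scores. For AVI-Bench, 1.7 + 1.2 + 1.4 + 1 + 1.3 + 0.5 + 0.5 + 1.2 = 8.8. The eight maxima add up to 11, not 10. The one listed paper whose sub-scores exceed 10 is the 2026-06-09 🥇 entry: its sub-scores sum to 10.5, and it is shown as 10.0. That pattern suggests the total is capped at 10. The digest does not document this rule, so the sketch below is a reconstruction from the listed numbers only. The key names (`novelty`, `rigor`, ...) and `total_score` are hypothetical.

```python
# Hypothetical reconstruction of the digest's scoring, inferred from the listed cards.
# Per-dimension maxima as printed on each card ("x/2", "x/1.5", ...).
MAX_SUBSCORES = {
    "novelty": 2.0,
    "rigor": 1.5,
    "experiments": 1.5,
    "clarity": 1.0,
    "impact": 1.5,
    "open_source": 1.5,
    "reproducibility": 0.5,
    "engineering": 1.5,
}  # these maxima sum to 11.0, which is why a cap at 10 is assumed below


def total_score(subscores: dict[str, float]) -> float:
    """Sum the eight sub-scores and clip to the 10-point scale (assumed rule)."""
    if set(subscores) != set(MAX_SUBSCORES):
        raise ValueError(f"expected exactly these keys: {sorted(MAX_SUBSCORES)}")
    for name, value in subscores.items():
        if not 0.0 <= value <= MAX_SUBSCORES[name]:
            raise ValueError(f"{name}={value} is outside [0, {MAX_SUBSCORES[name]}]")
    # Round to one decimal, matching how totals are printed, then cap at 10.
    return min(round(sum(subscores.values()), 1), 10.0)


avi_bench = {
    "novelty": 1.7, "rigor": 1.2, "experiments": 1.4, "clarity": 1.0,
    "impact": 1.3, "open_source": 0.5, "reproducibility": 0.5, "engineering": 1.2,
}
print(total_score(avi_bench))  # 8.8
```

Rounding happens before the cap so that floating-point sums such as 8.799999... print the same way the cards do.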

Can LLMs understand LilyPond? A benchmark for symbolic music generation and understanding

📄 Can LLMs understand LilyPond? A benchmark for symbolic music generation and understanding #MusicGeneration #MusicUnderstanding #Benchmarking #LLM 7/10 | Novelty 1.5/2 | Rigor 1/1.5 | Experiments 1/1.5 | Clarity 1/1 | Impact 0.5/1.5 | Open source 1/1.5 | Reproducibility 0.5/0.5 | Engineering 0.5/1.5 ✅ 7/10 | Top 50% | #MusicGeneration | #MusicUnderstanding | #Benchmarking #LLM | arxiv
👥 Authors and affiliations. Authors: Matteo Spanio, Mohammad Torabi, Andrea Poltronieri, Antonio Rodà. Main affiliations: Centro di Sonologia Computazionale, University of Padova, Italy; Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain. ...

OpenBibleTTS: Large-Scale Speech Resources and TTS Models for Low-Resource Languages

📄 OpenBibleTTS: Large-Scale Speech Resources and TTS Models for Low-Resource Languages #SpeechSynthesis #LowResource #Dataset #ModelEvaluation #FlowMatching #SpeechGeneration #Benchmarking 8/10 | Novelty 1.4/2 | Rigor 1.2/1.5 | Experiments 1.3/1.5 | Clarity 1/1 | Impact 0.8/1.5 | Open source 1/1.5 | Reproducibility 0.5/0.5 | Engineering 0.8/1.5 🔥 8/10 | Top 25% | #SpeechSynthesis | #LowResource | #Dataset #ModelEvaluation | arxiv
👥 Authors and affiliations. Authors: David Guzmán (1,2), Luel Hagos Beyene (3,4), Jesujoba Oluwadara Alabi (5), Yejin Jeon (1,2), Dietrich Klakow (5), David Ifeoluwa Adelani (1,2,6). Affiliations: (1) McGill University; (2) Mila - Quebec AI Institute; (3) AIMS Research and Innovation Centre; (4) NM-AIST; (5) Saarland University; (6) Canada CIFAR AI Chair ...

Speech/Music/Audio Paper Digest 2026-06-09

48 papers analyzed
⚡ Today's overview: 📥 48 papers fetched → 🔬 in-depth analysis complete
🏷️ Hot topics (papers per topic)
#SpeechSynthesis 10 papers ██████████
#SpeechRecognition 9 papers █████████
#SelfSupervisedLearning 3 papers ███
#MultimodalModels 3 papers ███
#SpeechEnhancement 2 papers ██
#AudioGeneration 2 papers ██
#SpeakerVerification 2 papers ██
#LLM 1 paper █
📊 Paper score leaderboard (48 papers, sorted by total score, descending)
Rank | Paper | Total | Tier | Primary task
🥇 A Finetuned SpeechLLM for Joint Multi-Granular L2 Asses | 10.0 | Top 25% | #LLM
🥈 G-MaP-SE: Guided Speech Enhancement via GMM-Based Prior | 9.3 | Top 50% | #SpeechEnhancement
🥉 HoliDubber: Holistic Video Dubbing for Complex Acoustic | 9.0 | Top 10% | #SpeechSynthesis
4. Probing Token Spaces under Generator Shift in AI-Genera | 9.0 | Top 10% | #AudioCoding
5. A Comparative Study of Pre-trained Speech Encoders and | 8.9 | Top 50% | #SelfSupervisedLearning
6. AVI-Bench: Toward Human-like Audio-Visual Intelligence | 8.8 | Top 25% | #SpeechRecognition
7. Liberating LLM Capabilities in Full-Duplex Speech Model | 8.7 | Top 25% | #MultimodalModels
8. MeCo: One-Step MeanFlow-based Corrector for Multi-Chann | 8.4 | Top 25% | #SpeechSeparation
9. Your U-Net Dereverberation Model is Secretly an RIR Enc | 8.3 | Top 50% | #ContrastiveLearning
10. Predictive Fixed-Filter Active Noise Control (PFANC) Us | 8.3 | Top 25% | -
11. TLDR: Compressing Audio Tokens for Efficient Autoregres | 8.2 | Top 25% | #SpeechSynthesis
12. Subtitle-Aligned Fine-Tuning of Whisper for Swiss Germa | 8.2 | Top 25% | #SpeechRecognition
13. Discovering Functionally Selective Brain Regions with a | 8.2 | Top 25% | #MultimodalModels
14. Parameter-Efficient Continual Learning for Automatic Sp | 8.1 | Top 25% | #SpeechRecognition
15. OmniMem: Perturbation-aware Memory Compression for Stre | 8.0 | Top 25% | #EfficientInference
16. OpenBibleTTS: Large-Scale Speech Resources and TTS Mode | 8.0 | Top 25% | #SpeechSynthesis
17. FlashTTS: Fast Streaming TTS with MTP Acceleration and | 7.9 | Top 25% | #SpeechSynthesis
18. Multi-View Speech Representation Learning for Parkinson | 7.9 | Top 50% | #SelfSupervisedLearning
19. Is Text All You Need? Text as a Universal Information B | 7.6 | Top 50% | #SpeechRecognition
20. End-to-End Training for Discrete Token LLM based TTS Sy | 7.6 | Top 50% | #SpeechSynthesis
21. Conan-embedding-v3: Fusing Modality-Specific Models for | 7.6 | Top 25% | #AudioRetrieval
22. Cross-Modal Masking for Robust Silent Speech Synthesis | 7.5 | Top 50% | #SpeechSynthesis
23. Rethinking Depth: A study of the Recursive-Transformer | 7.5 | Top 25% | #SpeechRecognition
24. What Makes Synthetic Speech Sound Sarcastic? A Prosody- | 7.5 | Top 25% | #SpeechSynthesis
25. FXplorer: A Map-Based Interface for Exploratory Audio E | 7.5 | Top 25% | #AudioGeneration
26. Assessing the Energy and Carbon Emissions of Neural Spe | 7.4 | Top 50% | #SpeakerVerification
27. Exploring the Scale and Diversity of Speech Anti-spoofi | 7.4 | Top 50% | #DataAugmentation
28. From A to B to A: Palindromic Zero-Shot Voice Conversio | 7.3 | Top 50% | -
29. A study on the impact of region specific data on the pe | 7.2 | Top 50% | #SpeechRecognition
30. Speaker-Invariant Representation Learning for Spoofing | 7.1 | Top 25% | #AdversarialTraining
31. BareWave: Waveform-Native Flow-Matching Text-to-Speech | 7.0 | Top 50% | #SpeechSynthesis
32. SMC-ITA: Sequential Monte Carlo Inference-Time Alignmen | 7.0 | Top 50% | #AudioGeneration
33. Quality-Diversity Search in Sound Generation: Investiga | 7.0 | Top 50% | -
34. Can LLMs understand LilyPond? A benchmark for symbolic | 7.0 | Top 50% | #MusicGeneration
35. NüshuVoice: Reviving the Voice of Endangered Nüshu with | 7.0 | Top 50% | #SpeechSynthesis
36. Factors affecting ASR performance: A study using state | 6.9 | Top 50% | #SpeechRecognition
37. MeanVC 2: Robust Low-Latency Streaming Zero-Shot Voice | 6.9 | Top 50% | #VoiceConversion
38. Few-shot Class-variable Incremental Audio Classificatio | 6.9 | Top 50% | #AudioClassification
39. A Hierarchical Feature Engineering Framework for Automa | 6.8 | Top 50% | -
40. Fast and Robust On-Device Speaker Diarization: Relative | 6.6 | Top 50% | #SpeakerDiarization
41. On Low-Bit Quantization Errors in Speaker Verification: | 6.6 | Top 50% | #SpeakerVerification
42. Paediatric-HGNN: A Hybrid Heterogeneous Graph Neural Ne | 6.5 | Bottom 50% | #SpeechSynthesis
43. TinyGiantALM: A Compact Audio-Language Model for Intent | 6.4 | Top 50% | #MultimodalModels
44. Overcoming Decoder Inconsistencies in Whisper for Dravi | 6.2 | Bottom 50% | #SpeechRecognition
45. Bridging Traditional Explainability Methods and Multimo | 5.4 | Bottom 50% | #SpeechRecognition
46. Sound Field Interpolation Using Physics-Informed Extrem | 5.3 | Bottom 50% | #SpeechEnhancement
47. A Comparison of SSL-Based Feature Extractors and Back-E | 5.0 | Bottom 50% | #SelfSupervisedLearning
48. AeroSpectra Sentinel: An Auditable LLM Prompt-Chaining | 4.5 | Bottom 50% | #AudioEventDetection
📋 Paper list
🥇 A Finetuned SpeechLLM for Joint Multi-Granular L2 Assessment and Natural-Language Rationales 10.0/10 | Novelty 2.0/2 | Rigor 1.5/1.5 | Experiments 1.5/1.5 | Clarity 1/1 | Impact 1.5/1.5 | Open source 1.0/1.5 | Reproducibility 0.5/0.5 | Engineering 1.5/1.5 ...
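The "hot topics" bars seem to be derived from the leaderboard's primary-task column. Each paper adds one █, entries marked "-" are skipped, the eight most frequent tags are shown, and tied tags are listed in the order they first appear in the leaderboard. This reading reproduces all three daily histograms on this page. It is still inferred from those histograms; the digest does not document it. Python's `collections.Counter.most_common` keeps tied counts in first-encountered order, so it matches the tie behavior directly. `tag_histogram` is a hypothetical helper, not the site's code.

```python
from collections import Counter


def tag_histogram(primary_tasks: list[str], top: int = 8) -> list[str]:
    """Render a text bar chart of primary-task tags, one '█' per paper.

    Hypothetical reconstruction: entries marked "-" (no primary task) are skipped,
    and Counter.most_common keeps tied tags in first-encountered (rank) order.
    """
    counts = Counter(tag for tag in primary_tasks if tag != "-")
    lines = []
    for tag, n in counts.most_common(top):
        unit = "paper" if n == 1 else "papers"
        lines.append(f"{tag} {n} {unit} {'█' * n}")
    return lines


# Primary-task column of a short, made-up leaderboard, in rank order.
sample = ["#LLM", "#SpeechSynthesis", "-", "#SpeechRecognition", "#SpeechSynthesis"]
for line in tag_histogram(sample, top=3):
    print(line)
# #SpeechSynthesis 2 papers ██
# #LLM 1 paper █
# #SpeechRecognition 1 paper █
```

Passing the 38 primary tasks of the 2026-06-08 leaderboard in rank order yields the same eight bars that day's digest shows.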

Assessing True Generalisability of Audio-Visual Speech Recognisers

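📄 Assessing True Generalisability of Audio-Visual Speech Recognisers #SpeechRecognition #SelfSupervisedLearning #MultimodalModels #Benchmarking 9.5/10 | Novelty 1.6/2 | Rigor 1.3/1.5 | Experiments 1.4/1.5 | Clarity 1/1 | Impact 1.4/1.5 | Open source 1.1/1.5 | Reproducibility 0.5/0.5 | Engineering 1.2/1.5 🔥 9.5/10 | Top 10% | #SpeechRecognition | #SelfSupervisedLearning | #MultimodalModels #Benchmarking | arxiv
👥 Authors and affiliations. Authors: Zhaofeng Lin, Stavros Petridis, Maja Pantic, Naomi Harte. Affiliations: (1) Trinity College Dublin, Ireland; (2) Imperial College London, UK
💡 Snarky take: This paper does not invent a "better" AVSR model at all. Instead, it ruthlessly exposes the bubble of collective self-congratulation in the current AVSR field. It is essentially a benchmarking paper, yet it matters more than many model-innovation papers because it punctures the illusory progress created by the LRS3 benchmark. Its core contribution is to tear down rather than to build, and that is exactly what the field needs to stay healthy. Using near-obsessive rigor (constructing MV2LRS3, a set strictly distribution-matched to LRS3), the authors reach an embarrassing conclusion: the AVSR models we are so proud of perform dismally once they leave the carefully tended LRS3 greenhouse. Multimodal fusion not only fails to help, it becomes a drag. Most ironically, the paper is titled "Assessing True Generalisability", yet its conclusions suggest that the very notion of "generalization" may be overextended and misused in current AVSR research. Every researcher working to improve AVSR performance should keep this paper on their desk and use it to examine what their work actually achieves, rather than simply chasing the LRS3 leaderboard. ...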

MMAE: A Massive Multitask Audio Editing Benchmark

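📄 MMAE: A Massive Multitask Audio Editing Benchmark #SpeechEditing #MultitaskLearning #Benchmarking 7.5/10 | Novelty 1.5/2 | Rigor 1/1.5 | Experiments 1/1.5 | Clarity 1/1 | Impact 1/1.5 | Open source 0.5/1.5 | Reproducibility 0.5/0.5 | Engineering 1/1.5 ✅ 7.5/10 | Top 50% | #SpeechEditing | #MultitaskLearning | #Benchmarking | arxiv
👥 Authors and affiliations. The paper has more than 30 authors, including Ziyang Ma, Ruiqi Yan, and Ruiyang Xu. It does not state every author's affiliation.
💡 Snarky take: The positioning of this work is somewhat awkward. As a "Benchmark" paper it calls itself "the first comprehensive evaluation testbed", but it has several problems:
1. It proposes only a dataset and an evaluation framework, with no new model or algorithm. It is essentially a resource paper rather than a methods paper, which usually puts it at a competitive disadvantage at top venues.
2. Judging from the reported results (EMR < 5%, and 0% on complex tasks), it reads more like a diagnostic report that hands existing models a "death sentence" without offering any cure.
3. Its main value lies in defining the problem space (7 modalities, 6 complexity levels, 8 operation types) and the evaluation criteria. Whether this taxonomy offers enough insight and generality is questionable; it looks more like a large engineering checklist than a deep scientific finding.
4. The paper claims to fix the lag in evaluation infrastructure, yet the abstract says nothing about the design, validation, or effectiveness of its evaluation framework (a rubric decomposed into 17,741 criteria), which casts doubt on the reliability of its core contribution.
5. The "human-machine collaborative" dataset construction process is not described. Did humans annotate the data, or did a model generate it for humans to proofread? This directly affects data quality.
Overall, this is a tidy but mediocre resource paper that lacks the theoretical or technical spark needed to impress top-venue reviewers.
📌 Core summary: MMAE is a large-scale multitask benchmark for general instruction-based audio editing. It addresses the fragmentation of current audio-editing evaluation, which is limited to simple tasks and specific subdomains. The benchmark contains 2,000 high-fidelity samples spanning 7 audio modalities and defines a 6-level task-complexity taxonomy ranging from basic modifications to multi-turn reasoning. Its core innovation is a rubric-based evaluation framework that decomposes open-ended tasks into 17,741 verifiable criteria to assess instruction following and contextual consistency precisely. Evaluating existing models shows that current systems perform very poorly at precise editing, exposing a significant performance bottleneck.
🔗 Open-source details: Code: the paper claims to release a Python-based evaluation framework but gives no repository link (such as a GitHub URL). Model weights: none mentioned; the paper introduces a benchmark, not a newly trained model. Dataset: the paper says it contains 2,000 samples but gives no download link or hosting page (such as HuggingFace or ModelScope). Demo: no online demo link is mentioned. Reproduction materials: not mentioned. Open-source projects cited: the paper mentions "Nano-banana 2" and "Gemini-Omni" as examples of related work but gives no links or full names for them.
🏗️ Method overview and architecture: MMAE is not an algorithmic model but a benchmark for evaluating audio-editing models. Its core design has two tightly coupled parts: a taxonomy of tasks and data, and a matching automated evaluation framework. ...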
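The review does not give MMAE's formula for turning rubric checks into scores. As a generic illustration of rubric-based evaluation only, and not MMAE's actual framework, the sketch below rolls per-criterion pass/fail verdicts up in two ways. The first is the fraction of all criteria satisfied. The second, stricter measure is the fraction of tasks where every criterion is satisfied. A strict all-criteria rate of this kind can sit near zero even when many individual criteria pass. The review does not say whether MMAE's EMR is defined this way. `RubricTask` and `rubric_scores` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class RubricTask:
    """One editing task and the pass/fail verdict for each of its rubric criteria."""
    task_id: str
    verdicts: list[bool]


def rubric_scores(tasks: list[RubricTask]) -> tuple[float, float]:
    """Return (fraction of criteria passed, fraction of tasks passing every criterion)."""
    if not tasks or any(not t.verdicts for t in tasks):
        raise ValueError("every task needs at least one criterion")
    checks = [ok for t in tasks for ok in t.verdicts]
    criterion_rate = sum(checks) / len(checks)
    strict_task_rate = sum(all(t.verdicts) for t in tasks) / len(tasks)
    return criterion_rate, strict_task_rate


tasks = [
    RubricTask("simple-edit", [True, True]),
    RubricTask("multi-turn-edit", [True, False, False]),
]
print(rubric_scores(tasks))  # (0.6, 0.5)
```

Reporting both numbers separates partially correct edits from fully correct ones. Three of the five criteria pass here, but only one of the two tasks is fully correct.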

Speech/Music/Audio Paper Digest 2026-06-08

38 papers analyzed
⚡ Today's overview: 📥 38 papers fetched → 🔬 in-depth analysis complete
🏷️ Hot topics (papers per topic)
#SpeechSynthesis 7 papers ███████
#SpeechRecognition 6 papers ██████
#AudioGeneration 3 papers ███
#DataAugmentation 3 papers ███
#MultimodalModels 3 papers ███
#SpeechEmotionRecognition 2 papers ██
#MusicGeneration 2 papers ██
#MusicInformationRetrieval 1 paper █
📊 Paper score leaderboard (38 papers, sorted by total score, descending)
Rank | Paper | Total | Tier | Primary task
🥇 Audio-Oscar: A Multi-Agent System for Complex Audio Sce | 9.9 | Top 10% | #AudioGeneration
🥈 Assessing True Generalisability of Audio-Visual Speech | 9.5 | Top 10% | #SpeechRecognition
🥉 VoxCPM2 Technical Report | 9.5 | Top 50% | #SpeechSynthesis
4. Beyond Semantic Dominance: Cognitive Affective Reasonin | 9.2 | Top 10% | #SpeechSynthesis
5. Hearing the Unspoken: Language Model Priors for Acousti | 9.2 | Top 25% | #SpeechRecognition
6. dots.tts Technical Report | 9.0 | Top 25% | #SpeechSynthesis
7. How Far Can Chord-Symbol Time-Series Adaptation Carry G | 8.8 | Top 50% | #MusicInformationRetrieval
8. Where Rectified Flows Leak: Characterising Membership S | 8.7 | Top 25% | #AudioGeneration
9. BiEAR: A Human Auditory-Inspired Adaptive Binaural Fron | 8.5 | Top 25% | #SoundSourceLocalization
10. Mitigating Proxy-to-Wild Domain Gap in Deepfake Speech | 8.4 | Top 25% | #DataAugmentation
11. Multilingual Multi-Speaker Unit Vocoders: A Systematic | 8.4 | Top 25% | #SpeechSynthesis
12. Geometric Second-Order Feature Correlation Learning for | 7.9 | Top 50% | #SpeechEmotionRecognition
13. Whisper Hallucination Detection and Mitigation via Hidd | 7.9 | Top 50% | #SpeechRecognition
14. Acoustic Cue Alignment in Audio Language Models for Spe | 7.8 | Top 50% | #SpeechEmotionRecognition
15. Towards Unified Song Generation and Singing Voice Conve | 7.7 | Top 25% | #SpeechSynthesis
16. Phonetic Error Analysis of Raw Waveform Acoustic Models | 7.6 | Top 50% | #SpeechRecognition
17. SEAM: Shortcut-Aware Real-Time Detection of Scripted vs | 7.5 | Top 25% | #SpeechEnhancement
18. DirectAudioEdit: Inversion-Free Text-Guided Audio Editi | 7.5 | Top 25% | #DiffusionModels
19. MMAE: A Massive Multitask Audio Editing Benchmark | 7.5 | Top 50% | #SpeechEditing
20. Leveraging Soft Distributions of SSL-Derived Discrete S | 7.4 | Top 50% | #SpeechRecognition
21. MyGardenBird: A Machine-Learning-Ready Bird Sound Datas | 7.2 | Top 50% | #AudioEventDetection
22. FIGMA: Towards FIne-Grained Music retrievAl | 7.2 | Top 50% | #ContrastiveLearning
23. KIT's Submission to Cross-Lingual Voice Cloning in | 7.2 | Top 50% | #SpeechSynthesis
24. Contrastive Training with LLM-generated Near-Misses for | 7.1 | Top 50% | #SpeechRecognition
25. A Large-Scale Per-Speaker Analysis of Re-identification | 7.1 | Top 50% | #SpeechAnonymization
26. SVHighlights: Towards Extremely Long Sport Video Highli | 7.0 | Top 50% | #MultimodalModels
27. TargetSEC: Plug-and-Play In-the-Wild Speech Emotion Con | 6.8 | Top 50% | #VoiceConversion
28. Making the Most of Limited Data: Score-Aware Training f | 6.7 | Top 50% | #MusicGeneration
29. IRAF: Interference-Resilient Adaptive Fusion for Noise- | 6.5 | Top 50% | #SpokenDialogueSystems
30. Towards Event-Robust Acoustic Scene Classification | 6.5 | Top 50% | #DataAugmentation
31. FSC-Net: Integrating Fast Fourier Convolutions and Prog | 6.4 | Top 50% | #AudioQualityAssessment
32. Watch, Remember, Reason: Human-View Video Understanding | 6.4 | Top 50% | #MultimodalModels
33. Hierarchical Semantic-Constrained Heterogeneous Graph f | 6.2 | Top 50% | #MultimodalModels
34. Audio Imitator: Controlling Timbre and Tempo in Video2A | 6.0 | Top 50% | #AudioGeneration
35. HybridCodec: Fast Dual-Stream, Semantically Enhanced Ne | 5.7 | Top 50% | #SpeechSynthesis
36. SpectCount: Spectrotemporal Counting via Synthetic Sign | 5.5 | Top 50% | #DataAugmentation
37. Entropy as a Structural Prior: How a Log-Barrier on DiT | 4.2 | Bottom 50% | #MusicGeneration
38. VISA: A Visual Information Strengthened Audio-Reasoning | 3.9 | Top 50% | #AudioQA
📋 Paper list
🥇 Audio-Oscar: A Multi-Agent System for Complex Audio Scene Generation, Orchestration, and Refinement 9.9/10 | Novelty 1.6/2 | Rigor 1.3/1.5 | Experiments 1.2/1.5 | Clarity 1/1 | Impact 1.4/1.5 | Open source 1.5/1.5 | Reproducibility 0.5/0.5 | Engineering 1.4/1.5 ...

SuperMemory-VQA: An Egocentric Visual Question-Answering Benchmark for Long-Horizon Memory

📄 SuperMemory-VQA: An Egocentric Visual Question-Answering Benchmark for Long-Horizon Memory #Benchmarking #Dataset 8.4/10 | Novelty 1.4/2 | Rigor 1.2/1.5 | Experiments 1.2/1.5 | Clarity 1/1 | Impact 0.8/1.5 | Open source 1.3/1.5 | Reproducibility 0.5/0.5 | Engineering 1/1.5 🔥 8.4/10 | Top 25% | #Benchmarking | #Dataset | arxiv
👥 Authors and affiliations. Authors: Samiul Alam, Shakhrul Iman Siam, Michael J. Proulx, James Fort, Richard Newcombe, Hyo Jin Kim, Mi Zhang. Affiliations: The Ohio State University, Meta ...

Speech/Music/Audio Paper Digest 2026-06-05

47 papers analyzed
⚡ Today's overview: 📥 47 papers fetched → 🔬 in-depth analysis complete
🏷️ Hot topics (papers per topic)
#SpeechRecognition 11 papers ███████████
#SpeechSynthesis 6 papers ██████
#SpeechEmotionRecognition 3 papers ███
#LLM 2 papers ██
#SpeechEnhancement 2 papers ██
#SpeakerRecognition 2 papers ██
#StreamingProcessing 1 paper █
#AudioCoding 1 paper █
📊 Paper score leaderboard (47 papers, sorted by total score, descending)
Rank | Paper | Total | Tier | Primary task
🥇 Audio Interaction Model | 9.8 | Top 50% | #StreamingProcessing
🥈 USAD 2.0: Scaling Representation Distillation for Unive | 9.0 | Top 25% | #AudioCoding
🥉 M2S-AVSR: Modality-aware Multi-view Self-supervised Rep | 9.0 | Top 25% | #SpeechRecognition
4. Vortex: Efficient and Programmable Sparse Attention Ser | 8.9 | Top 25% | #LLM
5. UniVoice: A Unified Model for Speech and Singing Voice | 8.7 | Top 25% | #SpeechSynthesis
6. Ouvia: A User-centered Framework for Measuring Usabilit | 8.6 | Top 25% | #SpeechTranslation
7. Age-Aware Adapter Tuning for Children's Speech Reco | 8.4 | Top 25% | #SpeechRecognition
8. MCBench: A Multicontext Safety Assessment Benchmark for | 8.4 | Bottom 50% | #SpeechRecognition
9. SuperMemory-VQA: An Egocentric Visual Question-Answerin | 8.4 | Top 25% | #Benchmarking
10. GLASS: GRPO-Trained LoRA for Acoustic Style Steering in | 8.2 | Top 25% | #SpeechSynthesis
11. A Model of Multi-turn Human Persuadability Using Probab | 8.2 | Top 50% | -
12. Learning Emotion-discriminative Representations for Zer | 8.1 | Top 25% | #SpeechEmotionRecognition
13. FORTE: FOL-guided Optimal Refinement for Text-audio rEt | 8.1 | Top 25% | #ParameterEfficientFineTuning
14. FiLM-Based Speaker Conditioning of a SpeechLLM for Path | 8.0 | Top 50% | #SpeechRecognition
15. Task-Vector Arithmetic for Emotional Expressivity Contr | 7.9 | Top 25% | #SpeechSynthesis
16. An Ultra-Low-Bitrate Neural Speech Codec with Plain-to- | 7.7 | Top 25% | #SpeechSynthesis
17. Exploring LLMs for South Asian Music Understanding and | 7.7 | Top 50% | #MusicGeneration
18. SB-RF: Schrödinger Bridge Rectified Flow for One-Step R | 7.6 | Top 25% | #SpeechEnhancement
19. nnAudio 2: Overcoming Dynamic Compilation Barriers and | 7.5 | Top 50% | #OpenSourceTools
20. Beyond Waveform Robustness: Robust Feature-Vocoder Adve | 7.5 | Top 25% | #SpeechRecognition
21. FoeGlass: Simple In-Context Learning Is Enough for Red | 7.5 | Top 25% | #AudioGeneration
22. ProSarc: Prosody-Aware Sarcasm Recognition Framework vi | 7.5 | Top 25% | #SpeechEmotionRecognition
23. Probing Spatial Structure in Pretrained Audio Represent | 7.4 | Top 25% | -
24. Forgive or forget: Understanding the context of hate in | 7.4 | Top 50% | #AudioRetrieval
25. SpeechJBB: Probing Safety Alignment and Comprehension i | 7.3 | Top 25% | #SpeechRecognition
26. VoCodec: A Low-bitrate Streamable Neural Speech Codec w | 7.2 | Top 50% | #SpeechCoding
27. F3-Tokenizer: Taming Audio Autoencoder Latents for Unde | 7.2 | Top 25% | #SpeechSynthesis
28. Beyond WER: A Paired Acoustic Stress Test for Ambient C | 7.1 | Top 50% | #SpeechRecognition
29. InfoShield: Privacy-Preserving Speech Representations f | 7.1 | Top 50% | -
30. Multi-task Learning is Not Enough: Representational Ent | 6.9 | Top 50% | #SpeechRecognition
31. Sound Effects Dataset Unification With the Universal Ca | 6.9 | Top 50% | #AudioClassification
32. To Be Multimodal or Not to Be: Query-Adaptive Audio-Vis | 6.8 | Top 50% | #SpeakerRecognition
33. SHALA-LLM: Smartly Handling Ambiguous Labels in Alignin | 6.8 | Top 50% | #SpeechEmotionRecognition
34. SagnacAssisted Enhanced OTDR for Distributed Acoustic S | 6.6 | Top 50% | #SignalProcessingFundamentals
35. Domain-Aware Mispronunciation Detection and Diagnosis U | 6.6 | Top 50% | #GraphNeuralNetworks
36. CoSTA: Cognitive-State-Conditioned TTS Data Augmentatio | 6.5 | Top 50% | #SpeechSynthesis
37. Beyond Text Following: Repairable Arbitration Reversals | 6.4 | Top 50% | #AudioQA
38. Enhancing Audio Captioning with Auxiliary AudioSet Sema | 6.3 | Top 50% | -
39. Do speech foundation models perceive speaker similarity | 6.3 | Top 50% | #SpeakerRecognition
40. Efficient Punctuation Restoration via Weighted Lookahea | 6.3 | Top 50% | #LLM
41. Automatic Labelling of Speech Translation Errors | 6.1 | Top 50% | #SpeechRecognition
42. Towards Truly Multilingual ASR: Generalizing Code-Switc | 5.9 | Top 50% | #SpeechRecognition
43. An ERP Study on Recursive Locative Processing in Mandar | 5.9 | Top 50% | -
44. Multilingual Detection of Alzheimer's Disease from | 5.7 | Bottom 50% | #TransferLearning
45. DBHN-Net: Dual-Branch Hybrid Neural Network For Low-Com | 5.4 | Top 25% | #SpeechEnhancement
46. Beyond Generative Decoding: Discriminative Hidden-State | 5.3 | Top 50% | #MultimodalModels
47. Revisiting Lexicon Evaluation in Unsupervised Word Disc | 1.0 | Top 25% | #SpeechRecognition
📋 Paper list
🥇 Audio Interaction Model 9.8/10 | Novelty 1.5/2 | Rigor 1.3/1.5 | Experiments 1.4/1.5 | Clarity 1.0/1 | Impact 1.5/1.5 | Open source 1.1/1.5 | Reproducibility 0.5/0.5 | Engineering 1.5/1.5 ...

DetectZoo: A Unified Toolkit for AI-Generated Content Detection Across Text, Audio, and Image Modalities

📄 DetectZoo: A Unified Toolkit for AI-Generated Content Detection Across Text, Audio, and Image Modalities #MultimodalModels #SelfSupervisedLearning #Dataset #Benchmarking 9.3/10 | Novelty 1.5/2 | Rigor 1.2/1.5 | Experiments 1.5/1.5 | Clarity 1/1 | Impact 0.8/1.5 | Open source 1.5/1.5 | Reproducibility 0.5/0.5 | Engineering 1.3/1.5 🔥 9.3/10 | Top 25% | #MultimodalModels | #SelfSupervisedLearning | #Dataset #Benchmarking | arxiv
👥 Authors and affiliations. Authors: Sajad Ebrahimi, Nima Jamali, Bardia Shirsalimian, Kelly McConvey, Wentao Zhang, Jalehsadat Mahdavimoghaddam, Maksym Taranukhin, Maura Grossman, Vered Shwartz, Yuntian Deng, Ebrahim Bagheri. Affiliations: University of Toronto, University of Waterloo, Toronto Metropolitan University, University of British Columbia, Vector Institute ...