公平性 | 语音/音乐/音频论文速递

Responsible Benchmarking of Fairness for Automatic Speech Recognition

📄 Responsible Benchmarking of Fairness for Automatic Speech Recognition #语音识别 #基准测试 #公平性 #模型评估 #方法论 📝 5.0/10 | 前50% | #语音识别 | #基准测试 | #公平性 #模型评估 | arxiv 学术质量 5.0/8 | 影响力 0.6/2 | 可复现性 0.3/1 | 置信度高 👥 作者与机构第一作者：Felix Herron (Université Paris Dauphine-PSL, MILES Team, LAMSADE；Université Grenoble Alpes, GETALP Team, LIG) 通讯作者：未说明作者列表：Felix Herron (Université Paris Dauphine-PSL, Université Grenoble Alpes)、Ange Richard (Université Grenoble Alpes, PACTE)、François Portet (Université Grenoble Alpes)、Alexandre Allauzen (Université Paris Dauphine-PSL)、Solange Rossato (Université Grenoble Alpes, PACTE)。注：原文脚注指出 Ange Richard, François Portet, Solange Rossato 对框架中“说话人组的交叉性”和“多变量说话人组”的形成有贡献。 💡 毒舌点评本文旨在为ASR公平性评估提供一套“负责任”的方法论最佳实践。其核心价值在于系统性地整合了机器学习公平性、社会科学和语音科学领域的建议，并针对ASR场景（如说话人而非话语作为统计单元）进行了适配。案例研究部分通过对比分析（如忽略与控制交叉变量），直观地展示了方法论选择如何颠覆结论，具有警示意义。然而，作为一篇方法论文章，其主要贡献停留在“指出问题”和“提出建议”，缺乏一个经过严格验证、可直接复现的工具包或评估协议。此外，其提出的最佳实践框架本身的有效性，仅通过一个数据集（Fair-speech）的案例进行展示，普适性存疑。 ...

Towards Trustworthy Audio Deepfake Detection: A Systematic Framework for Diagnosing and Mitigating Gender Bias

📄 Towards Trustworthy Audio Deepfake Detection: A Systematic Framework for Diagnosing and Mitigating Gender Bias #音频深度伪造检测 #公平性 #语音伪造检测 #模型评估 #偏差诊断 #缓解策略 ✅ 6.5/10 | 前25% | #音频深度伪造检测 | #公平性 | #语音伪造检测 #模型评估 | arxiv 学术质量 6.5/8 | 影响力 1.8/2 | 可复现性 0.4/1 | 置信度高 👥 作者与机构第一作者：Aishwarya Fursule (School of Computing, Wichita State University, Wichita, KS, USA) 通讯作者：Anderson R. Avila (Institut national de la recherche scientifique (INRS-EMT), Montreal, QC, Canada; INRS-UQO Mixed Research Unit on Cybersecurity, Gatineau, Canada) 作者列表：Aishwarya Fursule (Wichita State University), Shruti Kshirsagar (Wichita State University), Anderson R. Avila (INRS-EMT & INRS-UQO) 📌 核心摘要要解决什么问题：音频深度伪造检测系统存在性别公平性问题，但偏差的根源未知，且缓解方法零散、未经系统性比较。论文旨在提出一个系统框架，在应用缓解策略前先精确定位偏差来源。 ...

语音/音乐/音频论文速递 2026-05-12

语音/音乐/音频论文速递 2026-05-12 共分析 39 篇论文 ⚡ 今日概览 📥 抓取 39 篇 → 🔬 深度分析完成 🏷️ 热门方向方向数量分布 #语音识别 3篇 ███ #音乐生成 2篇 ██ #语音合成 2篇 ██ #语音增强 2篇 ██ #音频深度伪造检测 2篇 ██ #基准测试 2篇 ██ #语音质量评估 1篇 █ #音频编码 1篇 █ 📊 论文评分排行榜（39 篇，按分数降序）排名论文评分分档主任务 🥇 Polyphonia: Zero-Shot Timbre Transfer in Polyphonic Mus 7.5分前30% #音乐生成 🥈 PoDAR: Power-Disentangled Audio Representation for Gene 7.3分前25% #语音合成 🥉 Evaluating the Expressive Appropriateness of Speech in 7.2分前25% #语音质量评估 4. Reducing Linguistic Hallucination in LM-Based Speech En 7.2分前25% #语音增强 5. Encoding and Decoding Temporal Signals with Spiking Ban 7.0分前25% #音频编码 6. Mitigating Multimodal Inconsistency via Cognitive Dual- 7.0分前50% #意图识别 7. SF-Flow: Sound field magnitude estimation via flow matc 6.8分前25% #空间音频 8. Probing Cross-modal Information Hubs in Audio-Visual LL 6.5分前25% #模型分析 9. Towards Trustworthy Audio Deepfake Detection: A Systema 6.5分前25% #音频深度伪造检测 10. Unison: Harmonizing Motion, Speech, and Sound for Human 6.5分前30% #音视频生成 11. CORTEG: Foundation Models Enable Cross-Modality Represe 6.5分前25% #脑机接口 12. Omni-Persona: Systematic Benchmarking and Improving Omn 6.5分前25% #基准测试 13. DiffVQE: Hybrid Diffusion Voice Quality Enhancement Und 6.2分前30% #语音增强 14. A Cold Diffusion Approach for Percussive Dereverberatio 6.2分前35% #音频修复 15. APEX: Audio Prototype EXplanations for Classification T 6.2分前25% #音频分类 16. How Should LLMs Listen While Speaking? A Study of User- 6.0分前25% #语音对话系统 17. RADAR Challenge 2026: Robust Audio Deepfake Recognition 6.0分前50% #音频深度伪造检测 18. ShipEcho – An Interactive Tool for Global Mapping of U 6.0分前25% #水下声学 19. Rethinking Entropy Minimization in Test-Time Adaptation 6.0分前40% #语音识别 20. Separate First, Fuse Later: Mitigating Cross-Modal Inte 6.0分前50% #音视频问答 21. ChladniSonify: A Visual-Acoustic Mapping Method for Chl 6.0分前50% #音频生成 22. Omni-DeepSearch: A Benchmark for Audio-Driven Omni-Moda 6.0分前25% #基准测试 23. Online Segmented Beamforming via Dynamic Programming 6.0分前25% #声源定位 24. FLARE: Full-Modality Long-Video Audiovisual Retrieval B 6.0分前25% #音频检索 25. Speech-based Psychological Crisis Assessment using LLMs 5.8分前25% #语音情感识别 26. EAR: Enhancing Uni-Modal Representations for Weakly Sup 5.8分前25% #音频事件检测 27. Kinetic-Optimal Scheduling with Moment Correction for M 5.5分前50% #语音合成 28. Dolphin-CN-Dialect: Where Chinese Dialects Matter 5.5分前50% #语音识别 29. Latent Secret Spin: Keyed Orthogonal Rotations for Blin 5.5分前50% #音频水印 30. Bangla-WhisperDiar: Fine-Tuning Whisper and PyAnnote fo 5.5分前50% #语音识别 #说话人日志 31. Remix the Timbre: Diffusion-Based Style Transfer Across 5.5分前30% #音色迁移 32. Low-Cost Detection of Degraded Voice Clones via Source- 5.3分前50% #语音伪造检测 33. Single-Microphone Audio Point Source Discriminative Loc 5.0分前50% #说话人分离 34. Responsible Benchmarking of Fairness for Automatic Spee 5.0分前50% #语音识别 35. Sub-JEPA: Subspace Gaussian Regularization for Stable E 5.0分前50% #世界模型 36. AllocMV: Optimal Resource Allocation for Music Video Ge 4.8分前50% #音乐视频生成 37. Multi-layer attentive probing improves transfer of audi 4.0分中等偏上 #生物声学 #音频分类 38. Drum Synthesis from Expressive Drum Grids via Neural Au 4.0分前50% #音乐生成 39. Voice Biomarkers for Depression and Anxiety 1.0分后50% #语音生物标志物 📋 论文列表 🥇 Polyphonia: Zero-Shot Timbre Transfer in Polyphonic Music with Acoustic-Informed Attention Calibration ✅ 7.5/10 | 前30% | #音乐生成 | #扩散模型 | #注意力机制 #零样本 | arxiv ...

语音/音乐/音频论文速递 2026-05-03

语音/音乐/音频论文速递 2026-05-03 共分析 13 篇语音/AI 论文 🎯 任务分类点击任务标签查看该方向所有论文：音乐信息检索（2篇）语音识别（2篇）音频生成（1篇）发音错误检测（1篇）说话人识别（1篇）音乐理解（1篇）音频场景理解（1篇）语音质量评估（1篇）语音对话系统（1篇）音频问答（1篇）音频事件检测（1篇） ⚡ 今日概览 📥 抓取 13 篇 → 🔬 深度分析完成 🏷️ 热门方向方向数量分布 #音乐信息检索 2篇 ██ #语音识别 2篇 ██ #音频生成 1篇 █ #发音错误检测 1篇 █ #说话人识别 1篇 █ #音乐理解 1篇 █ #音频场景理解 1篇 █ #语音质量评估 1篇 █ 📊 论文评分排行榜（13 篇，按分数降序）排名论文评分分档主任务 🥇 UniSonate: A Unified Model for Speech, Music, and Sound 8.5分前25% #音频生成 🥈 Beyond Acoustic Sparsity and Linguistic Bias: A Prompt- 8.5分前25% #发音错误检测 🥉 DM-ASR: Diarization-aware Multi-speaker ASR with Large 8.0分前25% #说话人识别 4. Transformer-Based Rhythm Quantization of Performance MI 8.0分前25% #音乐信息检索 5. Audio Effect Estimation with DNN-Based Prediction and S 8.0分前25% #音乐理解 6. Listening with Time: Precise Temporal Awareness for Lon 8.0分前25% #音频场景理解 7. TTS-PRISM: A Perceptual Reasoning and Interpretable Spe 7.5分前25% #语音质量评估 8. Spectrographic Portamento Gradient Analysis: A Quantita 7.5分前25% #音乐信息检索 9. Advancing automatic speech recognition using feature fu 7.0分前25% #语音识别 10. Identifying and typifying demographic unfairness in pho 7.0分前50% #语音识别 11. Full-Duplex Interaction in Spoken Dialogue Systems: A C 6.5分前25% #语音对话系统 12. Audio Video Verbal Analysis (AVVA) for Capturing Classr 6.0分前50% #音频问答 13. Earable Platform with Integrated Simultaneous EEG Sensi 5.5分后50% #音频事件检测 📋 论文列表 🥇 UniSonate: A Unified Model for Speech, Music, and Sound Effect Generation with Text Instructions 🔥 8.5/10 | 前25% | #音频生成 | #流匹配 | #扩散模型 #统一音频模型 | arxiv ...

Identifying and typifying demographic unfairness in phoneme-level embeddings of self-supervised speech recognition models

📄 Identifying and typifying demographic unfairness in phoneme-level embeddings of self-supervised speech recognition models #语音识别 #自监督学习 #公平性 #模型评估 #音素 ✅ 7.0/10 | 前50% | #语音识别 | #自监督学习 | #公平性 #模型评估 | arxiv 学术质量 5.5/7 | 选题价值 1.5/2 | 复现加成 0.0 | 置信度中 👥 作者与机构第一作者：Felix Herron（MILES Team, LAMSADE, Université Paris Dauphine-PSL, France & GETALP Team, LIG, Université Grenoble Alpes, France）通讯作者：未说明（论文未明确标注，但通常为末位作者或提供邮箱者，此处作者邮箱为felix.herron@univ-grenoble-alpes.fr）作者列表： Felix Herron（Université Paris Dauphine-PSL & Université Grenoble Alpes） Solange Rossato（Université Grenoble Alpes） Alexandre Allauzen（Université Paris Dauphine-PSL） François Portet（Université Grenoble Alpes） 💡 毒舌点评亮点在于将ASR不公平性问题分解为可度量的“系统性偏差”和“随机方差”两种几何形态，为诊断模型失败模式提供了清晰的理论工具箱；然而，整篇论文更像是对现有模型的一次全面“体检报告”，指出了病灶（尤其是高方差问题）却并未开出有效的“处方”，所验证的公平性增强方法（DET/DAT）也未能触及核心，这使得研究在建设性上略显乏力。 ...

Identity Leakage Through Accent Cues in Voice Anonymisation

📄 Identity Leakage Through Accent Cues in Voice Anonymisation #语音匿名化 #隐私保护 #公平性 #口音识别 #模型评估 ✅ 7.0/10 | 前50% | #语音匿名化 | #模型评估 | #隐私保护 #公平性学术质量 5.5/7 | 选题价值 1.5/2 | 复现加成 0.0 | 置信度中 👥 作者与机构第一作者：Rayane Bakari（Orange Innovation, France; EURECOM, Sophia Antipolis, France）通讯作者：未说明作者列表：Rayane Bakari (Orange Innovation, EURECOM), Olivier Le Blouch (Orange Innovation), Nicolas Gengembre (Orange Innovation), Nicholas Evans (EURECOM), Michele Panariello (EURECOM) 💡 毒舌点评亮点：论文敏锐地抓住了语音匿名化评估中一个关键盲点——非时域线索（口音）的残留风险，并系统性地利用多种嵌入（时域、非时域、口音相关）和攻击场景进行量化分析，逻辑严谨，论证有力，提出的公平性问题也很有价值。短板：对于其提出的改进方案B4*，分析略显“止步于现象”，缺乏对其内部机制（字符级条件反射如何具体抑制口音线索）的深入解构或对比消融；此外，实验部分因部分参赛系统代码不可用，导致对比不够完整，削弱了结论的普适性。 ...

Identifying and typifying demographic unfairness in phoneme-level embeddings of self-supervised speech recognition models

📄 Identifying and typifying demographic unfairness in phoneme-level embeddings of self-supervised speech recognition models #语音识别 #自监督学习 #公平性 #模型评估 #音素 ✅ 7.0/10 | 前50% | #语音识别 | #自监督学习 | #公平性 #模型评估 | arxiv 学术质量 5.5/7 | 选题价值 1.5/2 | 复现加成 0.0 | 置信度中 👥 作者与机构第一作者：Felix Herron（MILES Team, LAMSADE, Université Paris Dauphine-PSL, France & GETALP Team, LIG, Université Grenoble Alpes, France）通讯作者：未说明（论文未明确标注，但通常为末位作者或提供邮箱者，此处作者邮箱为felix.herron@univ-grenoble-alpes.fr）作者列表： Felix Herron（Université Paris Dauphine-PSL & Université Grenoble Alpes） Solange Rossato（Université Grenoble Alpes） Alexandre Allauzen（Université Paris Dauphine-PSL） François Portet（Université Grenoble Alpes） 💡 毒舌点评亮点在于将ASR不公平性问题分解为可度量的“系统性偏差”和“随机方差”两种几何形态，为诊断模型失败模式提供了清晰的理论工具箱；然而，整篇论文更像是对现有模型的一次全面“体检报告”，指出了病灶（尤其是高方差问题）却并未开出有效的“处方”，所验证的公平性增强方法（DET/DAT）也未能触及核心，这使得研究在建设性上略显乏力。 ...

语音/音乐/音频论文速递 2026-04-27

语音/音乐/音频论文速递 2026-04-27 共分析 13 篇论文 ⚡ 今日概览 📥 抓取 13 篇 → 🔬 深度分析完成 🏷️ 热门方向方向数量分布 #音乐信息检索 2篇 ██ #语音识别 2篇 ██ #音频生成 1篇 █ #发音错误检测 1篇 █ #说话人识别 1篇 █ #音乐理解 1篇 █ #音频场景理解 1篇 █ #语音质量评估 1篇 █ 📊 论文评分排行榜（13 篇，按分数降序）排名论文评分分档主任务 🥇 UniSonate: A Unified Model for Speech, Music, and Sound 8.5分前25% #音频生成 🥈 Beyond Acoustic Sparsity and Linguistic Bias: A Prompt- 8.5分前25% #发音错误检测 🥉 DM-ASR: Diarization-aware Multi-speaker ASR with Large 8.0分前25% #说话人识别 4. Transformer-Based Rhythm Quantization of Performance MI 8.0分前25% #音乐信息检索 5. Audio Effect Estimation with DNN-Based Prediction and S 8.0分前25% #音乐理解 6. Listening with Time: Precise Temporal Awareness for Lon 8.0分前25% #音频场景理解 7. TTS-PRISM: A Perceptual Reasoning and Interpretable Spe 7.5分前25% #语音质量评估 8. Spectrographic Portamento Gradient Analysis: A Quantita 7.5分前25% #音乐信息检索 9. Advancing automatic speech recognition using feature fu 7.0分前25% #语音识别 10. Identifying and typifying demographic unfairness in pho 7.0分前50% #语音识别 11. Full-Duplex Interaction in Spoken Dialogue Systems: A C 6.5分前25% #语音对话系统 12. Audio Video Verbal Analysis (AVVA) for Capturing Classr 6.0分前50% #音频问答 13. Earable Platform with Integrated Simultaneous EEG Sensi 5.5分后50% #音频事件检测 📋 论文列表 🥇 UniSonate: A Unified Model for Speech, Music, and Sound Effect Generation with Text Instructions 🔥 8.5/10 | 前25% | #音频生成 | #流匹配 | #扩散模型 #统一音频模型 | arxiv ...