Medical Audio | Speech/Music/Audio Paper Digest

Multimodal Digital Biomarker for Asthma: Complementary Roles of Vocal, Clinical and Demographic Factors

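📄 Multimodal Digital Biomarker for Asthma: Complementary Roles of Vocal, Clinical and Demographic Factors
Tags: #speech-attribute-recognition #multimodal-models #interpretability #benchmarking #medical-audio #self-supervised-learning
5.8/10 | Novelty 1.2/2 | Rigor 1/1.5 | Experiments 1/1.5 | Clarity 0.8/1 | Impact 0.5/1.5 | Open source 0/1.5 | Reproducibility 0.3/0.5 | Engineering 1/1.5
📝 5.8/10 | top 50% | Document type: Methods research | Scoring confidence: high | #speech-attribute-recognition | #model-fusion | #multimodal-models #interpretability | arxiv
👥 Authors and affiliations
First author: Vladimir Despotovic (Luxembourg Institute of Health, Bioinformatics & AI, Department of Medical Informatics)
Corresponding author: not explicitly stated in the paper
Author list: Vladimir Despotovic (Luxembourg Institute of Health), Milena Despotovic (Luxembourg Institute of Health), Abir Elbeji (Luxembourg Institute of Health), Petr V. Nazarov (Luxembourg Institute of Health), Guy Fagherazzi (Luxembourg Institute of Health)
💡 Blunt take: The paper's strength is that it systematically applies a mature multimodal Mixture-of-Experts architecture to vocal biomarkers, combining two complementary speech tasks with rich clinical data, and its interpretability analysis of the gating mechanism is relatively solid. Its main weakness is that the novelty rests heavily on an engineering application of the MoE framework rather than on the method itself. The core contributions, the dataset and the model, are not open-sourced at all, which severely limits impact and reproducibility and makes the work read more like a detailed feasibility report than a breakthrough study. In addition, its claim of a "first" application deserves scrutiny, since MoE has already been explored on other clinical multimodal data. ...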

Speech/Music/Audio Paper Digest 2026-07-10

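19 papers analyzed
⚡ Today's overview: 📥 19 papers fetched → 🔬 in-depth analysis complete
🏷️ Trending topics
Topic | Count | Distribution
#speech-recognition | 4 | ████
#music-transcription | 2 | ██
#speech-quality-assessment | 2 | ██
#multimodal-models | 2 | ██
#music-generation | 1 | █
#audio-event-detection | 1 | █
#speech-separation | 1 | █
#speech-emotion-recognition | 1 | █
📊 Paper score ranking (19 papers, in descending order of score)
Rank. Paper | Score | Tier | Document type | Main task
🥇 A Quantized Native Runtime for On-Device Semantic Audio | 8.4 | top 25% | System technical report | #music-generation
🥈 MuScriptor: An Open Model for Multi-Instrument Music Tr | 8.3 | top 25% | System technical report | #music-transcription
🥉 A Self-Supervised Approach for Minimal-Annotation Hydro | 8.3 | top 25% | System technical report | #audio-event-detection
4. COALA: Robust Contextualized Speech-augmented Language | 8.2 | top 25% | Methods research | #speech-recognition
5. PS4: Proxy-Supervised Joint Training for Real Target Sp | 8.0 | top 25% | System technical report | #speech-separation
6. MulTTiPop: A Multitrack Transcription Dataset for Pop M | 7.7 | top 25% | Dataset and benchmark | #music-transcription
7. SHAP-Weighted Cross-Modal Expert Fusion for Emotion and | 7.7 | top 25% | Methods research | #speech-emotion-recognition
8. When Synthetic Speech Is All You Have: Better Call GRPO | 7.7 | top 25% | Methods research | #speech-recognition
9. Structural Bottlenecks on Frequency Representation in E | 7.6 | top 25% | Methods research | #audio-generation
10. A Reliability Assessment of LALM Audio Judges for Full- | 7.1 | top 50% | System technical report | #speech-quality-assessment
11. Inverse-designed meta processing units for multi-task n | 6.9 | top 50% | System technical report | #audio-understanding
12. Multimodal Unlearning Across Vision, Language, Video, a | 6.9 | top 50% | Survey | #multimodal-models
13. Best-of-\(N\) TTS Evaluation is Confounded by ASR Family | 6.7 | top 50% | Methods research | #speech-quality-assessment
14. Why Do You Say It Like That? A Phoneme-Level Framework | 6.5 | top 50% | Methods research | #speech-deepfake-detection
15. It Takes Few to TANGO: A Quantized Distributed Model fo | 6.5 | top 50% | System technical report | #speech-enhancement
16. On the Role of Conversational Timing in Synthetic Train | 6.4 | top 50% | Methods research | #speech-recognition
17. Diarization-Guided Qwen-ASR Adaptation for Multilingual | 5.7 | top 50% | System technical report | #speech-recognition
18. Multimodal Digital Biomarker for Asthma: Complementary | 5.3 | bottom 50% | Applied research | #multimodal-models
19. Vidu S1: A Real-Time Interactive Video Generation Model | 5.2 | bottom 50% | System technical report | #audio-visual-interaction
📋 Paper list
🥇 A Quantized Native Runtime for On-Device Semantic Audio Generation
8.4/10 | Novelty 1.3/2 | Rigor 1.3/1.5 | Experiments 1/1.5 | Clarity 0.8/1 | Impact 1.2/1.5 | Open source 1/1.5 | Reproducibility 0.3/0.5 | Engineering 1.5/1.5 ...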

Uncovering Latent Depression Severity for Binary Depression Detection via Advantage-weighting Ranking

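📄 Uncovering Latent Depression Severity for Binary Depression Detection via Advantage-weighting Ranking
#audio-visual-understanding #contrastive-learning #medical-audio #multimodal-models
7/10 | Novelty 1.5/2 | Rigor 1.2/1.5 | Experiments 1.3/1.5 | Clarity 0.9/1 | Impact 1.2/1.5 | Open source 0/1.5 | Reproducibility 0.4/0.5 | Engineering 0.5/1.5
✅ 7/10 | top 50% | #audio-visual-understanding | #contrastive-learning | #medical-audio #multimodal-models | arxiv
👥 Authors and affiliations
First author: Manning Gao (South China Normal University)
Corresponding author: not explicitly marked (presumably Sijie Mai, South China Normal University, based on common corresponding-author conventions)
Author list: Manning Gao (South China Normal University), Tingyi Liu (South China Normal University), Leheng Zhang (South China Normal University), Haifeng Hu (Sun Yat-sen University), Yuncheng Jiang (South China Normal University), Sijie Mai (South China Normal University)
💡 Blunt take: This work targets a real pain point in binary depression detection, namely that coarse labels discard continuous severity information, and brings learning-to-rank into audio-visual automatic depression detection; the idea is insightful. BAR Loss focuses on hard samples through dynamic advantage weighting, and the experimental design is fairly solid. But the core method keeps patching within the pairwise-loss framework, its theoretical depth is limited, and the authors provide no links to code, models, or datasets. In a top-venue context that relies heavily on open source and fast reproduction, this closed stance will greatly weaken community trust and practical impact. ...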

Speech/Music/Audio Paper Digest 2026-07-08

26 papers analyzed
⚡ Today's overview: 📥 26 papers fetched → 🔬 in-depth analysis complete
🏷️ Trending topics
Topic | Count | Distribution
#speech-attribute-recognition | 3 | ███
#audio-classification | 3 | ███
#speech-synthesis | 3 | ███
#speech-recognition | 3 | ███
#sound-source-localization | 2 | ██
#music-generation | 2 | ██
#spoken-interaction | 1 | █
#audio-event-detection | 1 | █
📊 Paper score ranking (26 papers, in descending order of score)
Rank. Paper | Score | Tier | Main task
🥇 Hierarchical Acoustic-Semantic Modeling: Modality Separ | 9.2 | top 10% | #spoken-interaction
🥈 Propose and Attend: Training-free MLLM Grounding Confid | 8.2 | top 25% | #audio-event-detection
🥉 Music I Care About: Automated Multimodal Benchmarking o | 7.8 | top 25% | #music-understanding
4. Escaping the Procrustean Bed: Groupwise Orthogonal Conn | 7.8 | top 25% | #speech-attribute-recognition
5. TriA Pipeline: A Large-Scale Automatic Audio Annotation | 7.4 | top 50% | #audio-classification
6. InsideSSL: Understanding Self-Supervised Speech Represe | 7.4 | top 50% | #speech-attribute-recognition
7. Precise Video-to-Audio Generation with Cross-Modal Alig | 7.4 | top 50% | #audio-visual-generation
8. WordVoice: Explicit and Decoupled Multi-Dimensional Wor | 7.2 | top 50% | #speech-synthesis
9. ForestIR: Physics-Informed Forest Sound Simulation for | 7.2 | top 50% | #sound-source-localization
10. Uncovering Latent Depression Severity for Binary Depres | 7.0 | top 50% | #audio-visual-understanding
11. Determinantal point process sampling for bioacoustic ac | 6.9 | top 50% | #audio-classification
12. From Sinhala to Dhivehi: Cross-Lingual Transfer Learnin | 6.6 | top 50% | #speech-recognition
13. Goodbye Equal Error Rate, Hello Local Information Discl | 6.5 | top 50% | #voice-conversion
14. BlueMagpie-TTS: A Token-Efficient Tokenizer, Language M | 6.5 | top 50% | #speech-synthesis
15. Fréchet Distance Loss on Speech Representations for Tex | 6.5 | top 50% | #speech-synthesis
16. NAVER LABS System Re-implementation for the IWSLT 2026 | 6.4 | top 50% | #speech-translation
17. Few-Shot Class-Incremental Audio Classification Using P | 6.3 | top 50% | #audio-classification
18. Gemma 4 Technical Report | 6.2 | top 50% | #speech-recognition
19. Revisiting the Relation Between Language Model Perplexi | 6.0 | top 50% | #speech-recognition
20. Multimodal Video-to-Music Recommendation via Semantic R | 5.4 | bottom 50% | #music-retrieval
21. Designing Maintainable Hybrid Generative Systems: A Qua | 5.3 | bottom 50% | #music-generation
22. Learning-based Physics-Constrained Neural Kernel for So | 5.2 | bottom 50% | #sound-source-localization
23. Distributed Multichannel Wiener Filtering for Topology- | 5.1 | bottom 50% | #speech-enhancement
24. Flow Matching-Based Speech Source Separation with Best- | 4.9 | bottom 50% | #speech-separation
25. Umm… With Transformers? Insights from Filled Pause Us | 4.8 | bottom 50% | #speech-attribute-recognition
26. From Textural Counterpoint to Feature Encoding: A Multi | 2.1 | bottom 50% | #music-generation
📋 Paper list
🥇 Hierarchical Acoustic-Semantic Modeling: Modality Separation and Semantic Coherence for Full-Duplex SLMs
9.2/10 | Novelty 1.8/2 | Rigor 1.2/1.5 | Experiments 1.1/1.5 | Clarity 0.8/1 | Impact 1.3/1.5 | Open source 1.2/1.5 | Reproducibility 0.3/0.5 | Engineering 1.5/1.5 ...

CaReCoS: A Spectrogram based Visual Benchmark for Cardiac, Respiratory and Cough Sounds

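📄 CaReCoS: A Spectrogram based Visual Benchmark for Cardiac, Respiratory and Cough Sounds
#audio-understanding #benchmarking #medical-audio #multimodal-models #model-evaluation
6/10 | Novelty 1.2/2 | Rigor 0.8/1.5 | Experiments 1/1.5 | Clarity 0.8/1 | Impact 0.5/1.5 | Open source 0.2/1.5 | Reproducibility 0.3/0.5 | Engineering 1.2/1.5
✅ 6/10 | top 50% | #audio-understanding | #prompt-learning | #benchmarking #medical-audio | arxiv
👥 Authors and affiliations
First author: Harshit Rajgarhia (not stated)
Corresponding author: not stated
Author list: Harshit Rajgarhia (not stated), Shuubham Ojha (not stated), Akhil Pothanapalli (not stated), Rachuri Lokesh (not stated), Asif Shaik (not stated), Abhishek Mukherji (not stated), Prasanna Desikan (not stated)
💡 Blunt take: For the first time, the paper evaluates multimodal reasoning with spectrograms of medical heart, lung, and cough sounds as visual input, and it clearly shows that current top vision and omni models are nearly "wiped out" on this task (best score only 51.2%); the angle is novel and the question pointed. But the benchmark's ground truth is generated automatically by Gemini 3 Flash without any validation by clinical experts, and the judging also relies on large models. This creates a circular dependency of "using large models to evaluate large models", which makes reliability deeply worrying. Neither the code nor the QA dataset is open-sourced, so the community can hardly reproduce or build on the work; in essence this is a "castle in the air" study that uses closed-source models to expose the flaws of closed-source models. ...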

Speech/Music/Audio Paper Digest 2026-07-07

58 papers analyzed
⚡ Today's overview: 📥 58 papers fetched → 🔬 in-depth analysis complete
🏷️ Trending topics
Topic | Count | Distribution
#speech-recognition | 11 | ███████████
#speech-deepfake-detection | 5 | █████
#audio-understanding | 4 | ████
#spoken-interaction | 3 | ███
#audio-event-detection | 3 | ███
#voice-conversion | 3 | ███
#audio-visual-understanding | 3 | ███
#speech-synthesis | 3 | ███
📊 Paper score ranking (58 papers, in descending order of score)
Rank. Paper | Score | Tier | Main task
🥇 Doppelganger: Sound Effects and Their Synthetic Twins | 9.1 | top 10% | #audio-retrieval
🥈 SPEARBench: A Benchmark for Naturalness Evaluation in S | 8.9 | top 25% | #spoken-interaction
🥉 Metronome: Bound the Cache, Keep the Beat for Real-Time | 8.7 | top 25% | #spoken-interaction
4. Auto-AEG: Scalable Data Construction for Open-Vocabular | 8.3 | top 25% | #audio-event-detection
5. RABBiT: Rapidly adaptive BOLD foundation model via brai | 8.1 | top 25% | #audio-understanding
6. TRACE-EVC: Text-Guided Relative Affective Control for Z | 8.0 | top 25% | #voice-conversion
7. Parallelized Autoregressive Decoding for Omni-Modal Den | 8.0 | top 25% | #audio-visual-understanding
8. Speaker-Disentangled Chunk-Wise Regression for Syllabic | 7.9 | top 25% | #speech-coding
9. Speaker-Aware Temporal Aggregation Strategies on Segmen | 7.9 | top 25% | #speech-attribute-recognition
10. REDDIT: Correcting Model-Generated Timestamp Drift in A | 7.8 | top 25% | #speech-recognition
11. Deriving Benchmarking Datasets from Long-Form Recording | 7.7 | top 25% | #benchmarking
12. ProPS: Prompted Profile Synthesis for Natural Language- | 7.6 | top 25% | #speech-synthesis
13. DELTA-TTS: Adapting Autoregressive Model into Diffusion | 7.5 | top 25% | #speech-synthesis
14. TokAN: Accent Normalization Using Self-Supervised Speec | 7.5 | top 25% | #voice-conversion
15. Listen, Think, Transcribe: Continuous Latent Test-Time | 7.5 | top 25% | #speech-recognition
16. \(C^3\)ASD: Multi-Level Consistency-Driven Representation | 7.5 | top 25% | #audio-visual-understanding
17. Training-Free Model Selection and Domain-Aware Score Ca | 7.3 | top 50% | #audio-event-detection
18. CHILDES-Aligned: A Curated Children's Speech Datase | 7.2 | top 50% | #speech-recognition
19. Taste-aware music retrieval from audio embeddings | 6.9 | top 50% | #music-retrieval
20. Lights, Camera, Carbon: Architectural Scaling Laws for | 6.9 | top 50% | #audio-visual-generation
21. Unified Audio Intelligence Without Regressing on Text I | 6.8 | top 50% | #audio-interaction
22. Ranking the Impact of Contextual Specialization in Neur | 6.7 | top 50% | #speech-enhancement
23. SynSFX: Multi-Model Sound Effects Synthesis Dataset for | 6.5 | top 50% | #audio-deepfake-detection
24. Evaluating the Effect of Linguistic Relatedness on Cros | 6.5 | top 50% | #speech-recognition
25. MOSAIC: Interpretable Multi-Token Cross-Attention of Bi | 6.3 | top 50% | #speech-deepfake-detection
26. CARD: Cross-component Audio Representation Distillation | 6.3 | top 50% | #audio-captioning
27. Probing Low-Level Acoustic Attribute Encoding in CLAP A | 6.2 | top 50% | #audio-understanding
28. Trajectory Variance: An Unsupervised Measure of Developm | 6.2 | top 50% | #audio-understanding
29. Adaptive Diversity-Uncertainty Active Learning with Red | 6.2 | top 50% | #audio-event-detection
30. Adaptive Loss Balancing for Multi-Task Bioacoustic Clas | 6.1 | top 50% | #audio-classification
31. An Intervention-Based Framework for Shortcut Diagnosis | 6.1 | top 50% | #speech-deepfake-detection
32. QuaSR: Quality-Aware Sample Reweighting for Pacific Ind | 6.0 | top 50% | #speech-recognition
33. CaReCoS: A Spectrogram based Visual Benchmark for Cardi | 6.0 | top 50% | #audio-understanding
34. Open-Set Source Tracing as Compositional Factors via St | 6.0 | top 50% | #speech-deepfake-detection
35. Context-Aware ASR for Mandarin Technical Lectures | 6.0 | top 50% | #speech-recognition
36. Streaming Neural Speech Codecs through Time-Invariant R | 6.0 | top 50% | #speech-coding
37. Physiological Noise Augmentation Improves Non-Invasive | 6.0 | top 50% | #speech-recognition
38. DuplexChat: Constructing Speaker-Separated Full-Duplex | 5.9 | top 50% | #spoken-interaction
39. Noisy Environment Adaptation of Neural Speech Codec via | 5.9 | top 50% | #speech-enhancement
40. NouveauVoice: Generating Novel Pseudo Speakers for Voic | 5.9 | top 50% | #voice-conversion
41. OmniFocus: Query-Guided Modality-Balanced Token Compres | 5.9 | top 50% | #audio-visual-question-answering
42. Jointly Improving Dialect Identification and ASR in Ind | 5.8 | top 50% | #speech-recognition
43. S-DiverSe: Spanish Diverse Speech | 5.8 | top 50% | #speech-recognition
44. Towards Robust Uncertainty-Aware Speaker Modeling | 5.7 | top 50% | #speaker-verification
45. Towards Language-Agnostic Speech Inversion | 5.6 | top 50% | #speech-attribute-recognition
46. Layer-wise Cross-Lingual Depression Detection from Spee | 5.5 | top 50% | #speech-emotion-recognition
47. Wan-Streamer v0.2: Higher Resolution, Same Latency | 5.4 | bottom 50% | #audio-visual-interaction
48. Mixture-Constrained Max Pooling Improves Separation-Bas | 5.3 | bottom 50% | #audio-classification
49. Reinforcement Learning for Data-Efficient Code-Switched | 5.3 | bottom 50% | #speech-recognition
50. Physics-Informed Direction-of-Arrival Estimation Over D | 5.3 | bottom 50% | #sound-source-localization
51. Sampling Bias Compensation for Robust Evaluation of Aud | 4.9 | bottom 50% | #audio-classification
52. UniSkip-Mamba: A Frequency-Aware State Space Model for | 4.8 | bottom 50% | #audio-visual-understanding
53. Progressive Refinement: An Iterative Pseudo-Labeling Ap | 4.6 | bottom 50% | #speech-recognition
54. Weakly Guided and Autoregressive Beamformer Parameteriz | 4.3 | bottom 50% | #speech-separation
55. DETECT-3B-Omni is Agnostic of Content and Demographics | 4.2 | bottom 50% | #speech-deepfake-detection
56. Towards Digital Preservation of Efik: TTS for a Low-Res | 4.0 | bottom 50% | #speech-synthesis
57. Quantum-Inspired Harmonic Decision Models: A Computatio | 2.3 | bottom 50% | #music-generation
58. Information-Geometric Superposed Vowel Evaluation: Part | 1.9 | bottom 50% | #speech-deepfake-detection
📋 Paper list
🥇 Doppelganger: Sound Effects and Their Synthetic Twins
9.1/10 | Novelty 1.5/2 | Rigor 1.4/1.5 | Experiments 1.4/1.5 | Clarity 0.9/1 | Impact 0.8/1.5 | Open source 1.5/1.5 | Reproducibility 0.5/0.5 | Engineering 1.1/1.5 ...

Language Model Augmented Semi-Supervised Statistical Inference

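📄 Language Model Augmented Semi-Supervised Statistical Inference
#speech-attribute-recognition #large-language-models #few-shot #medical-audio #theoretical-analysis
5.4/10 | Novelty 1.3/2 | Rigor 1.2/1.5 | Experiments 0.7/1.5 | Clarity 0.8/1 | Impact 0.4/1.5 | Open source 0/1.5 | Reproducibility 0.5/0.5 | Engineering 0.5/1.5
📝 5.4/10 | bottom 50% | #speech-attribute-recognition | #large-language-models | #few-shot #medical-audio | arxiv
👥 Authors and affiliations
First author: Xinrui Ruan (University of California, Berkeley, Division of Biostatistics)
Corresponding author: Jingshen Wang (University of California, Berkeley, Division of Biostatistics)
Author list: Xinrui Ruan (University of California, Berkeley), Yingfei Wang (University of Washington, Foster School of Business), Waverly Wei (University of Southern California, Department of Data Sciences and Operations), Jingshen Wang (University of California, Berkeley)
💡 Blunt take: The paper devotes substantial space to statistical theory, proving that calibration weights for LLM pseudo-labels improve the efficiency of semi-supervised inference. The ideas are rigorous but not striking: in essence this is an LLM-specific version of the projection technique from semiparametric inference. The experiments are confined to a single application, speech transcripts, and have no connection to the pretrained models familiar to the speech community (Wav2Vec2, HuBERT). Code and data extraction are entirely closed. For readers in speech/audio, this is more a statistics paper dressed up as a speech application than research that actually solves a speech problem. ...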

MedMosaic: A Challenging Large Scale Benchmark of Diverse Medical Audio

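📄 MedMosaic: A Challenging Large Scale Benchmark of Diverse Medical Audio
#audio-understanding #medical-audio #benchmarking #datasets #multimodal-models
6.4/10 | Novelty 0.8/2 | Rigor 0.8/1.5 | Experiments 1.2/1.5 | Clarity 0.7/1 | Impact 1/1.5 | Open source 0.8/1.5 | Reproducibility 0.3/0.5 | Engineering 0.8/1.5
✅ 6.4/10 | top 50% | #audio-understanding | #multimodal-models | #medical-audio #benchmarking | arxiv
👥 Authors and affiliations
First author: Harshit Rajgarhia (Centific Global Solutions Inc.)
Corresponding author: Harshit Rajgarhia (Centific Global Solutions Inc.)
Author list: Harshit Rajgarhia (Centific Global Solutions Inc.), Shuubham Ojha (Centific Global Solutions Inc., University of Maryland, College Park), Asif Shaik (Centific Global Solutions Inc.), Akhil Pothanapalli (Centific Global Solutions Inc.), Rachuri Lokesh (Centific Global Solutions Inc.), Abhishek Mukherji (Centific Global Solutions Inc.), Prasanna Desikan (Centific Global Solutions Inc.)
💡 Blunt take: This paper builds a sizable (46k QA pairs) and carefully designed benchmark for medical audio reasoning. Through a systematic evaluation of 13 frontier models, it clearly exposes the marked weaknesses of current multimodal large models on medical audio, especially the severe imbalance between speech understanding and physiological sound understanding. However, the data rely entirely on synthetic generation and API calls, which ties the value of the whole benchmark to the generative capabilities of specific commercial models (Gemini and ElevenLabs), and there is no rigorous validation of the gap to the distribution of "real" clinical audio. Moreover, no code, models, or complete generation pipeline are open-sourced, so the community cannot even replicate the paper's own "scalable" premise; the engineering commitment falls seriously short. ...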

NeuroCLUS: A Foundation Model with Functional Clustering for Intracranial Neural Decoding

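📄 NeuroCLUS: A Foundation Model with Functional Clustering for Intracranial Neural Decoding
#speech-recognition #self-supervised-learning #pretraining #graph-neural-networks #medical-audio
6/10 | Novelty 1.2/2 | Rigor 1/1.5 | Experiments 1/1.5 | Clarity 0.7/1 | Impact 0.5/1.5 | Open source 0.5/1.5 | Reproducibility 0.3/0.5 | Engineering 0.8/1.5
✅ 6/10 | top 50% | #speech-recognition | #self-supervised-learning | #pretraining #graph-neural-networks | arxiv
👥 Authors and affiliations
First author: Hui Zheng (Independent Researcher)
Corresponding author: Hui Zheng (icml2026.neuroclus@gmail.com)
Author list: Hui Zheng (Independent Researcher), Hai-Teng Wang (Independent Researcher)
💡 Blunt take: This work astutely identifies the core tension in tokenization granularity for existing iEEG foundation models: either too fine (single channel) or too coarse (whole-brain aggregation). The proposed two-stage functional clustering strategy goes straight to the point, and on the Du-IN speech generation task it even substantially outperforms the dedicated SOTA model (65.92% vs 62.70%), which deserves credit. However, it completely ignores temporally dynamic clustering, which is crucial in decoding tasks (the basic neuroscience fact that functional modules may drift over time), and guides token aggregation only with a static functional dependency graph, leaving the model's adaptability to complex cognitive processes in doubt. There is also some tension between the "independent researcher" status and the need for as much as 10k hours of pretraining data and eight A100s of compute, and the lack of code and model weights makes the "SOTA" claim hard to verify for now. ...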

Automatic Detection of Stress from Speech in the Trier Social Stress Test

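📄 Automatic Detection of Stress from Speech in the Trier Social Stress Test
#speech-emotion-recognition #ensemble-learning #interpretability #medical-audio #model-comparison
7.4/10 | Novelty 0.9/2 | Rigor 1.3/1.5 | Experiments 1/1.5 | Clarity 0.8/1 | Impact 0.8/1.5 | Open source 1.2/1.5 | Reproducibility 0.4/0.5 | Engineering 1/1.5
✅ 7.4/10 | top 50% | #speech-emotion-recognition | #ensemble-learning | #interpretability #medical-audio | arxiv
👥 Authors and affiliations
First author: Hanna Drimalla (Human-Centered Artificial Intelligence group, Faculty of Technology, Bielefeld University)
Corresponding author: Hanna Drimalla (Human-Centered Artificial Intelligence group, Faculty of Technology, Bielefeld University)
Author list: Hanna Drimalla (Human-Centered Artificial Intelligence group, Faculty of Technology, Bielefeld University), Wieland R. Cremer (not stated), Christine Kraus (not stated), Oliver T. Wolf (Department of Cognitive Psychology, Faculty of Psychology, Ruhr University Bochum)
💡 Blunt take: With a clean, fully between-subjects controlled design, this paper contributes a small but solid empirical anchor for speech-based stress detection; an XGB classification accuracy of 82% clearly shows that speech does carry a "stress yardstick". But the regression predictions are weak overall, with only some outputs barely reaching significance, and the small sample of 50 participants makes the results unstable, so it is hard to convince reviewers that this acoustic-prosodic feature set can reliably serve as a surrogate marker for cortisol. On the engineering side it provides a reproducible baseline, but the scientific increment is limited. Given that the experimental design, feature engineering, and model selection offer no fundamental breakthrough, it can only be called a solid but not "eye-catching" piece of work. ...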