<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Transfer Learning on 语音/音频论文速递</title>
    <link>https://nanless.github.io/audio-paper-digest-blog/tags/%E8%BF%81%E7%A7%BB%E5%AD%A6%E4%B9%A0/</link>
    <description>Recent content in Transfer Learning on 语音/音频论文速递</description>
    <generator>Hugo</generator>
    <language>zh-cn</language>
    <lastBuildDate>Wed, 29 Apr 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://nanless.github.io/audio-paper-digest-blog/tags/%E8%BF%81%E7%A7%BB%E5%AD%A6%E4%B9%A0/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>A Parameter-Efficient Multi-Scale Convolutional Adapter for Synthetic Speech Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-parameter-efficient-multi-scale-convolutional/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-parameter-efficient-multi-scale-convolutional/</guid>
      <description>A Parameter-Efficient Multi-Scale Convolutional Adapter for Synthetic Speech Detection</description>
    </item>
    <item>
      <title>AFT: An Exemplar-Free Class Incremental Learning Method for Environmental Sound Classification</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-aft-an-exemplar-free-class-incremental-learning/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-aft-an-exemplar-free-class-incremental-learning/</guid>
      <description>Audio Classification | 7.0/10</description>
    </item>
    <item>
      <title>AISHELL6-Whisper: A Chinese Mandarin Audio-Visual Whisper Speech Dataset with Speech Recognition Baselines</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-aishell6-whisper-a-chinese-mandarin-audio-visual/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-aishell6-whisper-a-chinese-mandarin-audio-visual/</guid>
      <description>Speech Recognition | 8.3/10</description>
    </item>
    <item>
      <title>CASTELLA: Long Audio Dataset with Captions and Temporal Boundaries</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-castella-long-audio-dataset-with-captions-and/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-castella-long-audio-dataset-with-captions-and/</guid>
      <description>Audio Retrieval | 8.5/10</description>
    </item>
    <item>
      <title>Discrete-Continuous Fusion With Adaptive Hierarchical Features For Audio Deepfake Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-discrete-continuous-fusion-with-adaptive/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-discrete-continuous-fusion-with-adaptive/</guid>
      <description>Audio Deepfake Detection | 8.0/10</description>
    </item>
    <item>
      <title>E2E-AEC: Implementing An End-To-End Neural Network Learning Approach for Acoustic Echo Cancellation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-e2e-aec-implementing-an-end-to-end-neural-network/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-e2e-aec-implementing-an-end-to-end-neural-network/</guid>
      <description>Speech Enhancement | 7.5/10</description>
    </item>
    <item>
      <title>Efficient Depression Detection from Speech via Language-Independent Prompt-Driven Reprogramming</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-efficient-depression-detection-from-speech-via/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-efficient-depression-detection-from-speech-via/</guid>
      <description>Speech Biomarkers | 7.5/10</description>
    </item>
    <item>
      <title>Exploring Fine-Tuning Of Large Audio Language Models For Spoken Language Understanding Under Limited Speech Data</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-exploring-fine-tuning-of-large-audio-language/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-exploring-fine-tuning-of-large-audio-language/</guid>
      <description>Spoken Language Understanding | 8.0/10</description>
    </item>
    <item>
      <title>Fine-Tuning Large Audio-Language Models with LoRA for Precise Temporal Localization of Prolonged Exposure Therapy Elements</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fine-tuning-large-audio-language-models-with-lora/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fine-tuning-large-audio-language-models-with-lora/</guid>
      <description>Audio Event Detection | 6.5/10</description>
    </item>
    <item>
      <title>FlowSE-GRPO: Training Flow Matching Speech Enhancement via Online Reinforcement Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-flowse-grpo-training-flow-matching-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-flowse-grpo-training-flow-matching-speech/</guid>
      <description>Speech Enhancement | 7.5/10</description>
    </item>
    <item>
      <title>From Human Speech to Ocean Signals: Transferring Speech Large Models for Underwater Acoustic Target Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-from-human-speech-to-ocean-signals-transferring/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-from-human-speech-to-ocean-signals-transferring/</guid>
      <description>Underwater Acoustic Target Recognition | 7.0/10</description>
    </item>
    <item>
      <title>Generalizability of Predictive and Generative Speech Enhancement Models to Pathological Speakers</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-generalizability-of-predictive-and-generative/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-generalizability-of-predictive-and-generative/</guid>
      <description>Speech Enhancement | 7.0/10</description>
    </item>
    <item>
      <title>GLUE: Gradient-free Learning to Unify Experts</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-glue-gradient-free-learning-to-unify-experts/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-glue-gradient-free-learning-to-unify-experts/</guid>
      <description>Transfer Learning | 6.5/10</description>
    </item>
    <item>
      <title>How Far Do SSL Speech Models Listen for Tone? Temporal Focus of Tone Representation under Low-Resource Transfer</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-how-far-do-ssl-speech-models-listen-for-tone/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-how-far-do-ssl-speech-models-listen-for-tone/</guid>
      <description>Speech Recognition | 6.5/10</description>
    </item>
    <item>
      <title>Human-1 by Josh Talks: A Full-Duplex Conversational Modeling Framework in Hindi using Real-World Conversations</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-human-1-by-josh-talks-a-full-duplex/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-human-1-by-josh-talks-a-full-duplex/</guid>
      <description>Spoken Dialogue Systems | 7.5/10</description>
    </item>
    <item>
      <title>ICASSP 2026 - Transfer Learning Paper List</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-099/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/icassp2026-task-099/</guid>
      <description>A total of 1 ICASSP 2026 paper in the Transfer Learning area</description>
    </item>
    <item>
      <title>Improving Active Learning for Melody Estimation by Disentangling Uncertainties</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-improving-active-learning-for-melody-estimation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-improving-active-learning-for-melody-estimation/</guid>
      <description>Music Information Retrieval | 7.5/10</description>
    </item>
    <item>
      <title>Influence-Aware Curation and Active Selection for Industrial and Surveillance Sound Events</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-influence-aware-curation-and-active-selection-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-influence-aware-curation-and-active-selection-for/</guid>
      <description>Audio Event Detection | 7.0/10</description>
    </item>
    <item>
      <title>Interval-Aware Retrieval Framework For Speech-Based Automatic Alzheimer’s Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-interval-aware-retrieval-framework-for-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-interval-aware-retrieval-framework-for-speech/</guid>
      <description>Speech Biomarkers | 8.5/10</description>
    </item>
    <item>
      <title>It Is Personal: The Importance of Personalization for Recognizing Self-Reported Emotion</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-it-is-personal-the-importance-of-personalization/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-it-is-personal-the-importance-of-personalization/</guid>
      <description>Speech Emotion Recognition | 8.0/10</description>
    </item>
    <item>
      <title>Learning to Align with Unbalanced Optimal Transport in Linguistic Knowledge Transfer for ASR</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-learning-to-align-with-unbalanced-optimal/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-learning-to-align-with-unbalanced-optimal/</guid>
      <description>Speech Recognition | 6.5/10</description>
    </item>
    <item>
      <title>LOTUSDIS: A Thai Far-Field Meeting Corpus for Robust Conversational ASR</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lotusdis-a-thai-far-field-meeting-corpus-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lotusdis-a-thai-far-field-meeting-corpus-for/</guid>
      <description>Speech Recognition | 7.5/10</description>
    </item>
    <item>
      <title>Low-Resource Speech-Based Early Alzheimers Detection via Cross-Lingual and Few-Shot Transfer Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-low-resource-speech-based-early-alzheimers/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-low-resource-speech-based-early-alzheimers/</guid>
      <description>Speech Biomarkers | 7.5/10</description>
    </item>
    <item>
      <title>MMAudioSep: Taming Video-to-Audio Generative Model Towards Video/Text-Queried Sound Separation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mmaudiosep-taming-video-to-audio-generative-model/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mmaudiosep-taming-video-to-audio-generative-model/</guid>
      <description>Speech Separation | 8.0/10</description>
    </item>
    <item>
      <title>Multi-Layer Attentive Probing Improves Transfer of Audio Representations for Bioacoustics</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multi-layer-attentive-probing-improves-transfer/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multi-layer-attentive-probing-improves-transfer/</guid>
      <description>Bioacoustics | 7.5/10</description>
    </item>
    <item>
      <title>Multilingual Supervised Pretraining with LM-Assisted Decoding for Visual Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multilingual-supervised-pretraining-with-lm/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multilingual-supervised-pretraining-with-lm/</guid>
      <description>Speech Recognition | 6.5/10</description>
    </item>
    <item>
      <title>Perceptual Loss Optimized HRTF Personalization in Spherical Harmonic Domain</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-perceptual-loss-optimized-hrtf-personalization-in/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-perceptual-loss-optimized-hrtf-personalization-in/</guid>
      <description>Spatial Audio | 7.0/10</description>
    </item>
    <item>
      <title>Praxy Voice: Voice-Prompt Recovery &#43; BUPS for Commercial-Class Indic TTS from a Frozen Non-Indic Base at Zero Commercial-Training-Data Cost</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-praxy-voice-voice-prompt-recovery-bups-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-praxy-voice-voice-prompt-recovery-bups-for/</guid>
      <description>Speech Synthesis | 8.0/10</description>
    </item>
    <item>
      <title>Probing the Hidden Talent of ASR foundation models for L2 English Oral Assessment</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-probing-the-hidden-talent-of-asr-foundation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-probing-the-hidden-talent-of-asr-foundation/</guid>
      <description>Pretraining | 7.5/10</description>
    </item>
    <item>
      <title>Probing Whisper for Dysarthric Speech in Detection and Assessment</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-probing-whisper-for-dysarthric-speech-in/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-probing-whisper-for-dysarthric-speech-in/</guid>
      <description>Speech Biomarkers | 6.5/10</description>
    </item>
    <item>
      <title>Quality Assessment of Noisy and Enhanced Speech with Limited Data: UWB-NTIS System for VoiceMOS 2024</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-quality-assessment-of-noisy-and-enhanced-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-quality-assessment-of-noisy-and-enhanced-speech/</guid>
      <description>Speech Quality Assessment | 7.0/10</description>
    </item>
    <item>
      <title>Ranking The Impact of Contextual Specialization in Neural Speech Enhancement</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ranking-the-impact-of-contextual-specialization/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ranking-the-impact-of-contextual-specialization/</guid>
      <description>Speech Enhancement | 7.5/10</description>
    </item>
    <item>
      <title>Representation-Diverse Self-Supervision for Cross-Domain Bioacoustic Learning in Low-Resource Settings</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-representation-diverse-self-supervision-for-cross/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-representation-diverse-self-supervision-for-cross/</guid>
      <description>Bioacoustics | 7.0/10</description>
    </item>
    <item>
      <title>SAUNA: Song-Level Audio &amp; User-Listening Data Neural Alignment</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sauna-song-level-audio-user-listening-data-neural/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sauna-song-level-audio-user-listening-data-neural/</guid>
      <description>Music Information Retrieval | 7.0/10</description>
    </item>
    <item>
      <title>SELD-MOHA: A Fine-Tuning Method with the Mixture of Heterogeneous Adapters for Sound Event Localization and Detection</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-seld-moha-a-fine-tuning-method-with-the-mixture/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-seld-moha-a-fine-tuning-method-with-the-mixture/</guid>
      <description>Audio Event Detection | 7.0/10</description>
    </item>
    <item>
      <title>Semantic Anchor Transfer from Short to Long Speech in a Distillation-Based Summarization Framework</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-semantic-anchor-transfer-from-short-to-long/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-semantic-anchor-transfer-from-short-to-long/</guid>
      <description>Speech Summarization | 7.5/10</description>
    </item>
    <item>
      <title>SightSound-R1: Cross-Modal Reasoning Distillation from Vision to Audio Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sightsound-r1-cross-modal-reasoning-distillation/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sightsound-r1-cross-modal-reasoning-distillation/</guid>
      <description>Audio Question Answering | 7.5/10</description>
    </item>
    <item>
      <title>SpeechMapper: Speech-To-Text Embedding Projector for LLMs</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-speechmapper-speech-to-text-embedding-projector/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-speechmapper-speech-to-text-embedding-projector/</guid>
      <description>Large Speech Models | 7.0/10</description>
    </item>
    <item>
      <title>Stress Prediction from Temporal Emotion Trajectories in Clinical Patient-Physician Conversations</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-stress-prediction-from-temporal-emotion/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-stress-prediction-from-temporal-emotion/</guid>
      <description>Speech Emotion Recognition | 7.0/10</description>
    </item>
    <item>
      <title>Synthesized Data Selection via Score Distribution Matching for Te Reo Māori Automatic Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-synthesized-data-selection-via-score-distribution/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-synthesized-data-selection-via-score-distribution/</guid>
      <description>Speech Recognition | 8.0/10</description>
    </item>
    <item>
      <title>Three Seconds is Sufficient: A Multi-Pronged Framework for Model-Based Speaker Adaptation in ASR Under Data-Scarce Conditions</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-three-seconds-is-sufficient-a-multi-pronged/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-three-seconds-is-sufficient-a-multi-pronged/</guid>
      <description>Speech Recognition | 7.0/10</description>
    </item>
    <item>
      <title>Towards Fair ASR for Second Language Speakers using Fairness Prompted Finetuning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-fair-asr-for-second-language-speakers/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-fair-asr-for-second-language-speakers/</guid>
      <description>Speech Recognition | 6.5/10</description>
    </item>
    <item>
      <title>Transfer Learning for Paediatric Sleep Apnoea Detection using Physiology-Guided Acoustic Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-transfer-learning-for-paediatric-sleep-apnoea/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-transfer-learning-for-paediatric-sleep-apnoea/</guid>
      <description>Audio Classification | 7.0/10</description>
    </item>
    <item>
      <title>Transferable Audio Lottery Tickets: Gradient Accumulation for Extreme Sparsity</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-transferable-audio-lottery-tickets-gradient/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-transferable-audio-lottery-tickets-gradient/</guid>
      <description>Audio Classification | 7.0/10</description>
    </item>
    <item>
      <title>UNet-Based Fusion and Exponential Moving Average Adaptation for Noise-Robust Speaker Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-unet-based-fusion-and-exponential-moving-average/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-unet-based-fusion-and-exponential-moving-average/</guid>
      <description>Speaker Verification | 7.5/10</description>
    </item>
    <item>
      <title>WavLink: Compact Audio–Text Embeddings with a Global Whisper Token</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-wavlink-compact-audiotext-embeddings-with-a/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-wavlink-compact-audiotext-embeddings-with-a/</guid>
      <description>Audio Retrieval | 8.0/10</description>
    </item>
    <item>
      <title>Windowed SummaryMixing: An Efficient Fine-Tuning of Self-Supervised Learning Models for Low-Resource Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-windowed-summarymixing-an-efficient-fine-tuning/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-windowed-summarymixing-an-efficient-fine-tuning/</guid>
      <description>Speech Recognition | 6.5/10</description>
    </item>
    <item>
      <title>Low-Rank Adaptation Redux for Large Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-low-rank-adaptation-redux-for-large-models/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-low-rank-adaptation-redux-for-large-models/</guid>
      <description>Large Language Models | 5.5/10</description>
    </item>
    <item>
      <title>Environmental Sound Deepfake Detection Using Deep-Learning Framework</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-environmental-sound-deepfake-detection-using-deep/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-environmental-sound-deepfake-detection-using-deep/</guid>
      <description>1.  **Problem**: Deepfake detection for environmental sounds (ESDD), covering both acoustic scenes and sound events, is under-studied, and it remains unclear whether scene and event forgery detection require different models. 2.  **Core method**: A deep-learning framework whose core is a pretrained audio model (BEATs) as the feature extractor, combined with a three-stage training strategy (including</description>
    </item>
    <item>
      <title>SAND: The Challenge on Speech Analysis for Neurodegenerative Disease Assessment</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-sand-the-challenge-on-speech-analysis-for/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-sand-the-challenge-on-speech-analysis-for/</guid>
      <description>1.  **Problem addressed**: For early diagnosis and monitoring of neurodegenerative diseases (especially amyotrophic lateral sclerosis, ALS), there is a lack of large-scale, clinically annotated speech datasets and of a standardized framework for evaluating algorithms. 2.  **Core method**: The SAND challenge was built and released; at its core it provides an extended, longitudinal speech dataset of ALS patients and healthy controls (VOC-A</description>
    </item>
    <item>
      <title>FreezeEmpath: Efficient Training for Empathetic Spoken Chatbots with Frozen LLMs</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-freezeempath-efficient-training-for-empathetic/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-freezeempath-efficient-training-for-empathetic/</guid>
      <description>This paper tackles three major challenges in training empathetic spoken chatbots: **scarce empathetic speech data, weak model generalization, and degradation of the LLM's general abilities caused by fine-tuning**. The authors propose **FreezeEmpath**, an efficient end-to-end training framework. Its core method is to **freeze the base LLM** and adopt a **semantics-emotion decoupled encoding strategy**, using an independent semantic adapter and emotion extractor to</description>
    </item>
    <item>
      <title>Contextual Biasing for ASR in Speech LLM with Common Word Cues and Bias Word Position Prediction</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-contextual-biasing-for-asr-in-speech-llm-with/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-contextual-biasing-for-asr-in-speech-llm-with/</guid>
      <description>This paper addresses the poor performance of speech LLMs (SLLMs) when recognizing "bias words" that are rare or unseen in the training data. Traditional approaches rely on supplying exact phoneme sequences for the bias words (generated by a G2P system), which demands expertise from users and suffers from poor tool compatibility. To this end,</description>
    </item>
    <item>
      <title>SpeakerRPL v2: Robust Open-set Speaker Identification through Enhanced Few-shot Foundation Tuning and Model Fusion</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-speakerrpl-v2-robust-open-set-speaker/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-speakerrpl-v2-robust-open-set-speaker/</guid>
      <description>This paper addresses robustness in open-set speaker identification: given only a few enrollment samples per target speaker, the system must both accurately identify known speakers and reliably reject unknown ones. Building on the earlier SpeakerRPL V1 framework, the authors propose three key improvements:</description>
    </item>
    <item>
      <title>Transformer Based Machine Fault Detection From Audio Input</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-transformer-based-machine-fault-detection-from/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-transformer-based-machine-fault-detection-from/</guid>
      <description>This paper explores the potential advantages of Transformer-based architectures over conventional convolutional neural networks (CNNs) for machine-fault detection from audio. **The problem addressed** is that the inductive biases inherent to CNNs when processing spectrograms, such as locality and translation invariance, may</description>
    </item>
  </channel>
</rss>
