<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>低资源 on 语音/音频论文速递</title>
    <link>https://nanless.github.io/audio-paper-digest-blog/tags/%E4%BD%8E%E8%B5%84%E6%BA%90/</link>
    <description>Recent content in 低资源 on 语音/音频论文速递</description>
    <generator>Hugo</generator>
    <language>zh-cn</language>
    <lastBuildDate>Wed, 29 Apr 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://nanless.github.io/audio-paper-digest-blog/tags/%E4%BD%8E%E8%B5%84%E6%BA%90/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>A Bimodal Approach for Detecting Fatigue Using Speech and Personal Assessments in College Students</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-bimodal-approach-for-detecting-fatigue-using/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-bimodal-approach-for-detecting-fatigue-using/</guid>
      <description>A Bimodal Approach for Detecting Fatigue Using Speech and Personal Assessments in College Students</description>
    </item>
    <item>
      <title>AFT: An Exemplar-Free Class Incremental Learning Method for Environmental Sound Classification</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-aft-an-exemplar-free-class-incremental-learning/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-aft-an-exemplar-free-class-incremental-learning/</guid>
      <description>音频分类 | 7.0/10</description>
    </item>
    <item>
      <title>Ara-BEST-RQ: Multi Dialectal Arabic SSL</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ara-best-rq-multi-dialectal-arabic-ssl/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ara-best-rq-multi-dialectal-arabic-ssl/</guid>
      <description>语音识别 | 6.5/10</description>
    </item>
    <item>
      <title>Asynchrony-Aware Decoupled Multimodal Control for Cued Speech Video Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-asynchrony-aware-decoupled-multimodal-control-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-asynchrony-aware-decoupled-multimodal-control-for/</guid>
      <description>语音合成 | 7.5/10</description>
    </item>
    <item>
      <title>ATOM: Adaptive Token-Level Optimal Transport Mixup for Speech Translation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-atom-adaptive-token-level-optimal-transport-mixup/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-atom-adaptive-token-level-optimal-transport-mixup/</guid>
      <description>语音翻译 | 8.0/10</description>
    </item>
    <item>
      <title>Bayesian Low-Rank Factorization for Robust Model Adaptation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-bayesian-low-rank-factorization-for-robust-model/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-bayesian-low-rank-factorization-for-robust-model/</guid>
      <description>语音识别 | 8.0/10</description>
    </item>
    <item>
      <title>Behind the Scenes: Mechanistic Interpretability of LoRA-Adapted Whisper for Speech Emotion Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-behind-the-scenes-mechanistic-interpretability-of/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-behind-the-scenes-mechanistic-interpretability-of/</guid>
      <description>语音情感识别 | 7.5/10</description>
    </item>
    <item>
      <title>BEST-RQ-based Self-Supervised Learning for Whisper Domain Adaptation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-best-rq-based-self-supervised-learning-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-best-rq-based-self-supervised-learning-for/</guid>
      <description>语音识别 | 7.5/10</description>
    </item>
    <item>
      <title>BiRQ: Bi-Level Self-Labeling Random Quantization for Self-Supervised Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-birq-bi-level-self-labeling-random-quantization/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-birq-bi-level-self-labeling-random-quantization/</guid>
      <description>语音识别 | 8.0/10</description>
    </item>
    <item>
      <title>CTC-DID: CTC-Based Arabic Dialect Identification for Streaming Applications</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ctc-did-ctc-based-arabic-dialect-identification/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ctc-did-ctc-based-arabic-dialect-identification/</guid>
      <description>语音识别 | 6.5/10</description>
    </item>
    <item>
      <title>DDSC: Dynamic Dual-Signal Curriculum for Data-Efficient Acoustic Scene Classification Under Domain Shift</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ddsc-dynamic-dual-signal-curriculum-for-data/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ddsc-dynamic-dual-signal-curriculum-for-data/</guid>
      <description>音频场景分类 | 7.0/10</description>
    </item>
    <item>
      <title>Domain-Aware Scheduling for ASR Fine-Tuning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-domain-aware-scheduling-for-asr-fine-tuning/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-domain-aware-scheduling-for-asr-fine-tuning/</guid>
      <description>语音识别 | 6.5/10</description>
    </item>
    <item>
      <title>Efficient Depression Detection from Speech via Language-Independent Prompt-Driven Reprogramming</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-efficient-depression-detection-from-speech-via/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-efficient-depression-detection-from-speech-via/</guid>
      <description>语音生物标志物 | 7.5/10</description>
    </item>
    <item>
      <title>Entropy-Guided GRVQ for Ultra-Low Bitrate Neural Speech Codec</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-entropy-guided-grvq-for-ultra-low-bitrate-neural/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-entropy-guided-grvq-for-ultra-low-bitrate-neural/</guid>
      <description>语音合成 | 7.5/10</description>
    </item>
    <item>
      <title>Exploring Fine-Tuning Of Large Audio Language Models For Spoken Language Understanding Under Limited Speech Data</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-exploring-fine-tuning-of-large-audio-language/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-exploring-fine-tuning-of-large-audio-language/</guid>
      <description>语音理解 | 8.0/10</description>
    </item>
    <item>
      <title>Fast-ULCNet: A Fast and Ultra Low Complexity Network for Single-Channel Speech Enhancement</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fast-ulcnet-a-fast-and-ultra-low-complexity/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-fast-ulcnet-a-fast-and-ultra-low-complexity/</guid>
      <description>语音增强 | 7.5/10</description>
    </item>
    <item>
      <title>FinHuBERT: Hierarchical Feature Imitating Networks for Low-Resource Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-finhubert-hierarchical-feature-imitating-networks/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-finhubert-hierarchical-feature-imitating-networks/</guid>
      <description>语音识别 | 7.5/10</description>
    </item>
    <item>
      <title>FocalCodec-Stream: Streaming Low-Bitrate Speech Coding via Causal Distillation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-focalcodec-stream-streaming-low-bitrate-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-focalcodec-stream-streaming-low-bitrate-speech/</guid>
      <description>语音编码 | 8.0/10</description>
    </item>
    <item>
      <title>From Hallucination to Articulation: Language Model-Driven Losses for Ultra Low-Bitrate Neural Speech Coding</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-from-hallucination-to-articulation-language-model/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-from-hallucination-to-articulation-language-model/</guid>
      <description>语音合成 | 7.5/10</description>
    </item>
    <item>
      <title>H-nnPBFDAF: Hierarchical Neural Network Partitioned Block Frequency Domain Adaptive Filter with Novel Block Activation Probability</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-h-nnpbfdaf-hierarchical-neural-network/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-h-nnpbfdaf-hierarchical-neural-network/</guid>
      <description>语音增强 | 7.5/10</description>
    </item>
    <item>
      <title>HCGAN: Harmonic-Coupled Generative Adversarial Network for Speech Super-Resolution in Low-Bandwidth Scenarios</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hcgan-harmonic-coupled-generative-adversarial/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hcgan-harmonic-coupled-generative-adversarial/</guid>
      <description>语音增强 | 8.0/10</description>
    </item>
    <item>
      <title>How Far Do SSL Speech Models Listen for Tone? Temporal Focus of Tone Representation under Low-Resource Transfer</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-how-far-do-ssl-speech-models-listen-for-tone/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-how-far-do-ssl-speech-models-listen-for-tone/</guid>
      <description>语音识别 | 6.5/10</description>
    </item>
    <item>
      <title>Hybrid Pruning: In-Situ Compression of Self-Supervised Speech Models for Speaker Verification and Anti-Spoofing</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hybrid-pruning-in-situ-compression-of-self/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hybrid-pruning-in-situ-compression-of-self/</guid>
      <description>说话人验证 | 8.0/10</description>
    </item>
    <item>
      <title>Improving Audio Event Recognition with Consistency Regularization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-improving-audio-event-recognition-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-improving-audio-event-recognition-with/</guid>
      <description>音频事件检测 | 7.0/10</description>
    </item>
    <item>
      <title>Leveraging Diffusion U-Net Features for Predominant Instrument Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-leveraging-diffusion-u-net-features-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-leveraging-diffusion-u-net-features-for/</guid>
      <description>音乐信息检索 | 8.0/10</description>
    </item>
    <item>
      <title>Lightweight and Perceptually-Guided Voice Conversion for Electro-Laryngeal Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lightweight-and-perceptually-guided-voice/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lightweight-and-perceptually-guided-voice/</guid>
      <description>语音转换 | 7.5/10</description>
    </item>
    <item>
      <title>Lingometer: On-Device Personal Speech Word Counting System</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lingometer-on-device-personal-speech-word/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lingometer-on-device-personal-speech-word/</guid>
      <description>语音活动检测 | 8.0/10</description>
    </item>
    <item>
      <title>LLM-Based Post-ASR Error Correction for Disordered Speech</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-llm-based-post-asr-error-correction-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-llm-based-post-asr-error-correction-for/</guid>
      <description>语音识别 | 7.5/10</description>
    </item>
    <item>
      <title>LOTUSDIS: A Thai Far-Field Meeting Corpus for Robust Conversational ASR</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lotusdis-a-thai-far-field-meeting-corpus-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lotusdis-a-thai-far-field-meeting-corpus-for/</guid>
      <description>语音识别 | 7.5/10</description>
    </item>
    <item>
      <title>Low-Resource Speech-Based Early Alzheimer&#39;s Detection via Cross-Lingual and Few-Shot Transfer Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-low-resource-speech-based-early-alzheimers/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-low-resource-speech-based-early-alzheimers/</guid>
      <description>语音生物标志物 | 7.5/10</description>
    </item>
    <item>
      <title>LP-CFM: Perceptual Invariance-Aware Conditional Flow Matching for Speech Modeling</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lp-cfm-perceptual-invariance-aware-conditional/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-lp-cfm-perceptual-invariance-aware-conditional/</guid>
      <description>语音合成 | 7.0/10</description>
    </item>
    <item>
      <title>Mind the Shift: Using Delta SSL Embeddings to Enhance Child ASR</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mind-the-shift-using-delta-ssl-embeddings-to/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mind-the-shift-using-delta-ssl-embeddings-to/</guid>
      <description>语音识别 | 7.0/10</description>
    </item>
    <item>
      <title>Mixtures of Lightweight Articulatory Experts for Multilingual ASR</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mixtures-of-lightweight-articulatory-experts-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mixtures-of-lightweight-articulatory-experts-for/</guid>
      <description>语音识别 | 7.0/10</description>
    </item>
    <item>
      <title>Multilingual Supervised Pretraining with LM-Assisted Decoding for Visual Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multilingual-supervised-pretraining-with-lm/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-multilingual-supervised-pretraining-with-lm/</guid>
      <description>语音识别 | 6.5/10</description>
    </item>
    <item>
      <title>Neuromamba: Adaptive Frequency Filtering with a Pyramid Mamba for sEEG-driven Speech Synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-neuromamba-adaptive-frequency-filtering-with-a/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-neuromamba-adaptive-frequency-filtering-with-a/</guid>
      <description>语音合成 | 8.0/10</description>
    </item>
    <item>
      <title>One Model–Three Tasks: Discovering a Shared Winning Ticket for Low-Complexity Audio Intelligence</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-one-modelthree-tasks-discovering-a-shared-winning/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-one-modelthree-tasks-discovering-a-shared-winning/</guid>
      <description>音频分类 | 7.5/10</description>
    </item>
    <item>
      <title>Polynomial Mixing for Efficient Self-Supervised Speech Encoders</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-polynomial-mixing-for-efficient-self-supervised/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-polynomial-mixing-for-efficient-self-supervised/</guid>
      <description>语音识别 | 8.0/10</description>
    </item>
    <item>
      <title>Praxy Voice: Voice-Prompt Recovery &#43; BUPS for Commercial-Class Indic TTS from a Frozen Non-Indic Base at Zero Commercial-Training-Data Cost</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-praxy-voice-voice-prompt-recovery-bups-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-praxy-voice-voice-prompt-recovery-bups-for/</guid>
      <description>语音合成 | 8.0/10</description>
    </item>
    <item>
      <title>Prompt-Guided Mixture-of-Experts for Robust Multimodal Sentiment Analysis with Missing Modalities</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-prompt-guided-mixture-of-experts-for-robust/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-prompt-guided-mixture-of-experts-for-robust/</guid>
      <description>语音情感识别 | 8.5/10</description>
    </item>
    <item>
      <title>Quantifying Speaker Embedding Phonological Rule Interactions in Accented Speech Synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-quantifying-speaker-embedding-phonological-rule/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-quantifying-speaker-embedding-phonological-rule/</guid>
      <description>语音合成 | 7.0/10</description>
    </item>
    <item>
      <title>Ranking The Impact of Contextual Specialization in Neural Speech Enhancement</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ranking-the-impact-of-contextual-specialization/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ranking-the-impact-of-contextual-specialization/</guid>
      <description>语音增强 | 7.5/10</description>
    </item>
    <item>
      <title>Representation-Diverse Self-Supervision for Cross-Domain Bioacoustic Learning in Low-Resource Settings</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-representation-diverse-self-supervision-for-cross/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-representation-diverse-self-supervision-for-cross/</guid>
      <description>生物声学 | 7.0/10</description>
    </item>
    <item>
      <title>Scaling Ambiguity: Augmenting Human Annotation in Speech Emotion Recognition with Audio-Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-scaling-ambiguity-augmenting-human-annotation-in/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-scaling-ambiguity-augmenting-human-annotation-in/</guid>
      <description>语音情感识别 | 6.5/10</description>
    </item>
    <item>
      <title>Self-Supervised Note Tracking and Multi-Pitch Estimation Via Reconstruction-Based Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-self-supervised-note-tracking-and-multi-pitch/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-self-supervised-note-tracking-and-multi-pitch/</guid>
      <description>多音高估计 音符跟踪 | 8.5/10</description>
    </item>
    <item>
      <title>Sequence-Level Unsupervised Training in Speech Recognition: A Theoretical Study</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sequence-level-unsupervised-training-in-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-sequence-level-unsupervised-training-in-speech/</guid>
      <description>语音识别 | 6.5/10</description>
    </item>
    <item>
      <title>SSVD-O: Parameter-Efficient Fine-Tuning with Structured SVD for Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ssvd-o-parameter-efficient-fine-tuning-with/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ssvd-o-parameter-efficient-fine-tuning-with/</guid>
      <description>语音识别 | 7.0/10</description>
    </item>
    <item>
      <title>Synthesized Data Selection via Score Distribution Matching for Te Reo Māori Automatic Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-synthesized-data-selection-via-score-distribution/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-synthesized-data-selection-via-score-distribution/</guid>
      <description>语音识别 | 8.0/10</description>
    </item>
    <item>
      <title>TAGARELA - A Portuguese Speech Dataset from Podcasts</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tagarela-a-portuguese-speech-dataset-from-podcasts/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tagarela-a-portuguese-speech-dataset-from-podcasts/</guid>
      <description>语音识别 语音合成 | 7.0/10</description>
    </item>
    <item>
      <title>Taming Audio VAEs via Target-KL Regularization</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-taming-audio-vaes-via-target-kl-regularization/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-taming-audio-vaes-via-target-kl-regularization/</guid>
      <description>音频生成 | 6.5/10</description>
    </item>
    <item>
      <title>Task Vector in TTS: Toward Emotionally Expressive Dialectal Speech Synthesis</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-task-vector-in-tts-toward-emotionally-expressive/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-task-vector-in-tts-toward-emotionally-expressive/</guid>
      <description>语音合成 | 7.0/10</description>
    </item>
    <item>
      <title>Three Seconds is Sufficient: A Multi-Pronged Framework for Model-Based Speaker Adaptation in ASR Under Data-Scarce Conditions</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-three-seconds-is-sufficient-a-multi-pronged/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-three-seconds-is-sufficient-a-multi-pronged/</guid>
      <description>语音识别 | 7.0/10</description>
    </item>
    <item>
      <title>TICL: Text-Embedding KNN for Speech in-Context Learning Unlocks Speech Recognition Abilities of Large Multimodal Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ticl-text-embedding-knn-for-speech-in-context/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ticl-text-embedding-knn-for-speech-in-context/</guid>
      <description>语音识别 | 7.5/10</description>
    </item>
    <item>
      <title>TMD-TTS: A Unified Tibetan Multi-Dialect Text-to-Speech Framework for Ü-Tsang, Amdo and Kham Speech Dataset Generation</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tmd-tts-a-unified-tibetan-multi-dialect-text-to/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-tmd-tts-a-unified-tibetan-multi-dialect-text-to/</guid>
      <description>语音合成 | 7.5/10</description>
    </item>
    <item>
      <title>Towards Building Speech Large Language Models for Multitask Understanding in Low-Resource Languages</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-building-speech-large-language-models-for/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-building-speech-large-language-models-for/</guid>
      <description>语音识别 | 6.5/10</description>
    </item>
    <item>
      <title>Towards Lightweight Adaptation of Speech Enhancement Models in Real-World Environments</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-lightweight-adaptation-of-speech/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-lightweight-adaptation-of-speech/</guid>
      <description>语音增强 | 8.5/10</description>
    </item>
    <item>
      <title>Towards Orthographically-Informed Evaluation of Speech Recognition Systems for Indian Languages</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-orthographically-informed-evaluation-of/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-towards-orthographically-informed-evaluation-of/</guid>
      <description>语音识别 | 7.0/10</description>
    </item>
    <item>
      <title>Transfer Learning for Paediatric Sleep Apnoea Detection using Physiology-Guided Acoustic Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-transfer-learning-for-paediatric-sleep-apnoea/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-transfer-learning-for-paediatric-sleep-apnoea/</guid>
      <description>音频分类 | 7.0/10</description>
    </item>
    <item>
      <title>UJCodec: An End-to-End UNet-Style Codec for Joint Speech Compression and Enhancement</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ujcodec-an-end-to-end-unet-style-codec-for-joint/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ujcodec-an-end-to-end-unet-style-codec-for-joint/</guid>
      <description>语音增强 | 7.5/10</description>
    </item>
    <item>
      <title>UNMIXX: Untangling Highly Correlated Singing Voices Mixtures</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-unmixx-untangling-highly-correlated-singing/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-unmixx-untangling-highly-correlated-singing/</guid>
      <description>语音分离 | 8.5/10</description>
    </item>
    <item>
      <title>Unsupervised Lexicon Learning from Speech is Limited by Representations Rather than Clustering</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-unsupervised-lexicon-learning-from-speech-is/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-unsupervised-lexicon-learning-from-speech-is/</guid>
      <description>语音发现 | 8.0/10</description>
    </item>
    <item>
      <title>Variational Low-Rank Adaptation for Personalized Impaired Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-variational-low-rank-adaptation-for-personalized/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-variational-low-rank-adaptation-for-personalized/</guid>
      <description>语音识别 | 7.5/10</description>
    </item>
    <item>
      <title>WhisperPipe: A Resource-Efficient Streaming Architecture for Real-Time Automatic Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-whisperpipe-a-resource-efficient-streaming/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-whisperpipe-a-resource-efficient-streaming/</guid>
      <description>语音识别 | 6.5/10</description>
    </item>
    <item>
      <title>Windowed SummaryMixing: An Efficient Fine-Tuning of Self-Supervised Learning Models for Low-Resource Speech Recognition</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-windowed-summarymixing-an-efficient-fine-tuning/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-windowed-summarymixing-an-efficient-fine-tuning/</guid>
      <description>Speech recognition | 6.5/10</description>
    </item>
    <item>
      <title>Speech/Audio Paper Digest 2026-04-29</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29/</link>
      <pubDate>Wed, 29 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29/</guid>
      <description>Analysis of 29 speech/AI papers</description>
    </item>
    <item>
      <title>Dilated CNNs for Periodic Signal Processing: A Low-Complexity Approach</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-dilated-cnns-for-periodic-signal-processing-a-low/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24-dilated-cnns-for-periodic-signal-processing-a-low/</guid>
      <description>Speech enhancement | 6.5/10</description>
    </item>
    <item>
      <title>Speech/Audio Paper Digest 2026-04-24</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24/</link>
      <pubDate>Fri, 24 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-24/</guid>
      <description>Analysis of 21 speech/AI papers</description>
    </item>
    <item>
      <title>Enhancing ASR Performance in the Medical Domain for Dravidian Languages</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-enhancing-asr-performance-in-the-medical-domain/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23-enhancing-asr-performance-in-the-medical-domain/</guid>
      <description>This paper tackles two challenges facing medical-domain automatic speech recognition (ASR) for Dravidian languages (Telugu and Kannada): scarce labeled data and complex linguistic morphology. Its core contribution is a "confidence-aware training framework" that applies a hybrid confidence scoring mechanism (combining static perceptual and acoustic-similarity measures and WER scores with dynamic model entropy) to training data that mixes real and synthetic speech…</description>
    </item>
    <item>
      <title>Speech/Audio Paper Digest 2026-04-23</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23/</link>
      <pubDate>Thu, 23 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-23/</guid>
      <description>Analysis of 27 speech/AI papers</description>
    </item>
    <item>
      <title>Voice of India: A Large-Scale Benchmark for Real-World Speech Recognition in India</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-voice-of-india-a-large-scale-benchmark-for-real/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-voice-of-india-a-large-scale-benchmark-for-real/</guid>
      <description>This paper addresses two core problems with existing Indic ASR benchmarks: they do not reflect real-world scenarios, and their evaluation methods are unfair. The authors build "Voice of India", a large-scale benchmark drawn from unscripted telephone conversations with 36,000 speakers, covering 15 major Indian languages and 139 regional clusters for 536 hours in total. A key innovation is the adoption of spelling-variant-aware…</description>
    </item>
    <item>
      <title>Speech/Audio Paper Digest 2026-04-22</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22/</guid>
      <description>Analysis of 21 speech/AI papers</description>
    </item>
    <item>
      <title>BhashaSutra: A Task-Centric Unified Survey of Indian NLP Datasets, Corpora, and Resources</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-bhashasutra-a-task-centric-unified-survey-of/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-bhashasutra-a-task-centric-unified-survey-of/</guid>
      <description>This paper addresses a pain point of Indian-language NLP research: resources are fragmented and lack a unified overview. The authors propose the first task-centric unified taxonomy, systematically organizing and integrating over 200 datasets, 50 benchmarks, and more than 100 models, tools, and systems, spanning core language processing (e.g. tokenization, POS tagging), text classification, generation and translation, information retrieval, speech and multimodality, and even socio-cultural tasks…</description>
    </item>
    <item>
      <title>ClariCodec: Optimising Neural Speech Codes for 200bps Communication using Reinforcement Learning</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-claricodec-optimising-neural-speech-codes-for/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-claricodec-optimising-neural-speech-codes-for/</guid>
      <description>For ultra-low-bitrate (200 bps) scenarios such as satellite and underwater communication, where conventional neural speech codecs sacrifice intelligibility by optimizing for reconstruction quality, this paper proposes ClariCodec. Its core method reframes the encoder's quantization as a stochastic policy and fine-tunes the encoder with reinforcement learning (RL), using word error rate (WER) as the reward signal while freezing the decoder and the rest of the acoustic reconstruction pipeline. Experiments…</description>
    </item>
    <item>
      <title>Speech/Audio Paper Digest 2026-04-21</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21/</link>
      <pubDate>Tue, 21 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21/</guid>
      <description>Analysis of 34 speech/AI papers</description>
    </item>
    <item>
      <title>NaijaS2ST: A Multi-Accent Benchmark for Speech-to-Speech Translation in Low-Resource Nigerian Languages</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-naijas2st-a-multi-accent-benchmark-for-speech-to/</link>
      <pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-naijas2st-a-multi-accent-benchmark-for-speech-to/</guid>
      <description>This paper targets the core bottleneck in speech translation (S2ST and S2TT) research for low-resource African languages: a severe shortage of high-quality, multi-accent parallel speech data. The authors build the NaijaS2ST dataset, comprising parallel speech between English and Hausa, Igbo, Yoruba, and Nigerian Pidgin, about 50 hours per language, capturing real speaker and accent diversity. Building on this dataset, the…</description>
    </item>
    <item>
      <title>Speech/Audio Paper Digest 2026-04-20</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20/</link>
      <pubDate>Mon, 20 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20/</guid>
      <description>Analysis of 24 speech/AI papers</description>
    </item>
    <item>
      <title>Few-Shot and Pseudo-Label Guided Speech Quality Evaluation with Large Language Models</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-few-shot-and-pseudo-label-guided-speech-quality/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-few-shot-and-pseudo-label-guided-speech-quality/</guid>
      <description>This paper addresses the performance bottleneck of non-intrusive speech quality assessment when labeled data is limited. The authors propose the GatherMOS framework, whose core idea is to use a large language model (e.g. GPT-5) as a meta-evaluator that, via carefully designed text prompts, fuses multiple classes of heterogeneous signals, including…</description>
    </item>
    <item>
      <title>Speech/Audio Paper Digest 2026-04-19</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19/</link>
      <pubDate>Sun, 19 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19/</guid>
      <description>Analysis of 42 speech/AI papers</description>
    </item>
    <item>
      <title>Speech/Audio Paper Digest 2026-04-18</title>
      <link>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-18/</link>
      <pubDate>Sat, 18 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-18/</guid>
      <description>Analysis of 39 speech/AI papers</description>
    </item>
  </channel>
</rss>
