Time-Frequency Analysis on Speech/Audio Paper Digest

Feed: https://nanless.github.io/audio-paper-digest-blog/tags/%E6%97%B6%E9%A2%91%E5%88%86%E6%9E%90/
Recent content in Time-Frequency Analysis on Speech/Audio Paper Digest. Generator: Hugo. Language: zh-cn. Last updated: Wed, 29 Apr 2026 00:00:00 +0000.

A Noniterative Phase Retrieval Considering the Zeros of STFT Magnitude
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-a-noniterative-phase-retrieval-considering-the/
Wed, 29 Apr 2026 | Signal processing | 7.5/10

Acoustic Non-Stationarity Objective Assessment with Hard Label Criteria for Supervised Learning Models
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-acoustic-non-stationarity-objective-assessment/
Wed, 29 Apr 2026 | Audio classification | 7.0/10

An Audio-Visual Speech Separation Network with Joint Cross-Attention and Iterative Modeling
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-an-audio-visual-speech-separation-network-with/
Wed, 29 Apr 2026 | Speech separation | 7.5/10

An Event-Based Sequence Modeling Approach to Recognizing Non-Triad Chords with Oversegmentation Minimization
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-an-event-based-sequence-modeling-approach-to/
Wed, 29 Apr 2026 | Music information retrieval | 7.5/10

AR-BSNet: Towards Ultra-Low Complexity Autoregressive Target Speaker Extraction With Band-Split Modeling
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-ar-bsnet-towards-ultra-low-complexity/
Wed, 29 Apr 2026 | Speech separation | 7.0/10

Audio Deepfake Detection at the First Greeting: "Hi!"
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-audio-deepfake-detection-at-the-first-greeting-hi/
Wed, 29 Apr 2026 | Audio deepfake detection | 7.5/10

BioSEN: A Bio-Acoustic Signal Enhancement Network for Animal Vocalizations
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-biosen-a-bio-acoustic-signal-enhancement-network/
Wed, 29 Apr 2026 | Bioacoustics | 7.5/10

BSMP-SENet: Band-Split Magnitude-Phase Network for Speech Enhancement
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-bsmp-senetband-split-magnitude-phase-network-for/
Wed, 29 Apr 2026 | Speech enhancement | 7.0/10

Coupling Acoustic Geometry and Visual Semantics for Robust Depth Estimation
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-coupling-acoustic-geometry-and-visual-semantics/
Wed, 29 Apr 2026 | Spatial audio | 7.5/10

Cross-Cultural Bias in Mel-Scale Representations: Evidence and Alternatives from Speech and Music
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-cross-cultural-bias-in-mel-scale-representations/
Wed, 29 Apr 2026 | Speech recognition | 7.0/10

Enabling Multi-Species Bird Classification on Low-Power Bioacoustic Loggers
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-enabling-multi-species-bird-classification-on-low/
Wed, 29 Apr 2026 | Bioacoustics | 8.0/10

H-nnPBFDAF: Hierarchical Neural Network Partitioned Block Frequency Domain Adaptive Filter with Novel Block Activation Probability
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-h-nnpbfdaf-hierarchical-neural-network/
Wed, 29 Apr 2026 | Speech enhancement | 7.5/10

HVAC-EAR: Eavesdropping Human Speech Using HVAC Systems
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-hvac-ear-eavesdropping-human-speech-using-hvac/
Wed, 29 Apr 2026 | Audio security | 8.5/10

Input-Adaptive Differentiable Filterbanks via Hypernetworks for Robust Speech Processing
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-input-adaptive-differentiable-filterbanks-via/
Wed, 29 Apr 2026 | Speech recognition | 7.5/10

Is Phase Really Needed for Weakly-Supervised Dereverberation?
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-is-phase-really-needed-for-weakly-supervised/
Wed, 29 Apr 2026 | Speech enhancement | 6.0/10

Joint Estimation of Piano Dynamics and Metrical Structure with a Multi-Task Multi-Scale Network
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-joint-estimation-of-piano-dynamics-and-metrical/
Wed, 29 Apr 2026 | Music understanding | 7.5/10

Korean aegyo speech shows systematic F1 increase to signal childlike qualities
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-korean-aegyo-speech-shows-systematic-f1-increase/
Wed, 29 Apr 2026 | Speech emotion recognition | 6.0/10

Learnable Mel-Frontend for Robust Underwater Acoustic Target Detection under Non-Target Interference
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-learnable-mel-frontend-for-robust-underwater/
Wed, 29 Apr 2026 | Audio classification | 6.5/10

Mambaformer: State-Space Augmented Self-Attention with Downup Sampling for Monaural Speech Enhancement
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-mambaformer-state-space-augmented-self-attention/
Wed, 29 Apr 2026 | Speech enhancement | 7.0/10

Non-Line-of-Sight Vehicle Detection via Audio-Visual Fusion
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-non-line-of-sight-vehicle-detection-via-audio/
Wed, 29 Apr 2026 | Audio classification | 8.0/10

Poly-SVC: Polyphony-Aware Singing Voice Conversion with Harmonic Modeling
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-poly-svc-polyphony-aware-singing-voice-conversion/
Wed, 29 Apr 2026 | Singing voice conversion | 6.5/10

Random Matrix-Driven Graph Representation Learning For Bioacoustic Recognition
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-random-matrix-driven-graph-representation/
Wed, 29 Apr 2026 | Bioacoustics | 7.5/10

RMODGDF: A Robust STFT-Derived Feature for Musical Instrument Recognition
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-rmodgdf-a-robust-stft-derived-feature-for-musical/
Wed, 29 Apr 2026 | Music information retrieval | 7.0/10

Snore Sound Classification Based on Physiological Features and Adaptive Loss Function
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-snore-sound-classification-based-on-physiological/
Wed, 29 Apr 2026 | Audio classification | 6.5/10

Spectrogram Event Based Feature Representation for Generalizable Automatic Music Transcription
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-spectrogram-event-based-feature-representation/
Wed, 29 Apr 2026 | Music information retrieval | 7.5/10

Subgraph Localization in the Subbands for Partially Spoofed Speech Detection
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-subgraph-localization-in-the-subbands-for/
Wed, 29 Apr 2026 | Audio deepfake detection | 8.0/10

Subspace Hybrid Adaptive Filtering for Phonocardiogram Signal Denoising
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-subspace-hybrid-adaptive-filtering-for/
Wed, 29 Apr 2026 | Audio enhancement | 7.0/10

UMV: A Mixture-Of-Experts Vision Transformer with Multi-Spectrogram Fusion for Underwater Ship Noise Classification
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-umv-a-mixture-of-experts-vision-transformer-with/
Wed, 29 Apr 2026 | Audio classification | 7.5/10

UNMIXX: Untangling Highly Correlated Singing Voices Mixtures
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-unmixx-untangling-highly-correlated-singing/
Wed, 29 Apr 2026 | Speech separation | 8.5/10

Unsupervised Discovery and Analysis of the Vocal Repertoires and Patterns of Select Corvid Species
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-unsupervised-discovery-and-analysis-of-the-vocal/
Wed, 29 Apr 2026 | Bioacoustics | 7.5/10

USVexplorer: Robust Detection of Ultrasonic Vocalizations with Cross Species Generalization
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-usvexplorer-robust-detection-of-ultrasonic/
Wed, 29 Apr 2026 | Audio event detection | 8.0/10

Voting-Based Pitch Estimation with Temporal and Frequential Alignment and Correlation Aware Selection
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-voting-based-pitch-estimation-with-temporal-and/
Wed, 29 Apr 2026 | Speech recognition | 8.0/10

WaveSP-Net: Learnable Wavelet-Domain Sparse Prompt Tuning for Speech Deepfake Detection
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-wavesp-net-learnable-wavelet-domain-sparse-prompt/
Wed, 29 Apr 2026 | Speech spoofing detection | 8.0/10

WaveSpikeNet: A Wavelet-Spiking Fusion Architecture for Audio Classification on Edge Devices
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-29-wavespikenet-a-wavelet-spiking-fusion/
Wed, 29 Apr 2026 | Audio classification | 7.5/10

Spectro-Temporal Modulation Representation Framework for Human-Imitated Speech Detection
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-28-spectro-temporal-modulation-representation/
Tue, 28 Apr 2026 | Speech spoofing detection | 6.5/10

Earable Platform with Integrated Simultaneous EEG Sensing and Auditory Stimulation
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-earable-platform-with-integrated-simultaneous-eeg/
Mon, 27 Apr 2026 | Audio event detection | 5.5/10

Spectrographic Portamento Gradient Analysis: A Quantitative Method for Historical Cello Recordings with Application to Beethoven's Piano and Cello Sonatas, 1930–2012
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-27-spectrographic-portamento-gradient-analysis-a/
Mon, 27 Apr 2026 | Music information retrieval | 7.5/10

Audio Spoof Detection with GaborNet
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-22-audio-spoof-detection-with-gabornet/
Wed, 22 Apr 2026
Summary: This paper addresses the frequency leakage that the conventional SincNet front end suffers in audio spoof detection because its sinc functions are truncated to a finite length. The authors propose replacing SincNet with a learnable Gabor filterbank (GaborNet) and integrate it into two state-of-the-art end-to-end detection architectures, RawNet2 and RawGAT-ST. The paper also explores LEAF (Learnable F…

A novel LSTM music generator based on the fractional time-frequency feature extraction
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-21-a-novel-lstm-music-generator-based-on-the/
Tue, 21 Apr 2026
Summary: This paper proposes a new AI music generation system based on the fractional Fourier transform (FrFT) and long short-term memory (LSTM) networks. **Core objective**: use the FrFT to extract music-signal features in the fractional domain (a rotated representation of the time-frequency plane) that are richer than those available in the conventional time or frequency domain, addressing the difficulty conventional LSTMs have in capturing the complex time-frequency structure of music. **Key method**: the input music signal is passed through the FrFT…

ArtifactNet: Detecting AI-Generated Music via Forensic Residual Physics
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-20-artifactnet-detecting-ai-generated-music-via/
Mon, 20 Apr 2026
Summary: This paper addresses poor generalization and low parameter efficiency in AI-generated music detection. The authors propose a new framework, **ArtifactNet**, whose core innovation is to **reframe the problem as "forensic physics"**: directly extracting and analyzing the physical traces (residuals) that neural audio codecs inevitably leave in the audio they generate. The method uses a lightweight **Bounded-mas…

Elastic Net Regularization and Gabor Dictionary for Classification of Heart Sound Signals using Deep Learning
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-elastic-net-regularization-and-gabor-dictionary/
Sun, 19 Apr 2026
Summary: This paper addresses multi-class classification of heart sound signals (phonocardiograms, PCG) to support automated diagnosis of cardiovascular disease. Its core contribution is a feature-extraction framework that combines an **optimized Gabor dictionary** with **elastic net regularization**, together with a **CNN-LSTM deep learning network…

Enhancing time-frequency resolution with optimal transport and barycentric fusion of multiple spectrogram
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-enhancing-time-frequency-resolution-with-optimal/
Sun, 19 Apr 2026
Summary: **Core problem**: spectrograms produced by the short-time Fourier transform (STFT) are bound by the uncertainty principle and cannot achieve fine time resolution and fine frequency resolution at the same time. Conventional fusion methods (such as the geometric mean) require the input spectrograms to lie on aligned grids, and their performance is limited. **Core method**: this paper proposes a…

Transformer Based Machine Fault Detection From Audio Input
https://nanless.github.io/audio-paper-digest-blog/posts/2026-04-19-transformer-based-machine-fault-detection-from/
Sun, 19 Apr 2026
Summary: This paper examines the potential advantages of Transformer-based architectures over conventional convolutional neural networks (CNNs) for audio-based machine fault detection. **Problem addressed**: the inductive biases inherent to conventional CNNs when processing spectrograms, such as locality and translation invariance, may not…
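
The GaborNet summary attributes SincNet's frequency leakage to truncating the sinc function to a finite length. The following minimal pure-Python sketch illustrates that effect under assumed settings: 16 kHz sampling, 101 taps, a 1 to 2 kHz band, and a Gabor standard deviation of about 4.24 samples, chosen to give roughly the same −3 dB bandwidth. It compares the peak stopband level of a rectangularly truncated ideal band-pass sinc filter with that of a Gabor kernel (a Gaussian envelope times a cosine). SincNet itself also applies a Hamming window, which this sketch deliberately omits to isolate the truncation effect. This is not the authors' GaborNet code, and all function names are hypothetical.

```python
import cmath
import math


def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def sinc_bandpass(f1, f2, fs, n_taps):
    """Ideal band-pass impulse response (difference of two low-pass sincs),
    truncated to n_taps samples with a rectangular window (no taper)."""
    mid = (n_taps - 1) // 2
    return [2 * f2 / fs * sinc(2 * f2 * (n - mid) / fs)
            - 2 * f1 / fs * sinc(2 * f1 * (n - mid) / fs)
            for n in range(n_taps)]


def gabor_bandpass(fc, sigma, fs, n_taps):
    """Gabor kernel: Gaussian envelope (std. dev. sigma samples) times a cosine at fc Hz."""
    mid = (n_taps - 1) // 2
    return [math.exp(-((n - mid) ** 2) / (2 * sigma ** 2))
            * math.cos(2 * math.pi * fc * (n - mid) / fs)
            for n in range(n_taps)]


def magnitude(h, f, fs):
    """|H(f)| of the FIR filter h, evaluated at frequency f in Hz."""
    return abs(sum(c * cmath.exp(-2j * math.pi * f * n / fs) for n, c in enumerate(h)))


def stopband_level_db(h, fs, passband, stopband, step=10.0):
    """Peak stopband magnitude relative to peak passband magnitude, in dB."""
    def peak(lo, hi):
        count = int((hi - lo) / step) + 1
        return max(magnitude(h, lo + i * step, fs) for i in range(count))
    return 20 * math.log10(peak(*stopband) / peak(*passband))


if __name__ == "__main__":
    fs, n_taps = 16000, 101
    filters = {
        "truncated sinc": sinc_bandpass(1000.0, 2000.0, fs, n_taps),
        "Gabor": gabor_bandpass(1500.0, 4.24, fs, n_taps),
    }
    for name, h in filters.items():
        level = stopband_level_db(h, fs, (1000.0, 2000.0), (4000.0, 8000.0))
        print(f"{name}: peak stopband level {level:.1f} dB")
```

With these settings the Gabor kernel's stopband should sit tens of dB below the truncated sinc's. Its Gaussian envelope decays smoothly to zero instead of being cut off abruptly, so it produces no Gibbs-style sidelobes.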
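
The spectrogram-fusion summary contrasts its method with conventional fusion such as the geometric mean, which works only when the input spectrograms share one time-frequency grid. The following sketch illustrates that conventional baseline only, not the paper's optimal-transport or barycentric method. It computes magnitude STFTs with a short and a long Hann window, zero-pads both to the same FFT size, and uses the same hop so the grids align. It then fuses the two element-wise. The test signal, window lengths, and function names are illustrative assumptions, and the DFT is computed naively for self-containment.

```python
import cmath
import math


def hann(n):
    """Symmetric Hann window of length n (n >= 2)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def stft_mag(x, win_len, n_fft, hop):
    """Magnitude STFT as a list of frames, each with n_fft // 2 + 1 bins.
    The win_len-sample window is centred in an n_fft-sample frame and zero-padded,
    so spectrograms with different win_len share one time-frequency grid."""
    assert win_len <= n_fft
    w = hann(win_len)
    pad = (n_fft - win_len) // 2
    frames = []
    for start in range(0, len(x) - n_fft + 1, hop):
        seg = [0.0] * n_fft
        for i in range(win_len):
            seg[pad + i] = x[start + pad + i] * w[i]
        # naive DFT of the frame, keeping only the non-negative frequency bins
        frames.append([abs(sum(seg[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                               for n in range(n_fft)))
                       for k in range(n_fft // 2 + 1)])
    return frames


def geometric_mean_fusion(specs, eps=1e-12):
    """Element-wise geometric mean of magnitude spectrograms on the SAME grid."""
    shape = (len(specs[0]), len(specs[0][0]))
    if any((len(s), len(s[0])) != shape for s in specs):
        raise ValueError("geometric-mean fusion needs spectrograms on an aligned grid")
    return [[math.exp(sum(math.log(s[t][k] + eps) for s in specs) / len(specs))
             for k in range(shape[1])]
            for t in range(shape[0])]


if __name__ == "__main__":
    fs = 8000
    x = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(512)]  # 1 kHz tone
    x[300] += 5.0  # plus an impulse at sample 300
    short = stft_mag(x, 32, 128, 32)   # short window: finer time resolution
    long_ = stft_mag(x, 128, 128, 32)  # long window: finer frequency resolution
    fused = geometric_mean_fusion([short, long_])
    print(len(fused), "frames x", len(fused[0]), "bins")
```

The short window localizes the impulse in time, and the long window resolves the tone in frequency. The geometric mean tends to suppress energy that only one of the two spectrograms shows. If the spectrograms used different hops or FFT sizes, the element-wise mean would be undefined, which is the grid-alignment constraint the summary mentions.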
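
The heart-sound summary pairs an optimized Gabor dictionary with elastic net regularization but is truncated before it explains how. One common way to combine the two ideas is to encode a signal over a dictionary of atoms (Gabor atoms, for example) by minimizing a least-squares fit plus an elastic-net penalty, solved with cyclic coordinate descent and soft-thresholding. The sketch below is offered only as a generic example of that approach, not as the paper's framework. The objective's scaling follows the convention of scikit-learn's `ElasticNet`: a 1/(2n) data term with an overall weight and an L1 ratio. The function names and defaults here are hypothetical.

```python
def soft_threshold(z, t):
    """Proximal operator of t*|.|: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def elastic_net_code(x, atoms, lam=0.1, l1_ratio=0.5, n_iter=100):
    """Coefficients a minimising
        1/(2n) ||x - sum_j a_j atoms[j]||^2
        + lam * l1_ratio * ||a||_1 + lam * (1 - l1_ratio) / 2 * ||a||_2^2
    by cyclic coordinate descent. `atoms` is a list of non-zero length-n vectors."""
    n = len(x)
    a = [0.0] * len(atoms)
    residual = list(x)  # x - D a, with a = 0 initially
    for _ in range(n_iter):
        for j, d in enumerate(atoms):
            # correlation of atom j with the partial residual that excludes atom j
            rho = sum(d[i] * (residual[i] + d[i] * a[j]) for i in range(n)) / n
            norm = sum(v * v for v in d) / n
            new = soft_threshold(rho, lam * l1_ratio) / (norm + lam * (1 - l1_ratio))
            if new != a[j]:
                # keep the residual consistent with the updated coefficient
                for i in range(n):
                    residual[i] -= d[i] * (new - a[j])
                a[j] = new
    return a


if __name__ == "__main__":
    atoms = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 1.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
    print(elastic_net_code([0.0, 2.0, 2.0, 0.0], atoms, lam=0.05))
```

The L1 term drives many coefficients exactly to zero, which yields a sparse code. The L2 term keeps the solution stable when atoms are strongly correlated, as neighbouring Gabor atoms often are.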