3D Mesh Grid Room Impulse Responses Measured with A Linear Microphone Array And Suppression of Frame Reflections 2026-04-29
A Bayesian Approach to Singing Skill Evaluation Using Semitone Pitch Histogram and MCMC-Based Generated Quantities 2026-04-29
A Bimodal Approach for Detecting Fatigue Using Speech and Personal Assessments in College Students 2026-04-29
A Consistent Learning Depression Detection Framework Integrating Multi-View Attention 2026-04-29
A Data-Driven Framework for Personal Sound Zone Control Addressing Loudspeaker Nonlinearities 2026-04-29
A Dataset of Robot-Patient and Doctor-Patient Medical Dialogues for Spoken Language Processing Tasks 2026-04-29
A Distribution Matching Approach to Neural Piano Transcription with Optimal Transport 2026-04-29
A Dynamic Gated Cross-Attention Framework for Audio-Text Apparent Personality Analysis 2026-04-29
A Feature-Optimized Audio Watermarking Algorithm with Adaptive Embedding Strength 2026-04-29
A Framework for Controlled Multi-Speaker Audio Synthesis for Robustness Evaluation of Speaker Diarisation Systems 2026-04-29
A Generalization Strategy for Speech Quality Prediction: From Domain-Specific to Unified Datasets 2026-04-29
A Generative-First Neural Audio Autoencoder 2026-04-29
A Hybrid Convolution-Mamba Network with Tone-Octave Contrastive Learning for Stratified Semi-Supervised Singing Melody Extraction 2026-04-29
A Learning-Based Automotive Sound Field Reproduction Method Using Plane-Wave Decomposition and Multi-Position Constraint 2026-04-29
A Lightweight Fourier-Based Network for Binaural Speech Enhancement with Spatial Cue Preservation 2026-04-29
A LLM-Driven Acoustic Semantic Enriched Framework for Underwater Acoustic Target Recognition 2026-04-29
A Metric Learning Approach to Heart Murmur Detection from Phonocardiogram Recordings 2026-04-29
A New Method and Dataset for Classroom Teaching Stage Segmentation 2026-04-29
A Noniterative Phase Retrieval Considering the Zeros of STFT Magnitude 2026-04-29
A Novel Monte Carlo Gradient Method Based on Meta-Learning for Effective Step-Size Selection in Active Noise Control 2026-04-29
A Parameter-Efficient Multi-Scale Convolutional Adapter for Synthetic Speech Detection 2026-04-29
A Personalized Real-Time Proactive Voice Memory Assistant 2026-04-29
A Robust KNN Approach for Multi-Class Laryngeal Disease Detection using MFCC Features 2026-04-29
A Robust Multi-Scale Framework with Test-Time Adaptation for sEEG-Based Speech Decoding 2026-04-29
A Speech-Driven Paradigm for Physics-Informed Modeling of Coupled Micro-Speakers 2026-04-29
A Stabilized Hybrid Active Noise Control Algorithm of GFANC and FxNLMS with Online Clustering 2026-04-29
A State-Dependent Markov Diffusion Process for Generative Speech Enhancement 2026-04-29
A Study of Data Selection Strategies for Pre-Training Self-Supervised Speech Models 2026-04-29
A Superb-Style Benchmark of Self-Supervised Speech Models for Audio Deepfake Detection 2026-04-29
A Task-Aware Dual-Level Self-Supervised Learning Method for Effective Sound Event Detection 2026-04-29
A Text-To-Text Alignment Algorithm for Better Evaluation of Modern Speech Recognition Systems 2026-04-29
A Unified SVD-Modal Solution for Sparse Sound Field Reconstruction with Hybrid Spherical-Linear Microphone Arrays 2026-04-29
An Unsupervised Domain Adaptation Framework For Semi-Supervised Melody Extraction Using Confidence Matrix Replace and Nearest Neighbour Supervision 2026-04-29
ACAVCaps: Enabling Large-Scale Training for Fine-Grained and Diverse Audio Understanding 2026-04-29
Accelerating Regularized Attention Kernel Regression for Spectrum Cartography 2026-04-29
AccLID: Accent-aware Language Identification for Robust Multilingual Speech Recognition 2026-04-29
ACIR-MACL: Effective Multimodal Sentiment Analysis via Attention-Based Causal Intervention Regularization and Multi-Aspect Contrastive Learning 2026-04-29
Acoustic and Facial Markers of Perceived Conversational Success in Spontaneous Speech 2026-04-29
Acoustic Feedback Cancellation in Hearing Aids Exploiting an Inertial Sensor 2026-04-29
Acoustic Non-Stationarity Objective Assessment with Hard Label Criteria for Supervised Learning Models 2026-04-29
Acoustic Teleportation Via Disentangled Neural Audio Codec Representations 2026-04-29
Adapting Diarization-Conditioned Whisper for End-to-End Multi-Talker Speech Recognition 2026-04-29
Adaptive Deterministic Flow Matching for Target Speaker Extraction 2026-04-29
Adaptive Embedding Fusion with Contrastive Learning for Robust Fully Few-Shot Class-Incremental Audio Classification 2026-04-29
Adaptive Per-Channel Energy Normalization Front-End for Robust Audio Signal Processing 2026-04-29
Adaptive Rotary Steering with Joint Autoregression for Robust Extraction of Closely Moving Speakers in Dynamic Scenarios 2026-04-29
Adaptive Spectral Weighting in Sagittal-Plane Sound Localization: A Reliability-Driven Approach 2026-04-29
Adaptive Task-Incremental Learning For Underwater Acoustic Recognition Based on Mixture-of-Experts Adapter 2026-04-29
Addressing Gradient Misalignment in Data-Augmented Training for Robust Speech Deepfake Detection 2026-04-29
ADH-VA: Adaptive Directed-Hypergraph Convolution with VA Contrastive Learning for Multimodal Conversational Emotion Recognition 2026-04-29
Advanced modeling of interlanguage speech intelligibility benefit with L1-L2 multi-task learning using differentiable K-means for accent-robust discrete token-based ASR 2026-04-29
Advancing LLM-Based Multi-Channel Multi-Speaker Speech Recognition with Global Cross-Channel Attention and Sentence-Ordered First-In First-Out Serialized Output Training 2026-04-29
Advancing Semi-Supervised Child Speech Recognition with Omni-Temporal Classification under Label Noise 2026-04-29
Advancing Speech Summarization in Multi-Modal LLMs with Reinforcement Learning 2026-04-29
Advancing Speech Understanding in Speech-Aware Language Models with GRPO 2026-04-29
Adversarial Defense via Generative Speech Enhancement Module 2026-04-29
Adversarial Fine-Tuning on Speech Foundation Model with Vulnerable Attention Consistency Regularization for Robust Speech Recognition 2026-04-29
Adversarial Rivalry Learning for Music Classification 2026-04-29
Affect-Jigsaw: Integrating Core and Peripheral Emotions for Harmonious Fine-Grained Multimodal Emotion Recognition 2026-04-29
AFT: An Exemplar-Free Class Incremental Learning Method for Environmental Sound Classification 2026-04-29
AI-Generated Music Detection in Broadcast Monitoring 2026-04-29
Ailive Mixer: A Deep Learning Based Zero Latency Automatic Music Mixer for Live Music Performances 2026-04-29
AISHELL6-Whisper: A Chinese Mandarin Audio-Visual Whisper Speech Dataset with Speech Recognition Baselines 2026-04-29
Aligning Generative Speech Enhancement with Perceptual Feedback 2026-04-29
Aligning Language Models for Lyric-to-Melody Generation with Rule-Based Musical Constraints 2026-04-29
ALMA-Chor: Leveraging Audio-Lyric Alignment with Mamba for Chorus Detection 2026-04-29
AMBER2: Dual Ambiguity-Aware Emotion Recognition Applied to Speech and Text 2026-04-29
AmbiDrop: Array-Agnostic Speech Enhancement Using Ambisonics Encoding and Dropout-Based Learning 2026-04-29
AMBISONIC-DML: A Benchmark Dataset for Dynamic Higher-Order Ambisonics Music with Motion-Aligned Stems 2026-04-29
An Anomaly-Aware and Audio-Enhanced Dual-Pathway Framework for Alzheimer’s Disease Progression Classification 2026-04-29
An Audio-Visual Speech Separation Network with Joint Cross-Attention and Iterative Modeling 2026-04-29
An Efficient Neural Network for Modeling Human Auditory Neurograms for Speech 2026-04-29
An End-to-End Multimodal System for Subtitle Recognition and Chinese-Japanese Translation in Short Dramas 2026-04-29
An Envelope Separation Aided Multi-Task Learning Model for Blind Source Counting and Localization 2026-04-29
An Event-Based Sequence Modeling Approach to Recognizing Non-Triad Chords with Oversegmentation Minimization 2026-04-29
An Unsupervised Alignment Feature Fusion System for Spoken Language-Based Dementia Detection 2026-04-29
A Neural Forward Filtering for Speaker-Image Separation 2026-04-29
AnimalCLAP: Taxonomy-Aware Language-Audio Pretraining for Species Recognition and Trait Inference 2026-04-29
AnyAccomp: Generalizable Accompaniment Generation Via Quantized Melodic Bottleneck 2026-04-29
AnyRIR: Robust Non-Intrusive Room Impulse Response Estimation in the Wild 2026-04-29
APKD: Aligned And Paced Knowledge Distillation Towards Lightweight Heterogeneous Multimodal Emotion Recognition 2026-04-29
AQUA-Bench: Beyond finding answers to knowing when there are None in Audio Question Answering 2026-04-29
AR-BSNet: Towards Ultra-Low Complexity Autoregressive Target Speaker Extraction With Band-Split Modeling 2026-04-29
AR&D: A Framework for Retrieving and Describing Concepts for Interpreting AudioLLMs 2026-04-29
Ara-BEST-RQ: Multi Dialectal Arabic SSL 2026-04-29
Arbitrarily Settable Frame Rate Neural Speech Codec with Content Adaptive Variable Length Segmentation 2026-04-29
ARCHI-TTS: A Flow-Matching-Based Text-to-Speech Model with Self-Supervised Semantic Aligner and Accelerated Inference 2026-04-29
Are Modern Speech Enhancement Systems Vulnerable to Adversarial Attacks? 2026-04-29
ASAP: An Azimuth-Priority Strip-Based Search Approach to Planar Microphone Array DOA Estimation in 3D 2026-04-29
Assessing Identity Leakage in Talking Face Generation: Metrics and Evaluation Framework 2026-04-29
Assessing the Impact of Speaker Identity in Speech Spoofing Detection 2026-04-29
Assessing The Perceptual Impact of Low-Altitude Aircraft Noise in Cities: An Auralization Framework Using Gaussian Beam Tracing 2026-04-29
Asynchrony-Aware Decoupled Multimodal Control for Cued Speech Video Generation 2026-04-29
ATOM: Adaptive Token-Level Optimal Transport Mixup for Speech Translation 2026-04-29
Atomic Norm Minimization Revisited: Progressive Atom Identification And Refinement 2026-04-29
Attention-Based Encoder-Decoder Target-Speaker Voice Activity Detection for Robust Speaker Diarization 2026-04-29
Attention-Weighted Centered Kernel Alignment for Knowledge Distillation in Large Audio-Language Models Applied To Speech Emotion Recognition 2026-04-29
Attention2Probability: Attention-Driven Terminology Probability Estimation for Robust Speech-to-text System 2026-04-29
Attentive AV-Fusionnet: Audio-Visual Quality Prediction with Hybrid Attention 2026-04-29
Attentive Masked Self-Distillation for Respiratory Sound Classification 2026-04-29
Auden-Voice: General-Purpose Voice Encoder for Speech and Language Understanding 2026-04-29
Audience-Aware Co-speech Gesture Generation in Public Speaking via Anticipation Tokens 2026-04-29
Audio Classification Models are Vulnerable to Filter Perturbations 2026-04-29
Audio Deepfake Detection at the First Greeting: “Hi!” 2026-04-29
Audio Effect Estimation with DNN-Based Prediction and Search Algorithm 2026-04-29
Audio-Conditioned Diffusion LLMs for ASR and Deliberation Processing 2026-04-29
Audio-Guided Multimodal Approach for Fine-Grained Alignment and Boundary Modeling in Active Speaker Detection 2026-04-29
Audio-Text Jailbreak Attack on Large Audio-Language Models: Towards Generality and Stealthiness 2026-04-29
Audio-to-Score Jazz Solo Transcription with the Rhythm Perceiver 2026-04-29
Audio-Visual Deepfake Generation and Detection: An Exploratory Survey 2026-04-29
Audio-Visual Feature Fusion for Calibrating Relevance Scores of Video Moment Retrieval 2026-04-29
AUDIOCARDS: Structured Metadata Improves Audio Language Models for Sound Design 2026-04-29
AudioFuse: Unified Spectral-Temporal Learning Via A Hybrid VIT-1D CNN Architecture for Phonocardiogram Classification 2026-04-29
AudioGen-Omni: A Unified Multimodal Diffusion Transformer for Video-Synchronized Audio, Speech, and Song Generation 2026-04-29
AUDIOGENIE-Reasoner: A Training-Free Multi-Agent Framework for Coarse-to-Fine Audio Deep Reasoning 2026-04-29
Auditory Illusion Benchmark for Large Audio Language Models 2026-04-29
Auditory-Inspired Transformer for Binaural Speech Enhancement and Spatial Cue Preservation 2026-04-29
AURA: A Stegaformer-Based Scalable Deep Audio Watermark with Extreme Robustness 2026-04-29
Auto-MatchCut: An Audio-Visual Retrieval Framework for Seamless Match Cutting 2026-04-29
Automated Dysphagia Screening Using Noninvasive Neck Acoustic Sensing 2026-04-29
Automatic Estimation of Speaker Diarization Error Rate Based on Features of Audio Quality and Speaker Discriminability 2026-04-29
Automatic Music Mixing Using a Generative Model of Effect Embeddings 2026-04-29
Automatic Music Sample Identification with Multi-Track Contrastive Learning 2026-04-29
AUV: Teaching Audio Universal Vector Quantization with Single Nested Codebook 2026-04-29
Auxiliary Multi-Label Training For Improving the Robustness of Audio Deepfake Detection on AI-Processed Data 2026-04-29
AVATAR: Audio-Visual Adaptive Fusion via Trained Agent Reinforcement for Multimodal Deepfake Detection 2026-04-29
AVO-65: A Large-Scale Hierarchical Audio-Visual Object Dataset 2026-04-29
B-GRPO: Unsupervised Speech Emotion Recognition Based on Batched-Group Relative Policy Optimization 2026-04-29
BACHI: Boundary-Aware Symbolic Chord Recognition Through Masked Iterative Decoding on POP and Classical Music 2026-04-29
Bayesian Low-Rank Factorization for Robust Model Adaptation 2026-04-29
Bayesian Signal Separation Via Plug-and-Play Diffusion-Within-Gibbs Sampling 2026-04-29
BBPE16: UTF-16-Based Byte-Level Byte-Pair Encoding for Improved Multilingual Speech Recognition 2026-04-29
Beamforming Using Virtual Microphones for Hearing Aid Applications 2026-04-29
Beat and Downbeat Detection: A Reformulated Approach 2026-04-29
BeatMamba: Bidirectional Selective State-Space Modeling for Efficient Beat Tracking 2026-04-29
Behind the Scenes: Mechanistic Interpretability of LoRA-Adapted Whisper for Speech Emotion Recognition 2026-04-29
Benchmarking Humans And Machines On Complex Multilingual Speech Understanding Tasks 2026-04-29
Benchmarking Music Autotagging with MGPHot Expert Annotations vs. Generic Tag Datasets 2026-04-29
BEST-RQ-based Self-Supervised Learning for Whisper Domain Adaptation 2026-04-29
BEST-STD 2.0: Balanced and Efficient Speech Tokenizer for Spoken Term Detection 2026-04-29
Beyond Face Swapping: A Diffusion-Based Digital Human Benchmark for Multimodal Deepfake Detection 2026-04-29
Beyond Global Emotion: Fine-Grained Emotional Speech Synthesis with Dynamic Word-Level Modulation 2026-04-29
Beyond Isolated Utterances: Cue-Guided Interaction for Context-Dependent Conversational Multimodal Understanding 2026-04-29
Beyond Mapping: Domain-Invariant Representations via Spectral Embedding of Optimal Transport Plans 2026-04-29
Bimodal Fusion Framework for Dynamic Facial Expression Recognition In-The-Wild 2026-04-29
BioSEN: A Bio-Acoustic Signal Enhancement Network for Animal Vocalizations 2026-04-29
BiRQ: Bi-Level Self-Labeling Random Quantization for Self-Supervised Speech Recognition 2026-04-29
Bleed No More: Generative Interference Reduction for Musical Recordings 2026-04-29
Bloodroot: When Watermarking Turns Poisonous for Stealthy Backdoor 2026-04-29
Bone-Conduction Guided Multimodal Speech Enhancement with Conditional Diffusion Models 2026-04-29
Brainprint-Modulated Target Speaker Extraction 2026-04-29
Break-the-Beat! Controllable MIDI-to-Drum audio synthesis 2026-04-29
BridgeCode: A Dual Speech Representation Paradigm for Autoregressive Zero-Shot Text-to-Speech Synthesis 2026-04-29
Bridging the Front-End and Back-End for Robust ASR via Cross-Attention-Based U-Net 2026-04-29
Bridging the Measurement–Simulation Gap in Room Acoustics with Real2sim Diffusion 2026-04-29
Bridging the Semantic Gap: Cross-Attentive Fusion for Joint Acoustic-Semantic Speech Quality Assessment 2026-04-29
BSMP-SENet: Band-Split Magnitude-Phase Network for Speech Enhancement 2026-04-29
CALM: Joint Contextual Acoustic-Linguistic Modeling for Personalization of Multi-Speaker ASR 2026-04-29
CaMoD: Causal-Aware Modality Denoising for Multimodal Dialogue Intent Recognition 2026-04-29
Can Hierarchical Cross-Modal Fusion Predict Human Perception of AI Dubbed Content? 2026-04-29
Can Large Audio Language Models Understand Audio Well? Speech, Scene and Events Understanding Benchmark for LALMs 2026-04-29
Caption and Audio-Guided Video Representation Learning with Gated Attention for Partially Relevant Video Retrieval 2026-04-29
Cardiobridge-DM: Bridging Cross-Cohort Heart Sound Synthesis via Rhythm-Aware Semi-Supervised Diffusion 2026-04-29
CASTELLA: Long Audio Dataset with Captions and Temporal Boundaries 2026-04-29
CCST: Cross-Modal and Consistency-Aware Self-Training for Source-Free Unsupervised Domain Adaptation in Speech Recognition 2026-04-29
Chunk-Wise Attention Transducers for Fast and Accurate Streaming Speech-to-Text 2026-04-29
Chunkwise Aligners for Streaming Speech Recognition 2026-04-29
Class-Aware Permutation-Invariant Signal-to-Distortion Ratio for Semantic Segmentation of Sound Scene with Same-Class Sources 2026-04-29
ClawMark: A Living-World Benchmark for Multi-Turn, Multi-Day, Multimodal Coworker Agents 2026-04-29
Clue2Emo: A Brain-Inspired Framework for Open-Vocabulary Multimodal Emotion Recognition 2026-04-29
CMSA-Mamba: Hierarchical State Space Modeling for Audio-Based Depression Detection 2026-04-29
Co-Initialization of Control Filter and Secondary Path via Meta-Learning for Active Noise Control 2026-04-29
CodecSlime: Temporal Redundancy Compression of Neural Speech Codec via Dynamic Frame Rate 2026-04-29
CodeSep: Low-Bitrate Codec-Driven Speech Separation with Base-Token Disentanglement and Auxiliary-Token Serial Prediction 2026-04-29
Combining Multi-Order Attention and Multi-Resolution Discriminator for High-Fidelity Neural Vocoder 2026-04-29
Combining SSL Speech Features, Contextual Transformers and Mamba Models for Realistic Audio Spoofing Detection 2026-04-29
Compression meets Sampling: LZ78-SPA for Efficient Symbolic Music Generation 2026-04-29
CompSpoof: A Dataset and Joint Learning Framework for Component-Level Audio Anti-Spoofing Countermeasures 2026-04-29
Condition-Invariant fMRI decoding of speech intelligibility with deep state space model 2026-04-29
Conditional Diffusion Models for Mental Health-Preserving Voice Conversion 2026-04-29
Confidence-Based Filtering for Speech Dataset Curation with Generative Speech Enhancement Using Discrete Tokens 2026-04-29
Confidence-Guided Error Correction for Disordered Speech Recognition 2026-04-29
Connecting Layer-Wise Representation of WavLM with Spectro-Temporal Modulation on Speaker Verification 2026-04-29
Constraint Optimized Multichannel Mixer-Limiter Design 2026-04-29
Constructing Composite Features for Interpretable Music-Tagging 2026-04-29
Content Anonymization for Privacy in Long-Form Audio 2026-04-29
Content Leakage in Librispeech and its Impact on the Privacy Evaluation of Speaker Anonymization 2026-04-29
Content-Preserving Speech Representation Learning Via Adaptive Segment-Level Alignment 2026-04-29
Context-Aware Dynamic Graph Learning for Multimodal Emotion Recognition with Missing Modalities 2026-04-29
Contextual Biasing for ASR in Speech LLM with Common Word Cues and Bias Word Position Prediction 2026-04-29
Continuation Method for Feedback Delay Network Modal Decomposition 2026-04-29
Continuous-Token Diffusion for Speaker-Referenced TTS in Multimodal LLMs 2026-04-29
Contrastive Timbre Representations for Musical Instrument And Synthesizer Retrieval 2026-04-29
Controllable Embedding Transformation for Mood-Guided Music Retrieval 2026-04-29
Cooperative Multi-Agent Reinforcement Learning for Adaptive Aggregation in Semi-Supervised Federated Learning with non-IID Data 2026-04-29
CosyAccent: Duration-Controllable Accent Normalization using Source-Synthesis Training Data 2026-04-29
Coupling Acoustic Geometry and Visual Semantics for Robust Depth Estimation 2026-04-29
CoVA: Text-Guided Composed Video Retrieval for Audio-Visual Content 2026-04-29
Cross-Architecture Knowledge Distillation of WavLM for Lightweight Speaker Verification 2026-04-29
Cross-Cultural Bias in Mel-Scale Representations: Evidence and Alternatives from Speech and Music 2026-04-29
Cross-Domain Contrastive Learning with Dynamic Threshold Calibration for Source Speaker Tracing 2026-04-29
Cross-Lingual Alzheimer’s Disease Detection with Multimodal LLMs via Speech Cue-Augmented Prompting and Instruction Tuning 2026-04-29
Cross-Lingual F5-TTS: Towards Language-Agnostic Voice Cloning and Speech Synthesis 2026-04-29
Cross-Lingual Interleaving for Speech Language Models 2026-04-29
Cross-Linguistic Rhythmic and Spectral Feature-Based Analysis of Nyishi and Adi: Two Under-Resourced Languages of Arunachal Pradesh 2026-04-29
Cross-Modal Bottleneck Fusion for Noise Robust Audio-Visual Speech Recognition 2026-04-29
Cross-Modal Knowledge Distillation for Speech Large Language Models 2026-04-29
CTC-DID: CTC-Based Arabic Dialect Identification for Streaming Applications 2026-04-29
Curriculum Learning with Contrastive Loss for Lightweight Speaker Verification 2026-04-29
Cutscene Agent: An LLM Agent Framework for Automated 3D Cutscene Generation 2026-04-29
D3PIA: A Discrete Denoising Diffusion Model for Piano Accompaniment Generation from Lead Sheet 2026-04-29
DAIEN-TTS: Disentangled Audio Infilling for Environment-Aware Text-to-Speech Synthesis 2026-04-29
DAMO: A Data-Efficient Multimodal Orchestrator for Temporal Reasoning with Video LLMs 2026-04-29
DAT-CFTNet: Speech Enhancement for Cochlear Implant Recipients using Attention-based Dual-Path Recurrent Neural Network 2026-04-29
DBFT-SD: Weakly Supervised Multimodal Detection of Sensitive Audio-Visual Content 2026-04-29
DDSC: Dynamic Dual-Signal Curriculum for Data-Efficient Acoustic Scene Classification Under Domain Shift 2026-04-29
DDSR-Net: Robust Multimodal Sentiment Analysis via Dynamic Modality Reliability Assessment 2026-04-29
DECAF: Dynamic Envelope Context-Aware Fusion for Speech-Envelope Reconstruction from EEG 2026-04-29
Decoder-Only Conformer with Modality-Aware Sparse Mixtures of Experts for ASR 2026-04-29
Decorrelation-Enhanced Multiband Subband Adaptive Filtering for RIR Tracking in Sound Field Control 2026-04-29
Deep Dubbing: End-to-End Auto-Audiobook System with Text-to-Timbre and Context-Aware Instruct-TTS 2026-04-29
Deep Learning-Based Joint Optimization of Adaptive Feedback Cancellation and Residual Feedback Suppression for Hearing Aids 2026-04-29
Deep Spatial Clue Informed Ambisonic Encoding for Irregular Microphone Arrays 2026-04-29
Deepaq: A Perceptual Audio Quality Metric Based on Foundational Models and Weakly Supervised Learning 2026-04-29
Denoising Of Stochastic Ray Tracing Room Impulse Responses 2026-04-29
DepthTalk: Few-Shot Talking Head Generation with Depth-Aware 3D Gaussian Field Motion 2026-04-29
Detecting and Attributing Synthetic Spanish Speech: The HISPASpoof Dataset 2026-04-29
DGSDNet: Dual-Graph Spectral Diffusion Network for Incomplete Multimodal Emotion Recognition in Conversations 2026-04-29
Diff-vs: Efficient Audio-Aware Diffusion U-Net for Vocals Separation 2026-04-29
Diffemotalk: Audio-Driven Facial Animation with Fine-Grained Emotion Control via Diffusion Models 2026-04-29
Differentiable Grouped Feedback Delay Networks for Learning Direction and Position-Dependent Late Reverberation 2026-04-29
Differentiable Pulsetable Synthesis for Wind Instrument Modeling 2026-04-29
Diffusion Timbre Transfer via Mutual Information Guided Inpainting 2026-04-29
Direct Preference Optimization For Speech Autoregressive Diffusion Models 2026-04-29
Direct Simultaneous Translation Activation for Large Audio-Language Models 2026-04-29
Direct Transfer of Prosody in Speech-to-speech Translation using Disentangled Speech Tokens 2026-04-29
Directly Trained Spiking Neural Networks with Adaptive Phase Coding 2026-04-29
DisContSE: Single-Step Diffusion Speech Enhancement based on Joint Discrete and Continuous Embeddings 2026-04-29
Discrete Diffusion for Generative Modeling of Text-Aligned Speech Tokens 2026-04-29
Discrete-Continuous Fusion With Adaptive Hierarchical Features For Audio Deepfake Detection 2026-04-29
Disentangled Authenticity Representation for Partially Deepfake Audio Localization 2026-04-29
Disentangling Physiology from Fidelity: Latent-Guided Diffusion Models for Cross-Modal Cardiac Synthesis 2026-04-29
Dissecting Performance Degradation in Audio Source Separation under Sampling Frequency Mismatch 2026-04-29
DISSR: Disentangling Speech Representation for Degradation-Prior Guided Cross-Domain Speech Restoration 2026-04-29
Distilling Attention Knowledge for Speaker Verification 2026-04-29
Distributed Multichannel Active Noise Control with Asynchronous Communication 2026-04-29
DiTSE: High-Fidelity Generative Speech Enhancement via Latent Diffusion Transformers 2026-04-29
DiTSinger: Scaling Singing Voice Synthesis with Diffusion Transformer and Implicit Alignment 2026-04-29
Diverse and Few-Step Audio Captioning via Flow Matching 2026-04-29
DMP-TTS: Disentangled Multi-Modal Prompting for Controllable Text-to-Speech with Chained Guidance 2026-04-29
Do Bias Benchmarks Generalise? Evidence from Voice-Based Evaluation of Gender Bias in SpeechLLMs 2026-04-29
Do Foundational Audio Encoders Understand Music Structure? 2026-04-29
Do Speech LLMs Learn Crossmodal Embedding Spaces? 2026-04-29
Do We Need EMA for Diffusion-Based Speech Enhancement? Toward A Magnitude-Preserving Network Architecture 2026-04-29
Do we really need self-attention for streaming automatic speech recognition? 2026-04-29
Do You Hear What I Mean? Quantifying the Instruction-Perception GAP in Instruction-Guided Expressive Text-to-Speech Systems 2026-04-29
Does the Pre-Training of an Embedding Influence its Encoding of Age? 2026-04-29
DOMA: Leveraging Diffusion Language Models with Adaptive Prior for Intent Classification and Slot Filling 2026-04-29
Domain Partitioning Meets Parameter-Efficient Fine-Tuning: A Novel Method for Improved Language-Queried Audio Source Separation 2026-04-29
Domain-Aware Scheduling for ASR Fine-Tuning 2026-04-29
Domain-Invariant Representation Learning of Bird Sounds 2026-04-29
DPO-Regularized Regression for Age Prediction 2026-04-29
DPT-Net: Dual-Path Transformer Network with Hierarchical Fusion for EEG-based Envelope Reconstruction 2026-04-29
DSpAST: Disentangled Representations for Spatial Audio Reasoning with Large Language Models 2026-04-29
DSRMS-TransUnet: A Decentralized Non-Shifted TransUnet for Shallow Water Acoustic Source Range Estimation 2026-04-29
DSSR: Decoupling Salient and Subtle Representations Under Missing Modalities for Multimodal Emotion Recognition 2026-04-29
Dual Contrastive Learning for Semi-Supervised Domain Adaptation in Bi-Modal Depression Recognition 2026-04-29
Dual Data Scaling for Robust Two-Stage User-Defined Keyword Spotting 2026-04-29
Dual-Perspective Multimodal Sentiment Analysis with MoE Fusion: Representation Learning via Semantic Resonance and Divergence 2026-04-29
Dual-Strategy-Enhanced ConBiMamba for Neural Speaker Diarization 2026-04-29
Dynamic Balanced Cross-Modal Attention with Gated Sequence Restoration: Towards Robust Multimodal Sentiment Analysis 2026-04-29
Dynamic Noise-Aware Multi-LoRA Framework Towards Real-World Audio Deepfake Detection 2026-04-29
Dynamic Spectrogram Analysis with Local-Aware Graph Networks for Audio Anti-Spoofing 2026-04-29
Dynamically Slimmable Speech Enhancement Network with Metric-Guided Training 2026-04-29
E2E-AEC: Implementing An End-To-End Neural Network Learning Approach for Acoustic Echo Cancellation 2026-04-29
Easy Turn: Integrating Acoustic and Linguistic Modalities for Robust Turn-Taking in Full-Duplex Spoken Dialogue Systems 2026-04-29
ECHO: Frequency-Aware Hierarchical Encoding for Variable-Length Signals 2026-04-29
EchoFake: A Replay-Aware Dataset For Practical Speech Deepfake Detection 2026-04-29
EchoRAG: A Two-Stage Framework for Audio-Text Retrieval and Temporal Grounding 2026-04-29
ECSA: Dual-Branch Emotion Compensation for Emotion-Consistent Speaker Anonymization 2026-04-29
EdgeSpot: Efficient and High-Performance Few-Shot Model for Keyword Spotting 2026-04-29
EEG and Eye-Tracking Driven Dynamic Target Speaker Extraction with Spontaneous Attention Switching 2026-04-29
EEND-SAA: Enrollment-Less Main Speaker Voice Activity Detection Using Self-Attention Attractors 2026-04-29
Efficient Audio-Visual Inference Via Token Clustering And Modality Fusion 2026-04-29
Efficient Depression Detection from Speech via Language-Independent Prompt-Driven Reprogramming 2026-04-29
Efficient Solutions for Mitigating Initialization Bias in Unsupervised Self-Adaptive Auditory Attention Decoding 2026-04-29
EMG-to-Speech with Fewer Channels 2026-04-29
Emilia-NV: A Non-Verbal Speech Dataset with Word-Level Annotation for Human-Like Speech Modeling 2026-04-29
Emo-TTA: Improving Test-Time Adaptation of Audio-Language Models for Speech Emotion Recognition 2026-04-29
EMORL-TTS: Reinforcement Learning for Fine-Grained Emotion Control in LLM-based TTS 2026-04-29
EmoShift: Lightweight Activation Steering for Enhanced Emotion-Aware Speech Synthesis 2026-04-29
Emotion-Aligned Generation in Diffusion Text to Speech Models Via Preference-Guided Optimization 2026-04-29
Emotional Damage: Investigating Safety Vulnerabilities of Large Audio-Language Models Under Speaker Emotional Variations 2026-04-29
Emotional Dimension Control in Language Model-Based Text-To-Speech: Spanning a Broad Spectrum of Human Emotions 2026-04-29
EmoTri-RL: Emotion- and Cause-Aware Reinforcement Learning for Multi-Modal Empathetic Dialogue 2026-04-29
Empowering Multimodal Respiratory Sound Classification with Counterfactual Adversarial Debiasing for Out-of-Distribution Robustness 2026-04-29
Enabling Multi-Species Bird Classification on Low-Power Bioacoustic Loggers 2026-04-29
Encoding Emotion Through Self-Supervised Eye Movement Reconstruction 2026-04-29
Enhanced Generative Machine Listener 2026-04-29
Enhancing Audio Question-Answering Performance Through Log-Likelihood Guided Reward Functions 2026-04-29
Enhancing Automatic Drum Transcription with Online Dynamic Few-Shot Learning 2026-04-29
Enhancing Dialogue-Related Speech Tasks with Generated Spoken Dialogues 2026-04-29
Enhancing Noise Robustness for Neural Speech Codecs Through Resource-Efficient Progressive Quantization Perturbation Simulation 2026-04-29
Enhancing Speaker Verification with w2v-BERT 2.0 and Knowledge Distillation Guided Structured Pruning 2026-04-29
Enhancing Speech Intelligibility Prediction for Hearing Aids with Complementary Speech Foundation Model Representations 2026-04-29
Entropy-Guided GRVQ for Ultra-Low Bitrate Neural Speech Codec 2026-04-29
Equipping Large Language Model with Directional Speech Understanding Capabilities 2026-04-29
Erasing Your Voice Before it’s Heard: Training-Free Speaker Unlearning for Zero-Shot Text-to-Speech 2026-04-29
Estimating Hand-Related Features from Speech Using Machine Learning 2026-04-29
Estimating Respiratory Effort from Nocturnal Breathing Sounds for Obstructive Sleep Apnoea Screening 2026-04-29
Etude: Piano Cover Generation with a Three-Stage Approach — Extract, Structuralize, and Decode 2026-04-29
EuleroDec: A Complex-Valued RVQ-VAE for Efficient and Robust Audio Coding 2026-04-29
Evaluating Bias in Spoken Dialogue LLMs for Real-World Decisions and Recommendations 2026-04-29
Evaluating Compositional Structure in Audio Representations 2026-04-29
Evaluating Disentangled Representations for Controllable Music Generation 2026-04-29
Evaluating Emotion Recognition in Spoken Language Models on Emotionally Incongruent Speech 2026-04-29
Evaluating High-Resolution Piano Sustain Pedal Depth Estimation with Musically Informed Metrics 2026-04-29
Evaluating Pretrained Speech Embedding Systems for Dysarthria Detection Across Heterogeneous Datasets 2026-04-29
Event Classification by Physics-Informed Inpainting for Distributed Multichannel Acoustic Sensor with Partially Degraded Channels 2026-04-29
Exploring Fine-Tuning Of Large Audio Language Models For Spoken Language Understanding Under Limited Speech Data 2026-04-29
Exploring How Audio Effects Alter Emotion with Foundation Models 2026-04-29
Exploring Resolution-Wise Shared Attention in Hybrid Mamba-U-Nets for Improved Cross-Corpus Speech Enhancement 2026-04-29
Exploring SSL Discrete Tokens for Multilingual Automatic Speech Recognition 2026-04-29
Expressive Voice Conversion with Controllable Emotional Intensity 2026-04-29
Exterior Sound Field Estimation Based on Physics-Constrained Kernel 2026-04-29
FAC-FACodec: Controllable Zero-Shot Foreign Accent Conversion with Factorized Speech Codec 2026-04-29
Face-Voice Association with Inductive Bias for Maximum Class Separation 2026-04-29
Fake Speech Wild: Detecting Deepfake Speech on Social Media Platform 2026-04-29
Fast-ULCNet: A Fast and Ultra Low Complexity Network for Single-Channel Speech Enhancement 2026-04-29
FastAV: Efficient Token Pruning for Audio-Visual Large Language Model Inference 2026-04-29
FastEnhancer: Speed-Optimized Streaming Neural Speech Enhancement 2026-04-29
FD-ARL: Feature Disentanglement with Adversarial-Reconstruction Learning for Cross-Subject Auditory Attention Decoding 2026-04-29
FDCNet: Frequency Domain Channel Attention and Convolution for Lipreading 2026-04-29
FED-PISA: Federated Voice Cloning Via Personalized Identity-Style Adaptation 2026-04-29
Feedback-Driven Retrieval-Augmented Audio Generation with Large Audio Language Models 2026-04-29
Few-Shot Recognition of Audio Deepfake Generators using Graph-Based Prototype Adaptation 2026-04-29
FIDIC: Fine-Grained Conversational Emotion Recognition via Individual Differences in Inertia and Contagion 2026-04-29
Fine-Grained Frame Modeling in Multi-Head Self-Attention for Speech Deepfake Detection 2026-04-29
Fine-Tuning BigVGAN-v2 for Robust Musical Tuning Preservation 2026-04-29
Fine-Tuning Large Audio-Language Models with LoRA for Precise Temporal Localization of Prolonged Exposure Therapy Elements 2026-04-29
Fine-Tuning Large Multimodal Models for Automatic Pronunciation Assessment 2026-04-29
FinHuBERT: Hierarchical Feature Imitating Networks for Low-Resource Speech Recognition 2026-04-29
FlashFoley: Fast Interactive Sketch2Audio Generation 2026-04-29
Flexi-LoRA with Input-Adaptive Ranks: Efficient Finetuning for Speech and Reasoning Tasks 2026-04-29
Flexio: Flexible Single- and Multi-Channel Speech Separation and Enhancement 2026-04-29
FlowSE-GRPO: Training Flow Matching Speech Enhancement via Online Reinforcement Learning 2026-04-29
FOCA: Multimodal Malware Classification via Hyperbolic Cross-Attention 2026-04-29
FocalCodec-Stream: Streaming Low-Bitrate Speech Coding via Causal Distillation 2026-04-29
FODGE: High-Fidelity Dance Generation via Full-Body Optimization 2026-04-29
FoleyBench: A Benchmark for Video-to-Audio Models 2026-04-29
Forward Convolutive Prediction for Frame Online Monaural Speech Dereverberation based on Kronecker Product Decomposition 2026-04-29
Frame-Stacked Local Transformers for Efficient Multi-Codebook Speech Generation 2026-04-29
Frequency-Independent Ambisonics Upscaling Using Deep Learning 2026-04-29
From Contrast to Commonality: Audio Commonality Captioning for Enhanced Audio-Text Cross-Modal Understanding in Multimodal LLMs 2026-04-29
From Diet to Free Lunch: Estimating Auxiliary Signal Properties Using Dynamic Pruning Masks in Speech Enhancement Networks 2026-04-29
From Hallucination to Articulation: Language Model-Driven Losses for Ultra Low-Bitrate Neural Speech Coding 2026-04-29
From Human Speech to Ocean Signals: Transferring Speech Large Models for Underwater Acoustic Target Recognition 2026-04-29
Frontend Token Enhancement for Token-Based Speech Recognition 2026-04-29
Full Band Denoising of Room Impulse Response in the Wavelet Domain with Dictionary Learning 2026-04-29
FUN-SSL: Full-Band Layer Followed by U-Net With Narrow-Band Layers for Multiple Moving Sound Source Localization 2026-04-29
FUSEMOS: Perceptual Evaluation of Text-to-Music Generation with Dual-Encoder Fusion and Ranking-Aware Composite Loss 2026-04-29
Fusion of Multimodal Estimations by Extended State Hidden Markov Model: Application to Fetal Heart Rate Monitoring 2026-04-29
FxSearcher: Gradient-Free Text-Driven Audio Transformation 2026-04-29
Game-Time: Evaluating Temporal Dynamics in Spoken Language Models 2026-04-29
Gdiffuse: Diffusion-Based Speech Enhancement with Noise Model Guidance 2026-04-29
Gelina: Unified Speech and Gesture Synthesis Via Interleaved Token Prediction 2026-04-29
Gen-SER: When the Generative Model Meets Speech Emotion Recognition 2026-04-29
Generalizability of Predictive and Generative Speech Enhancement Models to Pathological Speakers 2026-04-29
Generating Localized Audible Zones Using a Single-Channel Parametric Loudspeaker 2026-04-29
Generating Moving 3D Soundscapes with Latent Diffusion Models 2026-04-29
Generative Audio Extension and Morphing 2026-04-29
Generative UI as an Accessibility Bridge: Lessons from C2C E-Commerce 2026-04-29
GLA-GRAD++: An Improved Griffin-Lim Guided Diffusion Model for Speech Synthesis 2026-04-29
GLAP: General Contrastive Audio-Text Pretraining Across Domains and Languages 2026-04-29
GLoRIA: Gated Low-Rank Interpretable Adaptation for Dialectal ASR 2026-04-29
GLUE: Gradient-free Learning to Unify Experts 2026-04-29
GMS-CAVP: Improving Audio-Video Correspondence with Multi-Scale Contrastive and Generative Pretraining 2026-04-29
Graph-Based Emotion Consensus Perception Learning for Multimodal Emotion Recognition in Conversation 2026-04-29
Graph-based Modality Alignment for Robustness in Conversational Emotion Recognition 2026-04-29
Graph-Biased EEG Transformers for Silent Speech Decoding 2026-04-29
Grey-Box Prompt Tuning With Graph Alignment for Speech-Language Models 2026-04-29
GRNet: Graph Reconstruction Network for Robust Multimodal Sentiment Analysis 2026-04-29
Group Relative Policy Optimization for Text-to-Speech with Large Language Models 2026-04-29
Group-Sparse Gaussian Process Regression for Inhomogeneous Sound Field Estimation 2026-04-29
H-nnPBFDAF: Hierarchical Neural Network Partitioned Block Frequency Domain Adaptive Filter with Novel Block Activation Probability 2026-04-29
Hair Noise Analysis and Mitigation for Smart Glasses Audio Captures 2026-04-29
Hanui: Harnessing Distributional Discrepancies for Singing Voice Deepfake Detection 2026-04-29
HarmoNet: Music Grounding by Short Video via Harmonic Resample and Dynamic Sparse Alignment 2026-04-29
Hashing-Baseline: Rethinking Hashing in the Age of Pretrained Models 2026-04-29
HAVT-IVD: Heterogeneity-Aware Cross-Modal Network for Audio-Visual Surveillance: Idling Vehicles Detection with Multichannel Audio and Multiscale Visual Cues 2026-04-29
HCGAN: Harmonic-Coupled Generative Adversarial Network for Speech Super-Resolution in Low-Bandwidth Scenarios 2026-04-29
HD-PPT: Hierarchical Decoding of Content- and Prompt-Preference Tokens for Instruction-Based TTS 2026-04-29
HergNet: A Fast Neural Surrogate Model for Sound Field Predictions Via Superposition of Plane Waves 2026-04-29
HFSQVAE: Hierarchical Vector Quantization with Residuals for Frequency-Specific Embedding 2026-04-29
Hierarchical Activity Recognition and Captioning from Long-Form Audio 2026-04-29
Hierarchical Discrete Flow Matching For Multi-Codebook Codec-Based Text-To-Speech 2026-04-29
Hierarchical Tokenization of Multimodal Music Data for Generative Music Retrieval 2026-04-29
HiFi-HARP: A High-Fidelity 7th-Order Ambisonic Room Impulse Response Dataset 2026-04-29
High-Fidelity Speech Enhancement Via Discrete Audio Tokens 2026-04-29
How Far Do SSL Speech Models Listen for Tone? Temporal Focus of Tone Representation under Low-Resource Transfer 2026-04-29
How to Label Resynthesized Audio: The Dual Role of Neural Audio Codecs in Audio Deepfake Detection 2026-04-29
Huí Sù: Co-constructing a Dual Feedback Apparatus 2026-04-29
Human-1 by Josh Talks: A Full-Duplex Conversational Modeling Framework in Hindi using Real-World Conversations 2026-04-29
HVAC-EAR: Eavesdropping Human Speech Using HVAC Systems 2026-04-29
Hybrid Pruning: In-Situ Compression of Self-Supervised Speech Models for Speaker Verification and Anti-Spoofing 2026-04-29
HyFlowSE: Hybrid End-To-End Flow-Matching Speech Enhancement via Generative-Discriminative Learning 2026-04-29
I-DCCRN-VAE: An Improved Deep Representation Learning Framework for Complex VAE-Based Single-Channel Speech Enhancement 2026-04-29
IBPCodec: A Low-Bitrate Lightweight Speech Codec With Inter-Band Prediction 2026-04-29
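ICASSP 2026 - Active Noise Control Paper List 2026-04-29
ICASSP 2026 - Active Noise Reduction Paper List 2026-04-29
ICASSP 2026 - Topic Modeling Paper List 2026-04-29
ICASSP 2026 - Signal Processing Paper List 2026-04-29
ICASSP 2026 - Keyword Spotting Paper List 2026-04-29
ICASSP 2026 - Medical AI Paper List 2026-04-29
ICASSP 2026 - Auditory Attention Decoding Paper List 2026-04-29
ICASSP 2026 - Auditory Attention Decoding Paper List 2026-04-29
ICASSP 2026 - Noise Control Paper List 2026-04-29
ICASSP 2026 - Echo Cancellation Paper List 2026-04-29
ICASSP 2026 - Benchmarking Paper List 2026-04-29
ICASSP 2026 - Fundamental Frequency Estimation Paper List 2026-04-29
ICASSP 2026 - Sound Field Estimation Paper List 2026-04-29
ICASSP 2026 - Acoustic Modeling Paper List 2026-04-29
ICASSP 2026 - Sound Source Localization Paper List 2026-04-29
ICASSP 2026 - Multimodal Learning Paper List 2026-04-29
ICASSP 2026 - Multimodal Dialogue Intent Recognition Paper List 2026-04-29
ICASSP 2026 - Multimodal Sentiment Analysis Paper List 2026-04-29
ICASSP 2026 - Multimodal Emotion Recognition Paper List 2026-04-29
ICASSP 2026 - Multimodal Models Paper List 2026-04-29
ICASSP 2026 - Multichannel Paper List 2026-04-29
ICASSP 2026 - Multi-Pitch Estimation #Note Tracking Paper List 2026-04-29
ICASSP 2026 - Entity Disambiguation Paper List 2026-04-29
ICASSP 2026 - Real-Time Processing Paper List 2026-04-29
ICASSP 2026 - Adversarial Examples Paper List 2026-04-29
ICASSP 2026 - Anomalous Sound Detection Paper List 2026-04-29
ICASSP 2026 - Sentiment Analysis Paper List 2026-04-29
ICASSP 2026 - Emotion Recognition Paper List 2026-04-29
ICASSP 2026 - Room Impulse Response Paper List 2026-04-29
ICASSP 2026 - Room Impulse Response Denoising Paper List 2026-04-29
ICASSP 2026 - Datasets Paper List 2026-04-29
ICASSP 2026 - Dataset Alignment Paper List 2026-04-29
ICASSP 2026 - Slot Filling Paper List 2026-04-29
ICASSP 2026 - Model Evaluation Paper List 2026-04-29
ICASSP 2026 - Singing Melody Extraction Paper List 2026-04-29
ICASSP 2026 - Singing Voice Synthesis Paper List 2026-04-29
ICASSP 2026 - Singing Voice Transcription Paper List 2026-04-29
ICASSP 2026 - Singing Voice Conversion Paper List 2026-04-29
ICASSP 2026 - Underwater Acoustic Target Recognition Paper List 2026-04-29
ICASSP 2026 - Bioacoustics Paper List 2026-04-29
ICASSP 2026 - Target Speaker Extraction Paper List 2026-04-29
ICASSP 2026 - Neural Decoding Paper List 2026-04-29
ICASSP 2026 - Spatial Audio Paper List 2026-04-29
ICASSP 2026 - Federated Learning Paper List 2026-04-29
ICASSP 2026 - Brain Signal Encoding Paper List 2026-04-29
ICASSP 2026 - Brain-Computer Interfaces Paper List 2026-04-29
ICASSP 2026 - Dance Generation Paper List 2026-04-29
ICASSP 2026 - Visual Speech Recognition Paper List 2026-04-29
ICASSP 2026 - Video-to-Audio Generation Paper List 2026-04-29
ICASSP 2026 - Video Retrieval Paper List 2026-04-29
ICASSP 2026 - Video Moment Retrieval Paper List 2026-04-29
ICASSP 2026 - Video Understanding Paper List 2026-04-29
ICASSP 2026 - Video Generation Paper List 2026-04-29
ICASSP 2026 - Video Recording Device Identification Paper List 2026-04-29
ICASSP 2026 - Video Question Answering Paper List 2026-04-29
ICASSP 2026 - Video Highlight Detection Paper List 2026-04-29
ICASSP 2026 - Speech Deepfake Detection Paper List 2026-04-29
ICASSP 2026 - Voice Cloning Paper List 2026-04-29
ICASSP 2026 - Speech Separation Paper List 2026-04-29
ICASSP 2026 - Speech Anonymization Paper List 2026-04-29
ICASSP 2026 - Speech Discovery Paper List 2026-04-29
ICASSP 2026 - Speech Synthesis Paper List 2026-04-29
ICASSP 2026 - Speech Enhancement #Adversarial Defense Paper List 2026-04-29
ICASSP 2026 - Speech Enhancement Paper List 2026-04-29
ICASSP 2026 - Large Speech Models Paper List 2026-04-29
ICASSP 2026 - Spoken Dialogue Systems Paper List 2026-04-29
ICASSP 2026 - Speech Emotion Recognition Paper List 2026-04-29
ICASSP 2026 - Speech Summarization Paper List 2026-04-29
ICASSP 2026 - Voice Activity Detection Paper List 2026-04-29
ICASSP 2026 - Speech Understanding Paper List 2026-04-29
ICASSP 2026 - Speech Generation Paper List 2026-04-29
ICASSP 2026 - Speech Biomarkers Paper List 2026-04-29
ICASSP 2026 - Speech Coding Paper List 2026-04-29
ICASSP 2026 - Speech Encoders Paper List 2026-04-29
ICASSP 2026 - Speech Translation Paper List 2026-04-29
ICASSP 2026 - Speech Representation Learning Paper List 2026-04-29
ICASSP 2026 - Speech Decoding Paper List 2026-04-29
ICASSP 2026 - Speech Assessment Paper List 2026-04-29
ICASSP 2026 - Speech Recognition #Speech Synthesis Paper List 2026-04-29
ICASSP 2026 - Speech Recognition #Speech Translation Paper List 2026-04-29
ICASSP 2026 - Speech Recognition Paper List 2026-04-29
ICASSP 2026 - Speech Quality Assessment Paper List 2026-04-29
ICASSP 2026 - Voice Conversion #Speech Enhancement Paper List 2026-04-29
ICASSP 2026 - Voice Conversion Paper List 2026-04-29
ICASSP 2026 - Spoken Question Answering Paper List 2026-04-29
ICASSP 2026 - Speech-Driven Motion Generation Paper List 2026-04-29
ICASSP 2026 - Speaker Separation Paper List 2026-04-29
ICASSP 2026 - Speaker Synthesis Paper List 2026-04-29
ICASSP 2026 - Speaker Diarization #Speech Separation Paper List 2026-04-29
ICASSP 2026 - Speaker Diarization Paper List 2026-04-29
ICASSP 2026 - Speaker Detection Paper List 2026-04-29
ICASSP 2026 - Speaker Generation Paper List 2026-04-29
ICASSP 2026 - Talking Face Generation Paper List 2026-04-29
ICASSP 2026 - Speaker Recognition Paper List 2026-04-29
ICASSP 2026 - Speaker Verification Paper List 2026-04-29
ICASSP 2026 - Classroom Stage Segmentation Paper List 2026-04-29
ICASSP 2026 - Cross-Modal Paper List 2026-04-29
ICASSP 2026 - Cross-Modal Retrieval Paper List 2026-04-29
ICASSP 2026 - Mild Cognitive Impairment Detection Paper List 2026-04-29
ICASSP 2026 - Transfer Learning Paper List 2026-04-29
ICASSP 2026 - Zero-Shot Keyword Spotting Paper List 2026-04-29
ICASSP 2026 - Music Information Retrieval Paper List 2026-04-29
ICASSP 2026 - Music Separation Paper List 2026-04-29
ICASSP 2026 - Music Classification Paper List 2026-04-29
ICASSP 2026 - Music Recommendation Paper List 2026-04-29
ICASSP 2026 - Music Retrieval Paper List 2026-04-29
ICASSP 2026 - Music Mixing Paper List 2026-04-29
ICASSP 2026 - Music Source Separation Paper List 2026-04-29
ICASSP 2026 - Music Source Extraction Paper List 2026-04-29
ICASSP 2026 - Music Understanding Paper List 2026-04-29
ICASSP 2026 - Music Generation Paper List 2026-04-29
ICASSP 2026 - Music Transcription Paper List 2026-04-29
ICASSP 2026 - Audio-Visual Paper List 2026-04-29
ICASSP 2026 - Audio-Visual Instance Segmentation Paper List 2026-04-29
ICASSP 2026 - Audio Event Detection Paper List 2026-04-29
ICASSP 2026 - Audio Signal Processing Paper List 2026-04-29
ICASSP 2026 - Audio Separation Paper List 2026-04-29
ICASSP 2026 - Audio Classification #Zero-Shot Learning Paper List 2026-04-29
ICASSP 2026 - Audio Classification Paper List 2026-04-29
ICASSP 2026 - Audio Compression Paper List 2026-04-29
ICASSP 2026 - Acoustic Scene Classification Paper List 2026-04-29
ICASSP 2026 - Audio Scene Understanding Paper List 2026-04-29
ICASSP 2026 - Audio Enhancement Paper List 2026-04-29
ICASSP 2026 - Large Audio Models Paper List 2026-04-29
ICASSP 2026 - Audio Captioning Paper List 2026-04-29
ICASSP 2026 - Audio Security Paper List 2026-04-29
ICASSP 2026 - Audio Description Paper List 2026-04-29
ICASSP 2026 - Audio Effect Estimation Paper List 2026-04-29
ICASSP 2026 - Lossless Audio Coding Paper List 2026-04-29
ICASSP 2026 - Audio Retrieval #Audio Classification Paper List 2026-04-29
ICASSP 2026 - Audio Retrieval Paper List 2026-04-29
ICASSP 2026 - Audio Watermarking Paper List 2026-04-29
ICASSP 2026 - Audio Deepfake Detection Paper List 2026-04-29
ICASSP 2026 - Audio Generation Paper List 2026-04-29
ICASSP 2026 - Audio Editing Paper List 2026-04-29
ICASSP 2026 - Audio Quality Assessment Paper List 2026-04-29
ICASSP 2026 - Audio Super-Resolution Paper List 2026-04-29
ICASSP 2026 - Audio Question Answering Paper List 2026-04-29
ICASSP 2026 - Pretraining Paper List 2026-04-29
ICASSP 2026 - Domain Adaptation Paper List 2026-04-29
ICASSP 2026 Speech/Audio Papers: Detailed Analysis 2026-04-29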
Identifying Birdsong Syllables without Labelled Data 2026-04-29
Identifying the Minimal and Maximal Phonetic Subspace of Speech Representations 2026-04-29
Identity Leakage Through Accent Cues in Voice Anonymisation 2026-04-29
Impact of Phonetics on Speaker Identity in Adversarial Voice Attack 2026-04-29
Improving Active Learning for Melody Estimation by Disentangling Uncertainties 2026-04-29
Improving Anomalous Sound Detection with Attribute-Aware Representation from Domain-Adaptive Pre-Training 2026-04-29
Improving Audio Event Recognition with Consistency Regularization 2026-04-29
Improving Audio Question Answering with Variational Inference 2026-04-29
Improving Automatic Speech Recognition by Mitigating Distortions Introduced by Speech Enhancement Under Drone Noise 2026-04-29
Improving Binaural Distance Estimation in Reverberant Rooms Through Contrastive And Multi-Task Learning 2026-04-29
Improving Contextual ASR Via Multi-Grained Fusion With Large Language Models 2026-04-29
Improving Interpretability in Generative Multitimbral DDSP Frameworks via Semantically-Disentangled Musical Attributes 2026-04-29
Improving Multimodal Brain Encoding Model with Dynamic Subject-Awareness Routing 2026-04-29
Improving the Speaker Anonymization Evaluation’s Robustness to Target Speakers with Adversarial Learning 2026-04-29
In-Sync: Adaptation of Speech Aware Large Language Models for ASR with Word-Level Timestamp Predictions 2026-04-29
InconVAD: A Two-Stage Dual-Tower Framework for Multimodal Emotion Inconsistency Detection 2026-04-29
Incremental Learning for Audio Classification with Hebbian Deep Neural Networks 2026-04-29
Independent-Component-Based Encoding Models of Brain Activity During Story Comprehension 2026-04-29
Individualize the HRTF Neural Field Using Anthropometric Parameters Weighted by Direction-Attention 2026-04-29
Influence of Clean Speech Characteristics on Speech Enhancement Performance 2026-04-29
Influence-Aware Curation and Active Selection for Industrial and Surveillance Sound Events 2026-04-29
Input-Adaptive Differentiable Filterbanks via Hypernetworks for Robust Speech Processing 2026-04-29
InstructAudio: Unified Speech and Music Generation with Natural Language Instruction 2026-04-29
Instrument Generation Through Distributional Flow Matching and Test-Time Search 2026-04-29
Int-MeanFlow: Few-Step Speech Generation with Integral Velocity Distillation 2026-04-29
Integrating Speaker Embeddings and LLM-Derived Semantic Representations for Streaming Speaker Diarization 2026-04-29
Inter-Dialog Contrastive Learning for Multimodal Emotion Recognition in Conversations 2026-04-29
Interpretable Music Harmonic Analysis Through Multilinear Mixture of Experts 2026-04-29
Interval-Aware Retrieval Framework For Speech-Based Automatic Alzheimer’s Detection 2026-04-29
Inverse-Hessian Regularization for Continual Learning in ASR 2026-04-29
Investigating Modality Contribution in Audio LLMs for Music 2026-04-29
Investigating The Effect Of Sentence-Level Syntactic Structure On Information Loss In The Human Auditory System 2026-04-29
Is Phase Really Needed for Weakly-Supervised Dereverberation? 2026-04-29
It Is Personal: The Importance of Personalization for Recognizing Self-Reported Emotion 2026-04-29
Joint Autoregressive Modeling of Multi-Talker Overlapped Speech Recognition and Translation 2026-04-29
Joint Deep Secondary Path Estimation and Adaptive Control for Active Noise Cancellation 2026-04-29
Joint Estimation of Piano Dynamics and Metrical Structure with a Multi-Task Multi-Scale Network 2026-04-29
Joint Estimation of Primary and Secondary Paths for Personalized Hearable Applications 2026-04-29
Joint Multichannel Acoustic Feedback Cancellation and Speaker Extraction via Kalman Filter and Deep Non-Linear Spatial Filter 2026-04-29
K-Function: Joint Pronunciation Transcription and Feedback for Evaluating Kids Language Function 2026-04-29
KAN We Make Models Simpler for Audio Deepfake Detection with Kolmogorov–Arnold Networks? 2026-04-29
Keeping Models Listening: Segment- and time-aware attention rescaling at decoding time 2026-04-29
Korean aegyo speech shows systematic F1 increase to signal childlike qualities 2026-04-29
KSDIFF: Keyframe-Augmented Speech-Aware Dual-Path Diffusion for Facial Animation 2026-04-29
LAFUFU: Latent Acoustic Features For Ultra-Fast Utterance Restoration 2026-04-29
LAMB: LLM-Based Audio Captioning with Modality Gap Bridging Via Cauchy-Schwarz Divergence 2026-04-29
Language-Infused Retrieval-Augmented CTC with Adaptive Soft-Hard Gating for Robust Code-Switching ASR 2026-04-29
Lattice-Guided Consistency Regularization of Dual-Mode Transducers for Automatic Speech Recognition 2026-04-29
Learnable Mel-Frontend for Robust Underwater Acoustic Target Detection under Non-Target Interference 2026-04-29
Learning Domain-Robust Bioacoustic Representations for Mosquito Species Classification with Contrastive Learning and Distribution Alignment 2026-04-29
Learning Linearity in Audio Consistency Autoencoders via Implicit Regularization 2026-04-29
Learning Piezoelectric Hysteresis in In-Ear MEMS Loudspeakers from Acoustic Measurements 2026-04-29
Learning to Align with Unbalanced Optimal Transport in Linguistic Knowledge Transfer for ASR 2026-04-29
Learning Vocal-Tract Area And Radiation With A Physics-Informed Webster Model 2026-04-29
Learning What to Hear: Boosting Sound-Source Association for Robust Audiovisual Instance Segmentation 2026-04-29
LenslessMic: Audio Encryption and Authentication via Lensless Computational Imaging 2026-04-29
LESS: Large Language Model Enhanced Semi-Supervised Learning for Speech Foundational Models Using in-the-wild Data 2026-04-29
LETPAV: Lexicon-Enhanced Text with Progressive Audio-Visual Fusion for Multimodal Sentiment Analysis 2026-04-29
Leveraging Audio-Visual Data to Reduce the Multilingual Gap in Self-Supervised Speech Models 2026-04-29
Leveraging Diffusion U-Net Features for Predominant Instrument Recognition 2026-04-29
Leveraging Large Multimodal Models for Audio-Video Deepfake Detection: A Pilot Study 2026-04-29
Leveraging Large Speech Language Models as Evaluators for Expressive Speech 2026-04-29
Leveraging Multiple Speech Enhancers for Non-Intrusive Intelligibility Prediction for Hearing-Impaired Listeners 2026-04-29
Leveraging Prediction Entropy for Automatic Prompt Weighting in Zero-Shot Audio-Language Classification 2026-04-29
Leveraging Segment-Level Speech Representations for LLM-Based Speech Recognition 2026-04-29
Leveraging Text-to-Speech and Voice Conversion as Data Augmentation for Alzheimer’s Disease Detection from Spontaneous Speech 2026-04-29
Leveraging Whisper Embeddings For Audio-Based Lyrics Matching 2026-04-29
Lightweight and Generalizable Acoustic Scene Representations Via Contrastive Fine-Tuning and Distillation 2026-04-29
Lightweight and Perceptually-Guided Voice Conversion for Electro-Laryngeal Speech 2026-04-29
Lightweight Implicit Neural Network for Binaural Audio Synthesis 2026-04-29
Lightweight Phoneme-Conditioned Bandwidth Extension for Body-Conducted Speech 2026-04-29
Lingometer: On-Device Personal Speech Word Counting System 2026-04-29
Linguard: Authenticating Speech Recordings Using Speech Recognition and Watermark 2026-04-29
LipsAM: Lipschitz-Continuous Amplitude Modifier for Audio Signal Processing and its Application to Plug-And-Play Dereverberation 2026-04-29
Lisa: Lightweight Yet Superb Neural Speech Coding 2026-04-29
Listen, But Don’t Leak: Sensitive Data Protection for Privacy Aware Automatic Speech Recognition with Acoustic Triggers 2026-04-29
LLAC: Learned Lossless Audio Codec 2026-04-29
LLM-Based Post-ASR Error Correction for Disordered Speech 2026-04-29
Localizing Speech Deepfakes Beyond Transitions via Segment-Aware Learning 2026-04-29
LongSpeech: A Scalable Benchmark for Transcription, Translation and Understanding in Long Speech 2026-04-29
Look, Listen and Segment: Towards Weakly Supervised Audio-Visual Semantic Segmentation 2026-04-29
Loose Coupling of Spectral and Spatial Models for Multi-Channel Diarization and Enhancement of Meetings in Dynamic Environments 2026-04-29
LOTUSDIS: A Thai Far-Field Meeting Corpus for Robust Conversational ASR 2026-04-29
Low-Bandwidth High-Fidelity Speech Transmission with Generative Latent Joint Source-Channel Coding 2026-04-29
Low-Frequency Harmonic Control for Speech Intelligibility in Open-Ear Headphones 2026-04-29
Low-Latency Audio Front-End Region-of-Interest Beamforming for Smart Glasses 2026-04-29
Low-Resource Guidance for Controllable Latent Audio Diffusion 2026-04-29
Low-Resource Speech-Based Early Alzheimer's Detection via Cross-Lingual and Few-Shot Transfer Learning 2026-04-29
LP-CFM: Perceptual Invariance-Aware Conditional Flow Matching for Speech Modeling 2026-04-29
MAG: Multi-Modal Aligned Autoregressive Co-Speech Gesture Generation Without Vector Quantization 2026-04-29
MAGE: A Coarse-to-Fine Speech Enhancer with Masked Generative Model 2026-04-29
Malefa: Multi-Granularity Learning and Effective False Alarm Suppression for Zero-Shot Keyword Spotting 2026-04-29
Mambaformer: State-Space Augmented Self-Attention with Downup Sampling for Monaural Speech Enhancement 2026-04-29
Marco-Voice: A Unified Framework for Expressive Speech Synthesis with Voice Cloning 2026-04-29
MaskVCT: Masked Voice Codec Transformer for Zero-Shot Voice Conversion with Increased Controllability via Multiple Guidances 2026-04-29
Matching Reverberant Speech Through Learned Acoustic Embeddings 2026-04-29
Matrix-Structured Hierarchical Convolutional Modeling for Pronunciation Assessment and Mispronunciation Detection 2026-04-29
Maximum Likelihood Measurement Noise Estimation for Block-Time Domain Kalman Filters 2026-04-29
MC-MRX: Reference- and MIDI-Guided Music Source Extraction with Contrastive Learning 2026-04-29
MCF: Text LLMs for Multimodal Emotional Causality 2026-04-29
MCI-OTFusion: A Multimodal Model for MCI Detection and Cognitive Score Prediction 2026-04-29
Meanflow-Accelerated Multimodal Video-to-Audio Synthesis Via One-Step Generation 2026-04-29
MeanFlowSE: One-Step Generative Speech Enhancement via Conditional Mean Flow 2026-04-29
MeanSE: Efficient Generative Speech Enhancement with Mean Flows 2026-04-29
MeanVC: Lightweight and Streaming Zero-Shot Voice Conversion via Mean Flows 2026-04-29
MeanVoiceFlow: One-Step Nonparallel Voice Conversion with Mean Flows 2026-04-29
Measuring Prosody Diversity in Zero-Shot TTS: A New Metric, Benchmark, and Exploration 2026-04-29
MECap-R1: Emotion-Aware Policy with Reinforcement Learning for Multimodal Emotion Captioning 2026-04-29
Medical ASR Enhancement by Domain-Specific Reinforcement Fine-Tuning 2026-04-29
MELA-TTS: Joint Transformer-Diffusion Model with Representation Alignment for Speech Synthesis 2026-04-29
Melos: Sentence-To-Section Training with Multi-Task Learning for LLM-Driven Song Generation 2026-04-29
Membership Inference Attack against Music Diffusion Models via Generative Manifold Perturbation 2026-04-29
MFF-RVRDI: Multimodal Fusion Framework for Robust Video Recording Device Identification 2026-04-29
MI-Fuse: Label Fusion for Unsupervised Domain Adaptation with Closed-Source Large Audio-Language Model 2026-04-29
Microphone-Less Measurement of Three-Dimensional Radiating Impulse Response of Sound Source using Spherical Harmonic-Domain Acousto-Optic Tomography 2026-04-29
MIDI-LLaMA: An Instruction-Following Multimodal LLM for Symbolic Music Understanding 2026-04-29
Mind the Shift: Using Delta SSL Embeddings to Enhance Child ASR 2026-04-29
Mind Your [m]s, Cross Your [t]s: A Large-Scale Phonetic Analysis of Speech Reproduction in Modern Speech Generators 2026-04-29
MirrorTalk: Forging Personalized Avatars Via Disentangled Style and Hierarchical Motion Control 2026-04-29
Mispronunciation Detection and Diagnosis Without Model Training: A Retrieval-Based Approach 2026-04-29
Mitigating Attention Sinks and Massive Activations in Audio-Visual Speech Recognition with LLMs 2026-04-29
Mitigating Data Replication in Text-to-Audio Generative Diffusion Models Through Anti-Memorization Guidance 2026-04-29
Mitigating Intra-Speaker Variability in Diarization with Style-Controllable Speech Augmentation 2026-04-29
Mitigating Language Prior-Induced Hallucinations via Bi-Level Contrastive Decoding 2026-04-29
Mitigating Shared-Private Branch Imbalance via Dual-Branch Rebalancing for Multimodal Sentiment Analysis 2026-04-29
Mix2Morph: Learning Sound Morphing from Noisy Mixes 2026-04-29
MixGAN-based Non-blind Bandwidth Extension for Audio Codec 2026-04-29
Mixture of Experts for Recognizing Depression from Interview and Reading Tasks 2026-04-29
Mixture To Beamformed Mixture: Leveraging Beamformed Mixture As Weak-Supervision for Speech Enhancement and Noise-Robust ASR 2026-04-29
Mixture-of-Experts Based Soft-Label Learning for Multi-Label Speech Emotion Recognition 2026-04-29
Mixture-of-Experts Framework for Field-of-View Enhanced Signal-Dependent Binauralization of Moving Talkers 2026-04-29
Mixtures of Lightweight Articulatory Experts for Multilingual ASR 2026-04-29
ML-SAN: Multi-Level Speaker-Adaptive Network for Emotion Recognition in Conversations 2026-04-29
MMAudioSep: Taming Video-to-Audio Generative Model Towards Video/Text-Queried Sound Separation 2026-04-29
MMEB-V3: Measuring the Performance Gaps of Omni-Modality Embedding Models 2026-04-29
MNV-17: A High-Quality Performative Mandarin Dataset for Nonverbal Vocalization Recognition in Speech 2026-04-29
Modeling Both Intra- And Inter-Utterance Variability for Conversational Emotion Recognition 2026-04-29
Modeling Inter-Segment Relationships in Speech for Dementia Detection with Audio Spectrogram Transformers and Graph Attention Networks 2026-04-29
Modeling Strategies For Speech Enhancement in The Latent Space of a Neural Audio Codec 2026-04-29
Monitoring exposure-length variations in submarine power cables using distributed fiber-optic sensing 2026-04-29
More Than a Shortcut: A Hyperbolic Approach to Early-Exit Networks 2026-04-29
Motionbeat: Motion-Aligned Music Representation via Embodied Contrastive Learning and Bar-Equivariant Contact-Aware Encoding 2026-04-29
MR-FlowDPO: Multi-Reward Direct Preference Optimization for Flow-Matching Text-to-Music Generation 2026-04-29
MSANET: Multi-Scale Semantic Aggregation Network for Brain-Assisted Speech Enhancement in Multi-Speaker Conditions 2026-04-29
MSCT: Differential Cross-Modal Attention for Deepfake Detection 2026-04-29
MSF-SER: Enriching Acoustic Modeling with Multi-Granularity Semantics for Speech Emotion Recognition 2026-04-29
MT-HuBERT: Self-Supervised Mix-Training for Few-Shot Keyword Spotting in Mixed Speech 2026-04-29
MTP-S2UT: Enhancing Speech-to-Speech Translation Quality with Multi-Token Prediction 2026-04-29
Multi-Channel Speech Enhancement for Cocktail Party Speech Emotion Recognition 2026-04-29
Multi-Layer Attentive Probing Improves Transfer of Audio Representations for Bioacoustics 2026-04-29
Multi-Scale Physiologically-Motivated Alignment for Auditory Attention Decoding 2026-04-29
Multi-Task Learning For Speech Quality Assessment Using ASR-Derived Entropy Features 2026-04-29
Multi-Task Transformer for Explainable Speech Deepfake Detection via Formant Modeling 2026-04-29
Multi-View Hierarchical Hypergraph Neural Network for Automatic Stuttering Detection 2026-04-29
Multilingual Supervised Pretraining with LM-Assisted Decoding for Visual Speech Recognition 2026-04-29
Multimodal Co-Training with Subtractive Unlabeled-Benefit Bounds 2026-04-29
Multimodal Fusion-Based IPCLIP Network for Mixed Reality Surgical Assistance 2026-04-29
Multimodal LLMs as Expert Speech Annotators: Acoustic Macro-Descriptors for Parkinson’s Detection 2026-04-29
Multimodal Room Impulse Response Generation Through Latent Rectified Flow Matching 2026-04-29
Multimodal Self-Attention Network with Temporal Alignment for Audio-Visual Emotion Recognition 2026-04-29
Multimodal Transformer with Multiperspective Training for Predicting Self-Expression Skills from Video Interview 2026-04-29
Multimodal Variational Graph Network for Multimodal Sentiment Analysis 2026-04-29
MuseTok: Symbolic Music Tokenization for Generation and Semantic Understanding 2026-04-29
Musicdetr: A Position-Aware Spectral Note Detection Model for Singing Transcription 2026-04-29
MusiCRS: Benchmarking Audio-Centric Conversational Recommendation 2026-04-29
Mutual Forcing: Dual-Mode Self-Evolution for Fast Autoregressive Audio-Video Character Generation 2026-04-29
Natural Language to Spatial Audio Parameters: Lightweight Deterministic Rendering for Creative Authoring 2026-04-29
NCF-TTS: Enhancing Flow Matching Based Text-To-Speech with Neighborhood Consistency Flow 2026-04-29
Nemotron 3 Nano Omni: Efficient and Open Multimodal Intelligence 2026-04-29
Neural Network-Based Time-Frequency-Bin-Wise Linear Combination of Beamformers for Underdetermined Target Source Extraction 2026-04-29
Neuromamba: Adaptive Frequency Filtering with a Pyramid Mamba for sEEG-driven Speech Synthesis 2026-04-29
NeuroSIFT: A Biologically-Inspired Framework with Explicit Signal-Noise Separation for Robust Multimodal Emotion Recognition 2026-04-29
nGPT as a Scalable Architecture for Speech Recognition and Translation 2026-04-29
No Verifiable Reward for Prosody: Toward Preference-Guided Prosody Learning in TTS 2026-04-29
Noise-Robust AV-ASR Using Visual Features both in the Whisper Encoder and Decoder 2026-04-29
Noise-Robust Contrastive Learning with an MFCC-Conformer for Coronary Artery Disease Detection 2026-04-29
Noise-to-Notes: Diffusion-Based Generation and Refinement for Automatic Drum Transcription 2026-04-29
Non-Line-of-Sight Vehicle Detection via Audio-Visual Fusion 2026-04-29
Obstructive Sleep Apnea Endotype Prediction During Wakefulness Using Voice Biomarkers 2026-04-29
Off-The-Grid Multi-Pitch Estimation Using Optimal Transport 2026-04-29
OMNI-AVSR: Towards Unified Multimodal Speech Recognition With Large Language Models 2026-04-29
On deepfake voice detection - It’s all in the presentation 2026-04-29
On The Design of Efficient Neural Methods for Geometry-Agnostic Multichannel Speech Enhancement 2026-04-29
On the Design of Higher-Order Time-Intensity Microphone Arrays for Panoramic Audio Recording and Reproduction 2026-04-29
One Model–Three Tasks: Discovering a Shared Winning Ticket for Low-Complexity Audio Intelligence 2026-04-29
Online Register For Dual-Mode Self-Supervised Speech Models: Mitigating the Lack of Future Context 2026-04-29
Optimizing Domain-Adaptive Self-Supervised Learning for Clinical Voice-Based Disease Classification 2026-04-29
Optimizing Speech Language Models for Acoustic Consistency 2026-04-29
OV-INSTRUCTTTS: Towards Open-Vocabulary Instruct Text-to-Speech 2026-04-29
PAC: Pronunciation-Aware Contextualized Large Language Model-Based Automatic Speech Recognition 2026-04-29
PADAM: Perceptual Audio Defect Assessment Model 2026-04-29
ParaGSE: Parallel Generative Speech Enhancement with Group-Vector-Quantization-Based Neural Speech Codec 2026-04-29
Parametric Neural Amp Modeling with Active Learning 2026-04-29
PC-MCL: Patient-Consistent Multi-Cycle Learning with Multi-Label Bias Correction for Respiratory Sound Classification 2026-04-29
Peeking Into the Future for Contextual Biasing 2026-04-29
Perceptual Loss Optimized HRTF Personalization in Spherical Harmonic Domain 2026-04-29
Perceptual Quality Assessment for Stylized Talking Heads 2026-04-29
PerformSinger: Multimodal Singing Voice Synthesis Leveraging Synchronized Lip Cues from Singing Performance Videos 2026-04-29
Personal Sound Zones with Flexible Bright Zone Control 2026-04-29
PersonaPlex: Voice and Role Control for Full Duplex Conversational Speech Models 2026-04-29
PFluxTTS: Hybrid Flow-Matching TTS with Robust Cross-Lingual Voice Cloning and Inference-Time Model Fusion 2026-04-29
PG-SE: Predictive Acceleration and Correction for Generative Speech Enhancement 2026-04-29
Phase-Retrieval-Based Physics-Informed Neural Networks For Acoustic Magnitude Field Reconstruction 2026-04-29
Phase-Space Signal Processing of Acoustic Data for Advanced Manufacturing In-Situ Monitoring 2026-04-29
PhoenixDSR: Phoneme-Guided and LLM-Enhanced Dysarthric Speech Recognition 2026-04-29
Phoneme-Level Visual Speech Recognition via Point-Visual Fusion and Language Model Reconstruction 2026-04-29
Phonological Tokenizer: Prosody-Aware Phonetic Token Via Multi-Objective Fine-Tuning with Differentiable K-Means 2026-04-29
Phrased: Phrase Dictionary Biasing for Speech Translation 2026-04-29
Physics-Informed Neural Networks for Ocean Acoustic Field Reconstruction and Source Localization 2026-04-29
Pianoroll-Event: A Novel Score Representation for Symbolic Music 2026-04-29
PICOAUDIO2: Temporal Controllable Text-to-Audio Generation with Natural Language Description 2026-04-29
Plug-and-Play Emotion Graphs for Compositional Prompting in Zero-Shot Speech Emotion Recognition 2026-04-29
Poly-SVC: Polyphony-Aware Singing Voice Conversion with Harmonic Modeling 2026-04-29
Polynomial Mixing for Efficient Self-Supervised Speech Encoders 2026-04-29
Position-Invariant Fine-Tuning Of Speech Enhancement Models With Self-Supervised Speech Representations 2026-04-29
Praxy Voice: Voice-Prompt Recovery + BUPS for Commercial-Class Indic TTS from a Frozen Non-Indic Base at Zero Commercial-Training-Data Cost 2026-04-29
Principled Coarse-Grained Acceptance For Speculative Decoding In Speech 2026-04-29
PRoADS: Provably Secure And Robust Audio Diffusion Steganography With Latent Optimization And Backward Euler Inversion 2026-04-29
Probing the Hidden Talent of ASR foundation models for L2 English Oral Assessment 2026-04-29
Probing Whisper for Dysarthric Speech in Detection and Assessment 2026-04-29
Production-Scale Dynamic Vocabulary ASR Biasing with Word-Level FST and Robust Training 2026-04-29
Proficiency-Aware Adaptation and Data Augmentation for Robust L2 ASR 2026-04-29
Prompt-Guided Mixture-of-Experts for Robust Multimodal Sentiment Analysis with Missing Modalities 2026-04-29
PromptSep: Generative Audio Separation Via Multimodal Prompting 2026-04-29
Prosody-Guided Harmonic Attention for Phase-Coherent Neural Vocoding in the Complex Spectrum 2026-04-29
PROST-LLM: Progressively Enhancing the Speech-to-Speech Translation Capability in LLMs 2026-04-29
Prototype-Guided Cross-Modal Contrastive Learning for Continual Audio-Visual Sound Separation 2026-04-29
PRSA: Preventing Malicious Speaker Recognition and Speech Synthesis Simultaneously with Adversarial Examples 2026-04-29
PSP: An Interpretable Per-Dimension Accent Benchmark for Indic Text-to-Speech 2026-04-29
PSTalker: Realistic 3D Talking Head Synthesis via a Semantic-Aware Audio-Driven Point-Based Shape 2026-04-29
Purification Before Fusion: Toward Mask-Free Speech Enhancement for Robust Audio-Visual Speech Recognition 2026-04-29
Qastanet: A DNN-Based Quality Metric for Spatial Audio 2026-04-29
QE-XVC: Zero-Shot Cross-Lingual Voice Conversion via Query-Enhancement and Conditional Flow Matching 2026-04-29
QFOCUS: Controllable Synthesis for Automated Speech Stress Editing to Deliver Human-Like Emphatic Intent 2026-04-29
Quality Assessment of Noisy and Enhanced Speech with Limited Data: UWB-NTIS System for VoiceMOS 2024 2026-04-29
Quantifying Speaker Embedding Phonological Rule Interactions in Accented Speech Synthesis 2026-04-29
Random Matrix-Driven Graph Representation Learning For Bioacoustic Recognition 2026-04-29
Ranking The Impact of Contextual Specialization in Neural Speech Enhancement 2026-04-29
RAP: Real-Time Audio-Driven Portrait Animation with Video Diffusion Transformer 2026-04-29
RAS: a Reliability Oriented Metric for Automatic Speech Recognition 2026-04-29
RASD-SR: A Robust Anomalous Sound Detection Framework with Score Recalibration 2026-04-29
Rationale-Guided Learning for Multimodal Emotion Recognition 2026-04-29
RCAL: Reinforced Cross-Modal Alignment for Multimodal Sentiment Analysis with Sparse Visual Frames 2026-04-29
Reading Between the Waves: Robust Topic Segmentation Using Inter-Sentence Audio Features 2026-04-29
Real-Time Streaming Mel Vocoding with Generative Flow Matching 2026-04-29
Reasoning Driven Captions to Assist Noise Robust Speech Emotion Recognition 2026-04-29
ReCoM: Realistic Co-Speech Motion Generation with Recurrent Embedded Transformer 2026-04-29
Reconstruction of Spherical Sound Source Radiation Characteristics with Graph Signal Processing 2026-04-29
Recovering Performance in Speech Emotion Recognition from Discrete Tokens Via Multi-Layer Fusion and Paralinguistic Feature Integration 2026-04-29
Reducing Prompt Sensitivity in LLM-Based Speech Recognition Through Learnable Projection 2026-04-29
Reference Microphone Selection for Guided Source Separation Based on The Normalized L-P Norm 2026-04-29
Reference-Aware SFM Layers for Intrusive Intelligibility Prediction 2026-04-29
Refgen: Reference-Guided Synthetic Data Generation for Anomalous Sound Detection 2026-04-29
Regularized Inverse Filter Design for Rigid Spherical Microphone Array Processing: Laplace- And Time-Domain Representations 2026-04-29
Relative Time Intervals Representation For Word-Level Timestamping With Masked Training 2026-04-29
Reliable AI via Age-Balanced Validation: Fair Model Selection for Parkinson’s Detection from Voice 2026-04-29
Representation-Based Data Quality Audits for Audio 2026-04-29
Representation-Diverse Self-Supervision for Cross-Domain Bioacoustic Learning in Low-Resource Settings 2026-04-29
Residual Tokens Enhance Masked Autoencoders for Speech Modeling 2026-04-29
Respire-Mamba C-UNet: Consistency-Trained Autoencoder for High-Fidelity Respiratory Sound Compression 2026-04-29
Rethinking Entity Disambiguation in Complex Modalities 2026-04-29
Rethinking Music Captioning with Music Metadata LLMs 2026-04-29
Retrieval-Based Speculative Decoding For Autoregressive Speech Synthesis 2026-04-29
Revisiting Direct Speech-to-Text Translation with Speech LLMs: Better Scaling than CoT Prompting? 2026-04-29
RFM-Editing: Rectified Flow Matching for Text-Guided Audio Editing 2026-04-29
RHO-PERFECT: Correlation Ceiling for Subjective Evaluation Datasets 2026-04-29
RIR-Former: Coordinate-Guided Transformer for Continuous Reconstruction of Room Impulse Responses 2026-04-29
RLBR: Reinforcement Learning with Biasing Rewards for Contextual Speech Large Language Models 2026-04-29
RMODGDF: A Robust STFT-Derived Feature for Musical Instrument Recognition 2026-04-29
Robust Accent Identification via Voice Conversion and Non-Timbral Embeddings 2026-04-29
Robust and Lightweight F0 Estimation Through Mid-Level Fusion of DSP-Informed Features 2026-04-29
Robust Deepfake Audio Detection via Multi-Level Intermediate Feature Fusion 2026-04-29
Robust Online Overdetermined Independent Vector Analysis Based on Bilinear Decomposition 2026-04-29
RoCo: Robust Code for Fast and Effective Proactive Defense against Voice Cloning Attack 2026-04-29
RRPO: Robust Reward Policy Optimization for LLM-Based Emotional TTS 2026-04-29
S-PRESSO: Ultra Low Bitrate Sound Effect Compression with Diffusion Autoencoders and Offline Quantization 2026-04-29
S-SONDO: Self-Supervised Knowledge Distillation for General Audio Foundation Models 2026-04-29
S2Voice: Style-Aware Autoregressive Modeling with Enhanced Conditioning for Singing Style Conversion 2026-04-29
SA-SSL-MOS: Self-Supervised Learning MOS Prediction with Spectral Augmentation for Generalized Multi-Rate Speech Assessment 2026-04-29
SAASDNet: An EEG-Based Streaming Auditory Attention Switch Decoding Network for Self-Initiated Attention Switching in Mixed Speech 2026-04-29
SAGA-SR: Semantically and Acoustically Guided Audio Super-Resolution 2026-04-29
Salad-VAE: Semantic Audio Compression with Language-Audio Distillation 2026-04-29
Sampling-Rate-Agnostic Speech Super-Resolution Based on Gaussian Process Dynamical Systems with Deep Kernel Learning 2026-04-29
SAUNA: Song-Level Audio & User-Listening Data Neural Alignment 2026-04-29
Savgbench: Benchmarking Spatially Aligned Audio-Video Generation 2026-04-29
Scalable Evaluation for Audio Identification Via Synthetic Latent Fingerprint Generation 2026-04-29
Scaling Ambiguity: Augmenting Human Annotation in Speech Emotion Recognition with Audio-Language Models 2026-04-29
Scaling Multi-Talker ASR with Speaker-Agnostic Activity Streams 2026-04-29
Scaling Spoken Language Models with Syllabic Speech Tokenization 2026-04-29
SceneRAG: Scene-Level Retrieval-Augmented Generation for Video Understanding 2026-04-29
SE-DiCoW: Self-Enrolled Diarization-Conditioned Whisper 2026-04-29
Secondary Source Placement for Sound Field Control Based on Ising Model 2026-04-29
SED: Structural Entropy Based Speech Discretization for Discrete Token-Based ASR 2026-04-29
Segmentwise Pruning in Audio-Language Models 2026-04-29
SELD-MOHA: A Fine-Tuning Method with the Mixture of Heterogeneous Adapters for Sound Event Localization and Detection 2026-04-29
Selective Hub Fusion with Modality-Heterogeneous Experts for Multimodal Emotion Recognition 2026-04-29
Self-Supervised Note Tracking and Multi-Pitch Estimation Via Reconstruction-Based Learning 2026-04-29
Semantic Anchor Transfer from Short to Long Speech in a Distillation-Based Summarization Framework 2026-04-29
Semantic-Guided Pseudo-Feature Attention Network for Audio-Visual Zero-Shot Learning 2026-04-29
SEP-ST: Incorporating Speech Entity Prompt Into Large Language Models for Speech Translation 2026-04-29
Separate this, and all of these Things Around It: Music Source Separation Via Hyperellipsoidal Queries 2026-04-29
Sequence-Level Unsupervised Training in Speech Recognition: A Theoretical Study 2026-04-29
Sequential and Simultaneous Optimization of Microphone Array Geometry and Region-of-Interest Beamforming 2026-04-29
Session-Level Spoken Language Assessment with A Multimodal Foundation Model Via Multi-Target Learning 2026-04-29
SFM-TTS: Lightweight and Rapid Speech Synthesis with Flexible Shortcut Flow Matching 2026-04-29
Shared Representation Learning for Reference-Guided Targeted Sound Detection 2026-04-29
Shortcut Flow Matching for Speech Enhancement: Step-Invariant Flows via Single Stage Training 2026-04-29
Sidon: Fast and Robust Open-Source Multilingual Speech Restoration for Large-Scale Dataset Cleansing 2026-04-29
SightSound-R1: Cross-Modal Reasoning Distillation from Vision to Audio Language Models 2026-04-29
Sing What You Fit: A Perception-Based Dataset and Benchmark for Vocal-Song Suitability Analysis 2026-04-29
Sing2Song: An Accompaniment Generation System Based on Solo Singing 2026-04-29
Single-Microphone Audio Point Source Discriminative Localization from Reverberation Late Tail Estimation 2026-04-29
Single-Step Controllable Music Bandwidth extension with Flow Matching 2026-04-29
SingMOS-Pro: An Comprehensive Benchmark For Singing Quality Assessment 2026-04-29
SIREN: Spatially-Informed Reconstruction of Binaural Audio with Vision 2026-04-29
SIRUP: A Diffusion-Based Virtual Upmixer of Steering Vectors for Highly-Directive Spatialization with First-Order Ambisonics 2026-04-29
SLAP: Scalable Language-Audio Pretraining with Variable-Duration Audio and Multi-Objective Training 2026-04-29
SLM-SS: Speech Language Model for Generative Speech Separation 2026-04-29
SLM-TTA: A Framework for Test-Time Adaptation of Generative Spoken Language Models 2026-04-29
Slot Filling as a Reasoning Task for SpeechLLMs 2026-04-29
SmoothCLAP: Soft-Target Enhanced Contrastive Language-Audio Pretraining for Affective Computing 2026-04-29
Snore Sound Classification Based on Physiological Features and Adaptive Loss Function 2026-04-29
Solving the Helmholtz Equation Via Physics-Informed Neural Networks with an Adaptive Weighting Strategy 2026-04-29
SONAR: Self-Distilled Continual Pre-Training for Domain Adaptive Audio Representation 2026-04-29
SoundCompass: Navigating Target Sound Extraction with Effective Directional Clue Integration in Complex Acoustic Scenes 2026-04-29
Sounding Highlights: Dual-Pathway Audio Encoders for Audio-Visual Video Highlight Detection 2026-04-29
Sounds that Shape: Audio-Driven 3D Mesh Generation with Attribute-Decoupled Score Distillation Sampling 2026-04-29
Source Separation For A Cappella Music 2026-04-29
SP-MCQA: Evaluating Intelligibility of TTS Beyond the Word Level 2026-04-29
SPADE: Structured Pruning and Adaptive Distillation for Efficient LLM-TTS 2026-04-29
SPAM: Style Prompt Adherence Metric for Prompt-Based TTS 2026-04-29
Sparse Autoencoders Make Audio Foundation Models More Explainable 2026-04-29
Sparse-View Visual-Acoustic Latent Learning for Novel-View Audio Synthesis 2026-04-29
Spatial Covariance Matrix Reconstruction for Speech Enhancement in Reverberant Multi-Source Environments 2026-04-29
Spatial-CLAP: Learning Spatially-Aware Audio–Text Embeddings for Multi-Source Conditions 2026-04-29
Spatially Aware Self-Supervised Models for Multi-Channel Neural Speaker Diarization 2026-04-29
SpatialNet-Echo: Real-Time Acoustic Echo Cancellation via Integrated Narrow-Band and Cross-Band Processing 2026-04-29
Speaker Anonymisation for Speech-Based Suicide Risk Detection 2026-04-29
Speaking Clearly: A Simplified Whisper-Based Codec for Low-Bitrate Speech Coding 2026-04-29
Spectral or Spatial? Leveraging Both for Speaker Extraction in Challenging Data Conditions 2026-04-29
Spectrogram Event Based Feature Representation for Generalizable Automatic Music Transcription 2026-04-29
Speech Emotion Recognition based on Hierarchical Transformer with Shifted Windows 2026-04-29
Speech Quality-Based Localization of Low-Quality Speech and Text-to-Speech Synthesis Artefacts 2026-04-29
SpeechCT-CLIP: Distilling Text-Image Knowledge to Speech for Voice-Native Multimodal CT Analysis 2026-04-29
SpeechMapper: Speech-To-Text Embedding Projector for LLMs 2026-04-29
Spike-Driven Low-Power Speech Bandwidth Extension 2026-04-29
Spiking Attention Network: A Hybrid Neuromorphic Approach to Underwater Acoustic Localization and Zero-Shot Adaptation 2026-04-29
Spiking Temporal-Enhanced Network for Zero-Shot Audio-Visual Learning 2026-04-29
Spring Reverb Emulation with Hybrid Gated Convolutional Networks and State Space Models 2026-04-29
SSVD-O: Parameter-Efficient Fine-Tuning with Structured SVD for Speech Recognition 2026-04-29
ST-HNTM: Joint Speech-Text Neural Topic Modeling on the Hypersphere 2026-04-29
STACodec: Semantic Token Assignment for Balancing Acoustic Fidelity and Semantic Information in Audio Codecs 2026-04-29
Staged Diffusion with Hybrid Mixture-of-Experts (MoE) for Multimodal Sentiment Analysis 2026-04-29
Stemphonic: All-At-Once Flexible Multi-Stem Music Generation 2026-04-29
Step-Audio-R1.5 Technical Report 2026-04-29
StereoFoley: Object-Aware Stereo Audio Generation from Video 2026-04-29
Stereophonic Acoustic Echo Cancellation Using an Improved Affine Projection Algorithm with Adaptive Multiple Sub-Filters 2026-04-29
Still Thinking or Stopped Talking? Dialogue Silence Intention Classification Using Multimodal Large Language Model 2026-04-29
Str-DiffSep: Streamable Diffusion Model for Speech Separation 2026-04-29
Stream-Voice-Anon: Enhancing Utility of Real-Time Speaker Anonymization Via Neural Audio Codec and Language Models 2026-04-29
Streaming Speech Recognition with Decoder-Only Large Language Models and Latency Optimization 2026-04-29
Streamingbench: Assessing the Gap for MLLMs to Achieve Streaming Video Understanding 2026-04-29
StreamMark: A Deep Learning-Based Semi-Fragile Audio Watermarking for Proactive Deepfake Detection 2026-04-29
Stress Prediction from Temporal Emotion Trajectories in Clinical Patient-Physician Conversations 2026-04-29
Structure-Aware Diffusion Schrödinger Bridge 2026-04-29
StyHarmo: Efficient Style-Specific Video Generation with Music Synchronization 2026-04-29
Style Attack Disguise: When Fonts Become a Camouflage for Adversarial Intent 2026-04-29
Style-Disentangled Diffusion for Controllable and Identity-Generalized Speech-Driven Body Motion Generation 2026-04-29
StyleBench: Evaluating Speech Language Models on Conversational Speaking Style Control 2026-04-29
StylePitcher: Generating Style-Following and Expressive Pitch Curves for Versatile Singing Tasks 2026-04-29
Subgraph Localization in the Subbands for Partially Spoofed Speech Detection 2026-04-29
Subsequence SDTW: Differentiable Alignment with Flexible Boundary Conditions 2026-04-29
Subspace Hybrid Adaptive Filtering for Phonocardiogram Signal Denoising 2026-04-29
Sunac: Source-Aware Unified Neural Audio Codec 2026-04-29
SURE: Synergistic Uncertainty-Aware Reasoning for Multimodal Emotion Recognition in Conversations 2026-04-29
SwitchCodec: Adaptive Residual-Expert Sparse Quantization for High-Fidelity Neural Audio Coding 2026-04-29
Symphony Rendering: MIDI and Composer-Conditioned Auto Orchestration with Flow-Matching Transformers 2026-04-29
SymphonyGen: 3D Hierarchical Orchestral Generation with Controllable Harmony Skeleton 2026-04-29
SynaSpot: A Lightweight, Streaming Multi-modal Framework for Keyword Spotting with Audio-Text Synergy 2026-04-29
Synchronous Secondary Path Modeling and Kronecker-Factorized Adaptive Algorithm for Multichannel Active Noise Control 2026-04-29
Syncspeech: Efficient and Low-Latency Text-to-Speech Based on Temporal Masked Transformer 2026-04-29
SynParaSpeech: Automated Synthesis of Paralinguistic Datasets for Speech Generation and Understanding 2026-04-29
Synthcloner: Synthesizer-Style Audio Transfer via Factorized Codec with ADSR Envelope Control 2026-04-29
Synthesized Data Selection via Score Distribution Matching for Te Reo Māori Automatic Speech Recognition 2026-04-29
Synthetic Data Domain Adaptation for ASR via LLM-Based Text and Phonetic Respelling Augmentation 2026-04-29
Synthetic yet Striking? Assessing Vocal Charisma in TTS via Perceptual and Algorithmic Measures 2026-04-29
T-Cache: Fast Inference For Masked Generative Transformer-Based TTS Via Prompt-Aware Feature Caching 2026-04-29
T-Mimi: A Transformer-Based Mimi Decoder for Real-Time On-Phone TTS 2026-04-29
TAG: Structured Temporal Audio Generation via LLM-Guided Manual Scription and Control 2026-04-29
TAGARELA - A Portuguese Speech Dataset from Podcasts 2026-04-29
Taming Audio VAEs via Target-KL Regularization 2026-04-29
Target Speaker Anonymization in Multi-Speaker Recordings 2026-04-29
Target-Speaker LLM-ASR with Speaker-Aware Speech Encoder 2026-04-29
Task Vector in TTS: Toward Emotionally Expressive Dialectal Speech Synthesis 2026-04-29
Task-Oriented Sound Privacy Preservation for Sound Event Detection Via End-to-End Adversarial Multi-Task Learning 2026-04-29
TASU: Text-only Alignment for Speech Understanding 2026-04-29
TAU: A Benchmark for Cultural Sound Understanding Beyond Semantics 2026-04-29
Teacher-Guided Pseudo Supervision and Cross-Modal Alignment for Audio-Visual Video Parsing 2026-04-29
Teaching Audio Models to Reason: A Unified Framework for Source- and Layer-Wise Distillation 2026-04-29
Teaching the Teachers: Boosting Unsupervised Domain Adaptation In Speech Recognition By Ensemble Update 2026-04-29
Temporal Distillation for Music Representation Learning 2026-04-29
Temporal Graph Modeling for Speech Emotion Recognition Using LSTM-Aggregated Multigraph Networks 2026-04-29
Temporal-Spatial Decouple Before Act: Disentangled Representation Learning for Multimodal Sentiment Analysis 2026-04-29
Temporally Heterogeneous Graph Contrastive Learning for Multimodal Acoustic Event Classification 2026-04-29
Test Time Adaptation for Speech Emotion Recognition 2026-04-29
Test-Time Scaling for Auditory Cognition in Audio Language Models 2026-04-29
Testing The Efficient Coding Hypothesis Beyond Humans: The Auditory Kernels of Bat Vocalizations 2026-04-29
Text2midi-InferAlign: Improving Symbolic Music Generation with Inference-Time Alignment 2026-04-29
Text2Move: Text-To-Moving Sound Generation via Trajectory Prediction and Temporal Alignment 2026-04-29
TextlessRAG: End-to-End Visual Document RAG by Speech without Text 2026-04-29
The 3rd Clarity Prediction Challenge: A Machine Learning Challenge for Hearing aid Speech Intelligibility Prediction 2026-04-29
The Curious Case of Visual Grounding: Different Effects for Speech-and Text-Based Language Encoders 2026-04-29
The Impact of Audio Watermarking on Audio Anti-Spoofing Countermeasures 2026-04-29
The Muse Benchmark: Probing Music Perception and Auditory Relational Reasoning in Audio LLMs 2026-04-29
The Role of Prosodic and Lexical Cues in Turn-Taking with Self-Supervised Speech Representations 2026-04-29
The Singing Voice Conversion Challenge 2025: From Singer Identity Conversion to Singing Style Conversion 2026-04-29
The Structured Output Benchmark: A Multi-Source Benchmark for Evaluating Structured Output Quality in Large Language Models 2026-04-29
The Synergistic Role of Audio and Large Video-Language Model in Source-Free Video Domain Adaptation 2026-04-29
Theory and Application of Circular Relative Harmonic Coefficients 2026-04-29
Thinking While Listening: Simple Test Time Scaling for Audio Classification 2026-04-29
Three Seconds is Sufficient: A Multi-Pronged Framework for Model-Based Speaker Adaptation in ASR Under Data-Scarce Conditions 2026-04-29
TICL: Text-Embedding KNN for Speech in-Context Learning Unlocks Speech Recognition Abilities of Large Multimodal Models 2026-04-29
Timbre-Aware Audio Difference Captioning for Anomalous Machine Sounds without Paired Training Data via Synthetic Perturbations 2026-04-29
Timbre-Based Pretraining with Pseudo-Labels for Multi-Instrument Automatic Music Transcription 2026-04-29
Time vs. Layer: Locating Predictive Cues for Dysarthric Speech Descriptors in Wav2vec 2.0 2026-04-29
Time-Domain Synthesis of Virtual Sound Source Within Personalized Sound Zone using a Linear Loudspeaker Array 2026-04-29
Time-Shifted Token Scheduling for Symbolic Music Generation 2026-04-29
TinyMU: A Compact Audio-Language Model for Music Understanding 2026-04-29
Tldiffgan: A Latent Diffusion-Gan Framework with Temporal Information Fusion for Anomalous Sound Detection 2026-04-29
TMD-TTS: A Unified Tibetan Multi-Dialect Text-to-Speech Framework for Ü-Tsang, Amdo and Kham Speech Dataset Generation 2026-04-29
Tokenchain: A Discrete Speech Chain via Semantic Token Modeling 2026-04-29
Toward Faithful Explanations in Acoustic Anomaly Detection 2026-04-29
Toward Robust And Efficient Beat Tracking Via Beat-Aware Attention 2026-04-29
Towards Blind Data Cleaning: A Case Study in Music Source Separation 2026-04-29
Towards Building Speech Large Language Models for Multitask Understanding in Low-Resource Languages 2026-04-29
Towards Data Drift Monitoring for Speech Deepfake Detection in the Context of MLOps 2026-04-29
Towards Distance-Aware Synthetic Audio Mixtures for Universal Sound Separation 2026-04-29
Towards Effective Negation Modeling in Joint Audio-Text Models for Music 2026-04-29
Towards Evaluating Generative Audio: Insights from Neural Audio Codec Embedding Distances 2026-04-29
Towards Fair ASR for Second Language Speakers using Fairness Prompted Finetuning 2026-04-29
Towards Lightweight Adaptation of Speech Enhancement Models in Real-World Environments 2026-04-29
Towards Multi-View Hierarchical Video-to-Piano Generation with MIDI Guidance 2026-04-29
Towards Orthographically-Informed Evaluation of Speech Recognition Systems for Indian Languages 2026-04-29
Towards Real-Time Generative Speech Restoration with Flow-Matching 2026-04-29
Towards Robust Dysarthric Speech Recognition: LLM-Agent Post-ASR Correction Beyond WER 2026-04-29
Tpeformer: Temporal Patch Embedding Transformer 2026-04-29
Train Short, Infer Long: Speech-LLM Enables Zero-Shot Streamable Joint ASR and Diarization on Long Audio 2026-04-29
Training Dynamics-Aware Multi-Factor Curriculum Learning for Target Speaker Extraction 2026-04-29
Training Flow Matching Models with Reliable Labels via Self-Purification 2026-04-29
Training-Free Inference-Time Scaling for Audio Source Separation 2026-04-29
Training-Free Multimodal Guidance for Video to Audio Generation 2026-04-29
Transfer Learning for Paediatric Sleep Apnoea Detection using Physiology-Guided Acoustic Models 2026-04-29
Transferable Audio Lottery Tickets: Gradient Accumulation for Extreme Sparsity 2026-04-29
Tri-Attention Fusion: Joint Temporal-Spectral and Bidirectional Modeling for Speech Spoofing Detection 2026-04-29
Triad: Tri-Head with Auxiliary Duplicating Permutation Invariant Training for Multi-Task Sound Event Localization and Detection 2026-04-29
Triage Knowledge Distillation for Speaker Verification 2026-04-29
TTA: Transcribe, Translate and Alignment for Cross-Lingual Speech Representation 2026-04-29
TVP-UNet: Threshold Variance Penalty U-Net for Voice Activity Detection in Dysarthric Speech 2026-04-29
Two-Stage Language Model Framework for Acoustic Echo Cancellation 2026-04-29
UJCodec: An End-to-end Unet-Style Codec for Joint Speech Compression and Enhancement 2026-04-29
UMA-SPLIT: Unimodal Aggregation for Both English and Mandarin Non-Autoregressive Speech Recognition 2026-04-29
UMV: A Mixture-Of-Experts Vision Transformer with Multi-Spectrogram Fusion for Underwater Ship Noise Classification 2026-04-29
Uncertainty-Aware 3D Emotional Talking Face Synthesis with Emotion Prior Distillation 2026-04-29
Understanding Textual Capability Degradation in Speech LLMs via Parameter Importance Analysis 2026-04-29
Understanding the Strengths and Weaknesses of SSL Models for Audio Deepfake Model Attribution 2026-04-29
UNet-Based Fusion and Exponential Moving Average Adaptation for Noise-Robust Speaker Recognition 2026-04-29
Universr: Unified and Versatile Audio Super-Resolution Via Vocoder-Free Flow Matching 2026-04-29
UNMIXX: Untangling Highly Correlated Singing Voices Mixtures 2026-04-29
Unrequited Emotions: Investigating the Gaps in Motivation and Practice in Speech Emotion Recognition Research 2026-04-29
Unseen but Not Unknown: Using Dataset Concealment to Robustly Evaluate Speech Quality Estimation Models 2026-04-29
Unsupervised Discovery and Analysis of the Vocal Repertoires and Patterns of Select Corvid Species 2026-04-29
Unsupervised Lexicon Learning from Speech is Limited by Representations Rather than Clustering 2026-04-29
USVexplorer: Robust Detection of Ultrasonic Vocalizations with Cross Species Generalization 2026-04-29
UTI-LLM: A Personalized Articulatory-Speech Therapy Assistance System Based on Multimodal Large Language Model 2026-04-29
Utilizing Information Theoretic Approach to Study Cochlear Neural Degeneration 2026-04-29
UVT-LM: Unifying Visual and Tactile Perception with Language Model 2026-04-29
V2A-DPO: Omni-Preference Optimization for Video-To-Audio Generation 2026-04-29
Variational Low-Rank Adaptation for Personalized Impaired Speech Recognition 2026-04-29
VBx for End-to-End Neural and Clustering-Based Diarization 2026-04-29
VChangeCodec: An Ultra Low-Complexity Neural Speech Codec with Built-In Voice Changer for Customized Real-Time Communication 2026-04-29
Via Score to Performance: Efficient Human-Controllable Long Song Generation with Bar-Level Symbolic Notation 2026-04-29
Vib2Sound: Separation Of Multimodal Sound Sources 2026-04-29
Vioptt: Violin Technique-Aware Transcription from Synthetic Data Augmentation 2026-04-29
Virtual Consistency for Audio Editing 2026-04-29
Visual Keys to Symphonies: Latent Diffusion for Multi-Scene Video-to-Music Generation 2026-04-29
ViTex: Visual Texture Control for Multi-Track Symbolic Music Generation via Discrete Diffusion Models 2026-04-29
VividTalker: A Modular Framework for Expressive 3D Talking Avatars with Controllable Gaze and Blink 2026-04-29
VM-UNSSOR: Unsupervised Neural Speech Separation Enhanced by Higher-SNR Virtual Microphone Arrays 2026-04-29
VMSP: Video-to-Music Generation with Two-Stage Alignment and Synthesis 2026-04-29
Vocalnet-M2: Advancing Low-Latency Spoken Language Modeling via Integrated Multi-Codebook Tokenization and Multi-Token Prediction 2026-04-29
Voting-Based Pitch Estimation with Temporal and Frequential Alignment and Correlation Aware Selection 2026-04-29
VoxMorph: Scalable Zero-Shot Voice Identity Morphing via Disentangled Embeddings 2026-04-29
VoXtream: Full-Stream Text-To-Speech With Extremely Low Latency 2026-04-29
VT-Heads: Voice Cloning and Talking Head Generation from Text Based on V-DiT 2026-04-29
Walking Through Uncertainty: An Empirical Study of Uncertainty Estimation for Audio-Aware Large Language Models 2026-04-29
WAV2LEV: Predicting Levenshtein Edit Operation Sequences For Fine-Grained Estimation of Automatic Speech Recognition Error 2026-04-29
Wave-Trainer-Fit: Neural Vocoder With Trainable Prior And Fixed-Point Iteration Towards High-Quality Speech Generation From SSL Features 2026-04-29
WaveNeXt 2: ConvNeXt-Based Fast Neural Vocoders with Residual Denoising and Sub-Modeling for GAN and Diffusion Models 2026-04-29
WaveSP-Net: Learnable Wavelet-Domain Sparse Prompt Tuning for Speech Deepfake Detection 2026-04-29
WaveSpikeNet: A Wavelet-Spiking Fusion Architecture for Audio Classification on Edge Devices 2026-04-29
WavLink: Compact Audio–Text Embeddings with a Global Whisper Token 2026-04-29
What the student learns in knowledge distillation: A subspace view and evidence on Convolutional Recurrent Network 2026-04-29
When Audio Matters: A Lightweight, Hierarchical Fusion Model for Speech and Non-Verbal Emotion Recognition 2026-04-29
When Children Talk and Machines Listen: Toward an Interpretable Speech-Based Screener for Dutch Developmental Language Disorder 2026-04-29
When Noise Lowers the Loss: Rethinking Likelihood-Based Evaluation in Music Large Language Models 2026-04-29
When Silence Matters: The Impact of Irrelevant Audio on Text Reasoning in Large Audio-Language Models 2026-04-29
When Voice Matters: A Controlled Study of Audio LLM Behavior in Clinical Decision-Making 2026-04-29
Whisper-FEST: Single-Channel Far-Field Enhanced Speech-to-text without Parallel Data 2026-04-29
Whisper-MLA: Reducing GPU Memory Consumption of ASR Models Based on MHA2MLA Conversion 2026-04-29
Whisper-QF: Leveraging Dual Cross-Attention Q-Former for Speech Emotion Recognition With Multi-Task Learning 2026-04-29
Whisper: Courtside Edition - Enhancing ASR Performance through LLM-Driven Context Generation 2026-04-29
WhisperPipe: A Resource-Efficient Streaming Architecture for Real-Time Automatic Speech Recognition 2026-04-29
Why Do Speech Language Models Fail to Generate Semantically Coherent Outputs? A Modality Evolving Perspective 2026-04-29
Windowed SummaryMixing: An Efficient Fine-Tuning of Self-Supervised Learning Models for Low-Resource Speech Recognition 2026-04-29
Z-Scores: A Metric for Linguistically Assessing Disfluency Removal 2026-04-29
ZK-VSA: Zero-Knowledge Verifiable Speaker Anonymization Leveraging Phase Vocoder with Time-Scale Modification 2026-04-29
ZSV2C-MLLM: Zero-Shot Visual Voice Cloning Via Multimodal Large Language Models 2026-04-29
β-AVSDNET: A Novel End-To-End Neural Network Architecture For Audio-Visual Speaker Diarization 2026-04-29
Speech/Audio Paper Digest 2026-04-29 2026-04-29
A Functorial Formulation of Neighborhood Aggregating Deep Learning 2026-04-28
All That Glitters Is Not Audio: Rethinking Text Priors and Audio Reliance in Audio-Language Evaluation 2026-04-28
An event-based sequence modeling approach to recognizing non-triad chords with oversegmentation minimization 2026-04-28
CineAGI: Character-Consistent Movie Creation through LLM-Orchestrated Multi-Modal Generation and Cross-Scene Integration 2026-04-28
Come Together: Analyzing Popular Songs Through Statistical Embeddings 2026-04-28
Comparison of sEMG Encoding Accuracy Across Speech Modes Using Articulatory and Phoneme Features 2026-04-28
Explainable AI in Speaker Recognition – Making Latent Representations Understandable 2026-04-28
Hallo-Live: Real-Time Streaming Joint Audio-Video Avatar Generation with Asynchronous Dual-Stream and Human-Centric Preference Distillation 2026-04-28
HeadRouter: Dynamic Head-Weight Routing for Task-Adaptive Audio Token Pruning in Large Audio Language Models 2026-04-28
Latent-Hysteresis Graph ODEs: Modeling Coupled Topology-Feature Evolution via Continuous Phase Transitions 2026-04-28
Listening with Time: Precise Temporal Awareness for Long-Form Audio Understanding 2026-04-28
MAGIC-TTS: Fine-Grained Controllable Speech Synthesis with Explicit Local Duration and Pause Control 2026-04-28
Meta-Ensemble Learning with Diverse Data Splits for Improved Respiratory Sound Classification 2026-04-28
Opening the Design Space: Two Years of Performance with Intelligent Musical Instruments 2026-04-28
Predictive Directional Selective Fixed-Filter Active Noise Control for Moving Sources via a Convolutional Recurrent Neural Network 2026-04-28
Psychologically-Grounded Graph Modeling for Interpretable Depression Detection 2026-04-28
RAS: a Reliability Oriented Metric for Automatic Speech Recognition 2026-04-28
Robust Audio-Text Retrieval via Cross-Modal Attention and Hybrid Loss 2026-04-28
RTCFake: Speech Deepfake Detection in Real-Time Communication 2026-04-28
Scaling Properties of Continuous Diffusion Spoken Language Models 2026-04-28
Spectro-Temporal Modulation Representation Framework for Human-Imitated Speech Detection 2026-04-28
Speech Enhancement Based on Drifting Models 2026-04-28
Talker-T2AV: Joint Talking Audio-Video Generation with Autoregressive Diffusion Modeling 2026-04-28
TTS-PRISM: A Perceptual Reasoning and Interpretable Speech Model for Fine-Grained Diagnosis 2026-04-28
Speech/Audio Paper Digest 2026-04-28 2026-04-28
Advancing automatic speech recognition using feature fusion with self-supervised learning features: A case study on Fearless Steps Apollo corpus 2026-04-27
Audio Effect Estimation with DNN-Based Prediction and Search Algorithm 2026-04-27
Audio Video Verbal Analysis (AVVA) for Capturing Classroom Dialogues 2026-04-27
Beyond Acoustic Sparsity and Linguistic Bias: A Prompt-Free Paradigm for Mispronunciation Detection and Diagnosis 2026-04-27
DM-ASR: Diarization-aware Multi-speaker ASR with Large Language Models 2026-04-27
Earable Platform with Integrated Simultaneous EEG Sensing and Auditory Stimulation 2026-04-27
Full-Duplex Interaction in Spoken Dialogue Systems: A Comprehensive Study from the ICASSP 2026 HumDial Challenge 2026-04-27
Identifying and typifying demographic unfairness in phoneme-level embeddings of self-supervised speech recognition models 2026-04-27
Listening with Time: Precise Temporal Awareness for Long-Form Audio Understanding 2026-04-27
Spectrographic Portamento Gradient Analysis: A Quantitative Method for Historical Cello Recordings with Application to Beethoven’s Piano and Cello Sonatas, 1930–2012 2026-04-27
Transformer-Based Rhythm Quantization of Performance MIDI Using Beat Annotations 2026-04-27
TTS-PRISM: A Perceptual Reasoning and Interpretable Speech Model for Fine-Grained Diagnosis 2026-04-27
UniSonate: A Unified Model for Speech, Music, and Sound Effect Generation with Text Instructions 2026-04-27
Speech/Audio Paper Digest 2026-04-27 2026-04-27
MAGIC-TTS: Fine-Grained Controllable Speech Synthesis with Explicit Local Duration and Pause Control 2026-04-25
MOMO: A framework for seamless physical, verbal, and graphical robot skill learning and adaptation 2026-04-25
Speech/Audio Paper Digest 2026-04-25 2026-04-25
“This Wasn’t Made for Me”: Recentering User Experience and Emotional Impact in the Evaluation of ASR Bias 2026-04-24
ATRIE: Adaptive Tuning for Robust Inference and Emotion in Persona-Driven Speech Synthesis 2026-04-24
AUDITA: A New Dataset to Audit Humans vs. AI Skill at Audio QA 2026-04-24
Beyond Rules: Towards Basso Continuo Personal Style Identification 2026-04-24
DiariZen Explained: A Tutorial for the Open Source State-of-the-Art Speaker Diarization Pipeline 2026-04-24
Dilated CNNs for Periodic Signal Processing: A Low-Complexity Approach 2026-04-24
Do LLM Decoders Listen Fairly? Benchmarking How Language Model Priors Shape Bias in Speech Recognition 2026-04-24
Evaluation of Automatic Speech Recognition Using Generative Large Language Models 2026-04-24
Full-Duplex Interaction in Spoken Dialogue Systems: A Comprehensive Study from the ICASSP 2026 HumDial Challenge 2026-04-24
Hierarchical Policy Optimization for Simultaneous Translation of Unbounded Speech 2026-04-24
Low-Rank Adaptation Redux for Large Models 2026-04-24
MAGIC-TTS: Fine-Grained Controllable Speech Synthesis with Explicit Local Duration and Pause Control 2026-04-24
Materialistic RIR: Material Conditioned Realistic RIR Generation 2026-04-24
MER 2026: From Discriminative Emotion Recognition to Generative Emotion Understanding 2026-04-24
Misinformation Span Detection in Videos via Audio Transcripts 2026-04-24
Phonological Subspace Collapse Is Aetiology-Specific and Cross-Lingually Stable: Evidence from 3,374 Speakers 2026-04-24
Preferences of a Voice-First Nation: Large-Scale Pairwise Evaluation and Preference Analysis for TTS in Indian Languages 2026-04-24
Prosody as Supervision: Bridging the Non-Verbal–Verbal for Multilingual Speech Emotion Recognition 2026-04-24
Sema: Semantic Transport for Real-Time Multimodal Agents 2026-04-24
Time vs. Layer: Locating Predictive Cues for Dysarthric Speech Descriptors in wav2vec 2.0 2026-04-24
Video-Robin: Autoregressive Diffusion Planning for Intent-Grounded Video-to-Music Generation 2026-04-24
Speech/Audio Paper Digest 2026-04-24 2026-04-24
Aligning Stuttered-Speech Research with End-User Needs: Scoping Review, Survey, and Guidelines 2026-04-23
ATIR: Towards Audio-Text Interleaved Contextual Retrieval 2026-04-23
Before the Mic: Physical-Layer Voiceprint Anonymization with Acoustic Metamaterials 2026-04-23
Centering Ecological Goals in Automated Identification of Individual Animals 2026-04-23
CoInteract: Physically-Consistent Human-Object Interaction Video Synthesis via Spatially-Structured Co-Generation 2026-04-23
Deep Hierarchical Knowledge Loss for Fault Intensity Diagnosis 2026-04-23
Embedding-Based Intrusive Evaluation Metrics for Musical Source Separation Using MERT Representations 2026-04-23
Enhancing ASR Performance in the Medical Domain for Dravidian Languages 2026-04-23
Enhancing Speaker Verification with Whispered Speech via Post-Processing 2026-04-23
Environmental Sound Deepfake Detection Using Deep-Learning Framework 2026-04-23
Explicit Dropout: Deterministic Regularization for Transformer Architectures 2026-04-23
FastTurn: Unifying Acoustic and Streaming Semantic Cues for Low-Latency and Robust Turn Detection 2026-04-23
FLiP: Towards understanding and interpreting multimodal multilingual sentence embeddings 2026-04-23
Indic-CodecFake meets SATYAM: Towards Detecting Neural Audio Codec Synthesized Speech Deepfakes in Indic Languages 2026-04-23
MOMO: A framework for seamless physical, verbal, and graphical robot skill learning and adaptation 2026-04-23
MoVE: Translating Laughter and Tears via Mixture of Vocalization Experts in Speech-to-Speech Translation 2026-04-23
ONOTE: Benchmarking Omnimodal Notation Processing for Expert-level Music Intelligence 2026-04-23
Qwen3.5-Omni Technical Report 2026-04-23
Reducing the Offline-Streaming Gap for Unified ASR Transducer with Consistency Regularization 2026-04-23
SAND: The Challenge on Speech Analysis for Neurodegenerative Disease Assessment 2026-04-23
Self-Noise Reduction for Capacitive Sensors via Photoelectric DC Servo: Application to Condenser Microphones 2026-04-23
SpeechParaling-Bench: A Comprehensive Benchmark for Paralinguistic-Aware Speech Generation 2026-04-23
Tadabur: A Large-Scale Quran Audio Dataset 2026-04-23
Text-To-Speech with Chain-of-Details: modeling temporal dynamics in speech generation 2026-04-23
Towards Streaming Target Speaker Extraction via Chunk-wise Interleaved Splicing of Autoregressive Language Model 2026-04-23
Utterance-Level Methods for Identifying Reliable ASR-Output for Child Speech 2026-04-23
X-VC: Zero-shot Streaming Voice Conversion in Codec Space 2026-04-23
Speech/Audio Paper Digest 2026-04-23 2026-04-23
APRVOS: 1st Place Winner of 5th PVUW MeViS-Audio Track 2026-04-22
ATRIE: Adaptive Tuning for Robust Inference and Emotion in Persona-Driven Speech Synthesis 2026-04-22
Audio Spoof Detection with GaborNet 2026-04-22
BEAT: Tokenizing and Generating Symbolic Music by Uniform Temporal Steps 2026-04-22
Benign Fine-Tuning Breaks Safety Alignment in Audio LLMs 2026-04-22
Comparison of sEMG Encoding Accuracy Across Speech Modes Using Articulatory and Phoneme Features 2026-04-22
Deep Supervised Contrastive Learning of Pitch Contours for Robust Pitch Accent Classification in Seoul Korean 2026-04-22
Detecting Hallucinations in SpeechLLMs at Inference Time Using Attention Maps 2026-04-22
Disentangling Damage from Operational Variability: A Label-Free Self-Supervised Representation Learning Framework for Output-Only Structural Damage Identification 2026-04-22
Environmental Sound Deepfake Detection Using Deep-Learning Framework 2026-04-22
HalluAudio: A Comprehensive Benchmark for Hallucination Detection in Large Audio-Language Models 2026-04-22
MoVE: Translating Laughter and Tears via Mixture of Vocalization Experts in Speech-to-Speech Translation 2026-04-22
MTR-DuplexBench: Towards a Comprehensive Evaluation of Multi-Round Conversations for Full-Duplex Speech Language Models 2026-04-22
NVBench: A Benchmark for Speech Synthesis with Non-Verbal Vocalizations 2026-04-22
Qwen3.5-Omni Technical Report 2026-04-22
Reducing the Offline-Streaming Gap for Unified ASR Transducer with Consistency Regularization 2026-04-22
Tadabur: A Large-Scale Quran Audio Dataset 2026-04-22
Text-To-Speech with Chain-of-Details: modeling temporal dynamics in speech generation 2026-04-22
Towards Streaming Target Speaker Extraction via Chunk-wise Interleaved Splicing of Autoregressive Language Model 2026-04-22
UAF: A Unified Audio Front-end LLM for Full-Duplex Speech Interaction 2026-04-22
Voice of India: A Large-Scale Benchmark for Real-World Speech Recognition in India 2026-04-22
Speech/Audio Paper Digest 2026-04-22 2026-04-22
A novel LSTM music generator based on the fractional time-frequency feature extraction 2026-04-21
A state-space representation of the boundary integral equation for room acoustic modelling 2026-04-21
Aligning Language Models for Lyric-to-Melody Generation with Rule-Based Musical Constraints 2026-04-21
Anonymization, Not Elimination: Utility-Preserved Speech Anonymization 2026-04-21
ArtifactNet: Detecting AI-Generated Music via Forensic Residual Physics 2026-04-21
Audio-Cogito: Towards Deep Audio Reasoning in Large Audio Language Models 2026-04-21
Audio-DeepThinker: Progressive Reasoning-Aware Reinforcement Learning for High-Quality Chain-of-Thought Emergence in Audio Language Models 2026-04-21
AVRT: Audio-Visual Reasoning Transfer through Single-Modality Teachers 2026-04-21
Benign Fine-Tuning Breaks Safety Alignment in Audio LLMs 2026-04-21
BhashaSutra: A Task-Centric Unified Survey of Indian NLP Datasets, Corpora, and Resources 2026-04-21
ClariCodec: Optimising Neural Speech Codes for 200bps Communication using Reinforcement Learning 2026-04-21
Coexisting Tempo Traditions in Beethoven’s Piano and Cello Sonatas: A K-means Clustering Analysis of Recorded Performances, 1930-2012 2026-04-21
FLiP: Towards understanding and interpreting multimodal multilingual sentence embeddings 2026-04-21
FreezeEmpath: Efficient Training for Empathetic Spoken Chatbots with Frozen LLMs 2026-04-21
From Reactive to Proactive: Assessing the Proactivity of Voice Agents via ProVoice-Bench 2026-04-21
Hard to Be Heard: Phoneme-Level ASR Analysis of Phonologically Complex, Low-Resource Endangered Languages 2026-04-21
HCFD: A Benchmark for Audio Deepfake Detection in Healthcare 2026-04-21
ICLAD: In-Context Learning with Comparison-Guidance for Audio Deepfake Detection 2026-04-21
Incremental learning for audio classification with Hebbian Deep Neural Networks 2026-04-21
Latent Fourier Transform 2026-04-21
LLM-Codec: Neural Audio Codec Meets Language Model Objectives 2026-04-21
MimicLM: Zero-Shot Voice Imitation through Autoregressive Modeling of Pseudo-Parallel Speech Corpora 2026-04-21
MINT-Bench: A Comprehensive Multilingual Benchmark for Instruction-Following Text-to-Speech 2026-04-21
MoVE: Translating Laughter and Tears via Mixture of Vocalization Experts in Speech-to-Speech Translation 2026-04-21
Neural Encoding Detection is Not All You Need for Synthetic Speech Detection 2026-04-21
NIM4-ASR: Towards Efficient, Robust, and Customizable Real-Time LLM-Based ASR 2026-04-21
Omni-Embed-Audio: Leveraging Multimodal LLMs for Robust Audio-Text Retrieval 2026-04-21
Prosody as Supervision: Bridging the Non-Verbal–Verbal for Multilingual Speech Emotion Recognition 2026-04-21
SELF-EMO: Emotional Self-Evolution from Recognition to Consistent Expression 2026-04-21
Still Between Us? Evaluating and Improving Voice Assistant Robustness to Third-Party Interruptions 2026-04-21
VIBE: Voice-Induced open-ended Bias Evaluation for Large Audio-Language Models via Real-World Speech 2026-04-21
Video-Robin: Autoregressive Diffusion Planning for Intent-Grounded Video-to-Music Generation 2026-04-21
VoxSafeBench: Not Just What Is Said, but Who, How, and Where 2026-04-21
Where Do Self-Supervised Speech Models Become Unfair? 2026-04-21
Speech/Audio Paper Digest 2026-04-21 2026-04-21
ActorMind: Emulating Human Actor Reasoning for Speech Role-Playing 2026-04-20
ArtifactNet: Detecting AI-Generated Music via Forensic Residual Physics 2026-04-20
AST: Adaptive, Seamless, and Training-Free Precise Speech Editing 2026-04-20
Beyond Monologue: Interactive Talking-Listening Avatar Generation with Conversational Audio Context-Aware Kernels 2026-04-20
BlasBench: An Open Benchmark for Irish Speech Recognition 2026-04-20
Discrete Token Modeling for Multi-Stem Music Source Separation with Language Models 2026-04-20
Elucidating the SNR-t Bias of Diffusion Probabilistic Models 2026-04-20
Full-Duplex-Bench-v3: Benchmarking Tool Use for Full-Duplex Voice Agents Under Real-World Disfluency 2026-04-20
Generalizable Audio-Visual Navigation via Binaural Difference Attention and Action Transition Prediction 2026-04-20
HARNESS: Lightweight Distilled Arabic Speech Foundation Models 2026-04-20
Hierarchical Codec Diffusion for Video-to-Speech Generation 2026-04-20
Interactive ASR: Towards Human-Like Interaction and Semantic Coherence Evaluation for Agentic Speech Recognition 2026-04-20
Joint-Centric Dual Contrastive Alignment with Structure-Preserving and Information-Balanced Regularization 2026-04-20
MoshiRAG: Asynchronous Knowledge Retrieval for Full-Duplex Speech Language Models 2026-04-20
MUSCAT: MUltilingual, SCientific ConversATion Benchmark 2026-04-20
NaijaS2ST: A Multi-Accent Benchmark for Speech-to-Speech Translation in Low-Resource Nigerian Languages 2026-04-20
NVBench: A Benchmark for Speech Synthesis with Non-Verbal Vocalizations 2026-04-20
PS-TTS: Phonetic Synchronization in Text-to-Speech for Achieving Natural Automated Dubbing 2026-04-20
Qwen3.5-Omni Technical Report 2026-04-20
Spatial-Aware Conditioned Fusion for Audio-Visual Navigation 2026-04-20
Temporal Contrastive Decoding: A Training-Free Method for Large Audio-Language Models 2026-04-20
The Acoustic Camouflage Phenomenon: Re-evaluating Speech Features for Financial Risk Prediction 2026-04-20
TinyMU: A Compact Audio-Language Model for Music Understanding 2026-04-20
VoxMind: An End-to-End Agentic Spoken Dialogue System 2026-04-20
Speech/Audio Paper Digest 2026-04-20 2026-04-20
A Manual Bar-by-Bar Tempo Measurement Protocol for Polyphonic Chamber Music Recordings: Design, Validation, and Application to Beethoven’s Piano and Cello Sonatas 2026-04-19
Adaptive Test-Time Scaling for Zero-Shot Respiratory Audio Classification 2026-04-19
An Ultra-Low Latency, End-to-End Streaming Speech Synthesis Architecture via Block-Wise Generation and Depth-Wise Codec Decoding 2026-04-19
Audio Source Separation in Reverberant Environments using β-divergence based Nonnegative Factorization 2026-04-19
Audio-Cogito: Towards Deep Audio Reasoning in Large Audio Language Models 2026-04-19
AVID: A Benchmark for Omni-Modal Audio-Visual Inconsistency Understanding via Agent-Driven Construction 2026-04-19
Beyond Transcription: Unified Audio Schema for Perception-Aware AudioLLMs 2026-04-19
ClariCodec: Optimising Neural Speech Codes for 200bps Communication using Reinforcement Learning 2026-04-19
Classical Machine Learning Baselines for Deepfake Audio Detection on the Fake-or-Real Dataset 2026-04-19
Comparison of window shapes and lengths in short-time feature extraction for classification of heart sound signals 2026-04-19
Contextual Biasing for ASR in Speech LLM with Common Word Cues and Bias Word Position Prediction 2026-04-19
ControlFoley: Unified and Controllable Video-to-Audio Generation with Cross-Modal Conflict Handling 2026-04-19
CoSyncDiT: Cognitive Synchronous Diffusion Transformer for Movie Dubbing 2026-04-19
Diffusion Language Models for Speech Recognition 2026-04-19
Dual-Axis Generative Reward Model Toward Semantic and Turn-taking Robustness in Interactive Spoken Dialogue Models 2026-04-19
Elastic Net Regularization and Gabor Dictionary for Classification of Heart Sound Signals using Deep Learning 2026-04-19
Enhancing time-frequency resolution with optimal transport and barycentric fusion of multiple spectrogram 2026-04-19
Few-Shot and Pseudo-Label Guided Speech Quality Evaluation with Large Language Models 2026-04-19
Four Decades of Digital Waveguides 2026-04-19
From Reactive to Proactive: Assessing the Proactivity of Voice Agents via ProVoice-Bench 2026-04-19
Geo2Sound: A Scalable Geo-Aligned Framework for Soundscape Generation from Satellite Imagery 2026-04-19
Hijacking Large Audio-Language Models via Context-Agnostic and Imperceptible Auditory Prompt Injection 2026-04-19
Listen, Pause, and Reason: Toward Perception-Grounded Hybrid Reasoning for Audio Understanding 2026-04-19
Listening Deepfake Detection: A New Perspective Beyond Speaking-Centric Forgery Analysis 2026-04-19
MoshiRAG: Asynchronous Knowledge Retrieval for Full-Duplex Speech Language Models 2026-04-19
On the Distillation Loss Functions of Speech VAE for Unified Reconstruction, Understanding, and Generation 2026-04-19
ProSDD: Learning Prosodic Representations for Speech Deepfake Detection against Expressive and Emotional Attacks 2026-04-19
Room compensation for loudspeaker reproduction using a supporting source 2026-04-19
Sky-Ear: An Unmanned Aerial Vehicle-Enabled Victim Sound Detection and Localization System 2026-04-19
SpeakerRPL v2: Robust Open-set Speaker Identification through Enhanced Few-shot Foundation Tuning and Model Fusion 2026-04-19
SpotSound: Enhancing Large Audio-Language Models with Fine-Grained Temporal Grounding 2026-04-19
StreamMark: A Deep Learning-Based Semi-Fragile Audio Watermarking for Proactive Deepfake Detection 2026-04-19
TokenSE: a Mamba-based discrete token speech enhancement framework for cochlear implants 2026-04-19
Tora3: Trajectory-Guided Audio-Video Generation with Physical Coherence 2026-04-19
Towards Fine-grained Temporal Perception: Post-Training Large Audio-Language Models with Audio-Side Time Prompt 2026-04-19
Transformer Based Machine Fault Detection From Audio Input 2026-04-19
UniPASE: A Generative Model for Universal Speech Enhancement with High Fidelity and Low Hallucinations 2026-04-19
VoxEffects: A Speech-Oriented Audio Effects Dataset and Benchmark 2026-04-19
VoxSafeBench: Not Just What Is Said, but Who, How, and Where 2026-04-19
WavAlign: Enhancing Intelligence and Expressiveness in Spoken Dialogue Models via Adaptive Hybrid Post-Training 2026-04-19
Who is Speaking or Who is Depressed? A Controlled Study of Speaker Leakage in Speech-Based Depression Detection 2026-04-19
Why Your Tokenizer Fails in Information Fusion: A Timing-Aware Pre-Quantization Fusion for Video-Enhanced Audio Tokenization 2026-04-19
X-VC: Zero-shot Streaming Voice Conversion in Codec Space 2026-04-19
Speech/Audio Paper Digest 2026-04-19 2026-04-19
Speech/Audio Paper Digest 2026-04-18 2026-04-18