Paper Digest | Speech/Music/Audio Paper Digest

Bioacoustic Geolocation: Species Sounds as Geographic Signals

✅ 7.2/10 | Top 50% | arxiv (from the 2026-05-23 Speech/Music/Audio Paper Digest)

Bridging the Stability-Expressivity Gap: Synthetic Data Scaling and Preference Alignment for Low-Resource Spoken Language Models

✅ 7.3/10 | Top 50% | arxiv

Bridging Your Imagination with Audio-Video Generation via a Unified Director

✅ 7.0/10 | Top 50% | arxiv

Characterizing the Predictive Impact of Modalities with Supervised Latent-Variable Modeling

✅ 6.5/10 | Top 50% | arxiv

CMI-RewardBench: Evaluating Music Reward Models with Compositional Multimodal Instruction

🔥 8.2/10 | Top 25% | arxiv

CoCoEmo: Composable and Controllable Human-Like Emotional TTS via Activation Steering

✅ 6.5/10 | Top 50% | arxiv

CoLA: Cross-Modal Low-rank Adaptation for Multimodal Downstream Tasks

✅ 6.5/10 | Top 50% | arxiv

Convex Low-resource Accent-Robust Language Detection in Speech Recognition

#convex-optimization #speech-recognition #language-detection #low-resource #accent-robustness #ADMM ✅ 7.5/10 | Top 25% | arxiv

DiscoForcing: A Unified Framework for Real-Time Audio-Driven Character Control with Diffusion Forcing

🔥 8.2/10 | Top 25% | arxiv

Do Audio LLMs Listen or Read? Analyzing and Mitigating Paralinguistic Failures with VoxParadox

✅ 6.9/10 | Top 50% | arxiv