Reasoning LLM Improves Speaker Recognition in Long-form TV Dramas

✅ 7.0/10 | top 50% | arxiv ← Back to 2026-05-23 Speech/Music/Audio Paper Digest

2026-05-23 · Updated 2026-06-19 · 1 min · 20 words

ReGen: Hierarchical Multi-Prompt Representation Generation for Efficient Waveform Diffusion Models

✅ 7.0/10 | top 50% | arxiv

2026-05-23 · Updated 2026-06-19 · 1 min · 21 words

REST: Diffusion-based Real-time End-to-end Streaming Talking Head Generation via ID-Context Caching and Asynchronous Streaming Distillation

✅ 6.5/10 | top 50% | arxiv

2026-05-23 · Updated 2026-06-19 · 1 min · 26 words

Rethinking Attention in Spiking Transformers: Overcoming Density Bias with Set Similarity

✅ 6.5/10 | top 50% | arxiv

2026-05-23 · Updated 2026-06-19 · 1 min · 22 words

Robust Signal Enhancement via Fractional Detail Views and Knowledge Guided Multi-view Fusion

✅ 7.5/10 | top 25% | arxiv

2026-05-23 · Updated 2026-06-19 · 1 min · 23 words

S3Audio: Towards Streaming Synchronized Spatial Audio Generation via Autoregressive Diffusion Transformer

✅ 6.5/10 | top 50% | arxiv

2026-05-23 · Updated 2026-06-19 · 1 min · 22 words

SALSA-V: Shortcut-Augmented Long-form Synchronized Audio from Videos

✅ 6.5/10 | top 50% | arxiv

2026-05-23 · Updated 2026-06-19 · 1 min · 18 words

SAM Audio: Segment Anything in Audio

✅ 6.5/10 | top 50% | arxiv

2026-05-23 · Updated 2026-06-19 · 1 min · 23 words

SARSteer: Safeguarding Large Audio Language Models via Safe-Ablated Refusal Steering

✅ 7.5/10 | top 25% | arxiv

2026-05-23 · Updated 2026-06-19 · 1 min · 21 words

Scaling Laws in Model Fine-tuning for Audio DeepFake Detection

📝 5.0/10 | bottom 50% | arxiv

2026-05-23 · Updated 2026-06-19 · 1 min · 20 words