Scaling Transformers for End-to-End Discrete Audio Tokenization

📄 Scaling Transformers for End-to-End Discrete Audio Tokenization ✅ 6.0/10 | Top 50% | arxiv ← Back to the 2026-05-23 Speech/Music/Audio Paper Digest

2026-05-23 · updated 2026-06-19

Self-Guidance: Enhancing Neural Codecs via Decoder Manifold Alignment

📄 Self-Guidance: Enhancing Neural Codecs via Decoder Manifold Alignment ✅ 7.5/10 | Top 25% | arxiv

2026-05-23 · updated 2026-06-19

Self-Supervised Flow Matching for Scalable Multi-Modal Synthesis

📄 Self-Supervised Flow Matching for Scalable Multi-Modal Synthesis ✅ 7.5/10 | Top 25% | arxiv

2026-05-23 · updated 2026-06-19

Simultaneous Speech-to-Speech Translation Without Aligned Data

📄 Simultaneous Speech-to-Speech Translation Without Aligned Data ✅ 7.5/10 | Top 25% | arxiv

2026-05-23 · updated 2026-06-19

SONAR: Spectral-Contrastive Audio Residuals for Generalizable Deepfake Detection

📄 SONAR: Spectral-Contrastive Audio Residuals for Generalizable Deepfake Detection 📝 4.0/10 | Bottom 50% | arxiv

2026-05-23 · updated 2026-06-19

SonicMaster: Towards Controllable All-in-One Music Restoration and Mastering

📄 SonicMaster: Towards Controllable All-in-One Music Restoration and Mastering ✅ 7.0/10 | Top 50% | arxiv

2026-05-23 · updated 2026-06-19

Sparse Autoencoders for Interpretable Emotion Control in Text-to-Speech

📄 Sparse Autoencoders for Interpretable Emotion Control in Text-to-Speech ✅ 6.5/10 | Top 50% | arxiv

2026-05-23 · updated 2026-06-19

Sparse Tokens Suffice: Jailbreaking Audio Language Models via Token-Aware Gradient Optimization

📄 Sparse Tokens Suffice: Jailbreaking Audio Language Models via Token-Aware Gradient Optimization 📝 5.5/10 | Top 50% | arxiv

2026-05-23 · updated 2026-06-19

SPEAR: A Unified SSL Framework for Learning Speech and Audio Representations

📄 SPEAR: A Unified SSL Framework for Learning Speech and Audio Representations ✅ 7.2/10 | Top 50% | arxiv

2026-05-23 · updated 2026-06-19

Speech-Audio Compositional Attacks on Multimodal LLMs and Their Defense with SALMONN-Guard

📄 Speech-Audio Compositional Attacks on Multimodal LLMs and Their Defense with SALMONN-Guard ✅ 7.5/10 | Top 25% | arxiv

2026-05-23 · updated 2026-06-19