📄 Unlocking Speech–Text Compositional Powers: Instruction-Following Speech Language Models without Instruction Tuning 📝 5.8/10 | Top 50% | arxiv | 2026-05-23 Speech/Music/Audio Paper Digest