OmniSIFT: Modality-Asymmetric Token Compression for Efficient Omni-modal Large Language Models

📄 OmniSIFT: Modality-Asymmetric Token Compression for Efficient Omni-modal Large Language Models ✅ 7.5/10 | Top 25% | arxiv ← Back to 2026-05-23 Speech/Music/Audio Paper Digest

2026-05-23 · Updated 2026-06-19 · 1 min · 21 words

OmniVideo-R1: Reinforcing Audio-visual Reasoning with Query Intention and Modality Attention

📄 OmniVideo-R1: Reinforcing Audio-visual Reasoning with Query Intention and Modality Attention ✅ 7.5/10 | Top 25% | arxiv

2026-05-23 · Updated 2026-06-19 · 1 min · 21 words

Optimality of FSQ tokens for continuous diffusion for categorical data with application to text-to-speech

📄 Optimality of FSQ tokens for continuous diffusion for categorical data with application to text-to-speech ✅ 7.0/10 | Top 50% | arxiv

2026-05-23 · Updated 2026-06-19 · 1 min · 25 words

PADS-TAL: Padding-Annealed Diffusion Sampling in Text-Aware Latent Space for Robust and Diverse Text-to-Music Generation

📄 PADS-TAL: Padding-Annealed Diffusion Sampling in Text-Aware Latent Space for Robust and Diverse Text-to-Music Generation ✅ 7.2/10 | Top 50% | arxiv

2026-05-23 · Updated 2026-06-19 · 1 min · 25 words

PCRNet: Phase-aware Complex Refinement Network for EEG-based Auditory Attention Decoding

📄 PCRNet: Phase-aware Complex Refinement Network for EEG-based Auditory Attention Decoding ✅ 7.5/10 | Top 25% | arxiv

2026-05-23 · Updated 2026-06-19 · 1 min · 21 words

PHALAR: Phasors for Learned Musical Audio Representations

📄 PHALAR: Phasors for Learned Musical Audio Representations 📝 5.5/10 | Top 50% | arxiv

2026-05-23 · Updated 2026-06-19 · 1 min · 18 words

PhaseCoder: Microphone Geometry-Agnostic Spatial Audio Understanding for Multimodal LLMs

📄 PhaseCoder: Microphone Geometry-Agnostic Spatial Audio Understanding for Multimodal LLMs ✅ 7.5/10 | Top 25% | arxiv

2026-05-23 · Updated 2026-06-19 · 1 min · 20 words

PhoStream: Benchmarking Real-World Streaming for Omnimodal Assistants in Mobile Scenarios

📄 PhoStream: Benchmarking Real-World Streaming for Omnimodal Assistants in Mobile Scenarios ✅ 7.5/10 | Top 25% | arxiv

2026-05-23 · Updated 2026-06-19 · 1 min · 21 words

Pianist Transformer: Towards Expressive Piano Performance Rendering via Scalable Self-Supervised Pre-Training

📄 Pianist Transformer: Towards Expressive Piano Performance Rendering via Scalable Self-Supervised Pre-Training ✅ 7.8/10 | Top 25% | arxiv

2026-05-23 · Updated 2026-06-19 · 1 min · 22 words

Polyphonia: Training-Free Context-Aware Music Editing with Acoustic-Informed Attention Calibration

📄 Polyphonia: Training-Free Context-Aware Music Editing with Acoustic-Informed Attention Calibration ✅ 7.0/10 | Top 50% | arxiv

2026-05-23 · Updated 2026-06-19 · 1 min · 20 words