[News] ELSA (CVPR 2026 Findings Workshop): Exact Linear-Scan Attention for Fast and Memory-Light Vision Transformers

March 6, 2026

Research Direction: Efficient AI / Hardware-Agnostic Inference

Authors: Chih-Chung Hsu, Xin-Di Ma, Wo-Ting Liao, and Chia-Ming Lee

ELSA reformulates exact softmax attention as a prefix-scan problem over an associative monoid, enabling O(n) extra memory and O(log n) parallel depth without changing the operator or retraining the model. Unlike many acceleration schemes that depend heavily on fused Tensor Core kernels or approximate semantics, ELSA is designed to remain deployable across both server-class and edge hardware.
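To make the prefix-scan view concrete, here is a minimal sketch in plain Python for a single query. It uses the standard "online softmax" combine rule on triples (running max, normalizer, weighted value sum), which is associative and can therefore be reduced as a balanced tree. This is an illustrative assumption about the general technique, not the ELSA kernels or the paper's exact formulation; the names `lift`, `combine`, `tree_reduce`, `attention_scan`, and `attention_naive` are hypothetical.

```python
import math

def lift(score, value):
    """Map one (score, value) pair to an element (max, normalizer, weighted sum)."""
    return (score, 1.0, list(value))

def combine(a, b):
    """Associative merge of two partial softmax states (assumed online-softmax rule)."""
    m_a, l_a, acc_a = a
    m_b, l_b, acc_b = b
    m = max(m_a, m_b)
    # Rescale both sides to the shared maximum so the exponentials stay bounded.
    w_a = math.exp(m_a - m)
    w_b = math.exp(m_b - m)
    acc = [w_a * x + w_b * y for x, y in zip(acc_a, acc_b)]
    return (m, w_a * l_a + w_b * l_b, acc)

def tree_reduce(items):
    """Pairwise reduction: about log2(n) rounds, each round parallelizable."""
    items = list(items)
    while len(items) > 1:
        merged = [combine(items[i], items[i + 1])
                  for i in range(0, len(items) - 1, 2)]
        if len(items) % 2:
            merged.append(items[-1])  # carry the odd element to the next round
        items = merged
    return items[0]

def attention_scan(scores, values):
    """Exact softmax-weighted average of values, computed via the tree reduction."""
    _, normalizer, acc = tree_reduce(lift(s, v) for s, v in zip(scores, values))
    return [x / normalizer for x in acc]

def attention_naive(scores, values):
    """Reference: materialize all softmax weights, then average."""
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / z
            for d in range(dim)]

if __name__ == "__main__":
    scores = [0.5, -1.0, 3.0, 2.0, 0.0]
    values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, 3.0], [0.5, 0.5]]
    print(attention_scan(scores, values))
    print(attention_naive(scores, values))
```

The two functions agree up to floating-point rounding because the reduction only reorders the same sums and never approximates the softmax. The sketch assumes a non-empty key set, so the identity element of the monoid is left out.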

Implemented in Triton and CUDA C++, ELSA reduces FP32 inference latency on high-resolution ViT and long-context LLM workloads while maintaining exact softmax behavior with a provable FP32 relative error bound. This work reflects our lab’s broader focus on making modern vision and multimodal AI both trustworthy and deployable in real-world systems.

Key idea: exact linear-scan attention without retraining
Target use cases: scientific imaging, edge AI, long-context inference, and hardware-diverse deployment
Status: Accepted to the CVPR 2026 Findings Workshop

The arXiv link will be added once the paper is publicly available.

Chih-Chung Hsu (許志仲)

By Jess
