
Performance comparison of ELSA vs. ME-SDPA.
ELSA achieves up to a 392% throughput improvement on high-resolution ViT inference (left),
maintains a consistent advantage on single attention operations (center-left), delivers 8.9–10.2% gains on LLaMA-13B offloading at long contexts (center-right), and provides a 19.94% speedup for LLaMA-8B inference at 16K tokens (right). Point area represents peak VRAM usage. ViT comparisons use matched model sizes (Tiny/Small/Medium) for fairness. All experiments were run in FP32.
More details will be available in my forthcoming paper.
