
Research Vision

Advanced Computer Vision Lab — Assured Computer Vision: Lean, Autonomous, Broad-Spectrum

As generative AI blurs the boundary between authentic and fabricated media, autonomous systems demand vision that never fails silently, and Earth observation enters a data-rich new era, the bar for deployable visual intelligence keeps rising. ACVLab responds with four interlocking research pillars.

Assured Visual Intelligence ensures that every visual AI output can be trusted — whether detecting DeepFakes under heavy compression, defending against adversarial perturbations, or authenticating media through proactive watermarking — providing the accountability that forensic, medical, and regulatory settings require.

Lean Visual Architectures rethink computation at every level of abstraction: prefix-scan reformulations of exact attention (ELSA), bitstream-level forensics that skip pixel decoding entirely, adaptive quantization that preserves accuracy at ultra-low bit widths (QuantTune/FracQuant), and joint transmission-restoration for bandwidth-constrained satellites — cutting latency, memory, and energy cost for sustainable, real-time deployment.

Autonomous Visual Perception extends vision from 2D images into 3D physical space: material-aware scene reconstruction with hyperspectral unmixing, bird's-eye-view (BEV) adversarial defense for self-driving (BFDM), physics-aligned shadow and reflection removal that feeds robust features to downstream robotic pipelines (PhaSR, ReflexSplit), and uncertainty-aware 3D annotation for autonomous driving datasets.

Broad-Spectrum Scientific Sensing pushes perception beyond the visible: universal hyperspectral restoration via vision-language prompts (PromptHSI), real-time CubeSat compressed sensing recognized with the Future Technology Award, hyperspectral pansharpening through sparse spectral representations (S³RNet), and cross-spectral forgery detection that reveals manipulation invisible to RGB analysis.

These pillars do not operate in isolation. Hyperspectral forensics merges trust with spectral sensing. On-satellite real-time inference merges efficiency with broad-spectrum data. BEV adversarial defense merges trust with embodied perception. This cross-pillar synergy is not accidental — it reflects a single underlying conviction: deployment-grade visual intelligence must be simultaneously trustworthy, efficient, embodied, and perceptually complete.

Research Pillars

Autonomous Visual Perception: PhaSR, ReflexSplit, autonomous driving, tracking, embodied perception, 3D reconstruction
Assured Visual Intelligence: GRACEv2, UMCL, DDD-Net, DeepFake detection, proactive authentication, trustworthy media analysis
Broad-Spectrum Scientific Sensing: PromptHSI, S³RNet, CubeSat compressed sensing, remote sensing, satellite imaging
Lean Visual Architectures: ELSA, QuantTune, FracQuant, bitstream-level inference, CubeSat on-board processing, edge deployment

A short introduction to my research: [PDF] (Last updated: Oct. 2024)

Robust Shadow Removal

PhaSR: Generalized Image Shadow Removal with Physically Aligned Priors

Accepted to IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2026.

Shadow removal under complex and multi-source lighting is hindered by the mismatch between physical illumination priors and learned features. PhaSR couples physically aligned normalization with geometry-semantic rectification to deliver robust shadow removal that generalizes beyond traditional single-light settings.

Research Direction. Autonomous Visual Perception / Robust Scene Recovery

[arXiv] [GitHub]

Reflection Separation in the Wild

ReflexSplit: Single Image Reflection Separation via Layer Fusion-Separation

Accepted to IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2026.

Reflections on glass introduce nonlinear layer mixing that often breaks existing separation networks. ReflexSplit uses dual-stream fusion-separation blocks and curriculum training to achieve robust performance on both synthetic and real-world benchmarks.

Research Direction. Autonomous Visual Perception / Robust Scene Recovery

[arXiv] [GitHub]

Efficient AI Inference

ELSA: Exact Linear-Scan Attention for Fast and Memory-Light Vision Transformers

Accepted to CVPR 2026 Findings (CVPRF).

ELSA reformulates exact softmax attention as a prefix scan over an associative monoid, achieving memory-light inference with provable FP32 stability and no retraining. Implemented in Triton and CUDA C++, it improves deployability on both data-center and edge hardware.
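To make the prefix-scan idea concrete, here is a minimal sketch of the well-known online-softmax combine, assuming a single query whose logits are already computed. It is not ELSA's implementation: the state layout `(m, l, acc)` and the helper names `lift`, `combine`, `finalize`, and `attention_prefix_scan` are illustrative assumptions, and the identity element of the monoid is omitted for brevity. The scan runs sequentially here; because `combine` is associative, the same operator could in principle be evaluated as a parallel tree.

```python
import math
from itertools import accumulate

# Hypothetical state for a block of keys: (m, l, acc)
#   m   = maximum logit seen in the block
#   l   = sum of exp(logit - m) over the block
#   acc = sum of exp(logit - m) * value over the block (a vector)

def lift(score, value):
    """Wrap one (logit, value) pair as a single-element state."""
    return (score, 1.0, list(value))

def combine(a, b):
    """Associative merge of two partial softmax states."""
    m_a, l_a, acc_a = a
    m_b, l_b, acc_b = b
    m = max(m_a, m_b)
    # Rescale both sides to the shared maximum so exponentials stay bounded.
    s_a = math.exp(m_a - m)
    s_b = math.exp(m_b - m)
    l = l_a * s_a + l_b * s_b
    acc = [x * s_a + y * s_b for x, y in zip(acc_a, acc_b)]
    return (m, l, acc)

def finalize(state):
    """Normalize the accumulated values to get the attention output."""
    _, l, acc = state
    return [x / l for x in acc]

def attention_prefix_scan(scores, values):
    """Inclusive scan: output k attends over keys 0..k (causal attention)."""
    states = [lift(s, v) for s, v in zip(scores, values)]
    return [finalize(s) for s in accumulate(states, combine)]

def attention_reference(scores, values):
    """Plain softmax-weighted average, used only as a correctness check."""
    m = max(scores)
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    dim = len(values[0])
    return [sum(w[i] * values[i][d] for i in range(len(scores))) / z
            for d in range(dim)]

scores = [0.5, -1.2, 3.0, 2.1]
values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, 0.5]]
outputs = attention_prefix_scan(scores, values)
```

The last entry of `outputs` matches `attention_reference(scores, values)` up to floating-point rounding, and each earlier entry matches the reference computed on the corresponding prefix, so the scan reproduces softmax attention exactly rather than approximating it.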

Research Direction. Lean Visual Architectures / Hardware-Agnostic Inference

arXiv preprint coming soon

Quantization-Friendly Deployment

QuantTune: Optimizing Model Quantization with Adaptive Outlier-Driven Fine Tuning

Published in IEEE International Conference on Multimedia Information Processing and Retrieval (MIPR) 2025.

QuantTune addresses outlier-driven dynamic range amplification during Transformer quantization and substantially reduces accuracy loss under low-bit settings. The method adds no inference-time hardware complexity and transfers across ViT, BERT, and OPT models.
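The problem that outliers cause can be illustrated with a small sketch. It shows only the effect of dynamic range amplification under plain symmetric per-tensor quantization, not QuantTune's fine-tuning method; the function names, the 4-bit setting, and the synthetic data are all assumptions chosen for illustration.

```python
def quantize_dequantize(xs, bits):
    """Symmetric uniform quantization with a per-tensor scale set by max |x|."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(x) for x in xs)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level, clamp, then map back to real values.
    deq = [max(-qmax, min(qmax, round(x / scale))) * scale for x in xs]
    return deq, scale

def mean_abs_error(xs, ys):
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)

# Synthetic activations in [-0.5, 0.5], with and without a single outlier.
inliers = [i / 100.0 for i in range(-50, 51)]
with_outlier = inliers + [20.0]

deq_clean, scale_clean = quantize_dequantize(inliers, bits=4)
deq_outlier, scale_outlier = quantize_dequantize(with_outlier, bits=4)

# Error measured on the inliers only, so the two cases are comparable.
err_clean = mean_abs_error(inliers, deq_clean)
err_outlier = mean_abs_error(inliers, deq_outlier[:len(inliers)])
```

In this setup the single outlier stretches the quantization step by a factor of 40, and every inlier collapses to the zero level, so `err_outlier` is far larger than `err_clean`. Reducing such outliers before quantization is what keeps low-bit accuracy recoverable.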

Research Direction. Lean Visual Architectures / Quantization-Aware Deployment

[arXiv] [IEEE Xplore]

Universal Hyperspectral Restoration

PromptHSI: Universal Hyperspectral Image Restoration with Vision-Language Modulated Frequency Adaptation

Published in IEEE Transactions on Geoscience and Remote Sensing (TGRS), Early Access, Feb. 2026.

PromptHSI is a universal all-in-one framework for hyperspectral restoration that combines frequency-aware modulation with vision-language guided prompt learning. A single model can handle cloud occlusion, blur, noise, and spectral band loss across remote sensing scenarios.

Research Direction. Broad-Spectrum Scientific Sensing / Hyperspectral Restoration

[IEEE Xplore] [arXiv] [GitHub]

Media Security & DeepFake Robustness

Towards Robust DeepFake Detection under Unstable Face Sequences: Adaptive Sparse Graph Embedding with Order-Free Representation and Explicit Laplacian Spectral Prior

Submitted to IEEE Transactions on Information Forensics and Security (TIFS).

GRACEv2 targets unstable face sequences caused by compression, occlusion, and shuffled or missing frames. By combining order-free temporal graph embedding with an explicit Laplacian spectral prior, it makes DeepFake detection more robust to severe real-world disruptions.

Research Direction. Assured Visual Intelligence / Robust DeepFake Detection

[arXiv]

Cross-Compression DeepFake Detection

UMCL: Unimodal-Generated Multimodal Contrastive Learning for Cross-compression-rate Deepfake Detection

Published in International Journal of Computer Vision (IJCV), Jan. 2026.

UMCL synthesizes compression-robust multimodal cues, including rPPG, temporal landmarks, and semantic embeddings, from a single visual input. The framework improves cross-compression DeepFake detection while preserving interpretable feature relationships.

Research Direction. Assured Visual Intelligence / Cross-Compression Forensics

[Springer] [DOI] [arXiv]