
Research Vision

Advanced Computer Vision Lab — Assured Computer Vision: Lean, Autonomous, Broad-Spectrum

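As generative AI blurs the line between authentic and fabricated media, autonomous systems demand ever more reliable vision, and Earth observation enters a new high-data-volume phase, the bar for visual intelligence that can actually be deployed in the field rises accordingly. ACVLab's research is organized around four interlocking pillars.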

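Assured Visual Intelligence asks whether each output of a visual AI system can be trusted. Whether the task is DeepFake detection under heavy compression, defense against adversarial perturbations, or media authentication through proactive watermarking, the core goal is to provide sufficient accountability for forensic, medical, and regulatory settings.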

Lean Visual Architectures redesigns systems at different levels of computational abstraction: a prefix-scan reformulation of exact attention (ELSA), bitstream-level forensics that skips pixel decoding, adaptive quantization that preserves as much accuracy as possible at ultra-low bit widths (QuantTune/FracQuant), and joint transmission-restoration on bandwidth-constrained satellites. The goal is to reduce latency, memory, and energy cost together.

Autonomous Visual Perception moves vision from 2D images into 3D physical space: material-aware scene reconstruction with hyperspectral unmixing, BEV adversarial defense for self-driving (BFDM), shadow and reflection removal that provides robust features for downstream robotic pipelines (PhaSR, ReflexSplit), and uncertainty-aware 3D annotation for autonomous driving datasets.

Broad-Spectrum Scientific Sensing extends perception beyond visible light: universal hyperspectral restoration driven by vision-language prompts (PromptHSI), real-time CubeSat compressed sensing that was recognized with the Future Tech Award, hyperspectral pansharpening via sparse spectral representations (S³RNet), and cross-spectral forgery detection that reveals manipulation traces invisible in RGB.

These pillars are not independent. Hyperspectral forensics links trust with spectral sensing, on-satellite real-time inference links efficiency with broad-spectrum data, and BEV adversarial defense links trust with embodied perception. For ACVLab, visual intelligence that can truly be deployed must be trustworthy, efficient, embodied, and perceptually complete all at once.

Research Pillars

Autonomous Visual Perception: PhaSR, ReflexSplit, autonomous driving, tracking, embodied perception, 3D reconstruction
Assured Visual Intelligence: GRACEv2, UMCL, DDD-Net, DeepFake detection, proactive authentication, trustworthy media analysis
Broad-Spectrum Scientific Sensing: PromptHSI, S³RNet, CubeSat compressed sensing, remote sensing, satellite imaging
Lean Visual Architectures: ELSA, QuantTune, FracQuant, bitstream-level inference, CubeSat on-board processing, edge deployment

Research overview: [PDF] (last updated: October 2024)

Robust Shadow Removal

PhaSR: Generalized Image Shadow Removal with Physically Aligned Priors

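Accepted to the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2026.

In complex scenes with multiple light sources, shadow removal suffers when physical illumination priors and learned features are inconsistent. PhaSR combines physically aligned normalization with geometry-semantic rectification and remains robust in real-world scenes that go beyond the single-light-source assumption.

Research direction. Autonomous Visual Perception / Robust Scene Restoration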

[arXiv] [GitHub]

Real-World Reflection Separation

ReflexSplit: Single Image Reflection Separation via Layer Fusion-Separation

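Accepted to the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2026.

Glass reflections produce highly nonlinear layer mixing, which makes existing separation models prone to failure in the real world. ReflexSplit uses dual-stream fusion-separation blocks and curriculum training to achieve more robust reflection separation on both synthetic and real data.

Research direction. Autonomous Visual Perception / Robust Scene Restoration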

[arXiv] [GitHub]

Efficient AI Inference

ELSA: Exact Linear-Scan Attention for Fast and Memory-Light Vision Transformers

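Accepted to the CVPR 2026 Findings Workshop.

ELSA rewrites exact softmax attention as a prefix scan over an associative monoid, enabling more memory-efficient inference without retraining, with provable FP32 stability. Implemented in Triton and CUDA C++, it improves deployability on both data-center and edge hardware.

Research direction. Lean Visual Architectures / Hardware-Agnostic Inference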

arXiv preprint coming soon
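
The prefix-scan view of exact softmax attention can be illustrated with the well-known online-softmax merge. The following is a minimal sketch under stated assumptions, not ELSA's actual kernel: the names `leaf`, `combine`, `finalize`, and `tree_reduce` are hypothetical, the state layout (running max, normalizer, weighted sum) is one common choice, and the example covers a single query against a handful of keys in pure Python.

```python
import math
from itertools import accumulate

def leaf(score, value):
    """State for one key/value pair: (running max, normalizer, weighted sum)."""
    return (score, 1.0, list(value))

def combine(a, b):
    """Merge two partial softmax states; this operator is associative."""
    m_a, s_a, o_a = a
    m_b, s_b, o_b = b
    m = max(m_a, m_b)
    # Rescale both sides to the shared max so every exponent is <= 0.
    w_a, w_b = math.exp(m_a - m), math.exp(m_b - m)
    merged_sum = [x * w_a + y * w_b for x, y in zip(o_a, o_b)]
    return (m, s_a * w_a + s_b * w_b, merged_sum)

def finalize(state):
    """Turn a merged state into the attention output vector."""
    _, s, o = state
    return [x / s for x in o]

def naive_attention(scores, values):
    """Reference: materialize all softmax weights, then average the values."""
    m = max(scores)
    w = [math.exp(x - m) for x in scores]
    z = sum(w)
    dim = len(values[0])
    return [sum(wi * v[d] for wi, v in zip(w, values)) / z for d in range(dim)]

def tree_reduce(states):
    """Pairwise reduction; gives the same result because combine is associative."""
    while len(states) > 1:
        states = [
            combine(states[i], states[i + 1]) if i + 1 < len(states) else states[i]
            for i in range(0, len(states), 2)
        ]
    return states[0]

# Hypothetical query-key scores and value vectors for one query.
scores = [0.5, 2.0, -1.0, 3.5, 0.0]
values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, 0.5], [0.3, 0.7]]
states = [leaf(s, v) for s, v in zip(scores, values)]

prefix_states = list(accumulate(states, combine))  # inclusive prefix scan
scan_out = finalize(prefix_states[-1])
tree_out = finalize(tree_reduce(states))
ref_out = naive_attention(scores, values)
print(scan_out, tree_out, ref_out)
```

Because `combine` is associative, the merge order is free: a sequential scan, a tree reduction, or a parallel scan on a GPU all reach the same exact result without storing the full attention matrix. Each entry of `prefix_states` also yields the exact attention output over the keys seen so far.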

Quantization-Friendly Deployment

QuantTune: Optimizing Model Quantization with Adaptive Outlier-Driven Fine Tuning

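Published at the IEEE International Conference on Multimedia Information Processing and Retrieval (MIPR) 2025.

QuantTune targets the outlier-driven dynamic range amplification that arises when quantizing Transformers. It significantly reduces accuracy loss in low-bit settings without adding hardware complexity at inference time, and it transfers across ViT, BERT, and OPT models.

Research direction. Lean Visual Architectures / Quantization-Aware Deployment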

[arXiv] [IEEE Xplore]
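
The problem QuantTune addresses, an outlier stretching the quantization range, can be seen in a few lines. This is only an illustrative sketch of the failure mode, not QuantTune's fine-tuning method: the helper names `fake_quantize` and `mse`, the synthetic values, the 4-bit setting, and the max-abs symmetric quantizer are all assumptions chosen for the example.

```python
def fake_quantize(xs, bits):
    """Symmetric uniform quantize-then-dequantize using a max-abs scale."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(x) for x in xs) / qmax
    deq = [max(-qmax, min(qmax, round(x / scale))) * scale for x in xs]
    return deq, scale

def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

# Synthetic "activations": a well-behaved bulk in [-1, 1] ...
bulk = [(i - 50) / 50 for i in range(101)]
# ... and the same values plus a single large outlier.
with_outlier = bulk + [40.0]

bits = 4
deq_clean, scale_clean = fake_quantize(bulk, bits)
deq_outlier, scale_outlier = fake_quantize(with_outlier, bits)

# Measure the error on the bulk values only, so both cases are comparable.
err_clean = mse(bulk, deq_clean)
err_outlier = mse(bulk, deq_outlier[:len(bulk)])

print("step size without outlier:", scale_clean)
print("step size with outlier:   ", scale_outlier)
print("bulk MSE without outlier: ", err_clean)
print("bulk MSE with outlier:    ", err_outlier)
```

In this toy setting one outlier widens the step size so much that every bulk value collapses to zero, which is why suppressing outliers before quantization can recover accuracy at low bit widths without changing the inference-time quantizer.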

Universal Hyperspectral Restoration

PromptHSI: Universal Hyperspectral Image Restoration with Vision-Language Modulated Frequency Adaptation

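Published in IEEE Transactions on Geoscience and Remote Sensing (TGRS), Early Access, Feb. 2026.

PromptHSI is an all-in-one hyperspectral image restoration framework that combines frequency-aware modulation with vision-language guided prompt learning, so a single model can handle multiple remote sensing degradations, including cloud occlusion, blur, noise, and missing spectral bands.

Research direction. Broad-Spectrum Scientific Sensing / Hyperspectral Restoration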

[IEEE Xplore] [arXiv] [GitHub]

Media Security and DeepFake Robustness

Towards Robust DeepFake Detection under Unstable Face Sequences: Adaptive Sparse Graph Embedding with Order-Free Representation and Explicit Laplacian Spectral Prior

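Currently submitted to IEEE Transactions on Information Forensics and Security (TIFS).

GRACEv2 is designed for unstable face sequences caused by compression, occlusion, dropped frames, and order perturbations. Using an order-free temporal graph embedding and an explicit Laplacian spectral prior, it improves the robustness of DeepFake detection under harsh real-world conditions.

Research direction. Assured Visual Intelligence / Robust DeepFake Detection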

[arXiv]