When Images Can Be Generated:
Forensic Thinking for Images, Video, and Deepfakes
Visual forensics is an advanced part of AI literacy, not the whole story. This chapter places deepfakes, whole-frame generation, traditional image forensics, and heatmap reading into one interpretation framework so you can see what the tools add, what they cannot guarantee, and why final judgment must still return to source, timeline, and context evidence.
Four Types of AI Forensic Tools
| Type | Detection Target | Core Technology | Media Type |
|---|---|---|---|
| Whole-Frame AI Traces | Statistical anomalies from AI generation/editing | CNN, Vision Transformer | Image + Video |
| Face Authenticity | Face swap features in facial region | CFDet, DFM models | Media with faces |
| Temporal Consistency | Frame-to-frame inconsistencies | GenConViT, LSTM | Video only |
| Traditional Image Forensics | Photoshop splicing, copy-paste | ELA, SIFT | Image only |
Understanding AUC, TPR, FPR: Why These Numbers Matter
When you see a deepfake detector claiming "95% accuracy," what does this number actually represent? When evaluating detector performance, these three metrics are most critical:
| Metric | Definition | Ideal Value |
|---|---|---|
| AUC (Area Under Curve) | Overall discrimination: 0.5 = random guess, 1.0 = perfect detection | > 0.85 |
| TPR @ 0.1 FPR (True Positive Rate) | How many fakes caught when only 10 in 100 real items are mislabeled | > 0.70 |
| FPR (False Positive Rate) | Rate of mislabeling real content as fake (lower is better) | < 0.05 |
Why isn't "95% accuracy" enough? Because if only 5% of a dataset is deepfakes, a detector that "always says real" also achieves 95% accuracy — but it's completely useless. AUC and TPR@FPR are the true measures of detection capability.
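The accuracy trap above can be made concrete with a toy calculation (the numbers are illustrative, not from any real evaluation): on an imbalanced dataset, a detector that never flags anything scores higher raw accuracy than a detector with a genuinely useful TPR at a fixed FPR.

```python
# Toy illustration: why raw accuracy misleads when fakes are rare.
# 1000 items, only 5% are deepfakes (hypothetical numbers).
n_real, n_fake = 950, 50

# Detector A: always predicts "real".
tp_a = 0              # fakes caught: none
tn_a = n_real         # every real item trivially labeled correctly
accuracy_a = (tp_a + tn_a) / (n_real + n_fake)
print(accuracy_a)     # 0.95 -- looks impressive, catches zero fakes

# Detector B: TPR = 0.70 at FPR = 0.10 (the table's threshold metric).
tp_b = int(0.70 * n_fake)   # 35 fakes caught
tn_b = int(0.90 * n_real)   # 855 real items kept, 95 mislabeled
accuracy_b = (tp_b + tn_b) / (n_real + n_fake)
print(accuracy_b)     # 0.89 -- lower accuracy, but actually useful
```

This is exactly why the table reports TPR at a fixed FPR rather than a single accuracy figure: it pins down how many fakes are caught at a stated cost in false alarms.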
How to Read Forensic Heatmaps (GradCAM)
Most AI forensic tools generate heatmaps (typically using GradCAM or GradCAM++ technology), showing which regions the model was "looking at" when making its determination. Correctly reading heatmaps is a key skill for using forensic tools.
- 🔴 Red/Hot Zones: Model sees strong AI traces here. Common locations: facial edges, hairlines, ears, background-foreground boundaries.
- 🔵 Blue/Cool Zones: Model considers these areas relatively authentic. Usually uniform backgrounds, fabric textures, and edge regions such as the top of the head or the chin.
- ⚠️ Interpretation Caution: Heatmaps show "why the model made this determination" — not "these areas are definitely faked." High-quality JPEG compression, beauty filters, and screenshots can all trigger similar "hot responses."
- ✅ Reliable Evidence: The same region (e.g., facial boundary) showing as hot across multiple independent detectors' heatmaps has higher credibility.
Error Level Analysis (ELA): A Traditional Image Forensics Workhorse
ELA is a traditional image forensics technique requiring no AI. The principle:
JPEG images lose a little information each time they are saved. After an image has been saved several times, the error levels across its regions gradually converge. But if a region was pasted in later using Photoshop, its error level differs from its surroundings — because that region has gone through a different number of compression cycles. ELA visualizes these differences, making "later-pasted" regions appear as abnormally bright (or dark) areas in the image.
- FotoForensics.com — Free online tool; upload an image to get ELA visualization
- Forensically (29a.ch/photo-forensics) — Provides ELA, Clone Detection, Noise Analysis, and multiple analysis modes
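The ELA principle fits in a few lines of Python. A minimal sketch, assuming the Pillow library is installed: re-save the image as JPEG once more at a fixed quality, take the pixel-wise difference against the input, and amplify it so regions with a different compression history stand out.

```python
# Minimal ELA sketch (assumes Pillow is installed).
import io
from PIL import Image, ImageChops

def ela(image, quality=90, scale=15):
    """Return an amplified error-level image for a PIL RGB image."""
    buf = io.BytesIO()
    image.save(buf, "JPEG", quality=quality)  # one extra compression pass
    buf.seek(0)
    resaved = Image.open(buf)
    diff = ImageChops.difference(image, resaved)
    # Amplify small per-pixel errors so they become visible.
    return diff.point(lambda px: min(255, px * scale))

img = Image.new("RGB", (64, 64), (120, 80, 200))
result = ela(img)
print(result.size, result.mode)  # (64, 64) RGB
```

On a genuine single-source photo the amplified error is fairly uniform; a spliced-in patch shows up as a block whose brightness clearly differs from its neighborhood. The `quality=90` and `scale=15` values are common illustrative defaults, not fixed standards.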
The Limits of AI Forensics: Knowing Your Tool's Boundaries
Every AI forensic tool has known limitations that must be kept in mind:
- Training set limitations (Distribution Shift): The deepfake techniques used in a detector's training may not match those in real-world deepfakes. A detector trained on one deepfake technique may fail against new techniques.
- Image post-processing triggering false positives: The following can cause real images to be mislabeled: excessive beauty filters (Instagram, camera app AI beauty), screenshots (moiré effect), heavy JPEG compression (quality below 60%), AI upscaling.
- Adversarial Arms Race: The "cat-and-mouse game" between deepfake generation and detection technologies continues. When a detection method becomes widely adopted, deepfake creators optimize their generation techniques to evade it.
- Special courtroom requirements: In legal contexts, AI forensic reports must be accompanied by human expert review and disclosure of the tool's training dataset, known limitations, and confidence intervals before being accepted as supporting evidence.
How to Correctly Interpret Forensic Reports: The Three-Color Verdict Framework
This Platform's Multi-Detector Architecture
This platform integrates 15+ deepfake detection models in four functional groups. Each detector is weighted based on its AUC performance in independent evaluations. The multi-detector weighted voting design reduces the mislabeling risk of any single detector — results are most credible when multiple independent models point in the same direction.
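The weighted-voting idea can be sketched as follows. This is a hypothetical illustration of AUC-weighted aggregation, not the platform's actual models, weights, or scores: each detector is weighted by how far its AUC exceeds chance (0.5), and the weighted mean of the per-detector fake probabilities becomes the combined verdict.

```python
# Hypothetical sketch of AUC-weighted voting across detectors.
# All AUC values and scores below are illustrative.
def weighted_vote(results):
    """results: list of (auc, fake_probability) per detector."""
    weights = [max(auc - 0.5, 0.0) for auc, _ in results]  # chance = 0.5
    total = sum(weights)
    if total == 0:
        return 0.5  # no informative detector: undecided
    return sum(w * p for w, (_, p) in zip(weights, results)) / total

detectors = [
    (0.93, 0.91),  # whole-frame trace model: strong "fake" signal
    (0.88, 0.84),  # face-authenticity model: agrees
    (0.80, 0.35),  # temporal-consistency model: leans "real"
]
score = weighted_vote(detectors)
print(round(score, 3))  # 0.735
```

Note how the lower-AUC temporal model drags the score down only slightly: agreement between the two stronger detectors dominates, which is the "multiple independent models pointing in the same direction" property described above.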
Case Studies
In January 2024, a finance worker at Arup's Hong Kong office was invited to join an "emergency multi-person video conference" allegedly arranged by the company's London headquarters. The meeting showed the "faces and voices" of multiple executives including the CFO, instructing the employee to execute a series of urgent transfers. Following instructions, the employee transferred a total of HK$200 million (approximately US$25.6M) in fifteen transactions over a few days.
Afterwards, when the employee contacted London to confirm the transfers, they discovered London headquarters had no knowledge whatsoever. Investigation showed all "executive figures" in the video conference were AI deepfake-generated; attackers used videos and photos collected from the public internet (media interviews, conference recordings, LinkedIn profiles) to train deepfake models.
Post-incident AI forensic analysis revealed multiple tells:
- All "participants" showed degraded facial-boundary rendering quality when turning or making complex movements.
- Background lighting directions were inconsistent across different "participants," suggesting each figure was generated in a different environment and composited into the same video.
- Voice prosody patterns showed statistical differences from these executives' speaking styles in real videos.
- The compression conditions of live video (typically H.264/H.265 at low bitrate) made these tells difficult to identify visually on screen.
- Out-of-Band (OOB) Verification: Any large transfer instruction from a video conference must be confirmed through a completely independent channel (calling the executive's known personal mobile, not the contact method in the meeting invitation)
- Pre-agreed challenge phrases: At the start of important video conferences, require all participants to answer a pre-agreed question (something only the real person would know)
- Unnatural movement test: Ask the person in the video to make quick head turns or cover then uncover their face — deepfakes are most prone to breaking down in these situations
One of deepfake technology's most dangerous side effects is what scholars Bobby Chesney and Danielle Citron (2019) called the "Liar's Dividend": once the public becomes aware of deepfakes, dishonest individuals can claim any real video damaging to them is a deepfake to evade accountability.
Multiple related cases have emerged globally: politicians claiming real bribery recordings were "AI-fabricated" after exposure; corporate executives denying real improper instruction recordings; and in criminal cases, defendants claiming surveillance footage was a deepfake. Although most were eventually disproven by forensics, the confusion and litigation delays caused significant harm.
This is precisely why AI forensic tools must have two capabilities: exposing fabrication (identifying deepfakes), and confirming authenticity (protecting the credibility of real videos). In "Liar's Dividend" scenarios, an accurate "this is real" forensic report is no less valuable than "this is fake." This is also why forensic reports must include detailed technical explanations and confidence intervals so courts and media can evaluate their reliability.