Chapter 03 · AI Risks & Pitfalls

How AI Goes Wrong:
From Hallucinations and Bias to Synthetic Abuse

~25 min read · Difficulty: Intermediate · Deepfake · GAN · Face Swap · Voice Cloning

AI risk is not limited to fake images or cloned voices. More often, AI invents details confidently, amplifies bias, or wraps authentic material in false context. This chapter maps those risks together so you are not only defending against deepfakes, but understanding why AI-generated or AI-amplified content misleads people at all.

The Complete Manipulation Technology Spectrum

Type	Technical Barrier	Cost	Detection Difficulty	Primary Use
Cheapfake	⭐	$0	⭐⭐	Political attack, emotional manipulation
Photoshop	⭐⭐	Low	⭐⭐⭐	Faking crime scenes, forging documents
GAN Synthetic Faces	⭐⭐⭐	Medium	⭐⭐⭐⭐	Fake accounts, fake review farms
Face Swap	⭐⭐⭐⭐	Medium	⭐⭐⭐⭐	Political disinformation, non-consensual sexual content
Voice Cloning	⭐⭐⭐	Low	⭐⭐⭐⭐⭐	Fraud, political interference
Multimodal Deepfake	⭐⭐⭐⭐⭐	High	⭐⭐⭐⭐⭐	Corporate fraud, high-value deception

Three Types of Cheapfakes

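The term "cheapfake" (also written "cheap fake") was coined by researchers Britt Paris and Joan Donovan in a 2019 Data & Society report to describe manipulated media created with simple, low-cost editing techniques rather than AI. Sam Gregory of the human rights organization WITNESS describes the same category as "shallowfakes."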

⚡ Three Cheapfake Techniques

Speed manipulation: Change the playback speed, usually slowing it to 70-80% of the original. Effect: The speaker appears drunk or mentally sluggish. Detection: Listen to the voice pitch and the background sounds. Unless the editor has corrected the pitch, slower playback lowers the pitch of both the voice and the ambient audio.
Context stripping: Cut a clip away from its surrounding context so that a politician's or expert's words appear to mean something completely different. Detection: Search for the original full video and check what was said before and after the clip.
Loop editing: Cut a few-second clip into a seamless loop so viewers believe an event lasted much longer than it did (common in footage of crowd violence, explosions, and protests). Detection: Watch the background for repeating motion, such as cloud movement or crowd positions.
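The pitch and looping checks above can be made concrete with a little arithmetic. The sketch below is a minimal illustration, not a forensic tool. `pitch_shift_semitones` assumes the clip was slowed by plain resampling with no pitch correction, and `find_loop_period` assumes each frame has already been reduced to a comparable fingerprint (for example a perceptual hash, which is not computed here). Both function names are hypothetical.

```python
import math


def pitch_shift_semitones(speed: float) -> float:
    """Pitch change in semitones when audio is played at `speed` times the
    normal rate with no pitch correction. Negative means a lower pitch."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    return 12 * math.log2(speed)


def find_loop_period(fingerprints, min_period=2):
    """Return the smallest period p (in frames) at which the whole sequence
    repeats at least twice, or None if there is no such period.

    Assumption: `fingerprints` holds one hashable value per frame and looped
    frames compare equal. Real footage would need a similarity tolerance."""
    n = len(fingerprints)
    for p in range(min_period, n // 2 + 1):
        if all(fingerprints[i] == fingerprints[i + p] for i in range(n - p)):
            return p
    return None


if __name__ == "__main__":
    # A clip slowed to 75% drops by roughly five semitones.
    print(round(pitch_shift_semitones(0.75), 2))
    # Toy fingerprints: a three-frame clip repeated three times.
    print(find_loop_period(list("abcabcabc")))
```

A drop of about five semitones is easy to hear, which is why uncorrected slowdowns are among the simplest manipulations to catch by ear.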

GAN Synthetic Faces: Identifying "People Who Don't Exist"

Generative adversarial networks (GANs) such as StyleGAN, and more recently diffusion models such as Stable Diffusion, can generate highly realistic photos of "people who don't exist." Such images are widely used to create fake social media accounts, populate fake review farms, and forge expert credentials.

🔍 Visual Tells of GAN-Generated Faces

Asymmetric ears: GAN faces often have oddly shaped ears, or clearly asymmetric left-right ears
Abnormal background: Background objects may merge, straight lines curve, objects "disappear"
Inconsistent eye catchlights: Real eyes have nearly identical catchlights in both eyes; GAN faces often have different catchlights in each eye
Hair and teeth anomalies: Fine strands of hair may merge into blobs; teeth may have wrong count or abnormally perfect edges
Necklaces and glasses: GANs often fail on these accessories, which may come out asymmetric or bizarrely shaped
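The catchlight check in the list above can be sketched as a simple comparison. This is a rough illustration under stated assumptions: each eye has already been cropped to a small grayscale patch (rows of 0-255 values), the catchlight is the brightest region of the patch, and the `threshold` of 240 is an arbitrary placeholder. The function names are hypothetical, and a large mismatch is only a prompt for closer inspection, not proof of synthesis.

```python
import math


def catchlight_offset(patch, threshold=240):
    """Centroid of the brightest pixels in a grayscale eye patch, as an
    (x, y) offset from the patch centre in units of patch width and height.
    Returns None when no pixel reaches `threshold`."""
    height, width = len(patch), len(patch[0])
    xs, ys = [], []
    for y, row in enumerate(patch):
        for x, value in enumerate(row):
            if value >= threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    cx = sum(xs) / len(xs)
    cy = sum(ys) / len(ys)
    return ((cx - (width - 1) / 2) / width, (cy - (height - 1) / 2) / height)


def catchlight_mismatch(left_patch, right_patch, threshold=240):
    """Distance between the two catchlight offsets; larger means less alike.
    Returns None if either eye has no highlight above `threshold`."""
    left = catchlight_offset(left_patch, threshold)
    right = catchlight_offset(right_patch, threshold)
    if left is None or right is None:
        return None
    return math.hypot(left[0] - right[0], left[1] - right[1])
```

Comparing position only is a deliberate simplification. A fuller check would also compare the shape and number of highlights.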

How to Detect Face Swap Deepfakes

Face swap deepfakes use deep learning to "paste" one person's facial features onto video of another person's body. Common tools include DeepFaceLab, FaceSwap, and various other face-swapping models.

Facial boundary halos: Face swap edges often show semi-transparent "halos" during lighting changes, especially in profile views, low light, or fast movement
Abnormal blink rate: Faces in early deepfakes rarely blinked, because the training data contained few closed-eye images; modern deepfakes may blink excessively
Head rotation artifacts: When the head rapidly turns beyond 45 degrees to the side, facial rendering quality visibly degrades
Skin tone boundaries: Under different lighting, the swapped face's skin tone may not match the neck or ears
Lip sync mismatch: Especially in specific languages (like Chinese), lip movement may not perfectly match the audio
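The blink-rate tell above lends itself to a simple count. The sketch below assumes some other tool has already produced a per-frame eye-openness value (for example an eye aspect ratio from facial landmarks). The `closed_below` cutoff of 0.2 and the function names are assumptions for illustration. The result is most useful when compared with genuine footage of the same person rather than against a fixed rule.

```python
def count_blinks(openness, closed_below=0.2):
    """Count blinks in a per-frame eye-openness series.

    A blink is counted each time the value drops below `closed_below`
    after having been at or above it (or at the very start of the series)."""
    blinks = 0
    eye_closed = False
    for value in openness:
        if value < closed_below and not eye_closed:
            blinks += 1
            eye_closed = True
        elif value >= closed_below:
            eye_closed = False
    return blinks


def blinks_per_minute(openness, fps, closed_below=0.2):
    """Blink rate for a clip sampled at `fps` frames per second."""
    if fps <= 0 or not openness:
        raise ValueError("need a positive fps and at least one frame")
    return count_blinks(openness, closed_below) * 60 * fps / len(openness)


if __name__ == "__main__":
    # One second of toy data at 30 fps containing two short eye closures.
    series = [0.3] * 10 + [0.1] * 3 + [0.3] * 10 + [0.1] * 2 + [0.3] * 5
    print(count_blinks(series), blinks_per_minute(series, fps=30))
```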

AI Voice Cloning: When the Phone Isn't Who You Think

Modern AI voice cloning tools (such as ElevenLabs, Coqui TTS, and OpenAI's Voice Engine) can generate a convincing clone from a short voice sample, in some reported cases only a few seconds long, at a cost that is close to zero. This sharply lowers the technical barrier to phone fraud.

🎙️ AI Voice Cloning Identification Characteristics

Abnormal breathing rhythm: AI voices often lack natural breath sounds, or breathing occurs at unnatural sentence positions
Overly flat prosody: Emotional passages (anger, excitement, sadness) have less natural pitch variation than real humans — sounds like "reading a script"
Spliced-sounding background audio: The volume or audio quality may shift slightly between the AI-synthesized voice and the ambient background sound
Specific pronunciation errors: Chinese dialects, Taiwanese, regional accents, and technical terms are where AI voice cloning most often fails
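The "flat prosody" tell above can be approximated by measuring how much the pitch moves. The sketch below assumes a pitch tracker has already produced a list of fundamental-frequency estimates in Hz, with 0 or None for unvoiced frames. The function name is hypothetical and no pass/fail cutoff is implied. The number is meant for comparison with a known genuine recording of the same speaker in a similar emotional register.

```python
import math
import statistics


def pitch_spread_semitones(f0_hz):
    """Standard deviation of pitch, in semitones around the median pitch.

    Unvoiced frames (0 or None) are ignored. A lower value means flatter,
    more monotone delivery."""
    voiced = [f for f in f0_hz if f and f > 0]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames")
    median = statistics.median(voiced)
    semitones = [12 * math.log2(f / median) for f in voiced]
    return statistics.pstdev(semitones)


if __name__ == "__main__":
    monotone = [200, 201, 199, 200, 0, 200]    # toy values, nearly flat
    expressive = [150, 220, 180, 300, 0, 240]  # toy values, wide swings
    print(pitch_spread_semitones(monotone) < pitch_spread_semitones(expressive))
```

Working in semitones rather than raw Hz keeps the measure comparable between low and high voices.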

Multimodal Deepfakes: The Most Dangerous Combination Attack

Single-modality deepfakes (visual only or voice only) are comparatively easy to detect. When attackers fake the visual, audio, and text channels at the same time, the three appear to confirm one another, which sharply raises the chance that the deception succeeds. This is called a "multimodal deepfake attack," and it is currently the most sophisticated and dangerous form of deepfake.

Typical attack flow: Attackers first collect public video and audio of the target (for example, a corporate executive), then use face swapping to generate deepfake video, use voice cloning to generate matching audio, and forge an email to "confirm" the instructions. The victim sees the video, hears the voice, and receives the email. Because all three channels point to the same instruction, the victim believes it is authentic.
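One way to see why three matching channels prove little is to count only the confirmations the recipient initiated. The sketch below is an illustrative policy, not something prescribed by this chapter: the `Signal` type, the `initiated_by` labels, and the rule of requiring at least one recipient-initiated check are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    channel: str       # e.g. "video call", "voice message", "email"
    initiated_by: str  # "sender" if it arrived unprompted, "recipient" if
                       # we started it using contact details we already held


def independent_confirmations(signals):
    """Count signals the recipient initiated. Anything that arrived from the
    sender's side could have been produced by the same attacker."""
    return sum(1 for s in signals if s.initiated_by == "recipient")


def should_act(signals, required=1):
    """Act only if enough recipient-initiated confirmations exist."""
    return independent_confirmations(signals) >= required


if __name__ == "__main__":
    attack = [
        Signal("video call", "sender"),
        Signal("voice message", "sender"),
        Signal("email", "sender"),
    ]
    print(should_act(attack))
    print(should_act(attack + [Signal("callback to known number", "recipient")]))
```

Under this rule the video, the voice, and the email count as zero confirmations, because all three originated with the party making the request.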

Slide Deck

01 / 05

The Manipulation Spectrum

Type	Threshold	Detection
Cheapfake	⭐	⭐⭐
GAN Synthetic Face	⭐⭐⭐	⭐⭐⭐⭐
Face Swap	⭐⭐⭐⭐	⭐⭐⭐⭐
Voice Cloning	⭐⭐⭐	⭐⭐⭐⭐⭐
Multimodal	⭐⭐⭐⭐⭐	⭐⭐⭐⭐⭐

02 / 05

Three Cheapfake Techniques

🐌 Speed Manipulation

Slow to 70-80%; anyone looks drunk or slow

✂️ Context Stripping

Keep only out-of-context clips, change original meaning

🔁 Loop Editing

Turn a few-second clip into a seamless loop to exaggerate an event's duration

🔍 Detection

Find original full version; notice voice pitch; watch for repeating background objects

03 / 05

Face Swap Visual Tells

💡 Facial Boundary Halo

Semi-transparent halo appears during fast movement or profile views

👁️ Abnormal Blink

Too few or too many blinks, inconsistent with natural rhythm

🌈 Skin Tone Mismatch

Face and neck/ear skin tones don't match under different lighting

👄 Lip Sync Mismatch

Slight timing gap between lip movement and audio

04 / 05

Detecting AI Voice Clones

🫁 Breathing Rhythm

AI voices lack natural breathing, or breathe at unnatural positions

🎵 Flat Prosody

Emotional passages have unnatural pitch variation — sounds scripted

🔊 Audio Splicing

Slight audio quality switching between voice and ambient sound

🗣️ Dialect Errors

Taiwanese, Hakka, regional accents often inaccurate

05 / 05

Zelensky Deepfake Case Analysis

March 2022: Technical tells in the Zelensky "surrender announcement" deepfake video:

Tell	Description
Head Proportion	Head visibly oversized relative to shoulders
Neck Boundary	Clear halo at face-neck boundary
Pitch Deviation	Voice pitch vastly different from real Zelensky

Even with technical flaws, it spread widely under war panic conditions.

Case Studies

FAKE · Political Deepfake

Zelensky Deepfake "Surrender Announcement" (2022)

March 2022 · Ukraine / Global

About three weeks after Russia's invasion of Ukraine, a deepfake video circulated online in which "Ukrainian President Zelensky" appeared to tell soldiers to lay down their weapons. Media reports and platform security teams identified it as fabricated, and Meta said it removed the clip under its manipulated-media policy.

Public reporting described the video as low quality, with mismatched face-to-body proportions, imperfect synchronization between voice and video, and visible blending problems. The teaching point is that even crude deepfakes can spread quickly when the surrounding context is fear, war, and urgency.

Social Media's Rapid Response

Meta publicly stated that it removed the video, and Zelensky released a rebuttal video making clear that the surrender message was false. High-consequence political videos should be checked against official channels and reliable reporting before any sharing.

Core Learning

This case illustrates an important principle: even technically low-quality deepfakes can be effective under certain social conditions, such as wartime panic. Defense strategy: Treat any video involving a major political decision or a high-consequence statement such as "surrender" or "attack" as unverified until official media and government channels confirm it. Do not judge it by the first social media post you see.

Sources: TechCrunch / Meta removal report; Bitdefender incident summary

FAKE · Fraud Deepfake Ad

Taiwan Celebrity AI Investment Fraud Ads (2023-2024)

November 2023 onwards · Taiwan

Taiwan has seen many investment scams impersonating celebrities or government agencies. Some use AI-altered images, fake group photos, fake news pages, or fake websites to create trust. Taiwan FactCheck Center has checked scams impersonating Morris Chang and AI-packaged investment claims involving public agencies and business figures.

Common signals include stolen public images, links to LINE investment groups, fake-news layouts, promised high returns, and pressure to open accounts or transfer money. The main defense is not only judging whether a video is a deepfake, but checking official accounts, mainstream reporting, and fact-check reports.

Legal Developments

Publicly verifiable sources currently support the pattern of warnings, fact-checks, and investigations described above. A specific claim that a court accepted AI forensic reports in a named conviction would need to be backed by the actual judgment document. This course therefore treats the item as a pattern of AI-packaged investment scams in Taiwan, not as a claim about a specific court ruling.