How AI Goes Wrong:
From Hallucinations and Bias to Synthetic Abuse
AI risk is not limited to fake images or cloned voices. More often, AI invents details confidently, amplifies bias, or wraps authentic material in false context. This chapter maps these risks together so that you can defend against deepfakes while also understanding why AI-generated or AI-amplified content misleads people in the first place.
The Complete Manipulation Technology Spectrum
| Type | Technical Barrier | Cost | Detection Difficulty | Primary Use |
|---|---|---|---|---|
| Cheapfake | ⭐ | $0 | ⭐⭐ | Political attack, emotional manipulation |
| Photoshop | ⭐⭐ | Low | ⭐⭐⭐ | Faking crime scenes, forging documents |
| GAN Synthetic Faces | ⭐⭐⭐ | Medium | ⭐⭐⭐⭐ | Fake accounts, fake review farms |
| Face Swap | ⭐⭐⭐⭐ | Medium | ⭐⭐⭐⭐ | Political disinformation, non-consensual sexual content |
| Voice Cloning | ⭐⭐⭐ | Low | ⭐⭐⭐⭐⭐ | Fraud, political interference |
| Multimodal Deepfake | ⭐⭐⭐⭐⭐ | High | ⭐⭐⭐⭐⭐ | Corporate fraud, high-value deception |
Three Types of Cheapfakes
The term "cheapfake" was popularized by journalist Nina Schick and Sam Gregory (WITNESS media watchdog) to describe manipulated media created using simple, low-cost techniques without AI.
- Speed manipulation: Adjust playback speed (usually slowing to 70-80%). Effect: The subject appears drunk or mentally sluggish. Detection: Listen to voice pitch (slower speed = lower pitch) and background sounds (ambient audio abnormally low-pitched); a rough automated pitch check is sketched after this list.
- Context stripping: Excerpt a clip from its surrounding context so that a politician's or expert's words appear to mean something entirely different. Detection: Search for the original full video and check the surrounding context.
- Loop editing: Cut a few-second clip into a seamless loop to make viewers believe an event lasted much longer (common in crowd violence, explosions, protest scenes). Detection: Carefully watch for repeating objects in the background (cloud movement, crowd positioning).
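To illustrate the pitch heuristic from the speed-manipulation item above, the sketch below compares the median vocal pitch of a suspect clip against a known-authentic clip of the same speaker. It assumes the manipulation was a plain resampling slowdown, which lowers pitch by the same factor as the speed change, rather than a pitch-preserving time stretch; the file names and the 0.9 threshold are illustrative assumptions.

```python
# Sketch: compare median vocal pitch of a suspect clip against a known-authentic
# clip of the same speaker. A simple resampling slowdown lowers pitch by the
# same factor as the speed change (assumes no pitch-preserving time stretch).
# File names and the 0.9 threshold are hypothetical.
import librosa
import numpy as np

def median_f0(path: str) -> float:
    y, sr = librosa.load(path, sr=None)
    f0, voiced, _ = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr
    )
    return float(np.nanmedian(f0[voiced]))  # median pitch over voiced frames

ratio = median_f0("suspect_clip.wav") / median_f0("authentic_clip.wav")
print(f"pitch ratio: {ratio:.2f}")
if ratio < 0.9:  # e.g. 0.7-0.8 is consistent with a slowdown to 70-80% speed
    print("Possible slowdown; pitch is noticeably lower than the reference.")
```

A ratio near 0.7-0.8 matches the slowdowns described above; a ratio near 1.0 proves nothing either way, since the editor may have applied pitch correction.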
GAN Synthetic Faces: Identifying "People Who Don't Exist"
GAN models like StyleGAN, and more recently diffusion models like Stable Diffusion, can generate highly realistic photos of "people who don't exist." These images are widely used to create fake social media accounts, fake review farms, and forged expert credentials.
- Asymmetric ears: GAN faces often have oddly shaped ears, or clearly asymmetric left-right ears
- Abnormal background: Background objects may merge, straight lines curve, objects "disappear"
- Inconsistent eye catchlights: Real eyes show nearly identical catchlights (reflections of the light source) in both eyes; GAN faces often render a different catchlight in each eye (a rough automated check is sketched after this list)
- Hair and teeth anomalies: Fine strands of hair may merge into blobs; teeth may have wrong count or abnormally perfect edges
- Necklaces and glasses: These two items are where GANs most often fail; they may come out asymmetric or bizarrely shaped
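One of these tells, the catchlight comparison, lends itself to a quick automated screen. This sketch uses OpenCV's stock Haar eye cascade to locate both eyes, takes the brightest pixel in each eye crop as the specular highlight, and compares the highlights' normalized positions; "face.jpg" and the 0.25 offset threshold are assumptions. A Haar cascade is a crude detector, so treat the output as a prompt for closer manual inspection, not a verdict.

```python
# Sketch of the catchlight check: locate both eyes with OpenCV's stock Haar
# cascade, find the brightest point (the specular highlight) in each eye crop,
# and compare its normalized position. Eyes lit by the same source show
# catchlights in nearly the same spot; a large offset is a GAN warning sign.
# "face.jpg" and the 0.25 threshold are illustrative assumptions.
import cv2
import numpy as np

img = cv2.imread("face.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_eye.xml")
eyes = sorted(cascade.detectMultiScale(gray, 1.1, 5), key=lambda e: -e[2] * e[3])[:2]

def catchlight_pos(x, y, w, h):
    crop = gray[y : y + h, x : x + w]
    _, _, _, max_loc = cv2.minMaxLoc(crop)             # brightest pixel = highlight
    return np.array([max_loc[0] / w, max_loc[1] / h])  # normalize to eye size

if len(eyes) == 2:
    offset = np.linalg.norm(catchlight_pos(*eyes[0]) - catchlight_pos(*eyes[1]))
    print(f"catchlight offset: {offset:.2f}")
    if offset > 0.25:
        print("Catchlights disagree between eyes; inspect the image further.")
```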
How to Detect Face Swap Deepfakes
Face swap deepfakes use deep learning to "paste" one person's facial features onto another person's body in a video. Common tools include DeepFaceLab, FaceSwap, and various NVIDIA face-swapping models.
- Facial boundary halos: Face swap edges often show semi-transparent "halos" during lighting changes, especially in profile views, low light, or fast movement
- Abnormal blink rate: Early deepfakes rarely blinked (training sets contain few closed-eye images); modern deepfakes may blink excessively (a blink-rate sketch follows this list)
- Head rotation artifacts: When the head rapidly turns beyond 45 degrees to the side, facial rendering quality visibly degrades
- Skin tone boundaries: Under different lighting, the swapped face's skin tone may not match the neck or ears
- Lip sync mismatch: Especially in specific languages (like Chinese), lip movement may not perfectly match the audio
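The blink-rate tell above can be checked semi-automatically with the widely used eye aspect ratio (EAR), which collapses toward zero when the eye closes. The sketch below counts blinks over a video using MediaPipe FaceMesh; the landmark indices and the 0.2 EAR threshold are common conventions rather than anything specific to this chapter, and "suspect.mp4" is a hypothetical input. Healthy adults blink roughly 15-20 times per minute, so a rate near zero or wildly above that range is a warning sign.

```python
# Sketch of a blink-rate check using the eye aspect ratio (EAR) over a video.
# The landmark indices and 0.2 threshold are common conventions and may need
# tuning; "suspect.mp4" is a hypothetical input.
import cv2
import mediapipe as mp
import numpy as np

LEFT_EYE = [33, 160, 158, 133, 153, 144]  # p1..p6 around one eye (FaceMesh)

def ear(pts):
    # EAR = (|p2-p6| + |p3-p5|) / (2 * |p1-p4|); drops near zero when closed
    return (np.linalg.norm(pts[1] - pts[5]) + np.linalg.norm(pts[2] - pts[4])) / (
        2 * np.linalg.norm(pts[0] - pts[3])
    )

cap = cv2.VideoCapture("suspect.mp4")
fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
blinks, closed, frames = 0, False, 0
with mp.solutions.face_mesh.FaceMesh(max_num_faces=1) as mesh:
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frames += 1
        res = mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if not res.multi_face_landmarks:
            continue
        lm = res.multi_face_landmarks[0].landmark
        pts = np.array([(lm[i].x, lm[i].y) for i in LEFT_EYE])
        if ear(pts) < 0.2:           # eye closed this frame
            closed = True
        elif closed:                 # eye reopened: count one blink
            blinks, closed = blinks + 1, False
print(f"{blinks / (frames / fps) * 60:.1f} blinks per minute")
```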
AI Voice Cloning: When the Phone Isn't Who You Think
Modern AI voice cloning tools (such as ElevenLabs, Coqui TTS, and OpenAI's Voice Engine) need only a 3-5 second voice sample to generate a convincing clone, at virtually zero cost. This dramatically lowers the technical barrier to phone fraud.
- Abnormal breathing rhythm: AI voices often lack natural breath sounds, or breathing occurs at unnatural sentence positions
- Overly flat prosody: Emotional passages (anger, excitement, sadness) show less natural pitch variation than real human speech, as if the speaker were reading a script (a pitch-spread check is sketched after this list)
- Background audio splicing: Volume or audio quality may shift slightly between AI-synthesized speech segments and the ambient background sound
- Specific pronunciation errors: Chinese dialects, Taiwanese, regional accents, and technical terms are where AI voice cloning most often fails
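The "overly flat prosody" tell can be quantified as pitch spread. The sketch below measures the standard deviation of vocal pitch in semitones over a recording; natural emotional speech typically swings across several semitones, so an unusually small spread is consistent with the script-reading monotone described above. The 2-semitone threshold and "call_recording.wav" are illustrative assumptions, not calibrated values.

```python
# Sketch of a "flat prosody" check: measure pitch variation in semitones
# relative to the speaker's median pitch. The 2-semitone threshold and the
# file name are illustrative assumptions.
import librosa
import numpy as np

y, sr = librosa.load("call_recording.wav", sr=None)
f0, voiced, _ = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr
)
semitones = 12 * np.log2(f0[voiced] / np.nanmedian(f0[voiced]))
spread = np.nanstd(semitones)
print(f"pitch spread: {spread:.1f} semitones")
if spread < 2.0:
    print("Unusually flat prosody; treat the caller's identity as unverified.")
```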
Multimodal Deepfakes: The Most Dangerous Combination Attack
Single-modality deepfakes (visual only, or voice only) are relatively easy to detect. But when attackers fake the visual, audio, and text modalities simultaneously, the three channels mutually "confirm" each other, dramatically improving the deception's success rate. This "multimodal deepfake attack" is currently the most technically mature and dangerous form of deepfake.
Typical attack flow: Attackers first collect public video and audio of the target (e.g., a corporate executive), use face swap to generate a deepfake video, use voice cloning to generate matching audio, and forge an email to "confirm" the instructions. The victim sees the video, hears the voice, and receives the email; because all three channels point to the same instruction, the victim believes it is authentic.
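Because every one of those channels can be spoofed, the practical defense is procedural rather than forensic: high-risk requests are approved only after confirmation through an independent channel the attacker does not control, such as a callback to a number from the company directory. The sketch below encodes that rule; all request types and channel labels are illustrative assumptions.

```python
# Minimal sketch of an out-of-band verification rule for high-risk requests.
# Request types and channel labels are illustrative assumptions.
HIGH_RISK = {"wire_transfer", "credential_reset", "vendor_change"}
SPOOFABLE = {"video", "voice", "email"}  # channels a deepfake attacker can fake

def needs_independent_verification(request_type: str, channels: set[str]) -> bool:
    """Return True if the request must still be confirmed out of band.

    Agreement among spoofable channels adds no trust: a video call, a voice
    message, and an email all "confirming" the same instruction can come
    from a single multimodal deepfake attack.
    """
    if request_type not in HIGH_RISK:
        return False
    # Approved only if at least one non-spoofable channel confirmed it,
    # e.g. a callback to a directory number or an in-person check.
    return not (channels - SPOOFABLE)

print(needs_independent_verification("wire_transfer", {"video", "voice", "email"}))       # True
print(needs_independent_verification("wire_transfer", {"video", "directory_callback"}))   # False
```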
Case Studies
About three weeks after Russia's invasion of Ukraine, a deepfake video began circulating on Telegram, Facebook, and Twitter in which "Ukrainian President Zelensky" ordered soldiers to lay down their weapons. The video first appeared in a hacked Ukrainian TV station's live broadcast, then spread massively on social media.
Technical analysis showed this was a low-quality face swap deepfake: the head was disproportionately large relative to the shoulders (a typical sign of insufficient training data), and a clear semi-transparent halo appeared at the face-neck boundary, especially when the head turned. The voice was pitched about half a semitone higher than the real Zelensky's and lacked his distinctive speech cadence.
Both Meta and YouTube flagged and removed the video within hours. The real Zelensky immediately released a rebuttal video filmed outside government buildings, emphasizing "We are here, I have not surrendered." Ukrainian fact-checking organization StopFake published a verification report within 90 minutes of the video appearing.
This case illustrates an important principle: even technically crude deepfakes can be effective under the right social conditions (wartime panic). Defense strategy: For any video involving major political decisions or high-consequence statements such as "surrender" or "attack," wait for confirmation from official media and government channels rather than judging by the first social media source.
Beginning in November 2023, Taiwan was flooded with deepfake video ads featuring celebrities (including TSMC founder Morris Chang, Terry Gou, Lai Ching-te, and others), distributed at scale through the YouTube and Facebook ad systems. These videos claimed the celebrities "personally endorsed" investment platforms promising high returns, and they precisely targeted retirees.
Related fraud cases are estimated to have caused over NT$1 billion in losses to Taiwanese citizens, with multiple victims losing their life savings. Technical analysis showed these videos shared common tells:
- Lip movement not perfectly synced with the Chinese dubbing
- Slight "melting" at hair edges when the head turns sideways
- Direct eye contact with almost no blinking
- Unnaturally applied background blur
In a 2024 ruling, the New Taipei District Court accepted AI forensic reports as supporting evidence confirming that the defendant had used deepfake technology to produce the advertising videos, and sentenced the defendant to 4 years and 6 months in prison. This was among the first deepfake fraud convictions in Taiwan to rely on AI forensic reports as key supporting evidence.
The ad systems' "trust endorsement effect" makes these deepfakes especially dangerous: appearing in YouTube or Facebook ads leads people to assume the platform has already vetted the ad. The most effective defense: For any celebrity "investment endorsement," look for confirmation on the celebrity's official account or in mainstream media coverage. Any "endorsement" that never appeared through official channels should be treated as a scam.