MatchDiffusion:
Training-free Generation of Match-Cuts



MatchDiffusion Results

Autumn leaf falling → Butterfly fluttering
Flower blooming → Fireworks bursting over a calm lake
Circular highway interchange → Ice skating in circles on a frozen lake
Whiskey bottle on wooden table → Cozy cabin in snowy woods
Waves lapping on shore → Ants marching on forest floor

MatchDiffusion enables the automatic generation of Match-Cuts, i.e., pairs of visually consistent yet semantically distinct scenes. MatchDiffusion is a training-free approach that employs joint and disjoint diffusion stages to balance coherence and divergence, producing high-quality transitions suitable for diverse artistic needs.


What Are Match-Cuts?

A Match-Cut is a film editing technique where two shots are seamlessly connected by matching their visual, structural, or conceptual elements. These cuts guide the viewer’s attention and create a smooth transition while maintaining narrative or aesthetic coherence. Match-Cuts are often used to highlight symbolic relationships or temporal continuity between scenes. For more Match-Cut examples, visit https://eyecannndy.com/technique/match-cut.

Match-Cut from Turkish Airlines Advertisement
Match-Cut from So Not Worth It (TV series)
Match-Cut from Delta Airlines Advertisement

MatchDiffusion

Given two prompts (\(\rho'\), \(\rho''\)) describing different scenes, our goal is to generate a pair of videos (\(x'\), \(x''\)) that adhere to their respective prompts while remaining visually cohesive enough for a match-cut transition. The two videos can then be combined into a match-cut, for instance, by joining the first half of \(x'\) with the second half of \(x''\). This enables a smooth transition between semantically different scenes while preserving consistent visual characteristics. Previous works have observed that diffusion models establish general structure and color patterns in the early denoising stages, with finer details and prompt-specific textures emerging in later stages.
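
As a minimal sketch of the combination step described above, the snippet below joins the first half of one video with the second half of another. It assumes both videos are NumPy arrays of identical shape with frames along the first axis; the function name `assemble_match_cut` and the array layout are illustrative assumptions, not part of the MatchDiffusion code.

```python
import numpy as np


def assemble_match_cut(video_a: np.ndarray, video_b: np.ndarray) -> np.ndarray:
    """Join the first half of video_a with the second half of video_b.

    Hypothetical helper: both inputs are assumed to have shape
    (frames, height, width, channels) and the same number of frames.
    """
    if video_a.shape != video_b.shape:
        raise ValueError("both videos must have the same shape")
    cut = video_a.shape[0] // 2  # index of the first frame taken from video_b
    return np.concatenate([video_a[:cut], video_b[cut:]], axis=0)


# Toy usage: an all-black clip cut into an all-white clip.
x_prime = np.zeros((8, 2, 2, 3))
x_double_prime = np.ones((8, 2, 2, 3))
match_cut = assemble_match_cut(x_prime, x_double_prime)
```

The midpoint is only one possible choice of cut; any frame index could be used, since the two videos are meant to stay visually aligned throughout.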

We build on this property to generate videos amenable to match-cuts. In particular, we propose MatchDiffusion, a two-stage training-free pipeline comprising: (1) Joint Diffusion, where we set up a shared visual structure based on both prompts, followed by (2) Disjoint Diffusion, where each video independently develops the semantics corresponding to its prompt.
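
The two-stage idea can be illustrated with a toy sketch. Everything here is a simplifying assumption made for illustration: the "denoiser" `toy_noise_pred` is a hypothetical stand-in that just pulls a sample toward a per-prompt target vector, the joint stage is assumed to average the two prompt-conditioned predictions on a single shared sample (the text above does not specify how the prompts are combined), and the update rule is a plain gradient-style step, not a real diffusion scheduler.

```python
import numpy as np


def toy_noise_pred(x: np.ndarray, prompt_target: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a prompt-conditioned denoiser."""
    return x - prompt_target


def match_diffusion_sketch(target_a, target_b, num_steps=50, joint_steps=20,
                           step_size=0.1, seed=0):
    """Toy two-stage sampler: joint_steps shared steps, then independent steps."""
    if not 0 <= joint_steps <= num_steps:
        raise ValueError("joint_steps must lie in [0, num_steps]")
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(np.shape(target_a))  # shared initial noise

    # Stage 1 (Joint Diffusion): one shared sample driven by both prompts.
    # Assumption: the two predictions are combined by a simple average.
    for _ in range(joint_steps):
        eps = 0.5 * (toy_noise_pred(x, target_a) + toy_noise_pred(x, target_b))
        x = x - step_size * eps

    # Stage 2 (Disjoint Diffusion): each path follows only its own prompt.
    x_a, x_b = x.copy(), x.copy()
    for _ in range(num_steps - joint_steps):
        x_a = x_a - step_size * toy_noise_pred(x_a, target_a)
        x_b = x_b - step_size * toy_noise_pred(x_b, target_b)
    return x_a, x_b


# Toy usage: two "prompts" represented by opposite target vectors.
target_a = np.full(4, 1.0)
target_b = np.full(4, -1.0)
x_a, x_b = match_diffusion_sketch(target_a, target_b, joint_steps=20)
```

In this toy setting, increasing `joint_steps` leaves fewer independent steps, so the two outputs end up closer together; with `joint_steps` equal to `num_steps` they are identical. This mirrors the trade-off between shared structure and semantic divergence that the two stages are designed to balance.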

Pipeline Figure

Reproducing Kubrick's Famous Match-Cut

Our pipeline enabled us to reproduce what is arguably the most famous Match-Cut of all time: the bone-to-spaceship transition in Stanley Kubrick’s 2001: A Space Odyssey. This iconic moment seamlessly transitions from a prehistoric tool to a futuristic spacecraft, symbolizing the leap in human evolution. Below, we present the original Match-Cut alongside our reproduction using MatchDiffusion.

Original: Kubrick's Bone-to-Spaceship
Reproduction: MatchDiffusion

More Results from MatchDiffusion

Below, we showcase additional results generated by MatchDiffusion, demonstrating its ability to produce seamless and visually compelling Match-Cuts across a variety of prompts and transitions.

Guacamaya soaring through jungle → Colombian flag waving on hilltop
Rain on Amazon foliage → Oasis spring in desert
Volcanic crater from above → Eye blinking in close-up
Latte art bear → Bear peeking from behind tree
Sailboat gliding across lake → Hawk soaring over canyon
Driftwood in river → Sand dunes shifting in desert
Ember in a campfire → City skyline lighting up at dusk
Highway panning → Pier gliding along ocean waves
Petra's glowing canyon → Sunlit stained glass in cathedral
Barracuda slicing through water → Missile streaking through sky
Falling orange leaf → Paw print etched in ground
Snake weaving through grass → Amazon River winding through rainforest
Kaleidoscope turning → Blooming garden in spring
Ocean vortex swirling → Eye resembling a whirlpool
Vintage steam train → Tranquil river winding through forest
Wind sweeping wildflowers → Ripples spreading across pond
Market stall of spices → Painter mixing oil colors
Leaves rustling in autumn → Confetti swirling in celebration
Burning matchstick → Rocket launching
Lighthouse beam on ocean → Car headlights cutting through fog


Comparison with Baselines

Below, we compare results from MatchDiffusion with three baseline methods across three sample prompts. Each row corresponds to a prompt, and each column shows the results generated by a specific method.

Cog Vid-to-Vid

SMM

MOFT

MatchDiffusion (Ours)

Parchment sunset - V2V
Parchment sunset - SMM
Parchment sunset - MOFT
Parchment sunset - MatchDiff
Metro conveyor - V2V
Metro conveyor - SMM
Metro conveyor - MOFT
Metro conveyor - MatchDiff
City fridge - V2V
City fridge - SMM
City fridge - MOFT
City fridge - MatchDiff



Sampling with MatchDiffusion

Below, we show several alternatives for the same match-cuts, obtained by sampling with different seeds.

Fossil sample 43
Fossil sample 44
Fossil sample 60
Fossil sample 74
Ember sample 42
Ember sample 43
Ember sample 48
Ember sample 52
Flower sample 1000
Flower sample 1004
Flower sample 1014
Flower sample 1011
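
As a rough illustration of how seed-based sampling yields different yet reproducible alternatives, the sketch below draws the initial noise from a seeded generator. The function `initial_latent`, the latent shape, and the reading of the sample numbers above as random seeds are all assumptions made for illustration.

```python
import numpy as np


def initial_latent(seed: int, shape=(4, 8, 8)) -> np.ndarray:
    """Hypothetical helper: draw the starting noise for one sample from a seed."""
    return np.random.default_rng(seed).standard_normal(shape)


# Assumption: the numbers in the sample labels (e.g. 43, 44, 60, 74) are seeds.
seeds = [43, 44, 60, 74]
latents = {seed: initial_latent(seed) for seed in seeds}
```

Because the same seed always reproduces the same starting noise, a preferred alternative can be regenerated later by reusing its seed.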




Analysis of the Number of Joint Steps (K)

Below, we show how the number of joint (shared) steps K affects the result for three different samples.

K = 0

K = 10

K = 20

K = 50

Butterfly sample 0
Butterfly sample 10
Butterfly sample 20
Butterfly sample 50
Parchment sample 0
Parchment sample 10
Parchment sample 20
Parchment sample 50
Ripples sample 0
Ripples sample 10
Ripples sample 20
Ripples sample 50




Boundary Frames Visualization

Below, we visualize the boundary frames for four different samples. The left image corresponds to the last frame of the first video, and the right image corresponds to the first frame of the second video in the Match-Cut.

Boundary Frame 1 - After
First Frame (Video 2)
Boundary Frame 1 - Before
Last Frame (Video 1)
Boundary Frame 2 - After
First Frame (Video 2)
Boundary Frame 2 - Before
Last Frame (Video 1)
Boundary Frame 3 - After
First Frame (Video 2)
Boundary Frame 3 - Before
Last Frame (Video 1)
Boundary Frame 4 - After
First Frame (Video 2)
Boundary Frame 4 - Before
Last Frame (Video 1)

Stable Diffusion 1.5 + MatchDiffusion

We also experimented with applying the two MatchDiffusion stages (Joint and Disjoint Diffusion) to the Stable Diffusion 1.5 model. Below, we visualize the resulting image pairs. We observe a similar pattern: the two images share overall structure while being semantically divergent.

16 image pairs, each labeled Boundary Frame 1 - After / Boundary Frame 1 - Before

Citation

@InProceedings{Pardo2024,
  title = {MatchDiffusion: Training-free Generation of Match-Cuts},
  author = {Pardo, Alejandro and Pizzati, Fabio and Zhang, Tong and Pondaven, Alexander and Torr, Philip and Perez, Juan Camilo and Ghanem, Bernard},
  booktitle = {ArXiv Preprint},
  month = {November},
  year = {2024},
}