MatchDiffusion:
Training-free Generation of Match-Cuts



MatchDiffusion Results

Autumn leaf falling → Butterfly fluttering
Flower blooming → Fireworks bursting over a calm lake
Circular highway interchange → Ice skating in circles on a frozen lake
Whiskey bottle on wooden table → Cozy cabin in snowy woods
Waves lapping on shore → Ants marching on forest floor

MatchDiffusion enables the automatic generation of Match-Cuts, i.e., transitions between visually consistent yet semantically distinct scenes. MatchDiffusion is a training-free approach that employs joint and disjoint diffusion stages to balance coherence and divergence, producing high-quality transitions suitable for diverse artistic needs.


What Are Match-Cuts?

A Match-Cut is a film editing technique where two shots are seamlessly connected by matching their visual, structural, or conceptual elements. These cuts guide the viewer’s attention and create a smooth transition while maintaining narrative or aesthetic coherence. Match-Cuts are often used to highlight symbolic relationships or temporal continuity between scenes.

Match-Cuts from The Gentlemen (TV series)
Match-Cuts from So Not Worth It (TV series)
Match-Cuts from Breaking Bad (TV series)

MatchDiffusion

Given two prompts (\(\rho'\), \(\rho''\)) describing different scenes, our goal is to generate a pair of videos (\(x'\), \(x''\)) that adhere to their respective conditions while remaining visually cohesive for match-cut transitions. The two videos can be combined in a match-cut, for instance, by joining the first half of \(x'\) with the second half of \(x''\). This approach enables a smooth transition between different semantic scenes while preserving consistent visual characteristics. Previous works have observed how diffusion models inherently establish general structure and color patterns in early denoising stages, with finer details and prompt-specific textures emerging in later stages.

We build on this property to generate videos amenable to match-cuts. In particular, we propose MatchDiffusion, a two-stage training-free pipeline comprising: (1) Joint Diffusion, where we establish a shared visual structure based on both prompts, followed by (2) Disjoint Diffusion, where each video independently develops the semantics corresponding to its prompt.

Pipeline Figure

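The pseudocode below is a minimal sketch of this two-stage procedure, not the exact implementation: it assumes a generic per-step denoiser eps_model(latent, t, emb) and a deterministic update ddim_update(latent, eps, t, t_prev) (both hypothetical placeholders), and it realizes the joint stage by averaging the two conditional noise predictions on a single shared latent, which is one plausible way to establish a structure compatible with both prompts.

```python
# Minimal sketch of the two-stage MatchDiffusion sampling loop (not the official code).
# Hypothetical placeholders: eps_model(latent, t, emb) predicts noise for one prompt
# embedding, and ddim_update(latent, eps, t, t_prev) applies one denoising update.
import torch

def match_diffusion(eps_model, ddim_update, timesteps, emb_a, emb_b, K, shape, device="cuda"):
    """Return a pair of video latents that share early structure but diverge semantically.

    timesteps : descending diffusion timesteps, e.g. [999, ..., 0]
    K         : number of Joint Diffusion steps (0 <= K < len(timesteps))
    """
    x = torch.randn(shape, device=device)  # one shared starting noise for both videos

    # Stage 1: Joint Diffusion -- a single latent is denoised under both prompts at once.
    # Averaging the two conditional noise predictions (an assumption of this sketch)
    # keeps the emerging layout and color structure compatible with both scenes.
    for i in range(K):
        eps = 0.5 * (eps_model(x, timesteps[i], emb_a) + eps_model(x, timesteps[i], emb_b))
        x = ddim_update(x, eps, timesteps[i], timesteps[i + 1])

    # Stage 2: Disjoint Diffusion -- duplicate the latent; each copy develops the
    # semantics of its own prompt on top of the shared structure.
    x_a, x_b = x.clone(), x.clone()
    for i in range(K, len(timesteps) - 1):
        x_a = ddim_update(x_a, eps_model(x_a, timesteps[i], emb_a), timesteps[i], timesteps[i + 1])
        x_b = ddim_update(x_b, eps_model(x_b, timesteps[i], emb_b), timesteps[i], timesteps[i + 1])
    return x_a, x_b

def make_match_cut(video_a, video_b):
    # Join the first half of x' with the second half of x'' (frames along dim 0).
    half = video_a.shape[0] // 2
    return torch.cat([video_a[:half], video_b[half:]], dim=0)
```

The only new hyperparameter in this sketch is K, which controls how long the two videos evolve together and therefore how strongly they match (see the analysis of K below).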

Reproducing Kubrick's Famous Match-Cuts

Our pipeline enabled us to reproduce what is arguably the most famous Match-Cut of all time: the bone-to-spaceship transition in Stanley Kubrick’s 2001: A Space Odyssey. This iconic moment seamlessly transitions from a prehistoric tool to a futuristic spacecraft, symbolizing the leap in human evolution. Below, we present the original Match-Cut alongside our reproduction using MatchDiffusion.

Original: Kubrick's Bone-to-Spaceship
Reproduction: MatchDiffusion

More Results from MatchDiffusion

Below, we showcase additional results generated by MatchDiffusion, demonstrating its ability to produce seamless and visually compelling Match-Cuts across a variety of prompts and transitions.

Guacamaya soaring through jungle → Colombian flag waving on hilltop
Rain on Amazon foliage → Oasis spring in desert
Volcanic crater from above → Eye blinking in close-up
Latte art bear → Bear peeking from behind tree
Sailboat gliding across lake → Hawk soaring over canyon
Driftwood in river → Sand dunes shifting in desert
Ember in a campfire → City skyline lighting up at dusk
Highway panning → Pier gliding along ocean waves
Petra's glowing canyon → Sunlit stained glass in cathedral
Barracuda slicing through water → Missile streaking through sky
Falling orange leaf → Paw print etched in ground
Snake weaving through grass → Amazon River winding through rainforest
Kaleidoscope turning → Blooming garden in spring
Ocean vortex swirling → Eye resembling a whirlpool
Vintage steam train → Tranquil river winding through forest
Wind sweeping wildflowers → Ripples spreading across pond
Market stall of spices → Painter mixing oil colors
Leaves rustling in autumn → Confetti swirling in celebration
Burning matchstick → Rocket launching
Lighthouse beam on ocean → Car headlights cutting through fog


Comparison with Baselines

Below, we compare results from MatchDiffusion with three baseline methods across three sample prompts. Each row corresponds to a prompt, and each column shows the results generated by a specific method.

Cog Vid-to-Vid

SMM

MOFT

MatchDiffusion (Ours)



Sampling with MatchDiffusion

Below, we show several alternatives for the same match-cut, obtained by sampling with different seeds.
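
As a concrete usage sketch, reusing the hypothetical match_diffusion and make_match_cut functions from above, alternative match-cuts for the same prompt pair can be drawn simply by reseeding the shared starting noise (the seed values, K, and latent shape below are arbitrary):

```python
import torch

# Draw several candidate match-cuts for one prompt pair by varying the random seed.
# eps_model, ddim_update, timesteps, emb_a, emb_b are the placeholders from the
# sketch above; K and the latent shape are illustrative values only.
candidates = []
for seed in (0, 7, 42, 1234):
    torch.manual_seed(seed)  # controls the shared starting noise of the Joint stage
    x_a, x_b = match_diffusion(eps_model, ddim_update, timesteps,
                               emb_a, emb_b, K=25, shape=(16, 4, 32, 32))
    candidates.append(make_match_cut(x_a, x_b))
```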




Analysis of the Number of Joint Steps (K)

Below, we show the effect of the number of Joint Diffusion steps K on three different samples. With K = 0 the two videos are generated fully independently and share no structure, while larger values of K keep them locked to the shared structure for longer, trading semantic divergence for visual coherence.

K = 0

K = 10

K = 20

K = 50




Boundary Frames Visualization

Below, we visualize the boundary frames for four different samples. The left image corresponds to the last frame of the first video, and the right image corresponds to the first frame of the second video in the Match-Cuts.

Boundary frame pairs for Samples 1–4 (last frame of Video 1, first frame of Video 2).

Stable Diffusion 1.5 + MatchDiffusion

We also experimented with applying the two MatchDiffusion stages, Joint + Disjoint, to the Stable Diffusion 1.5 model. Below, we visualize the resulting image pairs. We observe a similar pattern: the two images share an overall structure while being semantically divergent.

Sixteen image pairs generated with Stable Diffusion 1.5 + MatchDiffusion (one image per prompt in each pair).
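
As a rough illustration of this image-level experiment, the sketch below applies the same Joint + Disjoint recipe to Stable Diffusion 1.5 through the diffusers library. It is our own approximation rather than the paper's released code: the joint stage again averages the two classifier-free-guided noise predictions on one shared latent (an assumption), and the prompts, K, step count, and guidance scale are illustrative.

```python
# Sketch: Joint + Disjoint denoising with Stable Diffusion 1.5 (diffusers).
# Approximation for illustration only; prompts, K, steps, and guidance are arbitrary.
import torch
from diffusers import DDIMScheduler, StableDiffusionPipeline

device = "cuda"
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to(device)
# DDIM steps are stateless, so one scheduler can safely drive both latents.
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)

def encode(prompt):
    tokens = pipe.tokenizer(prompt, padding="max_length",
                            max_length=pipe.tokenizer.model_max_length,
                            truncation=True, return_tensors="pt")
    return pipe.text_encoder(tokens.input_ids.to(device))[0]

def guided_eps(latent, t, cond, uncond, scale=7.5):
    # Classifier-free guidance for a single conditioning.
    inp = pipe.scheduler.scale_model_input(torch.cat([latent] * 2), t)
    eps = pipe.unet(inp, t, encoder_hidden_states=torch.cat([uncond, cond])).sample
    eps_u, eps_c = eps.chunk(2)
    return eps_u + scale * (eps_c - eps_u)

prompt_a, prompt_b = "an ocean vortex swirling", "a close-up eye resembling a whirlpool"
cond_a, cond_b, uncond = encode(prompt_a), encode(prompt_b), encode("")

steps, K = 50, 20                                  # K joint steps, then disjoint
pipe.scheduler.set_timesteps(steps, device=device)
latent = torch.randn((1, pipe.unet.config.in_channels, 64, 64),
                     device=device, dtype=torch.float16) * pipe.scheduler.init_noise_sigma

with torch.no_grad():
    for i, t in enumerate(pipe.scheduler.timesteps):
        if i == K:                                 # switch to Disjoint Diffusion
            latent_a, latent_b = latent.clone(), latent.clone()
        if i < K:                                  # Joint Diffusion on one shared latent
            eps = 0.5 * (guided_eps(latent, t, cond_a, uncond)
                         + guided_eps(latent, t, cond_b, uncond))
            latent = pipe.scheduler.step(eps, t, latent).prev_sample
        else:                                      # each latent follows its own prompt
            latent_a = pipe.scheduler.step(
                guided_eps(latent_a, t, cond_a, uncond), t, latent_a).prev_sample
            latent_b = pipe.scheduler.step(
                guided_eps(latent_b, t, cond_b, uncond), t, latent_b).prev_sample

    # Decode the two structurally matched, semantically divergent images.
    images = pipe.vae.decode(
        torch.cat([latent_a, latent_b]) / pipe.vae.config.scaling_factor).sample
```

With K = 0 this reduces to two independent Stable Diffusion samples; the shared-structure effect discussed above appears only when a non-trivial number of joint steps is used.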

Citation

@InProceedings{Pardo2024,
  title = {MatchDiffusion: Training-free Generation of Match-Cuts},
  author = {Pardo, Alejandro and Pizzati, Fabio and Zhang, Tong and Pondaven, Alexander and Torr, Philip and Perez, Juan Camilo and Ghanem, Bernard},
  booktitle = {ArXiv Preprint},
  month = {November},
  year = {2024},
}