MatchDiffusion enables the automatic generation of Match-Cuts, i.e., transitions between visually consistent yet semantically distinct scenes. MatchDiffusion is a training-free approach that employs joint and disjoint diffusion stages to balance coherence and divergence, producing high-quality transitions suitable for diverse artistic needs.
A Match-Cut is a film editing technique in which two shots are seamlessly connected by matching their visual, structural, or conceptual elements. These cuts guide the viewer's attention and create a smooth transition while maintaining narrative or aesthetic coherence. Match-Cuts are often used to highlight symbolic relationships or temporal continuity between scenes. For more examples of Match-Cuts, visit https://eyecannndy.com/technique/match-cut.
Given two prompts (\(\rho'\), \(\rho''\)) describing different scenes, our goal is to generate a pair of videos (\(x'\), \(x''\)) that adhere to their respective conditions while remaining visually cohesive enough for a match-cut transition. The two videos can be combined into a match-cut, for instance, by joining the first half of \(x'\) with the second half of \(x''\), as shown in the sketch below. This yields a smooth transition between different semantic scenes while preserving consistent visual characteristics. Previous works have observed that diffusion models inherently establish the general structure and color patterns in the early denoising stages, with finer details and prompt-specific textures emerging in the later stages.
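To make the combination step concrete, here is a minimal sketch of the joining operation. The function name, tensor layout, and the assumption that both videos have the same number of frames are illustrative choices, not part of the paper's specification:

```python
import torch

def assemble_match_cut(x_prime: torch.Tensor, x_double_prime: torch.Tensor) -> torch.Tensor:
    """Join the first half of x' with the second half of x''.

    Assumes both videos are tensors of shape (frames, H, W, C)
    with the same number of frames; the cut happens at the midpoint.
    """
    n = x_prime.shape[0]
    cut = n // 2
    # Concatenate along the frame axis: frames [0, cut) come from x',
    # frames [cut, n) come from x''.
    return torch.cat([x_prime[:cut], x_double_prime[cut:]], dim=0)
```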
We build on this coarse-to-fine behavior of the denoising process to generate videos amenable to match-cuts. In particular, we propose MatchDiffusion, a two-stage, training-free pipeline comprising: (1) Joint Diffusion, where we set up a shared visual structure based on both prompts, followed by (2) Disjoint Diffusion, where each video independently develops the semantics corresponding to its prompt.
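The sketch below illustrates this two-stage procedure. The interfaces `model(latent, t, prompt)` and `scheduler.step(eps, t, latent)` are hypothetical stand-ins for a video diffusion denoiser and sampler, and averaging the two prompt-conditioned noise predictions in the joint stage is one plausible way to realize the shared structure, not necessarily the paper's exact formulation:

```python
import torch

def match_diffusion(model, scheduler, prompt_a, prompt_b,
                    num_steps=50, joint_steps=20):
    """Two-stage MatchDiffusion sampling sketch.

    Assumes `model(latent, t, prompt)` predicts the noise residual and
    `scheduler.step(eps, t, latent)` returns the latent at the previous
    timestep, as in DDIM-style samplers.
    """
    # Both videos start from the SAME Gaussian noise sample.
    latent = torch.randn(1, 4, 16, 64, 64)  # (batch, channels, frames, H, W)
    timesteps = scheduler.timesteps[:num_steps]

    # Stage 1: Joint Diffusion -- a single latent is denoised under BOTH
    # prompts, so coarse structure, layout, and color are shared.
    for t in timesteps[:joint_steps]:
        eps_a = model(latent, t, prompt_a)
        eps_b = model(latent, t, prompt_b)
        latent = scheduler.step((eps_a + eps_b) / 2, t, latent)

    # Stage 2: Disjoint Diffusion -- the shared latent is duplicated and
    # each copy is refined independently with its own prompt, letting
    # prompt-specific semantics and textures diverge.
    latent_a, latent_b = latent.clone(), latent.clone()
    for t in timesteps[joint_steps:]:
        latent_a = scheduler.step(model(latent_a, t, prompt_a), t, latent_a)
        latent_b = scheduler.step(model(latent_b, t, prompt_b), t, latent_b)

    return latent_a, latent_b  # decode each with the VAE to obtain x', x''
```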
Our pipeline enabled us to reproduce what is arguably the most famous Match-Cut of all time: the bone-to-spaceship transition in Stanley Kubrick's 2001: A Space Odyssey. This iconic moment seamlessly transitions from a prehistoric tool to a futuristic spacecraft, symbolizing the leap in human evolution. Below, we present the original Match-Cut alongside our reproduction using MatchDiffusion.
Below, we showcase additional results generated by MatchDiffusion, demonstrating its ability to produce seamless and visually compelling Match-Cuts across a variety of prompts and transitions.
Below, we compare results from MatchDiffusion with three baseline methods across three sample prompts. Each row corresponds to a prompt, and each column shows the results generated by a specific method.
Below, we show several alternatives for the same Match-Cut, obtained by sampling with different seeds.
Below, we show the effect of the number of shared (joint) denoising steps on three different samples.
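In terms of the hypothetical sketch above, this ablation amounts to sweeping `joint_steps` while fixing the initial noise (the specific values, `model`, `scheduler`, and prompts are illustrative):

```python
for joint_steps in (5, 20, 35):
    torch.manual_seed(0)  # same initial noise across all settings
    # More shared steps -> stronger structural alignment between the two
    # videos; fewer shared steps -> earlier divergence and weaker matching.
    x_a, x_b = match_diffusion(model, scheduler, prompt_a, prompt_b,
                               num_steps=50, joint_steps=joint_steps)
```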
Below, we visualize the boundary frames for four different samples. The left image corresponds to the last frame of the first video, and the right image to the first frame of the second video in the Match-Cut.
We also experimented with applying the two MatchDiffusion stages, Joint + Disjoint Diffusion, to the Stable Diffusion 1.5 image model. Below, we visualize the resulting image pairs. We observe a similar pattern: the two images share overall structure while being semantically divergent.
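As an illustration of this adaptation, below is a minimal sketch using Hugging Face `diffusers`. The model ID, prompts, step counts, and the choice of averaging the two prompt-conditioned noise predictions in the joint stage are all assumptions for illustration, and classifier-free guidance is omitted for brevity; this is not the paper's exact implementation:

```python
import torch
from diffusers import StableDiffusionPipeline, DDIMScheduler

# Load SD 1.5 (model ID assumed) and use a deterministic DDIM schedule.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
pipe.scheduler.set_timesteps(50)

def encode(prompt):
    # CLIP text embeddings for a single prompt (no classifier-free guidance).
    tokens = pipe.tokenizer(
        prompt, padding="max_length",
        max_length=pipe.tokenizer.model_max_length,
        truncation=True, return_tensors="pt",
    )
    return pipe.text_encoder(tokens.input_ids.to("cuda"))[0]

emb_a = encode("a bone spinning against the sky")      # illustrative prompt
emb_b = encode("a spaceship drifting through space")   # illustrative prompt

with torch.no_grad():
    # Shared starting noise in SD 1.5's 4x64x64 latent space.
    latent = torch.randn(1, 4, 64, 64, device="cuda", dtype=torch.float16)
    joint_steps, timesteps = 20, pipe.scheduler.timesteps

    # Joint Diffusion: one latent, both prompts (averaged noise predictions).
    for t in timesteps[:joint_steps]:
        eps_a = pipe.unet(latent, t, encoder_hidden_states=emb_a).sample
        eps_b = pipe.unet(latent, t, encoder_hidden_states=emb_b).sample
        latent = pipe.scheduler.step((eps_a + eps_b) / 2, t, latent).prev_sample

    # Disjoint Diffusion: duplicate the latent, refine each copy independently.
    lat_a, lat_b = latent.clone(), latent.clone()
    for t in timesteps[joint_steps:]:
        eps_a = pipe.unet(lat_a, t, encoder_hidden_states=emb_a).sample
        eps_b = pipe.unet(lat_b, t, encoder_hidden_states=emb_b).sample
        lat_a = pipe.scheduler.step(eps_a, t, lat_a).prev_sample
        lat_b = pipe.scheduler.step(eps_b, t, lat_b).prev_sample

    # Decode both latents to images with the VAE.
    images = pipe.vae.decode(
        torch.cat([lat_a, lat_b]) / pipe.vae.config.scaling_factor
    ).sample
```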
@InProceedings{Pardo2024,
title = {MatchDiffusion: Training-free Generation of Match-Cuts},
author = {Pardo, Alejandro and Pizzati, Fabio and Zhang, Tong and Pondaven, Alexander and Torr, Philip and Perez, Juan Camilo and Ghanem, Bernard},
booktitle = {ArXiv Preprint},
month = {November},
year = {2024},
}