
Google’s ReCapture allows users to change camera angles in existing videos


Summary

Google researchers have developed ReCapture, a new AI technique that allows users to change camera movements in videos after recording. The system aims to provide professional video editing capabilities to casual users.

Changing camera angles in existing footage has traditionally been a challenge. Current methods often struggle to maintain complex motion and detail when processing different types of video content.

Instead of using an explicit 4D representation as an intermediate step, ReCapture draws on the motion knowledge stored in generative video models. The researchers reformulated the task as video-to-video translation using stable video diffusion.

Video: Zhang et al.

Two-stage process combines temporal and spatial layers

ReCapture works in two phases. First, an “anchor video” is created – an initial version of the desired output with new camera movements. This preliminary version may contain some timing inconsistencies and visual artifacts.

To generate the anchor video, the system can use multi-view diffusion models such as CAT3D, which create videos from multiple angles. Alternatively, the anchor can be generated through frame-by-frame depth estimation and point cloud rendering.
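
The paper doesn't ship code, but the point cloud route can be sketched in a few lines of NumPy: estimate a depth map per frame with an off-the-shelf monocular estimator, lift the pixels into a 3D point cloud, and re-project them through the new camera pose. Everything below is an illustrative sketch, not ReCapture's actual implementation:

```python
import numpy as np

def render_novel_view(frame, depth, K, T_new, eps=1e-6):
    """Re-render one video frame from a new camera pose.

    frame:  (H, W, 3) RGB image
    depth:  (H, W) per-pixel depth from a monocular estimator
    K:      (3, 3) camera intrinsics
    T_new:  (4, 4) rigid transform from the original to the new camera
    """
    H, W = depth.shape
    # Unproject every pixel into a 3D point cloud.
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    pts = (np.linalg.inv(K) @ pix.T) * depth.reshape(1, -1)  # (3, H*W)

    # Move the cloud into the new camera's coordinate frame.
    pts_h = np.vstack([pts, np.ones((1, pts.shape[1]))])
    pts_new = (T_new @ pts_h)[:3]

    # Project back onto the new image plane with a z-buffered splat.
    proj = K @ pts_new
    z = proj[2]
    x = np.round(proj[0] / np.maximum(z, eps)).astype(int)
    y = np.round(proj[1] / np.maximum(z, eps)).astype(int)

    out = np.zeros_like(frame)
    zbuf = np.full((H, W), np.inf)
    valid = (z > eps) & (x >= 0) & (x < W) & (y >= 0) & (y < H)
    colors = frame.reshape(-1, 3)
    for i in np.flatnonzero(valid):
        if z[i] < zbuf[y[i], x[i]]:
            zbuf[y[i], x[i]] = z[i]
            out[y[i], x[i]] = colors[i]
    return out
```

The holes and stretching such a naive splat produces are exactly the kind of artifacts the anchor video is allowed to contain, since the second stage exists to clean them up.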

The ReCapture architecture combines spatial and temporal LoRA modules for video synthesis, using an anchor video and masking for precise motion control. | Image: Zhang et al.

In the second phase, ReCapture applies masked video fine-tuning. A generative video model trained on real footage regenerates the rough anchor, producing realistic motion and temporal consistency.

The system uses a temporal Low-Rank Adaptation (LoRA) layer to adapt the model to the input video. This layer handles temporal dynamics, letting the model learn and reproduce the specific motion of the anchor video without retraining the entire model.
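
ReCapture's code is not public, but LoRA itself is a standard mechanism: freeze the pretrained weights and learn a small low-rank correction on top of them. A minimal PyTorch sketch of such an adapter layer (names, rank, and scaling are illustrative):

```python
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update: W x + B(A x)."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # keep the pretrained weights fixed
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)  # start as an identity update
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.up(self.down(x))
```

Because only the small down/up matrices receive gradients, the model can adapt to a single clip's dynamics at a fraction of the cost of full fine-tuning.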

ReCapture makes it possible to change camera perspectives in existing videos after the fact. The example sequences show these perspective shifts across a range of subjects, from a butterfly on a flower and a swan in water to a car transforming into a robot. | Image: Zhang et al.

A spatial LoRA layer ensures that image details and content remain consistent with the new camera movements. The generative video model can perform zooming, panning and tilting while maintaining the characteristic movements of the original video.
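
How the adapters are routed is the interesting part: temporal attention blocks get LoRA so they learn the clip's motion, while spatial blocks get LoRA to lock in appearance. A hedged sketch of that wiring, reusing LoRALinear from above and assuming the backbone exposes attention projections named to_q/to_k/to_v (real diffusion backbones label their layers differently):

```python
import torch.nn as nn

def attach_loras(model: nn.Module, scope: str, rank: int = 8) -> None:
    """Wrap q/k/v projections of every attention block whose module path
    contains `scope` ("temporal" or "spatial") with a LoRALinear adapter.
    The layer naming here is illustrative."""
    targets = [
        (name, attr)
        for name, module in model.named_modules() if scope in name
        for attr in ("to_q", "to_k", "to_v")
        if isinstance(getattr(module, attr, None), nn.Linear)
    ]
    for name, attr in targets:
        parent = model.get_submodule(name)
        setattr(parent, attr, LoRALinear(getattr(parent, attr), rank=rank))

# attach_loras(video_model, "temporal")  # learn the anchor video's dynamics
# attach_loras(video_model, "spatial")   # keep details and content stable
```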

The project website and research paper provide additional technical details, including post-processing techniques such as SDEdit to improve image quality and reduce blur.
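
SDEdit's core idea is simple: add a moderate amount of noise to an image, then let a diffusion model denoise it, which restores detail without repainting the content. A minimal per-frame sketch using the img2img pipeline from Hugging Face diffusers; the checkpoint, file names, and prompt are placeholders, and the paper may apply SDEdit differently:

```python
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

# Placeholder checkpoint; any SD-style img2img pipeline works the same way.
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

frame = Image.open("recaptured_frame.png").convert("RGB")

# A low `strength` adds only a little noise, so the model sharpens the
# frame instead of generating new content.
refined = pipe(
    prompt="a sharp, detailed video frame",
    image=frame,
    strength=0.3,
    guidance_scale=7.5,
).images[0]
refined.save("refined_frame.png")
```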

Generative AI for videos is still experimental

While the researchers view their work as a step toward user-friendly video manipulation, ReCapture remains a research project that is far from a commercial release. Google hasn’t yet launched any of its many video AI projects, but its Veo project could be close.

Meta also recently introduced its Movie Gen model but, like Google, has not brought it to market. And let's not dwell on Sora, OpenAI's video frontier model, which was unveiled earlier this year and hasn't been seen since. For now, startups like Runway lead the video AI market; the company released its latest Gen-3 Alpha model last summer.
