Jiahua Dong; Yu-Xiong Wang;
University of Illinois Urbana-Champaign
Abstract
We introduce ViCA-NeRF, a view-consistency-aware method for 3D editing with text instructions. In addition to the implicit NeRF modeling, our key insight is to exploit two sources of regularization that explicitly propagate editing information across different views, thus ensuring multi-view consistency. As geometric regularization, we leverage the depth information derived from the NeRF model to establish image correspondence between different views. As learned regularization, we align the latent codes in the 2D diffusion model between edited and unedited images, enabling us to edit key views and propagate the update to the whole scene. Incorporating these two regularizations, our ViCA-NeRF framework consists of two stages. In the initial stage, we blend edits from different views to create a preliminary 3D edit. This is followed by a second stage of NeRF training dedicated to further refining the scene's appearance. Experiments demonstrate that ViCA-NeRF provides more flexible and efficient (3 times faster) editing with higher levels of consistency and detail, compared with the state of the art.
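To make the geometric regularization concrete, the sketch below shows one standard way to use NeRF-rendered depth to establish pixel correspondence between two posed views: back-project a pixel from a source view into world space using its depth, then project it into a target view. This is a minimal illustration assuming a pinhole camera model with shared intrinsics K and 4x4 camera-to-world matrices; all function and variable names are hypothetical and not taken from the authors' implementation.

```python
import numpy as np

def backproject(uv, depth, K, c2w):
    """Lift pixel (u, v) with its NeRF-rendered depth into world space."""
    u, v = uv
    # Pixel -> camera coordinates under a pinhole model, scaled by depth along z.
    x_cam = np.array([(u - K[0, 2]) / K[0, 0],
                      (v - K[1, 2]) / K[1, 1],
                      1.0]) * depth
    # Camera -> world using the 4x4 camera-to-world matrix.
    return c2w[:3, :3] @ x_cam + c2w[:3, 3]

def project(x_world, K, c2w):
    """Project a world-space point into another view's image plane."""
    # Invert the camera-to-world transform (rotation assumed orthonormal).
    R_w2c = c2w[:3, :3].T
    t_w2c = -R_w2c @ c2w[:3, 3]
    x_cam = R_w2c @ x_world + t_w2c
    u = K[0, 0] * x_cam[0] / x_cam[2] + K[0, 2]
    v = K[1, 1] * x_cam[1] / x_cam[2] + K[1, 2]
    return np.array([u, v]), x_cam[2]  # pixel location and depth in the target view

def correspondence(uv_src, depth_src, K, c2w_src, c2w_tgt):
    """Map a source-view pixel to its corresponding pixel in a target view."""
    x_world = backproject(uv_src, depth_src, K, c2w_src)
    return project(x_world, K, c2w_tgt)
```

With such correspondences, an edit applied to a key view can be warped into other views and blended there (the paper's first stage); comparing the reprojected depth against the target view's own rendered depth is a common way to discard occluded pixels before blending.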
Related Work
Text-to-Image Diffusion Models for 2D Editing; Implicit 3D Representation; 3D Generation; NeRF Editing