Authors
Can Wang, Ruixiang Jiang, Menglei Chai, Mingming He, Dongdong Chen, Jing Liao
City University of Hong Kong; The Hong Kong Polytechnic University; Snap Inc.; Netflix; Microsoft Cloud AI; Microsoft Cloud AI
Portals
Summary
Our NeRF-Art stylizes a pre-trained NeRF to match the desired style described by a text prompt. It modulates not only appearance but also geometry of NeRF.
Abstract
As a powerful representation of 3D scenes, Neural radiance fields (NeRF) enable high-quality novel view synthesis given a set of multi-view images. Editing NeRF, however, remains challenging, especially on simulating a text-guided style with both the appearance and the geometry altered simultaneously. In this paper, we present NeRF-Art, a text-guided NeRF stylization approach that manipulates the style of a pre-trained NeRF model with a single text prompt. Unlike previous approaches that either lack sufficient geometry deformations and texture details or require meshes to guide the stylization, our method can shift a 3D scene to the new domain characterized by desired geometry and appearance variations without any mesh guidance. This is achieved by introducing a novel global-local contrastive learning strategy, combined with the directional constraint to simultaneously control both the trajectory and the strength of the target style. Moreover, we adopt a weight regularization method to effectively suppress the cloudy artifacts and the geometry noises when transforming the density field for geometry stylization. Through extensive experiments on various styles, our method is demonstrated to be effective and robust regarding both single-view stylization quality and cross-view consistency.
Related Works
Neural Style Transfer on Images and Videos; Neural Stylization on Explicit 3D Representations; Neural Stylization on NeRF; Text-Driven Stylization
Comparisons
NeuS, VolSDF, CLIP-NeRF, DreamField, StyleGAN-NADA
Overview
In the reconstruction stage, our method first pre-trains the NeRF model of the target scene from multi-view input with reconstruction loss. In the stylization stage, our method stylized NeRF model, guided by a text prompt, using a combination of relative directional loss and global-local contrastive loss in the CLIP embedding space, plus weight regularization loss and perceptual loss.