Authors
Zhiwen Fan, Yifan Jiang, Peihao Wang, Xinyu Gong, Dejia Xu, Zhangyang Wang
The University of Texas at Austin
Abstract
Representing visual signals with implicit representations (e.g., a coordinate-based deep network) has prevailed in many vision tasks. This work explores a new, intriguing direction: training a stylized implicit representation, using a generalized approach that applies to various 2D and 3D scenarios. We conduct a pilot study on a variety of implicit functions, including 2D coordinate-based representations, neural radiance fields, and signed distance functions. Our solution is a Unified Implicit Neural Stylization framework, dubbed INS. In contrast to a vanilla implicit representation, INS decouples the ordinary implicit function into a style implicit module and a content implicit module, in order to separately encode the representations of the style image and the input scene. An amalgamation module is then applied to aggregate this information and synthesize the stylized output. To regularize the geometry of 3D scenes, we propose a novel self-distillation geometry consistency loss, which preserves the geometric fidelity of the stylized scenes. Comprehensive experiments are conducted on multiple task settings, including novel view synthesis of complex scenes, stylization of implicit surfaces, and fitting images with MLPs. We further demonstrate that the learned representation is continuous not only spatially but also style-wise, which allows effortlessly interpolating between different styles and generating images with new, mixed styles. Please refer to the video on our project page for more view synthesis results: https://zhiwenfan.github.io/INS.
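The abstract mentions a self-distillation geometry consistency loss but does not spell out its form. One plausible reading, sketched below purely as an assumption, is to penalize the stylized branch's predicted densities for deviating from the (detached) content densities, so stylization changes appearance without altering geometry; the function name and the MSE form are hypothetical.

```python
import numpy as np

def geometry_consistency_loss(sigma_stylized, sigma_content):
    """Hypothetical self-distillation loss (assumed form, not the paper's
    exact definition): mean squared error between the densities predicted
    by the stylized branch and the (frozen) content branch, encouraging
    the stylized scene to keep the original geometry."""
    return np.mean((sigma_stylized - sigma_content) ** 2)
```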
Contribution
- We propose INS, a unified implicit neural stylization framework consisting of a style implicit module, a content implicit module, and an amalgamation module, which enables us to synthesize promising stylized scenes under multiple 2D and 3D implicit representations.
- We conduct comprehensive experiments on several popular implicit representation frameworks in this novel stylization setting, including a 2D coordinate-based framework (SIREN), Neural Radiance Fields (NeRF), and Signed Distance Functions (SDF). The rendered results are found to be more consistent, in both shape and style details, across different views.
- We further demonstrate that INS learns representations that are continuous not only with regard to spatial positions (including views), but also in the style space. This allows effortlessly interpolating between different styles and generating images rendered with new, mixed styles.
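The style-space continuity in the last point implies that blending between styles can be as simple as blending their conditioning codes. A minimal sketch, assuming styles are represented by learned embedding vectors (the function name and linear-mixing scheme are illustrative, not taken from the paper):

```python
import numpy as np

def interpolate_styles(z_a, z_b, alpha):
    """Convex combination of two style embeddings; feeding the mixed code
    to the style implicit module would render an in-between style.
    alpha=0 recovers style A, alpha=1 recovers style B."""
    return (1.0 - alpha) * z_a + alpha * z_b
```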
Related Works
Implicit Function; Implicit 3D Scene Representation; Stylization
Comparisons
Style3D, AdaIN
Overview
We took SDF with the proposed INS, for example, it inputs with implicit coordinates along with ray directions and style embeddings. Style Implicit Module (SIM) and Content Implicit Module (CIM) are used to extract conditional implicit style features and implicit scene features. Amalgamate Module (AM) is applied to fuse features in the two spaces, generating styliezed density and color intensity of each 3D point. An implicit rendering step is applied on the top of AM (i.e., Volume rendering for NeRF, surface rendering for SDF) to render the pixel intensity.