Authors
Jie Guo, Shuichang Lai, Qinghao Tu, Chengzhi Tao, Changqing Zou, Yanwen Guo
Nanjing University; Zhejiang University
Summary
We propose an end-to-end framework for recovering UHR SVBRDF maps from a single 4K (4096 × 4096) input image, a resolution that most existing learning-based methods cannot process properly. Our method recovers high-quality full-size SVBRDF maps that preserve both fine spatial details and consistent global structures.
Abstract
Existing convolutional neural networks have achieved great success in recovering Spatially Varying Bidirectional Reflectance Distribution Function (SVBRDF) maps from a single image. However, they mainly focus on handling low-resolution (e.g., 256 × 256) inputs. Ultra-High Resolution (UHR) material maps are notoriously difficult for existing networks to acquire because (1) finite computational resources set bounds on input receptive fields and output resolutions, and (2) convolutional layers operate locally and lack the ability to capture long-range structural dependencies in UHR images. We propose an implicit neural reflectance model and a divide-and-conquer solution to address these two challenges simultaneously. We first crop a UHR image into low-resolution patches, each of which is processed by a local feature extractor to extract important details. To fully exploit long-range spatial dependencies and ensure global coherency, we incorporate a global feature extractor and several coordinate-aware feature assembly modules into our pipeline. The global feature extractor contains several lightweight material vision transformers that have a global receptive field at each scale and can infer long-range relationships in the material. After decoding globally coherent feature maps assembled by the coordinate-aware feature assembly modules, the proposed end-to-end method generates UHR SVBRDF maps from a single image with fine spatial details and consistent global structures.
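The divide-and-conquer flow described above lends itself to a compact sketch. The following PyTorch code is a minimal illustration under our own assumptions: the module names (`local_net`, `global_net`, `cafa`, `decoder`), the 256-pixel patch size, and the 1024 × 1024 downsampled global input are all hypothetical choices, not the authors' implementation.

```python
# Minimal sketch of a divide-and-conquer UHR SVBRDF pipeline (assumed, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class UHRSVBRDFNet(nn.Module):
    def __init__(self, local_net, global_net, cafa, decoder, patch=256):
        super().__init__()
        self.local_net = local_net    # per-patch CNN: extracts fine local details
        self.global_net = global_net  # MVT stack: global context from a downsampled view
        self.cafa = cafa              # coordinate-aware feature assembly
        self.decoder = decoder        # fused features -> SVBRDF channels at patch size
        self.patch = patch

    def forward(self, image):         # image: (1, 3, H, W), e.g. H = W = 4096
        p = self.patch
        # Global features come from a heavily downsampled copy of the full image.
        g = self.global_net(F.interpolate(image, size=(1024, 1024),
                                          mode='bilinear', align_corners=False))
        rows = []
        for y in range(0, image.shape[2], p):
            cols = []
            for x in range(0, image.shape[3], p):
                tile = image[:, :, y:y + p, x:x + p]
                f_local = self.local_net(tile)
                # CAFA fuses local detail with global context at this tile's
                # normalized offset, keeping seams coherent across tiles.
                coords = torch.tensor([[y / image.shape[2], x / image.shape[3]]],
                                      dtype=torch.float32, device=image.device)
                cols.append(self.decoder(self.cafa(f_local, g, coords)))
            rows.append(torch.cat(cols, dim=3))
        return torch.cat(rows, dim=2)  # full-resolution SVBRDF maps
```

Because every patch is decoded against the same global feature map, memory scales with the patch size rather than the 4K input, which is what makes the UHR setting tractable in this sketch.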
Contribution
- We propose a consistent and implicit neural representation for SVBRDF recovery that can preserve both global structures and local details
- We leverage the MVT (material vision transformer), a convolution-augmented vision transformer, to extract rich global features from the UHR input, providing the "global environment" of the material
- We design a CAFA (coordinate-aware feature assembly) module to assemble "local views" of the underlying material in the feature space, guaranteeing spatial coherency (a hypothetical sketch follows this list)
- Our method is able to recover material maps as large as 4K, which is a challenge for previous learning-based methods
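To make the coordinate-aware assembly concrete, here is a minimal, hypothetical sketch of a CAFA-style fusion step: local patch features are concatenated with resampled global context and with coordinate channels before a 1 × 1 convolution. The module interface, the coordinate encoding, and the bilinear resampling stand-in for coordinate-based sampling are our assumptions, not the paper's module.

```python
# Hypothetical CAFA-style fusion: tag local patch features with their global
# position and concatenate resampled global context before a 1x1 convolution.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CAFA(nn.Module):
    def __init__(self, c_local, c_global, c_out):
        super().__init__()
        self.fuse = nn.Conv2d(c_local + c_global + 2, c_out, kernel_size=1)

    def forward(self, f_local, f_global, coords):
        # f_local: (B, Cl, h, w) patch features; f_global: (B, Cg, H, W);
        # coords: (B, 2) normalized top-left offset of this patch in the image.
        b, _, h, w = f_local.shape
        # Per-pixel coordinate grid within the patch, in [0, 1]^2.
        ys = torch.linspace(0, 1, h).view(1, 1, h, 1).expand(b, 1, h, w)
        xs = torch.linspace(0, 1, w).view(1, 1, 1, w).expand(b, 1, h, w)
        grid = torch.cat([ys, xs], dim=1) + coords.view(b, 2, 1, 1)
        # Resize global context to the patch resolution (a crude stand-in
        # for coordinate-based sampling of the global feature map).
        g = F.interpolate(f_global, size=(h, w), mode='bilinear',
                          align_corners=False)
        return self.fuse(torch.cat([f_local, g, grid], dim=1))
```

Conditioning each "local view" on absolute coordinates within the "global environment" is what lets independently processed patches agree along their shared borders.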
Related Works
SVBRDF Recovery from Multiple Images; SVBRDF Recovery from Single Images; Vision Transformers
Comparisons
HANet, Guided