Authors
Shouchang Guo, Valentin Deschaintre, Douglas Noll, Arthur Roullier
University of Michigan, Ann Arbor; Adobe Research
Abstract
We present a novel U-Attention vision Transformer for universal texture synthesis. We exploit the natural long-range dependencies enabled by the attention mechanism to allow our approach to synthesize diverse textures while preserving their structures in a single inference. We propose a hierarchical hourglass backbone that attends to the global structure and performs patch mapping at varying scales in a coarse-to-fine-to-coarse stream. Completed by skip connection and convolution designs that propagate and fuse information at different scales, our hierarchical U-Attention architecture unifies attention to features from macro structures to micro details, and progressively refines synthesis results at successive stages. Our method achieves stronger 2$\times$ synthesis than previous work on both stochastic and structured textures while generalizing to unseen textures without fine-tuning. Ablation studies demonstrate the effectiveness of each component of our architecture.
Contribution
- A novel hierarchical hourglass backbone for coarse-to-fine and fine-back-to-coarse processing, allowing us to apply self-attention at different scales and to exploit macro to micro structures (a minimal patch-size schedule is sketched after this list)
- Skip connections and convolutional layers between Transformer blocks, propagating and fusing high-frequency and low-frequency features from different Transformer stages
- A 2× texture synthesis method with a single trained network that generalizes to textures of varying complexity in a single forward inference
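To make the hourglass idea concrete, here is a minimal Python sketch of a coarse-to-fine-to-coarse patch-size progression across Transformer stages. The stage count and patch sizes are illustrative assumptions, not values from the paper.

```python
# Hypothetical hourglass patch-size schedule (assumed numbers):
# patch sizes shrink by 2x toward the middle stage, then grow back.
def hourglass_patch_sizes(num_stages=5, coarsest=16):
    """Return one patch size per Transformer stage, e.g. [16, 8, 4, 8, 16]."""
    half = num_stages // 2
    down = [coarsest // (2 ** i) for i in range(half + 1)]  # [16, 8, 4]
    return down + down[-2::-1]                              # [16, 8, 4, 8, 16]

print(hourglass_patch_sizes())  # [16, 8, 4, 8, 16]
```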
Related Work
Algorithmic texture synthesis; deep-learning-based texture synthesis; Transformers for images
Overview
Proposed U-Attention framework with hierarchical hourglass Transformers. We introduce a multi-scale partition of the feature map between hierarchical Transformer blocks to form input patches of different scales for different Transformers. The input texture image is first projected into feature space by an encoder. We then apply a succession of Transformer blocks, with up and down convolutions in between (purple arrows), processing the feature maps at different resolutions. Each Transformer block takes the whole feature map as input, and we partition the feature map into sequences of patches of progressively smaller or larger sizes at consecutive stages of the network. The input patch sizes across stages therefore form an hourglass-like scale progression (dotted blue line), enabling attention to finer or coarser details at different attention steps. Finally, we add skip connections that propagate and concatenate outputs from previous stages as part of the inputs to later Transformer stages (yellow arrows).
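Below is a minimal PyTorch sketch of this data flow: an encoder projects the image into feature space, successive Transformer stages attend over patch partitions whose size follows the hourglass schedule, and skip connections concatenate earlier stage outputs into later stages before a 1×1 fusion convolution. The sketch keeps the feature-map resolution fixed, omitting the up/down convolutions and the 2× output enlargement of the actual method; all module names, channel counts, and patch sizes are assumptions for illustration, not the authors' implementation.

```python
# A minimal PyTorch sketch of the U-Attention data flow (assumptions:
# module names, channel counts, and patch sizes are illustrative only).
import torch
import torch.nn as nn


def to_patch_tokens(x, p):
    """Partition a (B, C, H, W) feature map into a sequence of p x p patches."""
    B, C, H, W = x.shape
    x = x.unfold(2, p, p).unfold(3, p, p)           # (B, C, H/p, W/p, p, p)
    return x.permute(0, 2, 3, 1, 4, 5).reshape(B, -1, C * p * p)


def from_patch_tokens(tokens, p, C, H, W):
    """Fold a patch-token sequence back into a (B, C, H, W) feature map."""
    B = tokens.shape[0]
    x = tokens.reshape(B, H // p, W // p, C, p, p)
    return x.permute(0, 3, 1, 4, 2, 5).reshape(B, C, H, W)


class StageAttention(nn.Module):
    """One Transformer stage: self-attention over patches of a fixed size."""
    def __init__(self, channels, patch):
        super().__init__()
        self.patch = patch
        dim = channels * patch * patch
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)

    def forward(self, x):
        B, C, H, W = x.shape
        tokens = self.encoder(to_patch_tokens(x, self.patch))
        return from_patch_tokens(tokens, self.patch, C, H, W)


class UAttentionSketch(nn.Module):
    """Hourglass patch-size schedule with skip connections between stages."""
    def __init__(self, channels=16, patch_sizes=(8, 4, 8)):
        super().__init__()
        self.encode = nn.Conv2d(3, channels, 3, padding=1)
        self.stages = nn.ModuleList(
            [StageAttention(channels, p) for p in patch_sizes])
        # 1x1 convolutions fuse the skip-concatenated features back to `channels`
        self.fuse = nn.ModuleList(
            [nn.Conv2d(channels * (i + 1), channels, 1)
             for i in range(len(patch_sizes))])
        self.decode = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, img):
        feat = self.encode(img)
        skips = []
        for stage, fuse in zip(self.stages, self.fuse):
            x = fuse(torch.cat([feat] + skips, dim=1))  # skip connections
            feat = stage(x)
            skips.append(feat)
        return self.decode(feat)


net = UAttentionSketch()
out = net(torch.randn(1, 3, 64, 64))  # same-resolution output in this sketch
print(out.shape)                      # torch.Size([1, 3, 64, 64])
```

The key design reflected here is that every stage attends over the full feature map, only the patch granularity changes, while the concatenation-plus-1×1-convolution step stands in for the paper's propagation and fusion of features from earlier Transformer stages.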