Authors
Oscar Michel, Anand Bhattad, Eli VanderBilt, Ranjay Krishna, Aniruddha Kembhavi, Tanmay Gupta
Allen Institute for Artificial Intelligence; University of Illinois Urbana-Champaign; University of Washington
Summary
We present 3DIT, a language-conditioned model for editing individual objects within a rich scene. 3DIT edits objects while respecting their scale and viewpoint; adds, removes, and adjusts shadows so that they stay consistent with the scene lighting; and accounts for object occlusions. Trained on our new benchmark OBJECT, 3DIT generalizes to images from the CLEVR dataset as well as to real-world images.
Abstract
We present OBJect-3DIT: a dataset and model for 3D-aware image editing. 3D-aware image editing is the task of editing an image in a way that is consistent with a corresponding transformation in the image's underlying 3D scene. For example, rather than rotating an object in the screen space of an image, a user might want to rotate that object around some axis that exists in the 3D scene. Although only the two-dimensional image is edited in this task, a 3D-aware editing model must acquire an understanding of the complex interactions between scene objects, light, and camera perspective, as well as language in our setting. We allow the user to specify an object to edit with a natural language description, in addition to providing numerical information such as an exact rotation angle or a point location on the image. This combination of an intuitive language interface with precise geometric control over the edit yields an editing system that is easy to use yet highly expressive and controllable.
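The distinction between a screen-space rotation and a rotation in the underlying 3D scene can be made concrete with a small geometric sketch. The code below is only an illustration, not part of 3DIT or the OBJect dataset: it assumes a pinhole camera at the origin looking down the +z axis with unit focal length, and the object center, corner point, and function names are all hypothetical. It rotates one point of an object by 90 degrees about a vertical axis through the object's center and projects the result, then compares that with rotating the already-projected point by 90 degrees in the image plane.

```python
import numpy as np


def rotate_about_vertical_axis(points, angle_deg, center):
    """Rotate 3D points about a vertical (y) axis passing through `center`."""
    theta = np.radians(angle_deg)
    c, s = np.cos(theta), np.sin(theta)
    rot_y = np.array([[c, 0.0, s],
                      [0.0, 1.0, 0.0],
                      [-s, 0.0, c]])
    return (points - center) @ rot_y.T + center


def project_pinhole(points, focal_length=1.0):
    """Project camera-space points (z > 0) onto the image plane (assumed pinhole model)."""
    return focal_length * points[:, :2] / points[:, 2:3]


def rotate_in_screen_space(points_2d, angle_deg, center_2d):
    """Rotate 2D image points about `center_2d`, ignoring the 3D scene entirely."""
    theta = np.radians(angle_deg)
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s],
                    [s, c]])
    return (points_2d - center_2d) @ rot.T + center_2d


# Hypothetical object: its center sits 4 units in front of the camera,
# and we track a single corner point of the object.
center = np.array([0.0, 0.0, 4.0])
corner = np.array([[1.0, 0.5, 4.0]])

# 3D-aware edit: rotate in the scene first, then project to the image.
edited_3d = project_pinhole(rotate_about_vertical_axis(corner, 90.0, center))

# Screen-space edit: project first, then rotate within the image plane.
center_2d = project_pinhole(center[None, :])[0]
edited_2d = rotate_in_screen_space(project_pinhole(corner), 90.0, center_2d)

print("3D-aware rotation lands at:    ", edited_3d)
print("screen-space rotation lands at:", edited_2d)
```

The two results differ because the scene rotation changes the point's depth, which in turn changes how it projects; a screen-space rotation has no notion of depth. A 3D-aware editing model also has to account for effects this sketch ignores, such as shadows and occlusions.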
Related Works
Image editing with generative models; 3D-aware image editing; Scene rearrangement; 3D asset datasets; Synthetic datasets for vision models; Benchmarks for image editing
Comparisons
VisProg, Socratic models