GAUDI: Apple's Neural Network for Generating 3D Scenes

The network can generate 3D indoor scenes from text prompts.

A team of researchers from Apple has introduced GAUDI, a new neural network that can generate realistic-looking 3D scenes from text prompts. According to the team, GAUDI achieves state-of-the-art performance for unconditional generation across multiple datasets and also enables conditional generation of 3D scenes from text descriptions or RGB images.

According to the research paper shared by the team, the model works in two stages: latent representation optimization and generative modeling. For the first stage, the team designed a decoder with three modules: a scene decoder, a camera pose decoder, and a radiance field. The parameters of all three modules are optimized jointly with the latents for the scene and the camera poses. In the second stage, the team trains a score-based generative model in this latent space.
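
To make the first stage more concrete, here is a minimal Python sketch of the general idea of optimizing per-scene latents jointly with shared decoder weights, in the style of an auto-decoder. It is not Apple's code: the linear decoder, the `optimize_latents` function, the toy targets, and the small L2 penalty on the latents are all assumptions standing in for GAUDI's scene decoder, camera pose decoder, and radiance field, which are far larger neural networks.

```python
import random


def matvec(W, z):
    """Multiply matrix W (a list of rows) by vector z."""
    return [sum(w * x for w, x in zip(row, z)) for row in W]


def optimize_latents(targets, latent_dim=2, steps=500, lr=0.05, reg=1e-3, seed=0):
    """Jointly fit one latent per scene and a shared linear decoder W (toy stand-in).

    Minimizes sum_i ||W z_i - y_i||^2 + reg * ||z_i||^2 by plain gradient
    descent over both W and every z_i (auto-decoder style; no encoder).
    Returns (latents, W, loss_history).
    """
    rng = random.Random(seed)
    out_dim = len(targets[0])
    latents = [[rng.gauss(0.0, 0.1) for _ in range(latent_dim)] for _ in targets]
    W = [[rng.gauss(0.0, 0.1) for _ in range(latent_dim)] for _ in range(out_dim)]
    history = []
    for _ in range(steps):
        loss = 0.0
        grad_W = [[0.0] * latent_dim for _ in range(out_dim)]
        grad_z = []
        for z, y in zip(latents, targets):
            err = [p - t for p, t in zip(matvec(W, z), y)]
            loss += sum(e * e for e in err) + reg * sum(v * v for v in z)
            # dLoss/dW accumulates 2 * err * z^T over all scenes
            for r in range(out_dim):
                for c in range(latent_dim):
                    grad_W[r][c] += 2.0 * err[r] * z[c]
            # dLoss/dz_i = 2 * W^T err + 2 * reg * z_i (only this scene's term)
            grad_z.append([
                2.0 * sum(W[r][c] * err[r] for r in range(out_dim)) + 2.0 * reg * z[c]
                for c in range(latent_dim)
            ])
        # Update the shared decoder parameters and every per-scene latent
        for r in range(out_dim):
            for c in range(latent_dim):
                W[r][c] -= lr * grad_W[r][c]
        for z, g in zip(latents, grad_z):
            for c in range(latent_dim):
                z[c] -= lr * g[c]
        history.append(loss)
    return latents, W, history
```

Calling `optimize_latents([[1.0, 0.0, 0.5], [0.0, 1.0, 0.5], [1.0, 1.0, 1.0]])` returns one fitted latent per toy scene along with the shared decoder weights; in a two-stage setup like the one described, latents of this kind are what the second-stage generative model learns to sample.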

"We present qualitative results for both unconditional and conditional generative modeling," commented the team. "During inference, we sample latents from the generative model and feed them through the decoder to obtain a radiance field and camera path. In the conditional setting, we train the generative model using pairs of latents and conditioning variables (like text or images) and sample latents given conditioning variables during inference."
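
The sampling procedure described in the quote can be illustrated with a toy score-based sampler. The sketch below is a stand-in built on assumptions, not GAUDI's implementation: instead of a learned network, it uses the closed-form score of a Gaussian for each of two hypothetical conditioning labels (`"kitchen"` and `"bedroom"` stand in for a text or image condition), draws latents with unadjusted Langevin dynamics, and passes one sample through a hypothetical linear `decode` function in place of the real decoder.

```python
import math
import random

# Hypothetical conditioning: each label maps to the mean of a Gaussian in a
# 2-D latent space. In GAUDI the score is learned; here it is analytic.
CONDITION_MEANS = {"kitchen": (2.0, -1.0), "bedroom": (-1.5, 0.5)}
SIGMA = 0.5  # std of each class-conditional Gaussian (assumption)


def score(z, cond):
    """Gradient of log N(z; mu_cond, SIGMA^2 I) with respect to z."""
    mu = CONDITION_MEANS[cond]
    return [-(zi - mi) / SIGMA ** 2 for zi, mi in zip(z, mu)]


def langevin_sample(cond, steps=100, eps=0.05, rng=None):
    """Unadjusted Langevin dynamics driven only by the score function."""
    if rng is None:
        rng = random.Random()
    z = [rng.gauss(0.0, 1.0) for _ in range(2)]  # start from Gaussian noise
    for _ in range(steps):
        s = score(z, cond)
        # Step along the score, then inject noise scaled by sqrt(eps)
        z = [zi + 0.5 * eps * si + math.sqrt(eps) * rng.gauss(0.0, 1.0)
             for zi, si in zip(z, s)]
    return z


def decode(z):
    """Hypothetical stand-in for the decoder (scene, camera path, radiance field)."""
    W = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    return [sum(w * x for w, x in zip(row, z)) for row in W]


rng = random.Random(0)
samples = [langevin_sample("kitchen", rng=rng) for _ in range(500)]
mean = [sum(s[i] for s in samples) / len(samples) for i in range(2)]
scene = decode(samples[0])
```

Because the sampler only calls `score(z, cond)`, a learned, conditioning-aware score network could replace the analytic one without changing `langevin_sample`; the decoder is applied only after sampling, matching the order described in the quote.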

The team also plans to release the code behind the model in the coming weeks.


Published 08 August 2022
Theodore McKenzie
Head of Content