DreamCraft3D produces high-fidelity and coherent 3D objects.
Researchers have presented DreamCraft3D, a new hierarchical 3D content generation method. It takes a 2D image created by a text-to-image model and turns it into a high-fidelity 3D object.
To create consistent geometries, the researchers use score distillation sampling – a way to generate samples from a diffusion model by optimizing a loss function – via a view-dependent diffusion model, but this can compromise texture quality. To improve the textures, they propose bootstrapped score distillation.
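To give a feel for score distillation sampling, here is a deliberately minimal sketch. Everything in it is a stand-in: the "renderer" is a single pixel equal to its parameter, and `predict_noise` replaces a pretrained U-Net with a closed-form Gaussian prior centered on a hypothetical `TARGET` value. The one thing it does illustrate faithfully is the SDS update, which nudges the rendered output toward what the diffusion prior considers likely while skipping the diffusion model's Jacobian.

```python
import math
import random

# Toy stand-in for a diffusion model's noise predictor. A real
# implementation would call a pretrained U-Net; this "prior" simply
# prefers pixel values near a hypothetical target.
TARGET = 0.8

def predict_noise(noisy_pixel, alpha_bar):
    # Closed-form epsilon prediction for a Gaussian prior centered on TARGET.
    return (noisy_pixel - math.sqrt(alpha_bar) * TARGET) / math.sqrt(1 - alpha_bar)

def sds_step(theta, lr=0.05):
    """One score distillation sampling step on a 1-pixel 'render' x = theta.

    The SDS gradient skips the diffusion model's Jacobian:
        grad = w(t) * (eps_pred - eps) * dx/dtheta
    """
    x = theta                      # trivial renderer, so dx/dtheta = 1
    t = random.uniform(0.1, 0.9)   # random diffusion timestep
    alpha_bar = 1.0 - t            # toy noise schedule
    eps = random.gauss(0.0, 1.0)
    noisy = math.sqrt(alpha_bar) * x + math.sqrt(1 - alpha_bar) * eps
    eps_pred = predict_noise(noisy, alpha_bar)
    grad = (1 - alpha_bar) * (eps_pred - eps)   # weighting w(t) = 1 - alpha_bar
    return theta - lr * grad

random.seed(0)
theta = 0.0
for _ in range(2000):
    theta = sds_step(theta)
print(round(theta, 2))  # prints 0.8 – the parameter drifts to the prior's preferred value
```

In DreamCraft3D the same idea drives a full 3D representation: the rendered views are pushed toward images the (view-dependent) diffusion prior finds plausible, which is what keeps the geometry consistent across viewpoints.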
They then use DreamBooth to fine-tune a personalized diffusion model on augmented renderings of the scene, giving it a 3D understanding of that scene. "The score distillation from this 3D-aware diffusion prior provides view-consistent guidance for the scene."
"Notably, through an alternating optimization of the diffusion prior and 3D scene representation, we achieve mutually reinforcing improvements: the optimized 3D scene aids in training the scene-specific diffusion model, which offers increasingly view-consistent guidance for 3D optimization. The optimization is thus bootstrapped and leads to substantial texture boosting."
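The alternating scheme the authors describe can be sketched as a loop. The classes below are toy stand-ins, not the paper's implementation: the "scene" is a single parameter, the "prior" pulls renders toward a hypothetical target with some view-inconsistency noise, and DreamBooth-style fine-tuning is modeled as simply shrinking that noise. The structure of the loop – render, personalize the prior, re-optimize the scene – is the part that mirrors bootstrapped score distillation.

```python
import random

class ToyPrior:
    """Stand-in for a diffusion prior (hypothetical). `inconsistency`
    models how much its guidance varies across views; fine-tuning on
    the scene's own renders reduces it."""
    def __init__(self, target, inconsistency=0.5):
        self.target = target
        self.inconsistency = inconsistency

    def guidance(self, render):
        # Noisy pull toward the target image.
        return (self.target - render) + random.gauss(0.0, self.inconsistency)

    def finetune_on(self, renders):
        # DreamBooth-style personalization on multi-view renders,
        # modeled here as halving the view-inconsistency.
        self.inconsistency *= 0.5

class ToyScene:
    def __init__(self):
        self.theta = 0.0

    def render(self):
        return self.theta

    def optimize(self, prior, steps=200, lr=0.05):
        for _ in range(steps):
            self.theta += lr * prior.guidance(self.render())

def bootstrap(scene, prior, rounds=4):
    for _ in range(rounds):
        renders = [scene.render() for _ in range(8)]
        prior.finetune_on(renders)   # the prior adapts to the evolving scene
        scene.optimize(prior)        # the scene follows the sharper prior
    return scene

random.seed(0)
scene = bootstrap(ToyScene(), ToyPrior(target=1.0))
print(abs(scene.theta - 1.0) < 0.1)  # scene lands near the prior's target
```

Each pass through the loop leaves the prior more view-consistent and the scene better optimized, which is the mutual reinforcement the quote describes.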
With this method, the authors tackle the view inconsistency that plagues 3D objects generated by other models, and the results look pretty nice, at least on paper. Text-to-image diffusion models still struggle with finer details, like text, so you won't find any in the presentation. Still, the 3D models made by DreamCraft3D could streamline asset production and make artists' lives easier.
If you're interested in 2D-to-3D research, check out these articles: