Instruct the model to get a different result.
Researchers from the University of California, Berkeley presented InstructPix2Pix, a new method for editing images based on text instructions. You just need to give the system a picture and a text prompt in the form of an instruction to get a different image.
To gather training data for the method, the creators combined the GPT-3 language model with Stable Diffusion's text-to-image capabilities to produce a vast dataset of image editing examples. Despite being trained on this generated data, InstructPix2Pix generalizes to real images and user-written instructions at inference time.
The model can edit images quickly, as it performs edits in a single forward pass and doesn't need per-example fine-tuning or inversion.
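As a rough illustration of what "a single forward pass, no per-example fine-tuning" means in practice, here is a sketch of calling the publicly released checkpoint through the Hugging Face diffusers library. This assumes the `diffusers` and `torch` packages are installed, a CUDA GPU is available, and that `"timbrooks/instruct-pix2pix"` is the checkpoint you want; adjust the model ID and device to your setup.

```python
# Hypothetical usage sketch: editing an image with InstructPix2Pix
# via the diffusers StableDiffusionInstructPix2PixPipeline.
MODEL_ID = "timbrooks/instruct-pix2pix"  # assumed public checkpoint


def edit_image(image, instruction, steps=20, image_guidance_scale=1.5):
    """Apply a text instruction to a PIL image in one forward pass.

    `image_guidance_scale` controls how closely the output should
    stick to the input image; higher values change the image less.
    """
    # Imports kept local so the sketch can be read without the
    # (heavy) dependencies installed.
    import torch
    from diffusers import StableDiffusionInstructPix2PixPipeline

    pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
        MODEL_ID, torch_dtype=torch.float16
    ).to("cuda")

    # No inversion or per-example fine-tuning: the instruction and the
    # input image are fed directly to the pipeline.
    result = pipe(
        instruction,
        image=image,
        num_inference_steps=steps,
        image_guidance_scale=image_guidance_scale,
    )
    return result.images[0]
```

A call like `edit_image(photo, "make it look like a snowy winter day")` then returns the edited image without any retraining.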
While InstructPix2Pix produces fairly accurate edits, it still inherits biases from the data and models it is built on. For example, in its outputs flight attendants are usually women and doctors are usually men.
The model also can't perform viewpoint changes, can make undesired excessive changes, sometimes fails to isolate the specified object, and has difficulty reorganizing objects or swapping them with each other.