MIT, Google Using Synthetic Images to Train AI Image Models
Researchers describe a new method for training highly capable AI image models using training data made up of AI-generated images
Upon launch, OpenAI’s DALL-E 3 wowed users by generating far more detailed images than prior versions. OpenAI attributed the improvement in part to training the model on synthetic, AI-generated data. Now, a team of researchers from MIT and Google is expanding on this concept, applying it to images produced by the popular open-source text-to-image model Stable Diffusion.
In a newly published paper, the researchers described a new approach, which they call StableRep, to training image models with AI-generated data. It uses millions of labeled synthetic images to learn high-quality visual representations.
The researchers said StableRep is a “multi-positive contrastive learning method” in which multiple images generated from the same text prompt are treated as positives for one another, enhancing the learning process. In practice, the model views several variations of, for example, a landscape, all generated from the same description, and learns to treat them as depictions of the same scene. That pushes it to capture the nuances those images share, the underlying concept rather than incidental pixel-level detail, which is what ultimately yields highly detailed results.
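As a rough illustration, a multi-positive contrastive loss can be sketched in a few lines of PyTorch. This is a minimal sketch, not the authors’ actual implementation; the function name, batch layout and `prompt_ids` convention are assumptions made for the example:

```python
import torch
import torch.nn.functional as F

def multi_positive_contrastive_loss(embeddings, prompt_ids, temperature=0.1):
    """Treat all images generated from the same text prompt as positives.

    embeddings: (N, D) float tensor of image embeddings.
    prompt_ids: (N,) long tensor; images sharing an id came from the same prompt.
    """
    z = F.normalize(embeddings, dim=1)        # unit-norm embeddings
    sim = z @ z.t() / temperature             # pairwise cosine similarities

    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, -1e9)    # an image is not its own positive

    # Positives: all *other* images generated from the same prompt.
    pos_mask = (prompt_ids.unsqueeze(0) == prompt_ids.unsqueeze(1)) & ~self_mask

    # Maximize the average log-probability assigned to each image's positives.
    log_prob = F.log_softmax(sim, dim=1)
    loss = -(log_prob * pos_mask.float()).sum(1) / pos_mask.sum(1).clamp(min=1)
    return loss.mean()
```

Called with, say, eight embeddings and `prompt_ids = torch.tensor([0, 0, 0, 0, 1, 1, 1, 1])`, this loss pulls the four images from each prompt toward one another in embedding space while pushing images from different prompts apart.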
Outperforms rivals
The MIT and Google researchers trained StableRep on images generated by Stable Diffusion and found that it outperformed rival representation learning methods such as SimCLR and CLIP, which were trained with the same text prompts and corresponding real images.
StableRep achieved 76.7% linear-probing accuracy on ImageNet classification with a Vision Transformer model. When the researchers added language supervision, StableRep trained on 20 million synthetic images outperformed CLIP trained on 50 million real images.
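For context, “linear-probing accuracy” is measured by freezing the pretrained encoder and training only a linear classifier on top of its features. A minimal sketch of that procedure, assuming a generic PyTorch `encoder` and data `loader` (both hypothetical stand-ins, not the authors’ code):

```python
import torch
import torch.nn as nn

def linear_probe(encoder, loader, feat_dim, num_classes=1000, epochs=10):
    """Evaluate frozen features: train only a linear classifier on top."""
    encoder.eval()                            # freeze the backbone
    for p in encoder.parameters():
        p.requires_grad = False

    head = nn.Linear(feat_dim, num_classes)
    opt = torch.optim.SGD(head.parameters(), lr=0.1, momentum=0.9)

    for _ in range(epochs):
        for images, labels in loader:
            with torch.no_grad():
                feats = encoder(images)       # frozen features
            loss = nn.functional.cross_entropy(head(feats), labels)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return head
```

Because only the linear head is trained, the resulting accuracy reflects the quality of the representations the encoder learned during pretraining.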
Lijie Fan, a doctoral candidate at MIT and the lead researcher, said the technique is superior because it involves “not just feeding it data.” “When multiple images, all generated from the same text, are treated as depictions of the same underlying thing, the model dives deeper into the concepts behind the images, say the object, not just their pixels.”
StableRep does have its flaws. Generating the training images is slow, and semantic mismatches between text prompts and the resultant images can degrade what the model learns.
StableRep’s underlying model, Stable Diffusion, also needed an initial round of training on real data, so using StableRep to create images takes longer and is likely to be costlier than training on real images alone.
Access StableRep
StableRep can be accessed via GitHub.
It is available for commercial use: StableRep is released under the Apache 2.0 license, meaning you can use it and produce derivative works.
However, you must provide a copy of the Apache license with any redistributed or derivative work and include a notice of any changes. The license also includes a limitation of liability, under which contributors are not liable for damages arising from use of the licensed work.
This article first appeared on IoT World Today's sister site, AI Business.