SAN FRANCISCO, CA.- ChatGPT can now generate images and they are shockingly detailed.
On Wednesday, OpenAI, the San Francisco artificial intelligence startup, released a new version of its DALL-E image generator to a small group of testers and folded the technology into ChatGPT, its popular online chatbot.
Called DALL-E 3, it can produce more convincing images than previous versions of the technology, showing a particular knack for images containing letters, numbers and human hands, the company said.
It is far better at understanding and representing what the user is asking for, said Aditya Ramesh, an OpenAI researcher, adding that the technology was built to have a more precise grasp of the English language.
By adding the latest version of DALL-E to ChatGPT, OpenAI is solidifying its chatbot as a hub for generative AI, which can produce text, images, sounds, software and other digital media on its own. Since ChatGPT went viral last year, it has kicked off a race among Silicon Valley tech giants to be at the forefront of AI advancements.
On Tuesday, Google released a new version of its chatbot, Bard, which connects with several of the company's most popular services, including Gmail, YouTube and Docs. Midjourney and Stable Diffusion, two other image generators, updated their models this summer.
OpenAI has long offered ways of connecting its chatbot with other online services, including Expedia, OpenTable and Wikipedia. But this is the first time the startup has combined a chatbot with an image generator.
DALL-E and ChatGPT were previously separate applications. But with the latest release, people can now use ChatGPT's service to produce digital images simply by describing what they want to see. Or they can create images using descriptions generated by the chatbot, further automating the generation of graphics, art and other media.
In a demonstration this week, Gabriel Goh, an OpenAI researcher, showed how ChatGPT can now generate detailed textual descriptions that are then used to produce images. After creating descriptions of a logo for a restaurant called Mountain Ramen, for instance, the bot generated several images from those descriptions in a matter of seconds.
The new version of DALL-E can produce images from multi-paragraph descriptions and closely follow instructions laid out in minute detail, Goh said. Like all image generators and other AI systems, it is also prone to mistakes, he said.
As it works to refine the technology, OpenAI is not sharing DALL-E 3 with the wider public until next month. DALL-E 3 will then be available through ChatGPT Plus, a service that costs $20 a month.
Image-generating technology can be used to spread large amounts of disinformation online, experts have warned. To guard against that with DALL-E 3, OpenAI has incorporated tools designed to prevent problematic subjects, such as sexually explicit images and portrayals of public figures. The company is also trying to limit DALL-E's ability to imitate specific artists' styles.
In recent months, AI has been used as a source of visual misinformation. A synthetic and not especially sophisticated spoof of an apparent explosion at the Pentagon sent the stock market into a brief dip in May, among other examples. Voting experts also worry that the technology could be used maliciously during major elections.
Sandhini Agarwal, an OpenAI researcher who focuses on safety and policy, said DALL-E 3 tended to generate images that were more stylized than photorealistic. Still, she acknowledged that the model could be prompted to produce convincing scenes, such as the type of grainy images captured by security cameras.
For the most part, OpenAI does not plan to block potentially problematic content coming from DALL-E 3. Agarwal said such an approach was just too broad because images could be innocuous or dangerous depending on the context in which they appear.
"It really depends on where it's being used, how people are talking about it," she said.
This article originally appeared in The New York Times.