AI Image Generators Can Reveal the Hidden Artist in All of Us

Doron Fagelson
6 min read · May 3, 2023


In January 2021, OpenAI released DALL-E, a powerful AI system that can generate images from simple text descriptions. The software drew widespread attention for its ability to produce, in a matter of minutes, impressive and often surreal images qualitatively on par with the work of skilled illustrators and designers. Since its release, the Internet has been flooded with unique and imaginative artworks showcasing the dizzying creative potential of the new AI image generators.

The technology behind AI-generated imagery is advancing rapidly, with tools now capable of creating photorealistic human faces and complex scenes with great accuracy. Some call this a pivotal moment in art history, one that brings great promise but also raises hard questions about plagiarism, authorship, and how to stem the spread of misinformation by these tools.

However, while AI-powered products pose difficult legal and ethical challenges, they can also help spark new forms of creativity and expression and improve productivity. This is the first of a two-part series investigating both the powers and the dangers of these tools in the context of the art industry. In part one, we will look at the mechanics and capabilities of AI image generators and how they may influence and shape how art is made in the future.

How Do AI Image Generators Work?

The AI image generation process relies on deep learning models, typically generative adversarial networks or, more recently, diffusion models, often paired with a model that links text to images.

GANs

Early-stage AI image generators relied on Generative Adversarial Networks (GANs), which consist of two neural networks working together: a “generator” network and a “discriminator” network. The generator network learns to create synthetic data samples, such as images or sounds, that resemble real data, while the discriminator network learns to distinguish synthetic data from real data.

During training, the generator network attempts to produce synthetic data that the discriminator network cannot distinguish from real data, while the discriminator network tries to correctly identify whether a given sample is real or synthetic. The two networks compete against each other in a game-like fashion, with the generator network attempting to fool the discriminator network, and the discriminator network attempting to accurately identify synthetic data.

Over time, the generator network becomes increasingly adept at creating realistic synthetic data, while the discriminator network becomes increasingly adept at distinguishing between real and synthetic data. This back-and-forth competition between the two networks leads to the development of highly realistic synthetic data that can be used for a variety of applications, such as image and video generation, text-to-image synthesis, and data augmentation for machine learning tasks.
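As an illustration, this adversarial loop can be sketched with two tiny models on one-dimensional data. The snippet below is a minimal, hypothetical example rather than production GAN code: the “generator” is just a linear map of random noise, the “discriminator” is a logistic classifier, and both are updated with hand-derived gradients instead of a deep learning framework.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    # Clipped for numerical safety
    return 1.0 / (1.0 + np.exp(-np.clip(x, -60, 60)))

# "Real" data: samples from a Gaussian with mean 4
def real_batch(n):
    return rng.normal(4.0, 1.25, n)

# Generator G(z) = a*z + b and discriminator D(x) = sigmoid(w*x + c)
a, b = 1.0, 0.0
w, c = 0.1, 0.0
lr, batch = 0.05, 64

for _ in range(3000):
    # Discriminator step: push D(real) up and D(fake) down
    x_real = real_batch(batch)
    x_fake = a * rng.normal(size=batch) + b
    p_real = sigmoid(w * x_real + c)
    p_fake = sigmoid(w * x_fake + c)
    w += lr * (np.mean((1 - p_real) * x_real) - np.mean(p_fake * x_fake))
    c += lr * (np.mean(1 - p_real) - np.mean(p_fake))

    # Generator step: move fakes toward where D currently says "real"
    z = rng.normal(size=batch)
    x_fake = a * z + b
    grad_x = (1 - sigmoid(w * x_fake + c)) * w  # d log D(fake) / d x_fake
    a += lr * np.mean(grad_x * z)
    b += lr * np.mean(grad_x)

fake = a * rng.normal(size=1000) + b
print(f"generated mean: {np.mean(fake):.2f} (real data mean: 4.0)")
```

After training, the generator's output distribution should have drifted toward the real data's mean, the one-dimensional analogue of a GAN learning to mimic real images.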

CLIP

The neural network model known as Contrastive Language-Image Pre-Training (CLIP), developed by OpenAI, advanced the AI image-generation process by connecting text descriptions to images. CLIP learns to associate textual and visual information in a way that allows it to perform a wide range of tasks related to natural language understanding and computer vision.

The basic idea behind CLIP is to use a large-scale pre-training approach to learn a rich and diverse representation of images and their corresponding textual descriptions. To do this, CLIP is trained on a massive dataset of image-text pairs, where the goal is to teach the model to predict whether a given image and its textual description match each other or not.
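The matching objective described above is a contrastive one. The sketch below is a simplified stand-in for CLIP's actual training code: random toy vectors play the role of encoder outputs, and a symmetric cross-entropy loss is computed over a batch of image-text pairs, where each image's correct caption sits on the diagonal of the similarity matrix. The temperature value is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Toy batch of 4 matched image/text embedding pairs (dim 8); in CLIP these
# would come from the image and text encoders.
batch, dim = 4, 8
image_embeds = normalize(rng.normal(size=(batch, dim)))
# Make each caption embedding resemble its own image, plus some noise
text_embeds = normalize(image_embeds + 0.3 * rng.normal(size=(batch, dim)))

# Similarity logits, scaled by an (assumed) temperature
temperature = 0.07
logits = image_embeds @ text_embeds.T / temperature

def cross_entropy(logits, targets):
    # Softmax cross-entropy: the correct pairing sits on the diagonal
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

targets = np.arange(batch)                   # image i matches caption i
loss_i2t = cross_entropy(logits, targets)    # image -> text direction
loss_t2i = cross_entropy(logits.T, targets)  # text -> image direction
loss = (loss_i2t + loss_t2i) / 2
print(f"contrastive loss: {loss:.4f}")
```

Training would adjust the two encoders to minimize this loss, pulling matched image-text pairs together in the embedding space and pushing mismatched pairs apart.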

During training, CLIP learns to represent images and textual descriptions as high-dimensional vectors in a shared embedding space. This allows CLIP to perform a wide range of tasks, such as image classification, image retrieval, and guiding image generation, simply by comparing the similarity of the representations of the image and the textual description.
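Once images and captions live in a shared embedding space, a task like classification reduces to a similarity lookup. This toy example, with hand-picked vectors standing in for real encoder outputs, matches each image embedding to its most similar caption embedding by cosine similarity:

```python
import numpy as np

# Hand-picked stand-ins for encoder outputs in a shared 4-dim space.
image_embeds = np.array([
    [0.9, 0.1, 0.0, 0.1],   # photo of a dog
    [0.1, 0.9, 0.1, 0.0],   # photo of a cat
])
text_embeds = np.array([
    [1.0, 0.0, 0.1, 0.0],   # "a dog"
    [0.0, 1.0, 0.0, 0.1],   # "a cat"
])

def normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Cosine-similarity matrix: entry (i, j) scores image i against caption j
sims = normalize(image_embeds) @ normalize(text_embeds).T

# Each image's best caption is the argmax of its row
best = sims.argmax(axis=1)
print(best.tolist())  # [0, 1]: dog photo -> "a dog", cat photo -> "a cat"
```

The same lookup generalizes to zero-shot classification: embed one caption per candidate label and pick the label whose embedding is closest to the image's.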

Diffusion Models

A diffusion model works in two phases. In the forward phase, small amounts of random noise are added to a training image step by step until nothing recognizable remains. The model is then trained to reverse this process: given a noisy image, it learns to predict and remove a little of that noise.

To generate a new image, the model starts from pure random noise and applies the learned denoising transformation iteratively. At each step, a little more noise is removed and plausible image features emerge in its place.

The result of each denoising step is a slightly sharper version of the image. As the reverse process proceeds, the image becomes increasingly clear and detailed, until it eventually converges to the final image.
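The forward (noising) half of this process is easy to sketch. Assuming a DDPM-style linear noise schedule, the snippet below mixes a clean one-dimensional "image" with Gaussian noise at an early and a late timestep, showing how the signal is gradually destroyed. The learned denoising network that reverses this process is omitted, since that is the part a real model must train.

```python
import numpy as np

rng = np.random.default_rng(0)

# A "clean image": a 1-D signal standing in for pixel values
x0 = np.sin(np.linspace(0, 2 * np.pi, 256))

# Variance-preserving noise schedule (assumed linear betas, DDPM-style)
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bars = np.cumprod(1.0 - betas)

def q_sample(x0, t):
    """Forward process: jump straight to step t by mixing signal and noise."""
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

early = q_sample(x0, 10)    # mostly signal
late = q_sample(x0, 999)    # almost pure noise

# Correlation with the clean signal drops as t grows
corr_early = np.corrcoef(x0, early)[0, 1]
corr_late = np.corrcoef(x0, late)[0, 1]
print(f"corr at t=10: {corr_early:.3f}, corr at t=999: {corr_late:.3f}")
```

Generation runs this movie in reverse: starting from pure noise like the `late` sample, the trained network denoises step by step until something like `x0` emerges.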

Language at the Heart of AI Generative Tech

What may come as a surprise is that language plays a central role in the AI image-generation process.

David Holz, the founder of Midjourney, believes that language and images are closely connected and that computers may learn to understand them better in tandem rather than separately. He compares the process of talking to AI systems and generating images to converting spoken language into visual language, akin to how Google Translate translates from one language to another.

In practice, leading AI-powered image generation platforms like DALL-E 2, Midjourney, and Stable Diffusion are trained on massive datasets containing millions of images and their corresponding text descriptions. From these pairs the systems learn particular styles and aesthetics, enabling them to create unique images in response to text prompts.

AI Can Elevate Human Creativity

Every new technology wave ignites some degree of hostility at birth, and AI text-to-image generators are not immune to this trend. The reaction of some artists, photographers, and graphic designers to the power of this new AI has been nothing less than an existential fear of losing their livelihoods. Others have a more sanguine perspective: Brennan Buck, a practicing architect, sees AI image generators as a valuable tool for creators, much like other breakthrough technologies that have gone mainstream, such as pencils and paints, Photoshop, and 3D modeling software.

Credits: A still image from Refik Anadol’s giant LED wall, “Living Paintings Immersive Editions,” at Jeffrey Deitch. (Refik Anadol Studio)

While AI may become central to the ongoing evolution of new, visually appealing art forms, the human factor remains critical. AI-generated art is a collaborative process: creators supply input to machine algorithms, which conjure original outputs that the creator may then tweak, refine, or modify further. Diego Conte Peralta, a computer graphics artist based in Madrid, experiments with AI image generators and edits their results rather than treating them as a final product. “That’s much more interesting for me because you can go places where even AI cannot, and the output still has a human element,” he says. AI thus serves as a complementary tool in the creative process, but the human touch remains essential in shaping the output of these tools. In other words, the role of AI in how art is made is not to replace human creativity but to augment and empower it.

Conclusion

While there are legitimate concerns and fears about the unchecked power of AI, we should be open to exploring the creative opportunities afforded to us by this technology. The most profound lesson from AI image generators is that creativity need not be thought of as an elusive, supernatural force reserved for the few but rather an accessible and rewarding phenomenon that can enrich our human endeavours beyond our imaginations.

As more AI-powered technology products reach the market and their adoption surges, it is crucial to understand their real risks and to find ways to mitigate them. In the next article in this series, we will explore these risks, their implications and potential consequences, and what to do about them in depth.

Author: Doron Fagelson,
Vice President of Media and Entertainment Practice at
DataArt

Originally published on https://www.dataart.com/blog/
