On the latest episode of the a16z Podcast, host Steph Smith sits down with Guy Parsons (@GuyP) to discuss the growing importance of prompt engineering in the age of AI. As AI continues to change modern life and the job market, creative roles like prompt engineering are emerging to work alongside the technology
  • Check out Guy Parsons’ DALL-E 2 Prompt Book
Host: Steph Smith (@stephsmithio) 
The DALL-E 2 Prompt Book was created as a collection of cool examples and terms for producing amazing visual effects with DALL-E 2, OpenAI’s text-to-image model
  • The Prompt Book was essentially a slide deck that grew to be 8100 slides long
  • Guy shared the Prompt Book online as a jumping-off point for people to realize the kind of stuff that these tools were capable of
Guy estimates that he has spent a couple of hundred hours learning the craft of prompting in Midjourney, DALL-E 2, and Stable Diffusion, but he wouldn’t call himself a master
  • Some people have done thousands or even hundreds of thousands of prompts using these tools
  • The capabilities of these tools have advanced significantly in the last six months
Prompt engineering is a new field, and nobody could definitively say they’re an expert yet
Steph asks if there is a parallel skill set in prompt engineering that is similar to other skills such as coding, effective storytelling, and processing numbers in Excel
  • Guy recalls an era when a certain category of people were simply good at googling, skilled at using specific search queries to find information
There is a debate over whether there is any artistry in text imaging, but Guy believes there is something about discovering an image that hasn’t quite existed until it’s manifested through words
  • Steph brings up the abundance of information online and how it’s a skill to learn how to parse and surface what others find interesting using tools such as subreddit stats, Ahrefs, and other data sets
Steph asks Guy if there are certain learnings or an 80/20 approach to becoming a good prompt engineer
  • Guy explains that when you’re new to these tools, the best way to understand how they work is to describe something as if it already exists, like an image in a downloadable clip-art library or a photography gallery
  • He emphasizes the importance of using natural language that mimics the kind of descriptions you would see in those contexts, as this gives the tools a sense of what you’re looking for and what prompts work well
Guy notes that these AI tools generally struggle with fine detail (e.g., exactly what people are wearing) but are good at capturing the general topic or concept of an image
  • Steph points out that this is how these AIs were trained, by using Alt text from online images and using those as descriptors
Guy suggests that someone looking to improve their prompting skills could review the Alt text on different images online to see how things are described, and how an AI might interpret a given prompt
Steph notes that the level of detail required in a prompt can be surprising and that it’s easy to underestimate the number of iterations that might come back from a seemingly simple prompt
  • Guy agrees and adds that longer prompts tend to have diminishing returns, and that his prompt book catalogs many different ways to describe a shot (e.g., camera angle, time period, art style, artist); a rough sketch of stacking such modifiers follows below
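Purely as an illustration (not from the episode): a minimal Python sketch that builds a prompt by describing the subject as if it already exists and then stacking shot modifiers; the subject and modifier phrases are invented examples.

```python
# Illustrative only: build a prompt the way Guy describes, stating the subject
# as if the image already exists, then stacking optional "shot" modifiers.
subject = "a photograph of a lighthouse on a rocky coast at dusk"

modifiers = [
    "wide-angle shot",         # camera angle
    "shot on 35mm film",       # medium / era feel
    "soft golden-hour light",  # lighting
    "highly detailed",         # general quality cue
]

# Longer isn't always better; Guy notes diminishing returns, so keep it focused.
prompt = ", ".join([subject] + modifiers)
print(prompt)
```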
Steph asks about using specific artists’ work to train new images, and Guy acknowledges that there’s some controversy around that approach
New ways of prompting are emerging and evolving all the time, offering more tools and options for users to leverage in their creative projects
One of the most significant developments in prompting tools is the ability to prompt with images 
  • This is not simply compositing images and words as in Photoshop, but generating new images that take an input image and its features as part of the prompt
  • Using images as prompts can lead to surprising results that are hard to control, but that also open up new and interesting creative opportunities
    • For example, one can create abstract designs from brand colors or personal photos, then combine that baseline with custom prompts to build a unique visual base (an image-to-image sketch follows below)
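A hedged, minimal image-to-image sketch using the open-source diffusers library with Stable Diffusion; the model ID, file names, and parameter values are assumptions rather than anything named in the episode.

```python
# Minimal image-to-image sketch with Hugging Face diffusers + Stable Diffusion.
# Model ID, file names, and parameter values are illustrative assumptions.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Start from an existing image, e.g. an abstract design in brand colors.
init_image = Image.open("brand_baseline.png").convert("RGB").resize((512, 512))

result = pipe(
    prompt="an abstract poster design, bold geometric shapes, studio lighting",
    image=init_image,
    strength=0.6,        # how far the output may drift from the input image (0-1)
    guidance_scale=7.5,  # how strongly to follow the text prompt
).images[0]

result.save("variation.png")
```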
Another major development in prompting is the rise of selfie culture, which has spawned many AI-powered tools that help users generate selfies and profile pictures based on their own features
  • In the image-to-image space, some startups are doing interesting things with image generation, allowing users to input core images and then generate infinite versions of those images based on specific modifiers
With access to prompt libraries and the ability to input images, users are no longer starting from scratch when using these tools
  • They have a baseline to work from, which can be customized with specific prompts to achieve their desired output
  • However, controlling the output can be challenging since users are relying on AI to understand their intentions and generate the desired output 
  • It takes time and practice to learn how to refine prompts to get a higher throughput of desired images versus undesired ones
AI models are like a black box, making it difficult to fine-tune or understand every little piece that goes into the input and output
  • Inputting the same prompt into an AI model doesn’t necessarily produce the same output, because generation starts from a random cloud of noise (see the seeding sketch after this list)
  • When testing different prompts, it’s challenging to differentiate whether the result is good or just lucky
  • Some people fall into the trap of generating again and again, hoping for better results, like pulling the lever on an AI slot machine
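One way to take luck out of prompt comparisons is to fix the random seed so the same prompt reproduces the same image; the minimal sketch below assumes diffusers and Stable Diffusion as the toolchain (not something specified in the episode).

```python
# Sketch: fix the random seed so the same prompt reproduces the same image,
# making A/B comparisons of prompt wording repeatable rather than lucky.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "a watercolor painting of a fox in a misty forest"

# Different seeds -> different starting noise -> different images.
for seed in (1, 2):
    generator = torch.Generator("cuda").manual_seed(seed)
    pipe(prompt, generator=generator).images[0].save(f"fox_seed{seed}.png")

# Re-using a seed reproduces the same image for the same prompt.
generator = torch.Generator("cuda").manual_seed(1)
pipe(prompt, generator=generator).images[0].save("fox_seed1_again.png")
```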
Prompt libraries and online communities can help people learn from others’ work and prompts, to better understand what works and what doesn’t
The discussion also covers negative prompts and common glitches, such as the infamous hand glitch when generating images of people (sketched below)
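A hedged, minimal negative-prompt sketch using Stable Diffusion via the diffusers library; the model ID and prompt wording are assumptions, since the episode doesn’t walk through code.

```python
# Sketch: use a negative prompt to steer away from common failure modes
# such as malformed hands. Model ID and wording are illustrative.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="portrait photo of a pianist playing on stage, dramatic lighting",
    negative_prompt="deformed hands, extra fingers, blurry, low quality",
).images[0]

image.save("pianist.png")
```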
AI models still have limitations: they struggle with certain specific tasks, and there are still glitches in the matrix
Some models, like DALL-E 2, struggle to grasp that they are drawing inside a square frame, but users can upload a border image to force the model to compose inside the box
  • Other models, like Midjourney, handle composition better out of the box because their creators have tuned them around the possibilities and limitations of the underlying AI and the prompt engineering process
Three popular models: Midjourney, DALL-E 2, and Stable Diffusion
  • Prompting skills largely transfer between models, much like switching between Excel and Google Sheets
  • Differences between the models are like differences between languages: similar underlying principles, with some variation in newer models
  • Midjourney does the heavy lifting to help create high-quality output, while Stable Diffusion has a larger dataset
Fine-tuning and creative decisions are made on top of the models to optimize them
  • Like driving different cars, some models are more responsive than others
  • Sometimes, another tool is needed to achieve the final refinement of an image, such as Facetune or inpainting/outpainting
The abundance of raw but imperfect materials creates opportunities for new tools and improvements to existing ones
  • Some effects, like a vintage film look, are easier to achieve with other tools such as iPhone apps
Potential for more models to be developed using open-source Stable Diffusion
The challenge and opportunity is to go beyond the text box and create something more user-friendly and inspiring that matches how people think
Designers find it hard when clients can’t explain what they want, and AI models are in the same position
Possibility of a conversational interface for AI generation, with the generation happening fast enough to show multiple options and directions
The Prompt Book helped with understanding styles like metaphysical painting and Kodachrome, but some other aesthetics and styles have no name
Visual art expresses things that cannot be put into words, and the goal is to unleash the inexplicable and undefinable
A better onboarding experience that guides new prompters on how different prompts can fit together would be useful
Potential for creating a zip file of a mood board and training AI to work with that particular concept
Embedding tricks can be used to teach an AI a style rather than just faces (see the sketch below)
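One common open-source approach to this is textual inversion, where a small learned embedding token stands in for a style; the sketch below loads such an embedding with diffusers, with the embedding repo and trigger token as hypothetical placeholders (the episode doesn’t name a specific technique or library).

```python
# Sketch: load a learned style embedding (textual inversion) so a short token
# in the prompt stands in for a whole visual style. The embedding repo and
# trigger token below are hypothetical placeholders.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Substitute a real embedding, e.g. one trained on a brand's mood-board images.
pipe.load_textual_inversion("sd-concepts-library/my-brand-style", token="<my-brand-style>")

image = pipe("a product hero shot in the style of <my-brand-style>").images[0]
image.save("brand_style_test.png")
```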
Interest in a version of the product where users can upload brand images or colors and iterate with AI to create images that match their brand
Experimenting with AI tools can lead to personal discoveries, surfacing ideas that were never considered before
Using AI tools has two modes: waiting to see what the model shows you, or visualizing the result in your mind first and rejecting outputs that don’t match
Allowing the AI model to take you where it wants to go can lead to a completely different and unexpected outcome
The variations tool in DALL-E 2 can generate four more images that are similar to the original image
  • Repeatedly feeding results back into the variations tool can lead to a psychedelic, dream-like visual journey (a minimal API sketch follows)
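For reference, a minimal sketch of requesting DALL-E 2 variations through the OpenAI Python SDK; the SDK surface has changed over time, so treat the exact call shape and parameters as assumptions.

```python
# Sketch: request four DALL-E 2 variations of an existing image via the
# OpenAI Python SDK. Assumes OPENAI_API_KEY is set; SDK details may differ.
from openai import OpenAI

client = OpenAI()

with open("original.png", "rb") as f:
    response = client.images.create_variation(image=f, n=4, size="1024x1024")

for i, item in enumerate(response.data):
    print(f"variation {i + 1}: {item.url}")

# Feeding a favorite variation back in again is the "psychedelic journey"
# Guy describes: each round drifts a little further from the original.
```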
AI tools have practical applications beyond just creating interesting art
  • Some examples include using generated images for blog post sharing or designing products like sneakers
  • Some uses of AI tools may not be explicitly advertised due to ethical and legal considerations
There is potential for AI-generated content to surpass traditional forms of entertainment like Netflix or Instagram
  • AI tools could potentially integrate with 3D printing to create real-life products
There is a debate around the value of AI-generated content compared to traditional forms of art and design, but there are many different levels at which we engage with visual components in everyday life
Prompt engineering may end up being mastered by only a few individuals, making them more valuable in the field
  • On the other hand, as technology becomes more advanced, anyone can learn to prompt reasonably well, making it a fundamental skill set similar to reading and writing
The development of foundational tools in the AI industry may incentivize making prompt engineering a skill that anyone can do well
  • However, there will still be people who specialize in prompt engineering and explore the boundaries of what’s possible, similar to those who specialize in wood whittling or animating hair
There may also be a need for people who specialize in “secret prompting,” such as copywriters who add a layer of prompts to the AI that consumers don’t see
  • Just like in the music or film industry, there will likely be a range of niche careers in the AI industry, such as prompt engineers who specialize in hair or hands, or who work for enterprise SaaS companies
The concept of a “10x prompt engineer” may become a common metaphor in the tech world, similar to the idea of a “10x recording engineer” in the music industry
As the AI industry grows, there will be a range of careers available that are not yet even imagined
Steph and Guy discuss the idea of the most popular art or imagery shared online
Steph says that as someone who spends a lot of time on Twitter, memes come to mind as the most popular image
  • She explains that memes are a basic form of imagery consisting of an image with capitalized text on it
  • What people resonate with is not necessarily the most refined or extravagant type of imagery