On the latest episode of the a16z Podcast, host Steph Smith sits down with Guy Parsons (@GuyP) to discuss the growing importance of prompt engineering in the age of AI. As AI continues to change modern life and the job market, creative roles like prompt engineering are emerging to work alongside the technology
  • Check out Guy Parsons’ DALL-E 2 Prompt Book
Host: Steph Smith (@stephsmithio) 
The DALL-E 2 Prompt Book was created as a collection of cool examples and terms for producing amazing visual effects with DALL-E 2, OpenAI’s text-to-image model
  • The Prompt Book was essentially a slide deck that grew to be 8100 slides long
  • Guy shared the Prompt Book online as a jumping-off point for people to realize the kind of stuff that these tools were capable of
Guy estimates that he has spent a couple of hundred hours learning the craft of prompting in Midjourney, DALL-E 2, and Stable Diffusion, but he wouldn’t call himself a master
  • Some people have done thousands or even hundreds of thousands of prompts using these tools
  • The capabilities of these tools have advanced significantly in the last six months
Prompt engineering is a new field, and nobody could definitively say they’re an expert yet
Steph asks if there is a parallel skill set in prompt engineering that is similar to other skills such as coding, effective storytelling, and processing numbers in Excel
  • Guy recalls an era when a certain category of people were simply good at googling, skilled at using specific search queries to find information
There is a debate over whether there is any artistry in text imaging, but Guy believes there is something about discovering an image that hasn’t quite existed until it’s manifested through words
  • Steph brings up the abundance of information online and how it’s a skill to learn how to parse and surface what others find interesting using tools such as subreddit stats, Ahrefs, and other data sets
Steph asks Guy if there are certain learnings or an 80/20 approach to becoming a good prompt engineer
  • Guy explains that when you’re new to these tools, the best way to understand how they work is to describe something as if it already exists, like an image in a downloadable clip-art library or a photography gallery
  • He emphasizes the importance of using natural language that mimics the kind of descriptions you would see in those contexts, as this gives the tools a sense of what you’re looking for and what prompts work well
Guy notes that these AI tools generally struggle with fine detail (e.g., exactly what people are wearing) but are good at capturing the general topic or concept of an image
  • Steph points out that this is how these AIs were trained, by using Alt text from online images and using those as descriptors
Guy suggests that someone looking to improve their prompting skills could review the Alt text on different images online to see how things are described, and how an AI might interpret a given prompt
Steph notes that the level of detail required in a prompt can be surprising and that it’s easy to underestimate the number of iterations that might come back from a seemingly simple prompt
  • Guy agrees and adds that longer prompts tend to have diminishing returns, and that his prompt book catalogs many different ways to describe a shot (e.g., camera angle, time period, art style, artist); a rough sketch of stacking such modifiers follows below
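Purely as an illustration (not from the episode): a minimal Python sketch that builds a prompt by describing the subject as if it already exists and then stacking shot modifiers; the subject and modifier phrases are invented examples.

```python
# Illustrative only: build a prompt the way Guy describes, stating the subject
# as if the image already exists, then stacking optional "shot" modifiers.
subject = "a photograph of a lighthouse on a rocky coast at dusk"

modifiers = [
    "wide-angle shot",         # camera angle
    "shot on 35mm film",       # medium / era feel
    "soft golden-hour light",  # lighting
    "highly detailed",         # general quality cue
]

# Longer isn't always better; Guy notes diminishing returns, so keep it focused.
prompt = ", ".join([subject] + modifiers)
print(prompt)
```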
Steph asks about using specific artists’ work to train new images, and Guy acknowledges that there’s some controversy around that approach
New ways of prompting are emerging and evolving all the time, offering more tools and options for users to leverage in their creative projects
One of the most significant developments in prompting tools is the ability to prompt with images 
  • This is not simply compositing images and words as in Photoshop, but generating new images that take an input image and its features as part of the prompt
  • Using images as prompts can lead to surprising results that are hard to control, but that also open up new and interesting creative opportunities
    • For example, one can create abstract designs from brand colors or personal photos, then combine that baseline with custom prompts to build a unique visual base (an image-to-image sketch follows below)
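A hedged, minimal image-to-image sketch using the open-source diffusers library with Stable Diffusion; the model ID, file names, and parameter values are assumptions rather than anything named in the episode.

```python
# Minimal image-to-image sketch with Hugging Face diffusers + Stable Diffusion.
# Model ID, file names, and parameter values are illustrative assumptions.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Start from an existing image, e.g. an abstract design in brand colors.
init_image = Image.open("brand_baseline.png").convert("RGB").resize((512, 512))

result = pipe(
    prompt="an abstract poster design, bold geometric shapes, studio lighting",
    image=init_image,
    strength=0.6,        # how far the output may drift from the input image (0-1)
    guidance_scale=7.5,  # how strongly to follow the text prompt
).images[0]

result.save("variation.png")
```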
Another major development in prompting is the rise of selfie culture, which has spawned many AI-powered tools that help users generate selfies and profile pictures based on their own features
  • In the image-to-image space, some startups are doing interesting things with image generation, allowing users to input core images and then generate infinite versions of those images based on specific modifiers
With access to prompt libraries and the ability to input images, users are no longer starting from scratch when using these tools
  • They have a baseline to work from, which can be customized with specific prompts to achieve their desired output
  • However, controlling the output can be challenging since users are relying on AI to understand their intentions and generate the desired output 
  • It takes time and practice to learn how to refine prompts to get a higher throughput of desired images versus undesired ones
AI models are like a black box, making it difficult to fine-tune or understand every little piece that goes into the input and output
  • Inputting the same prompt into an AI model doesn’t necessarily produce the same output, because generation starts from a random cloud of noise (see the seeding sketch after this list)
  • When testing different prompts, it’s challenging to differentiate whether the result is good or just lucky
  • Some people fall into the trap of generating again and again, hoping for better results, like pulling the lever on an AI slot machine
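One way to take luck out of prompt comparisons is to fix the random seed so the same prompt reproduces the same image; the minimal sketch below assumes diffusers and Stable Diffusion as the toolchain (not something specified in the episode).

```python
# Sketch: fix the random seed so the same prompt reproduces the same image,
# making A/B comparisons of prompt wording repeatable rather than lucky.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "a watercolor painting of a fox in a misty forest"

# Different seeds -> different starting noise -> different images.
for seed in (1, 2):
    generator = torch.Generator("cuda").manual_seed(seed)
    pipe(prompt, generator=generator).images[0].save(f"fox_seed{seed}.png")

# Re-using a seed reproduces the same image for the same prompt.
generator = torch.Generator("cuda").manual_seed(1)
pipe(prompt, generator=generator).images[0].save("fox_seed1_again.png")
```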
Prompt libraries and online communities can help people learn from others’ work and prompts, to better understand what works and what doesn’t
The discussion also covers negative prompts and common glitches, such as the infamous hand glitch when generating images of people (sketched below)
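A hedged, minimal negative-prompt sketch using Stable Diffusion via the diffusers library; the model ID and prompt wording are assumptions, since the episode doesn’t walk through code.

```python
# Sketch: use a negative prompt to steer away from common failure modes
# such as malformed hands. Model ID and wording are illustrative.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="portrait photo of a pianist playing on stage, dramatic lighting",
    negative_prompt="deformed hands, extra fingers, blurry, low quality",
).images[0]

image.save("pianist.png")
```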
AI models still have limitations: they struggle with certain specific tasks, and there are still glitches in the matrix
Some models, like DALL-E 2, struggle to grasp that they are drawing inside a square frame, but users can upload a border image to force the model to compose inside the box
  • Other models, like Midjourney, handle composition better out of the box because their creators have tuned them around the possibilities and limitations of the underlying AI and the prompt engineering process
Three popular models: Midjourney, DALL-E 2, and Stable Diffusion
  • Prompting skills largely transfer between models, much like switching between Excel and Google Sheets
  • Differences between the models are like differences between languages: similar underlying principles, with some variation in newer models
  • Midjourney does the heavy lifting to help create high-quality output, while Stable Diffusion has a larger dataset
Fine-tuning and creative decisions are made on top of the models to optimize them
  • Like driving different cars, some models are more responsive than others
  • Sometimes, another tool is needed to achieve the final refinement of an image, such as Facetune or inpainting/outpainting
The abundance of raw but imperfect materials creates opportunities for new tools and improvements to existing ones
  • Some effects, like a vintage film look, are easier to achieve with other tools such as iPhone apps
Potential for more models to be developed using open-source Stable Diffusion
The challenge and opportunity is to go beyond the text box and create something more user-friendly and inspiring that matches how people think
Designers find it hard when clients can’t explain what they want, and AI models are in the same position
Possibility of a conversational interface for AI generation, with the generation happening fast enough to show multiple options and directions
The Prompt Book helped with understanding styles like metaphysical painting and Kodachrome, but some other aesthetics and styles have no name
Visual art expresses things that cannot be put into words, and the goal is to unleash the inexplicable and undefinable
A better onboarding experience that guides new prompters on how different prompts can fit together would be useful
Potential for creating a zip file of a mood board and training AI to work with that particular concept
Embedding tricks can be used to teach an AI a style rather than just faces (see the sketch below)
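One common open-source approach to this is textual inversion, where a small learned embedding token stands in for a style; the sketch below loads such an embedding with diffusers, with the embedding repo and trigger token as hypothetical placeholders (the episode doesn’t name a specific technique or library).

```python
# Sketch: load a learned style embedding (textual inversion) so a short token
# in the prompt stands in for a whole visual style. The embedding repo and
# trigger token below are hypothetical placeholders.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Substitute a real embedding, e.g. one trained on a brand's mood-board images.
pipe.load_textual_inversion("sd-concepts-library/my-brand-style", token="<my-brand-style>")

image = pipe("a product hero shot in the style of <my-brand-style>").images[0]
image.save("brand_style_test.png")
```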
Interest in a version of the product where users can upload brand images or colors and iterate with AI to create images that match their brand
Experimenting with AI tools can lead to personal discoveries, surfacing ideas that were never considered before
Using AI tools has two modes: waiting to see what the model shows you, or visualizing the result in your mind first and rejecting outputs that don’t match
Allowing the AI model to take you where it wants to go can lead to a completely different and unexpected outcome
The variations tool in DALL-E 2 can generate four more images that are similar to the original image
  • Repeatedly feeding results back into the variations tool can lead to a psychedelic, dream-like visual journey (a minimal API sketch follows)
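For reference, a minimal sketch of requesting DALL-E 2 variations through the OpenAI Python SDK; the SDK surface has changed over time, so treat the exact call shape and parameters as assumptions.

```python
# Sketch: request four DALL-E 2 variations of an existing image via the
# OpenAI Python SDK. Assumes OPENAI_API_KEY is set; SDK details may differ.
from openai import OpenAI

client = OpenAI()

with open("original.png", "rb") as f:
    response = client.images.create_variation(image=f, n=4, size="1024x1024")

for i, item in enumerate(response.data):
    print(f"variation {i + 1}: {item.url}")

# Feeding a favorite variation back in again is the "psychedelic journey"
# Guy describes: each round drifts a little further from the original.
```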
AI tools have practical applications beyond just creating interesting art
  • Some examples include using generated images for blog post sharing or designing products like sneakers
  • Some uses of AI tools may not be explicitly advertised due to ethical and legal considerations
There is potential for AI-generated content to surpass traditional forms of entertainment like Netflix or Instagram
  • AI tools could potentially integrate with 3D printing to create real-life products
There is a debate around the value of AI-generated content compared to traditional forms of art and design, but there are many different levels at which we engage with visual components in everyday life
Prompt engineering may end up being mastered by only a few individuals, making them more valuable in the field
  • On the other hand, as technology becomes more advanced, anyone can learn to prompt reasonably well, making it a fundamental skill set similar to reading and writing
The development of foundational tools in the AI industry may incentivize making prompt engineering a skill that anyone can do well
  • However, there will still be people who specialize in prompt engineering and explore the boundaries of what’s possible, similar to those who specialize in wood whittling or animating hair
There may also be a need for people who specialize in “secret prompting,” such as copywriters who add a layer of prompts to the AI that consumers don’t see
  • Just like in the music or film industry, there will likely be a range of niche careers in the AI industry, such as prompt engineers who specialize in hair or hands, or who work for enterprise SaaS companies
The concept of a “10x prompt engineer” may become a common metaphor in the tech world, similar to the idea of a “10x recording engineer” in the music industry
As the AI industry grows, there will be a range of careers available that are not yet even imagined
Steph and Guy discuss the idea of the most popular art or imagery shared online
Steph says that as someone who spends a lot of time on Twitter, memes come to mind as the most popular image
  • She explains that memes are a basic form of imagery consisting of an image with capitalized text on it
  • What people resonate with is not necessarily the most refined or extravagant type of imagery