
Summary of A16z Podcast Episode: Boosting Creativity: Prompt Engineering | a16z Podcast ft. Steph Smith

Podcast: A16z

— Description —

Discover the fascinating world of prompt engineering with the DALL-E 2 Prompt Book. Explore the artistry of text-to-image generation and unleash your creativity by using images as prompts. Learn how AI models like Midjourney, DALL-E 2, and Stable Diffusion can revolutionize your visual effects.

Dive into the black box of AI and uncover the potential for AI-generated content to surpass traditional entertainment. Whether you're a beginner or an expert, this book will inspire you to explore new career opportunities in the ever-evolving AI industry.

Boosting Creativity: Prompt Engineering | a16z Podcast ft. Steph Smith

Key Takeaways

  • The DALL-E 2 Prompt Book was created as a collection of cool examples and terms used to create amazing visual effects using DALL-E 2 (OpenAI’s text-to-image model)
  • Prompt engineering is a new field, and nobody could definitively say they’re an expert yet
  • There is a debate over whether there is any artistry in text-to-image generation, but Guy believes there is something about discovering an image that hasn’t quite existed until it’s manifested through words
  • Guy suggests that someone looking to improve their prompting skills could review the Alt text on different images online to see how things are described, and how an AI might interpret a given prompt
  • One of the most significant developments in prompting tools is the ability to prompt with images
    • This is not simply combining images and words like in Photoshop, but generating prompts based on images and their features
    • Using images as prompts can lead to surprising and unexpected results that may be difficult to control but can also offer new and interesting opportunities for creativity
  • AI models are like a black box, making it difficult to fine-tune or understand every little piece that goes into the input and output
    • Inputting the same prompt into an AI model doesn’t necessarily result in the same output because it starts from a random cloud of noise
    • Some people fall into the trap of re-running prompts again and again, hoping for better results, like pulling the lever on an AI slot machine
  • Three popular models: Midjourney, DALL-E 2, and Stable Diffusion
    • Moving between models when prompting is similar to switching between Excel and Google Sheets
    • Differences between the models are like learning different languages, with similar principles and some variations in newer models
  • Visual art expresses things that cannot be put into words, and the goal is to unleash the inexplicable and undefinable
  • Learning with AI tools can bring about personal experiences that help to surface things that were never considered before
  • Using AI tools has two modes: waiting to see what the model shows or visualizing it in your mind and rejecting what doesn’t work
  • There is potential for AI-generated content to surpass traditional forms of entertainment like Netflix or Instagram
    • AI tools could potentially integrate with 3D printing to create real-life products
  • The development of foundational tools in the AI industry may incentivize making prompt engineering a skill that anyone can do well
    • There may also be a need for people who specialize in “secret prompting,” such as copywriters who add a layer of prompts to the AI that consumers don’t see
  • As the AI industry grows, there will be a range of careers available that are not yet even imagined

Intro

  • On the latest episode of the a16z Podcast, host Steph Smith sits down with Guy Parsons (@GuyP) to discuss the growing importance of prompt engineering in the age of AI. As AI continues to change modern life and the job market, creative roles like prompt engineering are emerging to work alongside the technology
    • Check out Guy Parsons’ DALL-E 2 Prompt Book
  • Host: Steph Smith (@stephsmithio) 

DALL-E 2 Prompt Book

  • The DALL-E 2 Prompt Book was created as a collection of cool examples and terms used to create amazing visual effects using DALL-E 2, which is OpenAI’s text-to-image model
    • The Prompt Book was essentially a slide deck that grew to be 80-100 slides long
    • Guy shared the Prompt Book online as a jumping-off point for people to realize the kind of stuff that these tools were capable of
  • Guy estimates that he has spent a couple of hundred hours mastering the idea of prompting in Midjourney, DALL-E 2, and Stable Diffusion, but he wouldn’t say he’s a master
    • Some people have done thousands or even hundreds of thousands of prompts using these tools
    • The capabilities of these tools have advanced significantly in the last six months
  • Prompt engineering is a new field, and nobody could definitively say they’re an expert yet

Parallel Skills in Prompt Engineering

  • Steph asks if there is a parallel skill set in prompt engineering that is similar to other skills such as coding, effective storytelling, and processing numbers in Excel
    • Guy recalls an era when some people stood out for being good at googling, that is, for knowing how to use specific search queries to find information
  • There is a debate over whether there is any artistry in text-to-image generation, but Guy believes there is something about discovering an image that hasn’t quite existed until it’s manifested through words
    • Steph brings up the abundance of information online and how it’s a skill to learn how to parse and surface what others find interesting using tools such as subreddit stats, Ahrefs, and other data sets

80/20 Prompting

  • Steph asks Guy if there are certain learnings or an 80/20 approach to becoming a good prompt engineer
    • Guy explains that when you’re new to using these tools, the best way to understand how they work is to describe something as if it already exists, as though it were an image in a downloadable clip art library or a photography gallery
    • He emphasizes the importance of using natural language that mimics the kind of descriptions you would see in those contexts, as this gives the tools a sense of what you’re looking for and what prompts work well
  • Guy notes that AI tools are generally bad at describing images in great detail (e.g., what people are wearing), but are good at describing the general topic or concept of the image
    • Steph points out that this is how these AIs were trained, by using Alt text from online images and using those as descriptors
  • Guy suggests that someone looking to improve their prompting skills could review the Alt text on different images online to see how things are described, and how an AI might interpret a given prompt
  • Steph notes that the level of detail required in a prompt can be surprising and that it’s easy to underestimate the number of iterations that might come back from a seemingly simple prompt
    • Guy agrees and adds that longer prompts tend to have diminishing returns and that his prompt book includes many different ways to describe a shot (e.g., camera angle, time period, artistry, artist)
  • Steph asks about using specific artists’ work to train new images, and Guy acknowledges that there’s some controversy around that approach
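
The modifier categories mentioned above (camera angle, time period, art style, artist) can be thought of as optional pieces appended to a plain description of the subject. The snippet below is a minimal sketch of that idea, not something taken from the Prompt Book itself: the `build_prompt` helper and the example modifiers are hypothetical, and the comma-separated layout is only an assumed convention.

```python
def build_prompt(subject, modifiers):
    """Join a subject description with optional style modifiers.

    Hypothetical helper for illustration; image tools simply
    receive the final string.
    """
    parts = [subject.strip()]
    # Skip empty or whitespace-only modifiers so the prompt stays clean.
    parts += [m.strip() for m in modifiers if m and m.strip()]
    return ", ".join(parts)


prompt = build_prompt(
    "a lighthouse on a rocky coast",
    ["low-angle shot", "1970s", "oil painting"],
)
print(prompt)
```

Keeping the modifier list short reflects the point about diminishing returns from longer prompts.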

New Ways of Prompting

  • New ways of prompting are emerging and evolving all the time, offering more tools and options for users to leverage in their creative projects
  • One of the most significant developments in prompting tools is the ability to prompt with images 
    • This is not simply combining images and words like in Photoshop, but generating prompts based on images and their features
    • Using images as prompts can lead to surprising and unexpected results that may be difficult to control but can also offer new and interesting opportunities for creativity
      • For example, one can create abstract designs using brand colors or personal photos and then multiply that baseline with custom prompts to create a unique visual base
  • Another major development in prompting is the rise of selfie culture, which has prompted many AI-powered tools to help users generate more selfies and profile pictures based on their features
    • In the image-to-image space, some startups are doing interesting things with image generation, allowing users to input core images and then generate infinite versions of those images based on specific modifiers
  • With access to prompt libraries and the ability to input images, users are no longer starting from scratch when using these tools
    • They have a baseline to work from, which can be customized with specific prompts to achieve their desired output
    • However, controlling the output can be challenging since users are relying on AI to understand their intentions and generate the desired output 
    • It takes time and practice to learn how to refine prompts to get a higher throughput of desired images versus undesired ones

Pulling the AI Slot Machine

  • AI models are like a black box, making it difficult to fine-tune or understand every little piece that goes into the input and output
    • Inputting the same prompt into an AI model doesn’t necessarily result in the same output because it starts from a random cloud of noise
    • When testing different prompts, it’s challenging to differentiate whether the result is good or just lucky
    • Some people fall into the trap of re-running prompts again and again, hoping for better results, like pulling the lever on an AI slot machine
  • Online communities can help users learn from other people’s work and prompts and better understand what works and what doesn’t
  • Glitches can occur, such as the infamous hand glitch when generating images of people, and negative prompts (telling the model what to leave out) are part of the toolkit
  • The limitations of AI models are that they struggle with specific tasks, and there are still glitches in the matrix
  • Some models, like DALL-E 2, struggle to understand that they are drawing within a square, but users can upload a border image to force the model to think inside the box
    • Other models, like Midjourney, have solved the composition problem by understanding the possibilities and limitations of AI and the prompt engineering process
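
The "random cloud of noise" point can be illustrated with a toy example. This is a sketch under stated assumptions, not how any particular model is implemented: `initial_noise` is a hypothetical stand-in for the starting noise, and it uses Python's standard `random` module rather than a real image model. It shows only that fixing a seed makes the starting point repeatable, while a different seed gives a different starting point for the same prompt.

```python
import random


def initial_noise(seed=None, size=4):
    """Return a small list of Gaussian noise values (toy stand-in).

    With seed=None the generator is seeded unpredictably, which is
    why repeated runs of the same prompt can differ.
    """
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


same_a = initial_noise(seed=42)
same_b = initial_noise(seed=42)
different = initial_noise(seed=7)

print(same_a == same_b)      # same seed, same starting noise
print(same_a == different)   # different seed, different starting noise
```

Where a tool exposes a seed setting, holding it fixed while changing only the prompt is one way to tell whether a better result came from the wording or from luck.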

Comparing Models

  • Three popular models: Midjourney, DALL-E 2, and Stable Diffusion
    • Moving between models when prompting is similar to switching between Excel and Google Sheets
    • Differences between the models are like learning different languages, with similar principles and some variations in newer models
    • Midjourney does the heavy lifting to help create high-quality output, while Stable Diffusion has a larger dataset
  • Fine-tuning and creative decisions are made on top of the models to optimize them
    • Like driving different cars, some models are more responsive than others
    • Sometimes, another tool is needed to achieve the final refinement of an image, such as Facetune or inpainting/outpainting
  • The abundance of raw but imperfect materials creates opportunities for new tools and improvements to existing ones
    • Some effects, like a vintage film look, are easier to achieve with other tools such as iPhone apps

Requested Features

  • Potential for more models to be developed using the open-source Stable Diffusion
  • The challenge and opportunity is to go beyond the text box and create something more user-friendly and inspiring that matches how people think
  • Designers find it hard when clients can’t explain what they want, and AI models are in the same position
  • Possibility of a conversational interface for AI generation, with the generation happening fast enough to show multiple options and directions
  • The Prompt Book helped with understanding named styles such as metaphysical painting and Kodachrome, but some other aesthetics and styles have no name
  • Visual art expresses things that cannot be put into words, and the goal is to unleash the inexplicable and undefinable
  • A better onboarding experience that guides new prompters on how different prompts can fit together would be useful
  • Potential for creating a zip file of a mood board and training AI to work with that particular concept
  • Embedding tricks can be used to train AI with style instead of just faces
  • Interest in a version of the product where users can upload brand images or colors and iterate with AI to create images that match their brand

Learning with AI

  • Learning with AI tools can bring about personal experiences that help to surface things that were never considered before
  • Using AI tools has two modes: waiting to see what the model shows or visualizing it in your mind and rejecting what doesn’t work
  • Allowing the AI model to take you where it wants to go can lead to a completely different and unexpected outcome
  • The variations tool in DALL-E 2 can generate four more images that are similar to the original image
    • Repeatedly using the variations tool can lead to a psychedelic dream-like visual journey

Practical Use Cases

  • AI tools have practical applications beyond just creating interesting art
    • Some examples include using generated images for blog post sharing or designing products like sneakers
    • Some uses of AI tools may not be explicitly advertised due to ethical and legal considerations
  • There is potential for AI-generated content to surpass traditional forms of entertainment like Netflix or Instagram
    • AI tools could potentially integrate with 3D printing to create real-life products
  • There is a debate around the value of AI-generated content compared to traditional forms of art and design, but there are many different levels at which we engage with visual components in everyday life

A Top 1% Prompt Engineer

  • Prompt engineering may end up being mastered by only a few individuals, making them more valuable in the field
    • On the other hand, as technology becomes more advanced, anyone can learn to prompt reasonably well, making it a fundamental skill set similar to reading and writing
  • The development of foundational tools in the AI industry may incentivize making prompt engineering a skill that anyone can do well
    • However, there will still be people who specialize in prompt engineering and explore the boundaries of what’s possible, similar to those who specialize in wood whittling or animating hair
  • There may also be a need for people who specialize in “secret prompting,” such as copywriters who add a layer of prompts to the AI that consumers don’t see
    • Just like in the music or film industry, there will likely be a range of niche careers in the AI industry, such as prompt engineers who specialize in hair or hands or enterprise SaaS companies
  • The concept of a “10x prompt engineer” may become a common metaphor in the tech world, similar to the idea of a “10x recording engineer” in the music industry
  • As the AI industry grows, there will be a range of careers available that are not yet even imagined
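
The "secret prompting" idea above, where a product adds a layer of prompt text the consumer never sees, can be sketched as a simple template wrapped around the user's input. Everything here is hypothetical and for illustration only: the `HIDDEN_TEMPLATE` wording and the `apply_hidden_layer` helper are assumptions, not a description of any real product.

```python
# Hypothetical house style written by a specialist; the end user never sees it.
HIDDEN_TEMPLATE = (
    "{user_text}, flat vector illustration, brand palette of teal and white"
)


def apply_hidden_layer(user_text, template=HIDDEN_TEMPLATE):
    """Wrap the user's short request in the product's hidden prompt layer."""
    return template.format(user_text=user_text.strip())


# The user types only a short request; the model receives the full prompt.
print(apply_hidden_layer("a fox reading a newspaper"))
```

In this framing, the specialist's craft lives in the template rather than in what the consumer types.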
