Guide to Promptcraft for AI Art Images

Having created 30,000+ images using AI, I’ve run a ton of experiments.

Along the way, I’ve arrived at four core principles you should apply when prompting for visuals to get better results.

Disclaimer though: It’s early days still. As with so much about AI, I don’t know for a fact that everything here is absolutely true.

I doubt even the people who designed these systems know. That’s sort of what makes AI cool. It’s like an unknown space, just waiting to be explored.

What I can tell you for certain is that poorly written prompts = waiting more often.

And depending on what tool you’re using, bad prompts don’t just cost you time, they might cost you money.

So the fewer renders you need to perform to get to a result you like, the better.

Here’s how to get great results, often even on your first try.

Note: The two AI image generators I use in this guide are Midjourney (best for beginners) and Stable Diffusion (best for tinkerers).

4 Core Principles of Visual Promptcraft

Just these four ideas will vastly improve your results.

1. Follow the right order of information

One of the cardinal rules of image prompting is that the words toward the beginning of the prompt have a stronger impact than the words at the end.

Taking this a step further, the general order of information that seems to work well in all of the current AI art tools I’ve used is:

  1. The kind of image you want
  2. The subject of the image
  3. Details of the main subject
  4. Description of setting or background
  5. Image stylizations

Let’s do an example:

impressionist painting of imperial soldiers in white armor marching on a sleek metal catwalk, suspended high above a dark and industrial Star Wars environment, aura of power and discipline, painted in vivid color with energetic brush strokes
  1. The kind of image you want
    1. impressionist painting
  2. The subject of the image
    1. imperial soldiers
  3. Details of the main subject
    1. in white armor
    2. marching
  4. Description of setting or background
    1. on a sleek metal catwalk suspended high above a dark and industrial Star Wars environment
  5. Image stylizations
    1. aura of power and discipline
    2. painted in vivid color
    3. with energetic brush strokes
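
If you assemble prompts in a script, the same ordering can be sketched in a few lines of Python. This is only an illustration: `build_prompt` and its parameter names are hypothetical helpers I made up for this guide, not part of Midjourney or Stable Diffusion. It joins the type, subject, subject details, and setting into one phrase, then appends the stylizations as comma-separated clauses.

```python
def build_prompt(image_type, subject, subject_details, setting, stylizations):
    """Join the five kinds of information in the order described above."""
    # Type, subject, subject details, and setting read as one phrase,
    # so they are joined with spaces.
    lead = " ".join([image_type, "of", subject, *subject_details, setting])
    # Stylizations follow as separate comma-separated clauses.
    return ", ".join([lead, *stylizations])


prompt = build_prompt(
    image_type="impressionist painting",
    subject="imperial soldiers",
    subject_details=["in white armor", "marching"],
    setting=(
        "on a sleek metal catwalk, suspended high above "
        "a dark and industrial Star Wars environment"
    ),
    stylizations=[
        "aura of power and discipline",
        "painted in vivid color with energetic brush strokes",
    ],
)
print(prompt)
```

With these inputs, the function reproduces the example prompt word for word. Keeping the pieces as separate arguments makes it easy to swap out one part, say the kind of image, while the rest stays in order.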

Using Midjourney (v4, Feb 6, 2023), we get images like these:

And Stable Diffusion 2.1 gives us these as a starting point.

As you can see, Stable Diffusion isn’t as good as Midjourney out of the box. It requires more tweaking and customizing to get good results.

But, Stable Diffusion is also open source and has a large community creating custom models.

Here are more results using a fan-made Stable Diffusion model called Protogen 58 Rebuilt Sci-Fi.

Much better!

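Okay, I know pictures can be distracting, so to review, here are the 5 types of info for a good image prompt and an order to put them in:

  1. The kind of image you want
  2. The subject of the image
  3. Details of the main subject
  4. Description of setting or background
  5. Image stylizations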

(There are other advanced techniques for ordering words and structuring prompts, but this is the most central one.)

Now let’s take it a step further and learn to write better prompts overall.

2. Write with a word’s full meaning in mind

AI art models are trained by classifying images. Basically, a computer looks at an image and guesses what’s in it, described in natural language. Another computer that’s been trained to help tells the first computer whether it’s right or wrong. And sometimes humans help too.

This may seem like a trivial detail, but it is not, as you’ll soon see.

Because…

Everything about your prompt craft can be improved if you consider how an AI ‘thinks’ and what various words can mean.

Take, for example, a prompt to generate a picture of a _________ man.

Now consider the difference between these four fill-in-the-blank adjectives:

  • beautiful
  • attractive
  • handsome
  • hot

Here’s how Midjourney sees those differences:

A beautiful man

A handsome man

An attractive man

A hot man

I knew these images were going to come out this way before I rendered them.

How?

By thinking like an AI.

Beautiful can be a nebulous concept that applies to almost anything visually pleasing.

Beautiful is more often applied to women than men, so it probably brings somewhat feminine or androgynous features.

Architecture can be beautiful. A stream can be beautiful. Someone’s handwriting can be beautiful. It probably brings a lot of different subtle connotations along with it too.

Attractive is an expression of beauty that is generally applied to all people. It’s somewhat genderless.

Attractive can also be used to describe a few other things too, like the terms of a deal can be attractive.

Handsome is most often used to describe men.

Handsome also brings along certain pre-conceptions. Handsome men are generally fit, square-jawed, and broad-shouldered. They dress well. They take care of themselves.

Hot. I put ‘hot’ in here because this is how people speak.

“That guy is so hot!”

Surely, because it is common in language, it would be easily understood by an AI, right?

Well, kind of. AIs do seem to understand that hot = attractive.

But use this word and don’t be surprised if your pictures end up with smoke and fire in the background.

As you can see, any word that seems ideal for your prompt can also skew the image in a direction you didn’t intend.

Now consider various words for men:

  • Men
  • Males
  • Guys
  • Boys
  • Dudes
  • Bros

AND imagine guys who are:

  • Dapper
  • Adorable
  • Good-looking
  • Impressive
  • Clean-cut
  • Strong
  • Stylish
  • Cute

Each word brings context with it, both in how it relates to the idea of men and in how it relates to the other words in the prompt.

All of these prompts would generate something different.
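
One way to explore these differences on purpose is to generate every adjective and noun pairing and render each one as its own test. The sketch below is a minimal illustration in plain Python. `prompt_variants` and `article` are hypothetical helper names, and the a/an rule is a rough first-letter check, not real grammar handling.

```python
from itertools import product


def article(word):
    """Pick 'a' or 'an' from the first letter (a rough rule of thumb)."""
    return "an" if word[0].lower() in "aeiou" else "a"


def prompt_variants(adjectives, nouns):
    """Return one short prompt per adjective/noun pairing."""
    return [
        f"{article(adjective)} {adjective} {noun}"
        for adjective, noun in product(adjectives, nouns)
    ]


variants = prompt_variants(["beautiful", "attractive", "handsome", "hot"], ["man"])
for variant in variants:
    print(variant)
```

Rendering the variants side by side, with everything else held constant, is what lets you see which connotations each word drags along.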

PSA: This is especially important if, like what appears to be a large section of the internet, you want to generate good-looking ladies with AI.

To avoid making pictures of children, don’t prompt a gorgeous girl with your keywords. Instead, prompt a gorgeous female. Girl can skew very childish.

That is, unless you have a solid negative prompt in place.

This brings us to positive and negative prompts.

2. Include the right negatives

Most AI image generators use a combination of two prompts:

  • Positive prompt = what’s in the image?
  • Negative prompt = what’s not there?

Here’s a simplistic example.

Compare the following.

➕: a hot dog
🚫: (nothing in the negative prompt section)

“A hot dog” – Rendering by Midjourney v4 (Feb 6, 2023)

Now look at this one.

➕: a hot dog
🚫: fire, smoke, animals

“a hot dog --no fire, smoke, animals” – Rendering by Midjourney v4 (Feb 6, 2023)
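
In Midjourney, the negative side is written inline with the `--no` parameter, as in the caption above. If you build prompts in a script, a small helper can attach it for you. This is a sketch: `with_negatives` is a hypothetical name of my own, and Stable Diffusion front ends typically take the negative prompt in a separate field, so there you would pass the two strings separately instead.

```python
def with_negatives(positive, negatives):
    """Append a Midjourney-style --no parameter when there is anything to exclude."""
    # Drop empty entries and stray whitespace before joining.
    cleaned = [term.strip() for term in negatives if term.strip()]
    if not cleaned:
        return positive
    return f"{positive} --no {', '.join(cleaned)}"


plain = with_negatives("a hot dog", [])
restricted = with_negatives("a hot dog", ["fire", "smoke", "animals"])
print(plain)
print(restricted)
```

The first call leaves the prompt untouched. The second produces the same string used for the render above.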

Midjourney didn’t make a perfect hot dog in the second version, I know.

But we’re A LOT closer to what we’re looking for. If I wanted to keep trying for a perfect hot dog, I’d be in a much better spot.

See, words have nuanced meanings.

It seems obvious to us humans because we innately understand that people don’t say, “hot dog” and mean a dog that is on fire.

But with AI, the magic figurer-outer goes out into its pre-trained model, looks at word-image associations, and brings back what it thinks you want based on the data.

Its computer friend, the one that helped it train on the meanings of words and images, never explained that some dogs are, indeed, not hot in that way.

Now that we understand both the full meaning of words and negative prompting, the next step is to fix a major mistake that most people make with negative prompts.

3. Unloading your negatives

Here’s a typical style of prompt that I see all the time:

➕: extremely detailed, full body color photo, Albert Einstein, young man, background laboratory, film grain, skin details, high detailed skin texture, 8k, hdr, dslr
🚫: cgi, 3d, render, sketch, cartoon, drawing, anime, deformed, bad anatomy, disfigured, poorly drawn face, mutation, mutated, distorted hands, deformed, extra limb, ugly, disgusting, poorly drawn hands, missing limb, floating limbs, disconnected limbs, malformed hands, mutated hands and fingers, distorted hands, amputation, missing hands, doubled face, double hands, b&w, black and white, sepia, black and white photo, blur

What you can see is that, on the positive side, we have a series of image tags, each separated by a comma.

There is no established relationship between the tags. I think this is how 90%+ of people write prompts.

When confronted with this style of prompt, the AI model has to infer everything based on the order of the words (toward the front is more important) and the relationships between those words.

We’ll fix the positive side in a sec, but first we have to deal with the dumpster fire of a negative prompt.

So much restriction! And for what?

So before we talk about word order and commas, let’s unload the negative prompt.

How to unload your negative prompts

Here’s what I’m going to teach you to do:

ORIGINAL negative prompt

🚫:  cgi, 3d, render, sketch, cartoon, drawing, anime, deformed, bad anatomy, disfigured, poorly drawn face, mutation, mutated, distorted hands, deformed, extra limb, ugly, disgusting, poorly drawn hands, missing limb, floating limbs, disconnected limbs, malformed hands, mutated hands and fingers, distorted hands, amputation, missing hands, doubled face, double hands, b&w, black and white, sepia, black and white photo, blur

REVISED negative prompt

🚫: cartoon, plastic, arachnid, desaturated, blurred

The technique here is to avoid overloading the negative prompt with so many terms that it blindfolds the AI to all sorts of good image data it could otherwise use.

These 5 words were selected very carefully.

For example, by negative prompting for cartoon and plastic, we attempt to eliminate any illustration (cartoon) or 3D rendered style (plastic-looking skin) without also losing access to other information that might be helpful.

By not removing drawings, anime, sketches, etc., the model has a richer set of inspiration to draw on for scenes, poses, etc. If a shot still comes out looking like not-a-photograph, we can consider adding more to the positive or negative side.

Dealing with bad anatomy.

With recent updates to most models, you also don’t need to battle very hard against bad anatomy unless it’s really a problem. Not like you used to, anyway.

Here, arachnid stands in for all the bad anatomy keywords.

Arachnids have multiple non-human limbs and extra eyes, after all. If we want to avoid ugly, mutated-looking bodies, let’s start by avoiding anything spider-like.

Note: If you still get bad anatomy, sure, drop additional words in, but be cautious.

Think about it.

If you want good hand anatomy, it seems illogical to negative prompt “extra hands”, which would also eliminate many pictures of proper hands from the data the AI explores.

What would be a keyword to fix your problem that doesn’t include the name of the thing you want in the picture?

Continuing on…

Desaturated is better than black and white photo for a similar reason.

We want all photos and colors to help our AI, so we express the concept of black and white in a totally different way.

If you think about how important the words black, white, and photo are, you can see why we’d want to avoid prompting against them. They probably show up in training data everywhere.

As you can see, when over-used, negative prompts behave like static for the language model, cutting off otherwise useful data because of inferred relationships between words.

By unloading negative prompts, you get more specific about what you don’t want and enable the AI to retrieve what you do, making full use of the model’s capabilities.
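
To make that check less of a gut call, here is a rough Python sketch that audits a negative prompt. Everything in it is an assumption of mine for illustration: `audit_negatives`, the `protected` argument (things you want rendered well even if the positive prompt never names them), and the stop-word list. It only compares words. It knows nothing about how any model actually weighs a term, so treat its output as a hint for a closer look and not a verdict. The positive prompt is a shortened version of the Einstein example.

```python
import re

# Small words that shouldn't count as overlap on their own.
STOP_WORDS = {"a", "an", "and", "of", "the", "with"}


def words(text):
    """Lowercase a prompt fragment and return its meaningful words as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOP_WORDS


def audit_negatives(positive, negatives, protected=()):
    """Sort negative terms into kept, duplicates, and conflicts.

    A term is a conflict when it shares a word with the positive prompt
    or with `protected`.
    """
    wanted = words(positive) | {p.lower() for p in protected}
    seen, kept, duplicates, conflicts = set(), [], [], []
    for term in negatives:
        key = term.strip().lower()
        if key in seen:
            duplicates.append(key)
            continue
        seen.add(key)
        shared = sorted(words(key) & wanted)
        if shared:
            conflicts.append((key, shared))
        else:
            kept.append(key)
    return kept, duplicates, conflicts


positive = "full body color photo, Albert Einstein, young man, background laboratory"
overloaded = ["deformed", "distorted hands", "deformed", "black and white photo", "blur"]
kept, duplicates, conflicts = audit_negatives(positive, overloaded, protected=["hands"])
print(kept)        # terms that passed the check
print(duplicates)  # terms that were repeated
print(conflicts)   # terms that share a word with something you want
```

Here the repeated "deformed" is reported as a duplicate, and "distorted hands" and "black and white photo" are flagged because they share "hands" and "photo" with things we want. The revised five-word negative prompt passes without any flags.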

Now that we’re no longer accidentally limiting our good data, let’s help the AI understand exactly what we want to see.

4. Using joining words and commas

While we don’t usually think of an AI art tool as a large language model, many of the lessons from tools like GPT-3 and ChatGPT still apply here.

The language is everything.

And in language, commas act as a firm separator between linked ideas. What is on one side of a comma is different from what is on the other side.

Using the same example from above, we can easily improve the original prompt for better results.

Let’s look at its original positive side again to refresh our memory.

ORIGINAL positive prompt

➕: extremely detailed, full body color photo, Albert Einstein, young man, background laboratory, film grain, skin details, high detailed skin texture, 8k, hdr, dslr

All of the ideas are separated by commas, with very little grouping as to what belongs with what.

Is this supposed to be a picture of Albert Einstein next to a young man who is holding a DSLR camera, perhaps?

Probably not, but we’re making the AI infer all that, which means we’re not as likely to get an amazing result. At best, we probably get an average image back.

I mean, look at my first test image from Midjourney. All old guys and one shirtless guy with random marker drawings on his body? Nothing in a laboratory at all?

We need to fix this.

Below is my new prompt, and I’ll break down what changes I made.

But first let me show you the images. There are still some issues that need to be addressed, but we have youth in 3 out of 4 and the lab in about 1.5 of the results.

REVISED positive prompt

➕: extremely detailed 8k full body color photo of Albert Einstein as a young man, at a laboratory, taken with a DSLR, with the photograph showing film grain and natural skin with details and texture

Analyze the placement of these words:

  • of
  • as
  • a
  • at
  • with
  • and

…as well as the commas.

I’ll sort of do it for you as an indented list of bullet points to show the hierarchy based on where all the words and commas end up:

  • photo
    • extremely detailed
    • 8k
    • full body
    • color
  • Albert Einstein
    • as a young man
    • at a laboratory
  • taken with a DSLR
    • with the photograph showing film grain
    • and natural skin with details and texture

Using commas and joining words, each main idea of the image is grouped together in one comma-separated clause, and all objects and adjectives are associated correctly with each other using natural language.

From the phrase, full body color photo of Albert Einstein as a young man, there can be no confusion about whether I want a picture of Albert Einstein AND a young man or AS a young man. Likewise, taken with a DSLR makes it clear that the picture was taken with a DSLR camera, not that he is holding one.
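
The same grouping can be expressed as data, which helps if you generate prompts in a script. In this minimal sketch, each inner list is one comma-separated clause, and the phrases inside it are joined with spaces so the joining words keep them attached to each other. `compose_clauses` is a hypothetical helper name, not a feature of any image tool.

```python
def compose_clauses(clauses):
    """Join phrases within a clause with spaces, and clauses with commas."""
    return ", ".join(" ".join(phrases) for phrases in clauses)


revised_prompt = compose_clauses([
    # Main idea: the photo and who is in it.
    ["extremely detailed 8k full body color photo", "of Albert Einstein", "as a young man"],
    # Setting.
    ["at a laboratory"],
    # How the photo was taken.
    ["taken with a DSLR"],
    # Follow-along style details, re-referencing the photograph.
    ["with the photograph showing film grain", "and natural skin with details and texture"],
])
print(revised_prompt)
```

With these clauses, the output matches the revised positive prompt exactly. Moving a phrase from one inner list to another changes which idea it attaches to, which is the whole point of the technique.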

Note though, that according to our original prompt structure from the very beginning of this guide, I use the practice of adding smaller details on style at the end. This is because we want the AI to be very clear about the main subject.

To make my follow-along details as accurate as possible, I re-referenced the photo when I wrote with the photograph showing film grain and natural skin with details and texture.

(Attentive readers will notice that color photo and --no desaturated still missed the mark, so we may need to prompt against sepia tones or isolate “color photo” in quotes to get the best result. That’s how AI works. You’re always going to need to do a bit of experimentation.)

Wrap up

These tips may seem simple, but if you start to use them, you’ll notice your images just turn out much closer to the mark and often much nicer.

If you’re paying with your time and for image rendering credits, this can save you a ton of frustration and money.

To review,

Follow the right order:

  1. Type of image
  2. Main subject
  3. Details of the main subject
  4. Description of background
  5. Image stylizations

Include the right negatives.

But also, unload your negatives.

And make smart use of comma and joining word placement.

By applying these four techniques, you can write prompts that are far more specific and far less heavy-handed, allowing the model to apply its full training data to render you beautiful results.

You’ll still likely need to experiment and play around with the words you use. But getting to your ideal endpoint should come much faster and easier this way.
