AI is getting better and better at fooling you with the images it generates. But it has a Kryptonite: Hands!
AI image generators like DALL-E 2 and Midjourney are delivering impressively realistic results from nothing more than a simple text prompt.
Generative AI is being used both for fun and for professional work, and there is a real fear out there that it could replace designers and artists.
But something is holding back AI-generated images: Hands.
However advanced generative AI has become, generating accurate hands remains a hurdle for it.
Take a look at this viral tweet by @mileszim:
The images generated are seriously impressive at first look. Those AI-generated people look almost too real, until you stumble upon this part:
This looks like a demon! The hands are seriously messed up, making me uncomfortable looking at it. Take a look at these parts too:
The hands are messed up, and the guys are holding the cups in a physically impossible way.
This issue is not specific to this example. It can be seen across most AI-generated images from any of the major tools, such as DALL-E 2, Midjourney V5, Stable Diffusion, Adobe Firefly, etc.
So why does this happen? Why are generative AI tools struggling to generate images of hands? Let's look at some of the reasons behind this phenomenon.
Human Hands Are Complex
The human hand is a complex body part, and not just for AI to generate; it simply is. It has five fingers, each looking different and serving a different purpose. For such a small body part, the hand contains a remarkable number of bones and muscles that give it its shape and let it perform the tasks it does.
Compared to human faces, hands can look very different depending on the angle and the pose. If someone is holding an umbrella, the hand is curled around the handle. If the same person is holding a big bowl, the hand looks entirely different, and if it is balled into a fist, no fingers are visible at all.
Hands also have a fixed number of fingers: five. When AI generates a tree, there is no exact leaf count to hit, so an unusually high number of leaves goes unnoticed. But with hands, the AI can go overboard and generate seven or nine fingers, which looks particularly odd.
AI Doesn’t Really Understand The Concept Of Hands
AI knows how things look, but not how they work. It doesn't understand the concept of a hand; it only knows what hands look like.
We just feed the AI tons of images, many of which contain hands. Some show five fingers, some show four, and some show no fingers at all because of the orientation. The photos are fed in as flat 2D pictures, so the AI never learns the 3D structure of the hand from them. Given how complex that structure is, the AI finds it very hard to replicate hands as accurately as we want.
Not Much Data Available On Human Hands
Generative AI models like Midjourney and DALL-E 2 are trained on billions of images. Human faces are abundant in these images, but clearly visible hands are rare compared to other body parts. The model also never really learns what a hand is or how it connects to the rest of the human body.
“It’s generally understood that within AI datasets, human images display hands less visibly than they do faces,” a spokesperson for Stability AI, the company behind Stable Diffusion, told BuzzFeed News. The spokesperson added, “Hands also tend to be much smaller in the source images, as they are relatively rarely visible in large form.”
As a result, AI finds it hard to generate a hand for certain prompts, especially when the hand is meant to be interacting with other objects.
We Spot The Irregularities In Hands More
AI-generated images are rarely perfect; most have plenty of issues and irregularities. But if the subject is an inanimate object, we tend not to mind them. Take a look at this image made from the prompt, “A photo of a teddy bear on a skateboard in Times Square”:
The image looks perfect at first glance, but look closely and you can see plenty of issues with the texture on the teddy bear's back. Even the ears and legs look choppy. Yet at first glance we don't really mind these flaws, because the subject is an inanimate object and the picture as a whole looks good.
Now, take a look at the image below, made in DALL-E 2 with the prompt, “a guy holding a big orange with both his hands”:
The hands are totally messed up, and they are the first thing you notice. You probably didn't even check how well the guy's t-shirt was generated, or how good his partially visible face looks.
That's because hands are something we see every day, so we know exactly how they should look. Any irregularity, small or big, is the first thing we notice in the image.
Conclusion
To sum it up, AI isn't perfect at generating images of hands, especially when the hands interact with another object or another hand. And our brains are really quick to spot these irregularities, because we see hands every day and know exactly how they should look.
That said, AI is getting better and better at getting hands right. Midjourney V5, the tool's latest version, has noticeably improved in this area and can now generate much more convincing hands. Take a look at these images:
However, there is still room for improvement. Even Midjourney V5 can get hands wrong at times, and other tools are no doubt working on the problem as well. AI is a fast-changing landscape, and the struggle with hands may soon be a thing of the past.