What is SnapFusion? Snapchat’s New Generative AI Tool Explained
Snap Inc has just unveiled a new AI tool for Snapchat, SnapFusion. Here’s everything you need to know about how it works…
No one and nothing is safe from AI. It is the new arms race, or gold rush, for Silicon Valley’s biggest companies. Big Tech has been around for decades, but its biggest players are now pivoting to become the leaders of Big AI.
Google is investing billions in AI. Microsoft is doing the same, working closely with OpenAI, and Apple has its own, more nuanced approach to AI. Whatever your thoughts on AI, the technology is here to stay and it will affect every aspect of your life, both digitally and in the real world.
Snap Inc, the parent company of Snapchat, is no stranger to AI; it has already dabbled in AI inside Snapchat. But its latest generative AI tool, called SnapFusion, represents a pretty significant step forwards for text-to-image diffusion models. Below is everything you need to know about SnapFusion.
What is SnapFusion?
SnapFusion is a groundbreaking development in the field of text-to-image diffusion models. These models have the ability to create stunning images from natural language descriptions that can rival the work of professional artists and photographers.
“At Snap, we’re inspired by the new features and products that enhance creativity and bring imaginations to life, all enabled by generative AI technology. While there’s huge interest in these experiences, due to their complex technical architecture, they require tremendous time, resources, and processing power in order to come to life–particularly on mobile.
“Snap Research achieved this breakthrough by optimizing the network architecture and denoising process, making it incredibly efficient, while maintaining image quality. So, now it’s possible to run the model to generate images based on text prompts, and get back crisp clear images in mere seconds on mobile rather than minutes or hours, as other research presents.”
Snap Inc.
Traditionally, these models have been large, with complex network architectures and numerous denoising iterations, making them computationally expensive and slow to run.
They typically require high-end GPUs and cloud-based inference to operate at scale, which can be costly and have privacy implications, especially when user data is sent to a third party.
How Does SnapFusion Work?
SnapFusion, developed by Snap Inc. and Northeastern University, overcomes these challenges by introducing a generic approach that allows text-to-image diffusion models to run on mobile devices in less than 2 seconds.
This is achieved by introducing an efficient network architecture and improving step distillation.
Specifically, SnapFusion proposes an efficient UNet by identifying the redundancy of the original model and reducing the computation of the image decoder via data distillation.
It also enhances the step distillation by exploring training strategies and introducing regularization from classifier-free guidance.
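For readers who want to see what classifier-free guidance looks like in practice, here is a minimal sketch of the standard guidance formula. It is not SnapFusion's code: the function name and the sample numbers are made up for illustration, and it only shows how guidance blends two noise predictions at sampling time, not how Snap folds it into distillation training as a regularizer.

```python
def classifier_free_guidance(eps_uncond, eps_cond, guidance_scale):
    """Blend the unconditional and text-conditioned noise predictions.

    Standard formulation: eps = eps_uncond + w * (eps_cond - eps_uncond).
    A scale of 0 ignores the prompt, 1 uses the conditioned prediction as-is,
    and larger values push the result further towards the prompt.
    """
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Hypothetical noise predictions for three latent values (illustrative only).
eps_uncond = [0.10, -0.20, 0.05]
eps_cond = [0.30, -0.10, 0.00]

guided = classifier_free_guidance(eps_uncond, eps_cond, guidance_scale=7.5)
print(guided)
```

In a real model the two predictions come from running the UNet with and without the text prompt, and they are tensors rather than short lists.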
The model was evaluated on MS-COCO, where, with just 8 denoising steps, it achieves better FID and CLIP scores than Stable Diffusion v1.5 does with 50 steps.
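To put the step counts in perspective, here is a rough back-of-the-envelope sketch. It uses only the figures quoted in this article (50 steps, 8 steps, under 2 seconds); the function names are hypothetical, and the per-step figure is an upper bound that ignores the time spent on the text encoder and image decoder.

```python
BASELINE_STEPS = 50      # Stable Diffusion v1.5 setting quoted in the article
DISTILLED_STEPS = 8      # SnapFusion setting quoted in the article
LATENCY_BUDGET_S = 2.0   # "less than 2 seconds", treated here as an upper bound


def step_reduction(baseline_steps: int, distilled_steps: int) -> float:
    """How many times fewer denoising iterations the distilled model runs."""
    return baseline_steps / distilled_steps


def per_step_budget(latency_budget_s: float, num_steps: int) -> float:
    """Upper bound on the time each denoising step may take within the budget."""
    return latency_budget_s / num_steps


print(step_reduction(BASELINE_STEPS, DISTILLED_STEPS))     # 6.25
print(per_step_budget(LATENCY_BUDGET_S, DISTILLED_STEPS))  # 0.25
```

In other words, the distilled model runs 6.25 times fewer denoising iterations, and each one has to finish in under a quarter of a second for the whole image to arrive inside two seconds.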
This development democratizes content creation by putting powerful text-to-image diffusion models into the hands of users.
In essence, SnapFusion is a significant leap forward in the field of content creation, enabling users to generate photorealistic content using text prompts on their mobile devices quickly and efficiently.
OK… But What Does That ACTUALLY Mean?
Here’s a simpler overview of what SnapFusion is and its benefits, compared to more traditional text-to-image diffusion models.
Imagine being able to describe a scene with your words and then having a tool that can create that scene as an image. That’s what SnapFusion does. But what makes it special is that it can do this very quickly and on your mobile phone, something that was previously only possible with powerful computers.
This is obviously a big step forwards, both for Snapchat and the concept of text-to-image diffusion. Snapchat has hundreds of millions of users and its reach and usage will do wonders for popularising this type of AI technology.
And because SnapFusion was designed to be used by normal people, people like you and me, it is designed to be as simple to use as possible, despite all the engineering voodoo that went into making it possible.
Make no mistake, this development will have sent shockwaves through Silicon Valley. You don’t just turn up and blow up an existing model. This doesn’t happen. But this is exactly what Snap Inc has just done with SnapFusion.
- User-Friendly: SnapFusion is designed to work on mobile devices, making it accessible to anyone with a smartphone. You don’t need a high-end computer to use it.
- Fast: It can create images from text descriptions in less than two seconds. This speed makes it practical for everyday use.
- Privacy-Friendly: Since it works on your device, your data doesn’t need to be sent to a third party. This means your descriptions and the images they create stay private.
- Efficient: SnapFusion uses an efficient network design, which means it doesn’t use as much of your phone’s resources as other similar tools might.
- Democratizes Content Creation: By making this technology accessible and easy to use, SnapFusion allows anyone to create unique images from text descriptions, opening up new possibilities for content creation.