Valued at US$1 billion within 3 months, Li Feifei’s first spatial intelligence model is born! A single picture generates a 3D world, and video games are about to change.

December 3, 2024
AI Era

[AI Era Introduction] The first “spatial intelligence” model from Li Feifei’s World Labs has just arrived! A single picture generates a 3D world. Netizens exclaimed: this is crazy, the next revolution has begun, and this is the future of video games and movies.

AI-generated 3D worlds have become a reality!

Just now, World Labs, founded by AI godmother Li Feifei, officially unveiled its first "spatial intelligence" model, which can generate a 3D world from a single picture.

In Li Feifei's words, "No matter how you theorize the idea, it is difficult to describe in words the interactive experience of generating a 3D scene from a photo or a sentence."

This is the first step towards spatial intelligence.

Interactive portal: https://www.worldlabs.ai/blog#footnote1

All scenes can be rendered in real time in the browser, with controllable camera effects and adjustable simulated depth of field.

In the future, the virtual worlds that game NPCs inhabit could be swapped at will and generated in minutes.

Jim Fan, a senior research scientist at NVIDIA and a former student of Li Feifei, summed it up: "GenAI is creating increasingly high-dimensional snapshots of human experience. Stable Diffusion is a 2D snapshot; Sora is a 2D + time dimension snapshot; and World Labs is a 3D, fully immersive snapshot."

In April this year, it was revealed that Li Feifei had started her own company, focusing on spatial intelligence. The new company raised private financing and immediately became a unicorn valued at US$1 billion.

In September, the company, named World Labs, officially debuted and raised US$230 million in a new round of financing. It received strong support from AI luminaries Geoffrey Hinton and Jeff Dean, former Google CEO Eric Schmidt, and others.

World Labs founding team, from left: Ben Mildenhall, Justin Johnson, Christoph Lassner and Li Feifei

Now, after more than half a year of work, spatial intelligence has finally taken shape.

Netizens were thrilled: this is crazy, they said, and we are about to see a revolution like the one in the 1980s and 1990s. It will let many people realize their ideas, and will hopefully cut development costs and help studios take more risks with new intellectual property.

This is the future of video games and movies.

VR now has more possibilities.

Explore a new world

Whether it is Midjourney, FLUX, Runway, or DreamMachine, most of the GenAI tools we are familiar with can only produce 2D content such as images and video.

If content is generated in 3D instead, controllability and consistency improve greatly.

This means that the production of movies, games, simulators and other digital representations of the physical world will change dramatically.

World Labs was founded to use spatially intelligent AI to model the world and reason about objects, locations and interactions in 3D space and time.

Now the company is showing its generated 3D worlds for the first time.

The following are real-time rendering demonstrations running in the browser (Note: the AI-generated input images come from FLUX 1.1 pro, Ideogram or Midjourney).

Input an AI-generated image of a quaint village, and you get a 3D world.

Prompt: A quaint village with cobblestone streets, thatched-roof cottages, and a stone well in the central square surrounded by flower beds

For a magnificent palace, the AI vividly renders light and shadow.

An AI-generated origami-style picture comes to life immediately.

Or input a photo of a museum. Who could imagine what the surrounding area looks like?

The AI imagines all of it for you, from the entrance to the adjacent exhibition hall and its exhibits.

Another example is a real photograph: the AI can imagine the world around it as well.

Camera effects

Different camera effects are also possible. Once a scene has been generated, it is rendered in real time in the browser through a virtual camera.

Through precise control of this camera, artistic photography effects can be achieved.

For example, you can simulate a shallow depth of field so that only objects within a certain distance of the camera stay in focus.

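To make the depth-of-field idea concrete, here is a minimal Python sketch based on the standard thin-lens circle-of-confusion formula. It is not World Labs' renderer; the function names, the 50 mm f/1.8 lens and the sharpness threshold are all assumptions chosen for illustration.

```python
def circle_of_confusion(depth_m, focus_m, focal_length_m=0.05, f_number=1.8):
    """Blur-circle diameter (in meters, on the sensor) for a point at depth_m
    when an ideal thin lens is focused at focus_m.

    Assumed, illustrative lens: 50 mm focal length at f/1.8.
    """
    if depth_m <= 0 or focus_m <= focal_length_m:
        raise ValueError("depth must be positive and focus beyond the focal length")
    aperture = focal_length_m / f_number  # aperture diameter
    return (aperture * focal_length_m * abs(depth_m - focus_m)
            / (depth_m * (focus_m - focal_length_m)))


def is_sharp(depth_m, focus_m, max_coc_m=3e-5):
    """True if the point's blur circle stays under the (assumed) threshold."""
    return circle_of_confusion(depth_m, focus_m) <= max_coc_m


if __name__ == "__main__":
    # Focus at 2 m: the blur grows as points move away from the focus plane.
    for depth in (1.0, 2.0, 4.0, 10.0):
        print(depth, circle_of_confusion(depth, focus_m=2.0), is_sharp(depth, 2.0))
```

A renderer with per-pixel depth could apply a blur whose radius follows this value, which is why only a band of distances around the focus plane stays crisp.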

You can also simulate a dolly zoom by adjusting the camera's position and field of view at the same time.

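The relationship behind a dolly zoom can be sketched in a few lines of Python. The idea is to keep the width of the scene framed at the subject's distance constant while the camera moves, using width = 2 · distance · tan(fov / 2). The function name and the sample numbers are illustrative assumptions, not part of World Labs' viewer.

```python
import math


def dolly_zoom_fov(initial_fov_deg, initial_distance, new_distance):
    """Field of view (degrees) that keeps the subject the same size on screen
    after the camera moves from initial_distance to new_distance."""
    if initial_distance <= 0 or new_distance <= 0:
        raise ValueError("distances must be positive")
    # Half of the scene width visible at the subject's distance.
    half_width = initial_distance * math.tan(math.radians(initial_fov_deg) / 2)
    # Solve half_width = new_distance * tan(fov / 2) for the new fov.
    return math.degrees(2 * math.atan(half_width / new_distance))


if __name__ == "__main__":
    # Camera moves toward the subject, so the field of view must widen.
    for distance in (4.0, 3.0, 2.0, 1.0):
        print(distance, dolly_zoom_fov(60.0, 4.0, distance))
```

Because the subject keeps its size while the field of view changes, the background appears to stretch or compress, which is the signature look of the effect.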

3D special effects

Most generative models predict pixels. Predicting 3D scenes has many benefits:

  • Scene persistence: Once a world is generated, it stays in place. Even if you look away and then look back, the scene has not changed in the meantime.
  • Real-time control: Once the scene is generated, you can move around it in real time. You can look closely at the details of a flower or peek around a corner to see what is behind it.
  • Geometric accuracy: The generated worlds follow the basic physical rules of 3D geometry. They have a realistic sense of solidity and spatial depth, in stark contrast to the dreamlike effects of some AI-generated videos.

The simplest way to visualize a 3D scene is with a depth map, in which each pixel is colored according to its distance from the camera.

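As a rough illustration of a depth map, the Python sketch below turns a small grid of per-pixel distances into grayscale values. The article does not say which color scheme World Labs uses, so the convention here (near is bright, far is dark), the function name and the sample grid are assumptions.

```python
def depth_to_grayscale(depth, near=None, far=None):
    """Map a 2D grid of camera distances to 0-255 gray levels.

    Assumed convention: the nearest pixel is white (255), the farthest black (0).
    """
    flat = [d for row in depth for d in row]
    near = min(flat) if near is None else near
    far = max(flat) if far is None else far
    span = far - near
    if span <= 0:
        # Every pixel is at the same distance: return a uniform image.
        return [[255 for _ in row] for row in depth]

    def shade(d):
        clamped = min(max(d, near), far)   # keep values inside [near, far]
        t = (clamped - near) / span        # 0.0 at near, 1.0 at far
        return round(255 * (1.0 - t))

    return [[shade(d) for d in row] for row in depth]


if __name__ == "__main__":
    distances = [[1.0, 2.0],
                 [3.0, 5.0]]  # meters, hypothetical scene
    for row in depth_to_grayscale(distances):
        print(row)
```

Real viewers often use a color ramp instead of gray levels, but the normalization step is the same.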

We can use the 3D scene structure to create interactive effects.

You can also create animated effects that run automatically and breathe life into the scene.

3D worlds generated from famous paintings can also be explored interactively in real time.

Enter Van Gogh’s outdoor cafe

Now we can experience the iconic work of art in a whole new way!

Anything that is not in the original painting was generated by the model.

Below, let's step into worlds generated from favorite works of Van Gogh, Hopper, Seurat and Kandinsky.

Creative Workflow

Now, 3D world generation can be combined very naturally with other AI tools, allowing creators to use the tools they already know and get a completely new and smooth experience.

First, a world can be built from text by generating an image with a text-to-image model and using that image as the input.

Each model has its own style, and the generated 3D world inherits it.

Below are four variations of the same scene using different text-to-image models, all using the same prompt.

Prompt: A vibrant anime-style teenage bedroom with colorful blankets on the bed, a computer on a cluttered desk, posters on the wall, and sports equipment scattered around the room. A guitar leans against the wall, and a cozy rug with a delicate pattern lies in the center of the room. Sunlight filtering in through the window gives the whole room a warm, energetic, youthful atmosphere.

Some creators have already been given early access.

For example, Eric Solorio uses the model to fill a gap in his creative workflow, letting him stage characters in a scene and direct precise camera moves.

Brittani Natali combined World Labs technology with tools such as Midjourney, Runway, Suno, ElevenLabs, Blender, and CapCut, carefully designing camera paths through the generated worlds to evoke different emotions across three short films.

The waiting list is now open, so go ahead and apply.

Spatial intelligence, the next frontier of computer vision

Earlier, at an event, Li Feifei explained in detail for the first time what "spatial intelligence" is:

Visualization leads to insight, seeing leads to understanding, and understanding leads to action.

She divides human intelligence into two major kinds: linguistic intelligence and spatial intelligence. While linguistic intelligence has received most of the attention, spatial intelligence will have a significant impact on AI.

In a TED talk released in April, Li Feifei shared more of her thinking on spatial intelligence and hinted at the goals of World Labs.

She said, “The ability to act is inherent in all spatially intelligent creatures, because it connects perception and action.”

“If we want AI to surpass its current capabilities, we need AI that can not only see and speak but also act.”

Even NVIDIA senior research scientist Jim Fan said, "Spatial intelligence is the next frontier of computer vision and physical intelligence."

As World Labs’ official blog explains, human intelligence encompasses many aspects.

Linguistic intelligence allows us to communicate and connect with others through language. More fundamental still is spatial intelligence, which allows us to understand and interact with the world around us.

Spatial intelligence also helps us create, turning the pictures in our minds into reality.

It is with spatial intelligence that humans are able to reason, act and invent. Whether we are building a simple sandcastle or a towering city, we cannot do without it.

In a recent interview with Bloomberg, Li Feifei said that human spatial intelligence has evolved over millions of years.

It’s the ability to understand, reason, generate, and even interact in a 3D world. Whether you look at beautiful flowers, try to touch butterflies, or build a city, all of these are part of spatial intelligence.

This can be seen not only in humans but also in animals.

So, how can computers acquire spatial intelligence as well? In fact, we have made huge progress, and the development of AI over the past ten years has been exciting.

Given a prompt, AI can generate images and videos, and can even tell stories. These models have reshaped the way humans work and live.

And we have seen only the first chapter of the GenAI revolution.

What comes next, and how do we go further?

The answer is to bring these capabilities into 3D, because the real world is 3D, and human spatial intelligence rests on a very "native" ability to understand and operate in 3D.

Today, generating a 3D world from a single image gives us an early glimpse of spatial intelligence.
