Meta shows stunning full body tracking with just the Quest headset

Image: Meta


Until now, VR systems have tracked only the head and hands. That could soon change: AI's predictive capabilities enable realistic full-body tracking, and thus better avatar embodiment, based solely on sensor data from the headset and controllers.

With hand tracking for Quest, Meta has already shown that AI is a core technology for VR and AR: a neural network trained on many hours of hand movements enables reliable hand tracking even with the Quest headset's low-resolution cameras, which are not specifically optimized for the task.

This is driven by the predictive capability of artificial intelligence: thanks to prior knowledge gained during training, only a few real-world inputs are needed to accurately translate your hands into the virtual world. Capturing the hands fully in real time, on top of VR rendering, would require far more computing power.

From hand tracking to body tracking with AI prediction

In a new project, Meta researchers are transferring this hand-tracking principle to the whole body: an AI trained on previously collected tracking data simulates the most likely, physically plausible body movements from sparse real inputs. QuestSim can realistically animate a full-body avatar using only sensor data from the headset and two controllers.

The Meta team trained the QuestSim AI on artificially generated sensor data. To this end, the researchers simulated the movements of the headset and controllers from eight hours of motion capture recordings of 172 people. This way, they did not have to re-record headset and controller data alongside the body movements.
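The idea of deriving synthetic sensor streams from existing mocap data can be sketched roughly as follows. This is a minimal illustration, not Meta's pipeline: the skeleton joint indices and array layout are hypothetical, and a real implementation would depend on the mocap format used.

```python
import numpy as np

def synthesize_sensor_streams(joint_positions, joint_rotations):
    """Derive simulated headset/controller signals from a mocap clip.

    joint_positions: (frames, joints, 3) global joint positions
    joint_rotations: (frames, joints, 4) global joint quaternions
    """
    # Hypothetical skeleton indices; real layouts differ per dataset.
    HEAD, L_WRIST, R_WRIST = 15, 20, 21
    streams = {}
    for name, idx in [("headset", HEAD),
                      ("left_controller", L_WRIST),
                      ("right_controller", R_WRIST)]:
        streams[name] = {
            "position": joint_positions[:, idx],     # 3-DoF position track
            "orientation": joint_rotations[:, idx],  # orientation track
        }
    return streams

# Example: a 100-frame clip with a 24-joint skeleton.
rng = np.random.default_rng(0)
clip = synthesize_sensor_streams(
    rng.normal(size=(100, 24, 3)),
    rng.normal(size=(100, 24, 4)),
)
```

The full-body poses in the clip remain available as ground truth, so the network can be trained to reconstruct them from the three sparse streams alone.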

The training data for QuestSim AI was artificially generated in the simulation. The green dots show the virtual position of the VR headset and controllers. | Image: Meta

The motion capture clips included 130 minutes of walking, 110 minutes of jogging, 80 minutes of free gestures, 90 minutes of whiteboard discussion, and 70 minutes of balancing. Avatar simulation training with reinforcement learning lasted about two days.

After training, QuestSim can recognize what a person is doing from the real headset and controller data alone. Using AI prediction, QuestSim can even simulate the movements of body parts such as the legs, for which no real-time sensor data exists but whose movements were part of the synthetic motion capture dataset, i.e. learned by the AI. To keep the predicted movements plausible, the avatar is also subject to the rules of a physics simulator.
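The core mapping, from a short history of three tracked poses to targets for a full-body avatar in a physics simulator, can be sketched like this. The network size, input window, joint count, and output parameterization below are illustrative assumptions; the paper's actual architecture and reinforcement-learning setup are more involved.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sparse input: a short history of headset + two controller poses,
# each pose encoded as 3D position + 4D quaternion = 7 values.
WINDOW, SENSORS, POSE_DIM = 16, 3, 7
N_JOINTS = 24                  # hypothetical full-body skeleton size
IN_DIM = WINDOW * SENSORS * POSE_DIM
OUT_DIM = N_JOINTS * 3         # e.g. per-joint targets for a PD controller

# Tiny two-layer MLP standing in for the learned policy.
W1 = rng.normal(0.0, 0.1, (IN_DIM, 256))
W2 = rng.normal(0.0, 0.1, (256, OUT_DIM))

def policy(sensor_history):
    """Map sparse sensor history to full-body joint targets."""
    x = sensor_history.reshape(-1)          # flatten the input window
    h = np.tanh(x @ W1)                     # hidden layer
    return h @ W2                           # targets for the physics sim

targets = policy(rng.normal(size=(WINDOW, SENSORS, POSE_DIM)))
```

In a physics-based setup like this, the network does not place joints directly; the simulator executes the targets, which is why untracked limbs still move in a physically consistent way.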


The headset alone is enough to get a believable full body avatar

QuestSim works for people of different heights. However, if the avatar's proportions deviate from those of the real person, the animation suffers: a tall avatar driven by a short person, for example, walks hunched over. The researchers still see room for optimization here.

The Meta research team also shows that data from the headset sensors alone, combined with AI prediction, is sufficient for a credible, physically plausible full-body avatar animation.

AI motion prediction works best for movements that were included in the training data and that show a high correlation between upper-body and leg movement. For complex or highly dynamic movements, such as fast sprints or jumps, the avatar may drift away from the real motion or fall over. Also, since the avatar is physics-based, it does not support teleportation.

In a follow-up work, the Meta researchers want to incorporate more detailed skeletal and body information into training to improve the variety of avatar movements.
