An open-source library for 3D deep learning

3D understanding plays an important role in advancing the ability of AI systems to better understand and operate in the real world, including navigating physical space in robotics, improving virtual reality experiences, and even recognizing occluded objects in 2D content. But research in 3D deep learning has been limited by a lack of tools and resources that support the complexities of using neural networks with 3D data, and by the fact that many traditional graphics operators are not differentiable.

Facebook AI has built and is now releasing PyTorch3D, a highly modular and optimized library with unique capabilities designed to make 3D deep learning easier with PyTorch. PyTorch3D provides a set of frequently used 3D operators and loss functions for 3D data that are fast and differentiable, as well as a modular differentiable rendering API, enabling researchers to import these functions into current state-of-the-art deep learning systems right away.

PyTorch3D was recently a catalyst in Facebook AI's work to build Mesh R-CNN, which achieved full 3D object reconstruction from images of complex interior spaces. We fused PyTorch3D with our highly optimized 2D recognition library, Detectron2, to successfully push object understanding to the third dimension. PyTorch3D functions for handling rotations and 3D transformations were also central in creating C3DPO, a novel method for learning associations between images and 3D shapes using less annotated training data.

Researchers and engineers can similarly leverage PyTorch3D for a wide variety of 3D deep learning research, whether 3D reconstruction, bundle adjustment, or even 3D reasoning to improve 2D recognition tasks. Today, we are sharing our PyTorch3D library here and open-sourcing our Mesh R-CNN codebase here.


PyTorch3D: Faster, flexible 3D deep learning research

One reason 3D understanding with deep learning is relatively underexplored compared with 2D understanding is that 3D data inputs are more complex and demand more memory and computation, whereas 2D images can be represented by simple tensors. 3D operations must also be differentiable so that gradients can propagate backward through the system from the model output to the input data. This is especially challenging because many traditional operators in computer graphics, such as rendering, involve steps that block gradients.
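To make the idea of a step that blocks gradients concrete, here is a minimal sketch in plain Python. It is an illustration only, not PyTorch3D's renderer: the function names, the signed-distance setup, and the sigmoid relaxation with its `sharpness` parameter are all assumptions chosen for the example. It compares a hard inside/outside test for a pixel against a smooth relaxation and estimates the gradient of each with finite differences.

```python
import math


def hard_coverage(signed_distance):
    """Hard test: 1.0 if the pixel is inside the shape (negative distance), else 0.0."""
    return 1.0 if signed_distance < 0.0 else 0.0


def soft_coverage(signed_distance, sharpness=10.0):
    """Smooth relaxation (assumed for illustration): a sigmoid of the signed distance."""
    return 1.0 / (1.0 + math.exp(sharpness * signed_distance))


def numerical_gradient(fn, x, eps=1e-4):
    """Central finite-difference estimate of d fn / d x."""
    return (fn(x + eps) - fn(x - eps)) / (2.0 * eps)


# A pixel slightly outside the shape's edge (hypothetical value).
distance = 0.05

# The hard test is flat on both sides of the edge, so no gradient gets through.
hard_grad = numerical_gradient(hard_coverage, distance)

# The soft test changes smoothly, so it tells an optimizer which way to move the edge.
soft_grad = numerical_gradient(soft_coverage, distance)

print("hard gradient:", hard_grad)
print("soft gradient:", soft_grad)
```

The hard test returns a gradient of zero everywhere except exactly at the edge, so it gives no learning signal. The smooth version returns a nonzero gradient near the edge, which is the property a differentiable operator needs.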

Data structure for storing and manipulating batches of triangle meshes
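As a rough illustration of why batching meshes is harder than batching images, the sketch below shows two common ways to combine meshes with different vertex counts. It uses plain Python lists rather than tensors, and the helper names `pad_batch` and `pack_batch` are hypothetical; it is not the PyTorch3D API.

```python
def pad_batch(verts_list, pad_value=0.0):
    """Pad every mesh to the largest vertex count so the batch is rectangular."""
    max_verts = max(len(verts) for verts in verts_list)
    padded = []
    for verts in verts_list:
        filler = [[pad_value] * 3 for _ in range(max_verts - len(verts))]
        padded.append([list(point) for point in verts] + filler)
    return padded


def pack_batch(verts_list):
    """Concatenate all vertices and record which mesh each vertex came from."""
    packed, mesh_index = [], []
    for i, verts in enumerate(verts_list):
        packed.extend(list(point) for point in verts)
        mesh_index.extend([i] * len(verts))
    return packed, mesh_index


# Two toy meshes with different vertex counts (hypothetical data).
triangle = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
quad = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]

padded = pad_batch([triangle, quad])
packed, mesh_index = pack_batch([triangle, quad])

print(len(padded), [len(mesh) for mesh in padded])
print(len(packed), mesh_index)
```

Padding keeps a regular batch shape at the cost of wasted memory, while packing avoids waste but needs the extra index to recover each mesh.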
