Google's DeepMind Can Create 3D Models From 2D Images

Jay Mishra | AI | 8 months

DeepMind, the London-based Google subsidiary focused on artificial intelligence, has created the Generative Query Network (GQN), a framework and algorithm that can render 3D models of objects and scenes from 2D images. The task it tackles is called scene representation: converting visual sensory data into concise descriptions. Scene representation usually requires large sets of labelled data, but GQN removes the reliance on labels. Scene representation is an important attribute of intelligent behaviour.



The GQN can render an object or scene from any angle, even when it is fed only a handful of 2D images. The algorithm can 'imagine' what the scene might look like from relatively little input: it renders unseen sides of an object and generates views from multiple angles without needing human-labelled datasets for supervision.



Generative Query Network (GQN) is a framework within which machines learn to represent scenes using only their own sensors. The GQN takes as input images of a scene taken from different viewpoints, constructs an internal representation, and uses this representation to predict the appearance of that scene from previously unobserved viewpoints. The GQN demonstrates representation learning without human labels or domain knowledge, paving the way toward machines that autonomously learn to understand the world around them.
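Each input image is paired with the viewpoint it was taken from. As we understand DeepMind's published GQN work, a camera viewpoint is described by seven numbers: the camera's 3D position plus the cosine and sine of its yaw and pitch angles. The sketch below packs a pose into that form; the function name `encode_viewpoint` and the use of radians are our own illustrative assumptions, not DeepMind's code.

```python
import math
from typing import Sequence, Tuple

# (x, y, z, cos yaw, sin yaw, cos pitch, sin pitch)
Viewpoint = Tuple[float, float, float, float, float, float, float]


def encode_viewpoint(position: Sequence[float], yaw: float, pitch: float) -> Viewpoint:
    """Hypothetical helper: pack a camera pose into a 7-number viewpoint vector.

    Angles are in radians. Encoding each angle as (cos, sin) instead of the raw
    value means yaw = 0 and yaw = 2*pi give (numerically almost) the same vector,
    so the network never sees an artificial jump when the camera goes full circle.
    """
    x, y, z = position
    return (x, y, z, math.cos(yaw), math.sin(yaw), math.cos(pitch), math.sin(pitch))
```

For example, `encode_viewpoint((1.0, 2.0, 3.0), math.pi / 2, 0.0)` describes a camera at (1, 2, 3) turned a quarter circle and looking level.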



The technology is definitely impressive, but so far it has been applied only to simple scenes with a small number of objects and is not yet capable of generating complex 3D models.


First, the GQN takes images captured from different viewpoints and builds an abstract description of the scene, learning its essentials. Next, on the basis of this representation, the network predicts what the scene would look like from a new, arbitrary viewpoint.
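The two steps can be illustrated with a deliberately tiny sketch. In the real GQN both stages are deep neural networks trained together on many scenes; here `represent` and `generate` are hypothetical hand-written stand-ins, and images are flat lists of pixel values. The one detail we carry over from the published design is that the per-view representations are summed into a single scene representation.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Observation = Tuple[Sequence[float], Sequence[float]]  # (flattened image, viewpoint)


def represent(image: Sequence[float], viewpoint: Sequence[float]) -> Vector:
    """Hypothetical stand-in for the learned representation network.

    A real GQN uses a trained convolutional network; this toy version just
    scales the viewpoint vector by the image's mean pixel value.
    """
    brightness = sum(image) / len(image)
    return [brightness * v for v in viewpoint]


def aggregate(per_view: Sequence[Vector]) -> Vector:
    """Step 1 result: sum the per-view vectors element-wise into one scene representation."""
    return [sum(components) for components in zip(*per_view)]


def generate(scene: Vector, query_viewpoint: Sequence[float], num_pixels: int) -> Vector:
    """Hypothetical stand-in for the generation network (step 2).

    Scores how well the query viewpoint aligns with the scene representation,
    squashes the score into (0, 1) with a logistic function, and returns a flat
    image in which every pixel has that value.
    """
    score = sum(s * q for s, q in zip(scene, query_viewpoint))
    pixel = 0.5 * (1.0 + math.tanh(score / 2.0))  # logistic sigmoid, overflow-safe
    return [pixel] * num_pixels


def predict_view(observations: Sequence[Observation], query_viewpoint: Sequence[float]) -> Vector:
    """Encode every observed (image, viewpoint) pair, aggregate, then render the query view."""
    per_view = [represent(image, viewpoint) for image, viewpoint in observations]
    scene = aggregate(per_view)
    return generate(scene, query_viewpoint, len(observations[0][0]))
```

Because `aggregate` sums the per-view vectors, the prediction does not depend on the order in which the images arrive (up to floating-point rounding), and an extra observation simply adds one more term. Summation is what lets the GQN accept any number of input views.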



This work illustrates a powerful approach to machine learning of grounded representations of physical scenes, as well as of the associated perception systems that holistically extract these representations from images, paving the way toward fully unsupervised scene understanding, imagination, planning, and behavior.

DeepMind is famous for its AlphaGo program, which beat the professional Go player Lee Sedol in a five-game match. The company's more general program, AlphaZero, can beat leading programs at chess, Go and shogi (a Japanese form of chess) after only a few hours of training against itself using reinforcement learning.

