Artificial intelligence, Geoffrey Hinton, neural network, GLOM, vectors, visual perception, human perception, intuition
Deep learning set off the latest AI revolution, transforming computer vision and the field as a whole. Hinton believes deep learning should be almost all that is needed to fully replicate human intelligence.
But despite rapid progress, major challenges remain. Expose a neural network to an unfamiliar data set or a foreign environment, and it proves brittle and inflexible. Self-driving cars and essay-writing language generators are impressive, but things can go awry. AI visual systems are easily confused: a coffee mug recognized from the side may be unrecognizable from above if the system was not trained on that view, and with the manipulation of a few pixels, a panda can be mistaken for an ostrich, or even a school bus.
GLOM addresses two of the most difficult problems for visual perception systems: understanding a whole scene in terms of objects and their natural parts, and recognizing objects when they are seen from a new viewpoint. (GLOM focuses on vision, but Hinton hopes the idea can be applied to language as well.)
An object such as Hinton’s face, for example, is made up of his dog-tired eyes (too many people asking questions; too little sleep), his mouth and ears, and a prominent nose, all topped by mostly gray hair. And given his nose, he is easily recognized at first glance even in profile view.
These two factors, the part-whole relationship and the viewpoint, are essential, in Hinton’s view, to how humans see. “If GLOM ever works,” he says, “it will do perception in a way that is much more human-like than today’s neural networks.”
Assembling parts into a whole, however, can be a hard problem for computers, because parts are sometimes ambiguous. A circle could be an eye, a donut, or a wheel. As Hinton explains, the first generation of AI visual systems tried to recognize objects by relying mostly on the geometry of the part-whole relationship: the spatial orientation among the parts and between the parts and the whole. The second generation, by contrast, relied mostly on deep learning, letting the neural network train on large amounts of data. With GLOM, Hinton combines the best aspects of both approaches.
“There’s an intellectual humility that I like,” says Gary Marcus, founder and CEO of Robust.AI and a well-known critic of the heavy reliance on deep learning. Marcus admires Hinton’s willingness to challenge something that brought him fame and to admit that it does not quite work. “It’s brave,” he says. “And it’s a great corrective to say, ‘I’m trying to think outside the box.’”
In crafting GLOM, Hinton tried to model some of the mental shortcuts (intuitive strategies, or heuristics) that people use to make sense of the world. “GLOM, and indeed much of Geoff’s work, is about looking at the heuristics that people seem to have, building neural networks that could themselves have those heuristics, and then showing that the networks do better at vision as a result,” says Nick Frosst, a computer scientist in Toronto who worked with Hinton at Google Brain.
With visual perception, one strategy is to parse the parts of an object, such as different facial features, and thereby understand the whole. If you see a certain nose, you might recognize it as part of Hinton’s face; it is a part-whole hierarchy. To build a better vision system, Hinton says, “I have a strong intuition that we need to use part-whole hierarchies.” The human brain understands this part-whole composition by creating what is called a “parse tree,” a branching diagram that shows the hierarchical relationship between the whole, its parts, and its subparts. The face itself is at the top of the tree, and its component eyes, nose, ears, and mouth form the branches below.
One of Hinton’s main goals with GLOM is to replicate the parse tree in a neural network; this would distinguish it from the neural networks that came before. For technical reasons, that is hard to do. “It’s difficult because each individual image would be parsed by a person into a unique parse tree, so we would want a neural network to do the same thing,” says Frosst. “It’s hard to get something with a static architecture (a neural network) to take on a new structure, a parse tree, for each new image it sees.” Hinton has made several attempts. GLOM is a major revision of his previous attempt in 2017, combined with other related advances in the field.
“I’m part of a nose!”
A general way to think about the GLOM architecture is as follows: the image of interest (say, a photograph of Hinton’s face) is divided into a grid. Each region of the grid is a “location” on the image; one location might contain the iris of an eye, another the tip of the nose. For each location there are about five layers, or levels, in the network. Level by level, the system makes a prediction, with a vector that represents the content or information. At a level near the bottom, the vector for the tip-of-the-nose location might predict, “I’m part of a nose!” And at the next level up, in building a more coherent representation of what it is seeing, the vector might predict, “I’m part of a face in side-angle view!”
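To make that layout concrete, here is a minimal sketch of the data structure the description implies: a grid of locations, each holding one vector per level. The vector size, the random initial values, and the function name `make_glom_state` are assumptions made purely for illustration; they do not come from Hinton's proposal.

```python
import random

NUM_LEVELS = 5  # the article describes about five levels per location
EMBED_DIM = 4   # hypothetical vector size, chosen only to keep the example small


def make_glom_state(rows, cols, seed=0):
    """Return state[row][col][level] -> a list of EMBED_DIM floats.

    Every grid location gets its own stack of vectors, one per level.
    Random values stand in for whatever a real system would compute.
    """
    rng = random.Random(seed)
    return [
        [
            [
                [rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]
                for _ in range(NUM_LEVELS)
            ]
            for _ in range(cols)
        ]
        for _ in range(rows)
    ]


state = make_glom_state(rows=3, cols=3)
nose_tip_location = state[1][1]          # one cell of the grid
lowest_level = nose_tip_location[0]      # e.g. "I'm part of a nose!"
next_level = nose_tip_location[1]        # e.g. "I'm part of a face!"
print(len(nose_tip_location), len(lowest_level))  # 5 4
```

The point of the sketch is only that the structure is fixed: every image gets the same grid and the same number of levels, and all the variation lives in the values of the vectors.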
But then the question is, do neighboring vectors at the same level agree? When they agree, the vectors point in the same direction, toward the same conclusion: “Yes, we both belong to the same nose.” Or, further up the parse tree: “Yes, we both belong to the same face.”
Seeking consensus about the nature of an object (about what exactly the object is, in the end), GLOM’s vectors are averaged iteratively, location by location and layer by layer, with the neighboring vectors beside them, as well as with predicted vectors from the levels above and below.
However, the network does not “willy-nilly average” with just anything nearby, says Hinton. It averages selectively, with neighboring predictions that show similarities. “This is kind of well known in America, it’s called an echo chamber,” he says. “What you do is you only accept the opinions of people who already agree with you; and then what happens is that you get an echo chamber, where a whole bunch of people have exactly the same opinion. GLOM actually uses that in a constructive way.” The analogous phenomenon in Hinton’s system is these “islands of agreement.”
“Imagine a bunch of people in a room, shouting slight variations of the same idea,” says Frosst. Or imagine those people as vectors pointing in slight variations of the same direction. “After a while they would converge on a single idea, and everyone would feel it more strongly because others around them had confirmed it.” In this way, GLOM’s vectors reinforce and amplify their collective predictions about an image.
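The echo-chamber idea can be illustrated with a few lines of code. The sketch below assumes one particular form of selective averaging: each vector is replaced by a weighted mean of all vectors at its level, with weights that grow exponentially with similarity (a softmax over cosine similarity). The sharpness parameter `beta`, the toy two-dimensional vectors, and the choice to compare every location with every other rather than only with adjacent ones are simplifying assumptions for illustration, not details taken from the article.

```python
import math


def normalize(v):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def selective_average(vectors, beta=8.0):
    """One update step: each vector moves to a similarity-weighted mean.

    Vectors that already agree get large weights; dissimilar ones are
    nearly ignored, which is the "echo chamber" effect.
    """
    units = [normalize(v) for v in vectors]
    updated = []
    for u in units:
        weights = [math.exp(beta * dot(u, other)) for other in units]
        total = sum(weights)
        mixed = [
            sum(w * other[i] for w, other in zip(weights, units)) / total
            for i in range(len(u))
        ]
        updated.append(normalize(mixed))
    return updated


# Five hypothetical locations: three lean toward one opinion, two toward another.
vectors = [[1.0, 0.1], [0.9, 0.2], [1.0, -0.1], [0.1, 1.0], [-0.1, 1.0]]
for _ in range(10):
    vectors = selective_average(vectors)

same_island = dot(vectors[0], vectors[1])       # close to 1: they now agree
different_island = dot(vectors[0], vectors[3])  # stays small: separate islands
```

After a handful of steps the first three vectors point in nearly the same direction, as do the last two, while the two groups remain far apart. With a small `beta` the weighting becomes almost uniform and every vector drifts toward one global average, which is the indiscriminate averaging Hinton says the network avoids.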
GLOM uses these islands of agreeing vectors to accomplish the trick of representing a parse tree in a neural network. Whereas some recent neural networks use agreement among vectors for activation, GLOM uses agreement for representation, building up representations of things within the network. For example, when several vectors agree that they all represent part of the nose, their small cluster of agreement collectively represents the nose in the network’s parse tree for the face. Another small cluster of agreeing vectors might represent the mouth in the parse tree; and the large cluster at the top of the tree would represent the conclusion that the image as a whole is Hinton’s face. “The way to represent the parse tree here,” Hinton explains, “is that you have a large island at the object level; the parts of the object are smaller islands; the subparts are even smaller islands, and so on.”
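How islands at different levels add up to a parse tree can be shown with a toy example. The sketch below takes one row of six locations and groups consecutive locations whose vectors nearly agree. The vectors, the agreement threshold of 0.95, and the function name `find_islands` are all hypothetical, chosen only to illustrate the idea.

```python
import math


def cosine(a, b):
    """Cosine similarity: 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def find_islands(vectors, threshold=0.95):
    """Group consecutive locations whose vectors nearly agree."""
    if not vectors:
        return []
    islands = [[0]]
    for i in range(1, len(vectors)):
        if cosine(vectors[i - 1], vectors[i]) >= threshold:
            islands[-1].append(i)   # same island as the previous location
        else:
            islands.append([i])     # disagreement starts a new island
    return islands


# Hypothetical vectors for six locations along one row of the grid.
# At the part level, the first three agree ("nose") and the last three agree ("mouth").
part_level = [[1, 0], [1, 0.05], [1, 0], [0, 1], [0.05, 1], [0, 1]]
# At the object level, all six agree ("face").
object_level = [[1, 1], [1, 1.02], [1, 1], [1.01, 1], [1, 1], [1, 0.99]]

print(find_islands(part_level))    # [[0, 1, 2], [3, 4, 5]]
print(find_islands(object_level))  # [[0, 1, 2, 3, 4, 5]]
```

Read together, the two results form a small tree: one large island at the object level sitting above two smaller islands at the part level. Nothing in the structure of the network changes from image to image; only the pattern of agreement does.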
According to Hinton’s longtime friend and collaborator Yoshua Bengio, a computer scientist at the University of Montreal, if GLOM manages to solve the engineering challenge of representing a parse tree in a neural network, it would be a feat, and an important one for making neural networks work properly. “Geoff has produced amazingly powerful intuitions many times in his career, many of which have proven right,” says Bengio. “That’s why I pay attention to them, especially when he feels as strongly about them as he does about GLOM.”
The strength of Hinton’s conviction is rooted not only in the echo chamber analogy, but also in the mathematical and biological analogies that inspired and justified some of the design decisions in GLOM’s novel engineering.
“Geoff is a highly unusual thinker in that he is able to draw on complex mathematical concepts and integrate them with biological constraints to develop theories,” says Sue Becker, a former student of Hinton’s who is now a computational cognitive neuroscientist at McMaster University. “Researchers who focus more narrowly on either the mathematical theory or the neurobiology are much less likely to solve the infinitely compelling puzzle of how both machines and humans might learn and think.”
Turning philosophy into engineering
So far, Hinton’s new idea has been well received, especially in some of the world’s greatest echo chambers. “On Twitter, I got a lot of likes,” he says. And a YouTube tutorial laid claim to the term “MeGLOMania.”
Hinton is the first to admit that, for now, GLOM is little more than philosophical musing (he spent a year as a philosophy undergraduate before switching to experimental psychology). “If an idea sounds good in philosophy, it is good,” he says. “How would you ever have a philosophical idea that sounds like garbage but actually turns out to be true? That would not pass as a philosophical idea.” Science, by comparison, is “full of things that sound like absolute garbage” but turn out to work remarkably well, neural networks being one example, he says.
GLOM is designed to be philosophically compelling. But will it work?