So we now know why traditional neural networks don’t work well for images. That raises the question: why do Convolutional Neural Networks work better?
First, we need to understand how humans recognize images.
While it may seem obvious to us what a car is and how to identify one, it isn’t obvious to a neural network.
So what makes a car a car?
We typically attribute certain features to a car (seen below):
Though not all cars look exactly like this Bentley, they generally have all of the features labeled in the image. Using these features, we can identify cars that aren’t Bentleys.
For example, the Honda below has the same general features that are labeled on the Bentley.
So we can identify the Honda as being a car.
However, taking a look at the vehicle below:
We know it isn’t a car because it doesn’t have a bonnet, windshield, bumper, roof, etc.
This is essentially how Convolutional Neural Networks learn to identify images: by recognizing features.
This way, it doesn’t matter where a car is located in an image or what color, make, or model it is. The general characteristics of a car stay consistent.
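To make the "location doesn't matter" idea concrete, here is a minimal sketch in Python with NumPy. It slides one small filter over two images that contain the same pattern in different places. Everything in it is an assumption made for illustration: the plus-shaped `feature`, the helper names `correlate2d_valid` and `image_with_feature`, and the 8x8 image size. A real Convolutional Neural Network learns its filters from data rather than having them written by hand.

```python
import numpy as np

def correlate2d_valid(image, kernel):
    """Slide `kernel` over `image` and record how strongly each patch matches it."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            # Element-wise product of the patch and the filter, summed to one score.
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

# Hypothetical "feature": a 3x3 plus-shaped pattern (a stand-in for, say, a wheel).
feature = np.array([[0, 1, 0],
                    [1, 1, 1],
                    [0, 1, 0]], dtype=float)

def image_with_feature(top, left, size=8):
    """Build a blank image with the feature pasted at (top, left)."""
    img = np.zeros((size, size))
    img[top:top + 3, left:left + 3] = feature
    return img

# The same feature placed at two different locations.
img_a = image_with_feature(1, 1)
img_b = image_with_feature(4, 3)

resp_a = correlate2d_valid(img_a, feature)
resp_b = correlate2d_valid(img_b, feature)

# Location of the strongest response in each image, as plain Python ints.
peak_a = tuple(int(i) for i in np.unravel_index(np.argmax(resp_a), resp_a.shape))
peak_b = tuple(int(i) for i in np.unravel_index(np.argmax(resp_b), resp_b.shape))

print(resp_a.max(), peak_a)  # strongest match and where it was found in img_a
print(resp_b.max(), peak_b)  # same strength, different location in img_b
```

The strongest response has the same value in both images, and only its position changes. The same filter finds the feature wherever it appears, which is why the car's position in the picture does not matter.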
The Model T may be old, but it’s still a car!