There are a variety of reasons for why traditional neural networks do not work well for image classification tasks.
If you gave an image to a traditional neural network, it would have to process each individual pixel individually as an input.
Already, a problem arises with scalability. For a color image with dimensions 128 x 128, you would need almost 50,000 weights. That may seem manageable but with an image with dimensions 920 x 1080, you would need over 2 million!
As a result, problems with overfitting become substantial.
Now, pretend we have an image of a duck:
Given the nature of traditional neural networks, they will memorize the placement of the duck as being in the middle of the page. However, this isn’t always the case! If you then give it an image with a duck at the bottom right, it will try to correct itself and “learn” that ducks should be at the bottom right of images.
Ducks can be all over an image: