My personal (and uneducated) theory is that it's all to do with how the brain stores and processes information. When you look at a scene in real life, you're not just looking at a picture - you're interpreting it, focusing on objects, matching them with information you already have, and so on. Objects you don't need to pay attention to fade into the background, while the important ones come to the foreground. Sometimes, if there are a lot of important objects, this process may go awry, which might explain why it's impossible to find the margarine in the fridge.
It's very hard to reproduce a scene exactly as it is seen, without any extra interpretation going on between eyes and hand. Think of the "blue strip of sky at the top of the page" phenomenon, the missing or inconsistent perspective in small children's artwork (and in most artwork before the 15th century or so), or the total inability of young children to draw anatomically correct people.
Perhaps one could attribute this to a sort of intelligent data compression - instead of seeing and remembering exact visual data, most of the time we remember objects - "That's a sign over there, this is the pavement I'm walking on, over there is a parked car, the kerb is over there..."
We don't need to remember that the pole of the sign has a small chip out of it about four feet up, that the sign is bolted on with three bolts, one of which is rusty, or that there are exactly x blades of grass at the bottom of the sign. We know roughly what a sign looks like, and we know what this sign says, so we file the sign under 'object, sign, saying blah blah blah'. All these objects are filed in a sort of mental 3D map, which can then be used to work out, for instance, how you're going to walk without banging into stuff.
Personally, I find this process fascinating, as well as very, very cool. There's a lot of processing power behind it. However, it creates a problem when you come to put a representation of a scene on paper. Not only are you trying to represent a 3D scene on a 2D medium, you also need to pay attention to all the little things your brain usually discards: how clothes fold and ripple as someone moves, for instance.
And that's before considering that even if you manage to hold a perfect mental image of the object you want to draw, it's almost impossible to get your hands to draw what you're seeing in your head. The object has to be broken down into individual lines, which can then be drawn onto the paper (or graphics tablet).
The interesting converse of all this is, of course, that the brain is rather good at recognising immensely simplified representations of real-world objects. A couple of squares, two rectangles, a trapezoid and a squiggle of 'smoke' make a house. Never mind that the walls and roof have no texture, that nothing is visible through the window, that the house is sitting on a plain of flat white nothingness, or that smoke just doesn't look like that. To our brains, it's a house. For another good example, take any cartoon. Look at The Powerpuff Girls. The main characters are 'obviously' young-ish female humans. They have no noses, fingers, chins, or recognisable elbows or knees... but we recognise them as people.
On the subject of the brain's data processing, have you ever wondered how and why the brain can deal with music? I might node some theories on this some time (preferably after I've looked up some data on it...)
My disclaimer: I am not an artist, nor have I studied anything beyond basic introductory high school biology. This stuff is all personal conjecture, based on how I think my own brain might work. I am, however, very interested in any actual data in this area, so if you've got any, please node it or msg me a URL!