Understanding what we see, the basis for AI
 
 

I’ve written a few times before on this forum about my opinion on the evolution of AI. I’ve written that I actually don’t believe the breakthrough in AI will come from NLP, but from understanding the visual world, without using or understanding a single word.

Today I read this comment http://www.chatbots.org/ai_zone/viewthread/213/#1438 by CR Hunt, which really prompted me to create this separate post.

In my opinion, this is what the evolution of conversational AI will look like (a rough sketch of the early steps follows the list):
- understanding which sensory observations belong to the same object (sound, vision, touch, other sensors). Nowadays referred to as ‘multi-modal’ in the AI world.
- understanding similar things in (3D) photos, without using language (if this is a car, this is a car too). Google Goggles is a typical example: http://www.youtube.com/watch?v=Hhgfz0zPmH4. Google Goggles is ‘visual search’; it’s not about words.
- understanding similar behaviour in (3D) video: movements (talking is behaviour as well in this sense: lips and body moving). Security systems that detect suspicious behaviour in car parks are early examples.
- adding language to objects (like the Label Me project suggested by CR: http://www.csail.mit.edu/node/127)
- adding language to behaviour/movements/changes
- adding language to object relations (which objects most likely go together?)
- understanding meaning and intention (purpose). This is future oriented: if you understand that everyone has a purpose, you can anticipate behaviour.
- adding language to meaning and intention, understanding complex abstractions (‘multinationals’?)
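
To make the first and the labeling steps a bit more concrete, here is a rough Python sketch. Everything in it (the class name, the modalities, the feature values) is hypothetical; it only illustrates the idea that observations are grouped into object records first, and words are attached afterwards.

from collections import defaultdict

class ObjectRecord:
    """One perceived object, built up from multi-modal observations."""
    def __init__(self):
        self.observations = defaultdict(list)  # modality -> list of feature vectors
        self.label = None                       # attached later, in the language step

    def add_observation(self, modality, features):
        self.observations[modality].append(features)

    def attach_label(self, word):
        # 'add language to objects': the word is just a tag on an existing concept
        self.label = word

# Usage: vision and sound cues that co-occur are stored under the same record.
toy = ObjectRecord()
toy.add_observation("vision", [0.12, 0.80, 0.33])  # e.g. a colour/shape descriptor
toy.add_observation("sound", [0.05, 0.91])         # e.g. a rattle spectrum
toy.attach_label("rattle")                         # language arrives last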

This is actually exactly how babies/children learn. They see, they feel, they hear. They first have to combine the observations of their senses (even the left/right inputs of identical organs). They hear some noises which apparently aren’t correlated with their other observations. They have a doll, and later on another. They pet a cat, and later on another. They smile at a human, and later on at another.

I believe research in NLP is extremely useful, but it will only have really powerful applications once we’ve solved the visual challenges and have built fully visual interfaces (screens with always-enabled built-in webcams).

Actually, we could use this thread to write a joint article. We could submit this article to AI magazine. Wouldn’t that be cool? Feel free to add your thoughts/additions/criticism below.

 

 
  [ # 1 ]

I believe the knowledge representation model for both NLP and vision should be the same. Even someone who is congenitally blind can learn language. So I believe machine vision could contribute to NLP, but it isn’t necessary.

 

 
  [ # 2 ]

I agree with Nathan; visual and audio recognition is not a hard requirement for NLP and an intelligent agent. Like he said, the blind can learn concepts and discuss things, and so can the deaf.

 

 
  [ # 3 ]

The blind still have their tactile and auditory cues: they know what they’re talking about.

 

 
  [ # 4 ]

“Understanding what we see, the basis for AI”

The basis of AI, or the core of AI, is knowing what to do with the things we see and the words we hear. Our eyes, ears, and noses are only input devices.

The input is evaluated and stored as knowledge.

But what do you do with that knowledge? How do you reason and understand language? The algorithms that deal with the information that has been translated by the senses, the algorithms that combine that knowledge and use it to understand language: THAT is the basis of AI.

Recognition of visual images into an internal representation is only one requirement.

If you have no algorithms that use that information to help understand language, and no algorithms that take that information, use it, and combine it with previous knowledge to deduce an answer, you don’t have AI.

So I disagree that “the basis of AI is what we see”.


Here is another example: imagine you have an AI that can understand NL. It sees nothing, hears nothing, feels nothing.

But you explain the game of chess to it, and you give it NO strategies, just how to play the game.

And it figures out its own way to win and beats you.

You’re saying that would not be AI?! It would be AI, strong AI!

It may very well be possible to model the entire world inside a virtual world. The bot could “know what it is talking about” without any kind of visual image processing, and still reason about the world, communicate, plan, learn, and understand language, all without the same type of visual and audio wave processing we do in our physical world.

 

 
  [ # 5 ]

The ability to build a concept of an object by integrating the input of multiple sensory channels would definitely qualify as intelligence, assuming it could make inferences about a new object based on its knowledge base. For example, if a sense-enabled robot could feel, hear, and see a car from a few angles, then be presented with a picture of a car at a new angle, or the sound of a car from far away, and still deduce that it’s a car.

And there need be no language involved. Indeed the “language” of the AI could really just be pointers. Every object can be described by a series of pointers: this pointer points to all cars, this to all things dark red, this to all things red (hierarchies emerge naturally), this to all things that emit noise in a given frequency range (may include other vehicles, or your annoying uncle).
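
To make that concrete, here is a minimal Python sketch of the “pointers” idea; all property names and object IDs are made up. Each concept is just a set of object IDs, and hierarchies fall out of subset relations without anyone declaring them.

# Each 'pointer' is simply a set of object IDs; hierarchies emerge as subset relations.
cars = {"obj1", "obj2", "obj7"}
dark_red = {"obj2", "obj9"}
red = {"obj2", "obj5", "obj9"}              # dark_red happens to be a subset of red
noisy = {"obj1", "obj2", "obj4", "obj7"}    # things emitting noise in some frequency band

assert dark_red <= red                       # the hierarchy was never declared explicitly

# Describing one object is just collecting the pointers that contain it.
def describe(obj_id, concepts):
    return [name for name, members in concepts.items() if obj_id in members]

print(describe("obj2", {"car": cars, "dark red": dark_red, "red": red, "noisy": noisy}))
# -> ['car', 'dark red', 'red', 'noisy']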

Really, I think what you’re describing as a necessity for AI (the ability to integrate external stimuli into abstract concepts of objects) is actually more like a necessity for the Kingdom Animalia. (Or at least most of its members.)

On a side note, I remember a conversation I had with a guy named Gar Kenyon at LANL a few years ago. He works on building artificial eyes using our eyes as a model (http://synthetic-cognition.lanl.gov/publications.php) . He thought at the time that movement was a necessary requirement for image processing. When we’re trying to make out the lines that delineate one object from another, we can test how the objects move against each other by moving our head forward and back and side to side a little. Enabling a camera to do the same might help software understand objects in static 3D scenes. It would also mean that projects like “Label Me” might not be enough to develop advanced object recognition software.

 

 
  [ # 6 ]

@Victor: you are saying interesting things again, buddy:

Victor Shulist - Sep 9, 2010:

“Understanding what we see, the basis for AI”

The basis of AI, or the core of AI, is knowing what to do with the things we see and the words we hear. Our eyes, ears, and noses are only input devices.

The input is evaluated and stored as knowledge.

OK, I agree on this. I actually meant that you first need to observe: based on the cues from your input devices, combine these observations and model the outside world.

Victor Shulist - Sep 9, 2010:

But what do you do with that knowledge? How do you reason and understand language?

My claim is that you can reason without language. Did you know that there are languages out there with no more than 100 words? Apes don’t even have words at all, and their behaviour can still be pretty intelligent.

Victor Shulist - Sep 9, 2010:

“the basis of AI is what we see”.

I might have used the wrong word. What I actually meant is that in order to achieve our ultimate Big Hairy Goal, being more intelligent than human beings, we first need algorithms that understand our own world: recognizing objects from visual cues.

Then we go from there and use the knowledge we have built to take the next steps. So that’s what I meant by ‘basis’; ‘milestone’ might have been better…

 

 
  [ # 7 ]
C R Hunt - Sep 9, 2010:

Really, I think what you’re describing as a necessity for AI (the ability to integrate external stimuli into abstract concepts of objects) is actually more like a necessity for the Kingdom Animalia. (Or at least most of its members.)

LOL LOL

In 1993, I wrote a thesis on artificial neural networks. Input stimuli: pictures of a car, a cat, etc. Desired behaviour: if you offer another car, or another cat, it would be recognized. Actually, Google Goggles is approaching this.

This is probably also part of it: Google introduced face search in 2007:

http://images.google.com/images?q=erwin van lun&imgtype=face

The next step would be to combine this feature with Facebook profiles. A super, super, super computer could spot humans in all photos on the web: where they’ve been and what they have been doing.

Scary, but intelligent technology which you could also use simply to recognize the people around you.

 

 
  [ # 8 ]

This seems to be one of those AND versus OR arguments smile

OK, I agree with you, CR, that there would certainly be some intelligence required to have an abstract concept of a car and, given several examples, deduce that a specific instance of a car is a car.

But really, how does that compare to being able to teach a bot, in natural language, a subject like physics, electronics engineering, law, etc., and then giving it a complex problem (not a single line, but an interactive discussion) and have it form an opinion and combine knowledge to deduce an answer, or form a plan?

I think there is much more involved in that than ‘simply’ deducing that a specific car is a car.

@Erwin - milestone, perhaps; dependency might be an even better word. But I see your point, if you seek to have a bot learn and think the way children do. My own approach is to have a bot learn and use language in a way that is most easily done using a von Neumann machine, a finite state machine, because after all that is what it is smile

Intelligent behaviour without language? Sure, absolutely.

BUT, this does not mean that a bot that reasons with language does NOT have intelligence.
There are perhaps many ways a biological or artificial being can be intelligent (from knowing a car is a car to learning an advanced subject). But the age-old problem of not knowing EXACTLY what intelligence is always hinders our discussions.

Yes, monkeys are ‘intelligent’, but seriously, how intelligent? I mean, compared to a human? Can they be taught calculus? Form a government? Or do anything that we do that requires language?

 

 
  [ # 9 ]

At the end of the day, the term “intelligence” encompasses a broad spectrum of behavior and ability. Certainly language allows for quite complex ideas that would be hard (or impossible) to formulate in other types of information encoding. (Such as a recipe for identifying cars, which certainly must be thought about and revised, but not necessarily with language as we know it.)

I think it’s important to point out that some behaviors that are not considered signs of intelligence (or at least, conscious intelligence) in people are certainly signs of intelligence in a robot. Our brains learn to wire themselves to recognize images after we are born, using vast quantities of data input from our eyes. But we are never consciously aware that this is happening (or at least, not that we can remember!).

As far as the sociological impact of picture recognition software, I think it is frightening. We’re going to end up in a global society with the level of privacy typically found in a small village. Everybody who knows you will be able to know everything you’ve done, with the internet gossiping to anyone who cares to listen.

 

 
  [ # 10 ]
Victor Shulist - Sep 9, 2010:

But really, how does that compare to being able to teach a bot, in natural language, a subject like physics, electronics engineering, law, etc., and then giving it a complex problem (not a single line, but an interactive discussion) and have it form an opinion and combine knowledge to deduce an answer, or form a plan?

One of the key differences from ape intelligence is that we’re able to foresee what’s coming, for example a flood, and that we take precautions against it by cooperating.

Language is another sign of intelligence.

What you’re talking about is making abstractions: talking about law, about physics, about electronics. But let me give another example of abstraction:

- this is a ball
- this is another ball
- this is another ball
- as you can see, balls are flexible
- balls are typically 30 centimetres (about 12 inches) in diameter
- and a ball rolls
- and a ball spins

Now back to physics.
We start to talk about atoms, which no one has ever seen in his life. But we know about the concept of balls, and so we can understand how atoms move.
We understand how water flows, and thus we understand how electrons move in an electric field.
We understand what ‘adding’ means, and now we learn about the ‘+’ symbol.
We understand that each object that does not move is stuck to the surface because of gravity, and now we can create formulas.

In another thread, I wrote about common sense knowledge. If we start with common sense knowledge, then it’s much easier to talk about abstract concepts.

So an electron is something like a ball, but smaller?

So an electron spins? Yes.
So an electron is round? Yes (I’m not sure, but just as an example).
So an electron rolls? Mmm, not really, and here the difference comes in.
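
Just to illustrate how a bot might store that kind of analogy, here is a toy Python sketch: the ‘electron’ concept copies the common-sense ‘ball’ concept and only overrides the properties where the analogy breaks down. The property names and values are illustrative, not physics.

# Toy sketch: an analogy stored as property inheritance with exceptions.
ball = {"shape": "round", "spins": True, "rolls": True, "size": "~30 cm"}

electron = dict(ball)               # 'an electron is something like a ball...'
electron["size"] = "much smaller"   # '...but smaller'
electron["rolls"] = False           # 'mmm, not really': here the difference comes in

for prop, value in electron.items():
    print(prop, "->", value)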

 

 
  [ # 11 ]

Yes, and I believe that with the correct algorithms and knowledge representations we can encode these concepts and rules and the relationships between them without visual recognition; I don’t see that as a requirement.

I can encode these in all kinds of ways and not require a visual representation of a ball.

Just off the top of my head I can define what a bouncing ball is…

at time X, object A is at ground level
at time X+n1, object A is at ground level + 1 foot
at time X+n2, object A is at ground level
at time X+n3, object A is at ground level + 1 foot

where 0 < n1 < n2 < n3

You define this sequence of events to the bot as ‘bouncing’.

No visual recognition required.
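
To show how little machinery that needs, here is a small Python sketch of the same idea; the function name and the ground-level convention are made up, and it works purely on symbolic (time, height) statements, no vision involved.

def is_bouncing(events):
    """events: list of (time, height) pairs in time order, height 0 = ground level."""
    heights = [h for _, h in events]
    touches_ground = sum(1 for h in heights if h == 0)
    lifts_off = any(h > 0 for h in heights)
    # a crude test: the object lifts off and is observed at ground level at least twice
    return lifts_off and touches_ground >= 2

# object A: ground, +1 foot, ground, +1 foot at times 0 < n1 < n2 < n3
print(is_bouncing([(0, 0), (1, 1), (2, 0), (3, 1)]))   # True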

Yes, for the A.I. that controls a robot, sure, absolutely it needs visual recognition. 

But for any information that does not require visual or audio input, we don’t need to worry about physical transducers: all input is abstract information in the form of language.

My point is that reasoning, and creating a useful production system that performs intelligent action, does not necessarily need to get audio/video input from the external world (instead, this ‘information from the external world’ is all the statements encoded in its knowledge base), except where it is mandatory, as in controlling a physical robot.

I think a lot of AI researchers are thinking that we have to wait until we get full visual/auditory recognition because so far we have been unable to create a bot that can pass the Turing test, and they think: we don’t have full visual/auditory recognition… so… that must be the reason! No, that is an assumption, and I think it’s incorrect.

 

 
  [ # 12 ]

If we want, we can easily simulate touch and other senses in a virtual world.

Eventually, bots will have vision, hearing, and more senses. But the lack of senses won’t stop our NLP research. Even in a virtual world, natural language is practically valuable.

We do not need to wait for bots to gain vision and hearing.

Erwin Van Lun - Sep 9, 2010:

The blind still have their tactile and auditory cues: they know what they’re talking about.

 

 
  [ # 13 ]

I think NLP research has more immediate practical benefit than visual or audio AI, and fewer disturbing side effects. Not to mention, it’s much more straightforward to pursue as an amateur. Visual analysis introduces a host of mathematical and technical issues that I’m not academically familiar with. Though I must say, I think we’ll have software that can accurately tag photos before we have a functioning Turing-style chatbot. (Heck, we’re almost there with the former.)

 

 
  [ # 14 ]

Just discovered this website: http://roboticscourseware.org/ A veritable fount of information about the development of sense-enabled robotics.

From the site:

In developing and populating the site, we have prioritized the following:

  * Providing original, easily-modifiable curricular content, typically in .ppt and .doc formats
  * Covering the range of primary areas of robotics pedagogy, including robot mechanics, control, motion planning, vision, and localization, with less emphasis on secondary areas and courses in which robotics is used as platform to teach concepts in other academic areas

 

 
  [ # 15 ]
Victor Shulist - Sep 10, 2010:

I think a lot of AI researchers are thinking that we have to wait until we get full visual/auditory recognition because so far we have been unable to create a bot that can pass the Turing test, and they think: we don’t have full visual/auditory recognition… so… that must be the reason! No, that is an assumption, and I think it’s incorrect.

Actually, I think most AI researchers have a strong focus on NLP; that’s why I started this thread.

Nathan Hu - Sep 10, 2010:

If we want, we can easily simulate touch and other senses in a virtual world.

True AI is even more intelligent than human beings. That would imply it can also measure warmth and all kinds of (quantum) waves, which it can only measure in the real world because we don’t even know (yet) how to simulate them; but true AI might be able to interpret them without humans understanding the waves. On the other hand, true AI might also be able to simulate that behaviour in virtual worlds. True as well :-s

C R Hunt - Sep 11, 2010:

I think NLP research has more immediate practical benefit than visual or audio AI, and less disturbing side effects.

I definitely agree! What I want to say is that visual search will boost progress on NLP.

Suppose we could measure NLP capabilities:

1966 - 1
1980 - 10
2000 - 20
2010 - 50
2020 - 200
2030 - 5000

The acceleration after 2010 is mainly due to the addition of visual cue interpretation. Without that capability, NLP intelligence would top out at 200 in 2030.

I have no way to prove the numbers above, by the way. They’re just a way of illustrating my point.

 
