
This Needed to be Said.

In a terrific paper just presented at the premier international conference on artificial intelligence, Levesque, a University of Toronto computer scientist who studies these questions, has taken just about everyone in the field of A.I. to task. He argues that his colleagues have forgotten about the “intelligence” part of artificial intelligence.

Levesque starts with a critique of Alan Turing’s famous “Turing test,” in which a human, through a question-and-answer session, tries to distinguish machines from people. You’d think that if a machine could pass the test, we could safely conclude that the machine was intelligent. But Levesque argues that the Turing test is almost meaningless, because it is far too easy to game…

To try and get the field back on track, Levesque is encouraging artificial-intelligence researchers to consider a different test that is much harder to game, building on work he did with Leora Morgenstern and Ernest Davis (a collaborator of mine). Together, they have created a set of challenges called the Winograd Schemas, named for Terry Winograd, a pioneering artificial-intelligence researcher at Stanford. In the early nineteen-seventies, Winograd asked what it would take to build a machine that could answer a question like this…



  [ # 1 ]

I think that most serious AGI researchers acknowledge the Turing test only to the extent that they expect that their AGI, if successful, will pass the test as a side effect of its intelligence. The ones interested in directly tackling the Turing test right now are mostly the chatbot authors, who only rarely have any real aspirations to general intelligence for their work. The ‘parlor trick’ approach mentioned isn’t being taken by any serious AGI researchers, and will likely never result in a bot that can actually pass the Turing test (as opposed to merely doing better at it than its competition, which is what competitions like the Loebner Prize test right now).


  [ # 2 ]

Nice.
But indeed, as Jarrod indicates, I would not tie the loss of interest in AGI to the Turing Test. I suspect it's about research funds, and keyword-based products such as Siri are more financially rewarding over shorter development timescales.

The large ball crashed right through the table because it was made of Styrofoam. What was made of Styrofoam?

This is the kind of test I’d like to participate in (obviously, because it’s an NLP test). At first glance I considered these questions too easy: the answers are right there, and my own program can usually figure out what “it” means easily enough from grammar alone. But then most of these questions deviate from my grammatical rule of thumb, confusing even me until I stop to reconsider the probability of a Styrofoam ball breaking a table.
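That "stop and reconsider the probability" step can be sketched as a tiny plausibility lookup. This is a toy illustration of the idea, not anyone's actual program; the candidate nouns and scores are invented for the Styrofoam example:

```python
# Hedged sketch: resolve "it" in "The large ball crashed right through the
# table because it was made of Styrofoam" by asking which antecedent keeps
# the whole event plausible. Scores are illustrative stand-ins for real
# world knowledge.

# Plausibility of the crash-through event, given which object is Styrofoam.
EVENT_PLAUSIBILITY = {
    "ball": 0.1,   # a Styrofoam ball crashing through a table is unlikely
    "table": 0.9,  # a ball crashing through a Styrofoam table is plausible
}

def resolve_it(candidates, plausibility):
    """Pick the antecedent that makes the described event most plausible."""
    return max(candidates, key=lambda noun: plausibility.get(noun, 0.0))

print(resolve_it(["ball", "table"], EVENT_PLAUSIBILITY))  # table
```

Grammar alone (nearest noun, subject preference) would tend to pick "ball"; the schema is built precisely so that only the plausibility check gets it right.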

In all fairness I should mention that previous Loebner Prize qualifying tests did feature a few questions like “The ball was hit by Bill. What did Bill hit?”, and any Turing test may feature questions about implausible scenarios such as stabbing people with towels (or breaking tables with Styrofoam balls), but actual understanding was never as strict a requirement for passing as it is in this test.


  [ # 3 ]

Now wait a minute. While I really like those ‘Winograd Schemas’ in the test, because they do require quite some ‘understanding’ of the world and would make a nice test… did I miss the memo that the Turing Test is easy to pass these days, and that there exists a bot somewhere that can consistently do it? I am confused by the article’s hints that chatbots rely mostly on Google to come up with answers in a conversation, which I think is simply not true.

Nevertheless, interesting article, if only because I had never heard the word ‘anaphora’ before - thanks for sharing it, Andrew!


  [ # 4 ]

I liked the paper but I disagree with the primary thesis:

I suggest in the context of question-answering that what matters when it comes to the science of AI is not a good semblance of intelligent behaviour at all, but the behaviour itself, what it depends on, and how it can be achieved.

I believe what matters when it comes to the science of AI is a good semblance of intelligent behaviour, regardless of how it is arrived at. As I have said before, what I strive for is the “illusion of intelligence”. If I can use some “party trick” to accomplish the task easily, so much the better.

Until we have a computational environment that more directly emulates the human brain, we researchers need to take shortcuts or use other algorithms to accomplish our goals.

The ability to correctly resolve Winograd Schemas and anaphora is a nice goal, but so is being able to emulate a human in a free flowing conversation.

I would submit that resolving anaphora is not that important in a conversational AI. At worst, the AI can guess at the resolution, which gives it a 50/50 chance of being correct. If it is wrong, the human often corrects the bot, and that resolves the issue within one or two volleys.

But, in a conversation, there are often clues that can give a bot a much better chance at resolving the anaphora correctly. Even if no such clues exist by the point in the conversation where the statement is made, the reference could be left unresolved until it can ultimately be resolved, guessed at, or ignored.

Leaving it unresolved (or assigning both items the attribute until resolution) is similar in concept to using particle swarms with 2 particles.

A particle consists of an assignment of a value to a set of variables and an associated weight. The probability of a proposition, given some evidence, is proportional to the weighted proportion of the weights of the particles in which the proposition is true. A set of particles is a population.
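The definition above maps directly onto a small sketch. This is my own toy code, not from the quoted text: keep the ambiguous "it" alive as a two-particle population, and reweight the particles once evidence arrives, instead of committing immediately.

```python
# Illustrative sketch of a two-particle population for an unresolved
# pronoun. A particle assigns a value to a variable and carries a weight;
# the probability of a proposition is the weighted proportion of particles
# in which it is true. All concrete values here are invented for the
# Styrofoam example.

class Particle:
    def __init__(self, assignment, weight=1.0):
        self.assignment = assignment  # e.g. {"it": "table"}
        self.weight = weight

def probability(particles, proposition):
    """Weighted proportion of particles in which the proposition holds."""
    total = sum(p.weight for p in particles)
    hits = sum(p.weight for p in particles if proposition(p.assignment))
    return hits / total if total else 0.0

# One particle per candidate antecedent, initially equally weighted.
population = [Particle({"it": "ball"}), Particle({"it": "table"})]
print(probability(population, lambda a: a["it"] == "table"))  # 0.5

# Later evidence (Styrofoam balls don't break tables) reweights them.
population[0].weight, population[1].weight = 0.1, 0.9
print(probability(population, lambda a: a["it"] == "table"))  # 0.9
```

With only two candidates this is just "assign both items the attribute until resolution", but the same machinery scales to more candidates and to accumulating evidence over several volleys.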

The author also points out another “cheap trick” that lets a mature AI guess at the correct answer:

If you can find no evidence for the existence of something, assume that it does not exist.

This leads the AI to increase the number of correct responses. And all it takes is one correction to add to the AI’s knowledge base. What percentage of correct responses would be good enough? Would humans of different ages perform better or worse on a similar test?
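This cheap trick is essentially the closed-world assumption, and it fits in a few lines. A hedged sketch with invented facts, just to make the trick concrete:

```python
# Sketch of the "no evidence -> assume it does not exist" trick as a
# closed-world query. The fact set is illustrative, not a real knowledge
# base.

KNOWN_FACTS = {
    ("crocodile", "has short legs"),
    ("steeplechase", "has tall hedges"),
}

def holds(subject, predicate, facts):
    """Closed-world assumption: absence of evidence counts as false."""
    return (subject, predicate) in facts

print(holds("crocodile", "has short legs", KNOWN_FACTS))   # True
print(holds("crocodile", "can jump hedges", KNOWN_FACTS))  # False: no evidence
```

The single-correction learning mentioned above is then just adding one tuple to the fact set, which is also where the trick breaks: the AI confidently answers False about everything it simply hasn't been told yet.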

We want multiple-choice questions that people can answer easily. But we also want to avoid as much as possible questions that can be answered using cheap tricks (aka heuristics).

Consider, for example, the question posed earlier: Could a crocodile run a steeplechase?

The intent here is clear. The question can be answered by thinking it through: a crocodile has short legs; the hedges in a steeplechase would be too tall for the crocodile to jump over; so no, a crocodile cannot run a steeplechase.

Skynet-AI responded:

USER: Could a crocodile run a steeplechase?
AI: Could a crocodile run a steeplechase? Why do you ask?

“Why do you ask?” may be a correct response to a nonsensical question. But, what if I also told you that the AI brought up a wiki page which contained a place called “Steeplechase Park” that included a thrill ride called “Crocodile Run”. Would the answer then be yes?

I think the real question is related to the “bot grounding” problem. There are two environments that AIs live in: open and closed worlds. In a closed world, everything is known. The objects, properties, and relationships are all well defined. Some might say this is an expert system.

In an open world, some things are known and others are unknown. Some things may be learned during the course of a conversation, and some must be calculated. The “computational linguistics” may include things like math, comparisons, reasoning, and logic. Humans have a rich environment and are nurtured from birth to learn about an increasingly rich set of things in it. No such support system exists for an AI. Why should we expect a four-year-old AI to communicate and reason at the same level as an adult human?

The process of understanding what is known and unknown, how to represent it efficiently, how to do the comparisons and calculations that humans take for granted, and the learning methodology that integrates all, makes AI a varied and interesting field.



  [ # 5 ]

Interesting opinion. I think the usefulness of this test depends on your goal. Confusable statements like these don’t occur too often in an (English) conversation, so a chatbot may well ignore them and get along well enough. On the other hand if you were building an AI for data extraction from written language, you would want it to understand the contents as correctly as possible, so a supporting reasoning process would be useful then.

But even that isn’t really the point of this test; the point seems to be to detect, with uncheatable accuracy, whether a program contains intelligent processes similar to a human’s. When shortcuts suffice for your ends, this test simply need not apply.

