New Annual Contest with $25000 First Prize
 
 
  [ # 16 ]
Andrew Smith - Jul 30, 2014:

it is just a lot less tolerant of the cheating that has characterised the Turing Test in practice.

Cheating? What are you talking about?

 

 
  [ # 17 ]

Let’s just say it rules out the possibility of confusing judges, shall we?

I came around to some basic results for one type of Winograd schema, and I have some good news and bad news, depending on your point of view.

The good/bad news:
The popular examples we’ve seen so far were only the relatively easy ones, often solvable directly through knowledge. These are also among the ones on which scientists have had a 73% success rate.
The questions near the end of the paper however are more challenging. I wouldn’t know for instance, how to solve this one without spatial relationships:

Tom threw his schoolbag down to Ray after he(?) reached the [top/bottom] of the stairs.

More good news is that contestants will be told the subjects used in the test beforehand, so that winning isn’t just a matter of having a database the size of the planet.

The bad/good news:
In the paper it is suggested that 40 to 50 questions would rule out guesswork. I checked with a random() function, and it turns out there is still a fair likelihood of correctly guessing 75% of 40 questions. This means that until AI reaches that bar, the contest will not be an encouragement for AI if a simple randomiser can walk away with $3000.
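
For anyone who wants to check the guessing odds without relying on a single run of random(), here is a minimal sketch of the exact calculation. It assumes every question is an independent two-way guess with a 50% chance of being right, and that "75% of 40" means at least 30 correct answers. The function name is made up for this example.

```python
from math import comb


def chance_of_guessing(total: int, needed: int, p: float = 0.5) -> float:
    """Probability of getting at least `needed` of `total` questions right
    when each answer is an independent guess that is correct with chance `p`."""
    return sum(
        comb(total, k) * p**k * (1 - p) ** (total - k)
        for k in range(needed, total + 1)
    )


if __name__ == "__main__":
    # 75% of 40 questions = 30 correct answers (assumed reading of the bar).
    print(f"P(at least 30 of 40 by guessing) = {chance_of_guessing(40, 30):.4%}")
```

The printed figure can be set against the estimate above. Changing `total` to 50 shows how much the longer test tightens things.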

If this is not addressed, I shall see fit to participate not only with my small handful of NLP methods, but also with a random guess function to handle all questions that the AI fails to deduce. :)

 

 
  [ # 18 ]

Don,
Have you found any information on the method for participation? Are you expected to send the AI in, travel there, will online participation be allowed? I could not find any specifics.

Vince

 

 
  [ # 19 ]

No, further details haven’t been made public yet. But I expect all forms of external access would be disabled to prevent remote human control. On the other hand, human responses to this type of question could be distinguishable by their longer response time, seeing as the AI doesn’t have to simulate human response times in this contest. Worth suggesting, perhaps.

The test will be administered on a yearly basis starting in 2015. The first submission deadline will be October 1, 2015. Additional details will appear at http://www.commonsensereasoning.org/winograd.

For more information, visit http://www.commonsensereasoning.org/winograd or contact Leora Morgenstern at leora.morgenstern () leidos.com or Charlie Ortiz at charles.ortiz () nuance.com.

 

 
  [ # 20 ]

On second read, I don’t think I have much chance of handling these questions. It looks like the AI would need to know all potential causes and effects of feelings, physics, properties and actions. It was said the organisation will disclose the words that will be used, but even then it requires more common knowledge than I could likely gather. I may have to leave this one to the pros.

Frank felt [vindicated/crushed] when his longtime rival Bill revealed that he was the winner of the competition. Who was the winner of the competition?
Sam pulled up a chair to the piano, but it was broken, so he had to [stand/sing] instead. What was broken?
The man couldn’t lift his son because he was so [weak/heavy]. Who was [weak/heavy]?

 

 
  [ # 21 ]

Does anyone know if this Winograd schema contest is actually happening? The link http://www.commonsensereasoning.org/winograd is not working for me.

 

 
  [ # 22 ]

To be frank, I don’t know. The site has been offline for at least two weeks now.
What I do know is that last March there was a big gathering of AI researchers discussing alternatives to the Turing Test, and the Winograd Schema Challenge was put forth as one of the defending champions. There was also never any mention of doubt or prerequisites to the contest being held as scheduled, nor has there been news to the contrary.
My guess would be they’re working on a dedicated website, as the one hosting it so far did not seem to be intended to host it.

 

 
  [ # 23 ]

The site is back up. No new details, but there are still 4 months to go anyway.
I’ve only been able to come up with methods for 1/8th of the questions so far, and have severe doubts whether it’s worth implementing them for the rare odds of passing 2 questions in the Loebner Prize qualifying round, when pattern matchers are probably going to beat me at it anyway.

 

 
  [ # 24 ]

I’m looking forward to this contest and hope it does happen. I’ve been working on solving Winograd Schemas with AIML and (touch wood) my code should be able to handle most of the simpler ones. It will be a good test of it in the Loebner Prize pre-qualifier, as long as they don’t make the questions too long and drawn out.

If it works for the Loebner, I’ll look at developing it some more for this contest.

 

 
  [ # 25 ]

Well in that case, you’re on. :)
I have now begun to notice that several types of Winograd Schemas are susceptible to, shall we say, shortcuts. Not necessarily dumb, not necessarily intelligent, but still arriving at a correct interpretation. I have included methods for 3 types of these questions in my program, covering about 1/4th of 40 samples, though still dependent on some basic knowledge to infer from. Whether they will be of use in answering the Loebner Prize’s two questions will largely depend on luck in their choice of subjects.
For obvious reasons I shan’t reveal the methods, but I will blog about the vulnerabilities in various Winograd questions shortly after the submission date.

 

 
  [ # 26 ]

I am also adding a Winograd module. Any interest in sharing test questions?

 

 
  [ # 27 ]

I based my code on the structures in these questions:
https://www.cs.nyu.edu/davise/papers/OldSchemas.xml

Like Don, I too found a shortcut to possible success. And like the TED-X prize of writing a 3 minute presentation on any subject, it too requires little in the way of intelligence.

Without giving too much away, let’s take this example:
The trophy doesn’t fit into the brown suitcase because it’s too [small/large]. What is too [small/large]?

First of all you can ignore anything after “because”. All the information you need is in the first part of the sentence:
The trophy doesn’t fit into the brown suitcase.
or
X doesn’t fit into Y
From here, you can work out which is big and small, large or little.
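
To make the shortcut concrete, here is a rough sketch in Python rather than AIML. It is not the actual method, only an illustration under assumptions: it handles just the one sentence shape "X doesn't fit into Y", and the pattern, word lists and function name are all invented for this example.

```python
import re
from typing import Optional

# Assumed sentence shape: "<X> doesn't fit into <Y> [because ...]".
# Everything after "because" is deliberately thrown away.
FIT_PATTERN = re.compile(
    r"^(?P<x>.+?)\s+(?:doesn't|doesn’t|does not)\s+fit\s+in(?:to)?\s+"
    r"(?P<y>.+?)(?:\s+because\b.*)?$",
    re.IGNORECASE,
)

LARGE_WORDS = {"large", "big"}
SMALL_WORDS = {"small", "little"}


def answer_fit_question(sentence: str, adjective: str) -> Optional[str]:
    """Answer "What is too <adjective>?" for an "X doesn't fit into Y" sentence.

    If X does not fit into Y, then X is the one that is too large
    and Y is the one that is too small.
    """
    match = FIT_PATTERN.match(sentence.strip().rstrip("."))
    if match is None:
        return None  # not a sentence shape this sketch knows about
    word = adjective.lower()
    if word in LARGE_WORDS:
        return match.group("x")
    if word in SMALL_WORDS:
        return match.group("y")
    return None


if __name__ == "__main__":
    s = "The trophy doesn’t fit into the brown suitcase because it’s too small."
    print(answer_fit_question(s, "small"))
    print(answer_fit_question(s, "large"))
```

Note that nothing here resolves the pronoun as such. The answer falls out of the "fit" relation alone, which is the point of the shortcut.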

I will post more after the contest if my method works.

 

 
  [ # 28 ]

Yep, I use those samples too. I would actually advise against using the other list composed by students, because it contains a lot of questions that don’t meet the official criteria for Winograd Schemas.

Monkeys hate lions because “they” are scary.

For instance, Google shows 10x more results for “scary lion” than for “scary monkey”. In the Winograd Challenge only Google-proof questions are allowed. Outside of the contest though, this is still a valid and helpful way to disambiguate a sentence, and probably a more human-like associative process than we like to think.
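
As a sketch of that associative trick, the snippet below picks whichever candidate turns up most often next to the adjective. The `hit_count` callback stands in for whatever source of counts is available; no real search API is called here, and the numbers in `toy_counts` are placeholders that only mirror the 10x ratio mentioned above, not real search results.

```python
from typing import Callable, Sequence


def pick_by_cooccurrence(
    adjective: str,
    candidates: Sequence[str],
    hit_count: Callable[[str], int],
) -> str:
    """Return the candidate noun most often seen right after the adjective."""
    return max(candidates, key=lambda noun: hit_count(f"{adjective} {noun}"))


if __name__ == "__main__":
    # Placeholder counts for illustration only (ratio 10:1, as in the post).
    toy_counts = {"scary lion": 10, "scary monkey": 1}

    def lookup(phrase: str) -> int:
        return toy_counts.get(phrase, 0)

    print(pick_by_cooccurrence("scary", ["monkey", "lion"], lookup))
```

This is exactly the kind of lookup that a Google-proof question is meant to defeat, so it would only help outside the contest.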

I tend to overdo things so I test all possible combinations of e.g.

X does (not) fit in Y because “it” is (not) small/large. What is (not) small/large?

It involves a bit of extra math. I’m no good at math though.
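
The "extra math" can be as little as a parity check. Here is a minimal sketch, assuming the only adjectives are small/large and that "not small" may be crudely read as "large" (and the other way round). The function name and the X/Y return values are invented for this example and are not the method described above.

```python
from itertools import product


def referent_of_it(fits: bool, adjective: str, adjective_negated: bool = False) -> str:
    """Resolve "it" in: X does (not) fit in Y because it is (not) small/large.

    Returns "X" or "Y". Each negation flips the answer, so only the
    parity of the flips matters.
    """
    if adjective not in ("small", "large"):
        raise ValueError("this sketch only knows 'small' and 'large'")
    # Assumption: "not small" is treated as "large" and vice versa.
    is_small = (adjective == "small") != adjective_negated
    # X fits because X is small; X does not fit because X is large.
    # In the two remaining cases the explanation is about Y.
    return "X" if fits == is_small else "Y"


if __name__ == "__main__":
    for fits, adjective, negated in product((True, False), ("small", "large"), (False, True)):
        verb = "does fit" if fits else "does not fit"
        adj = f"not {adjective}" if negated else adjective
        print(f"X {verb} in Y because it is {adj}. -> it = {referent_of_it(fits, adjective, negated)}")
```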

 

 
  [ # 29 ]

Interesting approach Steve! So if we treat the basic input as:

(1) X doesn’t fit into Y because IT is too big/small (Winograd)
(Or 2) X doesn’t fit into Y * (simplified version)

Then I agree you can work out which is big and small, large or little without using the ‘because’ part of the sentence. This would allow answering questions like WHAT IS TOO BIG etc. But I thought the idea of the Winograd schemas was pronoun resolution. Thus we might expect questions like WHAT DOES IT REFER TO and we would be expected to return X or Y depending on analysis of the latter part of the sentence. Does the ‘simplified’ approach allow for this?

That said, the Winograd schema question in last year’s Loebner qualifiers didn’t include a pronoun check. So maybe it doesn’t matter.

 

 
  [ # 30 ]

All questions of the Winograd Schema Challenge follow the same consistent format. One of the pillars of this test is to be clear-cut in method and results with as few variables as possible, and “what does he refer to?” would introduce a second pronoun with a very different meaning: a quotation. Consider the confusion in this case: “The doctor refers the patient to a specialist, because he was better. Who does he refer to?”

This does not, however, guarantee that the Loebner Prize organisers will stick to the strict guidelines from the Winograd paper. It’s still a Turing Test, after all.

 
