Loebner Prize 2015
 
 
  [ # 16 ]

@Daniel - Don beat me to the link. I hope Mitsuku does well with the Winograds but as always, it depends on what the bots get asked and how the judges feel they responded.

 

 
  [ # 17 ]

Uberbot entered the comp, although he was too big for the dropbox.
< rant > Even though I submitted Uberbot well in advance, it took several days for him to travel the 100-odd miles from the Channel Islands to London via the postal service. This is in spite of using Jersey Post’s inaccurately named “next day delivery” method. < /rant >
Anyway good luck to all entrants,
Will

 

 
  [ # 18 ]
Don Patrick - Jul 3, 2015:

Glad to have you on board Merlin, I take it then that Lisa from last year was your entry too.
Semi-genuine techniques in my case.

Yes, last year’s Lisa was mine too. I decided to enter at the last minute, and a late change introduced lots of bugs. Although it didn’t come in last, there were lots of errors in the output and it still had many robot-related responses.

Lisa is coded in Python (first Python bot to enter?). Although the design takes inspiration from Skynet-AI, all of the code/interpreter had to be done from scratch. Unlike Skynet, Lisa tries to act like a human. I forgot how hard it is creating a bot from scratch.

Will’s bot is 600MB+. Lisa checked in via Dropbox at a mere 5.8MB.

I don’t know if I can crack the top 4, but Lisa should make a much better showing this time.

 

 
  [ # 19 ]

Winograd questions are no different from other questions; if anything, they are easier. At worst they require guessing one of two candidates (assuming you can detect the two entities referred to). They are just generic pattern match problems, like any other question you might get from a user.

 

 
  [ # 20 ]

Bruce, it is interesting that the schema gives you the answer in the question, so you could just take a guess at one of the two subject/object words. That seems to me to be a weakness in the Winograd schema: a simple guess has a good chance of being correct.

 

 
  [ # 21 ]

True. And even so, for existing questions it’s just a pattern match made general. Here, for example, is the pattern for last year’s Loebner qualifier wino question:

#! The car couldn’t fit into the parking space because it was too small.  What was too small?
#! The trophy would not fit in the brown suitcase because it was too big. What was too big?
#! The table won’t fit through the doorway because it is too wide. What is too wide?
#! The table won’t fit through the doorway because it is too narrow. What is too narrow?
s: ( _~noun _0!?~xbeings * not *~1 [ fit squeeze cram] * [in into within through inside on onto on_top_of underneath under] * _[~containers ~surfaces door ~conduits] * because * it)  refine()
# it was too small
a: ([~undersize_adjectives narrow thin ~short_adjectives])  ^winoanswer('_1)
# it was too big
a: ([~oversize_adjectives wide thick ~tall_adjectives]) ^winoanswer('_0)
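
For anyone not using ChatScript, the rule above can be roughly approximated in Python. This is only a sketch under assumptions: the word sets are small hypothetical stand-ins for concept sets such as ~undersize_adjectives, the regular expression covers far fewer phrasings than the real pattern, and `wino_answer` is an illustrative name, not the ^winoanswer macro.

```python
import re

# Hypothetical stand-ins for ChatScript concept sets; not the real sets.
UNDERSIZE = {"small", "narrow", "thin", "short"}
OVERSIZE = {"big", "large", "wide", "thick", "tall"}

# "The <item> won't fit in the <container> because it is too <adjective>"
FIT_PATTERN = re.compile(
    r"the (?P<item>[\w ]+?) "
    r"(?:couldn't|could not|would not|won't|will not|doesn't|does not) "
    r"(?:fit|squeeze|cram) "
    r"(?:in|into|within|through|inside|on|onto|under) "
    r"the (?P<container>[\w ]+?) because it (?:is|was) too (?P<adj>\w+)",
    re.IGNORECASE,
)


def wino_answer(text):
    """Return what 'it' refers to, or None if this one rule does not apply."""
    text = text.replace("\u2019", "'")  # normalise curly apostrophes
    match = FIT_PATTERN.search(text)
    if match is None:
        return None
    adjective = match.group("adj").lower()
    if adjective in UNDERSIZE:
        return match.group("container")  # too small: the container is at fault
    if adjective in OVERSIZE:
        return match.group("item")  # too big: the item is at fault
    return None


print(wino_answer("The trophy would not fit in the brown suitcase "
                  "because it was too big. What was too big?"))
```

As in the script, an undersize adjective selects the second noun (the container) and an oversize adjective selects the first (the item).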

 

 
  [ # 22 ]

That’s kind of a brute force approach. I didn’t spot many sample schemas that followed that pattern, but thanks for explaining in what way you considered them easy.
Winograd Schemas are indeed susceptible to guesswork, which is why they aren’t particularly reliable in Turing Tests. In the Winograd Schema Challenge they believe that a large number of questions (40) will distinguish genuine efforts from guesswork. It’s good to include them for the curiosity factor though.

 

 
  [ # 23 ]

Sort of by definition, all of the ones given in the paper and in other sources can be handled that way. Only new ones cannot, but then again new questions about ordinary things, like “tell me about astral projection”, can’t be handled usefully without previously scripting them either. But there is a scripting distinction between the 2-sentence schemas and the 3-sentence schema examples.

 

 
  [ # 24 ]

I used an approach similar to Bruce’s (although I don’t have his extensive list of nouns). It boils down to:

1 - Identify an input as a Winograd query.
2 - Attempt to answer the simple queries based on lexical analysis
3 - If not simple, extract the potential answers and try a best guess (I stopped working on these to devote time to other parts of the bot).
4 - If unable to answer or guess, extract the meat of the query to try to get at least a point.
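
The four steps above might be sketched in Python roughly as follows. Everything concrete here is an assumption made for illustration (the detection regex, the stop-word list, the size adjectives, and the function names); it is not Lisa's actual code, and the noun extraction is deliberately crude.

```python
import re

# All word lists below are hypothetical and far too small for real use.
UNDERSIZE = {"small", "narrow", "short"}
OVERSIZE = {"big", "large", "wide", "tall"}
STOP = {"would", "could", "will", "won't", "couldn't", "did", "does",
        "doesn't", "is", "was", "not", "because"}
SPLIT = re.compile(
    r"^(?P<statement>.*[.!])\s*(?P<question>(?:what|who)\b[^.!?]*\?)\s*$",
    re.I | re.S,
)
PRONOUN = re.compile(r"\b(?:it|he|she|they)\b", re.I)


def extract_candidates(statement):
    """Collect the words that follow each 'the', up to the next stop word."""
    found, current = [], None
    for word in re.findall(r"[\w']+", statement.lower()):
        if word == "the":
            if current:
                found.append(" ".join(current))
            current = []
        elif current is not None:
            if word in STOP:
                if current:
                    found.append(" ".join(current))
                current = None
            else:
                current.append(word)
    if current:
        found.append(" ".join(current))
    return found


def answer_winograd(text):
    text = text.replace("\u2019", "'").strip()
    parts = SPLIT.match(text)
    # Step 1: a statement containing a pronoun, followed by a who/what question.
    if parts is None or not PRONOUN.search(parts["statement"]):
        return None  # not a Winograd query; leave it to the rest of the bot
    names = extract_candidates(parts["statement"])
    # Step 2: simple lexical rule for "too <size adjective>".
    size = re.search(r"\btoo (\w+)", parts["statement"], re.I)
    if len(names) >= 2 and size:
        adjective = size.group(1).lower()
        if adjective in OVERSIZE:
            return names[0]
        if adjective in UNDERSIZE:
            return names[1]
    # Step 3: best guess (here simply the first candidate).
    if names:
        return names[0]
    # Step 4: echo the meat of the question to try to get at least a point.
    question = parts["question"]
    return "I can't tell " + question[0].lower() + question[1:].rstrip("?") + "."
```

The order of the checks is the point of the sketch: each step only runs when the one before it could not produce an answer.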

 

 
  [ # 25 ]

The latest ChatScript release has, in its REGRESS/CHATBOTQUESTIONS/wino.txt, a list of paired Winograd schema questions used for testing (the ones after :quit are those Rose doesn’t handle yet).

 

 
  [ # 26 ]

Merlin, that’s more or less what I did.

1 - Identify an input as a Winograd query.
2 - Check if the input follows one of the set patterns to deduce the answer.
3 - If not simple, extract the potential answers and try a best guess.
4 - If unable to answer or guess, then as a failsafe, use the input sentence’s subject and object in the output to try and get a point.

It will be interesting to see how all the bots do against the Winograd questions; it looks like there will be at least two.

 

 
  [ # 27 ]

I skipped step 1, because I want to correctly resolve references in any kind of input.

 

 
  [ # 28 ]

Don: You say: That’s kind of a brute force approach. I didn’t spot many sample schemas that followed that pattern, but thanks for explaining in what way you considered them easy.
It’s brute force to try to infer meaning by using patterns, but really that is simply the requirement of how to deduce the pronoun. You have an understanding of the world (what things make sense) and what pattern you have in your input. It’s brute force in that you have to enumerate each possible meaning (whether you do it by scripting or by machine learning). I’m on the scripting side because I don’t think machine learning is going to learn anything it hasn’t already been given. So neither it nor I will recognize the obvious fact that makes some new wino question tick.

 

 
  [ # 29 ]

:) I withheld my comment because I didn’t want to argue too much: you and I have different purposes and correspondingly different tools. At some point everything is patterns. By “brute force” I was thinking of how many more patterns like that you’d have to script to cover the number of new Winograds that I expect, considering that they include specific nouns.
While my preferred approach is basically just further generalisation to bare-bones axioms applied to semantic roles, you’re right in that it won’t figure out new axioms on its own and it does have to go over all of them to see which apply. In my case, that means calculating the probability of each candidate answer before selecting the one with the highest probability.

For spatial relations, there is not much to generalise: even if one turned the whole input into a virtual 3D representation, everything would still be derived from the straightforward axiom “X in Y = X size < Y size”. At most this axiom can be expanded to include opposites such as “despite being small”, “not small enough”, and “fit around”.
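
A minimal Python sketch of how the “X in Y = X size < Y size” axiom and its variants could pick a referent. It assumes the sentence has already been parsed into the pieces passed as arguments; the function name, the flags, and the tiny adjective table are all hypothetical, and no probabilities are calculated.

```python
# +1 marks an adjective meaning "big", -1 one meaning "small".
# Hypothetical mini-lexicon, for illustration only.
SIZE = {"big": 1, "large": 1, "wide": 1, "small": -1, "narrow": -1, "thin": -1}


def resolve_it(x, y, fits, adjective, negated=False, despite=False, around=False):
    """Resolve 'it' in sentences shaped like 'X (does not) fit in Y because it is ADJ'.

    fits     -- True if the sentence says X fits, False if it says X does not
    negated  -- True for 'not small enough' style clauses
    despite  -- True for 'despite being ...' rather than 'because it is ...'
    around   -- True for 'X fits around Y', which puts Y inside X
    """
    inner, outer = (y, x) if around else (x, y)
    direction = SIZE[adjective]
    if negated:
        direction = -direction  # "not small enough" behaves like "too big"
    # A 'because' clause supports the stated outcome; 'despite' opposes it.
    favours_fit = fits != despite
    if favours_fit:
        # Fitting is favoured by a small inner object or a big outer one.
        return inner if direction < 0 else outer
    # Not fitting is favoured by a big inner object or a small outer one.
    return inner if direction > 0 else outer


print(resolve_it("trophy", "suitcase", fits=False, adjective="big"))
```

All three variants reduce to flipping one of two signs before the same comparison is applied, which is why there is so little left to generalise.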

For other pronoun resolutions, there are more general axioms hidden deep in semantics. I made one to theoretically infer any adjective trait from any statement, but as with any inference this relies on available knowledge. I prefer inference because its basic principle is that it can take two facts and deduce a third that it was never told. From three facts it can deduce three more, etc. Its greater flexibility and complexity are however paired with greater risk of error, and I have no idea which will do better at Winograd Schemas in practice. What I do know is that it won’t sound more human: my program has a tendency to over-elaborate.
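
The “two facts give a third” principle can be illustrated with a very small forward-chaining loop in Python. The triple format and the single rule used here are assumptions made for the example; the inference engine described above is not shown in this thread.

```python
def infer(facts):
    """Apply one rule until nothing new appears:
    (a, "is_a", b) and (b, relation, d)  =>  (a, relation, d)
    for relation in {"is_a", "has_trait"}.
    """
    known = set(facts)
    while True:
        derived = set()
        for (a, rel1, b) in known:
            if rel1 != "is_a":
                continue
            for (c, rel2, d) in known:
                if c == b and rel2 in ("is_a", "has_trait"):
                    derived.add((a, rel2, d))
        derived -= known
        if not derived:
            return known
        known |= derived


facts = {
    ("rex", "is_a", "dog"),
    ("dog", "is_a", "animal"),
    ("animal", "has_trait", "mortal"),
}
for fact in sorted(infer(facts) - facts):
    print(fact)  # the facts that were never stated directly
```

Here three stated facts yield three derived ones, including ("rex", "has_trait", "mortal"), which needs two chained steps. The same loop derives a wrong conclusion just as readily if one of the stated facts is wrong, which is the risk of error mentioned above.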

 

 
  [ # 30 ]

Thank you, Don. I’m certainly not seeking an argument. I was just curious about your viewpoint.  I, too, am interested in generalization.

Certainly for the Loebner qualifiers, I don’t expect them to generate a new wino schema question. I expect them to use an existing one pretty much as is (maybe minor changes).  I will be curious about the Winograd Schema contest itself whether they merely reform existing ones or generate totally new ones.

 
