

Final 4 announced!
 
 

The leaderboard has just been posted. No transcripts as yet.

http://www.aisb.org.uk/events/loebner-prize#Results2015

Anyone do ok with the Winograd questions? Mitsuku didn’t.

 

 
  [ # 1 ]

Very good progress for Lisa (55% to 80%) and Arckon (59.17% to 70.83%). Congratulations to all finalists. I think the questions were very hard this year (the best result is 83.33% this year, against 89.17% last year). To me, the selection questions were oriented more toward conversation (“Should Greece leave the Euro?”, “How do you think it’s going?”) than toward intelligence.

 

 
  [ # 2 ]

Probably 50-50 on winoschema for Rose.  Didn’t recognize room as a container so missed that one. Probably got the lending money one correct.

 

 
  [ # 3 ]

Nice. Curious where this will lead this year. Reminds me that I should update the winners on this website :-s

 

 
  [ # 4 ]

Will - Is that you?

Lisa was created by our very own Merlin of Skynet-AI fame. I remember from another thread that it was written in Python.

 

 
  [ # 5 ]

Steve is right. Lisa is mine. New interpreter, new code base. Written in Python.
She is much improved, with a 25-point jump in score and most of last year’s bugs eliminated.

Although she is supposed to be thoroughly human, some robotic responses have still crept in. She is off-line only, and as such is the only one of the finalists that does not benefit from on-line debugging.

 

 

 
  [ # 6 ]

“Anyone do ok with the Winograd questions?”

I probably got a point or two out of the questions. A bug may have made me miss one completely.

 

 
  [ # 7 ]

Congratulations to all finalists.  Great job. The questions do seem difficult this year, however the finalists percentages are impressive.  I’m eager to see the logs.  Best of luck to each of you!

 

 

 
  [ # 8 ]

Congratulations to the usual suspects and Python-coded Lisa! Hopefully this year’s finals will be what last year’s should have been, and I’m interested in seeing Lisa in action. I like a bit of tech variety.
I am missing The Professor from the roster, but if so I don’t get why they didn’t manage to run one of several AIML entries.

I think the questions are a bit cheeky, or is that just me? In my eyes some of them are in-jokes for those who have read Alan Turing’s paper, and others border on trick questions and paradoxes. I will harshly criticise the questions later, of course :), but for the time being I am amused.
I am satisfied with Arckon’s performance (thanks for noticing, Denis). Though I’ve improved plenty of his systems, I’d say his increase in score also has to do with the questions being simpler, linguistically.

Of the Winograd schemas, Arckon should have the first one technically correct, though interpreting “a bed” as beds in general. So the answer sounds just as unlikely as the question.

If a bed doesn’t fit in a room because it’s too big, what is too big?
“Beds are bigger. Rooms are much less big.”

@Steve: I’m not sure why your “not fit in” Winograd pattern didn’t work. Because of the comma parsing?
@Bruce: This is what I meant last time. I just programmed the word “in”, applied to any grammatical subject or object. Otherwise one would have to include cars, spaceships, cupboards, bread baskets, etc.
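
A minimal sketch of that idea, keyed on the word “in” rather than on a list of container nouns. This is not Arckon’s actual code: the pattern, the adjective sets and the name `resolve_fit_pronoun` are all assumptions made up for illustration, and it only covers the “doesn’t fit in” phrasing.

```python
import re

# Hypothetical pattern: "<item> doesn't fit in <container> because it's too <adj>".
# Whatever noun follows "in" is treated as the container; no noun list is needed.
FIT_PATTERN = re.compile(
    r"\b(?:an?|the)\s+(?P<item>\w+)\s+"
    r"(?:doesn't|does not|didn't|did not)\s+fit\s+in\s+"
    r"(?:an?|the)\s+(?P<container>\w+)\s+"
    r"because\s+it(?:'s|\s+is|\s+was)\s+too\s+(?P<adj>\w+)",
    re.IGNORECASE,
)

# Assumption: "too big" blames the thing being put in,
# "too small" blames the thing it is put into.
BLAMES_ITEM = {"big", "large", "wide", "tall", "long"}
BLAMES_CONTAINER = {"small", "narrow", "short", "tiny"}


def resolve_fit_pronoun(sentence):
    """Return the noun that "it" most plausibly refers to, or None if unsure."""
    # Normalise curly apostrophes so that "doesn’t" matches "doesn't".
    text = sentence.replace("\u2019", "'")
    match = FIT_PATTERN.search(text)
    if match is None:
        return None
    adjective = match.group("adj").lower()
    if adjective in BLAMES_ITEM:
        return match.group("item").lower()
    if adjective in BLAMES_CONTAINER:
        return match.group("container").lower()
    return None


if __name__ == "__main__":
    print(resolve_fit_pronoun(
        "If a bed doesn't fit in a room because it's too big, what is too big?"
    ))
```

Because the container is simply whatever follows “in”, the same rule would apply to cars, cupboards or bread baskets without naming them, at the cost of failing on any phrasing the pattern does not anticipate.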

As for the second schema: not a chance. The triple ambiguity, the required knowledge and the awkward pronoun choice together make it too difficult for me to solve by genuine means. Seriously though: how common is it to use plural “they” to refer to a specific single person? I know it’s used by Shakespeare and among transgender people, but I sure wasn’t taught this in college.

 

 
  [ # 9 ]

Congratulations to the finalists!


Like Will and Denis here, my understanding of the 2015 questions is that they were more inclined toward conversational situations than in previous years.


It puts our programs under “conversation pressure” much sooner than the Live does. Some programs are of course more prepared for that, previous finalists especially, considering a Live is even harder than this. But competition in the selection round is still immense and everyone gets their chance.


I believe it is a good move of the AISB to qualify entries this way.

I was myself waiting to get into the finals, to prove it could get there, before freeing up more time & budget to improve the conversation modules. That played against me. This is certainly a time allocation problem all candidates experience. I’ll fix that and do better next year!


So… there is a difference in the questions, but as I said I believe the AISB did right and it is up to us developers to improve our results. We just can’t “cheat into” Loebner. ;)


As a side note, my entry ended up with the name Synth Life version B.

I didn’t give it a first name this year, because the program changes name every time I launch its generation: it is based on memory on an individual “human” scale and not on a knowledge collection. I believe it was a man named Zach this time.


Version B (Zach) was a fixed version and should have got the first Winograd right.
The transcripts will tell me whether it bugged or got it right.
It certainly did not get the second Winograd right though!

I hope I’ll be able to free up time and resources to get a version C rocking next year!
Good luck to our finalists in the Live; they should be very proud to have got there this year.

 

 

 
  [ # 10 ]
Don Patrick - Aug 11, 2015:

@Steve: I’m not sure why your “not fit in” Winograd pattern didn’t work. Because of the comma parsing?

It was the way the questions were phrased that I hadn’t coded for. All the Winograds I had seen before were made up of a statement and then a question. I hadn’t anticipated the structure being in one sentence.

If a bed doesn’t fit in a room because it’s too big, what is too big?
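
To illustrate the difference in structure, here is a rough sketch that reduces both forms to a (statement, question) pair, so that a pattern written for the two-sentence form could be reused. It is not taken from any of the entries: the function name `split_schema` and both regular expressions are assumptions, and the one-sentence form is only recognised when it begins with “If” and the question begins with what/who/which.

```python
import re

# Two-sentence form: "Statement. Question?"
TWO_PART = re.compile(r"^(?P<statement>.+?[.!])\s+(?P<question>[^.!?]+\?)$")

# One-sentence form (assumed shape): "If <statement>, <what/who/which ...>?"
ONE_PART = re.compile(
    r"^If\s+(?P<statement>.+),\s+(?P<question>(?:what|who|which)\b[^?]*\?)$",
    re.IGNORECASE,
)


def _capitalise(text):
    """Upper-case the first character and leave the rest untouched."""
    return text[:1].upper() + text[1:]


def split_schema(text):
    """Return (statement, question) for either form, or None if neither fits."""
    text = text.strip()
    match = TWO_PART.match(text)
    if match:
        return match.group("statement"), match.group("question")
    match = ONE_PART.match(text)
    if match:
        statement = _capitalise(match.group("statement")) + "."
        question = _capitalise(match.group("question"))
        return statement, question
    return None


if __name__ == "__main__":
    print(split_schema(
        "If a bed doesn't fit in a room because it's too big, what is too big?"
    ))
```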

Don Patrick - Aug 11, 2015:

I am missing The Professor from the roster, but if so I don’t get why they didn’t manage to run one of several AIML entries.

Peter Lafferty, who runs The Professor, decided he didn’t want to take part due to time constraints and so didn’t submit an entry.

 

 
  [ # 11 ]

You are right on that Steve.
I believe some words may have been very misleading too.

 

 

 
  [ # 12 ]

Will you be coming up to Bletchley Park again this year, Christophe? It would be good to meet up once more. The Loebner final acts as a kind of social gathering for us like-minded geeks :)

 

 
  [ # 13 ]

Using “they” is possibly a British thing. It would not be uncommon to hear someone say, “I saw Jim and they were going to the shops”. However, “he” would be more internationally correct I feel.

 

 
  [ # 14 ]

I take it it is British slang then, as I was taught official British English, top of all my classes throughout 10 years. I know of a similar nondescript pronoun used like that in Dutch slang.
If they’d said “he”, Arckon would have answered “Joe was broken. I don’t know by who money was needed”. I’d have to check what logic causes him to choose Joe then, but obviously he’s not making the connection between lending money and needing it. Actually it’s pretty simple when I put it like that.

 

 
  [ # 15 ]
Steve Worswick - Aug 11, 2015:

Using “they” is possibly a British thing. It would not be uncommon to hear someone say, “I saw Jim and they were going to the shops”. However, “he” would be more internationally correct I feel.

The “they” messed me up also. In the US, “they” is always plural. I expected “he”.

 
