Senior member
Total posts: 370
Joined: Oct 1, 2012
The transcripts for the first round of the bragging rights contest are up. Even though we only had two bots sign up, we decided to go ahead with the contest to see how the format worked. Overall, everything ran smoothly, and the idea of user-supplied questions worked like a charm; in fact, I feel it added greatly to the diversity of the field. Unfortunately, as with the chatbot league, there was virtually zero interest, so we will also be discontinuing the bragging rights contest. The transcripts are here: http://ai.r-i-software.com/BRAGGING_RIGHTS/transcripts.asp The people's choice poll is up at http://www.surveymonkey.com/s/8GTLFL6 for anyone who might be interested. Polling ends at midnight Tuesday (PST), and the final results will be published on Wednesday.
VLG
Posted: Jul 22, 2013 [ # 1 ]
Senior member
Total posts: 623
Joined: Aug 24, 2010
Thanks for posting the transcripts, Vince. It’s a shame that there isn’t more interest—the transcripts could be a gold mine for developers.
*LOL* Well, very young humans use to do know what they prefer, and that very loudly.
Incidental question on English grammar: is “use to do know” (emphatic “do” I suppose) a standard formulation? A British-ism or some other exotic origin? I thought I’d captured all the exotic verb tenses in my bot, but this one would fail.
My parser reads "used to do" as a habitual past "do", with "know" left flapping in the breeze. Unfortunately, it reduces "use to + inf." to an idiomatic past tense before recognizing emphatic "do"s ("I do know the answer!"). I guess it would still work out in this case, though, as the bot would parse two versions of this input: "very young humans know what they prefer" and "very young humans use to do what they prefer", both of which are pretty much on the mark.
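Purely as an illustration (a toy sketch in Python, not my bot's actual parser; the rewrite rules are invented for this example), the "parse two versions" idea might look like this:

    import re

    # Toy sketch: each rewrite rule peels off either the "use(d) to do"
    # idiom or an emphatic "do", producing alternative readings that
    # could then be parsed in parallel.
    REWRITES = [
        (re.compile(r"\buse[d]? to do (\w+)"), r"\1"),  # drop idiomatic past
        (re.compile(r"\bdo (\w+)"), r"\1"),             # drop emphatic "do"
    ]

    def candidate_readings(sentence):
        """Return the original sentence plus every single-rule rewrite."""
        readings = {sentence}
        for pattern, repl in REWRITES:
            rewritten = pattern.sub(repl, sentence)
            if rewritten != sentence:
                readings.add(rewritten)
        return readings

    print(candidate_readings("very young humans use to do know what they prefer"))
    # includes "very young humans know what they prefer"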
Posted: Jul 22, 2013 [ # 2 ]
Senior member
Total posts: 473
Joined: Aug 28, 2010
The judge in this case may not have been a native speaker of English (they stated they were in Germany) and may not have realised that the "do" was extraneous. Even for native speakers, it's not uncommon to accidentally insert the odd wrong word and not pick it up without proofreading. Look how often an entry in an instant-message log is immediately followed by an asterisk acknowledging a mistake like that.
There's an entire sub-discipline of discourse analysis devoted to what are called "speech repairs", in which software tries to figure out how to correct the inevitable mistakes in the input in the way that makes the most sense.
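As a trivial (and purely hypothetical) example of one kind of repair, the instant-message asterisk convention could be handled with a closest-match replacement. This is only a sketch, not anything from my own software:

    import difflib

    def apply_asterisk_repair(previous, correction):
        """Replace the word in `previous` that best matches the '*word' fix."""
        fix = correction.lstrip("*")
        words = previous.split()
        # pick the word the correction most plausibly repairs
        target = max(words, key=lambda w: difflib.SequenceMatcher(
            None, w.lower(), fix.lower()).ratio())
        return previous.replace(target, fix, 1)

    print(apply_asterisk_repair("I'll meet you at the staton", "*station"))
    # -> "I'll meet you at the station"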
Posted: Jul 22, 2013 [ # 3 ]
Senior member
Total posts: 623
Joined: Aug 24, 2010
That may be the case, Andrew. I just want to make sure I'm not missing something important grammar-wise. Out of curiosity, how do you handle "speech repairs" in your parser?
Handling grammar "mistakes" and idiomatic English is a fascinating and frustrating adventure. Right now I'm taking the "not right now" approach of doing substitutions between idioms and literal meanings, without much consideration for what the choice of idiom means for the intent of the sentence. (I treat idioms the same as mistakes: the parser tries both the original and the literal/corrected version, and treats them as synonyms.)
These are somewhat "smart" substitutions in that they can handle a phrase containing words that may vary. For example, "Give the winner a big hand" means "Clap your hands for the winner" or "Applaud for the winner", but one may substitute "Andrew" for "the winner" and still want to invoke the same substitutions. Of course, it doesn't always work smoothly: "Give the winner a big foot" could produce "Clap your feet for the winner". This one's kind of funny, but one could imagine a substitution like this just turning out wrong.
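A stripped-down sketch of that kind of slot-based substitution (the patterns are invented for the example, and this is not my actual rule format):

    import re

    # Each idiom pattern carries a variable slot <x> that is carried over
    # into the literal paraphrase.
    IDIOMS = [
        (re.compile(r"give (?P<x>.+?) a big hand", re.I), r"applaud for \g<x>"),
        (re.compile(r"(?P<x>.+?) kicked the bucket", re.I), r"\g<x> died"),
    ]

    def paraphrases(sentence):
        """Return the literal reading plus any idiomatic rewrites."""
        out = [sentence]
        for pattern, template in IDIOMS:
            rewritten = pattern.sub(template, sentence)
            if rewritten != sentence:
                out.append(rewritten)
        return out

    print(paraphrases("Give Andrew a big hand"))
    # -> ['Give Andrew a big hand', 'applaud for Andrew']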
The thing about a case like "use to do know" is that a substitution approach can be dangerous for rare mistakes, or for "mistakes" that are actually grammatically correct in other situations. For example: "The activities he use to do come from his upbringing." It's not the best English, but it's probably much more common than the "use to do + inf." -> "use to + inf." error, and if both readings are produced every time, the bot could give some wonky responses.
I guess once you have a large database of common phrases, this might take care of itself: if an uncommon case is in the database for a particular phrase, then it is only "believed" when that particular phrase is found, and not in the general case. But if searches like this are performed for every input, they had better be efficient.
Posted: Jul 22, 2013 [ # 4 ]
Senior member
Total posts: 370
Joined: Oct 1, 2012
“It’s a shame that there isn’t more interest—the transcripts could be a gold mine for developers.”
Interesting observation, C.R. I'm wondering now if that fear wasn't the reason for the lack of participation. If that's the case, it stems from a misunderstanding of how patent/trademark/copyright law works. Responses would be considered copyrighted material, and unlike patentable material, publishing them actually establishes your claim of ownership. I'm going to check with our legal department tomorrow, but in the meantime I've included a boilerplate copyright notice at the bottom of each published transcript page.
VLG
Posted: Jul 22, 2013 [ # 5 ]
Senior member
Total posts: 623
Joined: Aug 24, 2010
I hope nobody was put off by the idea that others could scrape their bot’s replies; I wasn’t trying to imply that anyone would do this. I meant more that the questions and conversations themselves would be useful for developers indirectly, to help guide which areas of their bot need improvement.
Besides, I doubt a bot would get very far if it relied solely on regurgitating matched replies, without regard to topic or context.
Perhaps the problem was advertising? Did you notify other forums? Email participants in the Chatterbox Challenge, the Loebner Prize, etc.?
Posted: Jul 22, 2013 [ # 6 ]
Senior member
Total posts: 473
Joined: Aug 28, 2010
The subject of speech repairs is not something that I've studied in great detail myself yet. Like you, I'm saving that problem for later, because it would be enough for now just to be able to handle standard English properly.
However, I did enough research to reassure myself that deferring it is actually going to be OK. Just as there is a finite (but large) number of valid constructions in standard English, there is also a finite number of constructions peculiar to particular dialects and idioms, specialised or non-standard uses, and errors stemming from illiteracy or lack of proficiency in the language.
Ignoring completely random errors for the moment, all the erroneous and non-standard uses of the language can be incorporated into the grammar rules just as easily as the correct uses. Solve the problem of defining all the correct uses in a way that parsing software can use, and you've solved the problem of handling all the incorrect uses as well. At least in theory, anyway.
The parsing software that I've developed handles the problem of random errors by recognising and classifying as much as it can, in as much detail as it can. For example, it can recognise a particular grouping of letters as a word, and if the word is in the lexicon it might recognise it as a particular verb, or even a particular sense of that verb. If it doesn't recognise the word, it can at least guess from its position that it might be a verb, and that narrows the options for correcting it.
Depending on how the grammar is designed, it may even just ignore anything it doesn't recognise, but I think it would be smarter if the software were aware of such anomalies and could challenge the user about them, just as a human would challenge another human who wasn't making sense.
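To make the position-based guessing concrete, here's a deliberately tiny sketch (a toy example, not my actual parsing software; the lexicon and the single positional rule are invented):

    # Toy lexicon; a real one would carry senses, not just a tag.
    LEXICON = {
        "the": "DET", "dog": "NOUN", "cat": "NOUN",
        "chases": "VERB", "sees": "VERB",
    }

    def tag(tokens):
        """Tag known words from the lexicon; guess unknown ones by position."""
        tags = []
        for i, tok in enumerate(tokens):
            if tok in LEXICON:
                tags.append((tok, LEXICON[tok]))
            elif i > 0 and tags[-1][1] == "NOUN":
                # unknown word right after a noun: guess it's the verb,
                # which narrows the search for a correction to verbs only
                tags.append((tok, "VERB?"))
            else:
                tags.append((tok, "UNKNOWN"))
        return tags

    print(tag("the dog chazes the cat".split()))
    # -> [('the', 'DET'), ('dog', 'NOUN'), ('chazes', 'VERB?'), ...]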
Posted: Jul 22, 2013 [ # 7 ]
Senior member
Total posts: 370
Joined: Oct 1, 2012
CR:
"Perhaps the problem was advertising?"
I'm beginning to think the problem was timing. It was probably a mistake trying to get that many geeks to concentrate on something else… while Comic-Con was going on.
Vince
Posted: Jul 23, 2013 [ # 8 ]
Senior member
Total posts: 336
Joined: Jan 28, 2011
Vincent Gilbert - Jul 22, 2013: CR: "Perhaps the problem was advertising?"
I'm beginning to think the problem was timing. It was probably a mistake trying to get that many geeks to concentrate on something else… while Comic-Con was going on.
Vince
Maybe there needs to be a prize (and/or $$$) associated with the contest to up the stakes? One way of generating a pot would be to have an ante system for contestants.
Posted: Jul 23, 2013 [ # 9 ]
Administrator
Total posts: 3111
Joined: Jun 14, 2010
That's a great idea, Carl, but where would the funds come from? Prize money generally comes from one of three sources:
a.) a sponsor,
b.) an entrance fee for competitors, or
c.) charging the people who come to witness the competition (just like a professional sport)
Of the three, the first option is the most viable, but the search for sponsorship is akin to the quest for the Holy Grail®. The next, of course, is charging an entry fee to all competitors, and that's a bit of a problem for new botmasters who don't really think they have a chance, but want to see how their chatbot "stacks up" against the best and the brightest. Option C isn't really worth serious consideration, but it IS, after all, an option.
Oftentimes, the individuals who organize these contests do so out of love for the field, and as such foot the (not insignificant) bill for hosting and, in some cases, special software to track scoring and such. The prospect of also fronting (read: giving away) prize money on top of all the other expenses is not usually possible.
Of course, if you wanted to sponsor such a competition, I’m sure you’ll find no shortage of people who would love to work with you to put together a chatbot contest.
Posted: Jul 23, 2013 [ # 10 ]
Senior member
Total posts: 336
Joined: Jan 28, 2011
Dave Morton - Jul 23, 2013: That's a great idea, Carl, but where would the funds come from? Prize money generally comes from one of three sources:
a.) a sponsor,
b.) an entrance fee for competitors, or
c.) charging the people who come to witness the competition (just like a professional sport)
Oftentimes, the individuals who organize these contests do so out of love for the field, and as such foot the (not insignificant) bill for hosting and, in some cases, special software to track scoring and such. The prospect of also fronting (read: giving away) prize money on top of all the other expenses is not usually possible.
A simple ante for each competing bot would do, say something relatively small like $10. The winner could also choose to put the proceeds back in the pot to attract more competition.
I realize that there is a LOT of work that goes into these types of competitions, so having some motivated folks to run the whole thing is a given. However, there are a number of folks on this and other boards who seem to enjoy a good, clean, transparent competition, and I am sure they would lend a hand if asked.
Posted: Jul 24, 2013 [ # 11 ]
Senior member
Total posts: 370
Joined: Oct 1, 2012
Thanks for the input, guys.
@Carl: That's a possibility, although I see a couple of problems.
(1) The idea here was to rotate judges and create a "contest" controlled by the community rather than by any individual or group: http://ai.r-i-software.com/BRAGGING_RIGHTS/judges.asp It takes the burden of running something every two weeks off of any one individual or group; plus, if anyone doesn't like the way a particular round was scored or run, they can always put their name in as a judge and take a crack at it themselves. If you start collecting entrance fees, you have to consider how they will be collected and paid out. Even PayPal requires a linked bank account, so you're back to a central organization, which means setting up an LLC or something of that sort.
(2) Going back to the idea of increasing participation: if you look at the number of people who participated in the earlier days of the CBC, the CBB, etc., you can see that the number of contestants has dwindled. I thought we could create an environment where people who probably did not have much of a chance at winning a more strictly run contest would be encouraged to participate. The use of categories increases the chance that you might do better, plus it's free. It creates a larger field, encourages people to develop new ideas, and gives the field greater diversity. There are bots that might do well in certain categories but still not win, and asking someone to pony up even $20.00 a month that they know they are not going to recoup would probably result in people dropping out. (Although free didn't exactly encourage anyone to show up.)
Automobile manufacturers don't spend hundreds of millions of dollars on Formula 1 teams because they make anything on purses, or because they expect to sell a single Formula 1 car; they do it because it allows them to pursue R&D, showcase their efforts, get their name out, and increase public awareness of the field in general. With all indications being that this technology is "the next big thing" (like personal computers in the late seventies), it would have been nice to let the public know that there are technologies created by companies or persons other than Apple, Google, and Microsoft. The more people that participate, the wider the audience; and the more awareness there is that this field has been around for a while, and isn't just now being invented by the "big three", the less chance that independent efforts (the real source of development... ALWAYS) will die. And that's exactly what will happen otherwise.
The problem is that Formula 1 racing is exciting, whereas unless you're involved in the field, chatbot contests are about as exciting as watching grass grow. Robot Wars has been around for a while now and has public appeal because it is visually exciting; it attracts sponsors and mainstream attention. Chatbot contests do not, even though you will see a personal assistant (sans physical embodiment) in your home long before you see a robot bodyguard smashing the neighbor's dog for barking.
VLG
(I also had a version where each participant supplied an "effigy" of their bot, and each round corresponded to a bot body part, which would then be packed with explosives and blown up over streaming video depending on their score, or regrown like in a video game... maybe flamethrowers...)
Posted: Jul 25, 2013 [ # 12 ]
Senior member
Total posts: 370
Joined: Oct 1, 2012
Final results
Laybia 48
Johnny 33
Johnny made a late comeback in the popular vote. Actually, there was only one visitor to the poll, which I'm assuming was Denis. Overall, I think this contest format worked really well: it demonstrated (sort of) that categories help keep scoring more representative of overall performance, the user-supplied questions worked and made for a field of questions that felt less targeted, and the polling algorithm, while not having much to work with in reality, would have been right in the pocket if actual numbers were applied.
VLG
Posted: Jul 25, 2013 [ # 13 ]
Experienced member
Total posts: 92
Joined: Apr 24, 2012
First of all, I want to thank the organizers of this contest, as well as the participants. I imagine it must be a lot of work, and I know they are volunteers. The small number of participants is unfortunate, but I encourage them to continue.
My bot lost when there were only two participants. The score and the judges' comments leave no doubt: this is a defeat. Fortunately, a nice guy voted for Johnny in the poll. We must learn from our mistakes, so that's what I did by analyzing the conversations more deeply.
The first observation is that the language is a real problem for someone who is not a native English speaker. Yes, "son" is a bad translation. To be really fair, challenges should be in Esperanto or Ithkuil. But most competitions are in English, and as you can see with other challenges, more than half of the participants are English-speaking. So I will continue to learn this beautiful language.
My second observation is more fundamental: my bot did not answer any questions except those that I submitted, and the other competitor did not respond to any of mine. There are, therefore, two radically different views of what artificial intelligence should be. I prefer logic and accuracy, while most other bots try to be friendly and entertaining, even if the answer is wrong. For example, to the question "How long is a road?" Johnny replied "I don't know." For me, this answer is correct because there is missing data (which road?), and I will not change it. The score of zero indicates that the judges were obviously expecting a different answer, even a false one.
I fear that this trend leads us not to the smartest bot, but simply to the best AIML bot. It turns out that the selection questions for the Loebner Prize are going the same way this year; no wonder the finalists are almost all AIML bots (3 out of 4). For me, this is a mistake. This kind of question does not measure intelligence, but only the ELIZA effect. From my point of view, it is like trying to measure temperature with a vernier caliper: you measure intelligence with criteria that only bots with pre-defined responses can satisfy.
Don't think I'm a sore loser. My only goal is to improve chatterbots and artificial intelligence. Whether the best chatterbot is mine or not is of (almost) no importance.
Posted: Jul 26, 2013 [ # 14 ]
Senior member
Total posts: 336
Joined: Jan 28, 2011
Denis Robert - Jul 25, 2013: To the question "How long is a road?" Johnny replied "I don't know." For me, this answer is correct because there is missing data (which road?), and I will not change it. The score of zero indicates that the judges were obviously expecting a different answer, even a false one.
For your bot to get that kind of question "right", it has to be able to respond with something that addresses the question, like "Which specific road are you talking about?". Responding with "I do not know" is a '0'-score reply because it could be used for almost ANY input at all.
Posted: Jul 26, 2013 [ # 15 ]
Senior member
Total posts: 336
Joined: Jan 28, 2011
btw, 6 of the questions I submitted were derived directly from Google autocomplete by entering "what are", "what can", "how does", etc. The other 4, marked with '*' below, test a bot's ability to:
- do numeric comparisons (see the sketch after the question list),
- deal with "and" as something other than a simple string split (i.e., answering everything before and after 'and' as two separate sentences),
- deal with colors (most AIML bots will reply to "* orange" with "Orange."), and
- provide word definitions (i.e., do they have a dictionary?).
what are your weaknesses?
what can you buy with bitcoins?
how does the voice work?
how can we stop global warming?
how far is a click?
when will the world end?
*Is 5 plus 5 greater than 10?
*If I would laugh and sing would you too?
*what color is an unripened orange?
*can you define the word naive?
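For what it's worth, the numeric comparison is the kind of question a bot can actually compute rather than pattern-match. A throwaway sketch (hypothetical and regex-only, nothing like a full parser):

    import re

    PATTERN = re.compile(
        r"is (\d+) (plus|minus) (\d+) (greater|less) than (\d+)", re.I)

    def answer(question):
        """Compute 'Is A plus/minus B greater/less than C?' rather than reply from a canned pattern."""
        m = PATTERN.search(question)
        if not m:
            return "I can't parse that comparison."
        a, op, b, cmp_, c = m.groups()
        value = int(a) + int(b) if op.lower() == "plus" else int(a) - int(b)
        result = value > int(c) if cmp_.lower() == "greater" else value < int(c)
        return "Yes." if result else "No."

    print(answer("Is 5 plus 5 greater than 10?"))  # -> "No." (10 is not greater than 10)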