|
|
Administrator
Total posts: 3111
Joined: Jun 14, 2010
|
With Wendell Cowart’s closing of the Chatterbox Challenge, we’ve lost one of the biggest and best chatbot contests that botmasters have been able to use as a means of testing their chatbots, to find ways to improve, or to gain “bragging rights”, or simply to have a bit of friendly fun. This is a very sad thing, but not the end of the world. I intend to organize a new chatbot contest, and I want your aid, so that this new chatbot competition will be every bit as good, and every bit as fun, as the CBC has been.
So, first off, I’m putting out a “call to arms”, so to speak. I want to create an informal committee, made up of AI and chatbot enthusiasts and experts, to codify a set of goals and guidelines, and to discuss what the shape of a new chatbot contest should be. I’m looking for people to volunteer to spend at least a couple of hours per month, mostly in the form of participating in a “Forum Round Table” discussion, with some possible email correspondence, as well. If this is something you’re willing to help with (and you haven’t already emailed me), then please let me know, either here, or by email.
I want to find out from everyone what they want to see in a chatbot contest. Tell me what you think worked about the CBC, and what didn’t. I also want to hear suggestions, no matter how off the wall or wacky, for what you would like to see implemented. Let’s all work toward making a great chatbot competition that’s challenging, interesting, and most of all, fun!
|
|
|
|
|
Posted: Mar 23, 2012 |
[ # 1 ]
|
|
Administrator
Total posts: 3111
Joined: Jun 14, 2010
|
Ok, I’ve got an idea here that I want to share.
One of the goals that I’d like to achieve is to make this new contest more engaging and interesting, not only to those who enter, but also to the public at large. To that end, I think that a more interactive approach may be in order. My specific idea is a “Battle of the bots”, where a visitor can choose a pair of bots from the list of entrants and pit them against each other for something like 5 rounds, asking the same question to each of the two bots at the same time, and scoring the results of the volleys on a scale of 1 to 10. The page would have a sort of score sheet that shows only wins, losses and the number of battles, but not the scores, and the page would try to encourage the visitor to use the bots with the lowest number of battles, in an effort to keep all of the bots “in play”. There could also be a couple of other options for the visitor, where either the least-battled bots or a completely random selection of bots are chosen. The questions could come either from a list of previously used questions from the CBC (with Wendell’s permission, of course), or from a list of visitor-submitted questions. I also think that a small side-contest could be held in advance, where visitors could submit questions, and the “top ten” questions submitted would earn those submitting them props and mention on the contest website.
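To make the score sheet idea concrete, here is a minimal Python sketch. It assumes things the post doesn't specify: that a battle is won by the higher total across rounds, that equal totals count as a draw, and that ties in battle count are broken by name. The names `Record`, `least_battled_pair` and `record_battle` are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """What the public score sheet shows: no raw scores, just the tallies."""
    wins: int = 0
    losses: int = 0
    battles: int = 0


def least_battled_pair(sheet):
    """Suggest the two bots with the fewest battles (ties broken by name)."""
    ranked = sorted(sheet, key=lambda name: (sheet[name].battles, name))
    return ranked[0], ranked[1]


def record_battle(sheet, bot_a, bot_b, scores_a, scores_b):
    """Update the sheet from per-round scores; return the winner, or None for a draw.

    Assumption: the winner is the bot with the higher total over all rounds.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both bots must be scored on the same number of rounds")
    if any(not 1 <= s <= 10 for s in list(scores_a) + list(scores_b)):
        raise ValueError("each round is scored on a scale of 1 to 10")

    sheet[bot_a].battles += 1
    sheet[bot_b].battles += 1

    total_a, total_b = sum(scores_a), sum(scores_b)
    if total_a == total_b:
        return None  # a draw only counts toward the number of battles
    winner, loser = (bot_a, bot_b) if total_a > total_b else (bot_b, bot_a)
    sheet[winner].wins += 1
    sheet[loser].losses += 1
    return winner
```

The "least battled" suggestion is just a sort on the battle count, so the random-selection option could sit alongside it without changing the sheet itself.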
Now I understand that the “battle of the bots” portion of the contest is as subjective as the popularity voting has been in the past, if not more so. In fact, I’d like to see the popularity voting done away with, to be honest. As has been pointed out in the past, there’s no sure-fire way to prevent cheating, either from overenthusiastic fans, or from the botmasters themselves, so my thinking is “what’s the point?” There are far better ways to make this contest more engaging than to just present a list of entrants, and bid the visitor choose one.
Ok, I’ve babbled enough. I’d like some feedback please, folks.
|
|
|
|
|
Posted: Mar 23, 2012 |
[ # 2 ]
|
|
Senior member
Total posts: 697
Joined: Aug 5, 2010
|
|
|
|
|
|
Posted: Mar 23, 2012 |
[ # 3 ]
|
|
Administrator
Total posts: 2048
Joined: Jun 25, 2010
|
Funnily enough Dave, last year I was going to do something similar to your idea and had already started coding on a site:
http://www.chatbotbattles.com/
The idea was like the soccer world cup where we have say, 8 groups of 4 bots who all play each other and the top 2 from each group face a knockout competition. I was thinking of 5 questions per match with a point (goal) awarded for the best answer. Like soccer, whoever scores the most goals gets 3 points, a tie scores each player 1 point and a loss scores 0 points.
After every bot has played each of the others in its group, the points are tallied up and the top 2 from each group face a knockout competition with 10 questions where the winner progresses.
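The group stage described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the tie-break (total goals, then name) is not part of the proposal, and `match_points`, `group_table` and `qualifiers` are hypothetical names.

```python
from itertools import combinations


def match_points(goals_a, goals_b):
    """Soccer-style points: 3 for a win, 1 each for a tie, 0 for a loss."""
    if goals_a > goals_b:
        return 3, 0
    if goals_a < goals_b:
        return 0, 3
    return 1, 1


def group_table(bots, results):
    """Rank a group after every bot has played every other bot once.

    `results` maps (bot_a, bot_b) -> (goals_a, goals_b), keyed in the same
    order as the pairs produced from `bots`.
    Assumed tie-break: total goals scored, then name.
    """
    points = {bot: 0 for bot in bots}
    goals = {bot: 0 for bot in bots}
    for a, b in combinations(bots, 2):
        goals_a, goals_b = results[(a, b)]
        points_a, points_b = match_points(goals_a, goals_b)
        points[a] += points_a
        points[b] += points_b
        goals[a] += goals_a
        goals[b] += goals_b
    return sorted(bots, key=lambda bot: (-points[bot], -goals[bot], bot))


def qualifiers(bots, results, n=2):
    """The top `n` bots of a group go through to the knockout stage."""
    return group_table(bots, results)[:n]
```

With 5 questions per match, a tie can only happen if at least one question produces no goal, so whoever scores the matches would need a rule for that case.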
I would be willing to score the bots which would mean Mitsuku couldn’t enter but this was the same for Wendell not being able to enter his Talk-Bot in the CBC. It is something I was planning to start after the Loebner competition and especially now seeing the CBC is no more.
The downside I can see about your idea of getting the public to vote for the result of each battle is that some will vote for their favourite bot regardless of outcome. For example, Mitsuku has over 3,300 Facebook fans and nearly half of them voted for Mitsuku in the CBC. If they are battling Mitsuku against any other bot, they will vote for her no matter how she replies.
|
|
|
|
|
Posted: Mar 23, 2012 |
[ # 4 ]
|
|
Senior member
Total posts: 697
Joined: Aug 5, 2010
|
You have a good point Steve. Perhaps something like that could be solved by a blind test? Don’t show the names of the bots competing against each other until after the score has been given? Not really like soccer anymore, though.
|
|
|
|
|
Posted: Mar 23, 2012 |
[ # 5 ]
|
|
Senior member
Total posts: 623
Joined: Aug 24, 2010
|
I like your idea, Dave, as a way of encouraging public interest. But as far as a ‘formal’ chatbot competition goes, there are two areas I’m most interested in:
1) Ability to handle conversation flow
2) Ability to recognize and execute commands
The first category deals more with casual conversation—how we interact with others. This area would test the bot’s ability to handle NL input, including remembering facts mentioned in conversation and perhaps using logic to make conclusions about those facts. This is an area where, for example, bots like Victor’s and Steve’s would probably shine. And this category is important in general, as NL capability is the distinguishing feature of a chatbot. The thing that separates it from any other search engine/calendar/calculator.
The second category deals with the bot’s ability to be a search engine/calendar/calculator. And to be a better interface to those tools than what’s currently available. One of the most promising functions of chatbots is as a personal assistant. So testing bots on their ability to manage schedules, perform searches for information (either from a database or the internet), solve math problems, etc. is of practical importance. The key focus would be on 1) what tools are at the bot’s disposal and 2) how efficiently it makes use of those tools given NL input. This is an area where, for example, My Marie might perform well.
There would be some work involved fleshing out ideas to fairly test these two areas of chatbot ability. But I think each deserves its own competition. Perhaps with an overall “best in show” bot as well: the one that excels in both categories.
|
|
|
|
|
Posted: Mar 23, 2012 |
[ # 6 ]
|
|
Administrator
Total posts: 3111
Joined: Jun 14, 2010
|
@Jan: Glad you like it.
@Steve: Great minds think alike!
I had the “Mitsuku Effect” in mind when I mentioned the subjectivity of the idea I had. Given that Mitsuku is one of the most popular and well known chatbots out there (possibly even surpassing ALICE in popularity), this is more or less to be expected, and why I think that such a portion of this new competition, if implemented the way I had originally envisioned, should not count toward the results of the actual competition. But then I got to thinking about the popularity of all of these talent competition shows (e.g. Dancing With the Stars, American Idol, etc. - I’m sure that other countries have similar shows), and I wonder if some similar type of format could be used here. It’s something to consider, I think. More on this later.
@Jan (again): I’m not certain how difficult the logistics of a “blind competition” would be, but it may well be worth looking into. This is more or less a technical issue, so given the skillsets of the folks in this community, it shouldn’t be TOO difficult to work out, if we go that way.
@CR: I agree with you about judging a chatbot’s ability to maintain smooth conversation flow; however, I’m not certain how much emphasis should be placed (if any) on a bot’s ability to execute commands. This is, after all, a chatbot competition, and while I certainly don’t want to exclude virtual assistants such as Denise, or Laura’s My Marie from any aspect of the competition, I also don’t want to necessarily hand them an unfair advantage over other entrants who are simply virtual companions who don’t have any functional role other than to “talk” to the visitor.
As to the notion of withholding entry in the competition due to conflict of interest, I think this is going to be a “prickly” issue, since nearly everyone here has a chatbot of one sort or another, and I want to keep the list of excluded bots as low as humanly possible. Ideally, I’d like to see every bot competing, but that would be impossible. For everyone that participates in this discussion, I don’t think that your bot is automatically disqualified from competing; in fact, since this stage of the process can only improve the quality of this new contest, the more botmasters that we include in the process, the better. So don’t feel as if you can’t join in the conversation just because you have a bot you want to enter. The actual management of the competition will fall to only a few (or even fewer) individuals.
As to the judging, I want to suggest a small panel of judges (somewhere between three and five) with some experience in the field of AI and/or chatbots, or someone in the general field of science who is universally well known. I have a few ideas for likely candidates, but I’d like some input from all of you as to the size and makeup of the judging staff. Who would you like to see as a judge in the competition? Personally, I’d like to try to persuade Dr. Peter Norvig to act as a judge, or maybe even Dr. Michio Kaku, or Professor Brian Cox, but that would be like asking Queen Elizabeth to preside over your dinner party.
So again, who would you like to see judging this contest? Erwin, would you be willing to be a judge here?
|
|
|
|
|
Posted: Mar 23, 2012 |
[ # 7 ]
|
|
Senior member
Total posts: 697
Joined: Aug 5, 2010
|
I think perhaps the judges shouldn’t enter their own bot, but if organizers neither judge nor make up all the questions (so they don’t have an advantage over the rest), I don’t see why they couldn’t enter.
|
|
|
|
|
Posted: Mar 23, 2012 |
[ # 8 ]
|
|
Senior member
Total posts: 623
Joined: Aug 24, 2010
|
Dave Morton - Mar 23, 2012: I’m not certain how much emphasis should be placed (if any) on a bot’s ability to execute commands. [...] I also don’t want to necessarily hand them an unfair advantage over other entrants who are simply virtual companions who don’t have any functional role other than to “talk” to the visitor.
That’s why I think each of the two aspects of chatbot tech would have to be evaluated separately. But if the emphasis is on chatbots, rather than AI per se, practical functionality should be encouraged.
Dave Morton - Mar 23, 2012: As to the judging, I want to suggest a small panel of judges (somewhere between three and five) with some experience in the field of AI and/or chatbots, or someone in the general field of science who is universally well known. I have a few ideas for likely candidates, but I’d like some input from all of you as to the size and makeup of the judging staff. Who would you like to see as a judge in the competition? Personally, I’d like to try to persuade Dr. Peter Norvig to act as a judge, or maybe even Dr. Michio Kaku, or Professor Brian Cox, but that would be like asking Queen Elizabeth to preside over your dinner party.
So again, who would you like to see judging this contest? Erwin, would you be willing to be a judge here?
I think the whole “celebrity judge” thing might work for a one-off contest, but would be hard to maintain if you made this an annual event. In general I think high-profile people in science would want some solid idea ahead of time about the nature of the contest, who’s involved, etc. When your professional reputation is involved, the higher up the totem pole you are, the more careful you tend to be. (I have met several Nobel laureates/well-known physicists who would attest to this!)
What I would suggest is perhaps doing a public round of judging transcripts to narrow down the bot selection (and encourage active participation by a general audience), followed by more rigorous testing by judges in the fields of computer science or perhaps linguistics. We’d probably have better luck with judges who are involved in tech more so than science. I think AI researchers tend to try to disassociate themselves from Turing-style contests.
|
|
|
|
|
Posted: Mar 23, 2012 |
[ # 9 ]
|
|
Administrator
Total posts: 3111
Joined: Jun 14, 2010
|
I completely agree, Jan; which is why I’m reluctant to ask a botmaster to be a judge, which would disqualify their bot.
Also, it’s funny you should bring up the questions.
I think it would be interesting to ask each potential contestant to submit with their entry a list of three questions that they would like to see their bot asked, with the judging staff selecting from all of the submitted questions anywhere from one to three of them that would be used in the competition. Since the entry process will be done from the website rather than email, the submitted questions can be added to a database anonymously, so that the judges won’t know who submitted which question. Any question that mentions or otherwise indicates that it comes from a certain chatbot will get tossed out. Questions that are of an inappropriate nature would also be discarded.
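As a rough illustration of the anonymous pool, here is a minimal Python sketch. Everything in it is an assumption rather than a settled design: the function name `build_anonymous_pool`, the naive substring check for bot names, and the `banned_words` list standing in for whatever "inappropriate" ends up meaning.

```python
import random


def build_anonymous_pool(submissions, bot_names, banned_words=()):
    """Collect submitted questions into a pool with no link back to the submitter.

    `submissions` is an iterable of (submitter, questions) pairs. The submitter
    is deliberately never stored with the question.
    """
    pool = []
    for _submitter, questions in submissions:
        for question in questions[:3]:  # at most three questions per entry
            lowered = question.lower()
            # Toss out questions that give away which bot they came from.
            if any(name.lower() in lowered for name in bot_names):
                continue
            # Toss out questions flagged as inappropriate (hypothetical word list).
            if any(word.lower() in lowered for word in banned_words):
                continue
            pool.append(question.strip())
    # Shuffle so that submission order doesn't hint at who sent what.
    random.shuffle(pool)
    return pool
```

A plain substring match will also reject innocent questions that happen to contain a bot's name, so the judges would probably still want to look over the rejects by hand.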
|
|
|
|
|
Posted: Mar 23, 2012 |
[ # 10 ]
|
|
Administrator
Total posts: 2048
Joined: Jun 25, 2010
|
The trouble with asking the botmasters to submit their own questions, is that they could hard code the answers into their bots:
Human: A man and a woman were stuck down a well but which one got out first?
Bot: The woman because she had a “ladder” in her tights!
No other bot is likely to get that correct.
|
|
|
|
|
Posted: Mar 23, 2012 |
[ # 11 ]
|
|
Administrator
Total posts: 3111
Joined: Jun 14, 2010
|
Good points, CR!
As to the “Celebrity Judge” thing, I know that my choices are unrealistic, but I was hoping that they would serve as an example, and indicate that no idea is unworthy of consideration. But you have to admit that having Michio Kaku as a judge would be very cool!
|
|
|
|
|
Posted: Mar 23, 2012 |
[ # 12 ]
|
|
Administrator
Total posts: 3111
Joined: Jun 14, 2010
|
Steve Worswick - Mar 23, 2012: The trouble with asking the botmasters to submit their own questions, is that they could hard code the answers into their bots:
Human: A man and a woman were stuck down a well but which one got out first?
Bot: The woman because she had a “ladder” in her tights!
No other bot is likely to get that correct.
The thing is, Steve, that with the judging staff selecting the question or questions from a common, anonymous pool, very specific questions of that nature would likely not get selected, due to excessive specificity, so I don’t see this as a really big issue.
There would have to be some general criteria regarding the selection of questions to use, such as the level of specificity of the question. If the question is either too vague, or too specific, then it’s tossed out. The same would apply to the list of publicly submitted questions, as well, with the additional restriction that botmasters can’t publicly submit questions if their bot is entered.
|
|
|
|
|
Posted: Mar 23, 2012 |
[ # 13 ]
|
|
Administrator
Total posts: 2048
Joined: Jun 25, 2010
|
|
|
|
|
|
Posted: Mar 23, 2012 |
[ # 14 ]
|
|
Senior member
Total posts: 697
Joined: Aug 5, 2010
|
Perhaps ‘the questions from the botmasters’ can be separated a little into ‘topics’ and ‘grammar structures’, with some bots being able to define both, others only topics or grammar. Say, my bot can talk about chocolate, ping-pong and dodos, and supports sentence structures like ‘what is pronoun xxx’, ‘how adj is subj’, ...
|
|
|
|
|
Posted: Mar 23, 2012 |
[ # 15 ]
|
|
Senior member
Total posts: 623
Joined: Aug 24, 2010
|
Dave, yes it would be really cool!
Steve, it took me forever to get your joke. You’d think with tights coming back into fashion, I’d have heard a tear described as a “ladder”, but nope! I just pictured some unfortunate woman with very bulky tights…
|
|
|
|