Hi everybody,
it’s been a long time since there was a challenge to evaluate our chatbots, so I have decided to organize an online Turing test, as was suggested in this thread: https://www.chatbots.org/ai_zone/viewthread/3704/
I don’t want to replace the official challenges, and I hope the Loebner Prize will still take place this year. I just want to organize a fun and unpretentious alternative.
I will organize this challenge more or less following the protocol I proposed (see message #2 of the above-mentioned thread): each user (botmasters or anyone else who wants to) will chat with either another user or a chatbot. They will have to decide, as quickly as possible, whether they are chatting with a human or with a chatbot.
Since it is an automatic process, this new challenge can be run regularly. To begin, I propose the first Sunday of months 3, 6, 9 and 12 (March, June, September and December). So the first challenge would be on 7 March 2021. That is short notice, but the first challenge will mainly serve for testing and debugging. Depending on participation and your wishes, this can change.
A round can start every half hour and lasts 25 minutes, over 24 hours from 00:00 to 24:00 GMT. Of course, if there is no human to talk with, some rounds will not occur, so there will probably not be 48 rounds per bot. I would like 3 or 4 rounds per bot, to stay compliant with the Loebner Prize.
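Just to make the schedule concrete, here is a tiny TypeScript sketch that lists the 48 theoretical slots per day (purely illustrative, not the code running on my server):

```typescript
// List the 48 theoretical round slots: one every 30 minutes, each lasting 25 minutes.
const pad = (n: number): string => n.toString().padStart(2, "0");

for (let slot = 0; slot < 48; slot++) {
  const start = slot * 30; // start time in minutes since 00:00 GMT
  const end = start + 25;  // each round lasts 25 minutes
  console.log(
    `Round ${slot + 1}: ${pad(Math.floor(start / 60))}:${pad(start % 60)}` +
    ` - ${pad(Math.floor(end / 60))}:${pad(end % 60)} GMT`
  );
}
```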
The communication protocol will be the same as for the Loebner Prize 2017 and 2018 (https://github.com/jhudsy/LoebnerPrizeProtocol, and discussions about it: https://www.chatbots.org/ai_zone/viewthread/2861/). The only thing I changed is the version of socket.io, which was too old (1.4.5), so I updated it to the latest version (3.1.0). Unfortunately, the two versions are not compatible, but very few changes are needed to adapt your programs to the new one. I know this is not the best protocol and some of you will disagree with this choice, but it was the only way to communicate over the internet without having to define a new protocol. All in all, in 2017 and 2018 everyone successfully implemented it.
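For those who have never implemented the 2017/2018 protocol, here is a minimal sketch of what a bot client looks like with socket.io-client 3.x. The server URL, the event names and the generateReply function below are only placeholders of mine, not the official protocol; check the GitHub repository above for the real event names and message format:

```typescript
// Minimal sketch of a bot client using socket.io-client 3.x.
// The URL, event names and reply logic are placeholders, NOT the official
// LoebnerPrizeProtocol events; see the GitHub repository for those.
import { io } from "socket.io-client";

const socket = io("http://vixia.fr", { transports: ["websocket"] });

socket.on("connect", () => {
  console.log("Connected to the challenge server");
});

// Placeholder event: answer an incoming judge message with a reply.
socket.on("message", (text: string) => {
  socket.emit("message", generateReply(text));
});

function generateReply(input: string): string {
  // Plug your own chatbot engine in here.
  return "Interesting, tell me more about that.";
}
```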
I have set up a website where you can now register and test your chatbot: http://vixia.fr/turing_test/index.php
When the concept of an online test was proposed, there were some objections. I will try to answer some of them:
“Unfortunately with an online contest I have no idea of a way of actually making it fair. The very idea of making it over the internet means it’s possible that the responses are not actually coming from the robot.”
“The one problem with online contests is cheating, which I consider a real possibility if there were something at stake.”
- Each botmaster must certify on their honor that their chatbot is really a chatbot, without any human intervention.
- Each challenge will last 24 hours. It seems unlikely that someone will stay behind their computer for 24 hours to cheat, because there is nothing to be gained. Chatbots that are not connected for the full 24 hours will be disqualified.
However, cheating is still possible (starting with me), so this challenge is not an official challenge. It should be seen as a game or as training.
“Can we at least make it so the bot doesn’t have to pretend to be human please?”
“But first of all we need to eliminate the fake emulation of the machine that tries to appear human.”
Nothing is mandatory on this point. But obviously, a chatbot which says that it is a chatbot will be quickly unmasked.
“Mainstream bots such as Alexa, Siri or Cortana could also be involved, as they are always available, in order to have a general overview of the performance of the various systems.”
Sorry, I have decided that participants can register only their own bots, or other bots with the permission of their author. In the past, for example, some people made two chatbots chat together without permission, and the authors were not very happy about it.
Other questions come to mind:
Since botmasters also play the role of judge, and the chatbots are chosen randomly, what happens if a botmaster chats with their own bot?
I know that every botmaster will recognize their bot within the first few seconds. They will then be tempted to pretend to believe it is a human in order to give it a good rating. As long as every botmaster and every bot is in the same situation, the odds remain equal.
How many rounds will there be?
That will depend on the number of users who play the role of judge, and I hope they will be numerous. In the rules, I say that each botmaster should have at least four conversations. Given the random human / chatbot pairing, this means at least two rounds for each chatbot. But one chatbot might get four rounds while another gets only three, for example. Obviously, the scoring takes the number of rounds into account so that it stays fair.
How will chatbots be rated?
First, chatbots will be scored on the number of times they fooled a judge (in proportion to the number of conversations they had, of course). In case of a tie (and probably no chatbot will completely fool a judge), the ranking is based on the average time during which the judge was unable to decide.
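To make my idea of the scoring concrete, here is a small TypeScript sketch of how I read it (this is only my own illustration, not the code that will actually run on the server):

```typescript
// Sketch of the scoring described above:
// primary score = fraction of conversations where the judge was fooled;
// tie-break = average time (in seconds) before the judge made a decision,
// longer being better for the bot.
interface Conversation {
  fooledJudge: boolean;      // the judge believed the bot was human
  secondsToDecision: number; // time before the judge decided
}

function score(conversations: Conversation[]): { fooledRate: number; avgDecisionTime: number } {
  const n = conversations.length;
  const fooled = conversations.filter(c => c.fooledJudge).length;
  const totalTime = conversations.reduce((sum, c) => sum + c.secondsToDecision, 0);
  return {
    fooledRate: n > 0 ? fooled / n : 0,         // primary criterion
    avgDecisionTime: n > 0 ? totalTime / n : 0, // tie-break criterion
  };
}
```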
I hope I have been clear, and that there will be a lot of participants. If you have any questions, don’t hesitate to ask me.
I also encourage those who do not have a chatbot to come and play the role of judge on the day of the challenge. It’s anonymous and it can be fun.
I know that my English is not very good, so if you see typos or anything hard to understand on my site or in this message, tell me and I will correct it.
All suggestions are welcome; if everybody agrees, I can change the rules, the date of the challenge, the protocol, or anything you want. My only aim is for everybody to have fun with this challenge. I do this completely on a voluntary basis, so don’t ask me for things that are too complicated.
Thanks and best regards