Thunder
I think the topic of the questions is utterly irrelevant. I don’t care if a bot only knows about ‘adult topics’, or only about electronics, Greek mythology, or just, hey, keeping me company with some casual chat.
But if I read you correctly, I agree: unless your project employs half a million people entering data from every conceivable human endeavour, it is a bit pointless to test the bot on too wide a range of topics.
I’d rather have a bot that has powerful language skills and can learn and acquire more knowledge, rich knowledge, via natural language interaction, but starts out knowing nothing, or very little, than a bot that has quick answers to millions of things but doesn’t understand any of them well enough to be questioned further on its responses.
In other words, it is much more valuable to initially not know much but know how to learn, than to ‘fake’ knowing many things, really understand none of them, and be unable to learn because the language skills aren’t there.
Thus I tend to agree - I’m not going to worry, initially, about my bot knowing the purpose of a hammer. I’m going to worry more about it being able to learn, using core language skills, that a hammer is used for pounding in nails.
If they want to judge a bot by such a variety of things, then forget about the contest; we already have a winner: WATSON. Watson is a huge oracle of knowledge. BUT, it’s not a chatbot. You ask it a question, and it gives you the closest match. There is no variable number of levels of indirection of language-based reasoning and skill involved. It can’t be given natural language statement S1, then deduce S2 and S3, then use S3 to respond to your next question. Instead, it is just an “input-output” pair.
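To make the S1/S2/S3 point concrete, here is a tiny sketch (my own toy example, nothing to do with how Watson actually works) of a bot being told one fact and chaining through stored if-then rules before answering a later question:

    # Toy illustration of multi-step deduction: the bot is told S1,
    # derives S2 and S3 from stored if-then rules, and answers from S3.
    facts = {"john owns a trout rod"}                            # S1, told in natural language
    rules = [
        ("john owns a trout rod", "john fishes for trout"),      # S1 -> S2
        ("john fishes for trout", "john spends time outdoors"),  # S2 -> S3
    ]

    changed = True
    while changed:                       # simple forward chaining
        changed = False
        for premise, conclusion in rules:
            if premise in facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True

    # Later question: "Does John spend time outdoors?"
    print("john spends time outdoors" in facts)   # True, via S1 -> S2 -> S3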
If I was entering the contest, here is how I would judge the questions…
My name is Bill. What is your name?
Fairly good input. Your bot has to know how to segment this to realize it is a statement followed by a question. Needs to be able to store the user’s name and provide its own name.
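For what it’s worth, here is a minimal sketch of that segment-store-respond step; the patterns and the bot name are placeholders of mine, not anyone’s real system:

    import re

    BOT_NAME = "Grace"    # assumption: using my own bot's name as a stand-in
    memory = {}

    def respond(utterance):
        # Split into sentences so the statement and the question are handled separately.
        for sentence in re.split(r'(?<=[.?!])\s+', utterance.strip()):
            m = re.match(r'My name is (\w+)\.?$', sentence, re.IGNORECASE)
            if m:
                memory["user_name"] = m.group(1)      # store the user's name
            elif re.match(r'What is your name\?$', sentence, re.IGNORECASE):
                return "My name is %s." % BOT_NAME

    print(respond("My name is Bill. What is your name?"))   # My name is Grace.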
How many letters are there in the name Bill?
Cute - I guess a bot could know this, but it’s very low on the importance scale. Why on Earth would someone want to ask that? It shows some language skill, yes, but it’s rather stupid - there are much, much more interesting and difficult tests that could replace this.
How many letters are there in my name?
Not bad, though still a bit stupid. However, it does show the functionality of the bot realizing it has to first dereference “my name”, then go back, rewrite the question and resubmit it to itself. So not bad, but the “how many letters” part is stupid - who would ask that? A bot could know it, but I suggest it be a super-low priority on anyone’s bot project.
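The dereference-and-resubmit step is the interesting part; a rough sketch of what I mean (toy patterns, and I’m assuming the user’s name was stored earlier as above):

    import re

    def answer(question, memory):
        # Dereference "my name" to the stored value, then resubmit the rewritten question.
        if "my name" in question.lower() and "user_name" in memory:
            rewritten = question.lower().replace("my name", "the name " + memory["user_name"])
            return answer(rewritten, memory)
        m = re.search(r'how many letters are there in the name (\w+)', question.lower())
        if m:
            return "%d letters." % len(m.group(1))
        return "I don't know."

    print(answer("How many letters are there in my name?", {"user_name": "Bill"}))   # 4 letters.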
Which is larger, an apple or a watermelon?
Meh… pretty pointless, and it doesn’t test any advanced language skill. Perhaps just for fun, or if the bot is to be used by a child, who may think it’s fun.
How much is 3 + 2?
How much is three plus two?
Give me a break. Build a mathematical expression parser to do some pre-processing, plug in the values, then continue executing your more important NLP… yeah, sure, but it’s a waste of time. This should be a super-low priority on any bot project.
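If someone insisted on it, the pre-processing pass really is only a few lines; a quick sketch of the idea (my own word-to-symbol table, obviously incomplete):

    import re

    WORDS = {"zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
             "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
             "plus": "+", "minus": "-", "times": "*"}

    def eval_arithmetic(text):
        # Replace number/operator words with symbols, then evaluate if what's left is pure arithmetic.
        text = text.lower().rstrip("?").replace("how much is", "")
        expr = " ".join(WORDS.get(tok, tok) for tok in text.split())
        if re.fullmatch(r'[\d+\-* ]+', expr):
            return str(eval(expr))     # fine for a toy; a real bot would use a proper parser
        return None

    print(eval_arithmetic("How much is 3 + 2?"))           # 5
    print(eval_arithmetic("How much is three plus two?"))  # 5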
What is my name?
Simple effective memory recall test. Good test.
If John is taller than Mary, who is the shorter?
Should read “who is the shortest?”… Reasonable language test and relationship determination. Good test.
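The relationship part could be as simple as knowing that ‘taller’ and ‘shorter’ are opposites; a toy sketch of that step (the structure and names are just for illustration):

    # "John is taller than Mary" + "who is the shorter?" -> Mary
    OPPOSITES = {"taller": "shorter", "shorter": "taller"}

    def answer_comparison(subject, relation, obj, asked_relation):
        # If the asked relation matches the stated one, the subject wins;
        # if it is the opposite relation, the object does.
        if asked_relation == relation:
            return subject
        if asked_relation == OPPOSITES.get(relation):
            return obj
        return None

    print(answer_comparison("John", "taller", "Mary", "shorter"))   # Mary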
If it were 3:15 AM now, what time would it be in 60 minutes?
Very good test. This is getting advanced: the bot has to realize the input is an if-then hypothetical and work out the answer on its own. Very good.
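A sketch of the hypothetical-time arithmetic, just to show that the arithmetic itself is the easy part once the language is understood (the time formats here are assumptions of mine):

    from datetime import datetime, timedelta

    def time_after(time_text, minutes):
        # Parse "3:15 AM", add the offset, and format the result the same way.
        t = datetime.strptime(time_text, "%I:%M %p")
        later = t + timedelta(minutes=minutes)
        return later.strftime("%I:%M %p").lstrip("0")

    # "If it were 3:15 AM now, what time would it be in 60 minutes?"
    print(time_after("3:15 AM", 60))    # 4:15 AM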
My friend John likes to fish for trout. What does John like to fish for?
Very good test. Basic language skill. Notice the bot must handle the verb conjugation ‘like’ versus ‘likes’.
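A minimal sketch of storing the fact with the verb in its base form and then answering the “does John like” phrasing (toy patterns of mine, not a real parser):

    import re

    facts = {}

    def tell(statement):
        # "My friend John likes to fish for trout." -> facts["John"]["like"] = "to fish for trout"
        m = re.match(r'My friend (\w+) (\w+?)s (to .+?)\.?$', statement)
        if m:
            name, verb, rest = m.groups()
            facts.setdefault(name, {})[verb] = rest    # verb stored in its base form ("like")

    def ask(question):
        # "What does John like to fish for?" -> recall using the base-form verb.
        m = re.match(r'What does (\w+) (\w+)', question)
        if m and facts.get(m.group(1), {}).get(m.group(2)):
            return "%s %ss %s." % (m.group(1), m.group(2), facts[m.group(1)][m.group(2)])
        return "I don't know."

    tell("My friend John likes to fish for trout.")
    print(ask("What does John like to fish for?"))   # John likes to fish for trout.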
What number comes after seventeen?
Stupid.
What is the name of my friend who fishes for trout?
Very good. The bot first has to realize it must resolve “who fishes for trout” to an actual person’s name, plug that in to rewrite the sentence, and resubmit it to itself. Steve’s bot got this, but as I said, it will take a bit more work for it to NOT be thrown off by stuff like “My friend Bob does NOT like to fish for trout”. It should know that it cannot say Bob. Also, even better would be:
My friend Sam loves to go to ACDC concerts.
...
...
Who likes to attend ACDC concerts?
(also try with “does” and “doesn’t” to fool it).
This would still involve the same language skills as the original, but the bot would also have to know when to consider two words as synonyms (‘loves’ vs ‘likes’, and ‘attend’ versus ‘go to’). I’ve done a lot of work and made some great progress with this type of thing in Grace.
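Here is a rough sketch of what I mean, putting the synonym normalization, the negation check, and the name resolution together; the synonym table and the patterns are made up for illustration and are nowhere near what Grace actually does:

    import re

    SYNONYMS = {"loves": "likes", "go to": "attend", "does like": "likes",
                "doesn't like": "dislikes", "does not like": "dislikes"}

    facts = []    # list of (name, verb, thing)

    def normalize(text):
        # Map synonym phrases and negated forms onto one canonical verb.
        text = text.lower()
        for phrase, canonical in SYNONYMS.items():
            text = text.replace(phrase, canonical)
        return text

    def tell(statement):
        m = re.match(r'my friend (\w+) (likes|dislikes) (.+?)\.?$', normalize(statement))
        if m:
            facts.append((m.group(1).capitalize(), m.group(2), m.group(3)))

    def who(question):
        m = re.match(r'who (likes|dislikes) (.+?)\?$', normalize(question))
        if m:
            names = [n for n, verb, thing in facts
                     if verb == m.group(1) and thing == m.group(2)]
            return ", ".join(names) or "Nobody that I know of."
        return "I don't know."

    tell("My friend Sam loves to go to ACDC concerts.")
    tell("My friend Bob does NOT like to go to ACDC concerts.")
    print(who("Who likes to attend ACDC concerts?"))   # Sam (and it knows not to say Bob)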
What would I use to put a nail into a wall?
Uh… not a bad idea, I guess. Like you say, Thunder, it is a weird thing to ask. On the other hand, I guess the bot should start off with some basic knowledge of the world, and a hammer is a pretty common basic tool. Not very important though.
What is the 3rd letter in the alphabet?
Dumb.
What time is it now?
Sure.
What should be done is this: figure out what types of things you are testing about the bot (is it world knowledge, is it complex language understanding skill, etc.), then, for each of those categories, come up with say 1-3 questions that target that functionality and weight them. Knowing what the 3rd letter of the alphabet is should be low priority; hammer usage, perhaps a bit higher, but still pretty low compared to being able to first resolve “who likes to fish for trout” and evaluate the input in a multi-stage fashion.
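To illustrate the weighting idea, a quick sketch; the categories, weights, and scores are made-up examples of mine, not anything official from the contest:

    # Rough sketch of weighted scoring across question categories.
    WEIGHTS = {
        "multi-stage language reasoning": 5,   # e.g. "what is the name of my friend who fishes for trout?"
        "memory recall":                  4,   # e.g. "what is my name?"
        "basic world knowledge":          2,   # e.g. the hammer question
        "trivia / letter counting":       1,   # e.g. "what is the 3rd letter of the alphabet?"
    }

    def weighted_score(results):
        # results maps each category to the fraction of its questions answered correctly (0.0 - 1.0).
        total = sum(WEIGHTS.values())
        return sum(WEIGHTS[cat] * results.get(cat, 0.0) for cat in WEIGHTS) / total

    print(weighted_score({
        "multi-stage language reasoning": 1.0,
        "memory recall": 1.0,
        "basic world knowledge": 0.0,
        "trivia / letter counting": 0.0,
    }))   # 0.75 -- a bot strong on reasoning still scores well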
By and large, I agree with Thunder: questions like “how many letters in the word ‘car’?” are pointless, because you could take that sort of thing to unbelievable levels.