Hi, Alan.
I finally got a chance to download the AIML file, and tried to upload it to a chatbot that I created just to test your smoking quiz chatbot, but I got the following errors:
String could not be parsed as XML at line 165
There was a problem adding file xmasquiz.aiml to the database. Please refer to the message below to correct the problem and try again.
String could not be parsed as XML
Fatal Error 9: Input is not proper UTF-8, indicate encoding ! Bytes: 0x96 0x20 0x74 0x68 on line 990
Fatal Error 4: Start tag expected, ‘<’ not found on line 2
The offending line is here, with the source of the error highlighted in red:
OK. <srai>CORRECTANSWER2</srai> There are 2 Christmas Islands – the one in the Pacific Ocean (Kiritimati) was discovered by Captain James Cook in 1777, the one in the Indian Ocean by Captain William Mynors in 1643. Both on Christmas Day.
It seems odd that a simple hyphen should “break” the AIML code, but that character isn’t a ‘-’ (which has an ASCII value of 45, or 0x2D in hex); it’s a different character completely. Let me see if I can explain a bit better.
Program O was designed to handle not just English but other languages as well, such as Thai, Russian and Chinese; as such, it uses UTF-8 for its character set, along with multi-byte string (mbstring) processing. That particular character (‘–’, an en dash) is stored in the file as a single byte with a value of 150 (0x96 in hexadecimal, the first reported ‘illegal’ byte in the line of AIML code above), which is how Windows-1252 (most likely the encoding the file was saved in) represents it. In UTF-8, a byte between 0x80 and 0xBF is only legal as a continuation byte inside a multi-byte sequence, never by itself, so Program O’s XML validator reported an error over it.
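To make that a bit more concrete, here is a small Python sketch (just an illustration, not Program O's own code) that feeds the four bytes from the error message to a UTF-8 decoder and then to a Windows-1252 decoder. The helper name `try_utf8` is made up for this example, and Windows-1252 is my assumption about how the file was saved.

```python
# The four bytes quoted in the error message: 0x96, then " th".
raw = bytes([0x96, 0x20, 0x74, 0x68])

def try_utf8(data: bytes) -> str:
    """Return the decoded text, or a short description of the failure."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return f"invalid UTF-8 at byte offset {exc.start}"

print(try_utf8(raw))                    # invalid UTF-8 at byte offset 0
print(repr(raw.decode("cp1252")))       # the same bytes read as Windows-1252
print("\u2013".encode("utf-8").hex())   # e28093: an en dash as UTF-8 is three bytes
print("-".encode("utf-8").hex())        # 2d: a plain hyphen is one byte
```

So the same dash is one byte (0x96) in Windows-1252 but three bytes (0xE2 0x80 0x93) in UTF-8, which is why a file saved in one encoding fails validation in the other.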
The “fix” is simple: the offending character just needs to be replaced with a plain hyphen (UTF-8 value 0x2D, or ASCII 45; the key just to the right of the 0 key on a US English keyboard) to correct the problem. The characters look nearly identical to us humans, but to computers they’re completely different.
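If there are a lot of these in a file, hunting for them by hand gets tedious. Here is a rough Python sketch of how the search and replace could be automated. Everything in it (`find_invalid_utf8_lines`, `repair`, the `dash_fallback` handler name, the sample text) is hypothetical and made up for this post; it is not part of Program O. It only swaps a stray 0x96 byte for a hyphen and deliberately stops on any other invalid byte, so nothing gets changed silently.

```python
import codecs

def _dash_fallback(exc):
    """Decode-error handler: a stray 0x96 byte becomes '-'.
    Any other invalid byte is re-raised so it can be inspected by hand."""
    if isinstance(exc, UnicodeDecodeError):
        bad = exc.object[exc.start:exc.end]
        if bad == b"\x96":
            return "-", exc.end
    raise exc

codecs.register_error("dash_fallback", _dash_fallback)

def find_invalid_utf8_lines(data: bytes) -> list:
    """Return the 1-based numbers of lines that do not decode as UTF-8."""
    bad_lines = []
    for number, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            bad_lines.append(number)
    return bad_lines

def repair(data: bytes) -> bytes:
    """Return the data as valid UTF-8 with stray 0x96 bytes replaced by '-'."""
    return data.decode("utf-8", errors="dash_fallback").encode("utf-8")

# A made-up three-line sample with the bad byte on line 2.
sample = (b"<template>\n"
          b"There are 2 Christmas Islands \x96 the one in the Pacific\n"
          b"</template>\n")

print(find_invalid_utf8_lines(sample))   # [2]
fixed = repair(sample)
print(find_invalid_utf8_lines(fixed))    # []
```

Going through the decoder's error handler, rather than a blind byte replace, means a 0x96 that is legitimately part of a multi-byte UTF-8 character is left alone.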
All of this probably sounds like complete gibberish, not to mention being a bit overly technical and potentially confusing. It’s not my intention to sound condescending, or to “talk down” to anyone here. To be completely honest, I don’t fully comprehend some of the nuances of character sets and such, and I spent well over a year trying to work all this stuff out in order to get Program O to just support Spanish! Luckily, getting the project to work in Spanish was the hard part. From there, getting it to work with Thai and Chinese was easy.
Ok, off topic a bit; sorry. Long story short, replacing that one character in the file fixed the validation problem and allowed me to upload the file to the new chatbot, and I’m testing it now. I’ll report back later today with my results.