Hello everyone! I’m obviously brand new to the forum, but I’m very glad to have found a site that is solely focused on my current area of interest.
A bit about me:
I’m a Web Programmer with a background in Computer Science. Web just happens to be where the jobs were.
I’m fluent in C and Java, as well as the PHP/MySQL/HTML/JavaScript suite. I’ve dabbled in Python and Lisp, and have a pretty good handle on C++.
Okay now for the exciting bit…
Project River:
--------------
Project River is my own pet project of creating a Smart House. I’m starting small though, and focusing on writing a Natural Language companion in C++.
While the end goal is to emulate something along the lines of “S.A.R.A.H.” from Eureka or Jane from the Ender’s Game series, I’m going to take it in baby steps.
River uses the Microsoft Speech API 5.3 along with voices from NeoSpeech to speak to the user.
She currently listens to the user through SAPI5 as well, but I’m in the process of converting her to use Dragon NaturallySpeaking for speech recognition.
My primary focus for River is to make sure that she is capable of at least minimal learning. I want her to slowly learn from every conversation she has with the user, and to use WordNet and other information sources to expand her knowledge.
Unlike the majority of chatbots that I read about, River will work solely in a single location and will have a vast amount of computing power available to her. She will only cater to one or two users at a time, which will allow the computing power to be more focused.
What I need is advice on where to start. I’ve read extensively today about ...
1. Pattern and Topic Matching - such as AIML and the core A.L.I.C.E. bot
2. Statistical Matching - Namely: http://courses.ischool.berkeley.edu/i256/f06/projects/bonniejc.pdf
3. First Word Approach - the first word is often a good indicator of the right response.
4. Lowest Frequency Approach - the words with the lowest frequency are the most significant and should help dictate the response (related to statistical matching)
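To make approach 4 concrete, here is a minimal C++ sketch of picking the lowest-frequency word from an input. The names `tokenize`, `FrequencyTable`, `observe`, and `rarestWord` are made up for illustration and do not come from any of the projects listed above. It assumes frequencies are simply counted from whatever text the bot has already seen, with unseen words treated as count 0.

```cpp
#include <cctype>
#include <cstddef>
#include <limits>
#include <string>
#include <unordered_map>
#include <vector>

// Split text into lowercase words; anything that is not a letter,
// digit, or ASCII apostrophe acts as a separator.
std::vector<std::string> tokenize(const std::string& text) {
    std::vector<std::string> words;
    std::string current;
    for (unsigned char ch : text) {
        if (std::isalnum(ch) || ch == '\'') {
            current += static_cast<char>(std::tolower(ch));
        } else if (!current.empty()) {
            words.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) words.push_back(current);
    return words;
}

// Hypothetical helper: counts how often each word has been seen so far.
class FrequencyTable {
public:
    // Add every word of `text` to the running counts.
    void observe(const std::string& text) {
        for (const auto& w : tokenize(text)) ++counts_[w];
    }

    // Return the word of `input` with the lowest observed count.
    // Unseen words count as 0; ties go to the earliest word.
    // Returns an empty string if `input` contains no words.
    std::string rarestWord(const std::string& input) const {
        std::string best;
        std::size_t bestCount = std::numeric_limits<std::size_t>::max();
        for (const auto& w : tokenize(input)) {
            auto it = counts_.find(w);
            std::size_t c = (it == counts_.end()) ? 0 : it->second;
            if (c < bestCount) {
                bestCount = c;
                best = w;
            }
        }
        return best;
    }

private:
    std::unordered_map<std::string, std::size_t> counts_;
};
```

After observing a few sentences, `rarestWord("what is the time")` would return whichever of those four words has been seen least often, and that word could then serve as the lookup key for a response. A real engine would probably want counts from a large corpus rather than only its own conversations.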
I’m very intrigued by the science and math behind Victor’s CLUES V3 implementation, and would love more information about the methods he’s using.
Based on what I’ve learned so far, and the fact that I’m using C++, does anyone have any tips on where I should start?
As a parting thought, I’ll leave you with one of the core ideas I hope to include in my engine. I intend to program in key phrases such as “that didn’t make any sense”, which would in turn trigger the bot to adjust its algorithm regarding the reply it just issued. In the simplest form it would mark that reply to never again be given for a similar question, but ideally it would do some work to figure out why it was a bad response.
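The simplest form of that feedback idea could look something like the C++ sketch below. Everything here is hypothetical: `ReplyFeedback`, `choose`, and `handleComplaint` are illustrative names, and `questionKey` is assumed to be some normalized form of the question (a keyword, for example) that decides which questions count as "similar". The complaint check is a plain case-sensitive substring match with an ASCII apostrophe, so recognizer output would likely need normalizing first.

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: remembers which replies the user rejected
// for which question keys, and avoids giving them again.
class ReplyFeedback {
public:
    // Return the first candidate that has not been rejected for this
    // question key, and remember it as the reply just issued.
    // Returns an empty string if every candidate has been rejected.
    std::string choose(const std::string& questionKey,
                       const std::vector<std::string>& candidates) {
        for (const auto& reply : candidates) {
            if (rejected_.count(std::make_pair(questionKey, reply)) == 0) {
                lastKey_ = questionKey;
                lastReply_ = reply;
                hasLast_ = true;
                return reply;
            }
        }
        hasLast_ = false;
        return "";
    }

    // Check a user utterance for the complaint phrase. If it is present
    // and a reply was just issued, mark that reply as never to be given
    // again for the same question key. Returns true if a reply was marked.
    bool handleComplaint(const std::string& userInput) {
        const std::string phrase = "that didn't make any sense";
        if (!hasLast_ || userInput.find(phrase) == std::string::npos) {
            return false;
        }
        rejected_.emplace(lastKey_, lastReply_);
        hasLast_ = false;
        return true;
    }

private:
    std::set<std::pair<std::string, std::string>> rejected_;
    std::string lastKey_;
    std::string lastReply_;
    bool hasLast_ = false;
};
```

This only covers the "never give that reply again" case. Working out why a reply was bad would need the engine to record how the reply was selected, which this sketch does not attempt.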