Hi there, Simon, and welcome to chatbots.org!
Making a chatbot completely from scratch is a pretty ambitious undertaking, and falls firmly under the category of “Rome wasn’t built in a day”, meaning that it’s going to be a long-term project. Trust me on this. I’ve either written, or helped write, a couple of chatbot engines already.
The chatbot project that I’m currently working on is called Program O, and it has been around for a while now. It was created by Liz Perreau, and I joined the development team almost 2 years ago. Program O is a web-based chatbot engine (also called an AIML interpreter) that’s written in PHP, and uses a MySQL database to store the chatbot’s responses. I mention this to give you a bit of background on me, so that you know I’m not just spouting stuff at random.
Now, if you want to build your own platform for a chatbot, you first need to decide a few things:
1.) How much do you want to “re-invent the wheel”?
Now, there’s nothing wrong with the idea of re-inventing the wheel, but some people look at it as a waste of time, so I thought I would ask. By starting a new chatbot/chatbot platform from scratch, you’ll be taking significantly more time in creating your project, but you’ll have far greater control over your chatbot’s behavior, so there’s a pretty big trade-off involved.
2.) How “smart”, or how “human” do you want your bot to be?
The answer to this question will directly impact the complexity of the project. If you don’t mind that your chatbot will respond to the same input in exactly the same way, every single time, then your script will not need to be as complex as it would if you wanted your chatbot to respond in a more human-like manner. Other chatbot attributes that affect its perceived intelligence or humanity will have a similar impact on the complexity of the project.
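To make that difference a bit more concrete, here is a minimal sketch in Python (not PHP, and not how Program O does it). The RESPONSES table and the function names reply_fixed and reply_varied are made up for illustration. The first function answers the same way every time; the second picks among several stored replies, which is about the smallest step toward a less robotic bot.

```python
import random

# Hypothetical response table: each known input maps to one or more replies.
RESPONSES = {
    "hello": ["Hi there!", "Hello!", "Hey, good to see you."],
    "how are you": ["I'm doing fine, thanks."],
}

FALLBACK = "I don't understand."


def normalize(user_input):
    """Trim whitespace and lowercase the input so lookups are consistent."""
    return user_input.strip().lower()


def reply_fixed(user_input):
    """Always return the first stored reply for a known input."""
    options = RESPONSES.get(normalize(user_input))
    return options[0] if options else FALLBACK


def reply_varied(user_input, rng=random):
    """Pick one of the stored replies at random, so repeats can differ."""
    options = RESPONSES.get(normalize(user_input))
    return rng.choice(options) if options else FALLBACK
```

Even this tiny change means storing several replies per input and deciding how to choose among them, which is where the extra complexity starts to creep in.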
3.) How do you want to store your chatbot’s responses?
This is one of the questions you’ve asked, so I want to address this as best I can, and give you some pointers, based on my experiences.
The choice of using either a “flat file DB” or a SQL-based one is complex, but it can be “boiled down” to a couple of factors: storage space and performance. Given the size of modern storage devices, space is not really an issue anymore, though the type and the structure of storage media will have an impact on performance. There are a LOT of people who swear by the performance boost that a flat file DB has over a SQL-based one, but virtually all of those people mentioned using a Linux OS of some flavor or another. It’s been my experience, on the other hand, that Windows systems usually work the other way, with SQL beating out flat file storage by a somewhat slim margin. My personal preference is to use MySQL for the DB, because administration of the contents of the DB is, by far, simpler and more efficient than with flat files. Either way, both systems can do the job for you, so it’s simply a matter of preference, when all is said and done.
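Here is a rough side-by-side sketch of the two storage styles, again in Python. Two assumptions to flag: the flat file is a single JSON file, and SQLite (from Python's standard library) stands in for MySQL so the example runs without a database server. The table name, column names, and function names are all hypothetical.

```python
import json
import sqlite3


# --- Flat file storage: the whole response set lives in one JSON file. ---

def save_flat(path, responses):
    """Write the full pattern -> reply mapping to a JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(responses, fh)


def lookup_flat(path, pattern):
    """Load the file and return the reply for a pattern, or None."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh).get(pattern)


# --- SQL storage: one row per pattern, looked up with a query. ---

def save_sql(conn, responses):
    """Create the table if needed and insert or update each pattern."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS responses ("
        "pattern TEXT PRIMARY KEY, reply TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO responses (pattern, reply) VALUES (?, ?)",
        list(responses.items()),
    )
    conn.commit()


def lookup_sql(conn, pattern):
    """Return the reply for a pattern, or None if it is not stored."""
    row = conn.execute(
        "SELECT reply FROM responses WHERE pattern = ?", (pattern,)
    ).fetchone()
    return row[0] if row else None
```

Note that the flat file version re-reads the entire file on every lookup, while the SQL version can change a single row with one statement. That second point is a small taste of why I find administration easier with a database.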
4.) How do you want your chatbot to learn?
Here, you have a lot of options, but I’ll cover the three main ones.
a.) Botmaster training - where the bot only learns what you teach it.
b.) “Trusted Circle” training - where only a few individuals are allowed to train the chatbot.
c.) “Public”, or “self” training - where anyone can teach the chatbot how to respond.
Now, of the three methods above, a couple have sub-options: conversational training, where the chatbot “learns” through conversation, or “scripted” training, where the chatbot’s responses are entered directly into its DB. The choices between these different methods and sub-methods will also directly impact the complexity of the chatbot’s underlying script. Scripted Botmaster training is simpler to implement by a wide margin, but it entails a lot more work on your part, once the chatbot engine itself is complete. In contrast, public training, where anyone who talks to the chatbot can teach it what to say, is far more difficult to implement. It also has the further disadvantage that little 12-year-old, hormonally burdened Billy gets to teach your chatbot all about the wonders of cyber-sex, foul language, and bad grammar. ~NOT~ the best learning environment, I think.
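As a rough illustration of how the three training modes differ in code, here is a minimal Python sketch. Everything in it (the TrainingMode enum, the Bot class, and the may_teach and teach methods) is hypothetical and only shows the permission check. The hard parts of public training, such as filtering out what Billy types, are not covered.

```python
from enum import Enum


class TrainingMode(Enum):
    BOTMASTER = "botmaster"      # only the botmaster may teach
    TRUSTED_CIRCLE = "trusted"   # the botmaster plus a short list of users
    PUBLIC = "public"            # anyone who talks to the bot may teach


class Bot:
    def __init__(self, mode, botmaster, trusted=()):
        self.mode = mode
        self.botmaster = botmaster
        # The botmaster is always part of the trusted circle.
        self.trusted = set(trusted) | {botmaster}
        self.responses = {}

    def may_teach(self, user):
        """Decide whether this user is allowed to add a response."""
        if self.mode is TrainingMode.PUBLIC:
            return True
        if self.mode is TrainingMode.TRUSTED_CIRCLE:
            return user in self.trusted
        return user == self.botmaster

    def teach(self, user, pattern, reply):
        """Store a reply if the user is allowed to teach; report success."""
        if not self.may_teach(user):
            return False
        self.responses[pattern.strip().lower()] = reply
        return True
```

For example, Bot(TrainingMode.TRUSTED_CIRCLE, "simon", trusted=["liz"]) would accept lessons from "simon" and "liz" and quietly refuse everyone else.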
So you can see that you have some choices to make, and a bit of work ahead of you. But that’s OK, because nothing gives someone a bigger sense of accomplishment and pride than to be able to say, “I made this!” I’m always here, somewhere, and I’m more than happy to help out where I can. If you have specific questions or problems, or even if you want to show off what you’ve got so far, feel free to post here in the forums, and if there’s something that I (or the other folks here) can help with, I’m sure that every effort will be made to assist.
Best of luck with your project, and I hope to hear from you about it soon!