   Member Total posts: 10 Joined: Apr 19, 2011 | I found the following link, which provided some useful info on database programming: http://www.chatbots.org/ai_zone/viewthread/158/ My question is, how much are chatbot databases impacted by the amount of data input? For example, will you eventually get such a slow response due to an overload of data that the program is no longer functional? I ask this because I am interested in alternate methods of populating chatbot databases, but there may be no point if current solutions are limited by the amount of data that can be functionally used. |  
|  |  |  
 | 
 | 
  
   |  | 
	
	| Posted: Apr 29, 2011 | [ # 1 ] |  |  
 | 
        
		 
  
  
  
   Administrator Total posts: 3111 Joined: Jun 14, 2010 | Based only on my own personal experience, I’m not a fan of “flat file” database structures unless I’m dealing with very small amounts of data (less than a kilobyte in size). The problem with using data files to store/retrieve information is that the larger the file, or the more files you have to access, the more performance suffers. With SQL-style database structures (MySQL, MSSQL, PostgreSQL, etc.), the structure is such that the amount of data stored has FAR less impact on performance than with flat files. The biggest performance consideration with SQL databases is generally in connecting to the DB, and in network traffic to/from the DB server. I found this post over at Stack Overflow, which discusses the pros and cons of SQL versus a flat file DB structure for dealing with log files, and while there are a LOT of differences between handling log files and storing chatbot data, many of the arguments are still relevant. Of course, I’m a bit biased, but then again, most folks are when they state a preference for one action or idea over another.  |  
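
A minimal sketch of the flat-file versus SQL point above, assuming a hypothetical `pattern -> response` table and using Python's built-in `sqlite3` as a stand-in for the server databases Dave lists. The file name, table name, and row count are made up for illustration. The flat-file lookup has to scan line by line, so its cost grows with the file; the SQL lookup goes through the index that the `PRIMARY KEY` creates. Connection and network overhead, which Dave names as the main SQL cost, are not modelled here because SQLite runs in-process.

```python
import os
import sqlite3
import tempfile


def build_flat_file(path, pairs):
    """Write one 'pattern<TAB>response' pair per line."""
    with open(path, "w", encoding="utf-8") as f:
        for pattern, response in pairs:
            f.write(f"{pattern}\t{response}\n")


def flat_file_lookup(path, pattern):
    """Linear scan: the work grows with the number of lines in the file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            key, _, response = line.rstrip("\n").partition("\t")
            if key == pattern:
                return response
    return None


def build_sqlite(conn, pairs):
    """The PRIMARY KEY makes SQLite maintain an index on 'pattern'."""
    conn.execute(
        "CREATE TABLE responses (pattern TEXT PRIMARY KEY, response TEXT)"
    )
    conn.executemany("INSERT INTO responses VALUES (?, ?)", pairs)
    conn.commit()


def sqlite_lookup(conn, pattern):
    """Indexed lookup: a tree search instead of a full scan."""
    row = conn.execute(
        "SELECT response FROM responses WHERE pattern = ?", (pattern,)
    ).fetchone()
    return row[0] if row else None


# Hypothetical data set: 10,000 made-up pattern/response pairs.
pairs = [(f"PATTERN {i}", f"response {i}") for i in range(10_000)]

flat_path = os.path.join(tempfile.mkdtemp(), "responses.tsv")
build_flat_file(flat_path, pairs)

conn = sqlite3.connect(":memory:")
build_sqlite(conn, pairs)

# Both approaches return the same answer; they differ in how much
# data has to be read to find it.
print(flat_file_lookup(flat_path, "PATTERN 9999"))
print(sqlite_lookup(conn, "PATTERN 9999"))
```

Timing the two lookups with `timeit` at different row counts is one way to see how each behaves as the data grows on a given machine.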
|  |  |  
 | 
 | 
  
   |  | 
	
	| Posted: Apr 29, 2011 | [ # 2 ] |  |  
 | 
        
		 
  
  
   Senior member Total posts: 697 Joined: Aug 5, 2010 | There are, of course, still other possibilities besides flat files and SQL databases. The biggest DBs currently in existence aren’t SQL-based but use attribute-value pairs, like Google’s. Mine is an indexed blob.  |  
|  |  |  
 | 
 | 
  
   |  | 
	
	| Posted: Apr 29, 2011 | [ # 3 ] |  |  
 | 
        
		 
  
  
   Senior member Total posts: 697 Joined: Aug 5, 2010 | My question is, how much are chatbot databases impacted by the amount of data input? For example, will you eventually get such a slow response due to an overload of data that the program is no longer functional?
 In short: yes, if you don’t design your DB properly. But if you do it correctly, a regular DB should be able to carry all you need, just never the world’s entire knowledge. |  
|  |  |  
 | 
 | 
  
   |  | 
	
	| Posted: Apr 29, 2011 | [ # 4 ] |  |  
 | 
        
		 
  
  
  
   Guru Total posts: 1297 Joined: Nov 3, 2009 | Darian Wilson asked, “My question is, how much are chatbot databases impacted by the amount of data input? For example, will you eventually get such a slow response due to an overload of data that the program is no longer functional?” Good question, Darian!  Yes.  Eventually, meaning possibly, and after a few years of being live on the Internet.  My webhost sent me a warning once for one of my forumbots after it had been running for a few years.  On the other hand, you may never have a problem at all.  You may be on a database that does not have a lot of users and can easily handle your chatbot’s popularity.  And this is the case most of the time.  Problems of this type are rare but possible. |  
|  |  |  
 | 
 | 
  
   |  | 
	
	| Posted: Apr 30, 2011 | [ # 5 ] |  |  
 | 
        
		 
   Member Total posts: 10 Joined: Apr 19, 2011 | Jan Bogaerts - Apr 29, 2011: There are, of course, still other possibilities besides flat files and SQL databases. The biggest DBs currently in existence aren’t SQL-based but use attribute-value pairs, like Google’s. Mine is an indexed blob.
 @Jan, you read my mind. It seems that every chatbot I have ever used responds at least somewhat slower than Google Instant, and some have been much slower. I didn’t know if this was due to the actual database in use, to factors related to connecting to the DB, and/or to network traffic to/from the DB server, as Dave Morton mentioned. I am guessing it must be a combination of all three, and it would be interesting to know if any comparative studies have been done to show which configuration might provide an optimal solution for large-scale chatbot database needs. I would also be very interested in hearing any ideas or examples about some of the various and more novel methods of populating chatbot databases. For example, I like Pandorabots’ dialogue conversion process, but it also seems there might be ways to improve that concept. Might this even present itself as a piece of low-hanging fruit in the never-ending hunt for a better chatbot? |  
|  |  |  
 | 
 | 
  
   |  | 
	
	| Posted: Apr 30, 2011 | [ # 6 ] |  |  
 | 
        
		 
  
  
   Senior member Total posts: 153 Joined: Jan 4, 2010 | I’m a bit confused.  Database is such a generic term.  Google uses a database, yet it scales.  The AIML algorithm, if done properly, will scale.  You can even put it into a SQL database and scale that.  Different implementations perform differently.  For example, ChatScript appears to be able to scale, although I haven’t looked at the code yet.  Jabberwacky compiles its corpus into something that seems to scale well. When you talk about a large bot database, what do you mean?  Are we talking about the logs people create in conversations?  Are we talking about scripts the bot plays back?  Are we talking about a vast store of general knowledge which the bot uses to “reason”? Are we talking about just learning certain details during the conversation, as though that data is used the same way as the bot’s scripts? That last one is tricky because it might be easy to record data, but learning a fact may affect many levels of information and take many items to encode in terms of a bot’s dynamic programming, where the impacts aren’t easily understood.  For example, “My brother is Tom”.  The bot learns that I have a brother; I have parents; someone in my family is named Tom; Tom is a person; Tom’s age is probably relatively close to mine; etc.  Some of this may be deduced or inferred from the bot’s model of the world. On the other hand, if “Tom plays WOW” and the bot doesn’t understand that WOW is a game, then it can only save the statement and hasn’t learned anything.  It would need to explore what WOW is to eliminate things like musical instruments, etc.  Then the additional learning by exploration would have to refer back to “Tom plays WOW”. Isn’t this even more stuff to add to the database?  Therefore just saving what is input isn’t enough to “learn” a fact.  It’s like that aha moment when the light bulb turns on.  Suddenly a lot of things connect; that is, many things suddenly grow the database. 
Unless, of course, the connections are deferred for later processing, which by definition slows down the computation when it is demanded. There is more to consider and reason about that is not indexed or “optimized”.  We would say “digested” - we have to think about it a bit. I am guessing systems like Jabberwacky have to compile the growing corpus again, which is more like taking time to “sleep on it”. |  
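
To make the trade-off in the post above concrete, here is a rough sketch under stated assumptions: facts are plain `(subject, relation, object)` tuples, and `derive` is a hypothetical, hand-written rule that stands in for whatever world model a real bot would use. In eager mode, learning one statement stores the implied facts immediately, so the store grows. In deferred mode, only the statement is stored and the implications are recomputed on every query, so the store stays small but each lookup does more work.

```python
def derive(fact):
    """Hypothetical rule set: expand one stated fact into facts it implies."""
    subject, relation, obj = fact
    implied = set()
    if relation == "brother":
        implied.add((subject, "has_sibling", obj))
        implied.add((subject, "has", "parents"))
        implied.add((obj, "is_a", "person"))
    # No rule for "plays": the statement is saved, but nothing is learned.
    return implied


class FactStore:
    """Toy fact store with eager or deferred inference."""

    def __init__(self, eager=True):
        self.eager = eager
        self.stated = set()
        self.derived = set()

    def learn(self, fact):
        self.stated.add(fact)
        if self.eager:
            # Pay the cost now: the store grows with every implication.
            self.derived |= derive(fact)

    def knows(self, fact):
        if fact in self.stated or fact in self.derived:
            return True
        if not self.eager:
            # Pay the cost later: re-derive from every stated fact per query.
            return any(fact in derive(s) for s in self.stated)
        return False

    def size(self):
        return len(self.stated) + len(self.derived)


eager = FactStore(eager=True)
lazy = FactStore(eager=False)
for store in (eager, lazy):
    store.learn(("me", "brother", "Tom"))
    store.learn(("Tom", "plays", "WOW"))

print(eager.size(), lazy.size())
print(eager.knows(("Tom", "is_a", "person")), lazy.knows(("Tom", "is_a", "person")))
```

Recompiling a corpus offline, as Gary guesses Jabberwacky does, would sit between the two: the derivation cost is paid in a batch rather than per statement or per query.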
|  |  |  
 | 
 | 
  
   |  | 
	
	| Posted: Apr 30, 2011 | [ # 7 ] |  |  
 | 
        
		 
  
  
  
   Guru Total posts: 1081 Joined: Dec 17, 2010 | I tend to agree with Gary. Databases can scale, but performance is also based on the design goals of your system. I would submit Skynet-AI as an example of how fast a bot could respond.
 Are you operating locally, networked, on the web, or in the cloud? What devices do you want to support (cell phone, web TV, game system, PC), and will the AI (versus the UI) run locally or remotely? There are trade-offs in real-time learning in conversation versus background learning. How much do you trust what the user is saying? There are also trade-offs in how much data you want to have locally versus network accessible. These impact speed and response time. Populating a database (or generating responses) may be a separate activity from how a bot “learns” and how “understanding” is generated. |  
|  |  |  
 | 
 | 
  
   |  | 
	
	| Posted: Apr 30, 2011 | [ # 8 ] |  |  
 | 
        
		 
  
  
  
   Guru Total posts: 1297 Joined: Nov 3, 2009 | Darian, this turned out to be an interesting topic, with some good responses on this thread.  My solution, after a few years of valuable input from online users, is to do an SQL dump for a backup and then restore a working copy of that web database on my localhost Linux intranet.   After that I can do some database administration on the web database without having to worry about losing anything, because I have it running perfectly on my local machine.   The other option is to just shut down the chatbot and build another one in its place.  This is a lot more common, and also prevents the chatbot from ever getting overloaded.  I am planning on doing this when the new Program O version 2.0 comes out in May. |  
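
The dump-then-restore workflow described above can be sketched as follows. This is only an illustration: the post does not say which database or tools are used, so Python's built-in `sqlite3` stands in for the web database, and the `chat_log` table and its rows are invented. The point it shows is the order of operations: dump first, restore to a separate copy, and only then do maintenance on the live database.

```python
import sqlite3


def dump_sql(conn):
    """Return the whole database as a text SQL dump."""
    return "\n".join(conn.iterdump())


def restore_sql(dump_text):
    """Rebuild an independent copy of the database from a dump."""
    copy = sqlite3.connect(":memory:")
    copy.executescript(dump_text)
    return copy


# Stand-in for the live web database (hypothetical schema and rows).
live = sqlite3.connect(":memory:")
live.execute(
    "CREATE TABLE chat_log (id INTEGER PRIMARY KEY, user_input TEXT)"
)
live.executemany(
    "INSERT INTO chat_log (user_input) VALUES (?)",
    [("hello",), ("what is your name",), ("tell me a joke",)],
)
live.commit()

# 1. Dump, 2. restore a working copy elsewhere.
backup = dump_sql(live)
local_copy = restore_sql(backup)

# 3. Maintenance on the live database can no longer lose the input.
live.execute("DELETE FROM chat_log")
live.commit()

print(live.execute("SELECT COUNT(*) FROM chat_log").fetchone()[0])
print(local_copy.execute("SELECT COUNT(*) FROM chat_log").fetchone()[0])
```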
|  |  |  
 | 
 | 
  
   |  | 
	
	| Posted: May 1, 2011 | [ # 9 ] |  |  
 | 
        
		 
   Member Total posts: 10 Joined: Apr 19, 2011 | Thanks all! Merlin’s comment that “Populating a database (or generating responses) may be a separate activity from how a bot ‘learns’ and how ‘understanding’ is generated” was very thought provoking. Seems like there are a lot of different ways to crack this egg, which always makes it more fun. Per Gary’s question, I am currently most intrigued by a solution built on “a vast store of general knowledge which the bot uses to ‘reason’.”   If the ultimate chatbot goal is to supply users with “just the right information,” instead of merely providing a list of hits like information retrieval systems (e.g., search engines) do, then maybe one interim solution could be to put a chatbot at the top of a page, followed by a list of search results below. That way, if the user was not satisfied with the chatbot response, they could follow up by clicking on the best search result, meaning the chatbot could potentially “learn” from the links that are clicked for a particular question. Anyway, just a thought… I have several other ideas I am playing with in a similar vein and am just interested in hearing whether they are already out there or are not really all that practical. |  
|  |  |  
 | 
 | 
  
   |  | 
	
	| Posted: May 1, 2011 | [ # 10 ] |  |  
 | 
        
		 
   Member Total posts: 10 Joined: Apr 19, 2011 | Just to clarify my earlier post with an example: I have never found a chatbot that could answer the following question: What is the zip code of Fort Meade, Maryland? In the model above (for the sake of this discussion, let’s call it gAIgle.com, better known as “google with brAIns”), I could ask for the zip code of Fort Meade and the chatbot would answer that it did not know, but the search results below could give me several possible answers. If more people clicked on a certain link, then the chatbot would know that page contained the correct zip code. If that is still not possible given current technology, then we could make it a wiki model, asking the user to follow the correct link to find the answer and then to input the correct answer “for the community” before they were done. For the record, I do own gAIgle.com and I think it might make a nice open source project for chatbots.org members.   |  
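
A rough sketch of the click-counting idea in the post above, with everything in it an assumption: the class name, the thresholds (`min_clicks`, `min_share`), the question normalisation, and the example.com URLs are all invented for illustration. It only tracks which result page users preferred for a question. It does not extract the answer from that page, and a popular link is not guaranteed to be a correct one, which is where the wiki-style confirmation step would come in.

```python
import re
from collections import Counter, defaultdict


def normalize(question):
    """Lowercase and strip punctuation so near-identical questions match."""
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", question.lower()).split())


class ClickLearner:
    """Count result clicks per question and promote a clear favourite."""

    def __init__(self, min_clicks=3, min_share=0.6):
        self.min_clicks = min_clicks  # clicks needed before trusting a page
        self.min_share = min_share    # fraction of all clicks it must hold
        self.clicks = defaultdict(Counter)

    def record_click(self, question, url):
        self.clicks[normalize(question)][url] += 1

    def best_source(self, question):
        """Return the preferred URL, or None if there is no clear favourite."""
        counts = self.clicks.get(normalize(question))
        if not counts:
            return None
        url, n = counts.most_common(1)[0]
        total = sum(counts.values())
        if n >= self.min_clicks and n / total >= self.min_share:
            return url
        return None


learner = ClickLearner()
question = "What is the zip code of Fort Meade, Maryland?"

# Before any clicks the bot has nothing to offer beyond the result list.
print(learner.best_source(question))

# Hypothetical click log: three users pick page A, one picks page B.
for _ in range(3):
    learner.record_click(question, "https://example.com/page-a")
learner.record_click("what is the ZIP code of fort meade maryland", "https://example.com/page-b")

print(learner.best_source(question))
```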
|  |  |  
 | 
 | 
  
   |  | 
	
	| Posted: May 4, 2011 | [ # 11 ] |  |  
 | 
        
		 
   Member Total posts: 10 Joined: Apr 19, 2011 | Some ideas are good bad ideas, in that the bad idea leads to a good idea with a little tweaking. Some ideas are bad bad ideas, in that they lead to nothing. Based on the absence of feedback on the idea above, I am starting to think this was one of the bad bad ideas. On the positive side, bad bad ideas offer the ultimate in “fail fast, fail early.”  |  
|  |  |  
 | 
 | 
  
   |  | 
	
	| Posted: May 4, 2011 | [ # 12 ] |  |  
 | 
        
		 
  
  
  
   Administrator Total posts: 3111 Joined: Jun 14, 2010 | Look at it this way, Darian: According to Edison, you only have 9,999 more ways that don’t work to go, before you find the one that will.  It’s not a failure until you give up. |  
|  |  |  
 | 
 | 
  
   |  | 
	
	| Posted: May 4, 2011 | [ # 13 ] |  |  
 | 
        
		 
  
  
  
   Administrator Total posts: 2048 Joined: Jun 25, 2010 | Darian Wilson - May 1, 2011: Just to clarify my earlier post with an example: I have never found a chatbot that could answer the following question: What is the zip code of Fort Meade, Maryland?
 My Mitsuku can: “20755” (granted, it’s via a Google search, though). |  
|  |  |  
 | 
 | 
  
   |  | 
	
	| Posted: May 5, 2011 | [ # 14 ] |  |  
 | 
        
		 
   Member Total posts: 10 Joined: Apr 19, 2011 | Steve Worswick - May 4, 2011: Darian Wilson - May 1, 2011: Just to clarify my earlier post with an example: I have never found a chatbot that could answer the following question: What is the zip code of Fort Meade, Maryland?
 My Mitsuku can: “20755” (granted, it’s via a Google search, though).
 I guess to the end user, it doesn’t really matter where the data comes from, as long as it is accurate, fast, and presented in a compelling manner. Knowing that some programs are pulling data from Google is very helpful in my ever-evolving understanding of chatbots. Thank you! @Dave, 9,999 to go! Better get busy and start failing faster!   |  
|  |  |  
 |