|
Posted: Feb 1, 2011 |
[ # 31 ]
|
|
Guru
Total posts: 1081
Joined: Dec 17, 2010
|
If we define a frame as:
A frame is a data-structure for representing a stereotyped situation, like being
in a certain kind of living room, or going to a child’s birthday party. Attached
to each frame are several kinds of information.
Then “neurons” are more granular. Although if you defined a frame as a stereotyped input, it might be dead on. In Skynet’s case, on top of the neurons is a structure that groups bundles of neurons, and that structure may be closer to a frame. A neuron can activate a “frame” containing a bundle of additional neurons. The result of that activation may be to return a response, or to change some environment variables and continue processing with the next-highest-priority neurons.
A frame in Skynet-Ai might be something like “Math” or “Small Talk”.
|
|
|
|
|
Posted: Mar 15, 2011 |
[ # 32 ]
|
|
Guru
Total posts: 1081
Joined: Dec 17, 2010
|
I have put some of my projects on hold to get Skynet-AI ready for the Chatterbox Challenge. But I’ve been asked to provide an update on my status.
JAIL (JavaScript Artificial Intelligence Language) - has been used as a tool to build a variety of my projects. Some are internal test rigs to explore AI concepts. The major visible project is Skynet-AI, a web-based chatbot where the AI runs in the client browser.
JavaScript speed is still excellent. I have done some refactoring and run tests in various browsers. On Windows XP, I find Opera fastest, followed by Firefox and IE8. In researching why it works so well, I found that many browsers now compile regular expressions on the fly. Because a Skynet-AI neuron is based on a regular expression, it gains speed as the JavaScript interpreters improve. I haven’t benchmarked Chrome, Safari, or IE9. All of the tests were done on a single-CPU system.
Skynet-AI has run on a variety of other hardware platforms, in the browser of each platform. It runs on cell phones, iPods/iPads, and video game consoles. Most recently I tested it on Sony/Google TV and the Nook. Performance on each of those platforms was good, which tells me I still have headroom to add features with acceptable results.
Internal tests:
Real-time part of speech engine with limited dictionary (guesses part of speech but does not resolve final POS)
Word Math Problem Understanding - can solve a range of word math problems. Some of this has been moved into Skynet-AI. Still working on how best to pick a word math problem out of an ongoing conversation.
Knowledge storage and retrieval - How best to approach real-time learning: store values when the input is read, versus scan a free-form dialog and retrieve on the fly. Part of this is in Skynet-AI.
Reasoning systems - syllogism and backward chaining. Trying to create a simple memory system that allows inductive and deductive reasoning.
Cascading memory - enable a minimal core memory but retrieve extra “memories” off site as needed. This has become more important as I look at how to store memories that may be needed only infrequently. A dictionary is a good example. Only a few thousand words are commonly used, but all need to be accessible. An AI’s memory on a specific topic might be the same.
The results of some of these tests have caused modifications or additions in the JAIL core.
|
|
|
|
|
Posted: Mar 15, 2011 |
[ # 33 ]
|
|
Senior member
Total posts: 697
Joined: Aug 5, 2010
|
Speed of JavaScript is still excellent, I have done some refactoring and tests in various browsers. In Windows XP, I find Opera fastest, followed by Firefox and IE8. In researching why it is working so well, I found that many browsers are now compiling regular expressions on the fly.
Don’t most browsers do a JIT on the javascript code as well these days, making it run faster?
|
|
|
|
|
Posted: Mar 15, 2011 |
[ # 34 ]
|
|
Senior member
Total posts: 974
Joined: Oct 21, 2009
|
Merlin - Mar 15, 2011:
Real-time part of speech engine with limited dictionary (guesses part of speech but does not resolve final POS)
I will be adding this functionality to my project in the not-too-distant future.
I think my bot’s first guess for an unknown word would be to treat it as a noun. But if the sub-parse trees surrounding it hint that the word would make more sense as, for example, a verb, it could consider that instead.
What POS does Skynet consider an unknown word?
|
|
|
|
|
Posted: Mar 15, 2011 |
[ # 35 ]
|
|
Guru
Total posts: 1081
Joined: Dec 17, 2010
|
“Don’t most browsers do a JIT on the javascript code as well these days, making it run faster?”
Yes, JavaScript in the most advanced browsers is now JIT-compiled.
But, beyond that, regular expressions themselves can be compiled for even faster execution. Opera, Chrome, and possibly IE9 do this. A regular expression can be created as a string. The best interpreters convert that to a regex and then compile it after the first use. The second and later uses then run the cached machine code. This is done without the programmer needing to do anything. Although I may have defined all my regular expressions as strings, they run as though they were machine compiled. Since the compilation happens on the client platform, it can be optimized for that platform.
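As a minimal sketch of the pattern described above (not the actual JAIL code), a regular expression can be defined as a string, turned into a `RegExp` object once, and then reused for every input so the engine can reuse whatever compiled form it keeps internally. The pattern text and the `isGreeting` helper are made up for illustration.

```javascript
// Hypothetical pattern stored as a plain string, the way a neuron
// definition might be stored.
const patternSource = "\\b(hello|hi|hey)\\b";

// Build the RegExp object once and keep it around. Reusing the same
// object lets the engine reuse any compiled form it caches internally.
const greeting = new RegExp(patternSource, "i");

function isGreeting(input) {
  // No "g" flag is used, so test() keeps no state between calls.
  return greeting.test(input);
}

console.log(isGreeting("Hi there"));         // true
console.log(isGreeting("What time is it?")); // false
```

Whether and when the pattern is compiled to machine code is up to the browser; the script itself is the same either way.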
|
|
|
|
|
Posted: Mar 15, 2011 |
[ # 36 ]
|
|
Guru
Total posts: 1081
Joined: Dec 17, 2010
|
“What POS does Skynet consider an unknown word?”
First, this is not currently in the on-line version. It is something I am playing with in a test rig and may include in the next version. I am thinking about adding it for cases where the AI cannot recognize the input and generate a high-quality response.
In general, if I was only going to run it locally I could add a large file or files (like WordNet, a named entity database, etc.) to look up the word. But response time would suffer if the user had to download these files from a web page. There is no publicly accessible service that I can easily hook the AI into to check the POS. That leaves me with a few choices: roll my own web service, convince someone else to do it, or try to get as close as possible with a small list of words and guess the basic part of speech.
The real-time POS engine I am testing shows me (tags) the possible part of speech for each word as I type it in.
For what I am trying to do, I think I only need simple POS tags (noun, verb, adjective, pronoun, conjunction, determiner). No attempt is made yet to resolve a word that can serve as more than one part of speech.
It identifies words that it has guessed on (and those that it does not have a guess for). The word list is about 2,000 of the most common words. Anything not in the list is guessed.
For example:
“What POS does Skynet consider an unknown word?”
generates:
an,consider,does,pos,skynet,unknown,what,word-//word list
what~det POS~N does~v Skynet~N consider~v an~det unknown~ADJ word~n?-//sent with basic POS & guesses
In the preceding sentence:
~N = guessed the word is a noun
~ADJ = guessed the word is an adjective
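A rough sketch of how a tagger like this could work, assuming a small lookup table plus a fallback guess. The lexicon entries, the `guessTag` rule (an "un" prefix means adjective, otherwise noun), and the function names are all made up here; the output only mimics the format shown above, with lowercase tags for words found in the list and uppercase tags for guesses.

```javascript
// Tiny stand-in lexicon. The real list is described as roughly 2,000 words.
const LEXICON = new Map(Object.entries({
  what: "det", does: "v", consider: "v", an: "det", word: "n",
  i: "pron", have: "v",
}));

// Hypothetical guessing rule for words that are not in the lexicon.
function guessTag(word) {
  if (/^un/.test(word)) return "ADJ";
  return "N";
}

function tagSentence(sentence) {
  const words = sentence.match(/[A-Za-z']+/g) || [];
  return words
    .map((w) => {
      const key = w.toLowerCase();
      if (LEXICON.has(key)) return key + "~" + LEXICON.get(key);
      // Guessed words keep their original spelling and get an uppercase tag.
      return w + "~" + guessTag(key);
    })
    .join(" ");
}

console.log(tagSentence("What POS does Skynet consider an unknown word?"));
// what~det POS~N does~v Skynet~N consider~v an~det unknown~ADJ word~n
```

Defaulting to a noun is only one possible fallback; a real guesser would presumably also look at the neighboring words.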
The other thing I am testing with this is “instant understanding”. I have hooked up some of my math language system so I get real-time conversion from words to numbers. For example:
“I have three cats.”
cats,have,i,three
i~pron have~v 3 cats~N.
Notice the conversion to a number. If I type:
“I have three thousand two hundred three cats.”
cats,have,hundred,i,thousand,three,three,two
i~pron have~v 3203 cats~N.
The number changes as I add each new word.
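The word-to-number step can be sketched as a running total that is updated one word at a time, which is why the value changes as each word is typed. This is a simplified illustration rather than the math language system itself; the `wordsToNumbers` name and the word tables are assumptions, and things like "a hundred", hyphenated forms, or "and" are not handled.

```javascript
const SMALL = new Map(Object.entries({
  zero: 0, one: 1, two: 2, three: 3, four: 4, five: 5, six: 6, seven: 7,
  eight: 8, nine: 9, ten: 10, eleven: 11, twelve: 12, thirteen: 13,
  fourteen: 14, fifteen: 15, sixteen: 16, seventeen: 17, eighteen: 18,
  nineteen: 19, twenty: 20, thirty: 30, forty: 40, fifty: 50, sixty: 60,
  seventy: 70, eighty: 80, ninety: 90,
}));
const SCALE = new Map(Object.entries({ thousand: 1000, million: 1000000 }));

function wordsToNumbers(sentence) {
  const out = [];
  let total = 0;        // completed groups, e.g. the 3000 in 3203
  let current = 0;      // group still being built, e.g. the 203
  let inNumber = false;

  // Emit the number collected so far (if any) and reset the state.
  const flush = (suffix = "") => {
    if (inNumber) out.push(String(total + current) + suffix);
    total = 0;
    current = 0;
    inNumber = false;
  };

  for (const token of sentence.split(/\s+/).filter(Boolean)) {
    // Separate trailing punctuation so "three." is still recognized.
    const [, word, punct] = token.match(/^(.*?)([.,!?;:]*)$/);
    const key = word.toLowerCase();
    if (SMALL.has(key)) {
      current += SMALL.get(key);
      inNumber = true;
    } else if (key === "hundred" && inNumber) {
      current *= 100;
    } else if (SCALE.has(key) && inNumber) {
      total += current * SCALE.get(key);
      current = 0;
    } else {
      flush();
      out.push(token);
      continue;
    }
    if (punct) flush(punct);
  }
  flush();
  return out.join(" ");
}

console.log(wordsToNumbers("I have three cats."));
// I have 3 cats.
console.log(wordsToNumbers("I have three thousand two hundred three cats."));
// I have 3203 cats.
```

Calling the function on each partial input ("I have three", "I have three thousand", and so on) gives 3, then 3000, which matches the real-time behavior described.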
|
|
|
|
|
Posted: Mar 15, 2011 |
[ # 37 ]
|
|
Senior member
Total posts: 974
Joined: Oct 21, 2009
|
Merlin - Mar 15, 2011:
In general, if I was only going to run it locally I could add a large file or files (like word net, named entity database, etc) to look-up the word. But, response time would suffer if the user was downloading these files from a web page.
Hum, I see your dilemma. Perhaps the files could be split up? All words that start with ‘a’ in one file, all that start with ‘b’ in another. Or even all words that start with ‘ab’ in one file, ‘ac’ in the next, etc. That’s only 26x26 = 676 files, which is not bad. The JavaScript could figure out the first 2 letters of a given word, then issue the GET request for the file with that name.
Best of course would probably be a web server I suppose.
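The two-letter split could look something like the sketch below. The `dict/` folder, the `.json` file layout, the `a_` naming for one-letter words, and both function names are assumptions for illustration. The lookup uses `fetch`, which is a newer browser API than what was common when this was posted; an `XMLHttpRequest` would work the same way.

```javascript
// Map a word to the name of the file that would hold it,
// e.g. "cat" -> "dict/ca.json".
function shardFileFor(word, baseUrl = "dict/") {
  const letters = word.toLowerCase().replace(/[^a-z]/g, "");
  if (letters.length === 0) return null;
  // One-letter words ("a", "i") get a padded prefix such as "a_".
  const prefix = letters.length >= 2 ? letters.slice(0, 2) : letters + "_";
  return baseUrl + prefix + ".json";
}

// Fetch the shard and look the word up in it.
// Assumed file shape: { "cat": "n", "call": "v", ... }
async function lookupWord(word, fetchImpl = fetch) {
  const url = shardFileFor(word);
  if (url === null) return undefined;
  const response = await fetchImpl(url);
  if (!response.ok) return undefined;
  const entries = await response.json();
  return entries[word.toLowerCase()];
}

console.log(shardFileFor("Cat")); // dict/ca.json
console.log(shardFileFor("I"));   // dict/i_.json
```

Each lookup then costs one small GET instead of one large download, at the price of uneven file sizes.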
|
|
|
|
|
Posted: Mar 15, 2011 |
[ # 38 ]
|
|
Guru
Total posts: 1081
Joined: Dec 17, 2010
|
I have thought of that and even implemented some of it. There is a balance problem with some letters having many more words (c versus x). I could balance them but the look-up would be more complex.
I have also thought about splitting based on topic (female names, male names, companies) that then would give some semantic feedback based on where the file was retrieved from.
I may also split the files based on popularity: search the file of more common terms first, then the files of less common ones. This led me to try guessing the part of speech from what I know in the sentence plus a small word list. So far that approach looks promising. If I go down that path I will probably try to add more semantic knowledge to the base word list.
|
|
|
|
|
Posted: Mar 16, 2011 |
[ # 39 ]
|
|
Senior member
Total posts: 974
Joined: Oct 21, 2009
|
Right now I use the “first 2 characters” approach to index.
I know, I know, I should go with a full blown SQL server. ..but damn it, I hate to lock myself in to some proprietary thing like that.
|
|
|
|
|
Posted: Mar 16, 2011 |
[ # 40 ]
|
|
Administrator
Total posts: 3111
Joined: Jun 14, 2010
|
If all you need is to retrieve simple stored data, even if it’s tabular in nature, there’s nothing at all wrong with using a flat-file database schema. It’s when you get into relational data and selecting data from multiple tables at once that the need for SQL (or something similar) becomes more important. I’ve read a few articles on the subject, and most agree that while there is a performance hit when using flat files, unless you’re serving thousands of requests per day, the speed difference is almost negligible. Morti’s AIML “files” are stored in a MySQL database, only because Program O is designed that way. If I were to create a chatbot truly from scratch, with every line of code being my own creation, I would only choose MySQL because I would need the extra features it provides, and performance wouldn’t even enter the equation. IMHO, using flat files for simple data storage is a LOT simpler than creating a DB connection, building a SQL query, executing the query, and then parsing the returned data.
|
|
|
|
|
Posted: Mar 16, 2011 |
[ # 41 ]
|
|
Senior member
Total posts: 697
Joined: Aug 5, 2010
|
Have you already tried using an online db to look up the words Merlin?
|
|
|
|
|
Posted: Mar 16, 2011 |
[ # 42 ]
|
|
Senior member
Total posts: 974
Joined: Oct 21, 2009
|
Also, SQLite could be an option for you.
http://www.sqlite.org/
|
|
|
|
|
Posted: Mar 16, 2011 |
[ # 43 ]
|
|
Guru
Total posts: 1081
Joined: Dec 17, 2010
|
Jan-“Have you already tried using an online db to look up the words Merlin?”
I can’t run a database on my current web host. I could move to a full web host solution but so far I haven’t needed it.
I have done preliminary tests using DBpedia or Freebase as a first look-up. Getting the web page to call the service has been a bit complex, and I put it on the back burner until I have more time to play with it. Since many of the items in Wikipedia have already been ported to these databases, they may provide part of what I am looking for.
Victor-“Also, SQLite could be an option for you.”
I agree, SQLite looks most promising if I want to deploy an app with a database.
HTML5 now includes the concept of persistent storage. I could download the files and have them resident. But that is a big step if the visitor just stops by casually. For a user that wants the extra features/knowledge, I could just have them download a file and have the web page reference a local file. All this adds complexity. Right now I don’t allow a user to add information/responses beyond the current session. As I add that functionality, I would like to use the same process to store all resource files. Part of this goes into changes I need to make in how knowledge is stored.
The web standards committees are trying to establish a simple web database standard. In the future that would probably be the way to go.
http://dvcs.w3.org/hg/IndexedDB/raw-file/tip/Overview.html
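For the simpler persistent-storage idea (Web Storage rather than IndexedDB), a sketch might wrap the browser's `localStorage` behind a small cache so a downloaded word file stays resident between visits. `makeCache` and `memoryStorage` are hypothetical names; the in-memory stand-in is only there so the example runs outside a browser.

```javascript
// Wrap any object with getItem/setItem (the Web Storage interface)
// so values can be stored and read back as JSON.
function makeCache(storage) {
  return {
    get(key) {
      const raw = storage.getItem(key); // null when the key is missing
      return raw === null ? undefined : JSON.parse(raw);
    },
    set(key, value) {
      storage.setItem(key, JSON.stringify(value));
    },
  };
}

// In-memory stand-in with the same two methods, for use outside a browser.
function memoryStorage() {
  const data = new Map();
  return {
    getItem: (key) => (data.has(key) ? data.get(key) : null),
    setItem: (key, value) => { data.set(key, String(value)); },
  };
}

// In a browser this would be makeCache(window.localStorage).
const cache = makeCache(memoryStorage());
cache.set("dict/ca", { cat: "n", call: "v" });
console.log(cache.get("dict/ca").cat); // n
console.log(cache.get("dict/zz"));     // undefined
```

Browsers cap this kind of storage at a few megabytes per site, so it would suit a word list better than a full dictionary.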
|
|
|
|
|
Posted: Mar 16, 2011 |
[ # 44 ]
|
|
Senior member
Total posts: 974
Joined: Oct 21, 2009
|
Yeah, I just realized after I posted that, for an online system, it would probably not be best. It is more suited to a “stand-alone” app that is installed on an end user’s workstation. But still, a good product!
|
|
|
|
|
Posted: Aug 11, 2011 |
[ # 45 ]
|
|
Guru
Total posts: 1081
Joined: Dec 17, 2010
|
AndyHo - Aug 11, 2011:
One last question, do you use a AIML-like state machine? because I saw that it spots individual words, and for a certain match, it gives out a randomly selected list of answers, and I saw no elaboration on those responses, thus I see a lot of analysis at the input section, ¿is this done by previous parsing or inside the AIML recursive pattern-recognition-section and detection (which I discarded for this job)
¿Is your system script-driven (interpreting) or is it compiled to ML?
FYI: I (at my framework) do not use AIML-like scripting, but do allow pattern matching (even recursive) for back-compatibility and towards solution some light-parsing stuff, but it interferes severely with my initial stage parser-chunker (NLP processing), Because of this duality I had to handle several differently processed input streams at once on each pattern-matcher station on the main understanding chain, and this did give me several headaches :(
Skynet-AI is powered by JAIL (TM) (JavaScript Artificial Intelligence Language)
You can think of it as JavaScript on AI steroids. The core runs as normal JavaScript with extensions that I have added to enable “AI”. It is similar to other JavaScript frameworks (like jQuery).
The benefits of being JavaScript based:
-It runs on the client (although it could run on a server).
-It is very compatible (Version .003 of Skynet-AI has run on: cell phones, web TVs, game consoles, iPads/iPods/iPhones, pretty much anything that has a browser running JavaScript)
-It is fast. Because Skynet-AI runs in the client, it eliminates the lag time of a round trip to the host, and most of the big browsers now JIT compile their JavaScript, giving it performance close to that of a compiled program. Most responses are generated in under 5 milliseconds (not counting any access to a web resource, of course).
JAIL can be used to create a variety of “AI like” programs.
This was how I created the original advanced math functions.
Regarding the Skynet-AI implementation:
Do you use a AIML-like state machine?
No, in general Skynet-AI is stateless, context-free and non-recursive. This was a design goal: I wanted to see how far I could push a conversational AI without it using any context. This way, if I add context support later, it will only get better.
The best way to describe how it works is as a prioritized neural net (although different than traditional neural nets). A neuron can be crisp or fuzzy and has an analog priority. Neurons can be bundled together (like the math module). They can fire functions (like accessing a web site), fire and continue processing (similar to the AIML THINK tag), fire and return a response, or branch to a bundle (and then continue if need be).
Each new input is totally independent of all prior inputs (with the exception that the AI can store variables and write/delete new neurons as needed).
The neurons are really parsing for concepts rather than specific grammar. (Think of the concept “hello” or “yes”). It does keywords as fuzzy low priority inputs.
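To make the description above concrete, here is a toy sketch of regex-based "neurons" with priorities. It is not JAIL: the neuron fields, the priority numbers, the `respond` function and the sample patterns are all assumptions, and bundles and fuzzy matching are left out. It only shows the highest-priority-first scan, with one neuron that fires and continues (storing a variable) and others that fire and return a response.

```javascript
// Each hypothetical neuron pairs a regular expression with a priority
// and an action to run when the pattern matches.
const neurons = [
  {
    name: "remember-name",
    priority: 0.95,
    pattern: /\bmy name is (\w+)/i,
    // Fire and continue: store a variable, return no response.
    fire: (match, vars) => { vars.name = match[1]; },
  },
  {
    name: "greeting",
    priority: 0.9,
    pattern: /\b(hello|hi|hey)\b/i,
    // Fire and return a response.
    fire: (match, vars) => ({
      response: vars.name ? "Hello, " + vars.name + "." : "Hello there.",
    }),
  },
  {
    name: "keyword-cat", // low-priority keyword match
    priority: 0.2,
    pattern: /\bcats?\b/i,
    fire: () => ({ response: "Tell me more about your cat." }),
  },
];

function respond(input, vars = {}) {
  // Visit neurons from highest to lowest priority.
  const ordered = [...neurons].sort((a, b) => b.priority - a.priority);
  for (const neuron of ordered) {
    const match = neuron.pattern.exec(input);
    if (!match) continue;
    const result = neuron.fire(match, vars);
    if (result && result.response !== undefined) return result.response;
  }
  return "I am not sure what you mean.";
}

console.log(respond("Hello, my name is Ken")); // Hello, Ken.
console.log(respond("I have three cats"));     // Tell me more about your cat.
```

Because `vars` is created fresh on each call unless the caller passes one in, each input is handled independently of earlier ones.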
The system has a natural language generator. It is designed to be virtually impossible to have two identical conversations (which makes it seem more human). It allows in-line variations of sentences with low overhead (like the AIML random tag).
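In-line sentence variation can be sketched with a small template expander. The `{a|b|c}` syntax and the `vary` function are invented for this example and are not the generator's actual notation; the random source is passed in so the output can be made repeatable for testing.

```javascript
// Replace every {option1|option2|...} group with one randomly chosen option.
function vary(template, random = Math.random) {
  return template.replace(/\{([^{}]*)\}/g, (whole, options) => {
    const choices = options.split("|");
    return choices[Math.floor(random() * choices.length)];
  });
}

const template = "{Hello|Hi|Hey}, {how are you|how goes it}?";

// With a fixed "random" source the result is predictable.
console.log(vary(template, () => 0));    // Hello, how are you?
console.log(vary(template, () => 0.99)); // Hey, how goes it?

// With the default Math.random the wording changes from call to call.
console.log(vary(template));
```

With a handful of groups per sentence the number of distinct outputs multiplies quickly (3 x 2 = 6 for the template above), which is how a small amount of markup can make repeated conversations unlikely.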
Is your system script-driven (interpreting) or is it compiled to ML?
It is script-driven but, as I mentioned, most browsers now compile the script internally, which gives it a speed advantage.
|
|
|
|