Intelligence is hard to define, but it can be summarized, briefly, as a feedback loop that operates on internal models and within an external environment. In that sense, the difference between Alice, a squirrel, and a human mind is just a matter of degree. Or one of those mesmerising water/oil fulcrum wave contraptions, but I digress…
The internal model of a mind is the sum total of its perceptions, its model of the world, its interpretation of that model, and its projections - it pretty much marks the outer boundary of what could be considered an “intelligence”. The external environment includes the corporeal aspect of the intelligence - the thing itself, not the perceived model - and the actual environment it moves in. Whether it’s a virtual environment with a software agent for a body, or a human body hosting a brain surgeon, the two facets interact in very similar ways.
Approximating intelligence should be approached the same way the Wright Bros approximated the flight of birds - they tried flapping, they tried gliding, and then they figured out a reasonable, functional approximation of flight that let them stay aloft. Birdlike flapping doesn’t cut it, at least not at their level of technology. (I’m sure nanomaterials and future advances will eventually make flapping flight practical, but in order to begin flying, they needed to work with what they had.)
In the same way, AI researchers need to work with what they have, and provide functional approximations for the aspects of intelligence they can observe. This is precisely why current chatbots fall so flat. They’re flapping for all they’re worth, and intelligence just doesn’t work that way. You can’t simply replicate the words that result from intelligent thought processes and drop them in as a replacement for the thought processes themselves. At best, you’ll get a Loebner Prize winner, or a couple of days’ notoriety on a slow news week when a pair of animated bots delivers a loosely scripted newscast.
At worst, you’ll realize the months and years of work needed to make “flapping” come anywhere near a reasonable imitation of intelligence. And even when you end up with a reasonable imitation… it’s still not quite the real thing. No depth.
For the last several years, I’ve been thinking about how to get the intelligent thought processes that result in intelligent behaviors. That has taken me all the way back to SHRDLU, which demonstrated a wonderful fusion of an internal model and awareness of the external environment, and on through neural networks, Cyc, AIML, CLIPS, and OpenCog. All of these projects demonstrate some excellent features, but none of them quite meet my criteria: self-learning without tedious manual entry, runs on Windows, no extensive mucking with arcane OS dependencies, no gratuitously cryptic jargon.
Cyc and OpenCog, by the way, are the two most promising AGI projects currently in development. Cyc may be 10 or 20 years from reaching the self-propelled stage, but it is measurably improving, and the statistical trends suggest that Cyc will eventually achieve AGI, once someone incorporates a valid feedback model with enough internal awareness to bootstrap it.
Anyway, my conclusions are these: If you want a smart chatbot, it’s got to have the ability to be smart. It needs memory, reasoning capacity, embodiment, an internal model, and externally constrained behaviors (meaning if it has a robot arm, it doesn’t wildly flail it for no reason - it needs a basic “instinct” that damps the use of that arm unless specifically directed).
All of those things need to be held together in an internal model that is capable of predicting an arbitrary number of interactions between internal and external stimuli (projecting the future to guide behaviors).
What this means in English is simple. Instead of a bot mindlessly accepting a string as input, passing it through an if/then chain, and returning an output, it needs an active monitor that watches for input. When input arrives, the internal model updates (recognition), then produces a behavior (act), then updates again to account for the fact that something was recognized and something was done. From there, depending on its learned behaviors, other things may be done, or it can go back to waiting for a response.
This simple recognize/act cycle gives the bot a depth that can be expanded on much more easily than a trigger/response cycle that isn’t explicitly aware of time, because it forces the developer to fashion behaviors based on explicitly modeled interactions. Instead of “if input = hello then respond hello” you get “listen for input, respond appropriately”. This means that you need to teach a system general behaviors, and in order for that to work, you have to have a system capable of generalizing.
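To make that concrete, here’s a minimal sketch of the recognize/act loop versus a bare trigger/response lookup. The class and method names (InternalModel, recognize, act) are placeholders I’m using for illustration, not part of any existing framework:

```python
import time

class InternalModel:
    """Placeholder internal model: it records what was perceived and done."""
    def __init__(self):
        self.history = []

    def recognize(self, stimulus):
        # Update the model to reflect that something was perceived.
        self.history.append(("perceived", stimulus, time.time()))
        return stimulus.strip().lower()

    def act(self, percept):
        # Behavior is chosen from the model's state, not a bare string match.
        response = "hello" if "hello" in percept else "tell me more"
        self.history.append(("acted", response, time.time()))
        return response

model = InternalModel()
for stimulus in ["Hello there", "How are you?"]:   # the active monitor loop
    percept = model.recognize(stimulus)            # model updates (recognition)
    print(model.act(percept))                      # behavior produced (act)
    # the model now records both events, so later behavior can depend on them
```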
This is where neural networks are great. When properly used, they can take noisy inputs and produce useful outputs, and can be updated and trained continuously. They can also provide rigid and arbitrary behaviors through deliberate overtraining, and otherwise act as general “buckets” for processing.
My current angle in developing a chatbot is the use of arbitrarily built, ordered binary graphs representing sets of well-defined fixed or dynamic variables as inputs into a neural network. What this means is that I take some number of concepts that have a logical ordering - something like this:
000 - Socrates
001 - Zeus
010 - Man
011 - Mortal
100 - AllOf
101 - queryIsAB(A,B)
110 - Yes
111 - No
The network is built on the fly based on the inputs it receives, so in this case, if I wanted to train the network to respond appropriately, I would teach it 101 000 010 as the input with 110 as the output. You could also teach it that Socrates is not Zeus, that Zeus is not Man, that Man is Mortal, and so forth.
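As a toy illustration of that encoding, here’s a sketch that packs the 3-bit concept codes above into query triples and trains a small network on them. I’m using scikit-learn’s MLPClassifier as a stand-in for whatever network implementation actually gets used, and keeping the Yes/No codes as plain labels:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# The 3-bit concept codes from the table above.
CONCEPTS = {
    "Socrates": "000", "Zeus": "001", "Man": "010", "Mortal": "011",
    "AllOf": "100", "queryIsAB": "101", "Yes": "110", "No": "111",
}

def bits(*names):
    """Concatenate the 3-bit codes for a sequence of concepts."""
    return [int(b) for name in names for b in CONCEPTS[name]]

# queryIsAB(Socrates, Man) -> Yes, queryIsAB(Socrates, Zeus) -> No, etc.
X = np.array([
    bits("queryIsAB", "Socrates", "Man"),
    bits("queryIsAB", "Socrates", "Zeus"),
    bits("queryIsAB", "Zeus", "Man"),
    bits("queryIsAB", "Man", "Mortal"),
])
y = ["Yes", "No", "No", "Yes"]   # the 110/111 codes, kept as labels here

net = MLPClassifier(hidden_layer_sizes=(8,), max_iter=5000, random_state=0)
net.fit(X, y)

print(net.predict([bits("queryIsAB", "Socrates", "Man")]))  # trained example; should come back 'Yes'
```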
A small network like that would require explicitly training every logical rule, without a lot of room for generalization. The network couldn’t generalize answers about a graph very well (or really at all) - it wouldn’t pick up that Socrates is Mortal unless you specifically taught it that bit of logic: if A is in B, and all of B is in C, then A is in C.
Expand the size of the network, however, and give it a whole lot of examples of first-order logic with a set of fixed variables, and you end up with a neural network model capable of some remarkable things. For example, if I incorporated the entire Cyc specification of relationships, and then used the remaining variables for a particular domain with specific knowledge and query variables, I could load knowledge in and out of the bot’s “brain” simply by changing the values of the references and loading the correct neural network with all available contextual information. Memory becomes a matter of storing a sequence of database references - concept lookup points. You don’t need the whole neural network running when you can activate it based on the context provided by the indexed concepts.
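A rough sketch of that “memory as concept lookup points” idea, with my own guessed-at table layout (dictionaries keyed by concept ids standing in for a database):

```python
concepts = {0b000: "Socrates", 0b010: "Man", 0b011: "Mortal"}  # id -> concept
networks = {}    # frozenset of concept ids -> a trained, serialized network
episodes = []    # memory: sequences of concept references, not raw activations

def remember(concept_ids):
    # Storing a memory means storing database references, nothing heavier.
    episodes.append(tuple(concept_ids))

def activate(context_ids):
    # Load only the networks whose concepts overlap the current context,
    # rather than keeping one monolithic network running all the time.
    context = set(context_ids)
    return [net for key, net in networks.items() if key & context]
```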
The generalization part comes in when I can identify a potentially correct network and fill in the available known information; because of the nature of neural networks, the model already has a whole lot of assumptions built in, so I don’t have to waste time specifying rules and situations and micromanaging the particulars. I operate on a statistical level, in the sense that the output from the neural network can be interpreted - which gives the system the inherent ability to “not be sure.”
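For the “not be sure” part, one simple interpretation scheme (the thresholds here are arbitrary examples, nothing final) is to only commit to an answer when the raw activation is far enough from the middle:

```python
def interpret(activation, low=0.3, high=0.7):
    # Treat the network's output as a score; middling scores become "not sure".
    if activation >= high:
        return "Yes"
    if activation <= low:
        return "No"
    return "Not sure"

print(interpret(0.92), interpret(0.55), interpret(0.08))  # Yes / Not sure / No
```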
Every concept and neural structure is stored in a simple lookup table (a formal database or otherwise), and together they essentially represent a massive hypergraph. Knowledge is fractal, in the sense that as long as you have a sufficiently expressive initial set of first-order logic terms, you have a Turing-complete network, capable of producing output that can be fed back into the same network, or the same series of networks, for nonlinear results.
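The feedback part might look something like this sketch, where the output of one pass (re-encoded as concept references) becomes the input to the next; run_network here is just a placeholder for the lookup-and-evaluate step:

```python
def infer(seed_ids, run_network, max_steps=10):
    # Feed the network's output back in as input until nothing new is derived.
    state = list(seed_ids)
    for _ in range(max_steps):
        out = run_network(state)
        if out == state:          # fixed point: no new conclusions this pass
            break
        state = out
    return state
```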
It also lends itself nicely to recreating particular regions of the brain in functionally and architecturally significant ways. It’s modular - the neural networks are fixed-size, but taken together they express more computational power than the sum of their parts, without the overhead of a combinatorially explosive monolithic ANN.
The tricky part I haven’t fully fleshed out yet is the neural network lookup system - given a set of concepts, how do I pull up all the “relevant” networks? How deep and how broad should the lookup go? I plan on creating a set of test cases, building off of them, and ultimately building fully “cognizant” neural networks that lack only the training on specific ontologies, but are otherwise completely trained on whatever implementation of first-order logic I end up using. I’m debating whether to incorporate OWL in its entirety (especially since it comes with fully loaded existing test suites and is well documented) or CycL. The processing system that looks up networks will also need to incorporate backward- and forward-chaining inference, but I see these as surmountable challenges.
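Since the lookup is the open question, the following is only one naive way it could work: an inverted index from concept id to the networks that mention it, expanded breadth-first through those networks’ other concepts up to a depth limit - the depth and breadth knobs being exactly what’s undecided:

```python
from collections import deque

def relevant_networks(query_ids, index, network_concepts, max_depth=2):
    """index: concept id -> set of network ids that use it
       network_concepts: network id -> set of concept ids it was trained on"""
    seen_nets, seen_concepts = set(), set(query_ids)
    frontier = deque((c, 0) for c in query_ids)
    while frontier:
        concept, depth = frontier.popleft()
        for net in index.get(concept, ()):
            if net in seen_nets:
                continue
            seen_nets.add(net)
            if depth < max_depth:            # "how deep should the lookup go?"
                for c in network_concepts[net] - seen_concepts:
                    seen_concepts.add(c)
                    frontier.append((c, depth + 1))
    return seen_nets
```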
The bot learns by training networks sequentially on recognized inputs. Each unique ontology gets its own network, so if new inputs are detected, new networks are spawned. Part of the semantic pruning process would be compressing shared attributes of fresh inputs (the age of a concept would be a factor in the pruning process) and then updating the network associations. Every network would be online, and trained incrementally after each recognition occurs. This implies that by changing the learning rate, weights, activation functions, and other aspects of the neural networks, you end up with the potential to simulate virtual “drugs”, psychoses, and other fun stuff.
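A sketch of that spawn-and-train-online behavior, again using scikit-learn’s partial_fit as a stand-in for incremental training; the spawning rule and the learning-rate knob (the lever behind the virtual “drugs”) are illustrative assumptions, not a finished design:

```python
from sklearn.neural_network import MLPClassifier

networks = {}   # ontology name -> its own incrementally trained network

def learn(ontology, encoded_input, label, all_labels, learning_rate=0.001):
    net = networks.get(ontology)
    if net is None:
        # New inputs from an unseen ontology: spawn a fresh network for it.
        net = MLPClassifier(hidden_layer_sizes=(16,),
                            learning_rate_init=learning_rate)
        networks[ontology] = net
    # Online update after each recognition, rather than batch retraining.
    net.partial_fit([encoded_input], [label], classes=all_labels)
```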
In any case, I’m planning on creating a wide set of initial learning behaviors, rather than incorporating existing knowledge, so as to avoid pigeonholing. Theoretically, then, it would be a matter of creating interfaces (embodiment) that do things like parse text. Some sort of metaprocess would also be needed to clean up the neural networks, retrain overlapping concepts, and so forth - semantic pruning. That, however, is a long way off. The learning behaviors would shape the development of the internal model and the recognize/act processes through which the system learns.