Guru
Total posts: 1081
Joined: Dec 17, 2010
I recently had a user who really engaged with Skynet-AI. She would sign on every day and have a multi-hour conversation. Over a number of weeks she hit the boundaries of its conversational patterns (probably somewhere in the neighborhood of 40-50 hours). Her willing suspension of disbelief and her willingness to teach the bot as if it were a small child made for some of the most in-depth conversations in Skynet's three-year history.
To keep conversations fresh, and to rapidly create new bots with personalities of their own, I need to expand the Natural Language Generation (NLG) tools in JAIL (JavaScript Artificial Intelligence Language). After reading Dave's post regarding Morgaine, I realized others might be interested in discussing strategies for NLG. Chime in if this topic is of interest to you.
http://en.wikipedia.org/wiki/Natural_language_generation
http://blip.tv/pycon-australia/using-python-for-natural-language-generation-and-analysis-3859677
Posted: Jul 3, 2012 [ # 1 ]
Senior member
Total posts: 974
Joined: Oct 21, 2009
Good to hear Skynet-AI is turning into such a hit!
Posted: Jul 3, 2012 [ # 2 ]
Guru
Total posts: 1081
Joined: Dec 17, 2010
Thanks Victor,
It is apparent that if I want to put multiple bots on the net, having them generate their own natural-language responses eliminates a lot of overhead. Skynet-AI currently includes multiple NLG tools.
Random Responses - Each response has a minimum of four branches on average. This is probably the most important part of a bot: my statistics indicate that if a bot repeats a response within 10 volleys, there is a much higher chance the user will sign off. The ability to use in-line random fields adds a multiplier effect for the bot and keeps it fresh. For example:
I am glad you [like|are enjoying] our [time together|discussion|interaction].
This template can generate six different responses (2 × 3 combinations). A sketch of the expansion technique appears at the end of this post.
Full NLG - In the math module the response is generated on the fly. In the smalltalk module it is also composed on the fly within a set of restricted topics (I hope to beef this up with additional context awareness). The memory module adds learned items and responses on its own.
I am looking to add more flexible NLG methods as I move forward.
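For anyone curious how in-line random fields can be expanded, here is a minimal JavaScript sketch of the technique (illustrative only, not JAIL's actual implementation):
[code]
// Expand each [option1|option2|...] field by picking one option at random.
// A minimal sketch of in-line random fields, not JAIL's actual code.
function expand(template) {
  return template.replace(/\[([^\]]+)\]/g, function (match, body) {
    var options = body.split("|");
    return options[Math.floor(Math.random() * options.length)];
  });
}

// Any of 6 (2 x 3) responses:
expand("I am glad you [like|are enjoying] our [time together|discussion|interaction].");
[/code]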
Posted: Jul 3, 2012 [ # 3 ]
Senior member
Total posts: 498
Joined: Oct 3, 2008
> http://www.meta-guide.com/home/ai-engine/nlg-natural-language-generation
Merlin, you can see my Meta Guide webpage on NLG, above; note the links to related Meta Guide pages at the bottom.
> http://www.quora.com/Natural-Language-Generation
There is also a link to the NLG topic on Quora; note the relation to the current practice of “autoblogging”, which in turn is related to the new Weavrs.com “infomorphs” (see link below).
> http://en.wikipedia.org/wiki/Infomorph
I would say the most common implementation of “NLG” is probably the plethora of Markovian scramblers.
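For the record, the core of such a scrambler is tiny. A word-bigram sketch in JavaScript, illustrative of the general technique only:
[code]
// Minimal word-bigram Markov "scrambler": record which words follow which,
// then random-walk the table. A sketch of the general technique only.
function buildChain(text) {
  var words = text.split(/\s+/);
  var chain = {};
  for (var i = 0; i < words.length - 1; i++) {
    (chain[words[i]] = chain[words[i]] || []).push(words[i + 1]);
  }
  return chain;
}

function generate(chain, start, maxWords) {
  var word = start, out = [word];
  for (var i = 1; i < maxWords && chain[word]; i++) {
    var followers = chain[word];
    word = followers[Math.floor(Math.random() * followers.length)];
    out.push(word);
  }
  return out.join(" ");
}

var chain = buildChain("the cat sat on the mat and the dog sat on the rug");
generate(chain, "the", 10); // e.g. "the dog sat on the mat and the rug"
[/code]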
Posted: Jul 3, 2012 [ # 4 ]
Guru
Total posts: 1081
Joined: Dec 17, 2010
Thanks Marcus,
One of my favorite articles about NLG in action is:
http://www.npr.org/2011/04/17/135471975/robot-journalist-out-writes-human-sports-reporter
I haven't been impressed with any of the Markovian generators; they tend to end up being too random and fail to hold context. The most famous of these is Mark V. Shaney.
http://en.wikipedia.org/wiki/Mark_V_Shaney
http://en.wikipedia.org/wiki/Markov_chain
http://en.wikipedia.org/wiki/On_the_Internet,_nobody_knows_you%27re_a_dog
Posted: Jul 3, 2012 [ # 5 ]
Administrator
Total posts: 3111
Joined: Jun 14, 2010
Merlin, there’s a slight problem with that last link, due to a small bug in the forum software here. I tried to “fix” it, but even the little “cheats” that I’ve used in the past don’t help. I’ve got another possible way to correct the issue, though. I just need to get a little creative.
[edit] All fixed. And a bug report has been submitted. [/edit]
Posted: Jul 4, 2012 [ # 6 ]
Senior member
Total posts: 498
Joined: Oct 3, 2008
> http://betabeat.com/2012/07/mark-v-shaney-horse-ebooks-markov-chain-twitter-07022012/
Here's a more recent post about Mark V. Shaney (above link).
> http://www.meta-guide.com/home/ai-engine/100-best-autoblogging-videos
100 Best Autoblogging Videos
> http://www.meta-guide.com/home/ai-engine/best-wordpress-autoblog-videos
Best Wordpress Autoblog Videos
> http://www.meta-guide.com/home/ai-engine/best-wordpress-tweetoldpost-videos
Best WordPress Tweet Old Post Videos
> http://www.quora.com/Autoblog
Most autoblogging is happening in the WordPress world… and @Weavrs are apparently based on a “heavily mutated” WordPress, called UberStream…
Posted: Jul 4, 2012 [ # 7 ]
Senior member
Total posts: 147
Joined: Oct 30, 2010
I've long wanted to experiment with Markov processes that work over linguistic structures such as NP, V, VP, Subject, Verb, and Object, instead of simply using single-word tokens. Tools such as the Stanford lexparser and/or the Link Grammar parser provide one way of recognizing such linguistic chunks.
(I wish that, instead of testing us on implementing the CKY algorithm, nlp-class had provided simple command-line parsing programs I could use to test the idea of using linguistic entities in place of words in Markov chains… I guess the source is there, but it's too bad the honor code prevents students from sharing their CKY implementations.)
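To sketch the idea in JavaScript (purely illustrative: the chunk labels below are hand-supplied, whereas in practice they would come from a parser such as the lexparser):
[code]
// Sketch: a Markov chain over linguistic chunk labels (NP, V, PP, ...)
// rather than raw words. Chunks are hand-tagged here for illustration;
// a real system would get them from a parser.
var tagged = [
  ["NP", "the cat"], ["V", "sat"], ["PP", "on the mat"],
  ["NP", "the dog"], ["V", "chased"], ["NP", "a ball"]
];

// Build label-to-label transitions plus a pool of realizations per label.
var trans = {}, pool = {};
for (var i = 0; i < tagged.length; i++) {
  var label = tagged[i][0];
  (pool[label] = pool[label] || []).push(tagged[i][1]);
  if (i < tagged.length - 1) {
    (trans[label] = trans[label] || []).push(tagged[i + 1][0]);
  }
}

function pick(list) { return list[Math.floor(Math.random() * list.length)]; }

// Random-walk the labels, then realize each label with a chunk of that type.
function generate(start, maxChunks) {
  var label = start, out = [pick(pool[label])];
  for (var i = 1; i < maxChunks && trans[label]; i++) {
    label = pick(trans[label]);
    out.push(pick(pool[label]));
  }
  return out.join(" ");
}

generate("NP", 3); // e.g. "the dog sat on the mat"
[/code]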
Posted: Jul 4, 2012 [ # 8 ]
Guru
Total posts: 1081
Joined: Dec 17, 2010
Robert,
We can share CYK code on the solutions forum in the NLP class discussion group. I have also been thinking about how to use modified Markov chains to produce better results.
Posted: Jul 4, 2012 [ # 9 ]
Senior member
Total posts: 498
Joined: Oct 3, 2008
Why are pirate technologies like “autoblogging” interesting, and how do they relate to natural language generation? Autoblogging is closely related to the “robojournalism” of companies such as Narrative Science (@narrativesci) and Automated Insights (@AInsights), mentioned in Merlin's NPR link above, that generate prose out of raw data. So what is the difference between blackhat autoblogging and whitehat robojournalism? I suggest that the difference is basically Markovian, related to the level of gibberish and its relevance.
Question-and-answer (QA) systems go two ways. How do you generate blogs automatically, and how do you generate news articles automatically? Just look at the converse: how we generate answers from questions “asked” of blogs and news…. Not only can we generate answers to questions by “searching” online; we should also be able to do the reverse with similar technology and construct textual material, such as blogs or news, via dialoguing, simply by framing the questions right…. ;^)
Posted: Jul 4, 2012 [ # 10 ]
Guru
Total posts: 1081
Joined: Dec 17, 2010
Black-hat autoblogging is like spam for search engines: it just reposts someone else's original content with little value added. The ability to take raw data (especially from the original source) and create quality content should make data retrieval easier and faster. It has applications for content providers and for enhanced customer service.
In a bot, you have a set of data/variables stored. Some trigger is tripped during the conversation, and the question is how to select and package the information in a human-like conversational format. Assuming you have perfect knowledge of the user's input, what techniques might one use to create a response on the fly? The goal is to write as few canned responses or template fragments as possible and yet still enable broad and dynamic conversation.
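To make that concrete, here is a minimal data-to-text sketch in JavaScript (the memory fields like userName and topic are assumed for illustration; this is not Skynet-AI's actual mechanism):
[code]
// Sketch: turn stored bot data into a conversational response on the fly.
// The memory fields below are assumed for illustration only.
var memory = { userName: "Alice", topic: "music", sessionCount: 12 };

// A trigger maps to a small "micro-planner": select facts, then realize
// them as text, rather than storing one canned sentence per situation.
function describeUser(m) {
  var parts = ["You are " + m.userName + "."];
  if (m.sessionCount > 10) {
    parts.push("We have talked " + m.sessionCount + " times, so I know you well.");
  }
  if (m.topic) {
    parts.push("Lately you seem most interested in " + m.topic + ".");
  }
  return parts.join(" ");
}

describeUser(memory);
// -> "You are Alice. We have talked 12 times, so I know you well. ..."
[/code]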
Posted: Jul 4, 2012 [ # 11 ]
Senior member
Total posts: 498
Joined: Oct 3, 2008
Let's get unstuck from the Turing test model for a moment, and look at the mirror image of this story…. Of course we create bots to mimic people. But what if we created a bot to write essays for us, simply based on our oral outline? So instead of getting the bot to summarize the web for us, the bot would expound textually on our oral summary…. I think my point is that if we can use the web to generate human-like responses, in other words answers, why can't we do it the other way round and just use our human responsiveness to generate web-like results? This would be like a Turing test based on human products, for instance essays, created from dialog, rather than based on the dialog itself….
Posted: Jul 5, 2012 [ # 12 ]
Guru
Total posts: 1081
Joined: Dec 17, 2010
Skynet-AI includes two features that I believe come close to what you are talking about. Try asking it to “write me a story” or “Who are you?”
The AI will compose a unique digital document based on an outline.
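For flavor, outline-driven composition of this sort can be as simple as expanding each outline beat with randomized fragments (a toy sketch only, not Skynet-AI's actual generator):
[code]
// Toy sketch of outline-driven composition: each outline beat is realized
// with a randomly chosen fragment, so every "story" comes out unique.
// Illustrative only, not Skynet-AI's actual generator.
function pick(list) { return list[Math.floor(Math.random() * list.length)]; }

var outline = [
  { beat: "opening", options: ["Once upon a time,", "Long ago,"] },
  { beat: "hero",    options: ["a young bot", "a curious AI"] },
  { beat: "goal",    options: ["longed to understand humans.", "dreamed of passing the Turing test."] }
];

function compose(outline) {
  var parts = [];
  for (var i = 0; i < outline.length; i++) {
    parts.push(pick(outline[i].options));
  }
  return parts.join(" ");
}

compose(outline); // e.g. "Once upon a time, a curious AI longed to understand humans."
[/code]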