

Applied Problems: an AI evaluation standard for chatbots.
 
 
  [ # 46 ]

I handle this like all other issues: generate all possibilities.  Then concept instantiation discriminates based on word meanings, selecting the most likely parse tree.

This will be CPU-intensive though, since if a sentence had:

          A’s B blah blah blah C’s D

I will need 4 permutations:

    A is B, C is D
    A is B, C’s D
    A’s B, C is D
    A’s B, C’s D

And of course in general, 2^n permutations.
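The combinatorial blow-up is easy to sketch. The following toy Python snippet is purely illustrative (it is not how CLUES generates parses); it enumerates every reading of a token list in which each X's is ambiguous between a possessive and a contraction of "is":

```python
from itertools import product

def apostrophe_readings(tokens):
    """Enumerate readings where each token ending in 's is ambiguous
    between a possessive (X's) and a contraction (X is).
    Hypothetical sketch only."""
    # Positions of the ambiguous tokens.
    slots = [i for i, t in enumerate(tokens) if t.endswith("'s")]
    readings = []
    # One reading per combination of choices: 2^n in total.
    for choice in product(("poss", "is"), repeat=len(slots)):
        reading = list(tokens)
        for i, c in zip(slots, choice):
            if c == "is":
                reading[i] = reading[i][:-2] + " is"
        readings.append(" ".join(reading))
    return readings

# Two ambiguous tokens -> 2^2 = 4 readings, matching the list above.
for r in apostrophe_readings(["A's", "B", "blah", "C's", "D"]):
    print(r)
```

With two ambiguous tokens this prints exactly the four permutations listed above; each additional apostrophe doubles the count.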

Grrrr!... Why did they have to do this in English?  Making the same punctuation mark do two different things!

My earlier example, “The street was littered with paper, thrown from the windows” — in case you’re wondering, here is CLUES’ output…

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
]The street was littered with paper, thrown from the windows.

*Processing time: 1 seconds (for 11 word input, 39 parse trees generated in ./gis:).

pos = simple-sentence
subject.noun1.adjective1.val = the
subject.noun1.num-adjective = 1
subject.noun1.val = street
subject.num-noun = 1
predicate1.num-verb = 1
predicate1.verb1.auxiliary-verb1.val = was
predicate1.verb1.num-auxiliary-verb = 1
predicate1.verb1.num-prep-phrase = 1
predicate1.verb1.prep-phrase-list-type = space
predicate1.verb1.prep-phrase1.num-object = 1
predicate1.verb1.prep-phrase1.num-prep = 1
predicate1.verb1.prep-phrase1.object1.num-past-participle-phrase = 1
predicate1.verb1.prep-phrase1.object1.past-participle-phrase-list-type = single
predicate1.verb1.prep-phrase1.object1.past-participle-phrase1.comma-prefix = true
predicate1.verb1.prep-phrase1.object1.past-participle-phrase1.num-object = 1
predicate1.verb1.prep-phrase1.object1.past-participle-phrase1.num-past-participle-verbal = 1
predicate1.verb1.prep-phrase1.object1.past-participle-phrase1.num-prep = 1
predicate1.verb1.prep-phrase1.object1.past-participle-phrase1.object1.adjective1.val = the
predicate1.verb1.prep-phrase1.object1.past-participle-phrase1.object1.num-adjective = 1
predicate1.verb1.prep-phrase1.object1.past-participle-phrase1.object1.val = windows
predicate1.verb1.prep-phrase1.object1.past-participle-phrase1.past-participle-verbal1.val = thrown
predicate1.verb1.prep-phrase1.object1.past-participle-phrase1.prep1.val = from
predicate1.verb1.prep-phrase1.object1.val = paper
predicate1.verb1.prep-phrase1.prep1.val = with
predicate1.verb1.val = littered

 

 
  [ # 47 ]
Victor Shulist - Sep 12, 2010:

I handle this like all other issues: generate all possibilities.  Then concept instantiation discriminates based on word meanings, selecting the most likely parse tree.

[...]

Grrrr!... Why did they have to do this in English?  Making the same punctuation mark do two different things!

Actually, three different things in this case. Don’t forget about “has,” as in, “She’s got to get her parser to run faster.” :) I used to generate all parse trees, as I said, but it was such a slow process that I’ve been cutting corners wherever possible.

Victor Shulist - Sep 12, 2010:

My earlier example, “The street was littered with paper, thrown from the windows” — in case you’re wondering, here is CLUES’ output…

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
]The street was littered with paper, thrown from the windows.

*Processing time: 1 seconds (for 11 word input, 39 parse trees generated in ./gis:).

Wow, that is fast. Here’s hoping the next iteration of my parser will even come close. Did your parser determine that the part after the comma was a past participle phrase based on the position of “thrown”, and then cull options based on rules for past participle phrases? Or were more general rules used to eliminate parses? What would CLUES have done if there were no comma?

 

 
  [ # 48 ]

You’re right… thanks for pointing that out.  I will make a note in my TODO list to add a rule to also generate the possibility for “has”.

CLUES doesn’t need the comma.  If it sees a comma, it does this: whatever part of speech the text

              , x

has, it gives a copy of all those properties to just

              x
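As a rough illustration of that rule (a hypothetical representation; CLUES’ internals are not shown here), a table of parse properties keyed by text fragments could be merged like this:

```python
def merge_comma_variant(pos_table):
    """Sketch of the described rule: any parse properties recorded for a
    comma-prefixed fragment ', x' are also granted to the bare 'x', so
    the parse works the same whether or not the comma is present.
    pos_table maps text fragments to sets of part-of-speech tags
    (hypothetical representation, not CLUES' actual data structure)."""
    merged = {k: set(v) for k, v in pos_table.items()}
    for fragment, tags in pos_table.items():
        if fragment.startswith(", "):
            bare = fragment[2:]
            # Copy every property of ", x" onto plain "x".
            merged.setdefault(bare, set()).update(tags)
    return merged

table = {", thrown from the windows": {"past-participle-phrase"}}
print(merge_comma_variant(table)["thrown from the windows"])
```

The bare fragment ends up carrying the same tag set as its comma-prefixed twin, so downstream evaluation never has to care about the comma.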

Right now, I had to manually go through those 39 parse trees and find the file that was the correct one, using some clever “grep” commands and some scripts. :)

CLUES never does any culling.  It will keep all of its options open, and only by evaluating against concept specs and conversation state will it select the trees with the highest merit (and if more than one has equal merit, then it picks randomly, I guess — or perhaps informs the user it is confused and forms a question).

I could, right now, write a ‘concept specification’ which would filter out the other 38.    I do have this working now, but I really want to finish all the grammar stuff first.  Probably a solid month or two to go before the grammar is completed.

The parser is nice and fast; I was thrilled when I cut the processing time enormously.  Later I will also write a parallel version and get myself a machine with 2 CPUs with 4 cores each. :)  But for now, even running in a single thread it is fast, which is extremely promising!

 

 
  [ # 49 ]

Hi Folks,

By way of encouragement I’d like to offer that I’ve been testing my parsing software on entire stories (for example “The Country of the Blind” by H. G. Wells) of ten thousand words, and it typically takes only a small fraction of a second to parse. That’s the interpreted Common Lisp version. I’ve been rewriting my parsers in C to maximize their portability and interoperability with other programming environments, but the significant speed-up that this has afforded is also a bonus.

Have you researched existing parsing algorithms? For natural language parsing, a good place to start is Jay Earley’s chart parser, first published in 1970. It can be implemented in a few lines of Perl or Common Lisp (there are already open-source implementations available for you to experiment with), and it is capable of handling any context-free grammar with a minimum of effort.

http://en.wikipedia.org/wiki/Earley_parser
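For a feel of how compact the algorithm is, here is a minimal Earley recognizer in Python (a sketch only: it reports accept/reject without recovering parse trees, and it ignores epsilon rules, which need extra care in a full implementation):

```python
def earley_recognize(grammar, start, words):
    """Minimal Earley recognizer.  grammar maps each nonterminal to a
    list of right-hand sides (tuples of symbols); any symbol that is
    not a key of the grammar is treated as a terminal word."""
    # A chart state is (lhs, rhs, dot, origin).
    chart = [set() for _ in range(len(words) + 1)]
    for rhs in grammar[start]:
        chart[0].add((start, rhs, 0, 0))
    for i in range(len(words) + 1):
        changed = True
        while changed:  # repeat until chart[i] reaches a fixed point
            changed = False
            for lhs, rhs, dot, origin in list(chart[i]):
                if dot < len(rhs) and rhs[dot] in grammar:
                    # Predict: expand the expected nonterminal here.
                    for r in grammar[rhs[dot]]:
                        if (rhs[dot], r, 0, i) not in chart[i]:
                            chart[i].add((rhs[dot], r, 0, i))
                            changed = True
                elif dot < len(rhs):
                    # Scan: consume the next input word if it matches.
                    if i < len(words) and words[i] == rhs[dot]:
                        chart[i + 1].add((lhs, rhs, dot + 1, origin))
                else:
                    # Complete: advance every state waiting on this lhs.
                    for l2, r2, d2, o2 in list(chart[origin]):
                        if d2 < len(r2) and r2[d2] == lhs:
                            if (l2, r2, d2 + 1, o2) not in chart[i]:
                                chart[i].add((l2, r2, d2 + 1, o2))
                                changed = True
    return any((start, rhs, len(rhs), 0) in chart[len(words)]
               for rhs in grammar[start])

# A toy grammar loosely inspired by the example sentence in this thread.
grammar = {
    "S":  [("NP", "VP")],
    "NP": [("the", "street"), ("paper",)],
    "VP": [("was", "littered", "PP")],
    "PP": [("with", "NP")],
}
print(earley_recognize(grammar, "S",
                       "the street was littered with paper".split()))  # → True
```

The grammar and sentence here are toy assumptions for illustration; a real NLP grammar just adds more rules, and the chart machinery stays the same.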

Even faster, albeit much more complex to implement, are generalized left-to-right (GLR) parsers such as Tomita’s algorithm, first described in 1984. It is based on a finite-state machine and a stack, and while it can handle the most complex of grammars with ease, it is also as efficient as theoretically possible on simpler grammars.

http://en.wikipedia.org/wiki/GLR_parser

The reality is that with a good parsing algorithm the worst case processing time that you can possibly experience is proportional to the cube of the number of words (N^3). If you take a “brute force” approach to parsing you are faced with a time proportional to the factorial of the number of words (N!). Not a lot of difference for small numbers of words, but while one remains manageable as the number of words increases, the other rapidly becomes astronomical.
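A quick back-of-the-envelope comparison makes that gap concrete:

```python
from math import factorial

# Worst-case counts from the post above: N^3 for a chart parser
# versus N! for brute-force enumeration of all parses.
for n in (5, 11, 20):
    print(f"n={n}: n^3={n**3:,}  n!={factorial(n):,}")
```

At the 11 words of the example sentence earlier in the thread, the cube is 1,331 while the factorial is already 39,916,800; by 20 words the factorial runs to 19 digits.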

Cheers,
Andrew Smith

 

 
  [ # 50 ]

I’ve heard of LR(1) parsers for Python, but I’m unfamiliar with GLR. I’ll look into it, thanks! :)

Also, a neat course found via googling: http://nlp.stanford.edu/courses/lsa354/  I plan to read the lectures there when I get an ounce of free time.

 

 
  [ # 51 ]

I have written and designed several LL (and LR) parsers (and compiler generators; the gene generator was one), first using tools like Coco/R, and later completely manually (and using the gene generator, of course). I am sorry to disappoint you, but those types of parsers aren’t very good at parsing English. They can handle a small, well-defined subset (like programming languages, which are highly structured), but not the variability we are after. These types of parsers simply can’t handle the fact that sometimes you have ambiguity in a statement and need to rely on semantics.
That said, if you ever want to write a proper parser, you’ll probably have to learn this stuff.

 
