
English verbs
 
 
  [ # 16 ]
Andrew Smith - Nov 26, 2011:

Thanks Eulalio, your comments are entirely relevant. I’ll try to describe some of the rules that I’m using here. It would have been easier to just paste the Common Lisp code as it is quite succinct, but it doesn’t format well in this medium and some folks are scared of parentheses.

For English, every verb has a number of principal parts:

infinitive                                 be      eat      walk
present-participle                         being   eating   walking
past-participle                            been    eaten    walked
present-third-singular                     is      eats     walks
preterite                =past-participle  was     ate      =walked
past-plural              =preterite        were    =ate     =walked
present-plural           =infinitive       are     =eat     =walk
present-first-singular   =infinitive       am      =eat     =walk

The principal parts of regular verbs comprise only four distinct inflections (walk, walking, walked, walks) whereas irregular verbs have five (eat, eating, eaten, eats, ate). Only one verb, “to be”, has eight. Simple present tense and simple past tense can be conjugated using the principal parts alone:

present first singular    =present-first-singular   i eat
present second singular   =present-plural           you eat
present third singular    =present-third-singular   it eats
present first plural      =present-plural           we eat
present second plural     =present-plural           you eat
present third plural      =present-plural           they eat

past first singular       =preterite                i ate
past second singular      =past-plural              you ate
past third singular       =preterite                it ate
past first plural         =past-plural              we ate
past second plural        =past-plural              you ate
past third plural         =past-plural              they ate
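
Since everything above reduces to a table lookup on the principal parts, the whole scheme stays small in Common Lisp. Here’s a simplified sketch of the idea (the names are illustrative only, not my actual code):

(defstruct verb
  infinitive present-participle past-participle
  present-third-singular preterite)

(defun conjugate-simple (v tense person number)
  "Simple present and past straight from the principal parts.
The verb BE, with its extra forms, needs special handling."
  (ecase tense
    (:present
     (if (and (eql person 3) (eq number :singular))
         (verb-present-third-singular v)
         (verb-infinitive v)))   ; present-plural = infinitive
    (:past
     (verb-preterite v))))       ; past-plural = preterite

(defparameter *eat*
  (make-verb :infinitive "eat" :present-participle "eating"
             :past-participle "eaten" :present-third-singular "eats"
             :preterite "ate"))

(conjugate-simple *eat* :present 3 :singular) ; => "eats"
(conjugate-simple *eat* :past 1 :plural)      ; => "ate"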

Other tenses are composed of conjugations of an auxiliary verb combined with a principal part of the main verb.

future            =will infinitive                              i will eat
past-perfect      =have(past,person,number) past-participle     he had eaten
present-perfect   =have(present,person,number) past-participle  they have eaten
future-perfect    =have(future,person,number) past-participle   you will have eaten
past-future       =be(past,person,number) to infinitive         you were to eat
past-future       =would infinitive                             you would eat
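
In code, these compound tenses are just a conjugation of the auxiliary with the right principal part of the main verb appended. Continuing the sketch above (again illustrative only, and showing just the WOULD variant of past-future):

(defparameter *have*
  (make-verb :infinitive "have" :present-participle "having"
             :past-participle "had" :present-third-singular "has"
             :preterite "had"))

(defun conjugate-compound (v tense person number)
  "A conjugated auxiliary plus a principal part of the main verb."
  (ecase tense
    (:future          (list "will" (verb-infinitive v)))
    (:past-perfect    (list (conjugate-simple *have* :past person number)
                            (verb-past-participle v)))
    (:present-perfect (list (conjugate-simple *have* :present person number)
                            (verb-past-participle v)))
    (:future-perfect  (list "will" "have" (verb-past-participle v)))
    (:past-future     (list "would" (verb-infinitive v)))))

(conjugate-compound *eat* :present-perfect 3 :singular) ; => ("has" "eaten")
(conjugate-compound *eat* :future-perfect 2 :plural)    ; => ("will" "have" "eaten")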

This much covers the following grammatical categories:

person(first,second,third)
number(singular,plural)
tense(past,present,future,past-perfect,present-perfect,future-perfect,past-future)

I’ve also researched and compiled rules (which I am still refining and debugging) for the additional grammatical categories:

aspect(emphatic,progressive,inchoative,repetitive)
mood(indicative,imperative,interrogative,subjunctive)
voice(active,passive)
polarity(affirmative,negative)

And then there are constructs called verbals, which turn verbs into nouns (e.g. I like eating, He provided encouragement). I haven’t started coding those yet so I’m not sure how they’ll turn out.

Andrew, you clearly have a good handle on the whole (maddening) problem that is verbs. More generally, you seem to get the fact that English grammar is itself a poor fit for English!

By the by, although I’m working in C++ right now, I have a fair amount of LISP experience going back to college (graduated Clark University, Worcester in 1985 w/ BS in CS and a specialty in AI). My first job was at MIT Lincoln Labs’ then-new AI department. The day I showed up, they had just received their first LISP Machine from Symbolics. It cost something like $250,000 (service contract included!) and had a monumental 40 megs of RAM if I remember correctly. The documentation was about two feet wide in ten volumes. My boss just handed all of it to me and told me to figure out how it worked and then hold a class to teach everyone the same. What a blast!

 

 
  [ # 17 ]
Andrew Smith - Nov 26, 2011:

I’m using the GLR parser that I’ve been researching and developing for the past few years. You may recall that I posted about it a few months back. I’ve attached the latest versions of an example grammar file for discourse analysis and its XML output to this post.

Ah yes, I remember. I’ll take a look at your attachments tonight. smile

Andrew Smith - Nov 26, 2011:

The parser is extremely efficient. It examines the input one character at a time and determines everything that the input could possibly represent. Then it gets the next character and eliminates from the list of candidates anything that could no longer match the input, and so on. It never has to examine an input character or consider any given possible candidate more than once.

How do you handle the problem of junk outside your grammar? That is, does your parser always produce a parse? I avoided the LR approach in general because of this issue. (Though I’m sure there are clever workarounds.) Instead, I’m picking out words within a sentence that indicate the presence of certain phrases or clauses. Then from those start points I’m doing shifts left and right, taking all possible combinations of what the phrase could be.

The reason there are effectively multiple passes has to do with how reductions are handled. This could probably be improved, but at any rate, the goal is to pull out as much meaningful information as I can while always producing a parse.
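
To give a flavour of the shifting (a toy sketch, not my real code): anchor on an indicator word, then collect every span that extends the anchor up to a few words to the left and right. Duplicate spans from clamping at the sentence boundaries would be pruned in practice.

(defun candidate-spans (words indicators &key (max-shift 3))
  "All candidate phrase spans around each indicator word in WORDS."
  (loop for i from 0 below (length words)
        when (member (nth i words) indicators :test #'string-equal)
          append (loop for l from 0 to max-shift
                       append (loop for r from 0 to max-shift
                                    collect (subseq words
                                                    (max 0 (- i l))
                                                    (min (length words)
                                                         (+ i r 1)))))))

;; (candidate-spans '("he" "was" "to" "eat") '("to"))
;; => every span containing "to": ("to") ("to" "eat") ("was" "to") ...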

Andrew Smith - Nov 26, 2011:

As far as performance and capacity are concerned, I’ve tested it successfully with grammars containing hundreds of thousands of rules parsing text files many megabytes in size, which it can process in a matter of seconds. The attached example, a ten-thousand-word story, takes only a few milliseconds to break down into XML.

Wow, that’s impressive! Out of curiosity, what’s your parser’s success rate on a standard text, say the Brown or Penn corpus?

 

 
  [ # 18 ]

Good questions…

C R Hunt - Nov 29, 2011:

How do you handle the problem of junk outside your grammar? That is, does your parser always produce a parse? I avoided the LR approach in general because of this issue.

I haven’t actually implemented a solution for this problem yet, but I’ve thought about it and researched it a lot. One possibility is to enhance my GLR parser to become a GLR* parser which is able to efficiently discard the minimum number of input symbols necessary to obtain a successful parse.

Another possibility is to ensure that the grammar is guaranteed to match all possible inputs. Inputs that don’t match anything meaningful would get labelled as junk.

A third possibility would be the approach taken by the authors of the English Resource Grammar where they kept adding rules to handle any malformed input that they encountered. I think that this would be the least satisfactory approach though, and I expect to employ a combination of the first two methods.
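
To make the second possibility concrete, here is a toy sketch. PARSE-FN stands in for the real parser; assume it returns (values tree tokens-consumed) on success, with at least one token consumed, and NIL on failure:

(defun parse-with-junk (parse-fn tokens)
  "Always produce a parse: anything PARSE-FN cannot consume is
wrapped in a (:junk ...) node instead of causing failure."
  (loop with result = '()
        while tokens
        do (multiple-value-bind (tree consumed) (funcall parse-fn tokens)
             (cond (tree
                    (push tree result)
                    (setf tokens (nthcdr consumed tokens)))
                   (t
                    ;; skip one token and try again from the next one
                    (push (list :junk (pop tokens)) result))))
        finally (return (nreverse result))))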

Out of curiosity, what’s your parser’s success rate on a standard text, say the Brown or Penn corpus?

Unfortunately I haven’t got as far as testing it on anything like that yet, as I am still developing the grammar rules needed to do so, but it is very exciting to be getting so close now after all these years of careful research, development and testing. I’ll probably start with the Susanne corpus, which is somewhat smaller but much more rigorously defined.

 

 

 
  [ # 19 ]
Eulalio Paul Cane - Nov 26, 2011:

By the by, although I’m working in C++ right now, I have a fair amount of LISP experience going back to college (graduated Clark University, Worcester in 1985 w/ BS in CS and a specialty in AI). My first job was at MIT Lincoln Labs’ then-new AI department. The day I showed up, they had just received their first LISP Machine from Symbolics. It cost something like $250,000 (service contract included!) and had a monumental 40 megs of RAM if I remember correctly. The documentation was about two feet wide in ten volumes. My boss just handed all of it to me and told me to figure out how it worked and then hold a class to teach everyone the same. What a blast!

Lucky fellow! The biggest computer that my university had at that time only had 2 megabytes of RAM! Other than that I only had microcomputers to play with until I started working for one of Australia’s largest retailers and had some very powerful computers to work with for a while, including an IBM RS/6000 (if I remember the model correctly). Whatever it was, it was the same family of machine that ran the chess program that beat Garry Kasparov, which was pretty exciting for those of us with an interest in such things.

Although I was very fond of Lisp when I was at university, I spent my first 10 years out of the gate working with various flavours of object-oriented C, C++ and SQL. I didn’t start learning Common Lisp until about 12 years ago, but once I started to get a feel for its extraordinary power, it felt like all the years of C++ had been a complete waste of time.

Nowadays I prototype everything in Common Lisp and then, once I have a good understanding of the problem and solution(s), I write a production version in C (not C++). I’ve found that doing so forces me to implement the software in the plainest and clearest possible way, and it ends up being much more maintainable, as well as really, really fast and efficient.

 

 

 
  [ # 20 ]

Here’s an updated list of English verb forms. I believe I’ve covered almost everything now. Still missing are a couple of subjunctive forms (e.g. IF YOU ASK), some modal auxiliary forms (e.g. WE NEED TO ASK) and some gerund forms (e.g. HE LIKES ASKING).

So far I’ve found more than 19000 different ways of using this one verb, and that’s without considering adverbs or subjects other than pronouns. Some of these forms are semantically equivalent (e.g. WON’T vs. WILL NOT) but it still represents a huge variety of nuances that can be applied to the main verb.
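
To see where a count that large comes from, multiply out the grammatical categories listed earlier (with a plain “simple” aspect added alongside the other four). A back-of-the-envelope in Lisp:

(defparameter *categories*
  '((person   3)     ; first, second, third
    (number   2)     ; singular, plural
    (tense    7)     ; past ... past-future
    (aspect   5)     ; simple, emphatic, progressive, inchoative, repetitive
    (mood     4)     ; indicative, imperative, interrogative, subjunctive
    (voice    2)     ; active, passive
    (polarity 2)))   ; affirmative, negative

(reduce #'* *categories* :key #'second) ; => 3360

The remaining factor comes from the modal auxiliary forms and the contracted variants like WON’T mentioned above.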

File Attachments
ask.text.zip  (File Size: 55KB - Downloads: 108)
 

 
  [ # 21 ]
C R Hunt - Nov 25, 2011:

I read a paper a while ago (I can search again for it if you’re interested) that discussed the way that people learn grammatical forms. The main question the paper sought to address was why people could learn new grammatical rules so quickly, with only a small set of examples. For example, some verbs are only found followed by specific prepositions. Some only appear in infinitive phrases while others only in participial. Some only appear in certain contexts.

The authors concluded that people encountering a new word would use whatever the most common grammatical forms would dictate until they encounter counter-examples or correction. The barrier for “belief” in the new rule is very low, and people then tend to only use the learned rule (and any other rules known to be associated with the learned rule).

CR, I’ve been meaning to get back to you about this. Yes, I would be very interested in this paper, but please don’t go to any undue trouble to find it on my account. No doubt it will turn up again eventually if it has strong merit.

For reasons which will become apparent if you read it, this might also be a good place to mention a book that Jan Bogaerts told me about in the “AI-Dreams” forum in response to my inquiries regarding good books about English grammar. The book in question is called “Basic English Grammar with Exercises” by Mark Newson and is available online and in downloadable PDF form from a number of sources.

It tackles the problem of specifying rules for English grammar using a somewhat revolutionary methodology called the Minimalist Programme. First proposed by Noam Chomsky less than 20 years ago, it does not yet qualify as a full linguistic theory, but it seems to have led to some rapid and spectacular advances in the study of linguistics.

I must recommend in the strongest possible terms that anyone remotely interested in natural language processing download this book and start reading! (That’s not to say that this book has all the answers, but it sure seems like a darn good start.)

 

 

 
  [ # 22 ]

http://www.rickharrison.com/language/aspect.html

Here’s a concise and eminently readable article about all the different aspects that verbs can take in different languages.

It is rather more engaging and humorous than the fairly dry articles on the subject that are to be found in Wikipedia, and even has some politically incorrect notes about certain North American dialects. smile

I’m curious whether anyone else is attempting to handle some or all of these subtle nuances of the language in their chatbots, and if so, what strategies are you using? Even more importantly, did the author miss any?

 
