Posted: Oct 5, 2010 [ # 286 ]
Senior member | Total posts: 974 | Joined: Oct 21, 2009

C R Hunt - Oct 5, 2010:
I don’t know how important it is for parsing for the program to understand that “houses” is an object of “painting”.
It’s important; it is absolutely part of the parsing phase.
C R Hunt - Oct 5, 2010:
I’m planning on another set of tools altogether that relates nouns to verbs and adjectives…
Yes. An ACE must be able to support sentences whose subjects contain any number of nouns, each of which can be modified by zero or more adjectives, each of which can be modified by zero or more adverbs, each of which can in turn be modified by zero or more adverbs.

Posted: Oct 5, 2010 [ # 287 ]
Senior member | Total posts: 623 | Joined: Aug 24, 2010

I think I was unclear. I wasn’t referring to modifiers, but to words that might be morphed into other parts of speech. While this isn’t necessary for parsing, such understanding is important for drawing inferences within the knowledge base, or for forming a reply to the input.
My skepticism in this case is that there is no difference in how my parser would handle “painting houses” whether it labels “painting” a gerund or simply labels it a noun and calls it a day. The information that “painting” is a gerund is encoded in the word itself: it has an “-ing” ending and is labeled a noun. Maybe for extrapolation purposes (i.e., to use the new information to tie together other concepts in the knowledge base) it would be useful to know that “painting” can also refer to the verb “paint”, but for the sake of parsing itself, it does not seem important.
Thus far I’ve been attempting to limit the parts of speech a word can have to: article, adjective, adverb, conjunction, noun, preposition, punctuation, verb. Phrases that act as any of these parts of speech are lumped together and given one label. So “painting houses” would be labeled a noun. I guess my philosophy is to add complexity only when necessary.
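A minimal sketch of that “lump it as a noun” approach (my own toy code, not CR Hunt’s actual parser; the tiny lexicon and the “-ing” heuristic are assumptions for illustration):

```python
# Toy illustration: collapse a gerund + object into one NOUN chunk,
# using only a small closed set of POS labels.
LEXICON = {"painting": "noun", "houses": "noun", "I": "noun",
           "like": "verb", "red": "adjective"}

def chunk(tokens):
    """Tag tokens, then merge an '-ing' noun followed by a noun into one chunk."""
    tagged = [(t, LEXICON.get(t, "noun")) for t in tokens]
    out = []
    i = 0
    while i < len(tagged):
        word, pos = tagged[i]
        if (pos == "noun" and word.endswith("ing")
                and i + 1 < len(tagged) and tagged[i + 1][1] == "noun"):
            # "painting houses" becomes a single noun constituent
            out.append((word + " " + tagged[i + 1][0], "noun"))
            i += 2
        else:
            out.append((word, pos))
            i += 1
    return out
```

With this, `chunk(["I", "like", "painting", "houses"])` yields three constituents, the last being the single noun `"painting houses"` — no gerund label ever appears.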

Posted: Oct 5, 2010 [ # 288 ]
Senior member | Total posts: 697 | Joined: Aug 5, 2010

@ Hunt: I agree with you on this one. For me too, a gerund is not a POS but a word form, just as a verb conjugation or the plural of a noun is a word form.

Posted: Oct 5, 2010 [ # 289 ]
Senior member | Total posts: 623 | Joined: Aug 24, 2010

Jan Bogaerts - Oct 5, 2010: @ Hunt: I agree with you on this one. For me too, a gerund is not a POS but a word form, just as a verb conjugation or the plural of a noun is a word form.
I guess it comes down to practicality. I haven’t encountered sufficient reason for distinguishing verbals from what they’re disguising themselves as. But perhaps I’ll encounter a situation/parsing problem in the future that will change my mind.
Edited to add: Actually, not all verbals. Infinitives can be tricky beasts.

Posted: Oct 5, 2010 [ # 290 ]
Senior member | Total posts: 257 | Joined: Jan 2, 2010

Hi,
After reading about the ‘depth’ of grammar parsers in this thread, a simple math analogy comes to mind.
We could define PI to be 3.141592653589793238462643383279502884197169399375105820974944 (or a gazillion digits)
Or ...
We could define PI as 3.1415927
Both values are useful. With the 2nd value one could earn an “A” on most tests requiring PI for calculations, so it is sufficient 99% of the time. However, I’m just guessing, an astrophysical calculation could be considered incorrect if the 2nd value is used.
The number of digits of PI in this analogy represents how many POS categories one ultimately chooses for a parser.
Now, what if someone mixes up digits in the 1st definition of PI? That would significantly impact those precise calculations. This is akin to the way people write. I believe most people make grammatical mistakes in their writing, especially casual or informal writing. This is the type of writing a chat bot should expect.
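To put rough numbers on the analogy (a quick sketch of my own; the orbital radius is an assumed round figure, just to show the scale):

```python
import math

PI_SHORT = 3.1415927   # the "A on most tests" value
PI_SLOPPY = 3.12       # the "casual internet" value

# Error in the circumference of a circle the size of Earth's orbit
# (radius ~1.5e11 m -- an assumed figure).
radius_m = 1.5e11
err_short = abs(math.pi - PI_SHORT) * 2 * radius_m    # roughly 14 km
err_sloppy = abs(math.pi - PI_SLOPPY) * 2 * radius_m  # roughly 6.5 million km
```

Eight digits are already off by kilometres at that scale; the sloppy value is off by millions of kilometres.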
Gotta go!
Chuck

Posted: Oct 5, 2010 [ # 291 ]
Administrator | Total posts: 3111 | Joined: Jun 14, 2010

Based on personal experience of the way people type in MOST internet climates (e.g. posting to a forum, chatting with friends, playing around with a chatbot, etc.), it seems that the equivalent value of Pi would more likely be 3.12 (both incorrect, and uselessly low on “detail”). I’m no perfectionist, but I (along with the vast majority of the people who post here, I can see) strive to be reasonably accurate with my grammar, and to make sure that my posts are well understood from an informational standpoint as well. Since I write programs in a large variety of programming languages and other syntactic structures (e.g. PHP, Visual Basic, CSS, HTML, JavaScript), it pays me greatly to be precise. I expect that Chuck feels the same, as does anyone else in a similar position.

Posted: Oct 5, 2010 [ # 292 ]
Senior member | Total posts: 623 | Joined: Aug 24, 2010

Dave Morton - Oct 5, 2010: Since I write programs in a large variety of different programming languages and other syntactic structures (e.g. PHP, Visual Basic, CSS, HTML, Javascript), it pays me greatly to be precise. I expect that Chuck feels the same, as well as anyone else in similar positions.
I wasn’t trying to imply that it is okay to be sloppy. There is a difference between breaking grammar rules because you don’t know them and breaking them on purpose (or at least, recognizing that you aren’t breaking them in practice but only in name). Victor is absolutely correct in his definition and explanation of gerunds.
But since my system relies on recognizing grammar patterns, and since the grammar pattern for a gerund grouped together with its object as a noun appears identical to the grammar pattern of a sentence in which the whole phrase was actually replaced by a noun, it is useful for me to not add more labels into the mix. (That is, the bot will learn faster and be able to apply its knowledge to a wider variety of sentences.) Until I’m convinced that not specifying a gerund label will get me into trouble, I’d rather forgo it.
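A toy illustration of that point (my own construction, not CR Hunt’s actual pattern format): once the gerund phrase is lumped as a noun, a sentence with a gerund object and one with a plain noun object present the identical pattern, so a single stored rule covers both.

```python
def pos_sequence(chunks):
    """Extract just the POS labels from a list of (text, pos) chunks."""
    return [pos for _, pos in chunks]

# "I like painting houses" with the gerund phrase lumped as one noun...
sent_a = [("I", "noun"), ("like", "verb"), ("painting houses", "noun")]
# ...and "I like art" with an ordinary noun object:
sent_b = [("I", "noun"), ("like", "verb"), ("art", "noun")]

# Both reduce to the same pattern, so one learned rule matches both.
assert pos_sequence(sent_a) == pos_sequence(sent_b) == ["noun", "verb", "noun"]
```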

Posted: Oct 5, 2010 [ # 293 ]
Administrator | Total posts: 3111 | Joined: Jun 14, 2010

Oh, I wasn’t implying that you thought it was ok to be sloppy, my dear. Far from it, actually. My remarks were more closely directed toward the general public, who will be using our bots at some point in the future, and indicating that it would be in our best interests that we make accommodations for such inaccuracies and vagueness. In fact, I consider your idea of simplifying the grammar testing to be a good idea, at this stage, since it will undoubtedly produce fewer “errors” in parsing that will have to be sifted through and discarded. But this is just my opinion, and can be discarded as well.

Posted: Oct 5, 2010 [ # 294 ]
Senior member | Total posts: 623 | Joined: Aug 24, 2010

True. I agree with Victor that the way we recognize incorrect grammar is by figuring out the way the grammar presented deviates from a known rule or pattern. Flexibility is key. (And a large enough set of known rules or patterns that the bot can be reasonably sure the text deviates and isn’t just a new rule. I wonder at the computational requirements of all this…)

Posted: Oct 5, 2010 [ # 295 ]
Senior member | Total posts: 473 | Joined: Aug 28, 2010

In linguistics, the standard convention is to place an asterisk before any examples of malformed or questionable grammar, like so:
1. John sold the book to Mary.
2. *John sold to Mary the book.
3. *John are in the corner.
If you’ve taken a look at the English Resource Grammar which I mentioned in this forum a few weeks ago, you may have noticed that it includes thousands of grammar rules for malformed sentences too. The goal is to be able to understand and correctly parse anything that is written intelligibly, however poorly, while at the same time only generating grammar that is completely correct according to standard usage, whatever that may be.

Posted: Oct 5, 2010 [ # 296 ]
Senior member | Total posts: 623 | Joined: Aug 24, 2010

Andrew Smith - Oct 5, 2010: If you’ve taken a look at the English Resource Grammar which I mentioned in this forum a few weeks ago, you may have noticed that it includes thousands of grammar rules for malformed sentences too. The goal is to be able to understand and correctly parse anything that is written intelligibly, however poorly, while at the same time only generating grammar that is completely correct according to standard usage, whatever that may be.
I think this is an important point. When people use incorrect grammar, they tend to deviate in specific ways. In theory, one could learn these improper rules the same as one learns the proper ones. The disadvantage to not mapping back to a proper grammar rule would be that there may be information lost that is important to include in the knowledge base, and inference about that lost information might require knowing which rule was broken.
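A sketch of that idea (the table entries and rule names here are hypothetical, not taken from the ERG or any real grammar): each learned deviation maps back to the proper form plus the rule that was broken, so the information needed for later inference is not lost.

```python
# Hypothetical deviation table: improper pattern -> (proper form, broken rule).
DEVIATIONS = {
    "could of": ("could have", "modal + 'have' misheard as 'of'"),
    "john are": ("John is", "subject-verb agreement"),
}

def repair(phrase):
    """Map a known improper phrase to (proper form, broken rule).

    Unknown phrases pass through unchanged with no rule recorded.
    """
    hit = DEVIATIONS.get(phrase.lower())
    if hit is None:
        return phrase, None
    return hit
```

Keeping the broken-rule name alongside the repaired form is what lets a later inference stage ask *which* rule was violated, not just what the corrected text says.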

Posted: Oct 6, 2010 [ # 297 ]
Senior member | Total posts: 153 | Joined: Jan 4, 2010

It’s been a while since I visited this thread. I kind of miss the virtual character that lives in a 3D world. The journey now seems to be into the abstractions of grammar, etc. While I was away busy, I was adding a natural language generator to my tool set. It is only a surface generator, but it really demonstrates the need for things like gerunds, infinitives, modifiers, noun phrases, verb phrases, direct objects, indirect objects, etc. All those things that appear to be the current focus of parsing a user’s input. It occurs to me that just because grammar is important for authoring understandable output, it may not be as important for deciphering the input.
When I first started my journey into AI many decades ago, I created a specific definition for each utterance. Then I moved on to the problem of representation and how I would store all this information. Then I researched expert systems and tried various ways of reasoning and planning. Over time it became clear that character is hard to define without a plot, so I studied storytelling. When I had a chat bot that could reply to stimulus, I found the conversation shallow. I eventually came to the point of wanting a model of the world containing a virtual character, so that there could be some meaning to what the bot utters and a story to tell. Then the analysis of the inputs would only serve to complete a perspective or situation in the virtual world.
A long time ago, someone told me that communication is the act of generating in the listener the intended message of the speaker. The listener must imagine what the speaker is trying to communicate. And though sometimes what the listener imagines is not what was sent, the listener must somehow feed back the message as received so the speaker knows the message was communicated.
So understanding the input to a chat bot is really the process of creating, in the chat bot’s world, the model representing the user’s intentions. The response can reiterate the message, or it can be what the bot would have said without any input, but it should be what is important for the character’s current focus in the virtual world. If knowing that a gerund is a noun phrase rather than just a gerund is not an advantage in forming that message in the bot’s own terms, perhaps it is not too practical. Knowing that a given string of characters doesn’t follow specific grammar rules is a rare hint. Turning that input into something a bot can use is the secret. And I suggest grammar is more important for expressing the bot’s intentions in the virtual world to the external world than for extracting universal truths from some “clean room” natural language stimulus that may be influencing the virtual world, or at least the character in the virtual world. The line of “Filling in the Blanks” seems a more promising approach. Perhaps after the filling is done, an NLG form can be created to match the rules for grammar parsing. Inputs should be messy…

Posted: Oct 6, 2010 [ # 298 ]
Senior member | Total posts: 974 | Joined: Oct 21, 2009

Welcome back, Gary.
I think the choice between representing a gerund as a noun phrase or as a single noun depends on how you’ve coded your parsing algorithm, and on your own personal approach. Considering it as a single term, as CR is doing, is one approach; the ACE could then further analyze and break it down later if need be. For my ACE, I have decided to parse it out right away. That’s just my choice, and so far it’s working. If you ARE going to parse it out right away, it really helps for your ACE to be aware that a gerund is a word that simultaneously plays the roles of a noun and a verb. Also, I’m not sure about everyone else, but parsing (parse tree generation) is only stage 2 of about 5 stages in my ACE.
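One way to sketch that dual role (a minimal lexicon of my own invention, not Victor’s ACE): let each surface form carry every role it can play, and leave both readings live until a later stage prunes one.

```python
# Hypothetical mini-lexicon: each form lists every POS role it can play.
LEXICON = {
    "painting": {"noun", "verb"},  # "a painting" / "painting houses"
    "houses":   {"noun", "verb"},  # "two houses" / "she houses refugees"
    "red":      {"adjective"},
}

def roles(word):
    """Return every POS role a word can play (empty set if unknown)."""
    return LEXICON.get(word, set())
```

A gerund then falls out naturally: it is any form whose noun reading coexists with a verb reading that can still take an object, and the parser keeps both candidates alive into the later stages.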

Posted: Oct 6, 2010 [ # 299 ]
Senior member | Total posts: 623 | Joined: Aug 24, 2010

Victor Shulist - Oct 6, 2010: I think the choice between gerund represented as noun-phrase or single noun depends on how you’ve coded your parsing algorithm, and your own personal approach. Perhaps considering it as a single term like CR is doing is one approach, and then the ACE could further analyze and break it down later if need be.
I guess we’ll see how well it works. This is the experimental side of the whole development process.
Victor Shulist - Oct 6, 2010: For my ACE, I have decided to parse it out right away, that’s just my choice, so far it’s working. If you ARE going to parse it out right away, it really helps for your ACE to be aware that a gerund is a word that simultaneously plays the roles of a noun and a verb. Also, I’m not sure about everyone else, but parsing (parse tree generation) is only stage 2 of about 5 stages in my ACE.
Out of curiosity, what does ACE stand for? I don’t think I’m familiar with this acronym…

Posted: Oct 6, 2010 [ # 300 ]
Senior member | Total posts: 257 | Joined: Jan 2, 2010

Hi Gary,
Gary - Oct 6, 2010: It’s been a while since I visited this thread. I kind of miss the virtual character that lives in a 3D world. The journey now seems to be into the abstractions of grammar, etc.
I had to take a few steps back with Walter and his virtual world. I found myself getting caught up with managing threads in code, grabbing real world data, etc. After a few months I had no ‘chat bot’ to speak of.
So, I decided to step back to R&D. I’m now using Excel/VBA to perform various experiments related to the chat bot design I wish to create one day.
I’m constructing a list but I’ve worked on nearly 20 small experiments that resulted in code development. I consider these to be vital ‘proof of concept’ successes.
Additionally, I found myself insanely jealous of Vic’s grammar parser. Reading his posts is like reading a college grammar book. So I thought I would seize on the momentum of my curiosity about Vic’s work and build my own parser.
Enough for my excuses…
Gary - Oct 6, 2010: A long time ago, someone told me that communication is the act of generating in the listener the intended message of the speaker. The listener must imagine what the speaker is trying to communicate. And though sometimes what the listener imagines is not what was sent, the listener must somehow feed back the message as received so the speaker knows the message was communicated.
I shared in this thread my experience of telling my 5-year-old grandson a story. I could see his eyes moving about as he tried to imagine the story. Eventually, he was so convinced by the story that he stood up to look out the window for the ‘giant boy’.
Good to have you stop by. Try to get around more regularly and tell us about your experiments.
Regards,
Chuck