|
Posted: Jul 25, 2016 |
[ # 16 ]
|
|
Experienced member
Total posts: 39
Joined: May 16, 2016
|
Thank you, Bruce and Eduardo Bedoya.
As I understand it, POS tagging for a foreign language is now delegated to an external tool such as TreeTagger, right?
I will read about this in the ChatScript manuals and run some tests.
|
|
|
|
|
Posted: Jul 25, 2016 |
[ # 17 ]
|
|
Moderator
Total posts: 2372
Joined: Jan 12, 2010
|
The CS engine now supports integration with external POS taggers in order to handle languages other than English, which would never have been built into the engine itself.
|
|
|
|
|
Posted: Jul 28, 2016 |
[ # 18 ]
|
|
Senior member
Total posts: 179
Joined: Feb 11, 2015
|
Hi Bruce,
I searched the 6.7 manuals for external POS taggers with no luck. Could you please tell me which manual covers them? Could you also suggest an external POS tagger? Thanks in advance.
|
|
|
|
|
Posted: Jul 28, 2016 |
[ # 19 ]
|
|
Moderator
Total posts: 2372
Joined: Jan 12, 2010
|
See the ChatScript PosParser manual in Esoterica, at the end of the manual.
|
|
|
|
|
Posted: Aug 2, 2016 |
[ # 20 ]
|
|
Senior member
Total posts: 179
Joined: Feb 11, 2015
|
Hi Bruce, I read the new page in the PosParser manual.
It says that, in order to enable the external tagger, you first need to disable
#DO_SPELLCHECK, #DO_PARSE, #DO_SUBSTITUTE_SYSTEM
I have some questions about it…
1. Will the Spanish spell checker ($cs_language = spanish) no longer be necessary? Could the foreign spell checker completely replace it?
2. I remember that I needed #DO_SUBSTITUTE_SYSTEM enabled in order to match “I am Jorge, from Canada”; otherwise it would only match “I am Jorge from Canada”. Can TreeTagger do this as well?
Thanks in advance.
PS: Could you please post a small example of how to communicate with it locally, and how to use ^popen? Thanks in advance.
|
|
|
|
|
Posted: Aug 2, 2016 |
[ # 21 ]
|
|
Moderator
Total posts: 2372
Joined: Jan 12, 2010
|
1. A foreign spell checker “can” replace the existing spell checker. I don’t have one to test against, and whether a particular one outputs data in a friendly fashion is not known. The built-in spell checker has an awareness of Spanish, but it also assumes you have supplied a revised dictionary of words. This is not true for any other language, so for those it needs to be disabled or it will damage the input.
2. TreeTagger cannot make substitutions. Almost all substitutions in the LIVEDATA files are language dependent. I have not looked up your specific substitution.
3. The GERMAN bot has script that shows an example of local communication with an external POS tagger.
|
|
|
|
|
Posted: Aug 3, 2016 |
[ # 22 ]
|
|
Senior member
Total posts: 179
Joined: Feb 11, 2015
|
Thanks, Bruce.
I use LIVEDATA/interjections.txt and substitutes.txt,
but neither of them seems to handle commas.
I know for sure that if I disable #DO_SUBSTITUTE_SYSTEM, CS will not handle commas. I hope TreeTagger can do so; I’ll test it.
Thanks again.
|
|
|
|
|
Posted: Aug 9, 2016 |
[ # 23 ]
|
|
Experienced member
Total posts: 39
Joined: May 16, 2016
|
I’d like to give you some feedback about my bot.
TreeTagger does not have data for Brazilian Portuguese, only for European Portuguese. So, after some research, I found NLTK, which has trained data for Brazilian Portuguese.
Within NLTK I’m using NLPNET, an NLP package, but it does not return the canonical word along with the POS tag. So I’m also using the RSLP Stemmer for that task. I made some changes to NLPNET to combine it with the RSLP Stemmer, so that with a single function I can get both the POS tag and the canonical word, as TreeTagger does.
I ran some tests with ChatScript and everything seems OK.
Thanks, Bruce and Eduardo Bedoya, for your help.
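For readers who want to see the shape of the "one function returns tag and canonical word" idea, here is a minimal Python sketch. It is not the modified NLPNET code from this post: `tag_with_canonical`, `toy_tag_sentence` and `toy_stem` are hypothetical names, and the toy tagger and stemmer are stand-ins for NLPNET and the RSLP Stemmer. The tab-separated token, tag, canonical-form layout follows the usual TreeTagger output style. Keep in mind that a stemmer returns a stem, which is only an approximation of a true lemma.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]  # (token, POS tag, canonical form)


def tag_with_canonical(
    tokens: List[str],
    tag_sentence: Callable[[List[str]], List[str]],
    stem_token: Callable[[str], str],
) -> List[Triple]:
    """Run a tagger and a stemmer over the same tokens and merge the results."""
    tags = tag_sentence(tokens)
    if len(tags) != len(tokens):
        raise ValueError("tagger must return exactly one tag per token")
    return [(tok, tag, stem_token(tok)) for tok, tag in zip(tokens, tags)]


def to_tagger_lines(triples: List[Triple]) -> str:
    """Format triples as tab-separated lines: token, tag, canonical form."""
    return "\n".join("\t".join(triple) for triple in triples)


# Toy stand-ins (hypothetical). A real setup would call the tagger and
# stemmer libraries here instead of these lookups.
TOY_TAGS: Dict[str, str] = {"eu": "PRON", "gosto": "V", "de": "PREP", "gatos": "N"}


def toy_tag_sentence(tokens: List[str]) -> List[str]:
    # Assumption of this toy: unknown words fall back to a noun tag.
    return [TOY_TAGS.get(tok.lower(), "N") for tok in tokens]


def toy_stem(token: str) -> str:
    # Crude plural stripping, only to have something to merge with the tags.
    word = token.lower()
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


if __name__ == "__main__":
    triples = tag_with_canonical(["Eu", "gosto", "de", "gatos"], toy_tag_sentence, toy_stem)
    print(to_tagger_lines(triples))
```

Passing the tagger and stemmer in as functions keeps the merge step independent of any particular library, so either one can be swapped without touching the output format.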
|
|
|
|
|
Posted: Aug 12, 2016 |
[ # 24 ]
|
|
Senior member
Total posts: 179
Joined: Feb 11, 2015
|
Very interesting,
congrats, Oberdan.
I’m sticking with the old CS Spanish approach for now; I guess I will try the foreign POS tagger in the future.
I would check how words containing foreign characters work in complex pattern rules.
Oberdan, could you tell me if that foreign tagger can handle commas? For example,
given a concept ~malnames [Oberdan Jorge] (malename is already taken by CS by default) and the inputs
I am Oberdan from Brazil
I am Oberdan, from Brazil
can it handle both and detect “Oberdan” in each as a member of ~malnames?
Bruce, how much time should one person take to develop a 1500-line bot?
Thanks in advance.
|
|
|
|
|
Posted: Aug 12, 2016 |
[ # 25 ]
|
|
Moderator
Total posts: 2372
Joined: Jan 12, 2010
|
Impossible to answer that accurately. We spent around 5 person-months to do a 16000-rule chatbot.
|
|
|
|
|
Posted: Aug 12, 2016 |
[ # 26 ]
|
|
Experienced member
Total posts: 39
Joined: May 16, 2016
|
Thanks, Eduardo!
I don’t know if I understood exactly what you meant, but I ran the following test.
I created the concept ~malnames:
concept: ~malnames(Oberdan Jorge)
I wrote the following rule:
t: () Who are you?
a: (_~malnames) Nice to meet you '_0.
And ran these tests:
JUNIOR: Who are you?
john: > I am Oberdan from Brazil.
JUNIOR: Nice to meet you Oberdan.
JUNIOR: Who are you?
john: > I am Oberdan, from Brazil.
JUNIOR: Nice to meet you Oberdan.
For me it works OK. If I did something wrong, please correct me and I will run other tests for you.
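The outcome of this test can be mimicked outside ChatScript with a small Python sketch. This is only an illustration of why a comma does not have to block the match once punctuation is split into its own token; it is not how ChatScript works internally, and `tokenize`, `first_concept_member` and `MALNAMES` are hypothetical names.

```python
import re
from typing import List, Optional, Set

# Mirrors the members of the ~malnames concept from the test above.
MALNAMES: Set[str] = {"oberdan", "jorge"}


def tokenize(sentence: str) -> List[str]:
    """Split into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def first_concept_member(sentence: str, concept: Set[str]) -> Optional[str]:
    """Return the first token (original casing) that belongs to the concept."""
    for token in tokenize(sentence):
        if token.lower() in concept:
            return token
    return None


if __name__ == "__main__":
    for text in ("I am Oberdan from Brazil.", "I am Oberdan, from Brazil."):
        print(first_concept_member(text, MALNAMES))
```

Because the comma becomes a separate token, the name token is the same in both inputs, which matches the two identical replies in the transcript.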
|
|
|
|
|
Posted: Aug 16, 2016 |
[ # 27 ]
|
|
Experienced member
Total posts: 39
Joined: May 16, 2016
|
Hello, guys!
I really didn’t understand what the functions ^mark and ^unmark do (System Functions manual). Could someone explain them better and give some examples?
Thanks!
Oberdan Alves
|
|
|
|
|
Posted: Aug 22, 2016 |
[ # 28 ]
|
|
Senior member
Total posts: 179
Joined: Feb 11, 2015
|
Hi Oberdan,
sorry for the long delay.
Yes, the test is fine. You tested it in Portuguese using the external tagger, right?
I haven’t used ^mark in my Spanish chatbot,
but this has been posted before:
“A concept cannot be triggered by a pair of words directly.
But you CAN create a topic that will do that. It can be run early in your control script or be run as $cs_prepass which happens before $cs_controlmain
u: ( _~verb_infinitive _0?~verb_noobject ) ^mark(~dualconcept1 _0)
and other such rules”
So ^mark seems to set some kind of bit on a word for a POS such as ~verb, ~noun, etc.
Of course, Bruce has the last word; he will correct me if I got something wrong.
Good luck.
|
|
|
|
|
Posted: Aug 22, 2016 |
[ # 29 ]
|
|
Experienced member
Total posts: 39
Joined: May 16, 2016
|
Hi Eduardo!
You are right. I tested with the external tagger for Portuguese. Although my tests were in English, that is not a problem, because every word the tagger does not recognize is classified as a noun (the default).
I saw that post about the ^mark function, but all I understood was that it can mark a word with a bit flag or a concept, and I thought I might be wrong.
The function manual says this about ^mark: “Marking and unmarking words and concepts is fundamental to the pattern matching mechanism”. So I thought that I must use it.
Thank you, Eduardo, for trying to clarify my doubt.
|
|
|
|
|
Posted: Aug 22, 2016 |
[ # 30 ]
|
|
Administrator
Total posts: 3111
Joined: Jun 14, 2010
|
I have to step in and let you folks know that I think you’re doing an excellent job here. As I don’t know either Portuguese or Spanish (well, maybe enough Spanish to get myself beat up!), I can’t contribute much to the overall conversation, but I wanted to toss in some encouragement and a small amount of praise for your efforts.
|
|
|
|