I have a few questions about how punctuation is handled when your scripts put strings into the input queue.
I am trying to inject a string into the input queue and then compare it to a variable containing the same string. This does not work: a space gets inserted before every comma, which makes it impossible to compare %originalsentence to the variable and determine that they are the same.
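For now I am working around the mismatch by normalizing the spacing before comparing. Just to illustrate the problem (this is a Python sketch, not ChatScript, and the regex is my own workaround, not anything from the ChatScript docs):

```python
import re

original = "I do not eat cake, ice cream, or donuts"
# what %originalsentence looks like after tokenization:
tokenized = "I do not eat cake , ice cream , or donuts"

# a direct comparison fails because of the inserted spaces
assert original != tokenized

def normalize(s):
    # collapse any whitespace that was inserted before punctuation
    return re.sub(r"\s+([,.!?;:])", r"\1", s)

assert normalize(tokenized) == original
```

I would much rather avoid this kind of post-processing, hence question 1 below.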
In the code below, the output from :trace appears in comments beneath the line(s) that generate it.
My questions:
1) is there a way to prevent the space from getting added before the commas?
2) Why does ^tokenize(WORD %originalsentence) produce 11 facts (which I expected), while ^tokenize(WORD $$_newText) produces only one fact?
3) Why does "ice cream" never show up as "ice_cream"? The documentation for ^original() specifically uses ice cream as an example, but this code does not seem to behave the way the doc describes.
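For question 2, my current guess (just a mental model sketched in Python, since I cannot see the ChatScript internals; my_tokenize_guess is my own made-up name) is that ^tokenize splits on whitespace, so the already space-separated %originalsentence falls apart into 11 pieces, while the double-quoted $$_newText is kept as a single unit:

```python
def my_tokenize_guess(s):
    # guess: a double-quoted string is kept whole; anything else splits on spaces
    if s.startswith('"') and s.endswith('"'):
        return [s]
    return s.split()

# matches the 11 create(...) lines in the trace below
assert len(my_tokenize_guess("I do not eat cake , ice cream , or donuts")) == 11

# matches the single fact created from $$_newText
assert len(my_tokenize_guess('"I do not eat cake, ice cream, or donuts"')) == 1
```

If that guess is right, is there a recommended way to get the quoted variable broken into the same 11 tokens?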
topic: ~test-original keep repeat ( "test original" )
u: STEP-ONE ( one )
# with trace simple fact input ~test-original
^log(\n your original input was %originalinput ) # your original input was test original oneToken
# TokenControl: DO_SUBSTITUTE_SYSTEM DO_NUMBER_MERGE DO_PROPERNAME_MERGE DO_DATE_MERGE DO_SPELLCHECK DO_INTERJECTION_SPLITTING DO_PARSE
$$_newText = "I do not eat cake, ice cream, or donuts"
^input($$_newText)
^next(INPUT)
# Original User Input: `` I do not eat cake , ice cream , or donuts `
# Tokenized into: I do not eat cake , ice cream , or donuts
# Actual used input: I do not eat cake , ice cream , or donuts
^log(\n next input done \n)
# next input done
^log(\n originalinput is %originalinput \n)
# originalinput is test original one
^log(\n originalsentence is %originalsentence \n )
# originalsentence is I do not eat cake , ice cream , or donuts
@18 = ^tokenize( WORD %originalsentence )
^log(\n tokenized original sentence: length = ^length(@18) \n )
##<<
create ( I ^tokenize ^tokenize x1000010 ) Created 215754
....,...create ( do ^tokenize ^tokenize x1000010 ) Created 215755
....,...create ( not ^tokenize ^tokenize x1000010 ) Created 215756
....,...create ( eat ^tokenize ^tokenize x1000010 ) Created 215757
....,...create ( cake ^tokenize ^tokenize x1000010 ) Created 215758
....,...create ( , ^tokenize ^tokenize x1000010 ) Created 215759
....,...create ( ice ^tokenize ^tokenize x1000010 ) Created 215760
....,...create ( cream ^tokenize ^tokenize x1000010 ) Created 215761
....,...create ( , ^tokenize ^tokenize x1000010 ) Created 215762
....,...create ( or ^tokenize ^tokenize x1000010 ) Created 215763
....,...create ( donuts ^tokenize ^tokenize x1000010 ) Created 215764
....,...
tokenized original sentence: length = 11
##>>
@15 = ^tokenize( WORD $$_newText )
^log(\n tokenized newText: length = ^length(@15) \n)
##<<
create ( "I do not eat cake, ice cream, or donuts" ^tokenize ^tokenize x1000010 ) Created 215765
....,...
tokenized newText: length = 1
##>>
@19 = ^uniquefacts(@18 @15)
^log(\n created unique facts: length = ^length(@19) \n)
##<<
created unique facts: length = 11
....,.Result: NOPROBLEM Topic: ~test-original
##>>
if( ^length(@19) == 0){
\n they match!! \n
}else{
\n not sure what this means. \n
# this is the output
}
done!