Hi guys,
I am currently writing a bot that has the task of extracting certain pieces of information from an input text. For example, given this input text:
I am interested in data regarding traffic in the format CSV from Germany
Here, the keyword traffic, the format CSV, and the spatial constraint Germany should be extracted. This works quite well with a separate pattern for each piece of information. For example, the format is extracted by the following pattern, which detects words commonly used to indicate a format and captures the next word as a variable:
u: ENTITY_FORMAT ( [type format] _*1 )
However, difficulties arise when I try to detect negation. Let’s say a user wants to exclude a certain country:
I am interested in data regarding traffic but not from Germany
Here, I tried to prepend an optional ~negation to the patterns like so:
u: SPATIAL ( { _~negation } [from spatial] _*1 )
This works well when the spatial constraint is the only negated piece of information, but it fails as soon as a negation word occurs earlier in the input, for example in this sentence:
I am interested in data regarding traffic without the format CSV and not from Germany
Then I read that ChatScript ‘locks’ onto the first occurrence of a word found in a pattern and continues matching from there. If I understand this correctly, the optional { ~negation } part locks onto the word without, does not find from or spatial as the next word, and the pattern consequently fails, even though the combination not from appears later in the sentence.
I hope it is clear what I am trying to do. If not, I am happy to explain it another way. I tried resetting the match position and splitting each pattern into one version with negation and one without, but had no success.
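In case it helps, here is a rough Python sketch of the result I am after. It is not ChatScript and not how my bot works; the word lists, the two-word negation window, and the function name extract are assumptions made up purely for illustration.

```python
import re

# Hypothetical word lists standing in for my ChatScript concepts.
NEGATION = {"not", "without", "no", "except"}
TRIGGERS = {
    "format": {"format", "type"},
    "spatial": {"from", "spatial"},
}


def extract(text):
    """Return {slot: (value, negated)} for each trigger word followed by a value."""
    words = re.findall(r"\w+", text)
    found = {}
    # Stop one word early so that words[i + 1] always exists.
    for i in range(len(words) - 1):
        for slot, triggers in TRIGGERS.items():
            if slot in found or words[i].lower() not in triggers:
                continue
            # Assumption: a negation only counts if it appears within the
            # two words directly before the trigger word.
            window = words[max(0, i - 2):i]
            negated = any(w.lower() in NEGATION for w in window)
            found[slot] = (words[i + 1], negated)
    return found


sentence = ("I am interested in data regarding traffic "
            "without the format CSV and not from Germany")
print(extract(sentence))
```

For the last example sentence, I would want both the format CSV and the country Germany to come back marked as negated, each independently of the other.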
Could you help me build a pattern that will detect pieces of information and the optional negation thereof?