Hi Hans
As you invited me to this post, I've skimmed the surface of "A Practical Semantic Representation For Natural Language Parsing" by Myroslava O. Dzikovska. I found it challenging, we have many ideas in common (1), and I observed a good approach to building a theoretical framework for dealing with those interpretations. But that is not my main point: I did not see any implementation of this. Did you see any?
(1) Let me be more specific, with a clarification: similar ideas pop up in different minds fed with similar-domain information. This may happen because we all (humans) have similar mind-diagram-charting and similar information-extraction and relational capabilities (of course: DNA-powered).
What I can see is that there are different layers of 'interpretation and understanding', ranging from simple perceptions (our senses) merged somehow into buckets of sounds (phonemes) arranged into segments named words (mostly segmented by breath-taking pauses, written as spaces), which are indeed very simple and ambiguous descriptors of the former (having many of the so-called senses), then aided by some grammar (the power added by short-range positioning permutations in linear, time-ordered reading of the text), and in some languages a little more funny twisting (declensions/derivations) to add some social-conversational + deictic + temporal information packed inside the sound-pack form (aka the inflected/declined word), on towards other more obscure 'semantics' and 'deduced' relations which happen only (and not every time) in the mind of a context-aware listener of medium intelligence (aka: one with the ability to make such analytic and internal constructs). Then it has been understood!
Wow! That was a long sentence! But I think it expresses, in one tied-together-boring-giant-pack, the whole idea of the process of language parsing + shallow understanding (first analysis), naming its different parts, its packing and its boundaries!
So what lies inside? 8)
What I can see here is a complex structure with many 'foldings', like DNA: primary structure, secondary, and so on up to N-ary re-folding. Each fold is not unique, and each adds complexity and much more information. But the real information, after the folding, lies across the whole structure and is very robust, being in many ways independent of each fold or even of word selection, yet dependent on the whole set when analyzed in context. This may explain ambiguity as a necessary ingredient of the robustness of our language as an 'idea' carrier from individual to individual.
Also, the internal mind processes are surely not the same in each of us, but the 'surface' is common (language, manners, gestures, social interaction, attitude, etc.). Learning from this analogy, we can conclude that we must search more deeply for the relations lying inside the whole text (since that is what we get as input for a chatterbot here).
MODELING THIS
I think that those relations may not be dyadic or ternary like the math-logic relations we are used to seeing (our common cognitive background: math, graphs, vector spaces, numbers, functions, etc.). They may be radically different, perhaps not simple to describe with graphs or even with complex math relations or theorems. So what is the approach to thinking about this? Maybe 'soft' bindings, boundaries drawn with soft algorithms; indeed, the only tools we do have are math, CPUs and logic-procedural programming (not forgetting a little bit of Prolog, either), at least for now!
PLEASE! A LITTLE SOFTER
I have surfed those thoughts many times, believe me! So I decided to go deeper and create some mechanisms to let systems be 'softer': 'similarity' parameters, verisimilitude parameters and many other dimensions. Far from being linear, they are blurred numeric soft-representations of perceptual magnitudes. Yes, perceptual; or... give me a number with 0.1% precision to express your sincerity! Or measure the verifiability of a tax procedure! All of these things, happening in everyday life, are 'computed' by our minds; decisions are taken based on these non-numerical facts and measures, and we (humans) don't do formal math for this, we do our own math. So this kind of math must be discovered, unleashed or modeled using what we have so far: numbers and software. Maybe in the future we will have a self-calculating mass like clay to model these things, but we don't have that now!
A SHORT & SOFT STORY
One of the developments I made based on these ideas was a similarity algorithm to measure (yes, measure, with a real numeric parameter) the 'perceptual similarity' between two written words (I say written because the real perception is usually acoustic-phonetic). So I had to figure out how to 'say' this into the CPU and 'hear' it back out, and figure out what the heck the output number for any two words should be. So I got to thinking. I thought about our perception, about how we 'feel' things like similarity and difference, and then I created a mathematical representation (I had no tools but these) of the entities which make us feel that words are similar or different, glued them together with some math tricks (learned from engineering) and voilà! We got a number.
So what!
-only another number?
Yes! A number!
- Measuring what?
Measuring this: "how many differently-sounding sounded-letters one word feels apart from another, for an average human being." (Surely not a deaf one.)
Searching the literature, we found Soundex and Metaphone, a kind of sonic-indexing thing, limited to English, very very obscure and, of course, with no numeric output!
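To make the idea of a numeric (rather than index-based) output concrete, here is a hypothetical sketch. It is not the published algorithm; it just illustrates one way to get a real number out: a weighted edit distance where substituting similar-sounding letters (b/p, d/t, vowels, ...) costs less than substituting unrelated ones. The sound classes and the 0.4 cost are made-up assumptions.

```python
# Hypothetical sketch, NOT the published algorithm: a numeric
# "sound distance" as a weighted Levenshtein distance over coarse
# letter sound classes.  Classes and costs are illustrative only.

SOUND_CLASSES = [set("bp"), set("dt"), set("cgkq"), set("fv"),
                 set("sz"), set("mn"), set("aeiou"), set("lr")]

def sub_cost(a, b):
    """Cheap substitution inside a shared sound class, full cost otherwise."""
    if a == b:
        return 0.0
    for cls in SOUND_CLASSES:
        if a in cls and b in cls:
            return 0.4          # similar-sounding letters
    return 1.0                  # unrelated sounds

def sound_distance(w1, w2):
    """Weighted edit distance: how many 'sounded letters' apart two words feel."""
    n, m = len(w1), len(w2)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = float(i)
    for j in range(1, m + 1):
        d[0][j] = float(j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j] + 1.0,                      # deletion
                          d[i][j - 1] + 1.0,                      # insertion
                          d[i - 1][j - 1] + sub_cost(w1[i - 1], w2[j - 1]))
    return d[n][m]

print(sound_distance("peso", "beso"))   # → 0.4 (b/p sound alike)
print(sound_distance("peso", "lado"))   # → 2.4 (mostly unrelated sounds)
```

The point of the sketch is only that a graded, real-valued output lets you say "beso feels closer to peso than lado does", which an index like Soundex cannot express.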
And we thought: this is useless unless we can prove it. But how can we prove the success of this kind of algorithm?
We sought out friends and colleagues. Together with a doctor and researcher at the University, we designed an experiment with 100 humans: we presented a list of unrelated non-words (nonexistent words with no meaning at all, but pronounceable, to avoid interference from meaning) and asked them to order the list by ascending sound-similarity. Then we brought in the math to the rescue... The result?
Voilà! 85% agreement with the algorithm! (much more than in many inter-human perception-alignment experiments)
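The exact agreement metric used in the experiment is not spelled out here, but one plausible way to turn two orderings into an agreement number is a Spearman-style rank correlation between a subject's ordering and the algorithm's. The non-words below are invented for illustration.

```python
# Sketch, under assumptions: compare a human ordering of non-words
# against an algorithm's ordering via Spearman rank correlation.
# (The experiment's actual metric is not given; non-words are made up.)

def spearman(rank_a, rank_b):
    """Spearman rank correlation between two orderings of the same items."""
    n = len(rank_a)
    pos_b = {item: i for i, item in enumerate(rank_b)}
    d2 = sum((i - pos_b[item]) ** 2 for i, item in enumerate(rank_a))
    return 1 - 6 * d2 / (n * (n * n - 1))

human = ["blap", "blat", "brat", "klon"]   # ordered by one subject
algo  = ["blap", "brat", "blat", "klon"]   # ordered by the algorithm
print(spearman(human, algo))               # 1.0 would mean full agreement
```

Averaging such a score over all subjects would give a single agreement figure of the kind reported above.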
We published this at the X Congress of Neurophysiology in Latin America (SLAN 2007).
AND WHAT APPLICATIONS?
Find the most similar word to a (presumably misspelled) written word, inside a given context.
Find books whose author's name is based on "how I remember what my friend told me yesterday by telephone" (it also sounded Russian or German to me... I simply don't know!). Those were impossible tasks until the invention of this algorithm. Now they are not!
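A toy illustration of that fuzzy-lookup application, assuming nothing about the real algorithm: here Python's standard-library `difflib` serves as a rough stand-in for a perceptual-similarity score, and the author names are made up for the example.

```python
# Toy fuzzy author lookup.  difflib's ratio() is only a crude
# stand-in for a real perceptual-similarity measure; the catalogue
# below is invented for the example.
import difflib

AUTHORS = ["Dostoyevsky", "Dürrenmatt", "Turgenev", "Goethe"]

def closest_authors(heard, candidates):
    """Rank candidate names by string similarity to what was 'heard'."""
    return sorted(candidates,
                  key=lambda name: difflib.SequenceMatcher(
                      None, heard.lower(), name.lower()).ratio(),
                  reverse=True)

# "It sounded Russian or German to me" -- a half-remembered name:
print(closest_authors("Dostoievski", AUTHORS))
```

With a genuinely perceptual distance in place of `ratio()`, the ranking would follow how the names *sound* rather than how they are spelled, which is exactly what the half-remembered-telephone-call scenario needs.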
This short story about the genesis of an algorithm dealing with human perception and cognition should be an inspiration for many to make bigger innovations in this area.
CONCLUSION
I simply wanted to show a way to create 'odd' things like this one.
Andres