Hello again. I am hoping to experiment with Big Data and CS and want to know if it's possible. My goal is to use Wikipedia's category system as a massive, prebuilt hierarchy of concepts: Wikipedia categories would become CS concepts, and pages would become non-concept members. To take the brief if exotic example (they go together!) of Phoenicianism, the Wikipedia hierarchy
https://en.wikipedia.org/wiki/Category:Phoenicianism
would convert to the CS concept
concept: ~phoenicianism (~Kataeb_Party ~Lebanese_Front ~Phoenicianists Phoenicianism Al_Tanzim_Al_Amal Guardians_of_the_Cedars Kataeb_Party Kataeb_Regulatory_Forces Lebanese_Christian_Nationalism\:_The_Rise_and_Fall_of_an_Ethnic_Resistance Lebanese_Renewal_Party)
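To make the conversion concrete, here is a minimal sketch of how one might mechanically generate such a concept line from a category's members (the helper names `to_cs_token` and `to_concept` are my own inventions, and the exact ChatScript escaping rules should be checked against the documentation):

```python
def to_cs_token(title):
    """Turn a Wikipedia title into a ChatScript-style token:
    spaces become underscores, colons get escaped."""
    return title.replace(" ", "_").replace(":", r"\:")

def to_concept(category, subcategories, pages):
    """Emit one concept line: subcategories become nested ~concepts,
    ordinary pages become plain members."""
    members = [f"~{to_cs_token(s)}" for s in subcategories]
    members += [to_cs_token(p) for p in pages]
    return f"concept: ~{to_cs_token(category).lower()} ({' '.join(members)})"

line = to_concept(
    "Phoenicianism",
    ["Kataeb_Party", "Lebanese_Front", "Phoenicianists"],
    ["Phoenicianism", "Guardians of the Cedars"],
)
print(line)
```

The member lists themselves could be pulled from the MediaWiki API (`action=query&list=categorymembers`), which distinguishes subcategories from pages for you.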
Then if someone mentioned the Kataeb Party, your bot could ask about your feelings on Phoenicianism, or, going the other direction, it could walk down the tree and ask what you think of Kataeb Party politicians. Simply alternating references to the categories above and below some key part of each volley could make for powerfully relevant conversation, given a sufficiently rich dataset.
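In ChatScript terms, that up-and-down movement might look something like the sketch below (hand-written for this one category, purely to illustrate the pattern; the generated version would come from the tree):

```
topic: ~phoenicianism_chat (~phoenicianism)

# Member mentioned -> ask about the parent category
u: ( Kataeb_Party ) What are your feelings about Phoenicianism more generally?

# Gambit going the other way, down from the category to a member
t: What do you think of the Kataeb Party's politicians?
```

The interesting generation problem is emitting rules like these automatically for every edge in the tree, rather than writing them by hand.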
The Wikipedia category tree and its associated pages run into the tens of millions of entries, all in a logical flow. I assume this would be far too much for local CS storage, so it would need a database to host the concepts. One could then create topics, rules and gambits that traverse the concept tree.
I can foresee several tricky bits, but assuming I can normalise the text and eliminate the redundancies, how would CS interact with a database of, say, 10 million concepts?
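On the storage side, one cheap way to picture it (not ChatScript's own database integration, just an illustration of the shape of the data) is an adjacency list of member-to-parent edges, where each conversational hop is a single indexed lookup, so query cost tracks tree depth rather than the 10-million-concept total:

```python
import sqlite3

# Toy in-memory stand-in for the external concept store:
# one row per edge, member -> parent category.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE edges (member TEXT, parent TEXT)")
con.execute("CREATE INDEX idx_member ON edges(member)")
con.executemany("INSERT INTO edges VALUES (?, ?)", [
    ("Kataeb_Party", "phoenicianism"),
    ("Guardians_of_the_Cedars", "phoenicianism"),
    ("phoenicianism", "lebanese_nationalism"),   # invented parent, for illustration
])

def ancestors(member):
    """Walk up the tree with a recursive CTE: one indexed
    lookup per level of the hierarchy."""
    rows = con.execute("""
        WITH RECURSIVE up(name) AS (
            SELECT parent FROM edges WHERE member = ?
            UNION
            SELECT e.parent FROM edges e JOIN up ON e.member = up.name
        ) SELECT name FROM up""", (member,)).fetchall()
    return [r[0] for r in rows]

print(ancestors("Kataeb_Party"))
```

A real deployment would presumably use whatever database bridge CS offers (or a thin out-of-band service), but the access pattern — "given this word, which concepts sit above and below it?" — stays this simple either way.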