Using ConceptNet as a source for Chatscript facts
 
 

Has anyone ever used ConceptNet as a source for facts for a chatscript chatbot?

“You can get the entirety of ConceptNet 5 under the Creative Commons Attribution-ShareAlike 3.0 license.”

Information regarding ConceptNet’s licensing can be found here:
http://conceptnet5.media.mit.edu/

Data can be downloaded from here:
http://conceptnet5.media.mit.edu/downloads/current/

I downloaded the file:
conceptnet5_csv_5.3.tar.bz2
It contains 8 files: Part_00.txt, Part_01.txt ... Part_07.txt

The structure seems a bit too complex for my needs. 
I just wanted simple triples that I could use to import as ChatScript facts.

In my next post I will share how I parsed the assertions from the raw data files and imported them as Chatscript facts.

 

 
  [ # 1 ]

I am using Windows 8, so I used PowerShell to parse the files.
1. Right-click the lower-left corner of the Windows 8 screen to get the power menu.
2. Select the Command Prompt option.
3. Type “powershell” without the quotes, then use the “cd” command to change to the directory holding the extracted ConceptNet data files.
Example: cd c:\users\alaric\documents\conceptnet5\
4. Type and enter the following:
select-string 'part_00.csv' -pattern '^/a/\[/r/[^/]*/./c/en/chair/./c/en/[^/]*/\]' | foreach {$_.matches.value} | out-file 'assertions_part_00_chairsubjects.txt'

This will extract all assertions with “chair” as the subject into a file named “assertions_part_00_chairsubjects.txt”
If you want to copy and paste the command, press Alt+Space, then E, then P to paste in PowerShell. Ctrl-V will not work.
The pattern uses regular expressions, so search for regex, PowerShell, and select-string to find out more.

5. In order to format the data for chatscript facts type the following:
select-string 'assertions_part_00_chairsubjects.txt' -pattern '.*' | foreach-object {$_.line -replace "/a/\[/r/"," " -replace "/,/c/en/"," " -replace "/\]","" -replace "[^0-9a-zA-Z -_]","_" } | %{$data = $_.split(); Write-Output "$($data[2]) $($data[1]) $($data[3])"} | out-file parsed_part_00_chairsubjects.txt

This will extract all assertions from “assertions_part_00_chairsubjects.txt” to “parsed_part_00_chairsubjects.txt”

6. Enter the following without the quotes to see the contents: “TYPE parsed_part_00_chairsubjects.txt”
(tip: use ctrl-c to stop a scrolling display)

Sample output:
      ...
chair HasA leg
chair HasProperty comfortable
chair HasProperty shape_to_conform_to_human_anatomy
chair IsA commortable_to_sit
chair IsA good_place_to_sit
chair IsA item_of_furniture
chair IsA sit
chair IsA sit_in
chair PartOf back_you_can_lean_against
      ...
chair UsedFor lead_meet
chair UsedFor make_music
chair UsedFor rest
chair UsedFor reupholstering
chair UsedFor sit_at_table
chair UsedFor sit_down
      ...

7. In Windows, browse to your ..\Chatscript 4.7\RAWDATA\WORLDDATA folder and create a new text file called “ConceptNet.tbl”
8. Copy and paste your parsed data into the new file. (I use notepad)
9. Add the following 4 lines to the top of the file above the data:
concept: ~conceptnet
table: ~conceptnet(^subject ^verb ^object)
createfact(^subject ^verb ^object)
data:
10. Save the file. Start Chatscript. Login. Type “:Build 0” without the quotes. 
The fact count that shows when your bot starts will include the count of your newly added facts.
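
For anyone not on Windows, steps 4, 5 and 9 above can be approximated in Python. This is only a sketch, assuming each raw line starts with an assertion URI of the form /a/[/r/Relation/,/c/en/subject/,/c/en/object/] as the PowerShell pattern expects. The function names are my own, and the character clean-up is stricter than the PowerShell version (anything other than letters, digits, underscore and hyphen becomes an underscore).

```python
import re

# English-only assertion URI at the start of a line, e.g.
# /a/[/r/UsedFor/,/c/en/chair/,/c/en/sit_down/]
ASSERTION = re.compile(r"^/a/\[/r/([^/]*)/,/c/en/([^/]*)/,/c/en/([^/]*)/\]")

# The four header lines from step 9.
TABLE_HEADER = [
    "concept: ~conceptnet",
    "table: ~conceptnet(^subject ^verb ^object)",
    "createfact(^subject ^verb ^object)",
    "data:",
]


def _clean(term):
    # Replace anything that is not a letter, digit, underscore or hyphen.
    return re.sub(r"[^0-9A-Za-z_-]", "_", term)


def parse_assertion(line, subject=None):
    """Return (subject, relation, object) or None if the line does not qualify."""
    match = ASSERTION.match(line)
    if match is None:
        return None
    relation, subj, obj = match.groups()
    if subject is not None and subj != subject:
        return None
    return (_clean(subj), relation, _clean(obj))


def to_table(lines, subject=None):
    """Build the text of a .tbl file from raw ConceptNet lines."""
    triples = [t for t in (parse_assertion(l, subject) for l in lines) if t]
    body = [" ".join(t) for t in triples]
    return "\n".join(TABLE_HEADER + body) + "\n"
```

Calling to_table(open("part_00.csv", encoding="utf-8"), subject="chair") would give the text to save as ConceptNet.tbl; the file name here is just the one used in step 4.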

 

 
  [ # 2 ]


Warning: You can only have 800,000 facts in Chatscript. 
If you attempt (like I did) to load in all of the assertions from say the first part_00.txt file you will
get an error because there are over 900,000 assertions in the first file alone.
If you do want to try to convert all assertions at once instead of by subject like “chair” then use the following command in step 4:

select-string 'part_00.csv' -pattern '^/a/\[/r/[^/]*/./c/en/[^/]*/./c/en/[^/]*/\]' | foreach {$_.matches.value} | out-file 'assertions_part_00.txt'

You will then need to go through the file and select which facts you want to copy to tables in the RAWDATA folder for Chatscript and keep the total under 800,000 including the WordNet facts and other facts that are already included in the default setup.
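
One way to do that selection without going through the file by hand is to keep only the relations you care about and stop at a budget. A rough Python sketch, assuming the triples are already parsed into (subject, relation, object) tuples; the function names and the budget value in the usage note are illustrative, not anything ChatScript provides.

```python
from collections import Counter


def relation_counts(triples):
    """Count triples per relation, to see which relations dominate the file."""
    return Counter(relation for _, relation, _ in triples)


def select_facts(triples, keep_relations, budget):
    """Keep triples whose relation is wanted, stopping once budget is reached."""
    selected = []
    for subject, relation, obj in triples:
        if relation not in keep_relations:
            continue
        if len(selected) >= budget:
            break
        selected.append((subject, relation, obj))
    return selected
```

The budget would be the fact limit minus whatever the default setup already uses, for example select_facts(triples, {"UsedFor", "IsA"}, budget=300000), where 300000 is a made-up number for illustration.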

In the next post I will share how I am attempting to use the facts in a chatbot and then discuss the issues I am having, mainly that loading 800,000 facts does not automatically grant common sense to my chatbot (not that I expected it to). :)

If I ask my chatbot “what is a chair used for?” it now knows from ConceptNet5 that:
      chair UsedFor lead_meet
chair UsedFor make_music
chair UsedFor rest
chair UsedFor reupholstering
chair UsedFor sit_at_table
chair UsedFor sit_down

What logic or pseudo logic can people come up with to turn the above “facts” into a dynamically generated English language response?  Unlike in AIML, there are many functions in Chatscript for manipulating strings, looping, if…then…else, etc., so dynamically generated responses based on these facts should be possible.

 

 
  [ # 3 ]

Actually, you can configure how much fact space you have.

 

 
  [ # 4 ]

Thanks Bruce.  Is it done in code or are there some parameters/settings that can be set outside of the code?
It seems there are several things to consider: dictionary size, fact size, hash size and text size.  I’m not quite sure how they are related. 

 

 
  [ # 5 ]

So here is a sample topic file for chatscript:

outputmacro: ^PLURAL(^ARG1)  ^POS(NOUN ^ARG1 PLURAL)

outputmacro: ^PRESENT_PARTICIPLE(^ARG1)  ^POS(NOUN ^ARG1 PRESENT_PARTICIPLE)

TOPIC: ~ConceptNet5 ( conceptnet is used for)
#!x This topic is for experimenting with concept net data

t: I have read the conceptnet5 ontology so I have common sense now.

u: ( what is a _* used for ) ^keep() ^repeat()
    @1 = ^query(direct_sv '_0 UsedFor ?)
    $$NumberOfFacts = ^length(@1)
    if ( $$NumberOfFacts > 0 )
    {
        _3 = ^pick(@1Object)
        A '_0 [is][can be][could be][sometimes is][is known to be][is][is][is]
        _4 = 0
        [used for][good for][useful for][usually employed for][utilized for]
        @2 = ^burst(_3)
        $$phraselength = ^length(@2)
        $$word = ^FIRST(@2subject)
        $$count = 0
        $$haveverbalready = "false"
        loop($$phraselength)
        {
            $$count = $$count + 1

            if ( $$word ? ~linkingverb ) { ^PRESENT_PARTICIPLE($$word) }
            else if ( $$word ? ~verbs AND $$haveverbalready == "false" ) {
                ^PRESENT_PARTICIPLE($$word)
                $$haveverbalready = "true"
            }
            else if ( $$word ? ~nounlist ) { ^PLURAL($$word) }
            else { $$word }

            if ( $$count != $$phraselength ) { $$word = ^FIRST(@2subject) }
        }
        . Original ConceptNet: _3
    }
    else { I'm not sure what a '_0 is used for. }

 

 
  [ # 6 ]

The topic tries to handle “what is a x used for” type questions.  All sorts of different ways to ask the question and patterns could be created to ^reuse the same code.  I am interested in the problem of formatting the raw conceptnet triple into an English response.  This is a natural language generation problem.

Currently the pattern has logic that states that an ‘x’ 
choose one of: [is][can be][could be][sometimes is][is known to be][is][is][is]
choose one of: [used for][good for][useful for][usually employed for][utilized for]
then loop through each word of the response
change each linking verb to the present participle form
change the first verb in the phrase to the present participle form
change each noun to the plural form
leave all other word types unmodified

The output ends in a period plus the original “object” text from conceptnet for reference.
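
For readers who do not know ChatScript, the verb and noun rules above can be sketched in Python. This is a simplified illustration, not a translation of the topic file: the linking-verb rule is left out, the two word sets are tiny hypothetical stand-ins for the ~verbs and ~nounlist concepts, and the inflection helpers are naive spelling rules rather than a dictionary lookup.

```python
import random

# Hypothetical stand-ins for the ~verbs and ~nounlist concepts.
VERBS = {"make", "rest", "brush", "protect", "knock"}
NOUNS = {"electricity", "hair", "person", "thing"}

LEADS = ["is", "can be", "could be", "sometimes is", "is known to be"]
LINKS = ["used for", "good for", "useful for", "usually employed for", "utilized for"]


def present_participle(verb):
    # Naive rule: drop a final silent "e" before adding "ing".
    if verb.endswith("e") and not verb.endswith("ee"):
        return verb[:-1] + "ing"
    return verb + "ing"


def plural(noun):
    # Naive spelling rules only; irregular nouns are not handled.
    if noun.endswith("y") and noun[-2:-1] not in "aeiou":
        return noun[:-1] + "ies"
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"
    return noun + "s"


def render_used_for(subject, object_phrase, rng=random):
    """Turn (subject, UsedFor, object_phrase) into an English sentence."""
    have_verb = False
    words = []
    for word in object_phrase.split():
        if word in VERBS and not have_verb:
            # Only the first verb becomes a present participle.
            words.append(present_participle(word))
            have_verb = True
        elif word in NOUNS:
            # Every noun is pluralized, whether or not that reads well.
            words.append(plural(word))
        else:
            words.append(word)
    return f"A {subject} {rng.choice(LEADS)} {rng.choice(LINKS)} {' '.join(words)}."
```

Because every noun is pluralized blindly, render_used_for("fire", "make electricity") ends in "making electricities.", which is the same kind of mistake the bot makes.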

It is evident that these simple rules need work from the following sample output from chatbot Harry:

Hi, again.
  >What is a fire used for?
A fire is usually employed for making electricities. Original ConceptNet: make electricity
  >What is a chair used for?
A chair is useful for resting. Original ConceptNet: rest
  >what is a brush used for?
A brush can be utilized for brushing your hairs. Original ConceptNet: brush your hair
  >what is a dog used for?
A dog is utilized for protecting person. Original ConceptNet: protect person
  >what is a oven used for?
A oven is usually employed for baking broil and heats foods. Original ConceptNet: bake broil and heat food
  >what is a hammer used for?
A hammer sometimes is useful for knocking thing apart. Original ConceptNet: knock thing apart
  >what is a hammer used for?
A hammer could be useful for driving homes nails. Original ConceptNet: drive home nail
  >what is a sword used for?
A sword could be useful for parrying. Original ConceptNet: parry
  >what is a car used for?
A car is known to be usually employed for going for spinning. Original ConceptNet: go for spin
  >what is a horse used for?
A horse is usually employed for playing poloes. Original ConceptNet: play polo
  >what is a cup used for?
A cup is known to be useful for measuring liquids. Original ConceptNet: measure liquid
  >what is a mind used for?
A mind is good for thinking. Original ConceptNet: think
  >what is a bed used for
A bed is used for napping on. Original ConceptNet: nap on
  >what is a coat used for
A coat is used for keeping you warming in winters. Original ConceptNet: keep you warm in winter

Does anyone have any natural language generation tips or suggestions that they would like to share?  Any pseudo code suggestions for parsing the conceptnet data or modifying my basic rules?  Rules for when to make a noun plural or not seem to be desperately needed here.

 

 

 
  [ # 7 ]

This is interesting, I didn’t know ChatScript could use facts in this way.
I program in C++, and my language generation rules are underdeveloped, but I can share what I’ve figured out so far:

- Words that represent materials (metal, iron, sand, grass) don’t get “a” nor plural “-s”, though they may get “the”. You could either make a gimongous word list to compare with (concepts, I believe such lists are called in ChatScript), or perhaps look up in ConceptNet whether the word represents a material.
- The same goes for words that represent concepts and physics (time, winter, space, heat, wind, gravity).
- Morphology of compound verbs like “drive home” or “give up” (often combinations with up/down/in/out) should be applied to the first word, the remaining words are left as is.
- “thing” in e.g. “to build things” is a special word that is used as a filler and is almost always most natural in plural “things”.

I can not compensate for Conceptnet’s crude categorisation though. You will need to parse what you get from Conceptnet.
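
The four rules above could be sketched like this in Python. It is only an illustration under several assumptions: the first word of the phrase is taken to be the verb, the word sets are small hypothetical examples rather than real concept lists, and the inflection helpers are naive spelling rules.

```python
# Hypothetical word lists; real ones would be far larger.
UNCOUNTABLE = {"electricity", "hair", "food", "polo", "winter", "metal", "sand"}
COUNTABLE = {"nail", "string", "home", "spin"}
PARTICLES = {"up", "down", "in", "out", "home", "apart", "on"}


def present_participle(verb):
    # Naive rule: drop a final silent "e" before adding "ing".
    if verb.endswith("e") and not verb.endswith("ee"):
        return verb[:-1] + "ing"
    return verb + "ing"


def plural(noun):
    # Naive rule: add "es" after a sibilant ending, otherwise "s".
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"
    return noun + "s"


def inflect(phrase):
    """Inflect a ConceptNet phrase such as 'drive home nail'."""
    words = phrase.split()
    if not words:
        return ""
    # Assumption: the phrase starts with its verb.
    out = [present_participle(words[0])]
    for index, word in enumerate(words[1:]):
        if index == 0 and word in PARTICLES:
            # Compound verb: only the first word is inflected.
            out.append(word)
        elif word == "thing":
            # Filler word, most natural in the plural.
            out.append("things")
        elif word in UNCOUNTABLE:
            # Materials and abstract concepts take no plural.
            out.append(word)
        elif word in COUNTABLE:
            out.append(plural(word))
        else:
            out.append(word)
    return " ".join(out)
```

Note that "home" is deliberately in both COUNTABLE and PARTICLES, so that "drive home nail" shows the compound-verb rule taking priority over pluralization.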

 

 
  [ # 8 ]

The command line takes the parameter fact=, and you can do things like fact=8000000 if you want more facts (assuming your system will give you the memory; on Windows I have no problems, on Linux I have problems with allocating a big chunk of memory all at once).  In general, CS makes a single allocation for the dictionary, string space, and facts. The values you currently get are merely defaults and can be overridden on the command line.  If you have 8,000,000 facts, they will likely involve sentences or other text that will then also require more dictionary entries. But if your facts merely use existing English words, then you won't need more.  There is a limit on dictionary size due to the bit representations involved:  MAX_DICTIONARY 0x000fffff

 

 
  [ # 9 ]

CS uses WordNet, which means you can quickly define a concept set for the abstract nouns (materials, concepts, emotions, etc.) to recognize a/the distinctions.  The CS dictionary recognizes both compound verbs (where the dictionary can mark the syllable that conjugates) and phrasal verbs (and whether they can be separated or not).

 

 
  [ # 10 ]

Thanks Don for your feedback.  It was helpful. 

Bruce, I am going to have to look into the compound verbs and phrasal verbs further.  It sounds promising.

I have been exploring the concept of noncountable nouns.

After adding a ~noncountablenoun concept, I get the following improved responses:
Welcome back
  >what is fire used for?
A fire is used for making electricity. Original ConceptNet: make electricity
  >what is a brush used for?
A brush sometimes is usually employed for brushing your hair. Original ConceptNet: brush your hair
  >what is a dog used for?
A dog is known to be good for protecting people. Original ConceptNet: protect person
  >what is a oven used for?
A oven is known to be used for baking broiling and heating food. Original ConceptNet: bake broil and heat food
  >what is a hammer used for?
A hammer sometimes is good for knocking things apart. Original ConceptNet: knock thing apart
  >what is a hammer used for?
A hammer could be used for pounding nails in. Original ConceptNet: pound nail in
  >what is a car used for?
A car is known to be good for going for spins. Original ConceptNet: go for spin
  >what is a horse used for?
A horse sometimes is utilized for playing polo. Original ConceptNet: play polo

The following responses remain incorrectly formatted:
  >what is a hammer used for?
A hammer is useful for hitting pianoes strings. Original ConceptNet: hit piano string
  >what is a horse used for
A horse is used for rides through nights. Original ConceptNet: ride through night
  >what is a horse used for?
A horse could be usually employed for fulfilling needing for companionship. Original ConceptNet: fulfill need for companionship
  >what is a coat used for?
A coat is known to be useful for keeping you in winters. Original ConceptNet: keep you warm in winter
(Harry left out the word “warm” because it is a member of ~bodily_states, which is a member of the ~nounlist concept, and there is no plural noun for it.
I will need to fix my ^PLURAL macro to test for empty responses and return the original word. Also, it seems like the WordNet categories need some tweaking.)
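
The fix I have in mind for ^PLURAL is just a fallback. Sketched in Python rather than ChatScript, with lookup as a hypothetical stand-in for whatever returns the plural form (it may return an empty string when no plural exists):

```python
def safe_plural(word, lookup):
    """Return the plural from lookup, or the original word if there is none."""
    result = lookup(word)
    # An empty or missing result means no plural form was found.
    return result if result else word
```

With this in place, a word like "warm" would pass through unchanged instead of disappearing from the sentence.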

While testing I found that ConceptNet has some interesting concepts:
  >what is a cup used for?
A cup sometimes is used for protecting your junk. Original ConceptNet: protect your junk
  >what is water used for
A water can be utilized for drowning. Original ConceptNet: drown
  >what is a knife used for
A knife is usually employed for performing autopsy. Original ConceptNet: perform autopsy

The improved Harry has strong opinions about religion but was suspiciously vague about chatbots:
  >what is religion used for?
A religion is used for keeping population in control. Original ConceptNet: keep population in control
  >what is a chatbot used for?
I’m not sure what a chatbot is used for.

If anyone has any grammar suggestions I will try implementing them and post the Chatscript “code”.

 

 
  [ # 11 ]

I decided to reparse the conceptnet5 files for just the facts with the relation “UsedFor”. 
This was due to the first conceptnet file being so large that I ran out of memory.

I removed the following line from files0.txt in the Chatscript4.7 root folder and
added the textfile filesconceptnet.txt with the following line:

# objects
RAWDATA/CONCEPTNETDATA/

I have placed my parsed conceptnet files in the “RAWDATA/CONCEPTNETDATA/” folder but the line can be changed
to point to the correct directory on your system.

I modified the top of each parsed conceptnet file slightly from my earlier sample by removing the ~ from the
table name and the definition for the ~conceptnet concept.  I do not need the concept and specifying it in
multiple files causes an error.

So the top of each file should look something like this:

table: conceptnet(^subject ^verb ^object)
createfact(^subject ^verb ^object)

data:
accommodation UsedFor house_while_travel
accordian_bag UsedFor air
accordian_bag UsedFor play_accordian
...

I had to resave each file in ANSI format using notepad as I received an error in building using the default unicode format.
I also had to delete one row in the parsed extract for part 6 of the conceptnet5 files which contained a # sign: “apple UsedFor #” which slipped through the parsing.
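
Both clean-up steps could be scripted instead of done by hand. A small Python sketch, assuming that “ANSI” in Notepad means the Windows-1252 code page (true on Western-language Windows, but an assumption here) and that it is applied to the data rows only, not to the table header lines. The function names are mine.

```python
def clean_rows(rows):
    """Drop rows that are not three fields or that contain a # sign."""
    kept = []
    for row in rows:
        parts = row.split()
        if len(parts) != 3:
            continue
        if any("#" in part for part in parts):
            continue
        kept.append(" ".join(parts))
    return kept


def to_ansi(text):
    """Encode text as Windows-1252; unencodable characters become '?'."""
    return text.encode("cp1252", errors="replace")
```

The bytes from to_ansi can be written to the table file with open(path, "wb").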

After starting ChatScript type
:build conceptnet

to build the conceptnet files separately from the wordnet dictionary files which are built by typing :build 0

The following row appears upon startup now showing that the default bot Harry that ships with ChatScript
and who sometimes gets a bad rap for not knowing very much now knows 48,416 facts about what things are used for.

Build1:  dict=29800 fact=48416 dtext=625940 stext=0 Oct11’14-09:30:02 conceptnet.txt

 

 
  [ # 12 ]

Actually, it is best to leave the command

# objects
RAWDATA/CONCEPTNETDATA/

in files0.txt rather than in a separate build file.  It looks like creating the separate build file filesconceptnet.txt replaces “Harry”, so your second build should be :build Harry (or whatever the name of your chatbot is, or the build file for it, such as :build 1 for files1.txt).

It was useful for temporarily getting a count of the conceptnet facts.

 

 
  [ # 13 ]

Thanks Alaric Schenck for sharing the script to convert conceptNet to chatscript facts format!
I really love the amount of information in conceptNet.

 

 
  [ # 14 ]

If you have more facts than can be stored in one bot, then use several: one to send the input to each of the fact-bloated bots and combine the responses from those with relevant information. If the information is divided intelligently then this might well be faster than one big bot.
I believe that all of the facts in the world will not make for a convincing bot, but a bot on a forum might.
Brucey bot.
Take all of the entries on this forum as a string of short scripts: questions, answers, rejoinders. Divide them into newbie questions and specific how-tos; digest several related sites without making the distinction. Write a bot that reads the forum and repeats back versions of intelligent answers that have previously been given to similar questions. Add an expert system for the how-to questions. To this useful bot add 2 or 3 unrelated fora, say rose growing and SSDR microlights; these will only very occasionally leak through onto this forum. If Brucey has hobbies then he is not a bot. If there is a slight digression that is fine; the forum will always return to the topic. Forum question and response has slightly different rules from casual chat, but it has its rules.

 

 
  [ # 15 ]

This is a useful thread.

I have been experimenting with adding facts based on ConceptNet and it is decent for some types of questions.
Examples: what is something used for, what it is made of, what something is, similar to, part of….

But it creates big, heavy memory baggage.  The ConceptNet fact footprint is large and it is only needed for a fraction of the responses.  And it makes the size of CS very large and heavy.

Is there a way to have a primary bot query a secondary heavy specialized bot on the same machine?
This ConceptNet fact bot would be very memory heavy, and lightly used, shared across many smaller bots.

I see some other threads on this subject in this forum, but I am not sure of the best way to do this, or if there is a better route.

I guess I could execute tcpopen, pass the query to the other bot, and wait for the response, but there is no context to the query on the incoming side… Maybe context could be part of the query. Or maybe it can read the user files.  I would have to think through the user file and what would be useful.

Also, a CS query is single-threaded for the response end to end. Having multiple bots running in parallel would probably be a better route for time-critical or CPU intensive tasks. Fire requests to N bots at once and wait for all of them to come back or continue after some timeout.  I do not have this problem yet, but I imagine it will not get better.
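
The fan-out idea (fire at N bots, continue after a timeout) might look like this from an outside coordinator. A Python sketch only: each bot is represented by a plain callable, which is a hypothetical stand-in for whatever actually talks to a ChatScript server, and ask_all is a name I made up.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def ask_all(bots, message, timeout):
    """Send message to every bot in parallel; return answers that arrive in time.

    bots maps a bot name to a callable taking the message and returning a reply.
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(bots)))
    futures = {pool.submit(ask, message): name for name, ask in bots.items()}
    # Block until every bot has answered or the timeout expires.
    done, _not_done = wait(futures, timeout=timeout)
    # Do not block on stragglers; their threads finish in the background.
    pool.shutdown(wait=False)
    answers = {}
    for future in done:
        # Skip bots that raised instead of answering.
        if future.exception() is None:
            answers[futures[future]] = future.result()
    return answers
```

Bots that miss the deadline are simply left out of the result, so the caller can combine whatever came back.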

Alternatively, I am thinking about just using the json ConceptNet api on a local machine, in place of memory facts. But this would be a different solution, with other implications. probably slowness.
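
A rough idea of what the API route involves, as a Python sketch. It assumes the ConceptNet 5 web API shape, where /query takes start and rel parameters and returns JSON with an "edges" list whose entries carry an "end" node with a "label". The base URL is whatever your local instance listens on; the one in the usage note is made up.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(base, term, relation, lang="en"):
    """Build a /query URL for edges that start at term with the given relation."""
    params = urlencode(
        {"start": f"/c/{lang}/{term}", "rel": f"/r/{relation}"},
        safe="/",  # keep the slashes in the ConceptNet URIs readable
    )
    return f"{base}/query?{params}"


def edge_labels(response):
    """Extract the label of each edge's end node from a decoded JSON response."""
    return [edge["end"]["label"] for edge in response.get("edges", [])]


def used_for(base, term):
    """Fetch what term is used for from a running ConceptNet API instance."""
    with urlopen(build_query_url(base, term, "UsedFor")) as resp:
        return edge_labels(json.load(resp))
```

For example, used_for("http://localhost:8084", "chair") would make one HTTP round trip per question, which is where the slowness would come from.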

PS: I have ConceptNet 5.62 running in AWS, if someone needs this for dev, PM me and I will share and add it to your aws dev security group.

ConceptNet, where to start: http://conceptnet5.media.mit.edu/
Updated link for ConceptNet data:
https://s3.amazonaws.com/conceptnet/downloads/2019/edges/conceptnet-assertions-5.7.0.csv.gz
API link: https://github.com/commonsense/conceptnet5/wiki/API
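
The newer assertions file is laid out differently from the 5.3 files discussed earlier in the thread: each line is tab-separated with the assertion URI, the relation, the start node, the end node, and a JSON blob. A Python sketch for pulling English triples out of it, with function names of my own choosing:

```python
import csv
import gzip


def read_assertions(path):
    """Yield rows (lists of fields) from a gzipped, tab-separated assertions file."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        yield from csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)


def english_triples(rows, relation=None):
    """Yield (subject, relation, object) for rows whose nodes are both English."""
    for row in rows:
        if len(row) < 4:
            continue
        _uri, rel, start, end = row[:4]
        if not (start.startswith("/c/en/") and end.startswith("/c/en/")):
            continue
        rel_name = rel.rsplit("/", 1)[-1]
        if relation is not None and rel_name != relation:
            continue
        # /c/en/chair/n -> chair (drops any part-of-speech suffix)
        yield (start.split("/")[3], rel_name, end.split("/")[3])
```

Something like english_triples(read_assertions("conceptnet-assertions-5.7.0.csv.gz"), relation="UsedFor") would stream the file without loading it all into memory.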

 

 

 
