Keeping your chatbot ‘up to speed’: machine readable news
 
 

Hi all,

One thing I want to do with my chatbot (which, I just noticed, got rejected by the site, by the way :( ) is to keep it ‘up to date’ on current events, just as real humans are.

For this it seems appropriate to ‘feed’ it some news headlines daily, so that it knows what’s going on in the Middle East, when the new elections are coming up, etc…

Soooo, which sources would you recommend for this? My first thought is to use reddit.com: they have a public read API, which you can reach by just appending /.json to their URLs, like this:

http://www.reddit.com/r/worldnews/.json -> pretty sweet! However, this news is a bit too ‘messy’.
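
For illustration, here is a minimal Python sketch of pulling headlines out of that JSON. It assumes the usual reddit listing layout (a top-level "data" object holding a "children" list, each child carrying its own "data" with a "title"). The function names and the User-Agent string are made up for this example, and the download step needs network access.

```python
import json
import urllib.request


def fetch_listing(url="http://www.reddit.com/r/worldnews/.json"):
    """Download a reddit listing and decode it as JSON (needs network access)."""
    # Sending a descriptive User-Agent is an assumption of this sketch;
    # the value below is a placeholder, not an official client name.
    request = urllib.request.Request(
        url, headers={"User-Agent": "chatbot-news-sketch/0.1"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def headlines_from_listing(listing):
    """Return the post titles found in an already decoded listing."""
    children = listing.get("data", {}).get("children", [])
    titles = []
    for child in children:
        title = child.get("data", {}).get("title")
        if title:
            # Collapse runs of whitespace so each headline is one clean line.
            titles.append(" ".join(title.split()))
    return titles
```

Calling headlines_from_listing(fetch_listing()) should give a plain list of title strings to feed to the bot. Filtering out the ‘messy’ ones is left open here.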

Googling ‘machine readable news’, I found out that this is not a novel topic: algorithmic traders in particular seem to be interested in it, to predict the stock market! Services offering this are usually paid, though.

So, any other ideas and thoughts on this? Just write a web scraper? Or are there any well-known news sites that offer an easy ‘headline feed’? Maybe the most obvious way to go about this would be to look for RSS feeds, though I’d have to think for a bit about which ones have the easiest headlines to parse.

I know Vince’s M.I.C.H. does something similar to this, but maybe others do as well. Curious for your thoughts!

 

 
  [ # 1 ]

The one that immediately springs to mind is wikinews. The data accessible via the RSS feed seems to be well formed too. Many of the news services that I’ve tried to access via RSS in the past are, as you say, quite messy, even cutting off some entries in the middle of a word.

http://en.wikinews.org/wiki/Main_Page

This one uses the Atom format. Not sure which one is dominant these days, but when I was doing this myself a few years ago there were several versions of RSS that I had to be able to handle. I wrote an XSLT script to normalise them all, which I attempted to attach to this post.

However, file attachments seem to be broken again, so you can get it from my web server at the following URL:

http://asmith.id.au/source/newsfeeds.xsl
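
To give a rough idea of what normalising involves, here is a small Python sketch. It is not the XSLT script linked above, just an illustration using the standard library. It assumes only two cases, RSS 2.0 style feeds with un-namespaced item elements and Atom 1.0 feeds. RSS 1.0 (RDF) feeds put their items in a namespace and would need an extra branch. The function name extract_headlines is made up for this example.

```python
import xml.etree.ElementTree as ET

# Namespace used by Atom 1.0 documents.
ATOM_NS = "{http://www.w3.org/2005/Atom}"


def extract_headlines(feed_xml):
    """Return (title, link) pairs from an RSS 2.0 or Atom 1.0 document."""
    root = ET.fromstring(feed_xml)
    headlines = []
    if root.tag == ATOM_NS + "feed":
        # Atom: <entry> elements, with the URL in the href attribute of <link>.
        for entry in root.iter(ATOM_NS + "entry"):
            title = entry.findtext(ATOM_NS + "title", default="")
            link_el = entry.find(ATOM_NS + "link")
            link = link_el.get("href", "") if link_el is not None else ""
            headlines.append((title.strip(), link))
    else:
        # RSS 2.0 style: <item> elements, with the URL as the text of <link>.
        for item in root.iter("item"):
            title = item.findtext("title", default="")
            link = item.findtext("link", default="")
            headlines.append((title.strip(), link.strip()))
    return headlines
```

Either kind of feed comes out as the same list of (title, link) tuples, so the rest of the bot does not need to care which format a source uses.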

 

 
  [ # 2 ]

Using RSS is the way to go. You may decide in the future to get info from a variety of sources and most places support RSS. Also, I believe Yahoo has a number of tools to create RSS streams.

http://dir.yahoo.com/rss/dir/index.php

http://developer.yahoo.com/rss/

 

 
  [ # 3 ]

Ok, cool - RSS it will be then, I agree that that’s the most sensible way. Thanks for the tips!

 

 