
RSS Aggregatifactor Machine

A script runs every hour and fetches new web pages listed in a set of RSS feeds.

The Aggregatifactor was created to collect data for the Fuzzy Link Bot (FLB) visualization. It's fuzzy because the information here is often very bad. It's difficult for a computer to separate the content of a web page from the ads and navigational fluff, and it's difficult to identify people, places, and things. The Aggregatifactor tries to do both, and FLB visualizes the mostly awful results and lets you explore the resulting network. The magical thing is that sometimes enough bad information adds up to real knowledge. FLB should let you see through enough of the noise and fuzz to learn the overall relationships among the things you already know about. Hopefully, you'll learn something new: discovery rather than search.

  • Items retrieved in the last 4 hours shows the latest pages fetched.
  • List Feeds shows the feeds being used to fetch pages and the number of articles read from each.
  • rss_to_db.py is about 100 lines of Python that used to wake up via cron every hour. It ran fine for about two years but was replaced in late 2009 with a much more sophisticated Groovy script.
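
The hourly step (read a feed, keep only the links not seen before) might look roughly like the sketch below. This is not the original rss_to_db.py or the Groovy script that replaced it. The function names, the sample feed, and the in-memory `seen` set are all assumptions made for illustration, and the real scripts presumably tracked fetched pages in the database instead.

```python
import xml.etree.ElementTree as ET


def extract_links(rss_xml):
    """Return the non-empty <link> text of every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    links = []
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        if link:
            links.append(link)
    return links


def new_links(rss_xml, seen):
    """Return links that are not yet in `seen`, recording them as we go."""
    fresh = []
    for link in extract_links(rss_xml):
        if link not in seen:
            seen.add(link)
            fresh.append(link)
    return fresh


# Hypothetical feed content; a real run would download this from each feed URL.
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>Example feed</title>
<item><title>A</title><link>http://example.com/a</link></item>
<item><title>B</title><link>http://example.com/b</link></item>
</channel></rss>"""

seen = set()
first_run = new_links(SAMPLE_FEED, seen)   # both links are new
second_run = new_links(SAMPLE_FEED, seen)  # nothing new an hour later
```

Run from cron once an hour, a script like this would fetch only the pages in `first_run` and skip anything it had already collected.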

Once the pages are fetched, they are sent to the Open Calais Web API for entity extraction. The entities returned are stored in a MySQL database.
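
As a rough sketch of the storage side, the snippet below saves (type, name) pairs per page and then asks which pages mention a given entity, which is the kind of link a network view needs. Everything here is assumed for illustration: the table layout and function names are hypothetical, sqlite3 stands in for MySQL so the example runs without a server, and the entity lists are hand-written rather than real Open Calais output.

```python
import sqlite3


def store_entities(conn, page_url, entities):
    """Save (entity_type, name) pairs for one page, ignoring repeats."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entity ("
        "page_url TEXT, entity_type TEXT, name TEXT, "
        "UNIQUE (page_url, entity_type, name))"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO entity VALUES (?, ?, ?)",
        [(page_url, etype, name) for etype, name in entities],
    )
    conn.commit()


def pages_mentioning(conn, name):
    """Return the URLs of all stored pages that mention the named entity."""
    rows = conn.execute(
        "SELECT DISTINCT page_url FROM entity WHERE name = ? ORDER BY page_url",
        (name,),
    )
    return [row[0] for row in rows]


# Hand-written stand-ins for what an entity extractor might return.
conn = sqlite3.connect(":memory:")
store_entities(conn, "http://example.com/a",
               [("Person", "Ada Lovelace"), ("City", "London")])
store_entities(conn, "http://example.com/b", [("City", "London")])
# Re-storing the same page adds no duplicate rows.
store_entities(conn, "http://example.com/a", [("City", "London")])

shared = pages_mentioning(conn, "London")
```

Two pages that share an entity, as in `shared` here, become connected nodes in the resulting network, even when many of the individual extractions are wrong.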

 
