wp-archivebot

Subscribe to a wiki's RSS feed and archive external links

Latest on Hackage: 0.1


BSD-3-Clause licensed by Gwern
Maintained by [email protected]

A MediaWiki's RecentChanges or NewPages page links to every new edit or article. This bot polls the corresponding RSS feed (easier and more reliable than parsing the HTML), follows the links to each new edit or article, and then uses TagSoup to extract every off-wiki link (e.g. to http://cnn.com).
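
The off-wiki filtering step can be sketched as follows. This is a minimal illustration using only base, not the package's actual code: it assumes the href values have already been pulled out of the page (for instance with TagSoup), and the names isExternal, hostOf and externalLinks are hypothetical.

```haskell
import Data.List (isPrefixOf, nub)

-- | Host part of an absolute http(s) URL; Nothing for relative links
-- and other schemes. (Hypothetical helper, not part of wp-archivebot.)
hostOf :: String -> Maybe String
hostOf url
  | "http://"  `isPrefixOf` url = Just (takeHost (drop 7 url))
  | "https://" `isPrefixOf` url = Just (takeHost (drop 8 url))
  | otherwise                   = Nothing
  where
    -- The host ends at the first path, query, fragment or port delimiter.
    takeHost = takeWhile (`notElem` "/?#:")

-- | True when the link is absolute and does not point back at the wiki.
isExternal :: String -> String -> Bool
isExternal wikiHost url =
  case hostOf url of
    Just h  -> h /= wikiHost
    Nothing -> False

-- | Keep only off-wiki links, dropping duplicates.
externalLinks :: String -> [String] -> [String]
externalLinks wikiHost = nub . filter (isExternal wikiHost)

main :: IO ()
main = mapM_ putStrLn $ externalLinks "en.wikipedia.org"
  [ "/wiki/Main_Page"                       -- relative, on-wiki
  , "http://en.wikipedia.org/wiki/Haskell"  -- absolute, on-wiki
  , "http://cnn.com"
  , "http://cnn.com"                        -- duplicate
  , "https://example.org/a?b=c"
  ]
```

Comparing hosts exactly is a simplification: a real bot would probably also treat sister subdomains of the same wiki as on-wiki.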

With this list of external links, the bot then sends requests to http://webcitation.org/, which makes a backup of each page (similar to the Internet Archive, but on demand).
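
As a rough sketch of what building one such request might look like: the "/archive" path and the "url" and "email" query parameters below are assumptions made for illustration and should be checked against the service's own documentation, and archiveRequest and percentEncode are hypothetical helpers rather than the package's functions.

```haskell
import Data.Char (isAlphaNum, isAscii, ord, toUpper)
import Numeric (showHex)

-- | UTF-8 bytes of a single character.
utf8 :: Char -> [Int]
utf8 c
  | n < 0x80    = [n]
  | n < 0x800   = [0xC0 + n `div` 64, 0x80 + n `mod` 64]
  | n < 0x10000 = [ 0xE0 + n `div` 4096
                  , 0x80 + (n `div` 64) `mod` 64
                  , 0x80 + n `mod` 64 ]
  | otherwise   = [ 0xF0 + n `div` 262144
                  , 0x80 + (n `div` 4096) `mod` 64
                  , 0x80 + (n `div` 64) `mod` 64
                  , 0x80 + n `mod` 64 ]
  where n = ord c

-- | Percent-encode everything except RFC 3986 unreserved characters.
percentEncode :: String -> String
percentEncode = concatMap enc
  where
    enc c
      | isAscii c && isAlphaNum c = [c]
      | c `elem` "-_.~"           = [c]
      | otherwise                 = concatMap hexByte (utf8 c)
    hexByte b = '%' : map toUpper (pad (showHex b ""))
    pad s     = replicate (2 - length s) '0' ++ s

-- | Build an archive request URL for one external link.
-- ASSUMPTION: the endpoint path and parameter names are illustrative only.
archiveRequest :: String -> String -> String
archiveRequest email url =
  "http://webcitation.org/archive?url=" ++ percentEncode url
    ++ "&email=" ++ percentEncode email

main :: IO ()
main = putStrLn (archiveRequest "someone@example.org" "http://cnn.com")
```

The email address is passed along with each link because the bot takes one on its command line; how the service uses it is not stated here.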

Example: to archive links from every article in the English Wikipedia's RecentChanges:

wp-archivebot [email protected] 'http://en.wikipedia.org/w/index.php?title=Special:RecentChanges&feed=rss'