RSS ripper

This little library uses streams to fetch data from RSS endpoints and saves the transformed data to a LevelDB database for further use (did anyone say ML?).

Setup

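```sh
cd repository   # the cloned repository folder
nvm use         # switch to the Node.js version pinned in .nvmrc
npm i -g yarn   # install Yarn globally
yarn            # install the dependencies
```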

Usage

`RssRipper` is the exported class. Its constructor takes an optional `transformer` parameter, and it has a single method, `rip`, which takes a feed URL and returns a stream you can subscribe to.

```javascript
// Assumes RssRipper has been imported from this library.
const defaultRipper = new RssRipper();
const myFeedStream = defaultRipper.rip('http://my-feed-url.com');

myFeedStream.subscribe(([url, id, item]) => {
  // At this point the library has successfully retrieved an item and stored it in LevelDB.
  // We receive the ripped URL, the id that was saved to LevelDB,
  // and the item itself.
  console.log(`Saved item with id ${id} from ${url}:`, item);
});
```

Transformers

A transformer is a plain function, passed in when you create the `RssRipper`, that extracts data from each feed item in whatever shape you prefer.

The default transformer is called pass-through:

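```javascript
import Rx from 'rxjs/Rx'; // RxJS 5-style import (Rx.Observable.of)

// pass-through: emit each item unchanged, keyed by its index
export default (item, index) => Rx.Observable.of([index, item]);
```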

It returns an Observable that emits a two-element array, `[key, value]`, which is stored in LevelDB as the key and the value respectively.

You can extract and transform data from each item as you please, using RxJS to manipulate the stream (e.g. making an AJAX call for every item, or buffering results and making a batch call every N items).

Examples

I've bundled this with a small example that reads paged Atom feeds from the WordPress blog, firing one request every 500 ms until 100 pages have been fetched.

You can run it out of the box with `yarn run examples:wordpress`.