RSS ripper

This little library uses streams to fetch data from RSS endpoints and saves the transformed data to a LevelDB database for further use (did anyone say ML?).

Setup

cd repository
nvm use
npm i -g yarn
yarn

Usage

RssRipper is the exposed class. It receives an optional "transformer" parameter when it is created and has a single method, rip, which takes a feed URL and returns a stream you can subscribe to.

const defaultRipper = new RssRipper();
const myFeedStream = defaultRipper.rip('http://my-feed-url.com');
myFeedStream.subscribe(([url, id, item]) => {
  // At this point the library has retrieved an item and stored it in LevelDB.
  // We receive the ripped URL, the id that was saved to LevelDB,
  // and the item itself.
  console.log(`Saved item with id ${id} from ${url}: ${item}`);
});

Transformers

A transformer is a simple function, plugged in when the RssRipper is initialized, that extracts the data from each item however you prefer.

The default transformer is called pass-through:

export default (item, index) => Rx.Observable.of([index, item])

It returns an Observable that emits a two-element array, [key, value], which is stored in LevelDB as the key and the value respectively.
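As a rough sketch of what a custom transformer might compute, the function below builds the [key, value] pair from an item. The helper name toKeyValue and the item fields guid and title are assumptions for illustration, not part of the library; the real item shape depends on the feed. It is kept as a plain function so it runs without Rx.js; in an actual transformer its result would be wrapped with Rx.Observable.of, as the default transformer does.

```javascript
// Hypothetical helper: builds the [key, value] pair a transformer would emit.
// The `guid` and `title` fields are assumed; real items depend on the feed.
function toKeyValue(item, index) {
  // Prefer a stable identifier from the item, fall back to the stream index.
  const key = item.guid !== undefined ? String(item.guid) : String(index);
  // Keep only the fields we care about, serialized as a string value.
  const value = JSON.stringify({ title: item.title });
  return [key, value];
}

// In a transformer this would become:
//   (item, index) => Rx.Observable.of(toKeyValue(item, index))
console.log(toKeyValue({ guid: 'post-42', title: 'Hello' }, 0));
```

Deriving the key from the item rather than from the index means re-ripping the same feed would overwrite existing entries instead of storing them under new keys, assuming the feed provides a stable identifier.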

You can extract and transform data from the item as you please, leveraging the power of Rx.js to manipulate the stream (e.g. making an AJAX call for every item, or buffering results and making a batch call every N items).

Examples

I've bundled this with a small example that reads paged Atom RSS feeds from the WordPress blog, firing a call every 500 ms to fetch 100 pages in total.

You can run the example out of the box with yarn run examples:wordpress.
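The pacing used by the example (one call every 500 ms, 100 pages in total) can be sketched as a plain schedule of page URLs and delays. The function name pageSchedule, the base URL, and the paged query parameter are assumptions for illustration; the bundled example may build its requests differently.

```javascript
// Hypothetical sketch: one request per page, spaced by a fixed interval.
function pageSchedule(baseUrl, pages, intervalMs) {
  const schedule = [];
  for (let page = 1; page <= pages; page += 1) {
    schedule.push({
      url: `${baseUrl}?paged=${page}`, // assumed paging parameter
      delayMs: (page - 1) * intervalMs, // first page fires immediately
    });
  }
  return schedule;
}

const schedule = pageSchedule('http://my-feed-url.com/feed/', 100, 500);
console.log(schedule.length, schedule[1]);
```

Each entry's url could then be passed to rip once its delay has elapsed.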
