Start realtime twitter stream ingestion into local AsterixDB

QIUSHI BAI edited this page Feb 20, 2019 · 2 revisions

Note: the scripts in this guide do not work when they are run with sh, so please use the commands exactly as given below.

Prerequisites

1. Get your own Twitter API access tokens

Refer to https://developer.twitter.com/en/docs/basics/authentication/guides/access-tokens.html to get your own Twitter developer access keys and tokens.

2. Make sure your local AsterixDB is up

cd apache-asterixdb-0.9.5-SNAPSHOT/opt/local/bin/
./start-sample-cluster.sh

Step 1. Fill your Twitter API access tokens

Fill in your Twitter API access tokens at the corresponding places in examples/twittermap/script/streamFeed.sh:

-ck
Your Consumer Key
-cs
Your Consumer Secret
-tk
Your Access token
-ts
Your Access Secret Token
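
Before running the ingestion, it can help to confirm that no placeholder text is left in the script. The following is a minimal sketch, assuming your copy of streamFeed.sh contains the placeholder strings listed above (the actual file may be worded differently). has_placeholder_tokens is a hypothetical helper name, not part of the repository.

```shell
# Hypothetical helper: exits with status 0 if the given file still contains
# one of the placeholder strings from this step, and non-zero otherwise.
# The pattern is case-insensitive and accepts both "Acces" and "Access".
has_placeholder_tokens() {
    grep -q -i -E \
        'Your (Consumer Key|Consumer Secret|Access Token|Access? Secret Token)' \
        "$1"
}

# Usage, from the repository root:
#   if has_placeholder_tokens examples/twittermap/script/streamFeed.sh; then
#       echo "streamFeed.sh still contains placeholder tokens" >&2
#   fi
```

The check only looks for leftover placeholder text. It cannot tell whether the keys you entered are valid.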

Step 2. Create and start Feed in AsterixDB

Open http://localhost:19001/

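Copy the following statements into the Query box and run them. They assume that the twitter dataverse, the typeTweet datatype, and the ds_tweet dataset already exist.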

use twitter;
create feed TweetFeed with {
    "adapter-name" : "socket_adapter",
    "sockets" : "asterix_nc1:10001",
    "address-type" : "nc",
    "type-name" : "typeTweet",
    "format" : "adm",
    "upsert-feed" : "false"
};

connect feed TweetFeed to dataset ds_tweet;
start feed TweetFeed;

Step 3. Run the ingestion

cd examples/twittermap

./script/streamFeed.sh

Realtime tweets are now being ingested into your local AsterixDB.

You can check the number of tweets ingested on this page: http://localhost:19002/admin/active
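
As a cross-check, the dataset can also be counted from a terminal. The sketch below assumes that the AsterixDB HTTP query service is reachable at http://localhost:19002/query/service and that curl is installed. tweet_count_statement, count_tweets, and the ASTERIX_QUERY_URL variable are hypothetical names introduced for illustration. The dataverse and dataset names are the ones used in Step 2.

```shell
# Hypothetical helper: prints a SQL++ statement that counts the records of a
# dataset. Defaults to the twitter dataverse and ds_tweet dataset of Step 2.
tweet_count_statement() {
    printf 'select count(*) from %s.%s;\n' "${1:-twitter}" "${2:-ds_tweet}"
}

# Hypothetical helper: sends the statement to the query service and prints
# the raw JSON response. The URL is an assumption; override it by setting
# ASTERIX_QUERY_URL if your cluster listens elsewhere.
count_tweets() {
    curl -s --data-urlencode "statement=$(tweet_count_statement "$@")" \
        "${ASTERIX_QUERY_URL:-http://localhost:19002/query/service}"
}

# Usage:
#   count_tweets                 # counts twitter.ds_tweet
#   count_tweets twitter ds_tweet
```

Running count_tweets twice a few seconds apart should show a growing count while the feed is active.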

Additional information about streamFeed.sh

Spatial range of tweets

The line -loc -173.847656,17.644022,-65.390625,70.377854 restricts ingestion to tweets within this geographic bounding box, which roughly covers the U.S.

You can change it to any other area you are interested in. To get the coordinates of an area, open Google Maps and click a point to see its latitude and longitude.

The -loc parameter uses the pattern [Southwestern Corner], [Northeastern Corner]. As the example above shows, each corner is written with the longitude first and the latitude second.
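
Because it is easy to swap latitude and longitude when copying coordinates, a small check can catch mistakes before the script is edited. This is a minimal sketch. make_loc_arg is a hypothetical helper, and the rule that the western edge must be smaller than the eastern edge is an assumption of this sketch (it rejects boxes that cross the 180° meridian).

```shell
# Hypothetical helper: validates a bounding box and prints the -loc argument.
# Arguments: SW longitude, SW latitude, NE longitude, NE latitude.
make_loc_arg() {
    if [ "$#" -ne 4 ]; then
        echo "usage: make_loc_arg SW_LON SW_LAT NE_LON NE_LAT" >&2
        return 2
    fi
    # awk performs the floating-point comparisons that plain shell tests
    # cannot do: longitudes in [-180, 180], latitudes in [-90, 90], and the
    # southwestern corner strictly below and left of the northeastern one.
    if awk -v w="$1" -v s="$2" -v e="$3" -v n="$4" 'BEGIN {
            ok = (w >= -180 && e <= 180 && s >= -90 && n <= 90 && w < e && s < n)
            exit(ok ? 0 : 1)
        }'; then
        printf '%s %s,%s,%s,%s\n' "-loc" "$1" "$2" "$3" "$4"
    else
        echo "invalid bounding box: $1,$2,$3,$4" >&2
        return 1
    fi
}

# Usage:
#   make_loc_arg -173.847656 17.644022 -65.390625 70.377854
#   prints: -loc -173.847656,17.644022,-65.390625,70.377854
```

Passing the same corners with latitude and longitude swapped fails the range check, because -173.847656 is not a valid latitude.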

Keyword filter of tweets

After the -tr parameter you can list keywords so that only tweets containing those keywords are ingested, e.g. -tr hurricane, storm, tornado.

To file only

If you only want to save the raw tweets to gzipped JSON files, add the -fo parameter to the end of streamFeed.sh.