Elasticsearch Adapter

QIUSHI BAI edited this page Dec 28, 2020 · 21 revisions

This page explains how to use Elasticsearch and Cloudberry to set up a small instance of TwitterMap on a local machine.

Requirements:

  • System: Linux or macOS

  • Python 3.0+ (make sure Python scripts can be run with the command python3)

  • Java 8 SDK and sbt

  • At least 2 GB of memory

1. Set up Elasticsearch

Step 1.1: Create a directory named quick-start under your home directory and enter it:

mkdir ~/quick-start
cd ~/quick-start

Step 1.2: Download Elasticsearch

wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.7.2.tar.gz

Step 1.3: Extract the archive

tar -xzf elasticsearch-6.7.2.tar.gz

Step 1.4: Enter the elasticsearch-6.7.2/ directory

cd elasticsearch-6.7.2/

Step 1.5: Run Elasticsearch

  • ./bin/elasticsearch

  • Or start it in daemon mode: ./bin/elasticsearch -d -p pid

    • To shut down Elasticsearch in daemon mode, kill the process whose ID is recorded in the pid file:

      pkill -F pid

  • Wait until you see the following messages:

[INFO ][o.e.n.Node               ] [7Z9-8gl] initialized
[INFO ][o.e.n.Node               ] [7Z9-8gl] starting ...
[INFO ][o.e.t.TransportService   ] [7Z9-8gl] publish_address {127.0.0.1:9300}, bound_addresses {[::1]:9300}, {127.0.0.1:9300}
[INFO ][o.e.c.s.MasterService    ] [7Z9-8gl] zen-disco-elected-as-master ([0] nodes joined), reason: new_master {7Z9-8gl}{7Z9-8glaTTi6WWF-OFP1hw}{manJsZAtS1aj7RnQjM550Q}{127.0.0.1}{127.0.0.1:9300}{ml.machine_memory=8589934592, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}
[INFO ][o.e.c.s.ClusterApplierService] [7Z9-8gl] new_master {7Z9-8gl}{7Z9-8glaTTi6WWF-OFP1hw}{manJsZAtS1aj7RnQjM550Q}{127.0.0.1}{127.0.0.1:9300}{ml.machine_memory=8589934592, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}, reason: apply cluster state (from master [master {7Z9-8gl}{7Z9-8glaTTi6WWF-OFP1hw}{manJsZAtS1aj7RnQjM550Q}{127.0.0.1}{127.0.0.1:9300}{ml.machine_memory=8589934592, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true} committed version [1] source [zen-disco-elected-as-master ([0] nodes joined)]])
[INFO ][o.e.h.n.Netty4HttpServerTransport] [7Z9-8gl] publish_address {127.0.0.1:9200}, bound_addresses {[::1]:9200}, {127.0.0.1:9200}
[INFO ][o.e.n.Node               ] [7Z9-8gl] started
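
When Elasticsearch runs in daemon mode, these messages are not printed to the terminal. One alternative is to poll the HTTP port until the node answers. The following is a minimal Python sketch, assuming the default address http://localhost:9200 shown in the log above. The function names http_ready and wait_until are illustrative and are not part of Elasticsearch or Cloudberry.

```python
import time
import urllib.error
import urllib.request


def http_ready(url="http://localhost:9200", timeout=2.0):
    """Return True if an HTTP GET to the node succeeds with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or HTTP error: the node is not ready yet.
        return False


def wait_until(check, attempts=30, delay=2.0):
    """Call check() up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

From a python3 session, `wait_until(http_ready)` returns True once the node responds, or False after roughly a minute with the default settings.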

Step 1.6: Check the health status of your Elasticsearch cluster

Open a new terminal window and run:

  • curl -X GET "localhost:9200/_cluster/health?pretty"

    The cluster health status must be green or yellow. A red status means that at least one primary shard is not allocated in the cluster.
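
The same check can be scripted. The sketch below is one possible way to do it in Python, assuming the cluster is reachable at localhost:9200 as in the curl command. The helper names fetch_health and is_usable are made up for this example.

```python
import json
import urllib.request

# Assumption: same host and port as the curl command in Step 1.6.
HEALTH_URL = "http://localhost:9200/_cluster/health"


def is_usable(health):
    """Return True when the reported cluster status is green or yellow."""
    return health.get("status") in ("green", "yellow")


def fetch_health(url=HEALTH_URL, timeout=5.0):
    """Fetch the cluster health document and decode it into a dict."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `is_usable(fetch_health())` evaluates to False when the status is red, in which case the following steps should not be attempted yet.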

2. Install Cloudberry & TwitterMap

  • Clone the Cloudberry GitHub repository:
cd ~/quick-start

git clone https://github.com/ISG-ICS/cloudberry.git

3. Download and ingest sample tweets into Elasticsearch

Step 3.1: Download sample tweets data file

cd ~/quick-start/cloudberry/examples/twittermap/script/

wget http://cloudberry.ics.uci.edu/img/sample.json.gz

Note: This file is sample.json.gz, which is different from the sample.adm.gz file used in the Quick Start tutorial.

Step 3.2: Ingest sample tweets into the Elasticsearch cluster

cd ~/quick-start/cloudberry/examples/twittermap/

./script/ingestTweetToElasticCluster.sh

When the script completes, you should see something similar to the following messages:

[info] Showing high-level information about indices in Elasticsearch cluster AFTER ingesting data...

health status index            uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   twitter.ds_tweet fQiZx9wBQNKkqRB9fMw9Xw   4   0      73348            0       58mb           58mb


[success] Finish ingesting tweets
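
To confirm the result later without rerunning the script, the document count can be read from the `_cat/indices?v` endpoint, which returns a table like the one above. The following Python sketch parses that table. It assumes the index name twitter.ds_tweet and the localhost:9200 address used in this tutorial; docs_count and fetch_cat_indices are hypothetical helper names.

```python
import urllib.request

# Assumption: default local address; "?v" asks Elasticsearch to include the header row.
CAT_URL = "http://localhost:9200/_cat/indices?v"


def docs_count(cat_output, index="twitter.ds_tweet"):
    """Return docs.count for `index` from `_cat/indices?v` text, or None if absent."""
    rows = [line.split() for line in cat_output.splitlines() if line.strip()]
    if not rows:
        return None
    header, body = rows[0], rows[1:]
    name_col = header.index("index")
    count_col = header.index("docs.count")
    for row in body:
        if len(row) > count_col and row[name_col] == index:
            return int(row[count_col])
    return None


def fetch_cat_indices(url=CAT_URL, timeout=5.0):
    """Download the plain-text index table from the cluster."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the sample output shown above, `docs_count` yields 73348 for twitter.ds_tweet. A result of None suggests the index was not created.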

4. Configure Cloudberry

Edit file: ~/quick-start/cloudberry/cloudberry/neo/conf/application.conf

Step 4.1: Comment out lines 89 and 96, which are the AsterixDB configurations.

  • line 89: asterixdb.url = "http://localhost:19002/query/service"

  • line 96: asterixdb.lang = SQLPP

Step 4.2: Uncomment lines 93 and 101, which are the Elasticsearch configurations.

  • line 93: #elasticsearch.url = "http://localhost:9200"

  • line 101: #asterixdb.lang = elasticsearch

Step 4.3: Update lines 86 and 87 to tune the DRUM parameters so that they are friendlier to Elasticsearch.

  • line 86: berry.firstquery.gap = "60 days"

  • line 87: berry.query.gap = "180 days"
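
Line numbers can shift between versions of application.conf, so it may be safer to locate the settings by their keys. The sketch below shows one way to apply Steps 4.1 to 4.3 to the file's text in Python. It assumes the settings appear in the file as quoted in the steps above; the function name switch_to_elasticsearch is illustrative and not part of Cloudberry.

```python
import re

# Active AsterixDB settings that Step 4.1 comments out.
ASTERIX_LINE = re.compile(r"^\s*asterixdb\.(url\s*=|lang\s*=\s*SQLPP\b)")
# Commented-out Elasticsearch settings that Step 4.2 enables.
ELASTIC_LINE = re.compile(
    r"^(\s*)#\s*(elasticsearch\.url\s*=.*|asterixdb\.lang\s*=\s*elasticsearch\s*)$"
)
# DRUM parameter values from Step 4.3.
GAPS = {"berry.firstquery.gap": "60 days", "berry.query.gap": "180 days"}


def switch_to_elasticsearch(conf_text):
    """Return the config text with Steps 4.1 to 4.3 applied."""
    result = []
    for line in conf_text.splitlines():
        key = line.split("=", 1)[0].strip()
        enabled = ELASTIC_LINE.match(line)
        if ASTERIX_LINE.match(line):
            result.append("#" + line)
        elif enabled:
            result.append(enabled.group(1) + enabled.group(2).rstrip())
        elif key in GAPS:
            result.append('{} = "{}"'.format(key, GAPS[key]))
        else:
            result.append(line)
    return "\n".join(result) + "\n"
```

To use it, read ~/quick-start/cloudberry/cloudberry/neo/conf/application.conf, pass its contents to the function, and write the result back after keeping a backup copy. Applying the function twice gives the same result as applying it once.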

5. Configure TwitterMap

Edit file: ~/quick-start/cloudberry/examples/twittermap/web/conf/application.conf

Step 5.1: Update lines 94 and 96 to configure the start date and end date of temporal queries.

  • line 94: startDate = "2019-01-04T18:29:23.000"

  • line 96: endDate = "2019-11-10T09:00:23.000"

6. Now you can start Cloudberry & TwitterMap as in Quick Start!

To start Cloudberry & TwitterMap, see Step 2.2 and Step 2.4 in Quick Start.
