Add command for dumping an Elasticsearch instance in the PaaS #188

Kuruyia · 2023-06-09T16:08:27Z

What does this PR do?

This adds the paas:elasticsearch:dump command that allows PaaS clients to easily dump the indexes and documents from an Elasticsearch instance of a PaaS application. This PR follows the addition of the application/storage controller in the PaaS console.

This command asks for the PaaS environment name and application ID (a project name can also be specified), and for a directory in which to dump the Elasticsearch data.
The directory structure is created automatically.

First, the indexes are dumped using the application/storage:getIndexes action, and are stored in the <dump_dir>/indexes.json file.

Then, the documents are dumped using the application/storage:dumpDocuments action, and are stored in the <dump_dir>/documents.jsonl file. This is a newline-delimited JSON (JSONL) file.
Because a large number of documents could make the response very large, the PaaS console only returns a limited number of documents per request. Kourou sends requests to the PaaS console repeatedly until there are no more documents to dump. The number of documents requested per batch can be set with the batch-size flag, but keep in mind that the PaaS console may also cap the number of documents returned at once.
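For readers who want to see the batching idea in isolation, here is a minimal sketch of such a loop. It is not the code from this PR: the `Batch` shape, the opaque `cursor`, and the `fetchBatch` callback are hypothetical stand-ins for the call to application/storage:dumpDocuments, whose real request and response format is not shown here.

```typescript
// Hypothetical result of one request: a batch of documents plus an opaque
// cursor for the next request. The real response shape is not shown in this PR.
type Batch<T> = { documents: T[]; cursor: string | null };

// Hypothetical stand-in for a call to application/storage:dumpDocuments.
type FetchBatch<T> = (
  batchSize: number,
  cursor: string | null,
) => Promise<Batch<T>>;

// Requests batches until one comes back empty, handing each document to
// `writeLine` as a single JSON line. Returns the number of documents dumped.
async function dumpDocuments<T>(
  fetchBatch: FetchBatch<T>,
  batchSize: number,
  writeLine: (line: string) => void,
): Promise<number> {
  let cursor: string | null = null;
  let total = 0;

  for (;;) {
    const batch: Batch<T> = await fetchBatch(batchSize, cursor);
    if (batch.documents.length === 0) {
      return total; // no more documents: the dump is complete
    }
    for (const document of batch.documents) {
      // One JSON value per line is what makes the output newline-delimited JSON.
      writeLine(JSON.stringify(document) + "\n");
    }
    total += batch.documents.length;
    cursor = batch.cursor;
  }
}
```

In a real command, `writeLine` would typically append to a write stream opened on documents.jsonl, so that only one batch is held in memory at a time.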

Finally, the dump session is cleanly terminated and the user is informed where to find their dump files.

Other changes

Types for Node.js were updated to Node 18.

rolljee

LGTM, but some modifications are needed.

rolljee · 2023-07-03T07:08:54Z

src/commands/paas/elasticsearch/dump.ts

+  sort: string[];
+};
+
+class PaasInit extends PaasKommand {


Classname seems odd

Yup, that's a copy-paste error! 😅

rolljee · 2023-07-03T07:09:41Z

src/commands/paas/elasticsearch/dump.ts

+    );
+
+    // Create the dump directory
+    await fs.mkdir(this.args.dumpDirectory, { recursive: true });


Does the { recursive: true } option work like mkdirp?

Yes, it works the same as mkdir -p
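A small sketch illustrating that behaviour, assuming Node.js with the built-in fs, os and path modules. The `ensureDirectory` and `demo` names are made up for this example and are not part of the PR.

```typescript
import { promises as fs } from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Creates `dumpDirectory` and any missing parents. With `recursive: true`,
// calling it again on an existing directory is not an error.
async function ensureDirectory(dumpDirectory: string): Promise<void> {
  await fs.mkdir(dumpDirectory, { recursive: true });
}

// Illustrative usage under a throwaway temporary directory.
async function demo(): Promise<string> {
  const root = await fs.mkdtemp(path.join(os.tmpdir(), "dump-"));
  const target = path.join(root, "a", "b", "c");
  await ensureDirectory(target); // creates a, a/b and a/b/c
  await ensureDirectory(target); // second call succeeds without changes
  return target;
}
```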

Aschen · 2023-07-05T12:31:33Z

Why use multiple files instead of a single file in JSONL format?
We use this format with a stream for the index/collection dump to avoid high memory usage.

I see some potential problems with the current design:

Having multiple files can end up producing literally thousands of small files, which is not efficient.
A limit of 10 documents is really low, and the overhead of a network request should be considered: for instance, 100,000 documents would need 10,000 requests. This limit should be configurable.

Actually, most of the issues you will encounter here were already discussed and solved for the index/collection export commands. You can have a look at how it was done.
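To make the request-count arithmetic concrete, here is a tiny sketch. `requestCount` is a hypothetical helper written for this illustration, not a function from Kourou, and it only counts the requests that actually return documents.

```typescript
// Number of requests needed to fetch `documentCount` documents when each
// request returns at most `batchSize` of them.
function requestCount(documentCount: number, batchSize: number): number {
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new RangeError("batchSize must be a positive integer");
  }
  return Math.ceil(documentCount / batchSize);
}

console.log(requestCount(100_000, 10)); // 10000
console.log(requestCount(100_000, 1_000)); // 100
```

Raising the batch size from 10 to 1,000 cuts the number of round trips by a factor of 100, which is why a configurable limit matters here.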

Aschen · 2023-07-05T12:35:16Z

src/commands/paas/elasticsearch/dump.ts

+
+    // Finish the dump
+    try {
+      await this.finishDump(pitId);


What about the finishDump method if the dump crashes between two pages of documents? Should it still be called?

Good catch, yes it should totally be called.
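One common way to guarantee that kind of cleanup is a try/finally block. The sketch below is only an illustration of the pattern, not the fix that landed in this PR: `withDumpSession`, `open`, `close` and `work` are hypothetical names, with `close` standing in for finishDump.

```typescript
// Runs `work` inside a dump session and always tries to close the session
// afterwards, whether `work` succeeded or threw.
async function withDumpSession<T>(
  open: () => Promise<string>,
  close: (sessionId: string) => Promise<void>,
  work: (sessionId: string) => Promise<T>,
): Promise<T> {
  const sessionId = await open();
  try {
    return await work(sessionId);
  } finally {
    try {
      await close(sessionId);
    } catch (error) {
      // A failed cleanup is reported but must not hide an error from `work`.
      console.error("Could not close the dump session:", error);
    }
  }
}
```

With this shape, an error thrown while fetching the second page of documents still reaches the caller, and the session is closed before it does.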

Kuruyia · 2023-07-05T16:41:51Z

Why use multiple files instead of a single file in JSONL format?

I didn't know about that file format, I will look at that 😄

A limit of 10 documents is really low, and the overhead of a network request should be considered: for instance, 100,000 documents would need 10,000 requests. This limit should be configurable.

You're right, I think I just forgot to make this limit configurable 😅

Actually, most of the issues you will encounter here were already discussed and solved for the index/collection export commands. You can have a look at how it was done.

👍

Kuruyia added 2 commits June 9, 2023 17:43

feat: add command for dumping an Elasticsearch instance in the PaaS

935f210

docs: add README section for the PaaS Elasticsearch dump command

2692da9

Kuruyia added the changelog:new-features label Jun 9, 2023

Kuruyia requested review from alexandrebouthinon, rolljee and OlivierCavadenti June 9, 2023 16:08

Kuruyia self-assigned this Jun 9, 2023

rolljee requested changes Jul 3, 2023

Aschen reviewed Jul 5, 2023

rolljee linked an issue Nov 23, 2023 that may be closed by this pull request

Implements dump command for paas elastic search #197

Closed

Kuruyia added 7 commits January 3, 2024 10:30

chore: rename PaaS ES dump class

77bae1e

feat: add flag to the PaaS ES dump command to configure the document batch size

edd3440

fix: attempt to cleanly finish dumping PaaS ES documents if dumping failed

16fad0f

feat: store dumped PaaS ES documents in a single JSONL file

8b0cd2c

docs: update README

1eb4b2c

chore: merge 1-dev and fix conflicts

f3126d8

feat: check the batch size

0c64151

rolljee approved these changes Jan 8, 2024

rolljee merged commit a66bfe9 into 1-dev Jan 8, 2024
4 checks passed

rolljee deleted the feat/paas-dump-es branch January 8, 2024 09:54

rolljee mentioned this pull request Jan 16, 2024

Release 0.28.0 #203

Merged
