Add command for dumping an Elasticsearch instance in the PaaS #188

Merged (9 commits) on Jan 8, 2024
21 changes: 21 additions & 0 deletions README.md
@@ -160,6 +160,7 @@ All other arguments and options will be passed as-is to the `sdk:query` method.
* [`kourou instance:logs`](#kourou-instancelogs)
* [`kourou instance:spawn`](#kourou-instancespawn)
* [`kourou paas:deploy ENVIRONMENT APPLICATIONID IMAGE`](#kourou-paasdeploy-environment-applicationid-image)
* [`kourou paas:elasticsearch:dump ENVIRONMENT APPLICATIONID DUMPDIRECTORY`](#kourou-paaselasticsearchdump-environment-applicationid-dumpdirectory)
* [`kourou paas:init PROJECT`](#kourou-paasinit-project)
* [`kourou paas:login`](#kourou-paaslogin)
* [`kourou paas:logs ENVIRONMENT APPLICATION`](#kourou-paaslogs-environment-application)
@@ -1057,6 +1058,26 @@ OPTIONS

_See code: [src/commands/paas/deploy.ts](src/commands/paas/deploy.ts)_

## `kourou paas:elasticsearch:dump ENVIRONMENT APPLICATIONID DUMPDIRECTORY`

Dump data from the Elasticsearch of a PaaS application

```
USAGE
$ kourou paas:elasticsearch:dump ENVIRONMENT APPLICATIONID DUMPDIRECTORY

ARGUMENTS
ENVIRONMENT Project environment name
APPLICATIONID Application Identifier
DUMPDIRECTORY Directory where to store dump files

OPTIONS
--help show CLI help
--project=project Current PaaS project
```

_See code: [src/commands/paas/elasticsearch/dump.ts](src/commands/paas/elasticsearch/dump.ts)_
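
Judging from the implementation later in this PR, the command writes `indexes.json` plus one `documents/<n>.json` file per page of hits into `DUMPDIRECTORY`. A minimal sketch of reading such a dump back might look like the following. `readDumpedHits` is a hypothetical helper (not part of kourou), the synchronous `fs` API is used only for brevity, and the fake hits and the empty `indexes.json` are placeholders rather than real output.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper, not part of kourou: loads every hit stored in a dump directory.
function readDumpedHits(dumpDirectory: string): unknown[] {
  const documentsDirectory = path.join(dumpDirectory, "documents");

  const chunkFiles = fs
    .readdirSync(documentsDirectory)
    .filter((file) => file.endsWith(".json"))
    // Chunks are named 0.json, 1.json, ...: sort numerically, not alphabetically.
    .sort((a, b) => parseInt(a, 10) - parseInt(b, 10));

  const hits: unknown[] = [];
  for (const file of chunkFiles) {
    const content = fs.readFileSync(path.join(documentsDirectory, file), "utf8");
    // Each chunk file holds the JSON array of hits from one page.
    hits.push(...(JSON.parse(content) as unknown[]));
  }
  return hits;
}

// Usage: build a tiny fake dump in a temporary directory, then read it back.
// The file contents below are placeholders, not real Elasticsearch output.
const demoDirectory = fs.mkdtempSync(path.join(os.tmpdir(), "es-dump-"));
fs.mkdirSync(path.join(demoDirectory, "documents"), { recursive: true });
fs.writeFileSync(path.join(demoDirectory, "indexes.json"), JSON.stringify({}));
fs.writeFileSync(
  path.join(demoDirectory, "documents", "0.json"),
  JSON.stringify([{ sort: ["a"] }, { sort: ["b"] }])
);
fs.writeFileSync(
  path.join(demoDirectory, "documents", "1.json"),
  JSON.stringify([{ sort: ["c"] }])
);

const demoHits = readDumpedHits(demoDirectory);
console.log(`${demoHits.length} hits read from ${demoDirectory}`);
```

Sorting the chunk names numerically matters once a dump exceeds ten pages, since a plain string sort would place `10.json` before `2.json`.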

## `kourou paas:init PROJECT`

Initialize a PaaS project in current directory
185 changes: 185 additions & 0 deletions src/commands/paas/elasticsearch/dump.ts
@@ -0,0 +1,185 @@
import path from "path";
import fs from "node:fs/promises";

import { flags } from "@oclif/command";

import { PaasKommand } from "../../../support/PaasKommand";

/**
* Results of the document dump action.
*/
type DocumentDump = {
pit_id: string;
hits: DocumentDumpHits;
};

type DocumentDumpHits = {
total: DocumentDumpHitsTotal;
hits: DocumentDumpHit[];
};

type DocumentDumpHitsTotal = {
value: number;
};

type DocumentDumpHit = {
sort: string[];
};

class PaasInit extends PaasKommand {
Contributor: Class name seems odd.

Contributor Author: Yup, that's a copy-paste error! 😅

public static description = "Dump data from the Elasticsearch of a PaaS application";

public static flags = {
help: flags.help(),
project: flags.string({
description: "Current PaaS project",
}),
};

static args = [
{
name: "environment",
description: "Project environment name",
required: true,
},
{
name: "applicationId",
description: "Application Identifier",
required: true,
},
{
name: "dumpDirectory",
description: "Directory where to store dump files",
required: true,
}
];

async runSafe() {
// Log in to the PaaS
const apiKey = await this.getCredentials();

await this.initPaasClient({ apiKey });

const user = await this.paas.auth.getCurrentUser();
this.logInfo(
`Logged as "${user._id}" for project "${this.flags.project || this.getProject()}"`
);

// Create the dump directory
await fs.mkdir(this.args.dumpDirectory, { recursive: true });
Contributor: Does the `{ recursive: true }` option work like mkdirp?

Contributor Author: Yes, it works the same as `mkdir -p`.

await fs.mkdir(path.join(this.args.dumpDirectory, "documents/"), { recursive: true });

// Dump the indexes
this.logInfo("Dumping Elasticsearch indexes...");

const indexesResult = await this.getAllIndexes();
await fs.writeFile(path.join(this.args.dumpDirectory, "indexes.json"), JSON.stringify(indexesResult));

this.logOk("Elasticsearch indexes dumped!");

// Dump all the documents
this.logInfo("Dumping Elasticsearch documents...");
await this.dumpAllDocuments();

this.logOk("Elasticsearch documents dumped!");
this.logOk(`The dumped files are available under "${path.resolve(this.args.dumpDirectory)}"`);
}

/**
* @description Get all indexes from the Elasticsearch of the PaaS application.
* @returns The indexes.
*/
private async getAllIndexes() {
const { result }: any = await this.paas.query({
controller: "application/storage",
action: "getIndexes",
environmentId: this.args.environment,
projectId: this.flags.project || this.getProject(),
applicationId: this.args.applicationId,
body: {},
});

return result;
}

/**
* @description Dump documents from the Elasticsearch of the PaaS application.
* @param pitId ID of the PIT opened on Elasticsearch.
* @param searchAfter Cursor for dumping documents after a certain one.
* @returns The dumped documents.
*/
private async dumpDocuments(pitId: string, searchAfter: string[]): Promise<DocumentDump> {
const { result }: any = await this.paas.query({
controller: "application/storage",
action: "dumpDocuments",
environmentId: this.args.environment,
projectId: this.flags.project || this.getProject(),
applicationId: this.args.applicationId,
body: {
pitId,
searchAfter: JSON.stringify(searchAfter),
},
});

return result;
}

private async dumpAllDocuments() {
// Prepare dumping all documents
let pitId = "";
let searchAfter: string[] = [];

let currentDocumentChunk = 0;
let dumpedDocuments = 0;
let totalDocuments = 0;

// Dump the first batch
let result = await this.dumpDocuments(pitId, searchAfter);
let hits = result.hits.hits;

while (hits.length > 0) {
// Update the PIT ID and the cursor for the next dump
pitId = result.pit_id;
searchAfter = hits[hits.length - 1].sort;

// Save the document
await fs.writeFile(path.join(this.args.dumpDirectory, "documents/", `${currentDocumentChunk++}.json`), JSON.stringify(hits));

dumpedDocuments += hits.length;
totalDocuments = result.hits.total.value;
this.logInfo(`Dumping Elasticsearch documents: ${Math.floor(dumpedDocuments / totalDocuments * 100)}% (${dumpedDocuments}/${totalDocuments})`);

// Dump the next batch
result = await this.dumpDocuments(pitId, searchAfter);
hits = result.hits.hits;
}

// Finish the dump
try {
await this.finishDump(pitId);
Contributor: What about the `finishDump` method if the dump crashes between two pages of documents? Should it still be called?

Contributor Author: Good catch, yes, it should definitely be called.

} catch (error: any) {
this.logInfo("Unable to cleanly finish the dump session:");
console.warn(error);
}
}

/**
* @description Finish the document dumping session.
* @param pitId ID of the PIT opened on Elasticsearch.
*/
private async finishDump(pitId: string) {
await this.paas.query({
controller: "application/storage",
action: "finishDumpDocuments",
environmentId: this.args.environment,
projectId: this.flags.project || this.getProject(),
applicationId: this.args.applicationId,
body: {
pitId,
},
});
}
}

export default PaasInit;
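
Following up on the review thread about calling `finishDump` when a dump fails between two pages: one way to guarantee the cleanup is a `try`/`finally` around the pagination loop. The sketch below is an illustration under stated assumptions, not the code that was merged. `DumpClient`, `dumpAll`, `makeFailingClient`, and the `pit-1` ID are hypothetical stand-ins for the PaaS storage actions used in this file.

```typescript
// Hypothetical page shape, loosely mirroring the DocumentDump type from this PR.
type Hit = { sort: string[] };
type Page = { pit_id: string; hits: { hits: Hit[] } };

// Hypothetical client interface standing in for the PaaS storage actions.
interface DumpClient {
  dumpDocuments(pitId: string, searchAfter: string[]): Promise<Page>;
  finishDump(pitId: string): Promise<void>;
}

async function dumpAll(
  client: DumpClient,
  onPage: (hits: Hit[]) => Promise<void>
): Promise<number> {
  let pitId = "";
  let searchAfter: string[] = [];
  let dumped = 0;

  try {
    let page = await client.dumpDocuments(pitId, searchAfter);
    // Remember the PIT as soon as the server reports one.
    pitId = page.pit_id;

    while (page.hits.hits.length > 0) {
      const hits = page.hits.hits;
      await onPage(hits);
      dumped += hits.length;

      // The sort values of the last hit are the cursor for the next page.
      searchAfter = hits[hits.length - 1].sort;
      page = await client.dumpDocuments(pitId, searchAfter);
      pitId = page.pit_id;
    }
  } finally {
    // Runs on success and on failure; skipped when no PIT was ever opened.
    if (pitId !== "") {
      try {
        await client.finishDump(pitId);
      } catch (error) {
        console.warn("Unable to cleanly finish the dump session:", error);
      }
    }
  }

  return dumped;
}

// Fake client for the demo: serves two pages, then fails on the third request.
function makeFailingClient(closedPits: string[]): DumpClient {
  let calls = 0;
  return {
    async dumpDocuments() {
      calls++;
      if (calls > 2) throw new Error("network error");
      return { pit_id: "pit-1", hits: { hits: [{ sort: [String(calls)] }] } };
    },
    async finishDump(pitId) {
      closedPits.push(pitId);
    },
  };
}

// Usage: the dump fails midway, yet the PIT is still closed.
const closedPits: string[] = [];
const outcome = dumpAll(makeFailingClient(closedPits), async () => {})
  .then((count) => `dumped ${count}`)
  .catch((error: Error) => `failed: ${error.message}`);

outcome.then((message) => console.log(message, closedPits));
```

Recording `pitId` right after each response, rather than only inside the loop, also avoids calling `finishDump` with an empty ID when the first request fails before any PIT exists.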