Francois Berenger edited this page May 7, 2015 · 20 revisions

Semantics of the DAFT CLI commands

start (implicit command)

launch one MDS with a machines file; the MDS then starts all the DSs for you via ssh
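A minimal sketch of the start step, assuming a machines file with one host name per line (blank lines and `#` comments skipped). The command that starts a DS on a remote host is a placeholder (`start_ds`), not DAFT's real command line. The sketch only builds the ssh argument lists, which the MDS could then run, for example with `subprocess.run`.

```python
def parse_machines(text):
    """Return the hosts listed in a machines file.

    Assumed format (hypothetical): one host name per line; blank lines
    and lines starting with '#' are ignored.
    """
    hosts = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            hosts.append(line)
    return hosts


def ds_launch_commands(hosts, ds_command="start_ds"):
    """Build one ssh argument list per host that would start a DS there.

    `ds_command` is a placeholder for whatever starts a DS daemon remotely.
    """
    return [["ssh", host, ds_command] for host in hosts]


if __name__ == "__main__":
    machines = "# my cluster\nnode1\n\nnode2\n"
    for argv in ds_launch_commands(parse_machines(machines)):
        print(" ".join(argv))  # e.g. "ssh node1 start_ds"
```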

ls

query the state of the system (list of files, list and placement of chunks)

put

add a local file into the system

rput (remote put)

put (local), then remote get of the same file

get

retrieve a file from the system

get = extract . fetch

First you fetch the file into the local DS, then you extract it.

extract

Create a soft link to a file in the local data store. Soft linking only works for the Raw storage mode; other storage modes will require more work (decompression or decryption).
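A sketch of extract, assuming the Raw storage mode keeps each file as a single file in the local data store at a known path (`stored_path` is a hypothetical parameter). A soft link is the default; the `copy` flag mirrors the copy option mentioned for extract in the implementation notes. Other storage modes are rejected rather than decoded.

```python
import os
import shutil


def extract(stored_path, dest_path, storage_mode="raw", copy=False):
    """Expose a file held in the local data store at dest_path.

    Assumes (hypothetically) that in raw mode the stored file holds exactly
    the user's bytes, so a soft link, or a plain copy, is enough.
    """
    if storage_mode != "raw":
        # Compressed or encrypted data would have to be decoded first.
        raise NotImplementedError(f"storage mode {storage_mode!r} needs decoding")
    if copy:
        shutil.copyfile(stored_path, dest_path)
    else:
        # Absolute target, so the link stays valid wherever dest_path lives.
        os.symlink(os.path.abspath(stored_path), dest_path)
    return dest_path
```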

fetch

Download file chunks to the local datastore.

rget (remote get)

the CLI asks a remote DS to get a file

cat

Concatenate several files into one. This should be almost instantaneous, since it only manipulates metadata in the MDS.
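To see why cat can stay entirely inside the MDS, consider a toy metadata model in which a file is just an ordered list of chunk descriptors. The `Chunk` and `MetadataServer` names are hypothetical. Per-chunk sizes are recorded because, after a cat, a short final chunk of one source can end up in the middle of the result, so offsets cannot be derived from a fixed chunk size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: str  # hypothetical chunk identifier
    size: int      # length in bytes


class MetadataServer:
    """Toy MDS: maps each file name to its ordered list of chunks."""

    def __init__(self):
        self.files = {}

    def add_file(self, name, chunks):
        if name in self.files:
            raise FileExistsError(name)  # file names are unique
        self.files[name] = list(chunks)

    def cat(self, dest, sources):
        """Create `dest` as the concatenation of `sources`.

        Only chunk lists are joined: no chunk data is read, moved or copied.
        Raises KeyError if a source is unknown (nothing is recorded then).
        """
        if dest in self.files:
            raise FileExistsError(dest)
        chunks = []
        for src in sources:
            chunks.extend(self.files[src])
        self.files[dest] = chunks

    def size(self, name):
        return sum(chunk.size for chunk in self.files[name])
```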

quit

stop the system

Architecture

command line interface (CLI): started on demand with the command to execute, the MDS to connect to (host, port) and the local DS to connect to (host, port)

metadata server (MDS): a daemon; there is only one

data server (DS): a daemon; there is one per machine in the machines file

Implementation

ls

ls interrogates the metadata server only.

put

put adds a file into the system (publish).

The local DS to which the CLI is connected cuts the file into chunks and adds them to the local datastore. Then it notifies the MDS about this file and its chunks. The operation may have to be rolled back on the DS side if the MDS answers that such a file already exists (filenames are unique).
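The put sequence above (cut the file, store the chunks locally, notify the MDS, roll back on a name clash) can be sketched with in-memory stand-ins for the DS datastore and the MDS. Everything here is hypothetical: the class names, the 1 MiB default chunk size, using the file's base name as its name in the system, and random chunk ids (chosen so that a rollback only removes the chunks this put added).

```python
import os
import uuid

CHUNK_SIZE = 1 << 20  # hypothetical default chunk size (1 MiB)


class DuplicateFileError(Exception):
    """Raised by the toy MDS when a file name is already published."""


class LocalDataStore:
    """Toy in-memory datastore of one DS: chunk id -> bytes."""

    def __init__(self):
        self.chunks = {}


class ToyMDS:
    """Toy MDS: file name -> ordered list of (chunk id, set of hosts)."""

    def __init__(self):
        self.files = {}

    def publish(self, name, chunk_ids, host):
        if name in self.files:
            raise DuplicateFileError(name)  # file names are unique
        self.files[name] = [(cid, {host}) for cid in chunk_ids]


def put(path, ds, mds, host, chunk_size=CHUNK_SIZE):
    name = os.path.basename(path)
    chunk_ids = []
    # 1. Cut the file into chunks and add them to the local datastore.
    with open(path, "rb") as f:
        for data in iter(lambda: f.read(chunk_size), b""):
            cid = uuid.uuid4().hex  # fresh id: rollback never touches other files
            ds.chunks[cid] = data
            chunk_ids.append(cid)
    # 2. Notify the MDS; 3. roll back on the DS side if the name is taken.
    try:
        mds.publish(name, chunk_ids, host)
    except DuplicateFileError:
        for cid in chunk_ids:
            del ds.chunks[cid]
        raise
    return chunk_ids
```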

get

get = extract . fetch

fetch = get the file into the local DS (or fail), but don't extract it

extract = extract a file from the local DS (soft link by default, optionally a copy)
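The composition `get = extract . fetch` reads right to left: fetch runs first and extract receives its result. A minimal sketch, with `fetch` and `extract` passed in as plain functions whose contracts here are assumptions: if fetch fails, extract never runs, and fetch remains usable on its own.

```python
def compose(f, g):
    """Return x -> f(g(x)), i.e. `f . g` in the notation above."""
    return lambda x: f(g(x))


class FetchError(Exception):
    """Raised when a file cannot be brought into the local DS."""


def make_get(fetch, extract):
    """Build get = extract . fetch.

    Assumed contracts: fetch(name) returns the file's location in the local
    DS or raises FetchError; extract(location) makes it visible to the user
    and returns the resulting path. An exception from fetch propagates, so
    extract never runs on a missing file.
    """
    return compose(extract, fetch)
```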

get retrieves a file. The file is made of several chunks that are distributed among the DSs. The metadata server knows where these chunks are.

As a consequence, the client must first ask the metadata server where the chunks are, then download the chunks and assemble them. FBR: one use case is inefficient: all DSs asking for the same file, since each DS would only know the chunk placement from when it started and all would download from the same few DSs. Maybe we should query the MDS before each chunk, or at least every few chunks. Maybe the MDS needs to be able to compute state deltas very fast: as soon as one more chunk of a file becomes available somewhere, the state of this file (maybe an int) is increased. A DS should be able to ask for the diff (if any) between the last state it knows about for a given file and the current state known by the MDS.
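One way to realize the state-delta idea, assuming the MDS keeps an append-only log of "chunk i became available on host h" events per file. The file's state is then simply the number of logged events, and a diff is the tail of the log past the caller's last known state, so its cost is proportional to the number of new events. The class and function names are hypothetical.

```python
class ChunkLocations:
    """Per-file chunk placement with a state number (sketch of the idea above).

    Each time a chunk of a file becomes available on some DS, an event is
    appended to that file's log; the file's state is the number of events.
    """

    def __init__(self):
        self.log = {}  # file name -> [(chunk index, host), ...]

    def state(self, name):
        return len(self.log.get(name, []))

    def chunk_available(self, name, chunk_index, host):
        self.log.setdefault(name, []).append((chunk_index, host))
        return self.state(name)  # the new state

    def diff(self, name, known_state):
        """Return (current state, events newer than known_state)."""
        events = self.log.get(name, [])
        # The log is append-only, so the unseen events are exactly its tail.
        return len(events), events[known_state:]


def apply_diff(view, events):
    """DS side: merge new events into a {chunk index: set of hosts} view."""
    for chunk_index, host in events:
        view.setdefault(chunk_index, set()).add(host)
    return view
```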

broadcast

send a file to several hosts

similar to several DSs asking for the same file at the same time, but probably easier to implement efficiently, since a dedicated, very efficient broadcast algorithm can be used
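The notes leave the broadcast algorithm open. As one common choice (not necessarily what DAFT uses), a doubling schedule lets every host that already has the file forward it to one host that does not, so the number of holders doubles each round and n hosts in total are served in ceil(log2(n)) rounds. This sketch plans whole-file transfers only; it ignores chunk-level pipelining and bandwidth differences between hosts.

```python
from collections import deque


def doubling_schedule(source, targets):
    """Plan a broadcast from `source` as rounds of (sender, receiver) pairs.

    In every round, each host that already holds the file sends it to one
    host that does not, so the number of holders (at most) doubles per round.
    """
    holders = [source]
    pending = deque(targets)
    rounds = []
    while pending:
        transfers = []
        for sender in holders:  # only hosts that held the file before this round
            if not pending:
                break
            transfers.append((sender, pending.popleft()))
        holders.extend(receiver for _, receiver in transfers)
        rounds.append(transfers)
    return rounds
```

For example, a source and four targets (five hosts) need three rounds: one transfer, then two, then the last one.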