CS 425 Membership
Prerequisites
Have Node.js and npm installed on each VM, and clone the repo on each one.
Streaming Architecture Explained
Through the CLI tool, you can read the dataset as a stream and send it row by row. The rows go to the master, which acts as the spout that receives the data. The master then sends each row on to multiple bolts, following the topology and deciding where each row goes based on the state machine.
Every machine runs an Express server and can serve as either a bolt or a spout; it chooses what to do based on the response it is sent. To get the data, run get-stream, which streams all the results from the last run back to you through the CLI tool.
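As a rough illustration (not the exact routes used in this repo), a bolt's Express handler might look something like the sketch below; the /tuple route, port, and transform function are placeholders, not the repo's real API.

```ts
import express from "express";

const app = express();
app.use(express.json());

// A placeholder transform a bolt might apply to each incoming row.
const transform = (row: { line: string }): string => row.line.toUpperCase();

// Each row arrives as an HTTP request; the response tells the sender how the
// tuple was handled so it can decide where to route it next.
app.post("/tuple", (req, res) => {
  const result = transform(req.body);
  res.json({ status: "ok", result });
});

app.listen(8080, () => console.log("bolt listening on port 8080"));
```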
To catch errors, the sender waits for a response and, after a given amount of time without one, sends the request again. If the master dies, the system fails over to the backup master (the second-lowest VM number), which already has all the state information saved.
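Below is a minimal sketch of that retry-on-timeout idea, assuming a hypothetical helper that POSTs a row and resends it whenever no acknowledgment arrives before the timeout; the function name, timeout value, and JSON framing are illustrative, not the repo's actual code.

```ts
// Hypothetical retry helper: POST a row and resend if no ack arrives in time.
async function sendWithRetry(url: string, row: unknown, timeoutMs = 2000): Promise<unknown> {
  for (;;) {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    try {
      const res = await fetch(url, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(row),
        signal: controller.signal,
      });
      return await res.json(); // the ack from the receiving node
    } catch {
      // No ack before the timeout: send the same request again.
      console.log(`no ack from ${url} within ${timeoutMs}ms, resending`);
    } finally {
      clearTimeout(timer);
    }
  }
}
```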
To write your own applications, define a topology that specifies the functions for the maps, filters, and reduces, then define a state machine (resolver) at the bottom and export it. To run it, just run “put-stream <task_name>”. If you want to do an aggregation, define an aggregation and it will be handled on the master; the bolts don’t maintain state, but the master does and can therefore handle the aggregations.
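The exact file format is defined by the repo, but a topology might look roughly like the following sketch, where the stage names, the resolver signature, and the word-count example are assumptions made for illustration only.

```ts
// Placeholder row type and stages; the real API is defined by the repo.
type Row = { line: string };

const topology = {
  // Stateless bolt stages: a map and a filter.
  split: (row: Row): string[] => row.line.split(" "),
  nonEmpty: (words: string[]): string[] => words.filter((w) => w.length > 0),
  // Aggregation: runs on the master, which is the only node that keeps state.
  count: (counts: Record<string, number>, words: string[]): Record<string, number> => {
    for (const w of words) counts[w] = (counts[w] ?? 0) + 1;
    return counts;
  },
};

// The resolver (state machine) decides which stage each tuple visits next.
const resolver = (stage: string): string =>
  stage === "split" ? "nonEmpty" : stage === "nonEmpty" ? "count" : "done";

export { topology, resolver };
```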
CLI Tool
The CLI tool has several commands that let it interact with the introducers and machines to tell a machine to join or leave, or to list that machine's membership list.
To set it up, run the following commands:
cd cli
npm install
npm run build
npm link
This links the mem command to your command line, so you can run all the commands below.
mem -h
Explains all the available commands.
mem join <VM_number>
Adds the specified VM to the group.
mem leave <VM_number>
Voluntarily removes the specified VM from the group.
mem list <VM_number>
Lists the membership list local to the specified VM.
mem put <localfilename> <sdfsfilename>
Writes localfilename to the SDFS as sdfsfilename.
mem get <sdfsfilename> <localfilename>
Reads sdfsfilename from the SDFS to localfilename.
mem delete <sdfsfilename>
Removes sdfsfilename and all its versions from the SDFS.
mem ls <sdfsfilename>
Lists all the VMs that currently store sdfsfilename.
mem store <VM_number>
Lists all SDFS files stored at the specified VM.
mem get-versions <sdfsfilename> <numversions> <localfilename>
Writes the names of the latest numversions versions of sdfsfilename to localfilename.
mem put-stream <name>
Streams data into the system and executes the application.
mem get-stream
Streams the results back from the sink.
SDFS
To set it up, run the following commands:
cd sdfs
npm install
npm run start
We designed the CLI tool to interact with the VMs through a REST API, over which it sends and receives the relevant information. Any time it reads, writes, or deletes files, the file data is streamed over WebSockets.
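As a hedged illustration of the WebSocket streaming idea (the port, message framing, and chunking here are assumptions, not the repo's actual protocol), a client-side put might look like this:

```ts
import { createReadStream } from "fs";
import WebSocket from "ws";

// Stream a local file to a VM chunk by chunk over a WebSocket.
function putFile(localPath: string, sdfsName: string, vmHost: string): void {
  const ws = new WebSocket(`ws://${vmHost}:8080`);

  ws.on("open", () => {
    // Tell the receiver which SDFS file the following chunks belong to.
    ws.send(JSON.stringify({ type: "put", sdfsName }));

    const stream = createReadStream(localPath);
    stream.on("data", (chunk) => ws.send(chunk));
    stream.on("end", () => {
      ws.send(JSON.stringify({ type: "done" }));
      ws.close();
    });
  });
}

putFile("local.txt", "remote.txt", "localhost");
```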
Each VM checks the status of the other VMs using the SWIM protocol: every machine consistently sends syn-ack messages to the same set of 4 machines. If it doesn’t receive any “acks” from a peer for a set period of time, it tells the master that the node has died, and all the metadata is then updated and resent to all the nodes. This metadata includes what files have been uploaded, which VMs have which files, what versions exist of each file, which machines are up, and who the master is.
Since all the nodes already have all of this data, master re-election is easy: any node that is alive can be picked. If a machine dies or leaves, the system looks at where all the files are stored and replicates them appropriately. The CLI tool simply pings the machines in order, and the first one it finds up is treated as the master.
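A simplified sketch of that ping/ack loop is shown below; the peer list, endpoint names, ports, and timings are illustrative assumptions rather than the repo's actual values.

```ts
// Illustrative peer set, master address, and timings.
const peers = ["http://vm2:8080", "http://vm3:8080", "http://vm4:8080", "http://vm5:8080"];
const master = "http://vm1:8080";
const TIMEOUT_MS = 1500;

async function pingPeer(peer: string): Promise<void> {
  try {
    // The "syn": ask the peer to respond; a 200 response acts as the "ack".
    const res = await fetch(`${peer}/ping`, { signal: AbortSignal.timeout(TIMEOUT_MS) });
    if (!res.ok) throw new Error(`bad status ${res.status}`);
  } catch {
    // No ack before the timeout: report the suspected failure to the master,
    // which updates the metadata and resends it to every node.
    await fetch(`${master}/failed`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ peer }),
    }).catch(() => {});
  }
}

// Ping the same set of peers on every tick.
setInterval(() => peers.forEach((p) => void pingPeer(p)), 1000);
```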
In addition, we generate log files when a file is written to a machine, downloaded by the user, or deleted. We also log updates to the metadata during replication or when machines leave, die, or join. This allows us to check that total order is maintained between events. We used our code from MP1 to help us debug this functionality through these log files.
Getting logs
In the introducer and machine, instead of running npm run start
as the last command, run npm run start > log.txt
which will write all the output to a file. Otherwise, if the terminal is open, you can see the logs printed there.