- May 06, 2018
-
-
rkr2 authored
-
rkr2 authored
-
Michael J Kresca authored
-
Michael J Kresca authored
-
- May 05, 2018
-
-
Michael J Kresca authored
-
Michael J Kresca authored
-
- May 04, 2018
-
-
Michael J Kresca authored
-
Michael J Kresca authored
-
Michael J Kresca authored
1) Added back the cluster configuration work I had started... it got blown away because the .doc did not merge well.
-
Nischol Antao authored
-
Nischol Antao authored
-
- May 03, 2018
-
-
Michael J Kresca authored
1) Started working on the cluster configuration section of the report. Not done yet, but have a start.
-
- May 02, 2018
-
-
Nischol Antao authored
-
- May 01, 2018
-
-
Nischol Antao authored
-
Nischol Antao authored
-
- Apr 29, 2018
-
-
Nischol Antao authored
Did performance measurements for question 4 and created an IPython notebook for it as well. Rob can edit the markdown for this.
-
- Apr 28, 2018
-
-
Nischol Antao authored
-
- Apr 24, 2018
-
-
Nischol Antao authored
1) Ran code for questions 1-3 on pyspark in cluster mode, with multiple nodes. Measured and captured the difference in performance between running it on a single EC2 instance and running it on a cluster.
2) Added some screenshots for the final report to show the cluster configuration.
3) Added IPython notebooks for performance metrics in local mode.
4) Added JSON files for Zeppelin notebooks.
5) Created new source files for pyspark code run in Zeppelin notebooks in cluster mode.
6) Added test results for question 3 when using Hive to calculate the median data.
7) Added R code from Rob for question 3 local exploration.
8) Renamed some of the local exploration files.
-
- Apr 22, 2018
-
-
Nischol Antao authored
-
- Apr 21, 2018
-
-
Nischol Antao authored
Added guidelines for the final report, and updated the report with the new sections we should include.
-
Nischol Antao authored
Finished Spark implementation for question 1. Need to do some visualizations for this, and write the final report for it. Added IPython notebooks that detail the code for question 1. Formalized the local pandas source and the remote pyspark source, and did a performance comparison.
-
- Apr 17, 2018
-
-
Nischol Antao authored
-