Update README.md

Commit 9af53a9e authored 6 years ago by yager2
--- a/README.md
+++ b/README.md
 ## What is this project?
-**Project title:** Taffic Patterns in New York City
+**Project title:** Traffic Patterns in New York City

-**Team members:** Richard Sowers, Derrek Yager, Vaibhav Karve, Marzieh Abolhelm.
+**Team members:** Richard Sowers (@r-sowers), Daniel B. Work, Derrek Yager (@yager2), Vaibhav Karve (@vkarve2), Marzieh Abolhelm (@abolhel2).


 ## What does this code do?
 - It factorizes a matrix D into two smaller matrices W and H such that:
    - D, W, H all have non-negative entries
    - Column sum of W is 1 for each column
-    - H is sparse
+    - Columns of H are sparse
 - It applies matrix factorization to study traffic patterns of taxi-travel in New York City.
+- An explanation of the theory can be found in the paper Low_Rank_Manhattan_Traffic.pdf.


 ## Minimal requirements for running this code?
@@ -23,7 +24,7 @@


 ## How is this repository organized?
- ./             (Home directory: DataFiles -> ReadData -> MultiplicativeAlgorithm)
+- ./             (Home directory: The workflow is DataFiles -> ReadData -> MultiplicativeAlgorithm)
    - ReadData
    - Archive
    - MultiplicativeAlgorithm
@@ -32,9 +33,23 @@
 - MultiplicativeAlgorithm/      (Contains the meat)
    - CSNMF.ipynb does most of the work
    
-   
-   
-## Importing into Jupyter Notebook
-For example we can import everything in `EndChecker.py` to a notebook by adding a line as such:
-
-`from util.EndChecker import *`
\ No newline at end of file
+## Detailed Steps
+1. Clone the repository.
+2. Go to https://databank.illinois.edu/datasets/IDB-4900670 and download these data files to DataFiles:
+    - nodes.csv
+    - links.csv
+    - travel_times_2011.csv
+3. Run all cells in ReadData/ReadData.ipynb. This creates the following files in MultiplicativeAlgorithm/:
+    - D_2011.csv
+    - full_link_ids.txt
+    - D_2011_full_links.csv
+    - D_trips.txt and D_traveltimes.txt 
+
+Of these, full_link_ids.txt and D_trips.txt are the files needed to run CSNMF. D_traveltimes.txt can also be used as input to the CSNMF algorithm, but
+this does not currently work because the hyperparameters are poorly tuned.
+
+4. Run all cells in MultiplicativeAlgorithm/cSNMF.ipynb. 
+    - config.py contains global variables set for the current dataset. Some must stay fixed (e.g. there are 260855 links and 8760 hours), while others can be modified by the user, such as the rank of the decomposition and whether to run the seeded or the randomized algorithm.
+    - \_\_init\_\_.py initializes the logger.
+    - The cell running cSNMF.factorize() uses the global variables set in config.py, but these can be overridden for experimentation.
+    - The W and H matrices are saved as txt files to MultiplicativeAlgorithm/.
\ No newline at end of file
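
The constraints listed under "What does this code do?" can be illustrated on a small random matrix. The sketch below is a generic multiplicative-update non-negative factorization with an L1 penalty on H. It is not the repository's cSNMF.factorize(), whose signature and update rules are not shown in this commit. The names `nmf_multiplicative`, `check_constraints`, and the `sparsity` parameter are hypothetical and chosen only for this example.

```python
import numpy as np


def nmf_multiplicative(D, rank, n_iter=200, sparsity=0.0, eps=1e-12, seed=0):
    """Generic multiplicative-update NMF sketch (hypothetical helper, not cSNMF)."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = D.shape
    # Random strictly positive starting point (the "randomized" style of init).
    W = rng.random((n_rows, rank)) + eps
    H = rng.random((rank, n_cols)) + eps
    for _ in range(n_iter):
        # Update H; the L1 penalty in the denominator shrinks entries toward zero.
        H *= (W.T @ D) / (W.T @ W @ H + sparsity + eps)
        # Update W with the standard multiplicative rule.
        W *= (D @ H.T) / (W @ (H @ H.T) + eps)
        # Rescale so each column of W sums to 1 while leaving W @ H unchanged.
        col_sums = W.sum(axis=0) + eps
        W /= col_sums
        H *= col_sums[:, None]
    return W, H


def check_constraints(D, W, H, tol=1e-8):
    """Report whether D, W, H satisfy the constraints listed in the README."""
    return {
        "non_negative": bool((D >= 0).all() and (W >= 0).all() and (H >= 0).all()),
        "W_columns_sum_to_1": bool(np.allclose(W.sum(axis=0), 1.0, atol=tol)),
        # Fraction of (near-)zero entries in each column of H.
        "H_column_zero_fraction": (H <= tol).mean(axis=0),
    }


# Toy non-negative data standing in for the real link-by-hour matrix.
D = np.random.default_rng(1).random((30, 20))
W, H = nmf_multiplicative(D, rank=3, sparsity=0.1)
report = check_constraints(D, W, H)
rel_err = np.linalg.norm(D - W @ H) / np.linalg.norm(D)
print(report["non_negative"], report["W_columns_sum_to_1"], round(float(rel_err), 3))
```

A check like `check_constraints` could also be run on the W and H files written in step 4 once they are loaded as arrays. The file names and loading code are left out here because this commit does not specify them.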