- ./ (Home directory: The workflow is DataFiles -> ReadData -> MultiplicativeAlgorithm)
- ReadData
- Archive
- MultiplicativeAlgorithm
...
...
- MultiplicativeAlgorithm/ (Contains the meat)
- CSNMF.ipynb does most of the work
## Importing into Jupyter Notebook
For example, to import everything from `EndChecker.py` into a notebook, add the line:
`from util.EndChecker import *`
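
If the notebook lives in a subdirectory, the `util` package may not be importable until its parent directory is on `sys.path`. The sketch below is one way to handle that, assuming `util/` sits in the repository root (the README does not say where it is); `add_repo_root_to_path` is a hypothetical helper, not part of this repository.

```python
import sys
from pathlib import Path


def add_repo_root_to_path(start=".", marker="util"):
    """Walk upward from `start` until a directory containing `marker/` is found,
    put that directory on sys.path, and return it.

    Assumption: the `util` package lives in the repository root.
    """
    start = Path(start).resolve()
    for candidate in [start, *start.parents]:
        if (candidate / marker).is_dir():
            if str(candidate) not in sys.path:
                sys.path.insert(0, str(candidate))
            return candidate
    raise FileNotFoundError(f"No '{marker}/' directory found above {start}")
```

After calling `add_repo_root_to_path()` in the first cell, the `from util.EndChecker import *` line should resolve regardless of which folder the notebook was opened from.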
## Detailed Steps
1. Clone the repository.
2. Go to https://databank.illinois.edu/datasets/IDB-4900670 and download these data files into DataFiles/:
   - nodes.csv
   - links.csv
   - travel_times_2011.csv
3. Run all cells in ReadData/ReadData.ipynb. This creates the following files in MultiplicativeAlgorithm/:
   - D_2011.csv
   - full_link_ids.txt
   - D_2011_full_links.csv
   - D_trips.txt and D_traveltimes.txt

   Of these, full_link_ids.txt and D_trips.txt are important for running CSNMF. D_traveltimes.txt can also be used as input to the CSNMF algorithm, but this does not currently work because of bad hyperparameters.
4. Run all cells in MultiplicativeAlgorithm/cSNMF.ipynb.
   - config.py contains global variables set for the current dataset. Some must stay fixed (e.g. the dataset has 260855 links and 8760 hours), while others can be changed by the user, such as the rank of the decomposition and whether to run the seeded or the randomized algorithm.
   - \_\_init\_\_.py initializes the logger.
   - The cell that runs cSNMF.factorize() uses the global variables set in config.py, but these can be overridden for experimentation.
   - The W and H matrices are saved as txt files in MultiplicativeAlgorithm/.
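
Before running step 4, it can help to confirm that the files from steps 2 and 3 are in place. The following is a minimal sketch, assuming it is run from the repository root and that DataFiles/ and MultiplicativeAlgorithm/ sit directly under it. The file names come from the steps above; `EXPECTED_FILES` and `find_missing` are illustrative names, not part of the repository.

```python
from pathlib import Path

# File names are copied from steps 2 and 3; the folder layout is an assumption.
EXPECTED_FILES = {
    "DataFiles": ["nodes.csv", "links.csv", "travel_times_2011.csv"],
    "MultiplicativeAlgorithm": [
        "D_2011.csv",
        "full_link_ids.txt",
        "D_2011_full_links.csv",
        "D_trips.txt",
        "D_traveltimes.txt",
    ],
}


def find_missing(repo_root=".", expected=EXPECTED_FILES):
    """Return the relative paths of expected files that do not exist."""
    root = Path(repo_root)
    return [
        f"{folder}/{name}"
        for folder, names in expected.items()
        for name in names
        if not (root / folder / name).is_file()
    ]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing files:")
        for path in missing:
            print("  " + path)
    else:
        print("All expected files are present.")
```

An empty result means the downloads and the ReadData outputs are where the later steps expect them; it does not check that the file contents are valid.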