Commit 9af53a9e authored by yager2

Update README.md

parent 73b753e2
## What is this project?
**Project title:** Taffic Patterns in New York City
**Project title:** Traffic Patterns in New York City
**Team members:** Richard Sowers, Derrek Yager, Vaibhav Karve, Marzieh Abolhelm.
**Team members:** Richard Sowers (@r-sowers), Daniel B. Work, Derrek Yager (@yager2), Vaibhav Karve (@vkarve2), Marzieh Abolhelm (@abolhel2).
## What does this code do?
- It factorizes a matrix D into two smaller matrices W and H such that:
- D, W, H all have non-negative entries
- Column sum of W is 1 for each column
- H is sparse
- Columns of H are sparse
- It applies matrix factorization to study traffic patterns of taxi-travel in New York City.
- Explanation of the theory can be found in the paper Low_Rank_Manhattan_Traffic.pdf.
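The constraints listed above can be illustrated with a small NumPy sketch. This is not the repository's cSNMF implementation: it uses plain Lee-Seung multiplicative updates as a stand-in, the function name `nmf_column_normalized` and its parameters are hypothetical, and it does not enforce the sparsity of the columns of H. It only shows how non-negativity and unit column sums of W can be maintained together.

```python
import numpy as np


def nmf_column_normalized(D, rank, n_iter=200, eps=1e-9, seed=0):
    """Factorize a non-negative D (m x n) as W (m x rank) @ H (rank x n).

    Assumption: standard Lee-Seung multiplicative updates, not the
    repository's cSNMF. No sparsity penalty on H is applied here.
    """
    rng = np.random.default_rng(seed)
    m, n = D.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep W and H non-negative when D is.
        H *= (W.T @ D) / (W.T @ W @ H + eps)
        W *= (D @ H.T) / (W @ H @ H.T + eps)
        # Rescale so each column of W sums to 1, moving the scale into
        # the matching row of H; the product W @ H is unchanged.
        col_sums = np.maximum(W.sum(axis=0), eps)
        W /= col_sums
        H *= col_sums[:, None]
    return W, H
```

Calling `W, H = nmf_column_normalized(D, rank=2)` on a small non-negative array returns factors whose product approximates `D`, with every column of `W` summing to 1.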
## Minimal requirements for running this code?
......@@ -23,7 +24,7 @@
## How is this repository organized?
- ./ (Home directory: DataFiles -> ReadData -> MultiplicativeAlgorithm)
- ./ (Home directory: The workflow is DataFiles -> ReadData -> MultiplicativeAlgorithm)
- ReadData
- Archive
- MultiplicativeAlgorithm
......@@ -32,9 +33,23 @@
- MultiplicativeAlgorithm/ (Contains the meat)
- CSNMF.ipynb does most of the work
## Importing into Jupyter Notebook
For example, to import everything from `EndChecker.py` into a notebook, add the following line:
`from util.EndChecker import *`
\ No newline at end of file
## Detailed Steps
1. Clone Repository
2. Go to https://databank.illinois.edu/datasets/IDB-4900670 and download these data files to DataFiles:
- nodes.csv
- links.csv
- travel_times_2011.csv
3. Run all cells in ReadData/ReadData.ipynb. This creates the following files in MultiplicativeAlgorithm/:
- D_2011.csv
- full_link_ids.txt
- D_2011_full_links.csv
- D_trips.txt and D_traveltimes.txt
Of these, full_link_ids.txt and D_trips.txt are required for running CSNMF. D_traveltimes.txt can also be used as input to the CSNMF algorithm, but that option does not currently work because of badly chosen hyperparameters.
4. Run all cells in MultiplicativeAlgorithm/cSNMF.ipynb.
- config.py contains global variables set for the current dataset. Some must stay fixed (e.g. there are 260855 links and 8760 hours), while others can be changed by the user, such as the rank of the decomposition and whether to run the seeded or the randomized algorithm.
- \_\_init\_\_.py initializes the logger.
- The cell running cSNMF.factorize() uses the global variables set in config.py, but these can be overridden for experimentation.
- The W and H matrices are saved as txt files to MultiplicativeAlgorithm/.
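Before running the notebooks, it can help to confirm that the files named in steps 2 and 3 are where the notebooks expect them. The sketch below is a hypothetical helper, not part of the repository: the file names come from the steps above, while the assumption that both folders sit directly under the repository root is ours.

```python
from pathlib import Path

# File names are taken from the steps above. The layout relative to the
# repository root is an assumption.
EXPECTED_FILES = {
    "DataFiles": ["nodes.csv", "links.csv", "travel_times_2011.csv"],
    "MultiplicativeAlgorithm": [
        "D_2011.csv",
        "full_link_ids.txt",
        "D_2011_full_links.csv",
        "D_trips.txt",
        "D_traveltimes.txt",
    ],
}


def missing_files(repo_root="."):
    """Return the expected files that are not present under repo_root."""
    root = Path(repo_root)
    return [
        (Path(folder) / name).as_posix()
        for folder, names in EXPECTED_FILES.items()
        for name in names
        if not (root / folder / name).is_file()
    ]
```

Running `missing_files()` from the repository root after step 2 should list only the MultiplicativeAlgorithm/ entries, and after step 3 it should return an empty list.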
\ No newline at end of file