What is this project?

Project title: Traffic Patterns in New York City

Team members: Richard Sowers (@r-sowers), Daniel B. Work, Derrek Yager (@yager2), Vaibhav Karve (@vkarve2), Marzieh Abolhelm (@abolhel2).

What does this code do?

  • It factorizes a matrix D into two smaller matrices W and H such that:
    • D, W, H all have non-negative entries
    • Each column of W sums to 1
    • Columns of H are sparse
  • It applies this matrix factorization to study taxi traffic patterns in New York City.
  • Explanation of the theory can be found in the paper Low_Rank_Manhattan_Traffic.pdf.
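
For readers who want a feel for the constraints listed above before opening the paper, here is a minimal NumPy sketch. It is not the algorithm implemented in this repository: it uses standard Lee-Seung style multiplicative updates with an L1 penalty on H as a stand-in for sparsity, and rescales W after every step so that each column sums to 1. The function name `sparse_nmf_sketch` and the parameters `lam`, `n_iter`, `seed` and `eps` are assumptions made for illustration.

```python
import numpy as np


def sparse_nmf_sketch(D, rank, lam=0.1, n_iter=200, seed=0, eps=1e-12):
    """Illustrative only: approximate D by W @ H with W, H >= 0.

    Columns of W are rescaled to sum to 1, and an L1 penalty (weight lam)
    pushes entries of H toward zero. This is NOT the repository's cSNMF code.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = D.shape
    # Strictly positive random start, so multiplicative updates stay non-negative.
    W = rng.random((n_rows, rank)) + eps
    H = rng.random((rank, n_cols)) + eps

    for _ in range(n_iter):
        # Multiplicative update for H; lam in the denominator shrinks H.
        H *= (W.T @ D) / (W.T @ W @ H + lam + eps)
        # Multiplicative update for W.
        W *= (D @ H.T) / (W @ H @ H.T + eps)
        # Rescale so each column of W sums to 1; the product W @ H is unchanged
        # because the scale factor is moved into the matching row of H.
        col_sums = W.sum(axis=0) + eps
        W /= col_sums
        H *= col_sums[:, None]

    return W, H


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    D = rng.random((20, 30))          # small synthetic non-negative matrix
    W, H = sparse_nmf_sketch(D, rank=3)
    print("column sums of W:", W.sum(axis=0))
    print("relative error:", np.linalg.norm(D - W @ H) / np.linalg.norm(D))
```

The rescaling step is what enforces the column-sum constraint; larger values of `lam` trade reconstruction accuracy for sparser H.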

Minimal requirements for running this code?

How is this repository organized?

  • ./ (Home directory: The workflow is DataFiles -> ReadData -> MultiplicativeAlgorithm)
    • ReadData
    • Archive
    • MultiplicativeAlgorithm
    • DataFiles
  • ReadData/ (Contains ReadData.ipynb for importing data into matrix format)
  • MultiplicativeAlgorithm/ (Contains the core factorization code)
    • CSNMF.ipynb does most of the work

Detailed Steps

  1. Clone Repository

  2. Go to https://databank.illinois.edu/datasets/IDB-4900670 and download these data files to DataFiles:

    • nodes.csv
    • links.csv
    • travel_times_2011.csv
  3. Run all cells in ReadData/ReadData.ipynb. This creates the following files in MultiplicativeAlgorithm/:

    • D_2011.csv
    • full_link_ids.txt
    • D_2011_full_links.csv
    • D_trips.txt and D_traveltimes.txt

    Of these, full_link_ids.txt and D_trips.txt are the important ones for running CSNMF. D_traveltimes.txt can also be used as input to the CSNMF algorithm, but this does not currently work because of badly chosen hyperparameters.

  4. Run all cells in MultiplicativeAlgorithm/cSNMF.ipynb.

    • config.py contains global variables set for the current dataset. Some are fixed by the dataset (e.g. there are 260855 links and 8760 hours), while others can be modified by the user, such as the rank of the decomposition and whether to run the seeded or the randomized algorithm.
    • __init__.py initializes the logger.
    • The cell running cSNMF.factorize() uses the global variables set in config.py, but these can be overridden for experimentation.
    • The W and H matrices are saved as .txt files in MultiplicativeAlgorithm/.
  5. If desired, run ExtremeEvents.ipynb and Visualizations.ipynb to analyze factorizations.

    • ExtremeEvents captures days and signatures that deviate from median behavior. It generates X.txt, a matrix of extreme events.
    • Visualizations maps and plots temporal and spatial trends, e.g. a map of which links use signature i, or a plot of every weekday (Monday, Tuesday, etc.) layered for each signature.
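
To make "deviate from median behavior" in step 5 concrete, here is one possible way to flag unusual days. It is a sketch under assumptions, not the method used in ExtremeEvents.ipynb: it assumes H has one row per signature and one column per hour, splits each row into 24-hour days, and scores each day by its distance from the median day using the median absolute deviation (MAD). The function name `flag_extreme_days` and the `threshold` parameter are hypothetical, and the boolean matrix it returns is not claimed to match the format of X.txt.

```python
import numpy as np


def flag_extreme_days(H, hours_per_day=24, threshold=3.0):
    """Illustrative only: flag (signature, day) pairs far from the median day.

    H is assumed to have shape (rank, n_hours). Returns a boolean array of
    shape (rank, n_days). This is NOT the repository's ExtremeEvents code.
    """
    rank, n_hours = H.shape
    n_days = n_hours // hours_per_day
    # Drop any trailing partial day, then view each row as (days, hours).
    daily = H[:, : n_days * hours_per_day].reshape(rank, n_days, hours_per_day)

    # Median daily profile per signature, shape (rank, 1, hours_per_day).
    median_day = np.median(daily, axis=1, keepdims=True)
    # Total absolute deviation of each day from that profile, shape (rank, n_days).
    deviation = np.abs(daily - median_day).sum(axis=2)

    # Robust z-score of the deviations using the MAD (1.4826 makes the MAD
    # comparable to a standard deviation for normally distributed data).
    center = np.median(deviation, axis=1, keepdims=True)
    mad = np.median(np.abs(deviation - center), axis=1, keepdims=True)
    score = (deviation - center) / (1.4826 * mad + 1e-12)
    return score > threshold


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pattern = np.tile(np.sin(np.linspace(0, np.pi, 24)), 365)  # 8760 hours
    H = np.vstack([pattern, pattern]) + 0.01 * rng.random((2, 8760))
    H[0, 100 * 24 : 101 * 24] += 5.0   # inject an unusual day for signature 0
    flags = flag_extreme_days(H)
    print("flagged days for signature 0:", np.flatnonzero(flags[0]))
```

Using the median and MAD rather than the mean and standard deviation keeps a handful of extreme days from distorting the baseline they are compared against.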