Reproducibility of SurvTRACE: Transformers for Survival Analysis with Competing Events
Original Paper and Repository
This repository is based on and inspired by the following work:
Zifeng Wang and Jimeng Sun. 2021. SurvTRACE: Transformers for Survival Analysis with Competing Events.
You can find the paper at https://arxiv.org/abs/2110.00855 and the repository at https://github.com/RyanWangZf/SurvTRACE.
How to configure the environment
Create and activate the conda environment from the provided survtrace.yml file:
conda env create --name survtrace --file=survtrace.yml
conda activate survtrace
Then install the repository as a package:
pip install -e .
Alternatively, install the dependencies from requirements.txt:
pip3 install -r requirements.txt
How to get the data
We run our experiments on three datasets:
- Study to Understand Prognoses and Preferences for Outcomes and Risks of Treatments (SUPPORT) (Knaus et al. 1995).
- Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) (Curtis et al. 2012).
- Surveillance, Epidemiology, and End Results Program (SEER).
pycox provides the SUPPORT and METABRIC datasets. Access to SEER must be requested by following the instructions in the next subsection.
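As a minimal sketch, the two public datasets can be read through the dataset objects that pycox exposes, which download the data on first use. The helper name `load_pycox_dataset` is hypothetical and not part of this repository, and the sketch assumes pycox is installed in the active environment.

```python
def load_pycox_dataset(name):
    """Return a pycox dataset as a pandas DataFrame.

    Hypothetical helper for illustration; not part of this repository.
    """
    if name not in ("support", "metabric"):
        raise ValueError(f"unknown dataset: {name!r}")
    # Imported lazily so the name check above works without pycox installed.
    from pycox import datasets

    # pycox downloads the data on first use and keeps a local copy.
    return getattr(datasets, name).read_df()
```

For example, `load_pycox_dataset("metabric")` should return a frame holding the covariates together with the `duration` and `event` columns.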
How to get the SEER dataset
- Go to https://seer.cancer.gov/data/ and submit a data request to SEER, following the guide there.
- After completing the first step, you should have the SEER*Stat software for data access. Open it and sign in with the username and password sent by SEER.
- Use SEER*Stat to open the ./data/external/seer.sl file. Click the 'Execute' icon to query the SEER database. This produces a csv file.
- Move the csv file to ./data/raw/seer_raw.csv, then run the following command to create the processed data:
make seer
This produces the processed SEER data, seer_processed.csv, in ./data/processed/.
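The file locations named in the steps above can be checked with a short script before and after running the processing command. This is only a sketch: the helper `seer_status` is hypothetical, and it assumes it is given the repository root as its starting directory.

```python
from pathlib import Path

# Paths taken from the steps above, relative to the repository root.
SEER_QUERY = Path("data/external/seer.sl")
SEER_RAW = Path("data/raw/seer_raw.csv")
SEER_PROCESSED = Path("data/processed/seer_processed.csv")


def seer_status(root="."):
    """Report which SEER files exist under `root` (hypothetical helper)."""
    root = Path(root)
    return {
        path.as_posix(): (root / path).is_file()
        for path in (SEER_QUERY, SEER_RAW, SEER_PROCESSED)
    }
```

Calling `seer_status()` from the repository root returns a dictionary mapping each path to `True` or `False`, which makes it easy to see whether the raw export still needs to be moved into place or whether processing has already produced its output.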
Running the experiments
You can run all the steps for creating the results with the following command.
make run
Alternatively, you can run each step separately:
- Remove previously generated files, if they exist.
make clean
- Process the SEER dataset.
make seer
- Generate the datasets. By default this performs 10 runs; set the NUM_RUNS argument to change it.
make datasets [-e NUM_RUNS=10]
- Run the experiments. By default this performs 10 runs; set the NUM_RUNS argument to change it.
make experiments [-e NUM_RUNS=10]
- Print the results.
make results
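If you want to run the individual steps in sequence with a non-default number of runs, the commands listed above can be chained from a small driver. The sketch below is an assumption about how one might do this, not something shipped with the repository: `make_commands` and `run_pipeline` are hypothetical names, and NUM_RUNS is passed only to the two targets for which it is documented above.

```python
import subprocess

# Targets in the order listed above.
STEPS = ("clean", "seer", "datasets", "experiments", "results")
# The steps above document the NUM_RUNS argument for these two targets only.
TAKES_NUM_RUNS = ("datasets", "experiments")


def make_commands(num_runs=10):
    """Build one `make` command per step (hypothetical helper)."""
    commands = []
    for step in STEPS:
        command = ["make", step]
        if step in TAKES_NUM_RUNS:
            command += ["-e", f"NUM_RUNS={num_runs}"]
        commands.append(command)
    return commands


def run_pipeline(num_runs=10):
    """Run every step in order, stopping at the first failing command."""
    for command in make_commands(num_runs):
        subprocess.run(command, check=True)
```

Calling `run_pipeline(num_runs=3)` from the repository root would execute the five targets in order with three runs for dataset generation and experiments.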