
Reproducibility of SurvTRACE: Transformers for Survival Analysis with Competing Events

Original Paper and Repository

This repository is based on, and inspired by, the following work:

Zifeng Wang and Jimeng Sun. 2021. SurvTRACE: Transformers for Survival Analysis with Competing Events.

You can find the paper at https://arxiv.org/abs/2110.00855 and the repository at https://github.com/RyanWangZf/SurvTRACE.

How to configure the environment

Create the conda environment from our pre-saved environment file, survtrace.yml, and activate it:

```
conda env create --name survtrace --file=survtrace.yml
conda activate survtrace
```

Then install the repository as an editable package:

```
pip install -e .
```

Alternatively, install the dependencies listed in requirements.txt:

```
pip3 install -r requirements.txt
```
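Before running any experiments, a quick import check can confirm that the environment is usable. The sketch below only tests whether modules can be found on the import path; it does not check versions. The module list is an assumption: pycox is the only dependency this README names, so extend the list with the packages from requirements.txt.

```python
import importlib.util

# Hypothetical list: pycox is the only dependency named in this README;
# add the remaining packages from requirements.txt as needed.
REQUIRED_MODULES = ["pycox"]


def missing_modules(names):
    """Return the top-level module names that cannot be found on the import path."""
    missing = []
    for name in names:
        # find_spec returns None when a top-level module is not installed
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_modules(REQUIRED_MODULES)
    if absent:
        print("Missing modules:", ", ".join(absent))
    else:
        print("All required modules are importable.")
```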

How to get the data

Our experiments use the following three datasets:

Study to Understand Prognoses and Preferences for Outcomes and Risks of Treatments (SUPPORT) (Knaus et al. 1995).
Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) (Curtis et al. 2012).
Surveillance, Epidemiology, and End Results Program (SEER).

The SUPPORT and METABRIC datasets are provided by the pycox package. Access to SEER, on the other hand, must be requested by following the instructions in the next subsection.
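For a quick look at the two public datasets outside the Makefile pipeline, pycox exposes them as `pycox.datasets.support` and `pycox.datasets.metabric`, each with a `read_df()` method that downloads the data on first use. The wrapper below is a minimal sketch; the function name `load_pycox_dataset` and its optional `datasets_module` parameter are hypothetical and exist only to make the example testable without a network connection.

```python
def load_pycox_dataset(name, datasets_module=None):
    """Return the SUPPORT or METABRIC data frame shipped with pycox.

    pycox downloads the data the first time read_df() is called.
    `datasets_module` is a hypothetical hook for offline testing; by
    default the real pycox.datasets module is used.
    """
    if datasets_module is None:
        # Requires pycox to be installed (see the environment setup above)
        from pycox import datasets as datasets_module

    loaders = {
        "support": datasets_module.support,
        "metabric": datasets_module.metabric,
    }
    key = name.lower()
    if key not in loaders:
        raise ValueError(f"unknown dataset {name!r}; expected one of {sorted(loaders)}")
    return loaders[key].read_df()
```

For example, `load_pycox_dataset("metabric")` returns the METABRIC table as a pandas DataFrame. SEER is not included because, as noted above, it has to be requested separately.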

How to get the SEER dataset

1. Go to https://seer.cancer.gov/data/ and request access to the SEER data by following the guide there.
2. After completing step 1, you should have the SEER*Stat software for data access. Open it and sign in with the username and password sent to you by SEER.
3. Use SEER*Stat to open the ./data/external/seer.sl file, then click the 'Execute' icon to submit the request to the SEER database. This produces a CSV file.
4. Move the CSV file to ./data/raw/seer_raw.csv, then run the processing script to create the processed data:
```
make seer
```
The processed SEER data is written to ./data/processed/seer_processed.csv.
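If you prefer to script step 4, the sketch below moves the CSV exported by SEER*Stat into the location `make seer` expects and checks whether the processed file exists afterwards. The helper names `stage_seer_csv` and `seer_is_processed` are hypothetical and not part of this repository; only the two paths come from the steps above. You still run `make seer` yourself between the two calls.

```python
import shutil
from pathlib import Path

# Paths taken from the SEER instructions, relative to the repository root
RAW_SEER = Path("data/raw/seer_raw.csv")
PROCESSED_SEER = Path("data/processed/seer_processed.csv")


def stage_seer_csv(exported_csv, repo_root="."):
    """Move the SEER*Stat export to data/raw/seer_raw.csv and return the new path."""
    src = Path(exported_csv)
    if not src.is_file():
        raise FileNotFoundError(f"SEER export not found: {src}")
    dest = Path(repo_root) / RAW_SEER
    # Create data/raw/ if it does not exist yet
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(src), str(dest))
    return dest


def seer_is_processed(repo_root="."):
    """True once `make seer` has written data/processed/seer_processed.csv."""
    return (Path(repo_root) / PROCESSED_SEER).is_file()
```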

Running the experiments

You can run all the steps needed to produce the results with a single command:

```
make run
```

Alternatively, you can run each step separately:

Clean previously generated files, if any exist.
```
make clean
```
Process the SEER dataset.
```
make seer
```
Generate the datasets. By default this performs 10 runs; to change that, pass the optional NUM_RUNS argument (the square brackets only mark it as optional and are not typed).
```
make datasets [-e NUM_RUNS=10]
```
Run the experiments. By default this performs 10 runs; to change that, pass the optional NUM_RUNS argument.
```
make experiments [-e NUM_RUNS=10]
```
Print the results.
```
make results
```
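The separate steps can also be driven from Python, for example to change NUM_RUNS in one place. This is a minimal sketch that mirrors the step list above; it is not necessarily what `make run` does internally, since the Makefile is not reproduced here. The function names `pipeline_commands` and `run_pipeline` are hypothetical.

```python
import subprocess

# Make targets from the step list above, in the order they are run
STEPS = ["clean", "seer", "datasets", "experiments", "results"]
# Only these two targets are documented as accepting NUM_RUNS
TAKES_NUM_RUNS = {"datasets", "experiments"}


def pipeline_commands(num_runs=10):
    """Build the `make` invocation for each step, passing NUM_RUNS where documented."""
    if num_runs < 1:
        raise ValueError("num_runs must be a positive integer")
    commands = []
    for target in STEPS:
        cmd = ["make", target]
        if target in TAKES_NUM_RUNS:
            cmd += ["-e", f"NUM_RUNS={num_runs}"]
        commands.append(cmd)
    return commands


def run_pipeline(num_runs=10, dry_run=True):
    """Print each command; execute it only if dry_run is False, stopping at the first failure."""
    for cmd in pipeline_commands(num_runs):
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if a make target fails
            subprocess.run(cmd, check=True)
```

The sketch keeps the `-e` flag exactly as documented. In GNU make, a `NUM_RUNS=...` assignment given on the command line already overrides a default set in the Makefile, so the flag is harmless either way.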