# National NLP Clinical Challenges (n2c2)
# 598 Reproducibility Project

[![Codacy Badge](https://api.codacy.com/project/badge/Grade/61f8f04b341a482f95b9a38073575860)](https://app.codacy.com/app/michelole/n2c2?utm_source=github.com&utm_medium=referral&utm_content=bst-mug/n2c2&utm_campaign=badger)
[![Build Status](https://travis-ci.org/bst-mug/n2c2.svg?branch=master)](https://travis-ci.org/bst-mug/n2c2)
[![Coverage Status](https://coveralls.io/repos/github/bst-mug/n2c2/badge.svg?branch=master)](https://coveralls.io/github/bst-mug/n2c2?branch=master)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)

A repository containing support code and resources developed at the [Institute for Medical Informatics, Statistics and Documentation at the Medical University of Graz (Austria)](https://www.medunigraz.at/imi/en/) for participation in the [2018 n2c2 Shared-Task Track 1](https://n2c2.dbmi.hms.harvard.edu/), organized by the Department of Biomedical Informatics at Harvard Medical School.

## Steps for Using This Code Repo

1. Obtain the original input dataset. One way is to submit a data access request at https://portal.dbmi.hms.harvard.edu/projects/n2c2-nlp/.
2. Once you have the input dataset, place the test and training data under the `/data/test` and `/data/train` folders, respectively.
3. Run the program `SentenceDumper.java` to generate the `sentences.txt` file.
4. Run the program `VocabularyDumper.java` to generate the `vocab.txt` file.
5. Install the fastText program on your working machine. You can follow https://fasttext.cc/docs/en/supervised-tutorial.html to install it.
6. Copy the `sentences.txt` file into the `scripts` folder, then run the script `train_embeddings.sh`, which will generate the `n2c2-fasttext` model.
7. Copy the `vocab.txt` file into the `scripts` folder. Download `BioWordVec_PubMed_MIMICIII_d200.bin` from https://github.com/ncbi-nlp/BioSentVec. Run the script `print_pre_trained_vectors.sh` to generate the pre-trained embeddings and the script `print_self_trained_vectors.sh` to generate the self-trained embeddings. Then copy both embedding files into the same folder as the class files generated by Java.
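The embedding steps above can be sketched as a shell session. This is a sketch only: it assumes the dataset is already in place, the Java dumpers have already produced `sentences.txt` and `vocab.txt`, and it is run from the repository root; how the Java programs are launched depends on your build setup.

```shell
# Sketch of steps 6-7, run from the repository root.
# Assumes SentenceDumper and VocabularyDumper have already been run.

cp sentences.txt vocab.txt scripts/
cd scripts

# Step 6: train the self-trained fastText embeddings on the dumped sentences.
./train_embeddings.sh    # produces the n2c2-fasttext model

# Step 7: print pre-trained and self-trained vectors for the vocabulary.
# BioWordVec_PubMed_MIMICIII_d200.bin must first be downloaded from
# https://github.com/ncbi-nlp/BioSentVec and placed in this folder.
./print_pre_trained_vectors.sh
./print_self_trained_vectors.sh
```

Afterwards, copy the two generated embedding files next to the compiled Java classes as described in step 7.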
## Citing

If you use data or code in your work, please cite our [JAMIA paper](https://academic.oup.com/jamia/advance-article/doi/10.1093/jamia/ocz149/5568257):
```
@article{oleynik2019evaluating,
title={Evaluating shallow and deep learning strategies for the 2018 n2c2 shared-task on clinical text classification},
author={Michel Oleynik and Amila Kugic and Zdenko Kasáč and Markus Kreuzthaler},
journal={Journal of the American Medical Informatics Association},
publisher={Oxford University Press},
year={2019}
}
```
Also of interest:
- Our [n2c2 presentation slides](https://www.medunigraz.at/imi/de/n2c2.Presentation_V6.pdf)
## Code Dependencies
- JDK8+