Commit 6bb66d74 authored by gangwar2

Update README files

parent 53fd6007
File moved
@@ -7,6 +7,10 @@ Setup the environment using `conda` as follows:
 conda env create -n expembtx -f environment.yml
 ```
+## Datasets
+The datasets are available [here](https://osf.io/9tdqg/?view_only=78c364b3c71f43b5b414deac81cf863b).
 ## Training and Evaluation
 ### Setup
 To run the training and evaluation pipeline in this repository, [eqnet](https://github.com/mast-group/eqnet/) is required. As it can not be installed as a dependency, clone this repository and add it to `PYTHONPATH`.
@@ -24,32 +28,32 @@ Example:
 python train_expembtx.py \
     --train_file <TRAIN_FILE> \
     --val_file <VAL_FILE> \
-    --n_epochs 100 \
+    --n_epochs <N_EPOCHS> \
     --norm_first True \
     --optim Adam \
     --weight_decay 0 \
     --lr 0.0001 \
-    --train_batch_size 128 \
+    --train_batch_size <TRAIN_BATCH_SIZE> \
     --run_name <RUN_NAME> \
-    --val_batch_size 256 \
+    --val_batch_size <EVAL_BATCH_SIZE> \
     --grad_clip_val 1 \
     --max_out_len 256 \
     --precision 16 \
     --save_dir <OUT_DIR> \
-    --early_stopping 5 \
-    --n_min_epochs 10 \
+    --early_stopping <EARLY_STOPPING> \
+    --n_min_epochs <N_MIN_EPOCHS> \
     --label_smoothing 0.1 \
     --seed 42
 ```
-Add `--semvec` option to the above-mentioned command for the SemVec datasets.
-For all supported options, use `python train_expembtx.py --help` or refer to [TrainingAgruments](expemb/args.py#TestingArguments).
+Add `--semvec` option to the above-mentioned command for the SemVec datasets. For the SemVec datasets, `<TRAIN_FILE>` is not the original training file provided with the SemVec datasets but a version in the input-output format.
+For all supported options, use `python train_expembtx.py --help` or refer to [TrainingAgruments](expemb/args.py#TrainingAgruments).
 ### Evaluation
-To evaluate a trained model, `test_expembtx.py` may be used.
-Example:
+To evaluate a trained model, `test_expembtx.py` may be used. The options may vary depending if the model is trained on the Equivalent Expressions Dataset or the SemVec datasets.
+For the Equivalent Expressions Dataset, the following command may be used to test the model accuracy. On completion, it will generate a file containing the results inside `<SAVED_MODEL_DIR>` with `<RESULT_FILE_PREFIX>` as the file name prefix.
 ```
 python test_expembtx.py \
     --test_file <TEST_FILE> \
@@ -60,6 +64,16 @@ python test_expembtx.py \
     --batch_size 32
 ```
+For the SemVec datasets, the following command may be used.
+```
+python test_expembtx.py \
+    --test_file <TEST_FILE> \
+    --full_file <SEMVEC_FULL_DATASET> \
+    --ckpt_name best_max \
+    --save_dir <SAVED_MODEL_DIR> \
+    --semvec
+```
 For all supported options, use `python test_expembtx.py --help` or refer to [TestingArguments](expemb/args.py#TestingArguments).
 ## Embedding Mathematics
@@ -91,5 +105,5 @@ For all supported options, use `python run_embmath.py --help` or refer to [Dista
 ## Embedding Plots
 For embedding plots, refer to [embedding_plots.ipynb](notebooks/embedding_plots.ipynb).
-## Wandb Integration
+## Weights & Biases (wandb) Integration
 This repository supports wandb integration. To start using it, login to wandb using `wandb login`. To disable wandb, set the environment variable `WANDB_MODE=offline`.
\ No newline at end of file
 outs:
-- md5: dd9adab06b0b971ca76b127229ca272e.dir
-  size: 1056242338
-  nfiles: 125
+- md5: 8f77cd8265892df56a3ffd2a7a785b2b.dir
+  size: 1056244911
+  nfiles: 127
   path: data
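
The README text in this diff calls for two pieces of shell setup before running the scripts: putting a clone of eqnet on `PYTHONPATH`, and optionally setting `WANDB_MODE=offline`. A minimal sketch of that setup follows. The checkout location `$HOME/src/eqnet` and the `EQNET_DIR` variable are assumptions made for illustration, not paths used by the repository.

```shell
# Hypothetical checkout location for eqnet; adjust to your machine.
EQNET_DIR="${EQNET_DIR:-$HOME/src/eqnet}"

# One-time clone (left commented so the snippet has no network side effects):
# git clone https://github.com/mast-group/eqnet/ "$EQNET_DIR"

# Prepend the checkout to PYTHONPATH, keeping any existing entries.
export PYTHONPATH="$EQNET_DIR${PYTHONPATH:+:$PYTHONPATH}"

# Optional: keep wandb from uploading runs, as the README suggests.
export WANDB_MODE=offline
```

Both exports only affect the current shell session, so they would need to be repeated (or added to a shell profile) in each new terminal before calling `train_expembtx.py` or `test_expembtx.py`.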