Update README files

Commit 6bb66d74 authored 2 years ago by gangwar2
--- a/expemb/LICENSE
+++ b/expemb/LICENSE
--- a/README.md
+++ b/README.md
@@ -7,6 +7,10 @@ Setup the environment using `conda` as follows:
 conda env create -n expembtx -f environment.yml
 ```
+## Datasets
+The datasets are available [here](https://osf.io/9tdqg/?view_only=78c364b3c71f43b5b414deac81cf863b).
 ## Training and Evaluation
 ### Setup
 To run the training and evaluation pipeline in this repository, [eqnet](https://github.com/mast-group/eqnet/) is required. Because it cannot be installed as a dependency, clone the eqnet repository and add it to `PYTHONPATH`.
@@ -24,32 +28,32 @@ Example:
 python train_expembtx.py \
    --train_file <TRAIN_FILE> \
    --val_file <VAL_FILE> \
-    --n_epochs 100 \
+    --n_epochs <N_EPOCHS> \
    --norm_first True \
    --optim Adam \
    --weight_decay 0 \
    --lr 0.0001 \
-    --train_batch_size 128 \
+    --train_batch_size <TRAIN_BATCH_SIZE> \
    --run_name <RUN_NAME> \
-    --val_batch_size 256 \
+    --val_batch_size <EVAL_BATCH_SIZE> \
    --grad_clip_val 1 \
    --max_out_len 256 \
    --precision 16 \
    --save_dir <OUT_DIR> \
-    --early_stopping 5 \
+    --early_stopping <EARLY_STOPPING> \
-    --n_min_epochs 10 \
+    --n_min_epochs <N_MIN_EPOCHS> \
    --label_smoothing 0.1 \
    --seed 42
 ```
-Add `--semvec` option to the above-mentioned command for the SemVec datasets.
+Add the `--semvec` option to the command above for the SemVec datasets. In that case, `<TRAIN_FILE>` is not the original training file provided with the SemVec datasets but a version in the input-output format.
-For all supported options, use `python train_expembtx.py --help` or refer to [TrainingAgruments](expemb/args.py#TestingArguments).
+For all supported options, use `python train_expembtx.py --help` or refer to [TrainingAgruments](expemb/args.py#TrainingAgruments).
 ### Evaluation
-To evaluate a trained model, `test_expembtx.py` may be used.
+To evaluate a trained model, use `test_expembtx.py`. The options vary depending on whether the model was trained on the Equivalent Expressions Dataset or the SemVec datasets.
-Example:
+For the Equivalent Expressions Dataset, the following command tests the model accuracy. On completion, it writes a results file inside `<SAVED_MODEL_DIR>`, using `<RESULT_FILE_PREFIX>` as the file name prefix.
 ```
 python test_expembtx.py \
    --test_file <TEST_FILE> \
@@ -60,6 +64,16 @@ python test_expembtx.py \
    --batch_size 32
 ```
+For the SemVec datasets, the following command may be used.
+```
+python test_expembtx.py \
+    --test_file <TEST_FILE> \
+    --full_file <SEMVEC_FULL_DATASET> \
+    --ckpt_name best_max \
+    --save_dir <SAVED_MODEL_DIR> \
+    --semvec
+```
 For all supported options, use `python test_expembtx.py --help` or refer to [TestingArguments](expemb/args.py#TestingArguments).
 ## Embedding Mathematics
@@ -91,5 +105,5 @@ For all supported options, use `python run_embmath.py --help` or refer to [Dista
 ## Embedding Plots
 For embedding plots, refer to [embedding_plots.ipynb](notebooks/embedding_plots.ipynb).
-## Wandb Integration
+## Weights & Biases (wandb) Integration
 This repository supports wandb integration. To start using it, log in to wandb using `wandb login`. To run without syncing to the wandb servers, set the environment variable `WANDB_MODE=offline`.
\ No newline at end of file
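
The README changes above describe three pieces of setup for a SemVec evaluation run: eqnet on `PYTHONPATH`, the `--semvec` flags for `test_expembtx.py`, and `WANDB_MODE=offline`. A minimal sketch of how these might be combined from Python is shown here. The helper name `build_semvec_eval` and all file paths are hypothetical; only the flags and the environment variable names come from the README.

```python
import os
import shlex


def build_semvec_eval(test_file, full_file, save_dir, eqnet_dir, base_env=None):
    """Return (command, environment) for a SemVec evaluation run.

    Hypothetical helper: the flags mirror the README example, while the
    paths are supplied by the caller.
    """
    cmd = [
        "python", "test_expembtx.py",
        "--test_file", test_file,
        "--full_file", full_file,
        "--ckpt_name", "best_max",
        "--save_dir", save_dir,
        "--semvec",
    ]
    # Copy the environment so the caller's process is left untouched.
    env = dict(os.environ if base_env is None else base_env)
    # Prepend the eqnet clone so its modules are importable.
    existing = env.get("PYTHONPATH")
    env["PYTHONPATH"] = eqnet_dir if not existing else eqnet_dir + os.pathsep + existing
    # Setting taken from the README's wandb section.
    env["WANDB_MODE"] = "offline"
    return cmd, env


if __name__ == "__main__":
    # Placeholder paths for illustration only.
    cmd, env = build_semvec_eval(
        "semvec_test.json", "semvec_full.json", "runs/demo", "/path/to/eqnet"
    )
    print(shlex.join(cmd))
```

The returned pair can be passed to `subprocess.run(cmd, env=env, check=True)` from the repository root; building the environment as a copy avoids changing `PYTHONPATH` for the calling process.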
--- a/data.dvc
+++ b/data.dvc
 outs:
- md5: dd9adab06b0b971ca76b127229ca272e.dir
+- md5: 8f77cd8265892df56a3ffd2a7a785b2b.dir
-  size: 1056242338
+  size: 1056244911
-  nfiles: 125
+  nfiles: 127
  path: data
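
To make the `data.dvc` change easier to read, the arithmetic behind it can be sketched as follows. The values are copied from the diff above; the helper `dvc_out_delta` is hypothetical and is not part of DVC.

```python
def dvc_out_delta(old, new):
    """Summarize how a DVC `outs` entry changed between two revisions."""
    return {
        "md5_changed": old["md5"] != new["md5"],
        "size_delta": new["size"] - old["size"],
        "nfiles_delta": new["nfiles"] - old["nfiles"],
    }


# Values copied from the removed (-) and added (+) lines of data.dvc.
old = {"md5": "dd9adab06b0b971ca76b127229ca272e.dir", "size": 1056242338, "nfiles": 125}
new = {"md5": "8f77cd8265892df56a3ffd2a7a785b2b.dir", "size": 1056244911, "nfiles": 127}

if __name__ == "__main__":
    print(dvc_out_delta(old, new))
```

Under these values the tracked `data` directory gained 2 files and 2573 bytes, which suggests that two small files were added in this commit.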