@@ -7,6 +7,10 @@ Setup the environment using `conda` as follows:
...
@@ -7,6 +7,10 @@ Setup the environment using `conda` as follows:
conda env create -n expembtx -f environment.yml
conda env create -n expembtx -f environment.yml
```
```
## Datasets
The datasets are available [here](https://osf.io/9tdqg/?view_only=78c364b3c71f43b5b414deac81cf863b).
## Training and Evaluation
## Training and Evaluation
### Setup
### Setup
To run the training and evaluation pipeline in this repository, [eqnet](https://github.com/mast-group/eqnet/) is required. As it cannot be installed as a dependency, clone the eqnet repository and add it to `PYTHONPATH`.
To run the training and evaluation pipeline in this repository, [eqnet](https://github.com/mast-group/eqnet/) is required. As it cannot be installed as a dependency, clone the eqnet repository and add it to `PYTHONPATH`.
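The eqnet setup described above might look like the following minimal sketch. The `EQNET_DIR` variable and its default location are assumptions made for illustration and are not part of this repository; any directory works as long as it ends up on `PYTHONPATH`.

```shell
# Hypothetical location for the eqnet checkout; change as needed.
EQNET_DIR="${EQNET_DIR:-$HOME/eqnet}"

# Clone eqnet only if the directory does not exist yet.
[ -d "$EQNET_DIR" ] || git clone https://github.com/mast-group/eqnet "$EQNET_DIR"

# Prepend the checkout to PYTHONPATH, keeping any existing entries.
export PYTHONPATH="$EQNET_DIR${PYTHONPATH:+:$PYTHONPATH}"
```

The `export` only affects the current shell session, so it has to be repeated (or added to a shell profile) before running the training and evaluation scripts in a new session.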
...
@@ -24,32 +28,32 @@ Example:
...
@@ -24,32 +28,32 @@ Example:
python train_expembtx.py \
python train_expembtx.py \
--train_file <TRAIN_FILE> \
--train_file <TRAIN_FILE> \
--val_file <VAL_FILE> \
--val_file <VAL_FILE> \
--n_epochs 100 \
--n_epochs <N_EPOCHS> \
--norm_first True \
--norm_first True \
--optim Adam \
--optim Adam \
--weight_decay 0 \
--weight_decay 0 \
--lr 0.0001 \
--lr 0.0001 \
--train_batch_size 128 \
--train_batch_size <TRAIN_BATCH_SIZE> \
--run_name <RUN_NAME> \
--run_name <RUN_NAME> \
--val_batch_size 256 \
--val_batch_size <EVAL_BATCH_SIZE> \
--grad_clip_val 1 \
--grad_clip_val 1 \
--max_out_len 256 \
--max_out_len 256 \
--precision 16 \
--precision 16 \
--save_dir <OUT_DIR> \
--save_dir <OUT_DIR> \
--early_stopping 5 \
--early_stopping <EARLY_STOPPING> \
--n_min_epochs 10 \
--n_min_epochs <N_MIN_EPOCHS> \
--label_smoothing 0.1 \
--label_smoothing 0.1 \
--seed 42
--seed 42
```
```
Add the `--semvec` option to the command above for the SemVec datasets.
Add the `--semvec` option to the command above for the SemVec datasets. In that case, `<TRAIN_FILE>` is not the original training file provided with the SemVec datasets but a version of it in the input-output format.
For all supported options, use `python train_expembtx.py --help` or refer to [TrainingAgruments](expemb/args.py#TestingArguments).
For all supported options, use `python train_expembtx.py --help` or refer to [TrainingAgruments](expemb/args.py#TrainingAgruments).
### Evaluation
### Evaluation
To evaluate a trained model, `test_expembtx.py` may be used.
To evaluate a trained model, `test_expembtx.py` may be used. The options vary depending on whether the model was trained on the Equivalent Expressions Dataset or the SemVec datasets.
Example:
For the Equivalent Expressions Dataset, the following command may be used to test the model accuracy. On completion, it will generate a file containing the results inside `<SAVED_MODEL_DIR>` with `<RESULT_FILE_PREFIX>` as the file name prefix.
```
```
python test_expembtx.py \
python test_expembtx.py \
--test_file <TEST_FILE> \
--test_file <TEST_FILE> \
...
@@ -60,6 +64,16 @@ python test_expembtx.py \
...
@@ -60,6 +64,16 @@ python test_expembtx.py \
--batch_size 32
--batch_size 32
```
```
For the SemVec datasets, the following command may be used.
```
python test_expembtx.py \
--test_file <TEST_FILE> \
--full_file <SEMVEC_FULL_DATASET> \
--ckpt_name best_max \
--save_dir <SAVED_MODEL_DIR> \
--semvec
```
For all supported options, use `python test_expembtx.py --help` or refer to [TestingArguments](expemb/args.py#TestingArguments).
For all supported options, use `python test_expembtx.py --help` or refer to [TestingArguments](expemb/args.py#TestingArguments).
## Embedding Mathematics
## Embedding Mathematics
...
@@ -91,5 +105,5 @@ For all supported options, use `python run_embmath.py --help` or refer to [Dista
...
@@ -91,5 +105,5 @@ For all supported options, use `python run_embmath.py --help` or refer to [Dista
## Embedding Plots
## Embedding Plots
For embedding plots, refer to [embedding_plots.ipynb](notebooks/embedding_plots.ipynb).
For embedding plots, refer to [embedding_plots.ipynb](notebooks/embedding_plots.ipynb).
## Wandb Integration
## Weights & Biases (wandb) Integration
This repository supports wandb integration. To start using it, log in to wandb using `wandb login`. To stop runs from being synced to the wandb servers, set the environment variable `WANDB_MODE=offline`.
This repository supports wandb integration. To start using it, log in to wandb using `wandb login`. To stop runs from being synced to the wandb servers, set the environment variable `WANDB_MODE=offline`.
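As a minimal sketch, the variable can be exported for the current shell session before launching training or evaluation. In offline mode, wandb keeps run data locally instead of uploading it.

```shell
# Keep wandb in offline mode for every command run from this shell session.
export WANDB_MODE=offline
```

To restore the default behaviour in the same session, run `unset WANDB_MODE`.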