## Experiments
### Basic Quantization
Our goal is to use TVM to compile a quantized ResNet50 down to an executable function;
for simplicity, it's not yet necessary to test the accuracy of the compiled model.
- No usage of a dataset is necessary at any point in these experiments;
when TVM calls for inputs to the model for compilation / benchmarking purposes,
random tensors of the correct shape suffice, for example:
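(A minimal sketch; the batch size of 1 and the standard `1 x 3 x 224 x 224` ResNet50 input shape are assumptions.)

```python
import numpy as np

# ResNet50 takes NCHW float32 images; random values are fine because
# we only compile and benchmark the model, never check its accuracy.
dummy_input = np.random.rand(1, 3, 224, 224).astype("float32")
```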
The [TVM discussion board](https://discuss.tvm.apache.org) may have more advanced material if the how-to guides are not enough.
There are roughly the following major steps; a combined code sketch follows the list.
1. Get an instance of a ResNet50 model implemented in PyTorch. It's available in the `torchvision` package.
2. It's a good idea to try TVM on an un-quantized DNN first.
Give TVM the network and a sample input to the network,
and compile the network into a function object that can be called from the Python side to produce DNN outputs.
The TVM how-to guides have complete tutorials on this step.
Pay attention to the compilation **target**:
which hardware (CPU? GPU?) the model is being compiled for, and understand how to specify it.
Compile for the GPU if you have one, or the CPU otherwise.
3. Now, quantize the model down to **int8** precision.
TVM itself has utilities to quantize a DNN before compilation;
you can find how-tos in the guides and on the forum.
Again, you should get a function object that can be called from the Python side.
**Hint**: there is a namespace `tvm.relay.quantize` and everything you need is somewhere in there.
Do this for the GPU (if you have one), or the CPU otherwise.
4. Just for your own check -- how can you see the TVM code in the compiled module?
Did the quantization actually happen? For example, did the datatypes in the code change?
5. Use TVM's utility functions to benchmark the inference time of the quantized model vs. the un-quantized model.
In this task we will not try to maximize the performance of the quantized DNN,
but if there is no speedup, you should try to understand it and formulate a guess.
**Hint**: TVM may print the following when you compile the DNN -- what does it mean?
> One or more operators have not been tuned. Please tune your model for better performance. Use DEBUG logging level to see more details.
6. In your quantization setup, how did TVM know that you wanted to quantize to int8?
Look into that, and vary the number of bits of quantization (the $n$ in int-$n$).
Searching the forum and peeking at the source code of the quantizer class will both help.
Try out `int8` -> `int4` -> `int2` -> `int1`; note which precisions work.
When it doesn't work, note exactly which part is failing.
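The sketch below walks through steps 1, 2, 3, and 5 under some assumptions: it uses the classic Relay API with the PyTorch frontend, a TVM recent enough to have `GraphModule.benchmark`, and arbitrary choices for the input name (`input0`), the input shape, and the `qconfig` settings. Treat it as a starting point, not the reference solution.

```python
import numpy as np
import torch
import torchvision
import tvm
from tvm import relay
from tvm.contrib import graph_executor

# Step 1: a ResNet50 instance from torchvision. Pretrained weights are not
# needed here since we never check accuracy; random weights time the same.
model = torchvision.models.resnet50().eval()
input_shape = (1, 3, 224, 224)
traced = torch.jit.trace(model, torch.randn(input_shape))

# Import the traced model into Relay; "input0" is an arbitrary input name.
mod, params = relay.frontend.from_pytorch(traced, [("input0", input_shape)])

target = "llvm"  # the compilation target; use "cuda" for an NVIDIA GPU
dev = tvm.device(target, 0)

# Step 2: compile the un-quantized (float32) model.
with tvm.transform.PassContext(opt_level=3):
    lib_fp32 = relay.build(mod, target=target, params=params)
rt_fp32 = graph_executor.GraphModule(lib_fp32["default"](dev))

# Step 3: quantize to int8 first, then compile. The nbit_* / dtype_* fields
# of qconfig are where the quantization precision is decided (step 6).
with relay.quantize.qconfig(nbit_input=8, nbit_weight=8,
                            dtype_input="int8", dtype_weight="int8"):
    mod_int8 = relay.quantize.quantize(mod, params)
with tvm.transform.PassContext(opt_level=3):
    lib_int8 = relay.build(mod_int8, target=target)
rt_int8 = graph_executor.GraphModule(lib_int8["default"](dev))

# Step 5: feed both modules the same random input and benchmark them.
data = np.random.rand(*input_shape).astype("float32")
for name, rt in [("fp32", rt_fp32), ("int8", rt_int8)]:
    rt.set_input("input0", data)
    print(name, rt.benchmark(dev, number=10, repeat=3))
```

For step 4, printing the quantized Relay module (`print(mod_int8)`) is one place to look for changed datatypes; for step 6, the `nbit_*` and `dtype_*` fields of `qconfig` are where the number of bits is chosen.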