## Experiments
### Basic Quantization
Our goal is to use TVM to compile a quantized ResNet50 down to an executable function;
for simplicity, it's not yet necessary to test the accuracy of the compiled model.
- No usage of a dataset is necessary at any point in these experiments;
when TVM calls for inputs to the model for compilation / benchmarking purposes,
random tensors of the correct shape suffice, for example:
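(A minimal sketch; the batch size of 1 and the standard `1 x 3 x 224 x 224` ResNet50 input shape are assumptions.)

```python
import numpy as np

# ResNet50 takes NCHW float32 images; random values are fine because
# we only compile and benchmark the model, never check its accuracy.
dummy_input = np.random.rand(1, 3, 224, 224).astype("float32")
```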
The [TVM discussion board](https://discuss.tvm.apache.org) may have more advanced material if the how-to guides are not enough.
There are roughly the following major steps; a combined code sketch follows the list.
1. Get an instance of a ResNet50 model implemented in PyTorch. It's available in the `torchvision` package.
2. It's a good idea to try TVM on an un-quantized DNN first.
Give TVM the network and a sample input to the network,
and compile the network into a function object that can be called from the Python side to produce DNN outputs.
The TVM how-to guides have complete tutorials on this step.
Pay attention to the compilation **target**:
which hardware (CPU? GPU?) the model is being compiled for, and understand how to specify it.
Compile for the GPU if you have one, or the CPU otherwise.
3. Now, quantize the model down to **int8** precision.
TVM itself has utilities to quantize a DNN before compilation;
you can find how-tos in the guides and on the forum.
Again, you should get a function object that can be called from the Python side.
**Hint**: there is a namespace `tvm.relay.quantize` and everything you need is somewhere in there.
Do this for the GPU (if you have one), or the CPU otherwise.
4. Just for your own check -- how can you see the TVM code in the compiled module?
Did the quantization actually happen? For example, did the datatypes in the code change?
5. Use TVM's utility functions to benchmark the inference time of the quantized model vs. the un-quantized model.
In this task we will not try to maximize the performance of the quantized DNN,
but if there is no speedup, you should try to understand it and formulate a guess.
**Hint**: TVM may print the following when you compile the DNN -- what does it mean?
> One or more operators have not been tuned. Please tune your model for better performance. Use DEBUG logging level to see more details.
6. In your quantization setup, how did TVM know that you wanted to quantize to int8?
Look into that, and vary the number of bits of quantization (the $n$ in int-$n$).
Searching the forum and peeking at the source code of the quantizer class will both help.
Try out `int8` -> `int4` -> `int2` -> `int1`; note which precisions work.
When it doesn't work, note exactly which part is failing.
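The sketch below walks through steps 1, 2, 3, and 5 under some assumptions: it uses the classic Relay API with the PyTorch frontend, a TVM recent enough to have `GraphModule.benchmark`, and arbitrary choices for the input name (`input0`), the input shape, and the `qconfig` settings. Treat it as a starting point, not the reference solution.

```python
import numpy as np
import torch
import torchvision
import tvm
from tvm import relay
from tvm.contrib import graph_executor

# Step 1: a ResNet50 instance from torchvision. Pretrained weights are not
# needed here since we never check accuracy; random weights time the same.
model = torchvision.models.resnet50().eval()
input_shape = (1, 3, 224, 224)
traced = torch.jit.trace(model, torch.randn(input_shape))

# Import the traced model into Relay; "input0" is an arbitrary input name.
mod, params = relay.frontend.from_pytorch(traced, [("input0", input_shape)])

target = "llvm"  # the compilation target; use "cuda" for an NVIDIA GPU
dev = tvm.device(target, 0)

# Step 2: compile the un-quantized (float32) model.
with tvm.transform.PassContext(opt_level=3):
    lib_fp32 = relay.build(mod, target=target, params=params)
rt_fp32 = graph_executor.GraphModule(lib_fp32["default"](dev))

# Step 3: quantize to int8 first, then compile. The nbit_* / dtype_* fields
# of qconfig are where the quantization precision is decided (step 6).
with relay.quantize.qconfig(nbit_input=8, nbit_weight=8,
                            dtype_input="int8", dtype_weight="int8"):
    mod_int8 = relay.quantize.quantize(mod, params)
with tvm.transform.PassContext(opt_level=3):
    lib_int8 = relay.build(mod_int8, target=target)
rt_int8 = graph_executor.GraphModule(lib_int8["default"](dev))

# Step 5: feed both modules the same random input and benchmark them.
data = np.random.rand(*input_shape).astype("float32")
for name, rt in [("fp32", rt_fp32), ("int8", rt_int8)]:
    rt.set_input("input0", data)
    print(name, rt.benchmark(dev, number=10, repeat=3))
```

For step 4, printing the quantized Relay module (`print(mod_int8)`) is one place to look for changed datatypes; for step 6, the `nbit_*` and `dtype_*` fields of `qconfig` are where the number of bits is chosen.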