[BYOC] CUTLASS integration (#9261)
* byoc cutlass
* add cmake and fix build
* test worked but accuracy is bad
* fixed argument printing properly
* moving files
* moving contents of cutlass_profiler into python/tvm/contrib/cutlass
* run black
* remove irrelevant codegen code
* clang format
* tried replacing sm 75 with 80, didn't help improve accuracy
* remove irrelevant code from generator
* tried dense + bias fusion but generated cu file does not compile
* dense + bias worked after adding Leyuan's patch, bias + relu worked too
* tried adding sm80 generator but accuracy is still off
* remove GemmUniversal generator
* cleanup partition and build
* moved partition, profile and build function out of test
* turned out the result matches TVM non-cutlass result. Numpy fp16 matmul is busted?
* clean up test
* LinearCombination can be reused for bias only epilogue
* remove unsupported epilogues like gelu
* removing dead code
* unify gemm templates for with or without beta scaling
* supported gelu but accuracy is slightly off
* gelu test passed with relaxed rtol
* cleanup
* remove unused stuff from library.py
* move profiler template into its own file
* removed gemm_profiler.py
* move contents of compile_engine.py into gen_gemm.py
* rename to profiler_template.cu to avoid CI issue
* cleaning up trying to pass pylint
* add missing asf header
* run black
* fixing many pylint issues except wildcard import
* fixed wildcard warning
* add missing CUTLASS.cmake file, restore gemm_profiler.py
* pylint
* minor fix
* add license
* start filling in TODO doc
* rename GemmProfiler to GemmProfilerEmitter
* more renaming and doc
* add doc to the main compile API
* refactored generator
* run black
* black fix
* finish doc TODO
* add test for 32 bit accum
* fixed kernel generator to correctly handle fp32 accum
* revise build-related API
* add option to profile only one kernel
* add option to enable parallel compilation
* clean up gen_gemm
* doc update
* profile_cutlass_kernels -> tune_cutlass_kernels

Co-authored-by: leyuan.wang <leyuan.wang@bytedance.com>
Co-authored-by: Masahiro Masuda <masahi129@gmail.com>
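The commit message mentions fp16 accuracy mismatches, a relaxed rtol, and separate handling of fp32 accumulation. The NumPy-only sketch below illustrates why accumulator precision matters when comparing fp16 GEMM results. It is not taken from this commit: the helper names, shapes, and tolerances are hypothetical, and it makes no claim about how NumPy, TVM, or CUTLASS accumulate internally.

```python
import numpy as np


def matmul_fp16_accum(a, b):
    """fp16 inputs, with every partial sum rounded back to fp16 (hypothetical helper)."""
    m, k = a.shape
    acc = np.zeros((m, b.shape[1]), dtype=np.float16)
    for i in range(k):
        # The rank-1 update is computed and stored in fp16, so rounding error builds up.
        acc = (acc + np.outer(a[:, i], b[i, :])).astype(np.float16)
    return acc


def matmul_fp32_accum(a, b):
    """fp16 inputs, fp32 accumulation, result rounded to fp16 once at the end."""
    return (a.astype(np.float32) @ b.astype(np.float32)).astype(np.float16)


rng = np.random.default_rng(0)
a = rng.uniform(-1, 1, (16, 512)).astype(np.float16)
b = rng.uniform(-1, 1, (512, 16)).astype(np.float16)

# float64 reference computed from the same fp16 inputs.
ref = a.astype(np.float64) @ b.astype(np.float64)

out16 = matmul_fp16_accum(a, b)
out32 = matmul_fp32_accum(a, b)

err16 = np.abs(out16.astype(np.float64) - ref).max()
err32 = np.abs(out32.astype(np.float64) - ref).max()
print("max abs error, fp16 accumulation:", err16)
print("max abs error, fp32 accumulation:", err32)
```

When two fp16 backends disagree, comparing both against a higher-precision reference computed from the same inputs, with tolerances chosen for fp16, helps tell a real bug apart from accumulated rounding error.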
Changed files:
- .gitmodules: 3 additions, 0 deletions
- 3rdparty/cutlass: 1 addition, 0 deletions
- CMakeLists.txt: 2 additions, 0 deletions
- LICENSE: 5 additions, 0 deletions
- cmake/modules/contrib/CUTLASS.cmake: 23 additions, 0 deletions
- licenses/LICENSE.cutlass.txt: 23 additions, 0 deletions
- python/tvm/contrib/cutlass/__init__.py: 18 additions, 0 deletions
- python/tvm/contrib/cutlass/build.py: 172 additions, 0 deletions
- python/tvm/contrib/cutlass/gemm_operation.py: 262 additions, 0 deletions
- python/tvm/contrib/cutlass/gemm_profiler.py: 196 additions, 0 deletions
- python/tvm/contrib/cutlass/gen_gemm.py: 355 additions, 0 deletions
- python/tvm/contrib/cutlass/library.py: 219 additions, 0 deletions
- python/tvm/relay/op/contrib/__init__.py: 1 addition, 0 deletions
- python/tvm/relay/op/contrib/cutlass.py: 74 additions, 0 deletions
- src/relay/backend/contrib/codegen_c/codegen_c.h: 6 additions, 0 deletions
- src/relay/backend/contrib/cutlass/codegen.cc: 369 additions, 0 deletions
- src/relay/backend/contrib/dnnl/codegen.cc: 0 additions, 6 deletions
- tests/python/contrib/test_cutlass.py: 135 additions, 0 deletions