    541f9f2d
    [BYOC] CUTLASS integration (#9261) · 541f9f2d
    Leyuan Wang authored
    
    * byoc cutlass
    
    * add cmake and fix build
    
    * test worked but accuracy is bad
    
    * fixed argument printing properly
    
    * moving files
    
    * moving contents of cutlass_profiler into python/tvm/contrib/cutlass
    
    * run black
    
    * remove irrelevant codegen code
    
    * clang format
    
    * tried replacing sm 75 with 80, didn't help improve accuracy
    
    * remove irrelevant code from generator
    
    * tried dense + bias fusion but generated cu file does not compile
    
    * dense + bias worked after adding Leyuan's patch, bias + relu worked too
    
    * tried adding sm80 generator but accuracy is still off
    
    * remove GemmUniversal generator
    
    * cleanup partition and build
    
    * moved partition, profile and build function out of test
    
    * turned out the result matches TVM's non-cutlass result. Numpy fp16
    matmul is busted?
    
    * clean up test
    
    * LinearCombination can be reused for bias only epilogue
    
    * remove unsupported epilogues like gelu
    
    * removing dead code
    
    * unify gemm templates for with or without beta scaling
    
    * supported gelu but accuracy is slightly off
    
    * gelu test passed with relaxed rtol
    
    * cleanup
    
    * remove unused stuff from library.py
    
    * move profiler template into its own file
    
    * removed gemm_profiler.py
    
    * move contents of compile_engine.py into gen_gemm.py
    
    * rename to profiler_template.cu to avoid CI issue
    
    * cleaning up trying to pass pylint
    
    * add missing ASF header
    
    * run black
    
    * fixing many pylint issues except wildcard import
    
    * fixed wildcard warning
    
    * add missing CUTLASS.cmake file, restore gemm_profiler.py
    
    * pylint
    
    * minor fix
    
    * add license
    
    * start filling in TODO doc
    
    * rename GemmProfiler to GemmProfilerEmitter
    
    * more renaming and doc
    
    * add doc to the main compile API
    
    * refactored generator
    
    * run black
    
    * black fix
    
    * finish doc TODO
    
    * add test for 32 bit accum
    
    * fixed kernel generator to correctly handle fp32 accum
    
    * revise build-related API
    
    * add option to profile only one kernel
    
    * add option to enable parallel compilation
    
    * clean up gen_gemm
    
    * doc update
    
    * profile_cutlass_kernels -> tune_cutlass_kernels
    
    Co-authored-by: leyuan.wang <leyuan.wang@bytedance.com>
    Co-authored-by: Masahiro Masuda <masahi129@gmail.com>