Fork Optimizations
We should implement the following fork optimizations/analyses
- Fork coalescing
- Fork tiling/untiling (need to figure out how to derive valid tile sizes)
- Fork fusion
  - First, fusing unrelated forks
  - Then a store-load forwarded version through parallel reduce
- Fork fission
  - A version that splits all reduces into separate forks
  - A version that fissions on an intermediate value, places it into an array, and then uses that array in the following fork
- Fork interchange
- Better forkify (scalar evolution)
- Tensorize (compute einsum notation for each reduce of a fork)
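
As a rough illustration of two of the items above, here is a minimal Python sketch that models a fork with a reduce as a plain loop. This is not the compiler's IR or API. All function names are hypothetical, and using divisibility as the tile-size criterion is an assumption (the list above leaves that question open).

```python
def fork_sum_of_squares(xs):
    """Original form: one fork computes an intermediate value and reduces it."""
    acc = 0
    for i in range(len(xs)):       # fork over i
        sq = xs[i] * xs[i]         # intermediate value
        acc += sq                  # reduce
    return acc


def fissioned_sum_of_squares(xs):
    """Fission on the intermediate: the first fork stores it into an array,
    the following fork reads that array and performs the reduce."""
    tmp = [0] * len(xs)
    for i in range(len(xs)):       # first fork: produce the intermediate
        tmp[i] = xs[i] * xs[i]
    acc = 0
    for i in range(len(xs)):       # second fork: consume the array
        acc += tmp[i]
    return acc


def valid_tile_sizes(n):
    """Assumed criterion: a tile size is valid if it divides the fork bound."""
    return [t for t in range(1, n + 1) if n % t == 0]


def tiled_sum_of_squares(xs, tile):
    """Tiling: a 1-D fork of n iterations becomes a 2-D fork of (n // tile, tile).
    Coalescing is the inverse mapping, i = outer * tile + inner."""
    n = len(xs)
    if n % tile != 0:
        raise ValueError("tile size must divide the fork bound in this sketch")
    acc = 0
    for outer in range(n // tile):
        for inner in range(tile):
            i = outer * tile + inner
            acc += xs[i] * xs[i]
    return acc
```

Under these assumptions all three variants compute the same result, so each transformation can be checked against the original fork on small inputs.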
Edited by rarbore2