Commit 2af7f9b1 authored by Aaron Councilman

Merge branch 'main' into juno_macro

Parents: 335a25b3 47082e5e
Branches: juno_macro
1 merge request: !49 Juno build system, labels, and skeleton scheduler
Showing with 3698 additions and 2944 deletions
@@ -7,3 +7,4 @@
*.o
*.hbin
.*.swp
.vscode
\ No newline at end of file
@@ -7,15 +7,18 @@ members = [
"hercules_rt",
"hercules_rt_proc",
"hercules_test/hercules_interpreter",
"hercules_test/hercules_tests",
"hercules_tools/hercules_driver",
"hercules_tools/hercules_dot",
"hercules_tools/hercules_cpu_beta",
#"hercules_tools/hercules_hbin_dump",
"juno_frontend",
"juno_build",
"hercules_samples/dot",
"hercules_samples/matmul",
"hercules_samples/task_parallel",
#"hercules_samples/task_parallel"
"juno_samples/matmul"
]
# Hercules' Design
Hercules is a compiler targeting heterogeneous devices. The key goals of Hercules are listed below:
- Generate optimized, memory efficient, and parallel code for devices containing CPUs, GPUs, and other processing elements.
- Explore language design for programming heterogeneous systems in a performant, expressive, and safe manner.
- Expose detailed configuration of code generation and scheduling through a novel scheduling language.
- Design an intermediate representation that allows for fine-grained control of what code is executed on what device in a system.
- Develop a runtime system capable of dynamically scheduling generated code fragments on a heterogeneous machine.
The following sections contain information on how Hercules is designed to meet these goals.
## Front-end Language Design
The front-end language Juno is a relatively standard imperative programming language based on Rust syntax, though it uses mutable value semantics and imposes certain restrictions that are useful for compiling to heterogeneous systems.
More information about Juno can be found in [juno_frontend/README.md](juno_frontend/README.md).
## Scheduling Language Design
TODO: @aaronjc4
## Compiler Design
The Hercules compiler is split into the following components:
### Hercules IR
The IR of the Hercules compiler is similar to the sea of nodes IR presented in "A Simple Graph-Based Intermediate Representation", with a few differences.
- There are dynamic constants: constants provided dynamically to the conductor (the runtime system; [see the section describing the conductor](#the-conductor)). Unlike normal runtime values, dynamic constants can be used to specify array type sizes (see the sketch after this list).
- There is no single global store. The closest analog are individual values with an array type, which support dynamically indexed read and write operations.
- There is no I/O, or other side effects.
- There is no recursion.
- The implementation of Hercules IR does not follow the original object-oriented design of sea-of-nodes.
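
To make the dynamic-constant distinction concrete, here is a minimal, hypothetical Rust sketch of the idea; the names and shapes below are illustrative only and do not match the actual `hercules_ir` definitions:

```rust
// Hypothetical sketch of the dynamic-constant idea. A dynamic constant
// is an integer whose value is supplied to the conductor at launch time,
// so it can appear in types (e.g. array dimensions) in a way an ordinary
// runtime value cannot.
#[derive(Clone, Debug)]
enum DynConst {
    Constant(usize),  // fixed at compile time
    Parameter(usize), // the n-th dynamic constant passed to the conductor
}

// An array type's dimensions are dynamic constants, not runtime values,
// so the compiler can reason about sizes before any code executes.
#[derive(Clone, Debug)]
#[allow(dead_code)]
enum Ty {
    I32,
    F32,
    Array(Box<Ty>, Vec<DynConst>),
}

fn main() {
    // A matrix whose dimensions are the first two dynamic constants.
    let matrix = Ty::Array(
        Box::new(Ty::F32),
        vec![DynConst::Parameter(0), DynConst::Parameter(1)],
    );
    println!("{:?}", matrix);
}
```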
A key design consideration of Hercules IR is the absence of a concept of memory. A downside of this approach is that any language targeting Hercules IR must also be very restrictive regarding memory - in practice, this means tightly controlling or eliminating first-class references. The upside is that the compiler has complete freedom to lay out data in memory however it likes when performing code generation. This includes deciding which data resides in which address spaces, which is a necessary ability for a compiler striving to have fine-grained control over what operations are computed on what devices.
In addition to not having a generalized memory, Hercules IR has no functionality for calling functions with side effects or performing I/O. In other words, Hercules is a pure IR (though not a functional one, as functions aren't first-class values). This may be changed in the future - we could support effectful programs by giving call operators a control input and output edge. However, at least for now, we'd like to work with the simplest IR possible, so the IR is pure.
The key idea behind the sea of nodes IR is that control flow and data flow are represented in the same graph. The entire program can thus be represented by one large flow graph. This has several nice properties, the primary one being that instructions are unordered except by true dependencies. This alleviates most code motion concerns, and also makes peephole optimizations more practical. Additionally, loop-invariant code is neither "inside" nor "outside" a loop in the sea of nodes. Thus, any optimization benefiting from a particular assumption about the position of loop-invariant code works without needing to do code motion. Deciding whether code lives inside a loop or not becomes a scheduling concern.
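
As a rough illustration of the unified-graph idea, here is a hypothetical, heavily simplified node representation; the real `Node` type in `hercules_ir` is much richer:

```rust
// Minimal sea-of-nodes sketch (hypothetical). Control and data nodes
// live in one flat graph, and edges are just indices into it, so an
// instruction is ordered only by the nodes it actually uses.
#[derive(Clone, Copy, Debug)]
struct NodeId(usize);

#[derive(Debug)]
#[allow(dead_code)]
enum Node {
    // Control nodes.
    Start,
    Region { preds: Vec<NodeId> },
    If { control: NodeId, cond: NodeId },
    // Data nodes. They reference control only where semantically
    // required (e.g. a phi), not through a basic-block container.
    Constant(i64),
    Add { left: NodeId, right: NodeId },
    Phi { control: NodeId, data: Vec<NodeId> },
}

fn main() {
    // `Add` is neither "inside" nor "outside" any block; whether it is
    // loop invariant becomes a scheduling decision, not a structural one.
    let graph = vec![
        Node::Start,                                     // 0
        Node::Constant(1),                               // 1
        Node::Constant(2),                               // 2
        Node::Add { left: NodeId(1), right: NodeId(2) }, // 3
    ];
    println!("{} nodes, ordered only by use edges", graph.len());
}
```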
We chose to use a sea-of-nodes-based IR because we believe it will be easier to partition than a CFG + basic block style IR. A CFG + basic block IR is inherently two-level - there is the control flow level in the CFG, and the data flow in the basic blocks. Partitioning a function across these two levels is a challenging task. As shown by previous work (HPVM), introducing more graph levels into the IR makes partitioning harder, not easier. We want Hercules to have fine-grained control over which code executes where. This requires Hercules' compiler IR to have as few graph levels as reasonable.
See [IR.md](IR.md) for a more specific description of Hercules IR.
### Optimizations
Hercules relies on other compiler infrastructures, such as LLVM, to do code generation for specific devices. Thus, Hercules itself doesn't perform particularly sophisticated optimizations. In general, the optimizations Hercules does perform exist to make partitioning easier. This includes things like GVN and peephole optimizations, which, in general, make the IR "simpler".
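
As a toy illustration of the kind of simplification meant here, a peephole rule like folding `x + 0` to `x` might look as follows over a hypothetical expression type; this is not Hercules' actual pass infrastructure:

```rust
// Hypothetical peephole sketch: local rewrites that shrink the graph
// before partitioning, e.g. x + 0 => x and constant folding.
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Constant(i64),
    Add(Box<Expr>, Box<Expr>),
}

fn peephole(e: Expr) -> Expr {
    match e {
        // Simplify children first, then apply local rules bottom-up.
        Expr::Add(l, r) => match (peephole(*l), peephole(*r)) {
            (x, Expr::Constant(0)) | (Expr::Constant(0), x) => x,
            (Expr::Constant(a), Expr::Constant(b)) => Expr::Constant(a + b),
            (l, r) => Expr::Add(Box::new(l), Box::new(r)),
        },
        other => other,
    }
}

fn main() {
    let e = Expr::Add(
        Box::new(Expr::Constant(0)),
        Box::new(Expr::Add(
            Box::new(Expr::Constant(3)),
            Box::new(Expr::Constant(4)),
        )),
    );
    assert_eq!(peephole(e), Expr::Constant(7));
}
```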
TODO: @rarbore2
### Partitioning
Partitioning is responsible for deciding which operations in the IR graph are executed on which devices. Additionally, operations are broken up into shards - every node in a shard executes on the same device, and the runtime system schedules execution at the shard level. Partitioning is conceptually very similar to instruction selection. Each shard can be thought of as a single instruction, and the device the shard is executed on can be thought of as the particular instruction being selected. In instruction selection, there is not only the choice of which instructions to use, but also of how to partition the potentially many operations in the IR into a smaller number of target instructions. Similarly, the Hercules IR partitioning process must decide which operations are grouped together into the same shard, and for each shard, which device it should execute on. The set of operations each potential target device is capable of executing is crucial information when forming the shard boundaries, so this cannot be performed optimally as a sequential two-step process.
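
The shape of a partitioning result can be sketched as follows; this is a hypothetical simplification, not the actual `Plan` type in the compiler:

```rust
// Hypothetical shape of a partitioning result: every node is assigned
// to exactly one shard, and every shard to exactly one device.
#[derive(Clone, Copy, Debug)]
enum Device {
    Cpu,
    Gpu,
}

struct Partitioning {
    // Indexed by node ID: which shard each IR node belongs to.
    node_to_shard: Vec<usize>,
    // Indexed by shard: which device runs it.
    shard_devices: Vec<Device>,
}

impl Partitioning {
    // Group nodes by shard, analogous to inverting a partition map.
    fn shards(&self) -> Vec<Vec<usize>> {
        let mut shards = vec![vec![]; self.shard_devices.len()];
        for (node, &shard) in self.node_to_shard.iter().enumerate() {
            shards[shard].push(node);
        }
        shards
    }
}

fn main() {
    let plan = Partitioning {
        node_to_shard: vec![0, 0, 1, 1],
        shard_devices: vec![Device::Cpu, Device::Gpu],
    };
    // Nodes 0 and 1 form a CPU shard; nodes 2 and 3 form a GPU shard.
    assert_eq!(plan.shards(), vec![vec![0, 1], vec![2, 3]]);
}
```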
TODO: @rarbore2
### Code Generation
Hercules uses LLVM for generating CPU and GPU code. Memory is "introduced" into the program representation at this stage. Operations in a function are separated into basic blocks. The data layout of values is decided on, and memory is either allocated on the stack or designated as separately allocated and passed into functions as necessary. Code may be generated for several different estimates of the dynamic constants.
TODO: @rarbore2
## The Conductor
The conductor is responsible for dynamically executing code generated by Hercules. It exposes a Rust API for executing Hercules code. It takes care of memory allocation, synchronization, and scheduling. It is what is called the "runtime" in other systems - we chose a different name as there are events that happen distinctly at "conductor time" (such as providing dynamic constants), rather than at "runtime" (when the generated code is actually executed).
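
To give a flavor of what this could look like from Rust, here is an entirely hypothetical usage sketch; none of these names correspond to the real `hercules_rt` API:

```rust
// Entirely hypothetical conductor-style API sketch. The real conductor
// differs; this only illustrates the "conductor time" vs. "runtime"
// distinction described above.
struct Conductor;

impl Conductor {
    fn load(_module_bytes: &[u8]) -> Self {
        Conductor
    }
    // Dynamic constants are supplied at "conductor time", before the
    // generated partitions actually run.
    fn bind_dynamic_constants(&mut self, _values: &[u64]) {}
    fn run(&mut self, _function: &str, input: &[f32]) -> Vec<f32> {
        input.to_vec() // placeholder: the real conductor schedules shards
    }
}

fn main() {
    let mut conductor = Conductor::load(&[]);
    conductor.bind_dynamic_constants(&[1024]); // e.g., an array dimension
    let out = conductor.run("matmul", &[0.0; 4]);
    println!("{} outputs", out.len());
}
```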
TODO: @rarbore2
# Hercules
See [DESIGN.md](DESIGN.md) for a discussion of Hercules' design.
See `reports/` for discussions of Hercules' design.
@@ -4,5 +4,9 @@ version = "0.1.0"
authors = ["Russel Arbore <rarbore2@illinois.edu>"]
[dependencies]
rand = "*"
ordered-float = "*"
bitvec = "*"
serde = { version = "*", features = ["derive"] }
hercules_ir = { path = "../hercules_ir" }
#![feature(let_chains)]
#![feature(let_chains, iter_intersperse)]
pub mod common;
pub mod cpu;
pub mod top;
pub mod manifest;
pub mod sched_dot;
pub mod sched_gen;
pub mod sched_ir;
pub mod sched_schedule;
pub use crate::common::*;
pub use crate::cpu::*;
pub use crate::top::*;
pub use crate::manifest::*;
pub use crate::sched_dot::*;
pub use crate::sched_gen::*;
pub use crate::sched_ir::*;
pub use crate::sched_schedule::*;
extern crate serde;
extern crate hercules_ir;
use std::iter::once;
use self::serde::Deserialize;
use self::serde::Serialize;
use self::hercules_ir::*;
use crate::*;
/*
* A manifest stores metadata about a Hercules function. This metadata is used
* by the runtime to actually call a Hercules function.
*/
#[derive(Debug, Clone, Hash, Serialize, Deserialize)]
pub struct Manifest {
// The signature of each Hercules function is represented in terms of
// STypes, since this is the lowest level type representation that Hercules
// constructs before reaching target-specific backends.
pub param_types: Vec<(SType, ParameterKind)>,
pub return_type: SType,
// The dynamic constants (potentially) used in this Hercules function.
pub dynamic_constants: Vec<DynamicConstant>,
// The dimensions for array constants defined and used in this Hercules
// function.
pub array_constants: Vec<Box<[DynamicConstantID]>>,
// The partitions that make up this Hercules function.
pub partitions: Vec<PartitionManifest>,
}
#[derive(Debug, Clone, Hash, Serialize, Deserialize)]
pub struct PartitionManifest {
// Each partition has one corresponding SFunction.
pub name: SFunctionName,
// Record the type and kind of each parameter.
pub parameters: Vec<(SType, ParameterKind)>,
// Record the type and kind of each return value.
pub returns: Vec<(SType, ReturnKind)>,
// Record the list of possible successors from this partition.
pub successors: Vec<PartitionID>,
}
#[derive(Debug, Clone, Hash, Serialize, Deserialize, PartialEq, Eq)]
pub enum ParameterKind {
// A parameter corresponding to a parameter of the Hercules function.
HerculesParameter(usize),
// A parameter corresponding to some data defined in some other partition.
DataInput(NodeID),
// A parameter corresponding to a dynamic constant input to the Hercules
// function.
DynamicConstant(usize),
// A parameter corresponding to an array constant used in the partition.
ArrayConstant(ArrayID),
}
#[derive(Debug, Clone, Hash, Serialize, Deserialize)]
pub enum ReturnKind {
// A return value corresponding to the return value of the Hercules
// function.
HerculesReturn,
// A return value corresponding to some data used in some other partition.
DataOutput(NodeID),
// An integer specifying which partition should be executed next, if this
// partition has multiple successors.
NextPartition,
}
impl Manifest {
pub fn all_visible_types(&self) -> impl Iterator<Item = SType> + '_ {
self.param_types
// Include the Hercules function parameter types.
.iter()
.map(|(ty, _)| ty.clone())
// Include the Hercules function return type.
.chain(once(self.return_type.clone()))
// Include the partition parameter types.
.chain(
self.partitions
.iter()
.map(|partition| partition.parameters.iter().map(|(ty, _)| ty.clone()))
.flatten(),
)
// Include the partition return types.
.chain(
self.partitions
.iter()
.map(|partition| partition.returns.iter().map(|(ty, _)| ty.clone()))
.flatten(),
)
// Include the product types formed by the partition return types,
// since multiple return values are returned inside a struct.
.chain(self.partitions.iter().map(|partition| {
SType::Product(partition.returns.iter().map(|(ty, _)| ty.clone()).collect())
}))
}
}
impl PartitionManifest {
pub fn data_inputs(&self) -> impl Iterator<Item = (NodeID, &SType)> + '_ {
self.parameters.iter().filter_map(|(stype, param_kind)| {
if let ParameterKind::DataInput(id) = param_kind {
Some((*id, stype))
} else {
None
}
})
}
pub fn data_outputs(&self) -> impl Iterator<Item = (NodeID, &SType)> + '_ {
self.returns.iter().filter_map(|(stype, return_kind)| {
if let ReturnKind::DataOutput(id) = return_kind {
Some((*id, stype))
} else {
None
}
})
}
}
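/*
 * Illustrative helper, not part of the original commit: a minimal sketch
 * showing how the data_inputs/data_outputs iterators above compose. It
 * counts how many values a partition receives from, and sends to, other
 * partitions.
 */
pub fn cross_partition_io_counts(partition: &PartitionManifest) -> (usize, usize) {
    (
        partition.data_inputs().count(),
        partition.data_outputs().count(),
    )
}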
extern crate bitvec;
extern crate rand;
use std::collections::{HashMap, VecDeque};
use std::env::temp_dir;
use std::fmt::Write;
use std::fs::File;
use std::io::Write as _;
use std::process::Command;
use self::bitvec::prelude::*;
use self::rand::Rng;
use crate::*;
/*
* Top level function to compute a dot graph for a schedule IR module, and
* immediately render it using xdot.
*/
pub fn xdot_sched_module(module: &SModule) {
let mut tmp_path = temp_dir();
let mut rng = rand::thread_rng();
let num: u64 = rng.gen();
tmp_path.push(format!("sched_dot_{}.dot", num));
let mut file = File::create(tmp_path.clone()).expect("PANIC: Unable to open output file.");
let mut contents = String::new();
write_dot(&module, &mut contents).expect("PANIC: Unable to generate output file contents.");
file.write_all(contents.as_bytes())
.expect("PANIC: Unable to write output file contents.");
Command::new("xdot")
.args([tmp_path])
.output()
.expect("PANIC: Couldn't execute xdot. Is xdot installed?");
}
/*
* Top level function to write a schedule IR module out as a dot graph.
*/
pub fn write_dot<W: Write>(module: &SModule, w: &mut W) -> std::fmt::Result {
write_digraph_header(w)?;
for (function_name, function) in module.functions.iter() {
// Schedule the SFunction to form a linear ordering of instructions.
let dep_graph = sched_dependence_graph(function);
let mut block_to_inst_list = (0..function.blocks.len())
.map(|block_idx| (block_idx, vec![]))
.collect::<HashMap<usize, Vec<(&SInst, usize)>>>();
for (block_idx, block) in function.blocks.iter().enumerate() {
let mut emitted = bitvec![u8, Lsb0; 0; block.insts.len()];
let mut worklist = VecDeque::from((0..block.insts.len()).collect::<Vec<_>>());
while let Some(inst_idx) = worklist.pop_front() {
let inst_id = InstID::new(block_idx, inst_idx);
let dependencies = &dep_graph[&inst_id];
let all_uses_emitted = dependencies
.into_iter()
// Check that all used instructions in this block...
.filter(|inst_id| inst_id.idx_0() == block_idx)
// were already emitted.
.all(|inst_id| emitted[inst_id.idx_1()]);
// Phis don't need to wait for all of their uses to be added.
if block.insts[inst_idx].is_phi() || all_uses_emitted {
block_to_inst_list
.get_mut(&block_idx)
.unwrap()
.push((&block.insts[inst_idx], block.virt_regs[inst_idx].0));
emitted.set(inst_id.idx_1(), true);
} else {
worklist.push_back(inst_idx);
}
}
}
// A SFunction is a subgraph.
write_subgraph_header(function_name, w)?;
// Each SBlock is a nested subgraph.
for (block_idx, block) in function.blocks.iter().enumerate() {
write_block_header(function_name, block_idx, "lightblue", w)?;
// Emit the instructions in scheduled order.
write_block(function_name, block_idx, &block_to_inst_list[&block_idx], w)?;
write_graph_footer(w)?;
// Add control edges.
for succ in block.successors().as_ref() {
write_control_edge(function_name, block_idx, succ.idx(), w)?;
}
}
write_graph_footer(w)?;
}
write_graph_footer(w)?;
Ok(())
}
fn write_digraph_header<W: Write>(w: &mut W) -> std::fmt::Result {
write!(w, "digraph \"Module\" {{\n")?;
write!(w, "compound=true\n")?;
Ok(())
}
fn write_subgraph_header<W: Write>(function_name: &SFunctionName, w: &mut W) -> std::fmt::Result {
write!(w, "subgraph {} {{\n", function_name)?;
write!(w, "label=\"{}\"\n", function_name)?;
write!(w, "bgcolor=ivory4\n")?;
write!(w, "cluster=true\n")?;
Ok(())
}
fn write_block_header<W: Write>(
function_name: &SFunctionName,
block_idx: usize,
color: &str,
w: &mut W,
) -> std::fmt::Result {
write!(w, "subgraph {}_block_{} {{\n", function_name, block_idx)?;
write!(w, "label=\"\"\n")?;
write!(w, "style=rounded\n")?;
write!(w, "bgcolor={}\n", color)?;
write!(w, "cluster=true\n")?;
Ok(())
}
fn write_graph_footer<W: Write>(w: &mut W) -> std::fmt::Result {
write!(w, "}}\n")?;
Ok(())
}
fn write_block<W: Write>(
function_name: &SFunctionName,
block_idx: usize,
insts: &[(&SInst, usize)],
w: &mut W,
) -> std::fmt::Result {
write!(
w,
"{}_{} [xlabel={}, label=\"{{",
function_name, block_idx, block_idx
)?;
for token in insts.into_iter().map(|token| Some(token)).intersperse(None) {
match token {
Some((inst, virt_reg)) => {
write!(w, "%{} = {}(", virt_reg, inst.upper_case_name())?;
for token in sched_get_uses(inst).map(|u| Some(u)).intersperse(None) {
match token {
Some(SValue::VirtualRegister(use_virt_reg)) => {
write!(w, "%{}", use_virt_reg)?
}
Some(SValue::Constant(scons)) => write!(w, "{:?}", scons)?,
None => write!(w, ", ")?,
}
}
write!(w, ")")?;
}
None => write!(w, " | ")?,
}
}
write!(w, "}}\", shape = \"record\"];\n")?;
Ok(())
}
fn write_control_edge<W: Write>(
function_name: &SFunctionName,
src: usize,
dst: usize,
w: &mut W,
) -> std::fmt::Result {
write!(
w,
"{}_{} -> {}_{} [color=\"black\"];\n",
function_name, src, function_name, dst
)?;
Ok(())
}
extern crate hercules_ir;
use std::collections::HashMap;
use std::fmt::Write;
use self::hercules_ir::*;
use crate::*;
/*
* Top level function to generate code for a module. Emits LLVM IR text. Calls
* out to backends to generate code for individual partitions. Creates a
* manifest describing the generated code.
*/
pub fn codegen<W: Write>(
module: &Module,
def_uses: &Vec<ImmutableDefUseMap>,
reverse_postorders: &Vec<Vec<NodeID>>,
typing: &ModuleTyping,
control_subgraphs: &Vec<Subgraph>,
fork_join_maps: &Vec<HashMap<NodeID, NodeID>>,
fork_join_nests: &Vec<HashMap<NodeID, Vec<NodeID>>>,
antideps: &Vec<Vec<(NodeID, NodeID)>>,
bbs: &Vec<Vec<NodeID>>,
plans: &Vec<Plan>,
w: &mut W,
) -> Result<ModuleManifest, std::fmt::Error> {
// Render types, constants, and dynamic constants into LLVM IR.
let llvm_types = generate_type_strings(module);
let llvm_constants = generate_constant_strings(module);
let llvm_dynamic_constants = generate_dynamic_constant_strings(module);
// Generate a dummy uninitialized global - this is needed so that there'll
// be a non-empty .bss section in the ELF object file.
write!(w, "@dummy = dso_local global i32 0, align 4\n")?;
// Do codegen for each function individually. Get each function's manifest.
let mut manifests = vec![];
for function_idx in 0..module.functions.len() {
// There's a bunch of per-function information we use.
let context = FunctionContext {
function: &module.functions[function_idx],
types: &module.types,
constants: &module.constants,
dynamic_constants: &module.dynamic_constants,
def_use: &def_uses[function_idx],
reverse_postorder: &reverse_postorders[function_idx],
typing: &typing[function_idx],
control_subgraph: &control_subgraphs[function_idx],
fork_join_map: &fork_join_maps[function_idx],
fork_join_nest: &fork_join_nests[function_idx],
antideps: &antideps[function_idx],
bbs: &bbs[function_idx],
plan: &plans[function_idx],
llvm_types: &llvm_types,
llvm_constants: &llvm_constants,
llvm_dynamic_constants: &llvm_dynamic_constants,
partitions_inverted_map: plans[function_idx].invert_partition_map(),
};
manifests.push(context.codegen_function(w)?);
}
// Assemble the manifest for the whole module.
Ok(ModuleManifest {
functions: manifests,
types: module.types.clone(),
// TODO: populate array constants.
array_constants: vec![],
})
}
impl<'a> FunctionContext<'a> {
/*
* Each function gets codegened separately.
*/
fn codegen_function<W: Write>(&self, w: &mut W) -> Result<FunctionManifest, std::fmt::Error> {
// Find the "top" control node of each partition. One well-formedness
// condition of partitions is that there is exactly one "top" control
// node.
let top_nodes: Vec<NodeID> = self
.partitions_inverted_map
.iter()
.enumerate()
.map(|(part_idx, part)| {
// For each partition, find the "top" node.
*part
.iter()
.filter(move |id| {
// The "top" node is a control node having at least one
// control predecessor in another partition, or is a
// start node. Every predecessor in the control subgraph
// is a control node.
self.function.nodes[id.idx()].is_start()
|| (self.function.nodes[id.idx()].is_control()
&& self
.control_subgraph
.preds(**id)
.filter(|pred_id| {
self.plan.partitions[pred_id.idx()].idx() != part_idx
})
.count()
> 0)
})
.next()
.unwrap()
})
.collect();
// Collect all the node IDs that are values returned by this function.
let returned_values = self
.function
.nodes
.iter()
.filter_map(|node| node.try_return().map(|(_, data)| data.idx() as u32))
.collect::<Vec<u32>>();
// Get the partition ID of the start node.
let top_partition = self.plan.partitions[0].idx() as u32;
// Generate code for each individual partition. This generates a single
// LLVM function per partition. These functions will be called in async
// tasks by the Hercules runtime.
assert_eq!(self.plan.num_partitions, top_nodes.len());
let mut manifests = vec![];
for part_idx in 0..self.plan.num_partitions {
match self.plan.partition_devices[part_idx] {
Device::CPU => manifests.push(self.codegen_cpu_partition(top_nodes[part_idx], w)?),
Device::GPU => todo!(),
}
}
// Assemble the manifest for the whole function.
Ok(FunctionManifest {
name: self.function.name.clone(),
param_types: self.function.param_types.clone(),
return_type: self.function.return_type,
typing: self.typing.clone(),
num_dynamic_constant_parameters: self.function.num_dynamic_constants,
partitions: manifests,
// TODO: populate dynamic constant rules.
dynamic_constant_rules: vec![],
top_partition,
returned_values,
})
}
}
@@ -332,7 +332,7 @@ pub fn control_output_flow(
let node = &function.nodes[node_id.idx()];
// Step 2: clear all bits, if applicable.
if node.is_strictly_control() || node.is_thread_id() || node.is_reduce() || node.is_phi() {
if node.is_control() || node.is_thread_id() || node.is_reduce() || node.is_phi() {
out = UnionNodeSet::Empty;
}
@@ -361,7 +361,10 @@ pub fn immediate_control_flow(
// Step 1: replace node if this is a phi, thread ID, or collect.
if let Node::Phi { control, data: _ }
| Node::ThreadID { control }
| Node::ThreadID {
control,
dimension: _,
}
| Node::Reduce {
control,
init: _,
@@ -375,10 +378,9 @@
.into_iter()
.fold(UnionNodeSet::top(), |a, b| UnionNodeSet::meet(&a, b));
}
let node = &function.nodes[node_id.idx()];
// Step 2: clear all bits and set bit for current node, if applicable.
if node.is_control() {
if function.nodes[node_id.idx()].is_control() {
let mut singular = bitvec![u8, Lsb0; 0; function.nodes.len()];
singular.set(node_id.idx(), true);
out = UnionNodeSet::Bits(singular);
@@ -143,14 +143,20 @@ pub fn get_uses<'a>(node: &'a Node) -> NodeUses<'a> {
Node::Region { preds } => NodeUses::Variable(preds),
Node::If { control, cond } => NodeUses::Two([*control, *cond]),
Node::Match { control, sum } => NodeUses::Two([*control, *sum]),
Node::Fork { control, factor: _ } => NodeUses::One([*control]),
Node::Fork {
control,
factors: _,
} => NodeUses::One([*control]),
Node::Join { control } => NodeUses::One([*control]),
Node::Phi { control, data } => {
let mut uses: Vec<NodeID> = Vec::from(&data[..]);
uses.push(*control);
NodeUses::Owned(uses.into_boxed_slice())
}
Node::ThreadID { control } => NodeUses::One([*control]),
Node::ThreadID {
control,
dimension: _,
} => NodeUses::One([*control]),
Node::Reduce {
control,
init,
@@ -173,6 +179,7 @@ pub fn get_uses<'a>(node: &'a Node) -> NodeUses<'a> {
dynamic_constants: _,
args,
} => NodeUses::Variable(args),
Node::IntrinsicCall { intrinsic: _, args } => NodeUses::Variable(args),
Node::Read { collect, indices } => {
let mut uses = vec![];
for index in indices.iter() {
@@ -210,6 +217,10 @@ pub fn get_uses<'a>(node: &'a Node) -> NodeUses<'a> {
NodeUses::Two([*collect, *data])
}
}
Node::Projection {
control,
selection: _,
} => NodeUses::One([*control]),
}
}
@@ -226,12 +237,18 @@ pub fn get_uses_mut<'a>(node: &'a mut Node) -> NodeUsesMut<'a> {
Node::Region { preds } => NodeUsesMut::Variable(preds.iter_mut().collect()),
Node::If { control, cond } => NodeUsesMut::Two([control, cond]),
Node::Match { control, sum } => NodeUsesMut::Two([control, sum]),
Node::Fork { control, factor: _ } => NodeUsesMut::One([control]),
Node::Fork {
control,
factors: _,
} => NodeUsesMut::One([control]),
Node::Join { control } => NodeUsesMut::One([control]),
Node::Phi { control, data } => {
NodeUsesMut::Variable(std::iter::once(control).chain(data.iter_mut()).collect())
}
Node::ThreadID { control } => NodeUsesMut::One([control]),
Node::ThreadID {
control,
dimension: _,
} => NodeUsesMut::One([control]),
Node::Reduce {
control,
init,
@@ -254,6 +271,9 @@ pub fn get_uses_mut<'a>(node: &'a mut Node) -> NodeUsesMut<'a> {
dynamic_constants: _,
args,
} => NodeUsesMut::Variable(args.iter_mut().collect()),
Node::IntrinsicCall { intrinsic: _, args } => {
NodeUsesMut::Variable(args.iter_mut().collect())
}
Node::Read { collect, indices } => {
let mut uses = vec![];
for index in indices.iter_mut() {
@@ -291,5 +311,9 @@ pub fn get_uses_mut<'a>(node: &'a mut Node) -> NodeUsesMut<'a> {
NodeUsesMut::Two([collect, data])
}
}
Node::Projection {
control,
selection: _,
} => NodeUsesMut::One([control]),
}
}