Thesis Title: Instruction and Work Scheduling for Thread-Parallel Architectures

Subwarp Interleaving

Subwarp Interleaving allows for fine-grained interleaved execution of diverged paths within a warp with the goal of increasing hardware utilization and reducing warp latency in applications with high thread divergence, low warp occupancy, and long stalls in the pipeline.

GPU Subwarp Interleaving, Sana Damani, Mark Stephenson, Ram Rangan, Daniel R. Johnson, Rishkul Kulkarni, and Stephen W. Keckler, HPCA 2022

Speculative Reconvergence

Speculative Reconvergence identifies serially executed common code blocks within divergent loops and modifies the compiler’s branch convergence mechanism to reconverge threads early, thereby maximizing parallelism within expensive code blocks at the cost of increased serialization along low-cost paths within a GPU program.

Speculative Reconvergence for Improved SIMT Efficiency, Sana Damani, Daniel R. Johnson, Mark Stephenson, Stephen W. Keckler, Eddie Yan, Michael McKeown, and Olivier Giroux, CGO 2020

Common Subexpression Convergence

Common Subexpression Convergence (CSC) is a compiler optimization that identifies code common to divergent paths and moves it to a convergent region, thereby increasing the SIMT efficiency of a GPU program.

Common Subexpression Convergence, Sana Damani and Vivek Sarkar, LCPC 2019
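
To make the CSC idea concrete, here is a minimal sketch of a toy cost model. It is not taken from the paper. It assumes a 32-lane warp, a single two-way branch whose two sides are executed one after the other, and a definition of SIMT efficiency as useful lane-instructions divided by issued lane slots. The names `simt_efficiency`, `before_csc`, and `after_csc` are hypothetical and exist only for this illustration.

```python
WARP_SIZE = 32  # assumed warp width for this toy model


def simt_efficiency(regions):
    """Return useful lane-instructions / issued lane slots.

    `regions` is a list of (active_lanes, instruction_count) pairs that the
    warp issues one after another.
    """
    issued = sum(count for _, count in regions)
    useful = sum(lanes * count for lanes, count in regions)
    return useful / (issued * WARP_SIZE)


def before_csc(lanes_then, unique_cost, common_cost):
    """Both divergent paths execute their unique code plus the common code."""
    lanes_else = WARP_SIZE - lanes_then
    per_path = unique_cost + common_cost
    return [(lanes_then, per_path), (lanes_else, per_path)]


def after_csc(lanes_then, unique_cost, common_cost):
    """The common code runs once, in a convergent region, with all lanes active."""
    lanes_else = WARP_SIZE - lanes_then
    return [
        (lanes_then, unique_cost),
        (lanes_else, unique_cost),
        (WARP_SIZE, common_cost),
    ]


if __name__ == "__main__":
    # Hypothetical workload: an even 16/16 split, 2 unique instructions per
    # path, and 8 instructions that are common to both paths.
    print("before:", simt_efficiency(before_csc(16, 2, 8)))
    print("after: ", simt_efficiency(after_csc(16, 2, 8)))
```

Under these assumptions, the larger the common code is relative to the path-specific code, the more moving it to a convergent region helps. The model ignores any extra instructions a real compiler might need in order to merge the two copies of the common code.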


MLIR Tutorial, Jacques Pienaar and Sana Damani, MLIR4HPC@LCPC 2019
