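Performance Tuning

Improving Start Time

Compiling CUDA kernels for the exact compute capability of the target device reduces JIT compile time at startup. When the binary already contains native code for that device, the driver does not have to JIT-compile PTX on first load.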
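As a rough illustration of targeting one exact compute capability, the sketch below builds the `nvcc` `-gencode` option for a given device. The helper names `parse_compute_cap` and `gencode_flag` are hypothetical and are not part of this project or of any CUDA tooling. The sketch assumes the capability is available as a string such as `8.6`, which is the format recent drivers report through `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`. How the resulting flag is passed to this project's build is not covered here.

```python
def parse_compute_cap(text: str) -> tuple[int, int]:
    """Parse a compute capability string such as "8.6" into (major, minor).

    Hypothetical helper: assumes the "major.minor" format.
    """
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def gencode_flag(major: int, minor: int) -> str:
    """Build an nvcc -gencode option that emits native code for one capability.

    Hypothetical helper: "code=sm_XY" requests a device binary for exactly
    that architecture, so no PTX JIT step is needed on a matching device.
    """
    if major < 1 or not 0 <= minor <= 9:
        raise ValueError(f"unexpected compute capability {major}.{minor}")
    cc = f"{major}{minor}"
    return f"-gencode arch=compute_{cc},code=sm_{cc}"


if __name__ == "__main__":
    # "8.6" is only an example value; substitute what your device reports.
    print(gencode_flag(*parse_compute_cap("8.6")))
```

A binary built this way only carries native code for that one architecture, so a build meant for several different GPUs would need one such option per target device.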