Commit Graph

2 Commits

Author SHA1 Message Date
Simeon Ehrig
ad8d5e1137 Changes for Pull Request #284
- add Author to CUDA test cases
- optimize DeviceKernelInliner
- improve some comments
- remove deprecated opt level variables
- change interface of IncrementalCUDADeviceCompiler::process() and IncrementalCUDADeviceCompiler::declare()
2019-11-07 19:29:15 +01:00
Simeon Ehrig
96366346c0 Added DeviceKernelInliner ASTTransformer
This ASTTransformer adds an inline attribute to every CUDA __device__
function that does not already have one. Inlining solves a problem caused
by incremental compilation of PTX code. In a normal compiler, all
definitions of __global__ kernels and __device__ functions are in the same
translation unit. In the incremental compiler, each one has its own
translation unit. When a __global__ kernel calls a __device__ function,
this design causes an error: instead of the PTX code of the __device__
function being generated in the same file as the __global__ kernel, the
file contains only an external declaration of the __device__ function.
However, normal PTX code does not support external declarations of
functions.

The transformer only works if the target device is nvptx.
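
The rule described above can be sketched on a toy model. This is not the
DeviceKernelInliner code and does not use the real AST classes: FunctionInfo
and markDeviceFunctionsInline are hypothetical names chosen for illustration,
and the target check is reduced to a plain boolean.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a function declaration in the AST.
struct FunctionInfo {
  std::string name;
  bool isDevice;  // declared with __device__ (false for __global__ kernels)
  bool isInline;  // already carries an inline attribute
};

// Marks every __device__ function that is not yet inline as inline and
// returns how many declarations were changed. Does nothing unless the
// target is nvptx (modeled here as a boolean, which is an assumption).
inline int markDeviceFunctionsInline(std::vector<FunctionInfo>& decls,
                                     bool targetIsNvptx) {
  if (!targetIsNvptx) return 0;
  int changed = 0;
  for (FunctionInfo& f : decls) {
    if (f.isDevice && !f.isInline) {
      f.isInline = true;
      ++changed;
    }
  }
  return changed;
}
```

In this sketch, a list holding a non-inline __device__ function, an already
inline __device__ function and a __global__ kernel would have only the first
entry changed, and a second pass over the same list would change nothing.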
2019-11-07 19:29:15 +01:00