* Revert "[cmake] Narrow the list of included directories."
This reverts commit 88c16588108508eff66cc160d244c4a59eb157ed.
* Revert "[cxxmodules] Build cling runtime into module."
This reverts commit 76a9ce75d2c29c169812b0816163ac7ce27ffad5.
This patch splits out the headers required at both compile time and runtime.
The Cling_Runtime_Extra module does not require __CLING__ to be defined.
This allows its headers to be included at compile time as well,
where __CLING__ is not defined.
The operations done by the LookupHelper are costly in both memory and
performance. Almost every operation requires memory allocation and parsing
of often non-trivial C++ code.
Unfortunately, the LookupHelper is used very intensively by rootcling and
ROOT. The callers usually do not use any caching mechanisms and redo the
expensive operations over and over even though the answer is known to be
the same as before. For instance, building the dictionaries of MathCore and
TreePlayer shows:
```
MathCore:
Cached entries: 217
Total parse requests: 54051
Cache hits: 53834
TreePlayer:
Cached entries: 183
Total parse requests: 57697
Cache hits: 57514
```
This patch introduces the first set of caching functionality. In
particular, each LookupHelper::find* function allocates a memory buffer
which is then stored in the clang::SourceManager. We hash the buffer
content and keep a mapping between the hash and the FileID. The next time we
encounter the same content, we do not allocate a new FileID but reuse the
old one. We see a 7% decrease in memory footprint for non-cxxmodules ROOT.
For cxxmodules we see a significant reduction of the pcm sizes (by half),
which translates into RSS improvements:
```
master before:
cpu time = 0.291462 seconds
sys time = 0.064409 seconds
res memory = 345.816 Mbytes
vir memory = 573.508 Mbytes
master after:
cpu time = 0.235828 seconds
sys time = 0.098327 seconds
res memory = 260.012 Mbytes
vir memory = 377.945 Mbytes
```
Patch by Yuka Takahashi and me.
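The caching idea described above can be sketched in isolation. This is a
minimal illustration, not the code of the patch: SketchFileID and
BufferIDCache are hypothetical names, std::hash stands in for whatever hash
the real implementation uses, and no clang types are involved.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for clang::FileID; not the real type.
using SketchFileID = unsigned;

class BufferIDCache {
public:
  // Returns the ID registered for `content`, creating one only on a miss.
  SketchFileID getOrCreate(const std::string& content) {
    ++m_Requests;
    const std::size_t hash = std::hash<std::string>{}(content);
    const auto found = m_HashToID.find(hash);
    if (found != m_HashToID.end()) {
      ++m_Hits;
      return found->second;  // reuse the old ID, no new allocation
    }
    const SketchFileID id = m_NextID++;
    m_HashToID.emplace(hash, id);
    return id;
  }

  std::size_t entries() const { return m_HashToID.size(); }
  std::size_t requests() const { return m_Requests; }
  std::size_t hits() const { return m_Hits; }

private:
  // Keyed on the hash alone, so two different buffers with the same hash
  // would share an ID; a production version would need to handle that.
  std::unordered_map<std::size_t, SketchFileID> m_HashToID;
  SketchFileID m_NextID = 1;
  std::size_t m_Requests = 0;
  std::size_t m_Hits = 0;
};
```

Repeated lookups of the same text then count as cache hits, which is the
effect the "Cached entries / Total parse requests / Cache hits" numbers above
report.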
In cases where we build ROOT with -Dbuiltin_llvm=Off -Dbuiltin_clang=On
and have both llvm and clang installed in /usr/, clad will pick up
the clang headers from there too.
This patch gives higher priority to the header files which ROOT is
supposed to use. It fixes a very obscure initialization issue due to
different versions of the ASTContext.h installed and used by ROOT.
The relevant highlights are:
* Better Windows support (thanks to Bertrand Bellenot!);
* Disabled automatic discovery of system LLVM -- clad should only
search for LLVM at DCLAD_PATH_TO_LLVM_BUILD. On some platforms
(discovered by Oksana Shadura via rootbench) clad discovers the
system LLVM which is compatible in principle but this is not what
we want for ROOT.
* Implemented -CLAD_BUILD_STATIC_ONLY -- this covers the ROOT use case
where we do not need shared objects but link the libraries against
another shared object (libCling.so). This allows platforms which have
disabled LLVM_ENABLE_PLUGINS to still build clad and use it. Examples
of such platforms are Cygwin and Windows.
See more at: https://github.com/vgvassilev/clad/releases/tag/v0.2
clad is a C++ plugin for clang and cling that implements automatic
differentiation of user-defined functions by employing the chain rule in
forward and reverse mode, coupled with source code transformation and AST
constant folding.
In mathematics and computer algebra, automatic differentiation (AD) is a
set of techniques to numerically evaluate the derivative of a function
specified by a computer program. AD exploits the fact that every computer
program, no matter how complicated, executes a sequence of elementary
arithmetic operations (addition, subtraction, multiplication, division, etc.)
and elementary functions (exp, log, sin, cos, etc.). By applying the chain
rule repeatedly to these operations, derivatives of arbitrary order can
be computed automatically, accurately to working precision, and using at
most a small constant factor more arithmetic operations than the original
program.
AD is an alternative technique to symbolic and numerical differentiation.
These classical methods run into problems: symbolic differentiation leads
to inefficient code (unless done carefully) and faces the difficulty of
converting a computer program into a single expression, while numerical
differentiation can introduce round-off errors in the discretization
process and cancellation. Both classical methods have problems with
calculating higher derivatives, where the complexity and errors increase.
Finally, both classical methods are slow at computing the partial
derivatives of a function with respect to many inputs, as is needed for
gradient-based optimization algorithms. Automatic differentiation solves
all of these problems, at the expense of introducing more software
dependencies.
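To make the chain-rule idea concrete, here is a minimal forward-mode sketch
based on dual numbers and operator overloading. Note that this is a different
implementation technique from clad, which works by source code transformation;
the namespace ad_sketch and all names in it are made up for illustration.

```cpp
#include <cassert>
#include <cmath>

namespace ad_sketch {

// A value together with its derivative with respect to one chosen input.
struct Dual {
  double val;
  double der;
};

// Sum rule: (a + b)' = a' + b'
inline Dual operator+(Dual a, Dual b) {
  return {a.val + b.val, a.der + b.der};
}

// Product rule: (a * b)' = a' * b + a * b'
inline Dual operator*(Dual a, Dual b) {
  return {a.val * b.val, a.der * b.val + a.val * b.der};
}

// Chain rule for an elementary function: sin(a)' = cos(a) * a'
inline Dual sine(Dual a) {
  return {std::sin(a.val), std::cos(a.val) * a.der};
}

// f(x) = x * x + sin(x), written once and reused for value and derivative.
inline Dual f(Dual x) { return x * x + sine(x); }

// Seeding der = 1 selects x as the variable to differentiate against.
inline double derivativeOfF(double x) { return f(Dual{x, 1.0}).der; }

}  // namespace ad_sketch
```

Every elementary operation propagates its derivative alongside its value, so
the derivative of f is exact up to floating-point rounding and needs no
finite-difference step size.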
This patch allows ROOT to interoperate with clad. Namely, users can ask
the interpreter to produce a derivative or a gradient of a known function.
An illustrative example for a first-order derivative:
root [0] #include "Math/CladDerivator.h"
root [1] double my_pow2(double x) { return x*x; }
root [2] auto meta_obj = clad::differentiate(my_pow2, /*wrt 1-st argument*/0);
root [3] meta_obj.dump();
The code is: double my_pow2_darg0(double x) {
return (1. * x + x * 1.);
}
root [5] meta_obj.execute(1) // no iterations, at the cost of function call.
(double) 2.0000000
Learn more about clad at https://github.com/vgvassilev/clad
Patch by Aleksandr Efremov and me!
Clang allows third party shared libraries to provide user-defined
extensions. For example, a custom libTemplateInstantiation.so can
visualize all template instantiation chains in clang. To enable it
one needs to pass a set of options such as -fplugin.
Cling should be able to inherently work with clang plugins. However,
cling still does not make full use of the clang driver where the plugin
setup is handled.
This patch enables plugins in cling and extends them in some aspects.
In particular, cling can load plugins from shared libraries, but also
plugins that are linked into the same library as cling itself. This is
very useful in cases where cling itself runs inside a shared library (e.g.
libCling). Users of libCling (such as ROOT) prefer to keep all llvm and
clang related symbols local to avoid symbol clashes if there is another
version of clang and llvm linked against a package. This can be done by
dlopen-ing libCling with RTLD_LOCAL visibility mode. Then the only way
for clang plugins to work in this scenario is to be linked to libCling.
Patch by Aleksandr Efremov and me.
Problem:
If a C++ statement with a concrete template parameter is typed in, cling passes both the (unwrapped) input line and the specialization of the statement, generated by the compiler instance, to the clang nvptx. This causes an explicit-specialization-after-instantiation error.
Solution:
A template specialization declaration is saved twice in a transaction: once with the type kCCIHandleCXXImplicitFunctionInstantiation and once with the type kCCIHandleTopLevelDecl. To avoid sending a template specialization to the clang nvptx and causing an explicit-specialization-after-instantiation error, the code has to check which kCCIHandleTopLevelDecl declarations are also kCCIHandleCXXImplicitFunctionInstantiation declarations.
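For readers unfamiliar with this error class, the following standalone sketch shows the ordering rule behind it in plain C++. It is unrelated to cling's internals; size_of_tag and useIt are hypothetical names.

```cpp
#include <cassert>

// Primary template.
template <typename T> int size_of_tag(T) { return 0; }

// The explicit specialization must be declared before the first use that
// would otherwise trigger an implicit instantiation of size_of_tag<int>.
template <> int size_of_tag<int>(int) { return 1; }

// First use: picks the explicit specialization declared above.
inline int useIt() { return size_of_tag(5); }

// If the specialization were instead placed here, after useIt(), the program
// would be ill-formed: size_of_tag<int> has already been implicitly
// instantiated, and clang rejects the late explicit specialization.
```

Sending both the implicit instantiation and the user's input line to the device compiler amounts to the rejected ordering described in the last comment.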
There are two possible sources of input for the clang nvptx: the raw input of `IncrementalParser::parseInternal()` or the AST printer output of the `llvm::module`. To decide which input source should be used, I use the value `Transaction::ConsumerCallInfo::kCCIHandleTopLevelDecl`. This allows more C++ constructs to be used.
Fixed a small bug: the AST printer prints `setValueNoAlloc()` without the scope `cling::runtime::internal`. This caused an error because clang could not find the declaration in the global scope. The solution is to add dummy declarations in the global scope.
At the moment, templates do not work.
- minor changes to comments and code style
- use const in IncrementalCUDADeviceCompiler where possible
- move the CUDA device code compiler instance to IncrementalParser
- change the members of CuArgs to const and adjust the setCuArgs method
- use std::vector<std::string> instead of llvm::SmallVector<const char *> to build argv for executeAndWait
- improve the error messages of generatePCH(), generatePTX() and generateFatbinary()
- replace m_Counter with a copy in IncrementalCUDADeviceCompiler to avoid unintended changes
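The argv change in the list above can be illustrated with a small helper that keeps the argument strings in a std::vector<std::string> and derives a null-terminated `const char*` array from it. This is only a sketch: toArgv is a hypothetical name, and whether the process-launch call expects exactly this shape depends on the LLVM version in use.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Builds a null-terminated array of C strings that points into `args`.
// The returned pointers stay valid only while `args` is alive and unmodified,
// which is why the owning std::vector<std::string> must outlive the call
// that consumes the array.
inline std::vector<const char*> toArgv(const std::vector<std::string>& args) {
  std::vector<const char*> argv;
  argv.reserve(args.size() + 1);
  for (const std::string& arg : args)
    argv.push_back(arg.c_str());
  argv.push_back(nullptr);  // conventional argv terminator
  return argv;
}
```

Owning the strings in one container avoids dangling `const char*` entries when arguments are assembled from temporaries.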
Before this commit, the CUDA device code compiler class overwrote the .cu source files if the translation went wrong. Now it renames the faulty files, so you can check which source code caused the compiler errors.
- add cudaDeviceSynchronize() to every kernel launch
- fix a small address bug in cudaMemcpy when a host array is used
- in the parallel test cases, replace fixed thread counts with a variable
- rework the shared memory kernel
- CUDA __constant__ memory
- CUDA global __device__ memory
- CUDA __host__ prefix
- CUDA kernel launch with arguments
- CUDA templated kernels
- CUDA shared memory with dynamic runtime
- CUDA Streams
- test if CUDA device is available
Before, it was not possible to find the clang++ that ships with cling unless cling was started from inside the bin folder ('./cling -xcuda'). Now it is possible, for example, to start cling with 'bin/cling -xcuda'.
Fix a bug that prevented starting './cling -xcuda -fsyntax-only'.
In some cases, the path of the cling temp folder contained non-printable characters at the end.
The handling of the path string was changed to solve this problem.
Now it is possible to declare variables at the prompt that are visible to other statements.
The problem was that cling wraps all statements in a function to get valid input, so every variable is visible only inside its own wrapper function. To solve the problem, cling changes the local variable declaration into a global declaration.
The implementation for the CUDA compiler checks whether the unwrapping has happened. If it has, the C++ code of the unwrapped variable declaration (from the AST printer) is written to the .cu file instead of the raw input.
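The visibility problem can be shown in plain C++. The wrapper names below are hypothetical; the functions cling actually generates are named differently.

```cpp
#include <cassert>

// Wrapped form: `int i = 42;` typed at the prompt ends up local to a
// wrapper function, so no later statement can see it.
inline void wrapper_1() {
  int i = 42;  // visible only inside wrapper_1
  (void)i;
}

// Unwrapped form: the declaration is hoisted to global scope ...
int g_i = 42;

// ... so a later statement, wrapped in its own function, can still use it.
inline int wrapper_2() { return g_i + 1; }
```

The device-side .cu file needs the hoisted form as well, which is why the AST printer output is written instead of the raw input line.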
At the moment, we use PCH files to extend the existing device code (its AST) with new lines of code. In detail, to create a new PTX file, we take the CUDA code (.cu file) and a PCH file with the existing AST as input and generate a new PCH file, which contains the whole AST. Then the PCH file is compiled to a PTX file.
A bug in clang prevents us from generating more than 5 new PCH files. The bug is not easy to fix, so I wrote a small workaround: instead of using a PCH file that contains the AST, we generate a new complete AST from all .cu files every time.
The workaround is temporary and should be removed once clang is patched.
Now it is possible to set some arguments of the clang nvptx and fatbinary via arguments at cling start. The arguments are filtered, so not every argument is supported at the moment. The arguments cannot be changed at runtime, because the PCH files forbid it. For example, the clang nvptx uses the optimization level that is set at the start of cling.
At the moment, the debug options of clang nvptx are simple: if any debug option is detected, just a -g is added to the clang nvptx invocation.
Additional PTX options for clang nvptx do not work at the moment. There is a problem with parsing them at the start of cling.
I replaced copies of the include paths with a pointer to the headerSearchOptions. Explicit handling of the include paths is no longer necessary. Adding include paths that were declared via arguments at start also works.
The class IncrementalCUDADeviceCompiler uses external tools to generate PTX and CUDA fatbin files. It runs the tools clang and fatbinary via llvm::sys::ExecuteAndWait. The class also handles including new code into the existing code. The steps of the compiler pipeline are:
- clang: CUDA C++ + previous PCH -> PCH
- clang: PCH -> PTX
- fatbinary: PTX -> fatbin
There is no selection of code: every input to cling is passed to the IncrementalCUDADeviceCompiler.
Now it is possible to define functions with C++ attributes without the .rawInput mode, for example functions like `[[ noreturn ]] foo() { ... }` or `[[deprecated]] [[nodiscard]] int bar(){ … }`.
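As a standalone illustration of such declarations, the following sketch compiles as ordinary C++ (`[[nodiscard]]` requires C++17). The function names fail and checkedDivide are hypothetical, and bar is only declared to show attribute stacking.

```cpp
#include <cassert>
#include <stdexcept>

// [[noreturn]]: the function never returns normally (here it always throws).
[[noreturn]] void fail(const char* msg) { throw std::runtime_error(msg); }

// Attributes can be stacked: [[deprecated]] warns on use, and
// [[nodiscard]] warns if the caller ignores the result.
[[deprecated]] [[nodiscard]] int bar() { return 42; }

// Because fail() is [[noreturn]], the compiler knows the division below
// is only reached with a non-zero divisor.
[[nodiscard]] int checkedDivide(int a, int b) {
  if (b == 0)
    fail("division by zero");
  return a / b;
}
```

Typing these definitions at the prompt is the case the change above enables; before it, attribute-prefixed functions needed the .rawInput mode.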