This reverts commit 5298b418eec4129351888f41cb7c3bfc90161e22.
This commit was merged by mistake. The PR was originally opened as #1730, but it was
closed and moved to #1761. I didn't notice this and created another PR, #1980.
This change was causing 100+ failures in the runtime cxxmodules nightlies
(e.g. https://epsft-jenkins.cern.ch/job/root-pullrequests-build/29183/testReport/junit/projectroot/runtutorials/tutorial_fit_FittingDemo/).
We want **proper** PrebuildModulesPaths whose entries are extracted from
LD_LIBRARY_PATH and DYLD_LIBRARY_PATH, not a random ".".
Because of this commit, we were trying to autoload libraries generated
on demand by roottest (for example "./h1analysisTreeReader_C.so"). This
is not intentional behavior: these autogenerated libraries are already
loaded by roottest, and what we want is to load **proper** libraries
like libHist.so instead.
In the previous allmodules&autoloading patch, we used a callback from the
DeserializationListener to get the Decl and load the corresponding libraries.
It worked, but performance was poor because ROOT was loading
excessive libraries.
In this patch, we use TCling::LazyFunctionCreatorAutoloadForModule. This
function is called back when a "mangled_name" is not found in the loaded
libraries, so we have to load the corresponding library and look up
again.
I used an unordered_map to store mangled-identifier/library pairs. As an
optimization, I hash the mangled name and store the library not by name
but by a uint8 index, keeping the index-to-name mapping in a separate
vector. I also tried std::map, but unordered_map was more performant.
There are better hash tables, such as:
https://probablydance.com/2018/05/28/a-new-fast-hash-table-in-response-to-googles-new-fast-hash-table/
We can try them if this part becomes performance-critical.
With this patch:
```
Processing tutorials/hsimple.C...
hsimple : Real Time = 0.04 seconds Cpu Time = 0.03 seconds
(TFile *) 0x562b37a14fe0
Processing /home/yuka/CERN/ROOT/memory.C...
cpu time = 0.362307 seconds
sys time = 0.039741 seconds
res memory = 278.215 Mbytes
vir memory = 448.973 Mbytes
```
W/o this patch:
```
Processing tutorials/hsimple.C...
hsimple : Real Time = 0.08 seconds Cpu Time = 0.07 seconds
(TFile *) 0x5563018a1d30
Processing /home/yuka/CERN/ROOT/memory.C...
cpu time = 1.524314 seconds
sys time = 0.157075 seconds
res memory = 546.867 Mbytes
vir memory = 895.184 Mbytes
```
So it improves CPU time by about 4x and memory usage by about 2x.
Before this commit, cpt.py attempted `"3.11.1" < "3.4.3"`, but this
incorrectly returns `True`. This commit adds a function that splits the
string into version identifiers and checks them all individually.
Add an LLVM module pass to generate unique CUDA module ctor/dtor names.
This LLVM module pass addresses the following problem. Every LLVM module has a CUDA ctor and dtor (if a CUDA fatbinary exists), with at least one function call to register the fatbinary. Depending on the user's code, the ctor/dtor can also include calls that register global functions and variables at runtime. Lazy compilation detects functions by name: if the name (symbol) already exists, the existing translation is used; otherwise the function is translated on first use (but it is never translated twice). Without the module pass, Cling would always use the translation from the first module.
The test case uses reflection of the gCling interpreter object. It takes two random modules and compares the symbols of the CUDA module ctor and dtor.
Also add a function that changes the symbols of the CUDA module ctor and dtor to preprocessor-compliant symbols.
Preloading all the modules has several advantages:
1. We do not have to rely on rootmap files, which don't support some features (namespaces and templates).
2. Lookup is faster because we don't have to go through the rootmap-file trampoline.
Autoloading libraries when decls are deserialized gives us correctness.
However, we still need to optimize performance by reducing the number
of loaded libraries and by improving Clang's performance.
Unloading the enum forward decl has tragic consequences, e.g. for template
specializations relying on that decl (as a template type parameter): the
enum definition will not be associated with the forward decl, and any
subsequent template specialization will not recognize the type identity
of the enum forward declaration and its definition.
Instead, silence the diagnostic. This is not nice, but will be fixed by
C++ modules.
Fix compilation errors with the latest version of Visual Studio (15.7.0)
- Prevent compilation error G47C585C4: STL1000: Unexpected compiler version, expected Clang 6 or newer
- Fix exported symbols
Late-parsed templates are parsed from a token chain, as if expanding a macro. This confuses subsequent parses. Make sure that the token queue is emptied, which is exactly what ParseInternal() does after parsing.
In TProtoClass.cxx, all the loop was doing was copying the collection
(the loop body contained only comments, which do not compile when the
// is removed). Thus, comment out the whole loop.
In TDFNodes.cxx, remove the redundant get() as well.
Some code is compiled at start-up time. For example:
- #include "cling/Interpreter/RuntimeUniverse.h"
- #include "cling/Interpreter/DynamicLookupRuntimeUniverse.h"
- namespace cling { class Interpreter; namespace runtime { Interpreter* gCling; } }
- PrintValue
These are passed to Cling as strings and initialized at start-up time, so I think it makes sense to reduce top-level global variables, #includes, and virtual functions.
1. Global variables
If we break at emitModule, we can get a list of the global variables and functions that are actually deserialized. These include functions, variables, STL classes, and all the functions derived from them.
I tried to change them to, for example, constexpr, so that they are processed at compile time.
2. Eagerly deserialized decls
Thanks to @Axel's hint and tip, we could minimize the decls eagerly deserialized in ASTReader::PassInterestingDeclsToConsumer. We have already removed most of the eagerly deserialized decls (some remain to be removed, some are hard to remove, and some don't cost much).
So far, we got a 9.2% CPU-time improvement and an 8.8% memory improvement at start-up in a release build.
- root.exe -q -l
  - master
    cpu time = 0.09186914285714286 sec (average of 7 runs)
    res memory = 142.008 Mbytes
  - HEAD
    cpu time = 0.08337842857142856 sec
    res memory = 129.508 Mbytes
- hsimple.C
  Improved by 13% in CPU time and 8.5% in memory.
  - master
    cpu time = 0.0954708 sec (average)
    res memory = 142.891 Mbytes
  - HEAD
    cpu time = 0.0833258 sec
    res memory = 130.73 Mbytes
With modules:
- Improvement of 17.7% in CPU time and 2% in memory on root.exe -q -l.
  (For memory, the improvement is small because most of the memory is taken by LoadModules.)
- With this patch, modules are 11.2% slower in CPU time and 6% better in resident memory.