Since the combined base62+golomb decoder is still the most expensive
routine, I have to consider very clever tricks to give it a boost.
In the routine, its "master logic" is executed on behalf of the base62
decoder: it makes bits from the string and passes them on to the "slave"
golomb routine. The slave routine has to maintain its own state (doing
q or doing r); after the bits are processed, it returns and base62 takes
over. When the slave routine is invoked again, it has to recover the
state and take the right path (q or r). These seemingly cheap state
transitions can actually become relatively expensive, since the "if"
clause involves branch prediction which is not particularly accurate on
variable-length inputs. This change demonstrates that it is possible to
get rid of the state-related instructions altogether.
Roughly, the idea is that, instead of calling putNbits(), we can invoke
"goto *putNbits", and the pointer will dispatch either to putNbitsQ or
putNbitsR label (we can do this with gcc's computed gotos). However,
the goto will not return, and so the "putbits" guys will have to invoke
"goto getbits", and so on. So it gets very similar to coroutines as
described in [Knuth 1997, vol. 1, p. 194]. Furthermore, one must
realize that computed gotos are not actually required: since the total
number of states is relatively small - roughly (q^r)x(reg^esc,align) -
it is possible to instantiate a few similar coroutines which pass
control directly to the right labels.
For example, the decoding is started with "get24q" coroutine - that is,
we're in the "Q" mode and we try to grab 24 bits (for the sake of the
example, I do not consider the initial align step). If 24 bits are
obtained successfully, they are passed down to the "put24q" coroutine
which, as its name suggests, takes over in the "Q" mode immediately;
furthermore, in the "put24q" coroutine, the next call to get bits has to
be either "get24q" or "get24r" (depending on whether Q or R is processed
when no bits are left) - that is, the coroutine itself must "know" that
there are no base62 complications at this point. The "get24r" is similar
to "get24q" except that it will invoke "put24r" instead of "put24q". On
the other hand, consider that, in the beginning, only 12 bits have been
directly decoded (and the next 12 bits probably involve "Z"). We then
pass control to "put12q", which will in turn call either "get12q" or
"get12r" to handle irregular cases for the pending 12 bits (um, the
names "get12q" and "get12r" are a bit of a misnomer).
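The idea of keeping the Q/R state in the program counter rather than in
a variable can be sketched as follows. This is a simplified illustration
and not the code from set.c: it decodes plain Golomb-Rice codes from
fixed-size chunks of bits and leaves base62 out entirely. The name
rice_decode, the chunk layout (LSB first, unused high bits zero) and the
limits in the assert are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Decode Golomb-Rice codes (q zero bits, a 1 bit, then Mshift bits of r,
 * LSB first) from chunks of chunk_bits bits each.  The caller must
 * provide room for up to nchunks * chunk_bits values in v[].
 * Returns the number of decoded values. */
size_t rice_decode(const unsigned *chunks, size_t nchunks, int chunk_bits,
                   int Mshift, unsigned *v)
{
    assert(chunk_bits > 0 && chunk_bits <= 24 && Mshift >= 0 && Mshift <= 8);
    unsigned *v_start = v;
    unsigned bits = 0, q = 0, r = 0;
    int n = 0;      /* bits left in the current chunk */
    int rfill = 0;  /* bits of r collected so far */
    size_t i = 0;

get_q:  /* out of bits while doing Q: refill and resume Q */
    if (i == nchunks)
        return (size_t)(v - v_start);
    bits = chunks[i++];
    n = chunk_bits;
put_q:
    if (bits == 0) {  /* only zeros left (or nothing left): Q goes on */
        q += n;
        goto get_q;
    }
    {
        int vbits = __builtin_ffs(bits);  /* 1-based position of the 1 bit */
        q += vbits - 1;
        bits >>= vbits;
        n -= vbits;
    }
    r = 0;
    rfill = 0;
put_r:
    {
        int take = Mshift - rfill;
        if (take > n) {  /* r straddles the chunk boundary */
            r |= bits << rfill;
            rfill += n;
            goto get_r;
        }
        r |= (bits & ((1u << take) - 1)) << rfill;
        bits >>= take;
        n -= take;
    }
    *v++ = (q << Mshift) | r;
    q = 0;
    goto put_q;

get_r:  /* out of bits while doing R: refill and resume R */
    if (i == nchunks)
        return (size_t)(v - v_start);  /* truncated input, partial value dropped */
    bits = chunks[i++];
    n = chunk_bits;
    goto put_r;
}
```

Note that there is no "state" variable: get_q always continues at put_q,
and get_r always continues at put_r. The single "bits == 0" test in
put_q also covers the case when no bits are left at all.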
This change also removes another branch in golomb R->Q transition:
r &= (1 << Mshift) - 1;
*v++ = (q << Mshift) | r;
q = 0;
state = ST_VLEN;
- if (left == 0)
- return;
bits >>= n - left;
n = left;
vlen:
if (bits == 0) {
q += n;
return;
}
int vbits = __builtin_ffs(bits);
...
This first "left no bits" check is now removed and performed implicitly
by the latter "no need for bsf" check, with the result being far better
than I expected. Perhaps it helps to understand that the condition
"left exactly 0" rarely holds, but the CPU still stalls on the check.
So, the Q and R processing steps each now have exactly one branch (that is,
exactly one condition which completes the step). Also, in the "put"
coroutines, I simply make a sequence of Q and R steps; this produces
a clean sequence of instructions which branches only when absolutely
necessary.
callgrind annotations for "apt-cache <<<unmet", previous commit:
2,671,717,564 PROGRAM TOTALS
1,059,874,219 lib/set.c:decode_base62_golomb
509,531,239 lib/set.c:rpmsetcmp
callgrind annotations for "apt-cache <<<unmet", this commit:
2,426,092,837 PROGRAM TOTALS
812,534,481 lib/set.c:decode_base62_golomb
509,531,239 lib/set.c:rpmsetcmp
- set.c: Fixed bad sentinel due to off-by-one error in alt100.28.
- set.c: Improved linear cache search by using contiguous memory block.
- set.c: Improved decoding by combining and processing 24 bits at a time.
- set.c: Reimplemented downsampling using merges instead of full qsort(3).
- cpp.req: Implemented global/hierarchical mode in which subordinate files
are processed implicitly, resulting in fewer failures and major speed-up.
- cpp.req: Recover missing refs due to cpp "once-only header" optimization.
Running pkg-config multiple times can produce too many cflags, most
of them being dups. With this change, I rely on pkg-config itself to
discard dups properly - pkg-config(1) manpage says that "duplicate
flags are merged (maintaining proper ordering)".
Hierarchical processing makes cpp.req more susceptible to "once-only
header" optimization. To demonstrate the problem, I've implemented
some debugging facilities. Here is how <gtk/gtk.h> is processed.
$ cpp.req -vv /usr/include/gtk-2.0/gtk/gtk.h
[...]
Include gdk/gdk.h
+ Push /usr/include/gtk-2.0/gdk/gdk.h
Include gdk/gdkapplaunchcontext.h
+ Push /usr/include/gtk-2.0/gdk/gdkapplaunchcontext.h
Include gio/gio.h
! Push /usr/include/glib-2.0/gio/gio.h
Include gio/giotypes.h
Push /usr/include/glib-2.0/gio/giotypes.h
Include gio/gioenums.h
Push /usr/include/glib-2.0/gio/gioenums.h
Include glib-object.h
Push /usr/include/glib-2.0/glib-object.h
Include gobject/gbinding.h
Push /usr/include/glib-2.0/gobject/gbinding.h
Include glib.h
Push /usr/include/glib-2.0/glib.h
[...]
+ Push /usr/include/gtk-2.0/gtk/gtkdebug.h
Include glib.h
Pop
[...]
recovered glib.h -> /usr/include/glib-2.0/glib.h
recovered stdarg.h -> /usr/lib64/gcc/x86_64-alt-linux/4.5.3/include/stdarg.h
recovered time.h -> /usr/include/time.h
recovered glib-object.h -> /usr/include/glib-2.0/glib-object.h
In the output, "Include" lines annotate "#include" instructions which
are about to be processed by cpp; "Push" and "Pop" annotate actual
file operations performed by cpp. Technically, "Include" annotations
are enabled via the -dI option, which installs the cb_include callback in
gcc/c-ppoutput.c; "Push" and "Pop" are triggered in the guts of the
libcpp library. The library has a hardcoded optimization against repeated
inclusions. According to "info cpp", "It remembers when a header file
has a wrapper #ifndef. If a subsequent #include specifies that header,
and the macro in the #ifndef is still defined, it does not bother to
rescan the file at all." (See should_stack_file in libcpp/files.c.)
This means that, normally, each "Include" should be followed by a
corresponding "Push". However, due to "once-only header" optimization,
some includes are not followed by a push. This means that the file
has already been pushed, and it happens to use a wrapper #ifndef.
Note that, in the output, this is exactly the case with <glib.h>.
Also note that, in the output, files internal to the package are marked
with "+" on the left. They are tracked down to the first non-packaged
file, which makes a dependency; these files are marked with "!". The
problem with <glib.h> is then that it gets first included in an
external file. Later it is also included in an internal file, but
a "Push" is not triggered. And since the internal file is subordinate
to <gtk/gtk.h> and is not going to be processed on its own, the
dependency on <glib.h> is lost.
To recover missing pushes, we have to associate each include with the
first-time push. In other words, we need a table which maintains a
(header -> filename) mapping; in the above example, the table will
contain (glib.h -> /usr/include/glib-2.0/glib.h). Using this table,
we can ascertain that each internal #include produced a result.
Now, this might still have some corner cases: includes with
non-canonical header names probably will not be recovered, and it is not
clear whether <foo.h> and "foo.h" should be processed differently.
It works well enough in simple cases, though.
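A minimal sketch of such a table, assuming a fixed-size array and plain
linear search. The names hdr_map, hdr_map_push and hdr_map_find are
hypothetical and are not taken from cpp.req, and the strings are assumed
to outlive the table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_HEADERS 1024  /* hypothetical limit for the sketch */

struct hdr_map {
    size_t n;
    const char *header[MAX_HEADERS];    /* name as written in #include */
    const char *filename[MAX_HEADERS];  /* file pushed for it the first time */
};

/* Look up the filename recorded for a header, or NULL if none. */
const char *hdr_map_find(const struct hdr_map *m, const char *header)
{
    assert(m != NULL && header != NULL);
    for (size_t i = 0; i < m->n; i++)
        if (strcmp(m->header[i], header) == 0)
            return m->filename[i];
    return NULL;
}

/* Called when an "Include" is followed by a "Push".  Only the first
 * push for a given header is kept.  Returns 0 on success, -1 if full. */
int hdr_map_push(struct hdr_map *m, const char *header, const char *filename)
{
    if (hdr_map_find(m, header) != NULL)
        return 0;
    if (m->n == MAX_HEADERS)
        return -1;
    m->header[m->n] = header;
    m->filename[m->n] = filename;
    m->n++;
    return 0;
}
```

When an "Include" in an internal file is not followed by a "Push",
hdr_map_find() on the header name yields the file to be recovered.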
I have to admit that cpp.req can be slow and often fails in an ugly
manner. To address these issues, this change introduces "hierarchical
processing". Consider the package libgtk+2-devel. Only a few header
files from this package can be included directly, and these files in
turn include other "private" headers which are protected against direct
inclusion. The idea is then that only those few files with the highest
rank have to be processed explicitly, and most of the "private" files
can be processed implicitly as they are included on behalf of
higher-ranking files.
To implement the idea, somehow we have to sort the files by their rank.
This probably has to involve some guesswork. However, assigning higher
ranks to shorter filenames seems to produce nice guesses. More precisely,
files are sorted by shorter directory names and then by shorter basenames.
Another possible criterion, which is not currently implemented, is to
take into account the number of path components in a directory name.
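The ordering described above (shorter directory names first, then
shorter basenames) can be sketched as a qsort(3) comparator. The function
names are hypothetical, and the final strcmp tie-breaker is an assumption
added only to make the order deterministic.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Length of the directory part of a path (everything before the last '/'). */
static size_t dir_len(const char *path)
{
    const char *slash = strrchr(path, '/');
    return slash ? (size_t)(slash - path) : 0;
}

/* Shorter dirname first, then shorter basename, then strcmp. */
static int rank_cmp(const void *a, const void *b)
{
    const char *p1 = *(const char *const *)a;
    const char *p2 = *(const char *const *)b;
    size_t d1 = dir_len(p1), d2 = dir_len(p2);
    if (d1 != d2)
        return d1 < d2 ? -1 : 1;
    /* Directory parts have equal length here, so comparing the
     * remainders compares the basename lengths. */
    size_t b1 = strlen(p1) - d1, b2 = strlen(p2) - d2;
    if (b1 != b2)
        return b1 < b2 ? -1 : 1;
    return strcmp(p1, p2);
}

/* Sort paths so that the presumably higher-ranking headers come first. */
void sort_by_rank(const char **paths, size_t n)
{
    assert(paths != NULL || n == 0);
    qsort(paths, n, sizeof *paths, rank_cmp);
}
```

With this ordering, gtk/gtk.h sorts before gtk/gtkdebug.h, which in turn
sorts before gtk/gtkaccessible.h.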
The result is pretty amazing: the amount of time needed to process
libgtk+2-devel headers is reduced from 150s to 5s. Notably <gtk/gtk.h>
includes 241 packaged files. This is also due to other optimizations:
packaged files are excluded from dependencies early on, and each
required filename gets passed to FindPackage only once.
Most of the time, downsampling is needed for Provides versions,
which are expensive, and values are reduced by only 1 bit, which
can be implemented without sorting the values again. Indeed,
only a merge is required. The array v[] can be split into two
parts: the first part v1[] and the second part v2[], the latter
having values with high bit set. After the high bit is stripped,
v2[] values are still sorted. It suffices to merge v1[] and v2[].
Note, however, that a merge cannot be done in place, and also we have
to support 2 or more downsampling steps. We also want to avoid copying.
This requires careful buffer management - each version needs two
alternate buffers.
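A single downsampling step can be sketched like this. It is a simplified
illustration, not the set.c code: the name downsample1 is hypothetical,
the split point is found by a linear scan, and duplicates produced by
stripping the high bit are kept as they are.

```c
#include <assert.h>
#include <stddef.h>

/* Downsample a sorted array v[0..n) of bpp-bit values by one bit:
 * strip the high bit and write the result, still sorted, to out[0..n).
 * out must not overlap v, since the merge cannot be done in place. */
void downsample1(const unsigned *v, size_t n, int bpp, unsigned *out)
{
    assert(bpp > 1 && bpp <= 32);
    unsigned high = 1u << (bpp - 1);
    unsigned mask = high - 1;

    /* v[] is sorted, so the values with the high bit set form a suffix:
     * v1[] = v[0..split), v2[] = v[split..n). */
    size_t split = 0;
    while (split < n && (v[split] & high) == 0)
        split++;

    /* Two-way merge of v1[] with the stripped v2[]. */
    size_t i = 0, j = split, k = 0;
    while (i < split && j < n) {
        unsigned a = v[i], b = v[j] & mask;
        if (a <= b) {
            out[k++] = a;
            i++;
        } else {
            out[k++] = b;
            j++;
        }
    }
    while (i < split)
        out[k++] = v[i++];
    while (j < n)
        out[k++] = v[j++] & mask;
}
```

For two or more steps, the caller can alternate between two buffers:
the output of one step becomes the input of the next, with bpp
decreased by one.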
callgrind annotations for "apt-cache <<<unmet", previous commit:
2,743,058,808 PROGRAM TOTALS
1,068,102,605 lib/set.c:decode_base62_golomb
509,186,920 lib/set.c:rpmsetcmp
131,678,282 stdlib/msort.c:msort_with_tmp'2
93,496,965 sysdeps/x86_64/strcmp.S:__GI_strcmp
91,066,266 sysdeps/x86_64/memcmp.S:bcmp
83,062,668 sysdeps/x86_64/strlen.S:__GI_strlen
64,584,024 sysdeps/x86_64/memcpy.S:memcpy
callgrind annotations for "apt-cache <<<unmet", this commit:
2,683,295,262 PROGRAM TOTALS
1,068,102,605 lib/set.c:decode_base62_golomb
510,261,969 lib/set.c:rpmsetcmp
93,692,793 sysdeps/x86_64/strcmp.S:__GI_strcmp
91,066,275 sysdeps/x86_64/memcmp.S:bcmp
90,080,205 stdlib/msort.c:msort_with_tmp'2
83,062,524 sysdeps/x86_64/strlen.S:__GI_strlen
58,165,691 sysdeps/x86_64/memcpy.S:memcpy
The only reason for using a linked list is to make LRU reordering O(1).
This change replaces the linked list with a plain array. The inner loop
is now very tight, but reordering involves memmove(3) and is O(N), since
on average, half the array has to be shifted. Note, however, that the
leading part of the array which is to be shifted is already there in L1
cache, and modern memmove(3) must be very efficient - I expect it to
take far fewer instructions than the loop itself.
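The array-based reordering can be sketched as follows. This is an
illustration under simplifying assumptions, not the actual cache code:
entries are reduced to a bare hash, new entries are always placed at the
front, and CACHE_SIZE, struct lru and lru_touch are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CACHE_SIZE 8  /* hypothetical cache size */

struct lru {
    int n;                      /* number of entries in use */
    unsigned hash[CACHE_SIZE];  /* most recently used first */
};

/* Look up hash and make it the most recently used entry.
 * Returns 1 on a hit, 0 on a miss (the entry is then inserted,
 * evicting the least recently used one if the cache is full). */
int lru_touch(struct lru *c, unsigned hash)
{
    assert(c != NULL);
    int i;
    for (i = 0; i < c->n; i++)  /* tight linear search */
        if (c->hash[i] == hash)
            break;
    int hit = i < c->n;
    if (!hit) {
        if (c->n < CACHE_SIZE)
            c->n++;
        i = c->n - 1;  /* slot to overwrite: new tail or evicted entry */
    }
    /* Shift the leading part down by one slot: O(N), but it is a
     * single memmove over memory which has just been scanned. */
    memmove(c->hash + 1, c->hash, (size_t)i * sizeof c->hash[0]);
    c->hash[0] = hash;
    return hit;
}
```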
Recently I tried to implement another data structure similar to SVR2
buffer cache [Bach 1986], but the code got too complicated. So I still
maintain that, for small cache sizes, linear search is okay. Dennis
Ritchie famously argued that a linear search of a directory is efficient
because it is bounded by the size of the directory [Ibid., p. 76].
Great minds think alike (and share similar views on a linear search).
What can make the search slow, however, is not the loop per se, but
rather memory loads: on average, about 67% of the entries have to be loaded
(assuming 67% hit ratio), checked for entry->hash, and most probably
followed by entry->next.
With malloc'd cache entries, memory loads can be slow. To facilitate
the search, this change introduces a new structure "cache_hdr", which
has only the 3 members necessary for the search. The structures are
pre-allocated in a contiguous memory block. This must play nicely with
CPU caches, resulting in fewer memory loads and faster searches.
Indeed, based on some measurements of "apt-shell <<<unmet", this change
can demonstrate about 2% overall improvement in user time. Using a more
sophisticated SVR2-like data structure further improves the result only
by about 0.5%.
- Introduced %_rpmlibdir/brp.d/ directory to allow brp-* scripts to be
provided by packages other than rpm-build.
- brp-hardlink_pyo_pyc: split from brp-bytecompile_python
Hardlinking of identical .pyo and .pyc files has been split from
brp-bytecompile_python into brp-hardlink_pyo_pyc to make this brp work for
python3 files (generated by a separate brp-bytecompile_python3).
Made it possible for third-party packages to have their own brp-* scripts.
All existing brp-* scripts have been migrated to /usr/lib/rpm/brp.d, and
brp-alt now executes everything in this directory in alphabetical order.
All brp-* scripts must have a three-digit prefix (to specify execution
order) and a .brp suffix.
- GROUPS: add Development/Python3 (by Vitaly Kuznetsov) and Other (by Igor
Vlasenko).
- %_sharedstatedir: change to /var/lib (suggested by Alexey Gladkov).
- 0common-files.req.list: removed /etc/sysctl.d directory.
- verify-elf: check RPATH for non-ASCII characters, illegal absolute and
relative paths, and paths to standard libraries.
- cpp.req: do not insist on trying c++ mode when c++ support is not installed.
- find-debuginfo-files: fixed packaging of symlinks.
- rpmbuild: added "-bt" %check-only option.
Package only those /usr/lib/debug/* symlinks that complement the package
being processed and point to debuginfo regular files which are going to
be packaged along with these symlinks.
The most obvious consequence of this change is that library symlinks
intended for ld(1) will no longer cause their
/usr/lib/debug/usr/lib*/libNAME.so.debug counterparts to be packaged.
When plain cpp check fails, cpp.req tries to process the same file in
c++ mode, which requires c++ support to be installed. As a result, when
c++ support is not installed, cpp.req clutters the log with vain attempts
to process files in c++ mode. This change reduces the noise by checking
whether c++ support is actually available.
- Partially reverted the change to file permissions handling on package
removal or upgrade that was introduced in 4.0.4-alt100.32.
Permissions to access regular files are now erased only if
these files are set[ug]id executables.
- find-lang: handle more exotic GNOME help locale directories (closes: #26417).
Do not erase permissions from regular files on package removal or
upgrade unless these files are both setXid and executable.
It is legal to have regular system files linked somewhere, e.g. by
chrooted installs, so we must be careful not to break these files.
- Fixes the first case crash of RhBug:741606 / CVE-2011-3378 where
immutable region offset is way out of bounds.
(cherry picked from commit a48f0e20cbe2ababc88b2fc52fb7a281d6fc1656)
- Region offsets are supposed to be negative when an entry
is involved, otherwise zero. Fixes some cases of crash'n'burn on
malformed headers having bogus offsets (CVE-2011-3378)
(cherry picked from commit 11a7e5d95a8ca8c7d4eaff179094afd8bb74fc3f)
SIGPIPE SIG_IGN handler was installed before the fork, which means that,
in autodep scripts, SIGPIPE was ignored as well. This is why in
commands like
cmd1 | cmd2
cmd1 was not killed gracefully with SIGPIPE; instead, writing to cmd2
resulted in EPIPE, which some commands apparently were not ready for.
This fixes messages like
/usr/lib/rpm/files.req: line 33: echo: write error: Broken pipe
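The effect is easy to reproduce, since an ignored signal stays ignored
across fork(2) and execve(2). The sketch below is only a demonstration
and not the actual fix: the helper run_writer is hypothetical, and
resetting SIGPIPE to SIG_DFL in the child is just one possible remedy.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child which writes to a pipe that has no reader, and return
 * its wait status (or -1 on error).  If restore_default is 1, the child
 * first resets SIGPIPE to SIG_DFL; otherwise it keeps whatever
 * disposition it inherited from the parent. */
int run_writer(int restore_default)
{
    assert(restore_default == 0 || restore_default == 1);
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    close(fds[0]);  /* no reader is left before the child starts */
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        if (restore_default)
            signal(SIGPIPE, SIG_DFL);
        /* With SIGPIPE ignored, write fails with EPIPE;
         * with the default disposition, the child is killed. */
        ssize_t rc = write(fds[1], "x", 1);
        _exit(rc < 0 ? 1 : 0);
    }
    close(fds[1]);
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return status;
}
```

If the parent has called signal(SIGPIPE, SIG_IGN) before the fork,
run_writer(0) reports a child which exited on a write error, while
run_writer(1) reports a child terminated by SIGPIPE.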
In decode_set_init(), we explicitly prohibit empty sets:
// no empty sets for now
if (*str == '\0')
return -4;
This does not validate *str character, since the decoder will check for
errors anyway. However, this assumes that, otherwise, a non-empty set
will be decoded. The assumption is wrong: it was actually possible to
construct an "empty set" which triggered an assertion failure.
$ /usr/lib/rpm/setcmp yx00 yx00
setcmp: set.c:705: decode_delta: Assertion `c > 0' failed.
zsh: abort /usr/lib/rpm/setcmp yx00 yx00
$
Here, the "00" part of the set-version yields a sequence of zero bits.
Since trailing zero bits are okay, golomb decoding routine basically
skips the whole sequence and returns 0.
To fix the problem, we have to observe that only up to 5 trailing zero
bits can be required to complete the last base62 character, and the leading
"0" sequence occupies 6 or more bits.
Some header files have protection against being included into user
code directly. This means that, when processing such files, cpp
is going to fail, and some dependencies probably will be missing.
/usr/include/gtk-2.0/gtk/gtkaccessible.h:
#if defined(GTK_DISABLE_SINGLE_INCLUDES) && !defined (__GTK_H_INSIDE__) && !defined (GTK_COMPILATION)
#error "Only <gtk/gtk.h> can be included directly."
#endif

#ifndef __GTK_ACCESSIBLE_H__
#define __GTK_ACCESSIBLE_H__

#include <atk/atk.h>
#include <gtk/gtkwidget.h>
To remedy the problem, we should, as per the above example, process
gtk/gtk.h dependencies recursively. Dependencies which we now attribute
to gtk/gtk.h are: 1) files which are packaged within the same subpackage
- these dependencies will be optimized out later by rpm; 2) the first
file not packaged into this subpackage, which is atk/atk.h. Files below
atk/atk.h are not processed.
Packaged?    Stack
           +---------------------+
    +      | gtk/gtk.h           |
           +---------------------+
    +      | gtk/gtkaccessible.h | <- SPmark
           +---------------------+
    -      | atk/atk.h           |
           +---------------------+
           |         ...         |
Also note that packaged files in cpp output should not be identified by
filenames, since filenames in the output may be non-canonical.
Therefore I use the standard Unix technique of identifying files by (dev,ino).
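The (dev,ino) comparison can be sketched with stat(2). The helper name
same_file is hypothetical; it simply compares st_dev and st_ino of the
two paths.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/* Return 1 if both paths refer to the same file, 0 if they do not,
 * and -1 if either path cannot be stat'ed. */
int same_file(const char *path1, const char *path2)
{
    struct stat st1, st2;
    assert(path1 != NULL && path2 != NULL);
    if (stat(path1, &st1) != 0 || stat(path2, &st2) != 0)
        return -1;
    return st1.st_dev == st2.st_dev && st1.st_ino == st2.st_ino;
}
```

This way a non-canonical name with "../" components still matches the
packaged file, because both names resolve to the same (dev,ino) pair.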
/usr/include/boost/spirit/home/support/detail/lexer/containers/ptr_vector.hpp:
#include "../size_t.hpp"