- rpmrc.in: changed to use -mtune=generic for all x86 flavours.
- pkgconfig.req.files: changed to ignore file type and treat
all non-symlinks the same way.
- Added %getenv builtin macro.
- Added %_tmpdir builtin macro,
changed default %_tmppath value to %_tmpdir (closes: #25117).
- Build selinux support in dynamically linked objects only.
- %configure: export the -m* part of %optflags as ASFLAGS (for the
assembler) along with the other *FLAGS exported for compilers.
Some of the %optflags options, -m* in particular, have to be passed to
the assembler so that the output it produces is consistent with the
output made by the compilers.
In modern rpms, both %patch -F <N> and %patch -F<N> are valid option
calls, whereas the old -F implementation supported only the -F <N>
syntax. This patch adds support for the %patch -F<N> syntax.
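For illustration only (this is generic getopt(3) behavior, not rpm's
actual option parser): an option declared with a required argument
accepts both the attached and the detached spelling, which is exactly
the calling convention being added for %patch -F:

#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    int c;
    /* "F:" declares -F with a required argument; getopt() accepts
     * both "-F 2" and "-F2" and sets optarg to "2" either way */
    while ((c = getopt(argc, argv, "F:")) != -1)
        if (c == 'F')
            printf("fuzz: %s\n", optarg);
    return 0;
}

Both ./a.out -F2 and ./a.out -F 2 print "fuzz: 2".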
- debugedit doesn't support STABS, but there are some crazy cases,
like the PPC Linux kernel, which contains both STABS and DWARF
debuginfo sections, manually added. A better fix would be to error out
if we didn't find any usable debuginfo and to warn otherwise, but this
at least lets folks get their kernels built.
The previous "silently ignore" policy produces bogus debuginfo packages
on some architectures and fails with mysterious errors on others; it is
better to just fail hard until (if ever) somebody adds STABS support.
* build/rpmspec.h (OpenFileInfo): Change readBuf to a pointer,
add readBufSize.
(freeOpenFileInfo): New prototype.
* build/spec.c (freeSpec): Initialize readBuf and readBufSize.
(freeOpenFileInfo): New function.
* build/parseSpec.c (readLine): Use getline and freeOpenFileInfo.
(closeSpec): Use freeOpenFileInfo.
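For reference, a minimal sketch of the getline(3) pattern the new
readLine relies on (illustrative only, not the actual parseSpec.c
code):

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

int main(void)
{
    char *readBuf = NULL;    /* allocated and grown by getline() */
    size_t readBufSize = 0;  /* current allocation size */
    ssize_t len;

    /* getline() reallocates readBuf as needed, so a line of any
     * length fits and no fixed-size buffer limit applies */
    while ((len = getline(&readBuf, &readBufSize, stdin)) != -1)
        fputs(readBuf, stdout);

    free(readBuf);  /* free once; presumably what freeOpenFileInfo does */
    return 0;
}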
- set.c: Reimplemented base62+golomb decoder using Knuth's coroutines.
- set.c: Increased cache size from 160 to 256 slots, 75% hit ratio.
- set.c: Implemented 4-byte and 8-byte steppers for rpmsetcmp main loop.
Provides versions, on average, are about 34 times longer than Requires
versions. More precisely, if we consider all rpmsetcmp calls for
"apt-shell <<<unmet" command, then sum(c1)/sum(c2)=33.88. This means
that we can save some time and instructions by skipping intermediate
bytes - in other words, by stepping a few bytes at a time. Of course,
after all the bytes are skipped, we must recheck a few final bytes and
possibly step back. Also, this requires more than one sentinel for
proper boundary checking.
This change implements two such "steppers": a 4-byte stepper, used
when the c1/c2 ratio is below 16, and an 8-byte stepper, used
otherwise. When stepping back, both steppers use bisecting. Note that
replacing the last two bisecting steps with a simple loop might
actually be more efficient with respect to branch prediction and the
CPU's BTB. It is very hard to measure any user-time improvement,
though, even in a series of 100 runs. The improvement is next to none,
at least on older AMD CPUs, and so I chose to keep bisecting.
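To illustrate the technique, here is a simplified sketch (not the
actual set.c code; a single STEP stands in for the separate 4- and
8-byte steppers): count the elements of a short sorted array that also
occur in a long sorted array, skipping STEP elements at a time and
bisecting back over the overshoot. Note how the long array needs
several sentinels at the end, as mentioned above.

#include <stdio.h>
#include <limits.h>

#define STEP 8  /* elements skipped per step */

/* Count elements of the short sorted array v2 that also occur in the
 * long sorted array v1.  v1 must be padded with at least STEP sentinel
 * values greater than any real element, so that the stepping loop
 * needs no explicit bounds check. */
static int intersect_count(const int *v1, const int *v2, int n2)
{
    int hits = 0;
    for (int j = 0; j < n2; j++) {
        int val = v2[j];
        /* skip STEP elements at a time past everything below val */
        while (v1[STEP-1] < val)
            v1 += STEP;
        /* we may have overshot: bisect back within the last window */
        int lo = 0, hi = STEP - 1;
        while (lo < hi) {
            int mid = (lo + hi) / 2;
            if (v1[mid] < val)
                lo = mid + 1;
            else
                hi = mid;
        }
        v1 += lo;  /* now at the first element >= val */
        if (*v1 == val)
            hits++;
    }
    return hits;
}

int main(void)
{
    /* real data, padded with STEP sentinels */
    int v1[] = { 1, 3, 5, 7, 9, 11, 13, 15, 17, 19,
                 INT_MAX, INT_MAX, INT_MAX, INT_MAX,
                 INT_MAX, INT_MAX, INT_MAX, INT_MAX };
    int v2[] = { 3, 8, 19 };
    printf("%d\n", intersect_count(v1, v2, 3)); /* prints 2 */
    return 0;
}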
callgrind annotations for "apt-shell <<<unmet", previous commit:
2,279,520,414 PROGRAM TOTALS
646,107,201 lib/set.c:decode_base62_golomb
502,438,804 lib/set.c:rpmsetcmp
98,243,148 sysdeps/x86_64/memcmp.S:bcmp
93,038,752 sysdeps/x86_64/strcmp.S:__GI_strcmp
callgrind annotations for "apt-shell <<<unmet", this commit:
2,000,254,692 PROGRAM TOTALS
642,039,009 lib/set.c:decode_base62_golomb
227,036,590 lib/set.c:rpmsetcmp
98,247,798 sysdeps/x86_64/memcmp.S:bcmp
93,047,422 sysdeps/x86_64/strcmp.S:__GI_strcmp
Hit ratio for "apt-shell <<<unmet" command:
160 slots: hit=46813 miss=22862 67.2%
256 slots: hit=52238 miss=17437 75.0%
So, we've increased the cache size by a factor of 256/160=1.6 or by 60%,
and the number of misses has decreased by a factor of 22862/17437=1.31
or by 1-17437/22862=23.7%. This is not so bad, but it looks like we're
paying more for less. The following analysis shows that this is not
quite true, since the real memory usage has increased by a somewhat
smaller factor.
160 slots, callgrind annotations:
2,406,630,571 PROGRAM TOTALS
795,320,289 lib/set.c:decode_base62_golomb
496,682,547 lib/set.c:rpmsetcmp
93,466,677 sysdeps/x86_64/strcmp.S:__GI_strcmp
91,323,900 sysdeps/x86_64/memcmp.S:bcmp
90,314,290 stdlib/msort.c:msort_with_tmp'2
83,003,684 sysdeps/x86_64/strlen.S:__GI_strlen
58,300,129 sysdeps/x86_64/memcpy.S:memcpy
...
inclusive:
1,458,467,003 lib/set.c:rpmsetcmp
256 slots, callgrind annotations:
2,246,961,708 PROGRAM TOTALS
634,410,352 lib/set.c:decode_base62_golomb
492,003,532 lib/set.c:rpmsetcmp
95,643,612 sysdeps/x86_64/memcmp.S:bcmp
93,467,414 sysdeps/x86_64/strcmp.S:__GI_strcmp
90,314,290 stdlib/msort.c:msort_with_tmp'2
79,217,962 sysdeps/x86_64/strlen.S:__GI_strlen
56,509,877 sysdeps/x86_64/memcpy.S:memcpy
...
inclusive:
1,298,977,925 lib/set.c:rpmsetcmp
So the decoding routine now takes about 20% fewer instructions, and the
inclusive rpmsetcmp cost is reduced by about 11%. Note, however, that
bcmp is now the third most expensive routine (due to the higher hit
ratio). Since recent glibc versions provide optimized memcmp
implementations, I expect that the total/inclusive improvement can be
somewhat better than 11%.
As for memory usage, the question "how much memory does the cache take"
cannot, in general, be answered with a single number. However, if we
simply sum the sizes of all malloc'd chunks on each rpmsetcmp
invocation, using the following piece of code (plus a few obvious
modifications elsewhere), we can obtain the following statistics.
if (hc == CACHE_SIZE) {
    /* hc is the number of cache entries in use, ev[i]->msize the
     * malloc'd size of the i-th entry; report once the cache is full */
    int total = 0;
    for (i = 0; i < hc; i++)
        total += ev[i]->msize;
    printf("total %d\n", total);
}
160 slots, memory usage:
min=1178583
max=2048701
avg=1330104
dev=94747
q25=1266647
q50=1310287
q75=1369005
256 slots, memory usage:
min=1670029
max=2674909
avg=1895076
dev=122062
q25=1828928
q50=1868214
q75=1916025
This indicates that the average cache size increased by about 42%,
from 1.27M to 1.81M; however, the third quartile increased by about
40%, and the maximum size increased by only about 31%, from 1.95M to
2.55M. From this I conclude that the extra 600K must be available even
on low-memory machines like the Raspberry Pi (256M RAM).
* * *
What's a good hit ratio?
$ DepNames() { pkglist-query '[%{RequireName}\t%{RequireVersion}\n]' \
/var/lib/apt/lists/_ALT_Sisyphus_x86%5f64_base_pkglist.classic |
fgrep set: |cut -f1; }
$ DepNames |wc -l
34763
$ DepNames |sort -u |wc -l
2429
$ DepNames |sort |uniq -c |sort -n |awk '$1>1{print$1}' |Sum
33924
$ DepNames |sort |uniq -c |sort -n |awk '$1>1{print$1}' |wc -l
1590
$ DepNames |sort |uniq -c |sort -n |tail -256 |Sum
27079
$
We have 34763 set-versioned dependencies, which refer to 2429 sonames;
however, only 33924 dependencies refer to sonames that are referenced
more than once (there are 1590 such sonames), and the first reference
is always a miss. Thus the best possible hit ratio (if we use at least
1590 slots) is (33924-1590)/34763=93.0%.
What happens if we use only 256 slots? Assuming that dependencies are
processed in random order, the best strategy is to spend the cache
slots on the sonames with the most references. This way we can serve
(27079-256) dependencies via cache hits, so the best possible hit
ratio for 256 slots is (27079-256)/34763=77.2%.
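The same arithmetic, spelled out as a throwaway C snippet (the numbers
come from the queries above):

#include <stdio.h>

int main(void)
{
    double total   = 34763;  /* set-versioned dependencies             */
    double multi   = 33924;  /* deps whose soname is referenced > once */
    double sonames = 1590;   /* sonames referenced more than once      */
    double top256  = 27079;  /* deps covered by the 256 hottest sonames */

    printf("unlimited slots: %.1f%%\n", 100 * (multi - sonames) / total);
    printf("256 slots:       %.1f%%\n", 100 * (top256 - 256) / total);
    return 0;
}

This prints 93.0% and 77.2%, matching the figures above.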
In sort -R output, identical lines adhere to each other: the manpage
says that -R sorts by a random hash of the keys, which probably means
that the same random hash function, applied to identical keys, yields
the same hash value. What we need here, though, is a random permutation
of the input lines.