IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
- Imported an Owl patch to remove unsafe file permissions (chmod'ing
files to 0) on package removal or upgrade to prevent continued access
to such files via hard-links possibly created by a user
(CVE-2005-4889, CVE-2010-2059).
- verify-elf: added /lib/../lib64 to the list of prohibited RPATH entries.
Import rpm-4.2-owl-remove-unsafe-perms.diff from Owl, to remove unsafe
file permissions (chmod'ing files to 0) on package removal or upgrade to
prevent continued access to such files via hard-links possibly created
by a user (CVE-2005-4889, CVE-2010-2059).
Below I use 'apt-shell <<<unmet' as a baseline for measurements.
Cache performance with cache_size = 128: hit=39628 miss=22394 (64%)
Cache performance with cache_size = 160: hit=42031 miss=19991 (68%)
(11% fewer cache misses)
Cache performance with cache_size = 160 pivot_size = 1 (plain LRU):
hit=36172 miss=25850 (58%)
Total number of soname set-versions which must be decoded at least once:
miss=2173 (max 96%)
callgrind annotations, 4.0.4-alt100.27:
3,904,042,289 PROGRAM TOTALS
1,378,794,846 decode_base62_golomb
1,176,120,148 rpmsetcmp
291,805,495 __GI_strcmp
162,494,544 __GI_strlen
162,222,530 msort_with_tmp'2
56,758,517 memcpy
53,132,375 __GI_strcpy
...
callgrind annotations, this commit (rebuilt in hasher):
2,558,482,547 PROGRAM TOTALS
987,220,089 decode_base62_golomb
468,510,579 rpmsetcmp
162,222,530 msort_with_tmp'2
85,422,341 __GI_strcmp
82,063,609 bcmp
76,510,060 __GI_strlen
63,806,309 memcpy
...
Inclusive rpmsetcmp annotation, this commit:
1,719,199,968 rpmsetcmp
Typical execution time, 4.0.4-alt100.27:
1.87s user 0.29s system 96% cpu 2.242 total
Typical execution time, this commit:
1.52s user 0.31s system 96% cpu 1.895 total
Based on user time, this constitutes about 20% speed-up. For some
reason, the speed-up is more noticable on i586 architecture (27%).
Note that the cache should not be further increased, because of two
reasons: 1) LRU search is linear - this is fixable; 2) cache memory
cannot be reclaimed - this is unfixable. On average, the cache now
takes 1.3M (max 2M). For small cache sizes, linear search is okay
then (cache_decode_set costs about 20M Ir, which is less than memcmp).
An interesting question is to what extent it is worth to increase
the cache size, assuming that memory footprint is not an issue.
A plausible answer is that decode_base62_golomb should cost no
more than 1/2 of rpmsetcmp inclusive time, which is 987M Ir and
1,719M Ir respectively. So, Ideally, the cache should be increased
up to the point where decode_base62_golomb takes about 700M Ir.
Note, however, that using midpoint insertion technique seems to
improve cache performance far more than simply increasing cache size.
This partially reverts what's been introduced with previous commit.
Realize that strlen() must be *only* called when allocating space
for v[]. There is no reason to call strlen() for every Provides
string, since most of them are decoded via the cache hit.
Note, however, that now I have to use the following trick:
memcmp(str, cur->str, cur->len + 1) == 0
I rely on the fact this works as expected even when str is shorter than
cur->len. Namely, memcmp must start from lower addresses and stop at
the first difference (i.e. memcmp must not read past the end of str,
possibly except for a few trailing bytes on the same memory page); this
is not specified by the standard, but this is how it must work.
Also, since the cache now stores full decoded values, it is possible to
avoid copying and instead to set the pointer to internal cache memory.
Copying must be performed, however, when the set is to be downsampled.
Note that average Provides set size is around 1024, which corresponds
to base62 string length of about 2K and v[] of 4K. Saving strlen(2K)
and memcpy(4K) on every rpmsetcmp call is indeed an improvement.
callgrind annotations for "apt-cache unmet", 4.0.4-alt100.27
1,900,016,996 PROGRAM TOTALS
694,132,522 decode_base62_golomb
583,376,772 rpmsetcmp
106,136,459 __GI_strcmp
102,581,178 __GI_strlen
80,781,386 msort_with_tmp'2
38,648,490 memcpy
26,936,309 __GI_strcpy
26,918,522 regionSwab.clone.2
21,000,896 _int_malloc
...
callgrind annotations for "apt-cache unmet", this commit (rebuilt in hasher):
1,264,977,497 PROGRAM TOTALS
533,131,492 decode_base62_golomb
230,706,690 rpmsetcmp
80,781,386 msort_with_tmp'2
60,541,804 __GI_strlen
42,518,368 memcpy
39,865,182 bcmp
26,918,522 regionSwab.clone.2
21,841,085 _int_malloc
...
Now that string functions are expensive, the API is redesigned so that
strlen is called only once, in rpmsetcmp. The length is then passed as
an argument down to decoding functions. With the length argument, it is
now possible to replace strcmp with memcmp and strcpy with memcpy.
"Effectively avoided" means something like "prakticheski avoided"
in Russian. Multiple escapse are not avoided "prakticheski", though;
they are avoided altogether and "in principle". The right word does
not come to mind.
Now that decode_base62_golomb is much cheaper, the question is:
is it still worth to store short deltas, as opposed to storing
full values at the expense of shrinking the cache?
callgrind annotations for previous commit:
1,526,256,208 PROGRAM TOTALS
470,195,400 decode_base62_golomb
434,006,244 rpmsetcmp
106,137,949 __GI_strcmp
102,459,314 __GI_strlen
...
callgrind annotations for this commit:
1,427,199,731 PROGRAM TOTALS
533,131,492 decode_base62_golomb
231,592,751 rpmsetcmp
103,476,056 __GI_strlen
102,008,203 __GI_strcmp
...
So, decode_base62_golomb now takes more cycles, but the overall price
goes down. This is because, when caching short deltas, two additional
stages should be performed: 1) short deltas must be copied into unsigned
v[] array; 2) decode_delta must be invoked to recover hash values. Both
stages iterate on per-value basis and both are seemingly fast. However,
they are not that fast when both of them are replaced with bare memcpy,
which uses xmm registers or something like this.
The loop is logically impeccable, but its main condition
(v1 < v1end && v2 < v2end) is somewhat redundant: in two
of the three cases, only one pointer gets advanced. To
save instructions, the conditions are now handled within
the cases. The loop is now a while (1) loop, a disguised
form of goto.
Also not that, when comparing Requires against Provides,
the Requires is usually sparse:
P: a b c d e f g h i j k l ...
R: a c h j ...
This means that a nested loop which skips intermediate Provides
elements towards the next Requires element may improve performance.
while (v1 < v1end && *v1 < *v2)
v1++;
However, note that the first condition (v1 < v1end) is also somewhat
redundant. This kind of boundary checking can be partially omitted if
the loop gets unrolled. There is a better technique, however, called
the barrier: *v1end must contain the biggest element possible, so that
the trailing *v1 is never smaller than any of *v2. The nested loop is
then becomes as simple as
while (*v1 < *v2)
v1++;
callgrind annotations, 4.0.4-alt100.27:
1,899,657,916 PROGRAM TOTALS
694,132,522 decode_base62_golomb
583,376,772 rpmsetcmp
106,225,572 __GI_strcmp
102,459,314 __GI_strlen
...
callgrind annotations, this commit (rebuilt in hasher):
1,526,256,208 PROGRAM TOTALS
470,195,400 decode_base62_golomb
434,006,244 rpmsetcmp
106,137,949 __GI_strcmp
102,459,314 __GI_strlen
...
Note that rpmsetcmp also absorbs cache_decode_set and decode_delta;
the loop is now about twice as faster.
The whole point of using a table is not only that comparisons
like (c >= 'a' && c <= 'z') can be eliminated; but also that conditional
branches (the "ands" and "ifs") should be eliminated as well.
The existing code, however, uses separate branches to check e.g. for the
end of string; to check for an error; and to check for the (num6b < 61)
common case. With this change, the table is restructured so that the
common case will be handled with only a single instruction.
- fixup-desktop: Fixed to edit files in place (closes: #25645).
- platform.in: Removed obsolete %update_wms/%clean_wms and
%update_scrollkeeper/%clean_scrollkeeper macros (by Igor Vlasenko).
- fixup-desktop: new file that does trivial fixes in desktop files
(by Igor Vlasenko; closes: #25605).
- ru.po: removed ambiguous translations for "source", "patch" and "icon"
(closes: #24857).
- build/reqprov.c: fixed optimization of subpackage self-requirements.
- build/interdep.c: fixed check for cycles introduced along with
pruning of requirements in 4.0.4-alt100.18.
The check for cycles introduced in 4.0.4-alt100.17-6-ga71e004 appeared
to be incomplete. For example, it fails to recognize a cycle in the
following example:
%package -n test
Requires: libtest = %version-%release
%package -n libtest
Requires: libtest-common = %version-%release
%package -n libtest-common
Requires: libtest = %version-%release