1875 Commits

Author SHA1 Message Date
Igor Vlasenko
f1e0f83d55 build/parsePrep.c: %patch -F, -d options and macros (ALT#27627)
Backport from rpm 4.10:
%patch -F <fuzz>
%patch -d <destdir>
support for macros:
%{_default_patch_fuzz}  Default fuzz level for %patch in spec file.
%{_default_patch_flags} Default patch flags
2012-08-16 22:57:17 +00:00
Igor Vlasenko
3ad21f4eb2 GROUPS: add Graphical desktop/MATE (ALT#27626) 2012-08-16 22:56:05 +00:00
Sergey Bolshakov
d289faa950 rpmrc.in, macros.in, installplatform: introduce armh arch 2012-08-14 13:20:54 +04:00
dd6ab30865 4.0.4-alt100.53
- brp-fix-perms: fixed "find -perm" syntax.
2012-08-08 13:16:07 +00:00
d78a04e393 brp-fix-perms: fix "find -perm" syntax 2012-08-08 13:15:41 +00:00
6ce84ceff3 4.0.4-alt100.52
- 0common-files.req.list: added /etc/sudoers.d directory.
2012-07-12 10:03:37 +00:00
8d16b4186d 0common-files.req.list: add /etc/sudoers.d 2012-07-12 10:02:33 +00:00
4af3824e1b 4.0.4-alt100.51
- find-lang: added --all-name option (by Igor Vlasenko; closes: #27284).
2012-05-24 17:50:40 +00:00
Igor Vlasenko
da9e0cf514 scripts/find-lang.in: add --all-name option (ALT#27284)
Add PLD/Fedora compatible --all-name option (by mkochano,pascalek@PLD).
2012-05-24 13:44:54 +00:00
cacba087ef 4.0.4-alt100.50
- Fixed build with ld --no-copy-dt-needed-entries.
2012-05-21 01:33:43 +00:00
787f805585 Fix build with ld --no-copy-dt-needed-entries 2012-05-21 00:41:34 +00:00
8c1ee3ba41 4.0.4-alt100.49
- platform.in: Added %_unitdir macro.
- Fixed build with new automake.
2012-05-10 21:31:58 +00:00
e7cd0f8ed2 platform.in: add %_unitdir 2012-05-10 11:37:08 +00:00
8d06e06fbc Makefile.am: fix build with new automake 2012-04-13 17:09:14 +00:00
f9b82ab936 python/Makefile.am: fix typo 2012-04-13 16:54:02 +00:00
552f5329c9 4.0.4-alt100.48
- parseSpec:
  + fixed long lines processing;
  + made size of line buffer configurable via %_spec_line_buffer_size.
2012-03-19 16:19:55 +00:00
12538e4010 parseSpec: use getline() to fix long line processing
* build/rpmspec.h (OpenFileInfo): Change readBuf to a pointer,
add readBufSize.
(freeOpenFileInfo): New prototype.
* build/spec.c (freeSpec): Initialize readBuf and readBufSize.
(freeOpenFileInfo): New function.
* build/parseSpec.c (readLine): Use getline and freeOpenFileInfo.
(closeSpec): Use freeOpenFileInfo.
2012-03-19 19:02:25 +00:00
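The getline() approach named in commit 12538e4010 can be sketched as follows. This is a minimal illustration, not the code from build/parseSpec.c; the helper name `longest_line` is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: return the length of the longest line in fp,
 * not counting the trailing newline.  getline() grows the buffer as
 * needed, so no fixed line-length limit applies. */
static size_t longest_line(FILE *fp)
{
    char *buf = NULL;   /* allocated and reallocated by getline() */
    size_t size = 0;    /* current capacity of buf */
    size_t best = 0;
    ssize_t len;

    assert(fp != NULL);
    while ((len = getline(&buf, &size, fp)) != -1) {
        if (len > 0 && buf[len - 1] == '\n')
            len--;
        if ((size_t)len > best)
            best = (size_t)len;
    }
    free(buf);          /* a single free covers every line read */
    return best;
}
```

The buffer pointer and its size are kept across calls, which is presumably why the commit adds readBufSize next to readBuf.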
e870902a7b build: make size of line buffer for .spec parsing configurable via %_spec_line_buffer_size
Based on http://rpm5.org/cvs/chngview?cn=9215

In the meantime, decrease this default line buffer size from 80K to 64K.
2012-03-19 01:29:03 +00:00
a470a11bac parseSpec: enhance line buffer overflow diagnostics 2012-03-19 18:18:03 +00:00
060fd2e340 parseSpec: implement line buffer overflow protection
* build/parseSpec.c (copyNextLine): Protect spec->lbuf line buffer
from overflow.
2012-03-19 01:13:13 +00:00
Alexey Tourbin
b20a5248ea 4.0.4-alt100.47
- set.c: Reimplemented base62+golomb decoder using Knuth's coroutines.
- set.c: Increased cache size from 160 to 256 slots, 75% hit ratio.
- set.c: Implemented 4-byte and 8-byte steppers for rpmsetcmp main loop.
2012-03-15 07:34:01 +04:00
Alexey Tourbin
798ce0db28 set.c: implemented 4-byte and 8-byte steppers for rpmsetcmp main loop
Provides versions, on average, are about 34 times longer than Requires
versions.  More precisely, if we consider all rpmsetcmp calls for
"apt-shell <<<unmet" command, then sum(c1)/sum(c2)=33.88.  This means
that we can save some time and instructions by skipping intermediate
bytes - in other words, by stepping a few bytes at a time.  Of course,
after all the bytes are skipped, we must recheck a few final bytes and
possibly step back.  Also, this requires more than one sentinel for
proper boundary checking.

This change implements two such "steppers" - 4-byte stepper for c1/c2
ratio below 16 and 8-byte stepper which is used otherwise.  When
stepping back, both steppers use bisecting.  Note that replacing last
two bisecting steps with a simple loop might be actually more efficient
with respect to branch prediction and CPU's BTB.  It is very hard to
measure any user time improvement, though, even in a series of 100 runs.
The improvement is next to none, at least on older AMD CPUs.  And so I
choose to keep bisecting.

callgrind annotations for "apt-shell <<<unmet", previous commit:
2,279,520,414  PROGRAM TOTALS
646,107,201  lib/set.c:decode_base62_golomb
502,438,804  lib/set.c:rpmsetcmp
 98,243,148  sysdeps/x86_64/memcmp.S:bcmp
 93,038,752  sysdeps/x86_64/strcmp.S:__GI_strcmp

callgrind annotations for "apt-shell <<<unmet", this commit:
2,000,254,692  PROGRAM TOTALS
642,039,009  lib/set.c:decode_base62_golomb
227,036,590  lib/set.c:rpmsetcmp
 98,247,798  sysdeps/x86_64/memcmp.S:bcmp
 93,047,422  sysdeps/x86_64/strcmp.S:__GI_strcmp
2012-03-15 07:02:09 +04:00
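The stepping idea from commit 798ce0db28 can be illustrated on a plain sorted array. This is a simplified sketch, not the rpmsetcmp loop: the function name `step4_lower_bound` is hypothetical, and the array is assumed to end with four UINT_MAX sentinels so that the 4-element stride never reads out of bounds.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Return a pointer to the first element >= x in the sorted array v.
 * Assumption: the real elements are followed by at least 4 sentinels
 * equal to UINT_MAX, so v[3] is always readable and the loop stops. */
static const unsigned *step4_lower_bound(const unsigned *v, unsigned x)
{
    assert(v != NULL);
    /* Skip four elements at a time while the whole group is below x. */
    while (v[3] < x)
        v += 4;
    /* The answer is within v[0..3]; step back by bisecting. */
    if (v[1] >= x)
        return v[0] >= x ? v : v + 1;
    return v[2] >= x ? v + 2 : v + 3;
}
```

A returned pointer that lands on a sentinel means no real element is greater than or equal to x.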
Alexey Tourbin
d78a2cbf3d set.c: increased cache size from 160 to 256 slots, 75% hit ratio
Hit ratio for "apt-shell <<<unmet" command:
160 slots: hit=46813 miss=22862 67.2%
256 slots: hit=52238 miss=17437 75.0%

So, we've increased the cache size by a factor of 256/160=1.6 or by 60%,
and the number of misses has decreased by a factor of 22862/17437=1.31
or by 1-17437/22862=23.7%.  This is not so bad, but it looks like we're
paying more for less.  The following analysis shows that this is not
quite true, since the real memory usage has increased by a somewhat
smaller factor.

160 slots, callgrind annotations:
2,406,630,571  PROGRAM TOTALS
795,320,289  lib/set.c:decode_base62_golomb
496,682,547  lib/set.c:rpmsetcmp
 93,466,677  sysdeps/x86_64/strcmp.S:__GI_strcmp
 91,323,900  sysdeps/x86_64/memcmp.S:bcmp
 90,314,290  stdlib/msort.c:msort_with_tmp'2
 83,003,684  sysdeps/x86_64/strlen.S:__GI_strlen
 58,300,129  sysdeps/x86_64/memcpy.S:memcpy
...
inclusive:
1,458,467,003  lib/set.c:rpmsetcmp

256 slots, callgrind annotations:
2,246,961,708  PROGRAM TOTALS
634,410,352  lib/set.c:decode_base62_golomb
492,003,532  lib/set.c:rpmsetcmp
 95,643,612  sysdeps/x86_64/memcmp.S:bcmp
 93,467,414  sysdeps/x86_64/strcmp.S:__GI_strcmp
 90,314,290  stdlib/msort.c:msort_with_tmp'2
 79,217,962  sysdeps/x86_64/strlen.S:__GI_strlen
 56,509,877  sysdeps/x86_64/memcpy.S:memcpy
...
inclusive:
1,298,977,925  lib/set.c:rpmsetcmp

So the decoding routine now takes about 20% fewer instructions, and
inclusive rpmsetcmp cost is reduced by about 11%.  Note, however, that
bcmp is now the third most expensive routine (due to higher hit ratio).
Since recent glibc versions provide optimized memcmp implementations, I
expect that total/inclusive improvement can be somewhat better than 11%.

As per memory usage, the question "how much the cache takes" cannot be
generally answered with a single number.  However, if we simply sum the
size of all malloc'd chunks on each rpmsetcmp invocation, using the
piece of code with a few obvious modifications elsewhere, we can obtain
the following statistics.

	if (hc == CACHE_SIZE) {
	    int total = 0;
	    for (i = 0; i < hc; i++)
	        total += ev[i]->msize;
	    printf("total %d\n", total);
	}

160 slots, memory usage:
min=1178583
max=2048701
avg=1330104
dev=94747
q25=1266647
q50=1310287
q75=1369005

256 slots, memory usage:
min=1670029
max=2674909
avg=1895076
dev=122062
q25=1828928
q50=1868214
q75=1916025

This indicates that average cache size is increased by about 42% from
1.27M to 1.81M; however, the third quartile is increased by about 40%,
and the maximum size is increased only by about 31% from 1.95M to 2.55M.
By which I conclude that extra 600K must be available even on low-memory
machines like Raspberry Pi (256M RAM).

* * *

What's a good hit ratio?

$ DepNames() { pkglist-query '[%{RequireName}\t%{RequireVersion}\n]' \
	/var/lib/apt/lists/_ALT_Sisyphus_x86%5f64_base_pkglist.classic |
		fgrep set: |cut -f1; }
$ DepNames |wc -l
34763
$ DepNames |sort -u |wc -l
2429
$ DepNames |sort |uniq -c |sort -n |awk '$1>1{print$1}' |Sum
33924
$ DepNames |sort |uniq -c |sort -n |awk '$1>1{print$1}' |wc -l
1590
$ DepNames |sort |uniq -c |sort -n |tail -256 |Sum
27079
$

We have 34763 set-versioned dependencies, which refer to 2429 sonames;
however, only 1590 sonames are referenced more than once (by 33924
dependencies in total), and the first reference is always a miss.  Thus
the best possible hit ratio (if we use at least 1590 slots) is
(33924-1590)/34763=93.0%.

What happens if we use only 256 slots?  Assuming that dependencies are
processed in random order, the best strategy must spend its cache slots
on sonames with the most references.  This way we can serve (27079-256)
dependencies via cache hit, and so the best possible hit ratio for 256
slots is (27079-256)/34763=77.2%.
2012-03-09 02:42:21 +04:00
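The two hit-ratio bounds derived at the end of commit d78a2cbf3d follow one formula, which can be checked with a short sketch. The function name `best_hit_ratio` is hypothetical; the numbers come from the DepNames transcript in that message.

```c
#include <assert.h>

/* Upper bound on the cache hit ratio.
 * refs:  references that go to the names kept in the cache
 * names: number of such names (the first reference to each is a miss)
 * total: all references, cached or not */
static double best_hit_ratio(long refs, long names, long total)
{
    assert(total > 0 && refs >= names);
    return (double)(refs - names) / (double)total;
}
```

With at least 1590 slots the call is `best_hit_ratio(33924, 1590, 34763)`, and with 256 slots it is `best_hit_ratio(27079, 256, 34763)`.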
Alexey Tourbin
0af7afd2e5 set.c: precompute r mask for putbits coroutines
callgrind annotations for "apt-shell <<<unmet", previous commit:
2,424,712,279  PROGRAM TOTALS
813,389,804  lib/set.c:decode_base62_golomb
496,701,778  lib/set.c:rpmsetcmp

callgrind annotations for "apt-shell <<<unmet", this commit:
2,406,630,571  PROGRAM TOTALS
795,320,289  lib/set.c:decode_base62_golomb
496,682,547  lib/set.c:rpmsetcmp
2012-03-09 00:51:58 +04:00
Alexey Tourbin
d1650ccdfe spec: use shuf instead of sort -R to prepare profile data
In sort -R output, identical lines adhere to each other.  Manpage says
that -R sorts by random hash of keys, which probably means that a random
hash function, when applied to identical keys, yields the same hash value.
What we need instead is a random permutation of the input lines, though.
2012-03-09 00:51:42 +04:00
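The "random permutation of the input lines" that commit d1650ccdfe asks for can be sketched with a Fisher-Yates shuffle. This illustrates the concept only and does not describe how shuf is implemented; the name `shuffle_lines` is hypothetical, and rand() with a modulo is used for brevity despite its slight bias.

```c
#include <assert.h>
#include <stdlib.h>

/* Permute lines[0..n-1] in place.  Unlike sorting by a hash of the key,
 * identical lines are not forced to end up next to each other. */
static void shuffle_lines(const char **lines, size_t n)
{
    assert(lines != NULL || n == 0);
    for (size_t i = n; i > 1; i--) {
        size_t j = (size_t)rand() % i;      /* 0 <= j < i */
        const char *tmp = lines[i - 1];
        lines[i - 1] = lines[j];
        lines[j] = tmp;
    }
}
```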
Alexey Tourbin
80cec29464 set.c: improved cache_decode_set loop
I am going to consider whether it is worthwhile to increase the cache
size.  Thus I have to ensure that the linear search won't be an obstacle
for doing so.  Particularly, its loop must be efficient in terms of both
cpu instructions and memory access patterns.

1) On behalf of memory access patterns, this change introduces two
separate arrays: hv[] with hash values and ev[] with actual cache
entries.  On x86-64, this saves 4 bytes per entry which have previously
been wasted to align cache_hdr structures.  This has some benefits on
i686 as well: for example, ev[] is not accessed on a cache miss.

2) As per instructions, the loop has two branches: the first is for
boundary checking, and the second is for matching hash condition.  Since
the boundary checking condition (cur->ent != NULL) relies on a sentinel,
the loop cannot be unrolled; it takes 6 instructions per iteration.  If
we replace the condition with explicit boundary check (hp < hv + hc),
the number of iterations becomes known upon entry to the loop, and gcc
will unroll the loop; it takes now 3 instructions per iteration, plus
some (smaller) overhead for boundary checking.

This change also removes __thread specifiers, since gcc is apparently
not very good at optimizing superfluous __tls_get_addr calls.  Also, if
we are to consider larger cache sizes, it becomes questionable whether
each thread should possess its own cache only as a means of achieving
thread safety.  Anyway, currently I'm not aware of threaded applications
which make concurrent librpm calls.

callgrind annotations for "apt-shell <<<unmet", previous commit:
2,437,446,116  PROGRAM TOTALS
820,835,411  lib/set.c:decode_base62_golomb
510,957,897  lib/set.c:rpmsetcmp
...
 23,671,760      for (cur = cache; cur->ent; cur++) {
  1,114,800  => /usr/src/debug/glibc-2.11.3-alt7/elf/dl-tls.c:__tls_get_addr (69675x)
 11,685,644  	if (hash == cur->hash) {
          .  	    ent = cur->ent;

callgrind annotations for "apt-shell <<<unmet", this commit:
2,431,849,572  PROGRAM TOTALS
820,835,411  lib/set.c:decode_base62_golomb
496,682,547  lib/set.c:rpmsetcmp
...
 10,204,175      for (hp = hv; hp < hv + hc; hp++) {
 11,685,644  	if (hash == *hp) {
    189,344  	    i = hp - hv;
    189,344  	    ent = ev[i];

Total improvement is not very impressive (6M instead of expected 14M),
mostly due to memmove complications - hv[] cannot be shifted efficiently
using 8-byte words.  However, the code now scales better.  Also, recent
glibc versions supposedly provide much improved memmove implementation.
2012-03-08 02:35:59 +04:00
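The split into hv[] and ev[] described in commit 80cec29464 can be sketched as below. The array names and the explicit bound (hp < hv + hc) come from the message; the struct layout and the helpers `cache_find` and `cache_append` are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_SIZE 256

struct cache_ent { int msize; };         /* assumed minimal entry layout */

static unsigned hv[CACHE_SIZE];          /* hash values, scanned linearly */
static struct cache_ent *ev[CACHE_SIZE]; /* entries, touched only on a hit */
static int hc;                           /* number of slots in use */

/* Linear search over hash values only.  The trip count is known on
 * entry, which lets the compiler unroll the loop. */
static struct cache_ent *cache_find(unsigned hash)
{
    const unsigned *hp;
    for (hp = hv; hp < hv + hc; hp++)
        if (*hp == hash)
            return ev[hp - hv];
    return NULL;
}

/* Hypothetical helper: fill the next free slot. */
static void cache_append(unsigned hash, struct cache_ent *ent)
{
    assert(hc < CACHE_SIZE);
    hv[hc] = hash;
    ev[hc] = ent;
    hc++;
}
```

On a miss the loop reads only hv[], so the entry pointers never enter the CPU cache.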
Alexey Tourbin
568fe52e61 set.c: reimplemented decode_base62_golomb using Knuth's coroutines
Since the combined base62+golomb decoder is still the most expensive
routine, I have to consider very clever tricks to give it a boost.

In the routine, its "master logic" is executed on behalf of the base62
decoder: it makes bits from the string and passes them on to the "slave"
golomb routine.  The slave routine has to maintain its own state (doing
q or doing r); after the bits are processed, it returns and base62 takes
over.  When the slave routine is invoked again, it has to recover the
state and take the right path (q or r).  These seemingly cheap state
transitions can actually become relatively expensive, since the "if"
clause involves branch prediction which is not particularly accurate on
variable-length inputs.  This change demonstrates that it is possible to
get rid of the state-related instructions altogether.

Roughly, the idea is that, instead of calling putNbits(), we can invoke
"goto *putNbits", and the pointer will dispatch either to putNbitsQ or
putNbitsR label (we can do this with gcc's computed gotos).  However,
the goto will not return, and so the "putbits" guys will have to invoke
"goto getbits", and so on.  So it gets very similar to coroutines as
described in [Knuth 1997, vol. 1, p. 194].  Furthermore, one must
realize that computed gotos are not actually required: since the total
number of states is relatively small - roughly (q^r)x(reg^esc,align) -
it is possible to instantiate a few similar coroutines which pass
control directly to the right labels.

For example, the decoding is started with "get24q" coroutine - that is,
we're in the "Q" mode and we try to grab 24 bits (for the sake of the
example, I do not consider the initial align step).  If 24 bits are
obtained successfully, they are passed down to the "put24q" coroutine
which, as its name suggests, takes over in the "Q" mode immediately;
furthermore, in the "put24q" coroutine, the next call to get bits has to
be either "get24q" or "get24r" (depending on whether Q or R is processed
when no bits are left) - that is, the coroutine itself must "know" that
there are no base62 complications at this point.  The "get24r" is similar
to "get24q" except that it will invoke "put24r" instead of "put24q".  On
the other hand, consider that, in the beginning, only 12 bits have been
directly decoded (and the next 12 bits probably involve "Z").  We then
pass control to "put12q", which will in turn call either "get12q" or
"get12r" to handle irregular cases for the pending 12 bits (um, the
names "get12q" and "get12r" are a bit of a misnomer).

This change also removes another branch in golomb R->Q transition:

        r &= (1 << Mshift) - 1;
        *v++ = (q << Mshift) | r;
        q = 0;
        state = ST_VLEN;
-       if (left == 0)
-           return;
        bits >>= n - left;
        n = left;
    vlen:
        if (bits == 0) {
            q += n;
            return;
        }
        int vbits = __builtin_ffs(bits);
        ...

This first "left no bits" check is now removed and performed implicitly
by the latter "no need for bsf" check, with the result being far better
than I expected.  Perhaps it helps to understand that the condition
"left exactly 0" rarely holds, but CPU is stuck by the check.

So, Q and R processing step each now have exactly one branch (that is,
exactly one condition which completes the step).  Also, in the "put"
coroutines, I simply make a sequence of Q and R steps; this produces
a clean sequence of instructions which branches only when absolutely
necessary.

callgrind annotations for "apt-cache <<<unmet", previous commit:
2,671,717,564  PROGRAM TOTALS
1,059,874,219  lib/set.c:decode_base62_golomb
509,531,239  lib/set.c:rpmsetcmp

callgrind annotations for "apt-cache <<<unmet", this commit:
2,426,092,837  PROGRAM TOTALS
812,534,481  lib/set.c:decode_base62_golomb
509,531,239  lib/set.c:rpmsetcmp
2012-03-07 01:27:20 +04:00
Alexey Tourbin
63da57c20c 4.0.4-alt100.46
- set.c: Fixed bad sentinel due to off-by-one error in alt100.28.
- set.c: Improved linear cache search by using contiguous memory block.
- set.c: Improved decoding by combining and processing 24 bits at a time.
- set.c: Reimplemented downsampling using merges instead of full qsort(3).
- cpp.req: Implemented global/hierarchical mode in which subordinate files
  are processed implicitly, resulting in fewer failures and major speed-up.
- cpp.req: Recover missing refs due to cpp "once-only header" optimization.
2012-02-19 19:34:49 +04:00
Alexey Tourbin
53661a9938 cpp.req: fix double buildroot in filename-specific -I options 2012-02-19 19:09:55 +04:00
Alexey Tourbin
d7b8e36a16 cpp.req: single pkg-config invocation
Running pkg-config multiple times can produce too many cflags, most
of them being dups.  With this change, I rely on pkg-config itself to
discard dups properly - pkg-config(1) manpage says that "duplicate
flags are merged (maintaining proper ordering)".
2012-02-19 18:26:41 +04:00
Alexey Tourbin
50a5ad7320 cpp.req: recover missing once-only pushes using -dI
Hierarchical processing makes cpp.req more susceptible to "once-only
header" optimization.  To demonstrate the problem, I've implemented
some debugging facilities.  Here is how <gtk/gtk.h> is processed.

$ cpp.req -vv /usr/include/gtk-2.0/gtk/gtk.h
[...]
  Include gdk/gdk.h
+ Push /usr/include/gtk-2.0/gdk/gdk.h
    Include gdk/gdkapplaunchcontext.h
+   Push /usr/include/gtk-2.0/gdk/gdkapplaunchcontext.h
      Include gio/gio.h
!     Push /usr/include/glib-2.0/gio/gio.h
        Include gio/giotypes.h
        Push /usr/include/glib-2.0/gio/giotypes.h
          Include gio/gioenums.h
          Push /usr/include/glib-2.0/gio/gioenums.h
            Include glib-object.h
            Push /usr/include/glib-2.0/glib-object.h
              Include gobject/gbinding.h
              Push /usr/include/glib-2.0/gobject/gbinding.h
                Include glib.h
                Push /usr/include/glib-2.0/glib.h
[...]
+               Push /usr/include/gtk-2.0/gtk/gtkdebug.h
                  Include glib.h
                Pop
[...]
recovered glib.h -> /usr/include/glib-2.0/glib.h
recovered stdarg.h -> /usr/lib64/gcc/x86_64-alt-linux/4.5.3/include/stdarg.h
recovered time.h -> /usr/include/time.h
recovered glib-object.h -> /usr/include/glib-2.0/glib-object.h

In the output, "Include" lines annotate "#include" instructions which
are about to be processed by cpp; "Push" and "Pop" annotate actual
file operations performed by cpp.  Technically, "Include" annotations
are enabled via -dI option which installs cb_include callback in
gcc/c-ppoutput.c; "Push" and "Pop" are triggered in the guts of the
libcpp library.  The library has hardcoded optimization against repeated
inclusions.  According to "info cpp", "It remembers when a header file
has a wrapper #ifndef.  If a subsequent #include specifies that header,
and the macro in the #ifndef is still defined, it does not bother to
rescan the file at all."  (See should_stack_file in libcpp/files.c.)

This means that, normally, each "Include" should be followed by a
corresponding "Push".  However, due to "once-only header" optimization,
some includes are not followed by a push.  This means that the file
has already been pushed, and it happens to use a wrapper #ifndef.
Note that, in the output, this is exactly the case with <glib.h>.

Also note that, in the output, files internal to the package are marked
with "+" on the left.  They are tracked down to the first non-packaged
file, which makes a dependency; these files are marked with "!".  The
problem with <glib.h> is then that it gets first included in an
external file.  Later it is also included in an internal file, but
a "Push" is not triggered.  And since the internal file is subordinate
to <gtk/gtk.h> and is not going to be processed on its own, the
dependency on <glib.h> is lost.

To recover missing pushes, we have to associate each include with the
first-time push.  In other words, we need a table which maintains a
(header -> filename) mapping; in the above example, the table will
contain (glib.h -> /usr/include/glib-2.0/glib.h).  Using this table,
we can ascertain that each internal #include produced a result.

Now, this might still have some corner cases: includes with
non-canonical header names probably will not be recovered, and it is not
clear whether <foo.h> and "foo.h" should be processed differently.
It works well enough in simple cases, though.
2012-02-19 18:23:24 +04:00
Alexey Tourbin
e4835167bb cpp.req: hierarchical processing - fewer errors and major speedup
I have to admit that cpp.req can be slow and often fails in an ugly
manner.  To address these issues, this change introduces "hierarchical
processing".  Consider the package libgtk+2-devel.  Only a few header
files from this package can be included directly, and these files in
turn include other "private" headers which are protected against direct
inclusion.  The idea is then that only those few files with the highest
rank have to be processed explicitly, and most of the "private" files
can be processed implicitly as they are included on behalf of
higher-ranking files.

To implement the idea, somehow we have to sort the files by their rank.
This probably has to involve some guesswork.  However, assigning higher
ranks to shorter filenames seems to produce nice guesses.  More precisely,
files are sorted by shorter directory names and then by shorter basenames.
Another possible criterion, which is not currently implemented, is to
take into account the number of path components in a directory name.

The result is pretty amazing: the amount of time needed to process
libgtk+2-devel headers is reduced from 150s to 5s.  Notably <gtk/gtk.h>
includes 241 packaged files.  This is also due to other optimizations:
packaged files are excluded from dependencies early on, and each
required filename gets passed to FindPackage only once.
2012-02-19 09:13:44 +04:00
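The ranking rule from commit e4835167bb (shorter directory names first, then shorter basenames) can be sketched as a qsort(3) comparator. cpp.req itself is a script, so this C version only illustrates the ordering; the names `split_len` and `rank_cmp` are hypothetical, and the final strcmp tie-break is an added assumption to make the order deterministic.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Split a path at its last '/' and report both part lengths. */
static void split_len(const char *path, size_t *dir, size_t *base)
{
    const char *slash = strrchr(path, '/');
    assert(path != NULL);
    *dir = slash ? (size_t)(slash - path) : 0;
    *base = strlen(slash ? slash + 1 : path);
}

/* Higher rank sorts first: shorter directory, then shorter basename. */
static int rank_cmp(const void *a, const void *b)
{
    const char *pa = *(const char *const *)a;
    const char *pb = *(const char *const *)b;
    size_t da, ba, db, bb;

    split_len(pa, &da, &ba);
    split_len(pb, &db, &bb);
    if (da != db)
        return da < db ? -1 : 1;
    if (ba != bb)
        return ba < bb ? -1 : 1;
    return strcmp(pa, pb);      /* tie-break, not part of the commit */
}
```

Sorting an array of `const char *` paths with `qsort(files, n, sizeof(files[0]), rank_cmp)` puts files such as gtk/gtk.h ahead of gtk/gtkdebug.h.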
Alexey Tourbin
4d55d9fad0 set.c: better estimation of encode_base62_size 2012-02-19 08:43:36 +04:00
Alexey Tourbin
17452dba48 set.c: reimplemented downsampling using merges
Most of the time, downsampling is needed for Provides versions,
which are expensive, and values are reduced by only 1 bit, which
can be implemented without sorting the values again.  Indeed,
only a merge is required.  The array v[] can be split into two
parts: the first part v1[] and the second part v2[], the latter
having values with high bit set.  After the high bit is stripped,
v2[] values are still sorted.  It suffices to merge v1[] and v2[].

Note that, however, a merge cannot be done in place, and also we have
to support 2 or more downsampling steps.  We also want to avoid copying.
This requires careful buffer management - each version needs two
alternate buffers.

callgrind annotations for "apt-cache <<<unmet", previous commit:
2,743,058,808  PROGRAM TOTALS
1,068,102,605  lib/set.c:decode_base62_golomb
  509,186,920  lib/set.c:rpmsetcmp
  131,678,282  stdlib/msort.c:msort_with_tmp'2
   93,496,965  sysdeps/x86_64/strcmp.S:__GI_strcmp
   91,066,266  sysdeps/x86_64/memcmp.S:bcmp
   83,062,668  sysdeps/x86_64/strlen.S:__GI_strlen
   64,584,024  sysdeps/x86_64/memcpy.S:memcpy

callgrind annotations for "apt-cache <<<unmet", this commit:
2,683,295,262  PROGRAM TOTALS
1,068,102,605  lib/set.c:decode_base62_golomb
  510,261,969  lib/set.c:rpmsetcmp
   93,692,793  sysdeps/x86_64/strcmp.S:__GI_strcmp
   91,066,275  sysdeps/x86_64/memcmp.S:bcmp
   90,080,205  stdlib/msort.c:msort_with_tmp'2
   83,062,524  sysdeps/x86_64/strlen.S:__GI_strlen
   58,165,691  sysdeps/x86_64/memcpy.S:memcpy
2012-02-17 14:14:25 +04:00
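The one-bit downsampling step from commit 17452dba48 can be sketched as a merge of the two halves of a sorted array. This is an illustration under assumptions, not the set.c code: the name `downsample1` is hypothetical, the output goes to a separate buffer, and duplicates that appear after the high bit is stripped are kept rather than removed.

```c
#include <assert.h>
#include <stddef.h>

/* v:   n values sorted ascending, each below 2^bpp
 * out: separate buffer for n values, receives the values reduced to
 *      bpp-1 bits, again sorted ascending */
static void downsample1(const unsigned *v, size_t n, int bpp, unsigned *out)
{
    assert(bpp > 1 && bpp <= 32);
    unsigned high = 1u << (bpp - 1);
    size_t split = 0;

    /* v1 = v[0..split-1] has the high bit clear, v2 = v[split..n-1] set. */
    while (split < n && v[split] < high)
        split++;

    size_t i = 0, j = split, k = 0;
    while (i < split && j < n) {
        unsigned b = v[j] & (high - 1);     /* v2 value, high bit stripped */
        if (v[i] <= b)
            out[k++] = v[i++];
        else {
            out[k++] = b;
            j++;
        }
    }
    while (i < split)
        out[k++] = v[i++];
    while (j < n)
        out[k++] = v[j++] & (high - 1);
}
```

Because the output cannot share storage with the input, repeated steps need two buffers used alternately, as the message notes.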
Alexey Tourbin
692818eb72 set.c: combine and process 24 bits at a time
callgrind annotations for "apt-shell <<<unmet", previous commit:
2,794,697,010  PROGRAM TOTALS
1,119,563,508  lib/set.c:decode_base62_golomb
  509,186,920  lib/set.c:rpmsetcmp

callgrind annotations for "apt-shell <<<unmet", this commit:
2,743,128,315  PROGRAM TOTALS
1,068,102,605  lib/set.c:decode_base62_golomb
  509,186,920  lib/set.c:rpmsetcmp
2012-02-17 09:42:53 +04:00
Alexey Tourbin
7d414b68aa set.c: use plain array to make linear search even simpler
The only reason for using a linked list is to make LRU reordering O(1).
This change replaces the linked list with a plain array.  The inner loop
is now very tight, but reordering involves memmove(3) and is O(N), since
on average, half the array has to be shifted.  Note, however, that the
leading part of the array which is to be shifted is already there in L1
cache, and modern memmove(3) must be very efficient - I expect it to
take far fewer instructions than the loop itself.
2012-02-17 09:42:00 +04:00
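The O(N) reordering described in commit 7d414b68aa can be sketched as a move-to-front on a plain array. The helper name `move_to_front` and the element type are assumptions made for illustration.

```c
#include <assert.h>
#include <string.h>

/* Move v[i] to the front of v[0..n-1]; the leading i elements shift one
 * slot to the right with a single memmove(3) call. */
static void move_to_front(unsigned *v, size_t n, size_t i)
{
    assert(i < n);
    unsigned hit = v[i];
    memmove(v + 1, v, i * sizeof(*v));
    v[0] = hit;
}
```

Only the part of the array in front of the hit is shifted, and that part was just scanned by the search loop, which is why it is expected to be in L1 cache already.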
Alexey Tourbin
5d0932c8a0 set.c: use contiguous memory to facilitate linear search
Recently I tried to implement another data structure similar to SVR2
buffer cache [Bach 1986], but the code got too complicated.  So I still
maintain that, for small cache sizes, linear search is okay.  Dennis
Ritchie famously argued that a linear search of a directory is efficient
because it is bounded by the size of the directory [Ibid., p. 76].
Great minds think alike (and share similar views on a linear search).

What can make the search slow, however, is not the loop per se, but
rather memory loads: on average, about 67% entries have to be loaded
(assuming 67% hit ratio), checked for entry->hash, and most probably
followed by entry->next.

With malloc'd cache entries, memory loads can be slow.  To facilitate
the search, this change introduces new structure "cache_hdr", which
has only 3 members necessary for the search.  The structures are
pre-allocated in contiguous memory block.  This must play nice with
CPU caches, resulting in fewer memory loads and faster searches.

Indeed, based on some measurements of "apt-shell <<<unmet", this change
can demonstrate about 2% overall improvement in user time.  Using more
sophisticated SVR2-like data structure further improves the result only
by about 0.5%.
2012-02-11 06:44:55 +04:00
Alexey Tourbin
c3f705993b set.c: fixed off-by-one error in barrier allocation 2012-02-11 04:46:23 +04:00
Vitaly Kuznetsov
28c4088d19 4.0.4-alt100.45
- Introduced %_rpmlibdir/brp.d/ directory to allow existence of various brp-*
  scripts not only in rpm-build package.
- brp-hardlink_pyo_pyc: split from brp-bytecompile_python
2012-01-25 14:25:17 +00:00
Vitaly Kuznetsov
ca5b17e03c introduce brp-hardlink_pyo_pyc (split from brp-bytecompile_python)
Hardlinking of identical .pyo and .pyc files is split from brp-bytecompile_python into
brp-hardlink_pyo_pyc to make this brp work for python3 files (generated by separate
brp-bytecompile_python3).
2012-01-25 14:23:12 +00:00
Vitaly Kuznetsov
a771af0403 brp: introduce /usr/lib/rpm/brp.d directory
Made it possible for third party packages to have their own brp-* scripts. All
existing brp-* scripts migrated to /usr/lib/rpm/brp.d, brp-alt taught to execute
all from this directory in alphabetical order. All brp-* scripts are required to
have a three-digit prefix (to specify execution order) and a .brp suffix.
2012-01-25 14:16:15 +00:00
Vitaly Kuznetsov
a01a51c385 4.0.4-alt100.44
- GROUPS: add Development/Python3 (by Vitaly Kuznetsov) and Other (by Igor
  Vlasenko).
- %_sharedstatedir: change to /var/lib (suggested by Alexey Gladkov).
2012-01-20 09:20:55 +00:00
Vitaly Kuznetsov
3b14bb7720 GROUPS: add Development/Python3 2012-01-20 09:20:37 +00:00
17b988d408 %_sharedstatedir: change to /var/lib
The old value (/usr/com) was pure nonsense.

Suggested-by: Alexey Gladkov <legion@altlinux.org>
2012-01-12 22:13:54 +00:00
Igor Vlasenko
4aa0534dec GROUPS: add Other 2011-12-16 21:11:36 +02:00
c023f529bc 4.0.4-alt100.43
- 0common-files.req.list: removed /etc/sysctl.d directory.
- verify-elf: check RPATH for non-ascii symbols, illegal absolute and
  relative paths, and paths to standard libraries.
2011-12-13 15:43:22 +00:00
3a6b8bd83b 0common-files.req.list: remove /etc/sysctl.d
/etc/sysctl.d is going to be added to filesystem package.

This reverts commit bec54ac071.
2011-12-13 14:55:50 +00:00
820414df17 verify-elf: move check for rpath, stack and unresolved symbols to separate functions 2011-12-12 16:27:52 +00:00
c66e9c38e4 verify-elf: more RPATH checks
Check RPATH for non-ascii symbols, invalid absolute and relative paths,
and standard library directories.
2011-12-10 21:51:42 +00:00
6eea0604ad verify-elf: Rewrite error reporting code 2011-12-10 17:50:11 +00:00