rpm-build

Author	SHA1	Message	Date
Vitaly Chikunov	785ae7a9a2	Backport: Add support for dpkg-style sorting of tilde in version/release Original commit message: - This allows much nicer handling some common scenarios such as upstream pre-releases where the pre-release version would normally appear newer than final release, eg 1.0-rc1 vs 1.0. Previously this required mapping the pre-release tag into the release tag to achieve desired sorting, with tilde this becomes simply 1.0~rc1 < 1.0. - Add a rpmlib() tracking dependency to prevent older rpm versions from getting confused with packages relying on the new behavior. Picked: db28221a4a ("Add support for dpkg-style sorting of tilde in version/release") Authored-by: Michael Schroeder <mls@suse.de> Signed-off-by: Panu Matilainen <pmatilai@redhat.com> Picked: 8002b3f985 ("Spelling fixes.") Authored-by: Ville Skyttä <ville.skytta@iki.fi> Signed-off-by: Panu Matilainen <pmatilai@redhat.com> Link: https://bugzilla.altlinux.org/46585 [ vt: Change to parseRCPOT is not applied because no rpmCharCheck call. Unsupported RPM tags (ORDERVERSION, SUGGESTSVERSION, ENHANCESVERSION) are removed. haveTildeDep is reworked because we don't have headerGet API. ] Signed-off-by: Vitaly Chikunov <vt@altlinux.org>	2023-09-10 21:33:18 +03:00
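The tilde rule described in this commit can be illustrated with a simplified, rpmvercmp-style comparison. This is only a sketch under stated assumptions: the function name `vercmp_tilde` is hypothetical, it is not the backported code, and it handles only the segment rules needed to show that `1.0~rc1` sorts before `1.0`.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns -1, 0 or 1, like strcmp. */
int vercmp_tilde(const char *a, const char *b)
{
    while (*a || *b) {
        /* Skip separators: anything that is neither alphanumeric nor '~'. */
        while (*a && !isalnum((unsigned char)*a) && *a != '~') a++;
        while (*b && !isalnum((unsigned char)*b) && *b != '~') b++;

        /* A tilde sorts before everything, even the end of the string. */
        if (*a == '~' || *b == '~') {
            if (*a != '~') return 1;
            if (*b != '~') return -1;
            a++; b++;
            continue;
        }
        if (!*a || !*b)
            break;

        /* Take one all-digit or all-letter segment from each string. */
        int isnum = isdigit((unsigned char)*a) != 0;
        const char *sa = a, *sb = b;
        if (isnum) {
            while (isdigit((unsigned char)*a)) a++;
            while (isdigit((unsigned char)*b)) b++;
        } else {
            while (isalpha((unsigned char)*a)) a++;
            while (isalpha((unsigned char)*b)) b++;
        }
        size_t la = (size_t)(a - sa), lb = (size_t)(b - sb);

        /* Segments of different types: the numeric one is newer. */
        if (lb == 0)
            return isnum ? 1 : -1;

        if (isnum) {
            /* Drop leading zeros; the longer number is the larger one. */
            while (la && *sa == '0') { sa++; la--; }
            while (lb && *sb == '0') { sb++; lb--; }
            if (la != lb) return la > lb ? 1 : -1;
        }
        size_t n = la < lb ? la : lb;
        int rc = memcmp(sa, sb, n);
        if (rc) return rc < 0 ? -1 : 1;
        if (la != lb) return la > lb ? 1 : -1;
    }
    /* Whichever string still has characters left is newer. */
    if (!*a && !*b) return 0;
    return *a ? 1 : -1;
}
```

With this sketch, `vercmp_tilde("1.0~rc1", "1.0")` is negative, while `vercmp_tilde("1.0.1", "1.0")` is positive, which is the ordering the commit message asks for.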
Arseny Maslennikov	70ad448746	build: Unconditionally nullify stdin for build scripts When we run a build script, redirect its standard input to a newly created pipe with no open writers. This makes the behaviour of build scripts more robust against e. g. unsolicited interactivity (esp. if inherited stdio points to a tty) and more reproducible.	2023-05-24 15:08:28 +03:00
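The "pipe with no open writers" idea can be sketched as follows. The helper name `nullify_stdin` is hypothetical and the real change lives in the build-script runner; this only shows the mechanism, assuming it runs in the child process before the script is executed.

```c
#include <unistd.h>

/* Hypothetical helper: replace fd 0 with the read end of a pipe whose
 * write end is already closed, so any read() sees EOF immediately. */
int nullify_stdin(void)
{
    int fds[2];

    if (pipe(fds) < 0)
        return -1;
    close(fds[1]);                      /* no writers remain */
    if (fds[0] != STDIN_FILENO) {
        if (dup2(fds[0], STDIN_FILENO) < 0) {
            close(fds[0]);
            return -1;
        }
        close(fds[0]);
    }
    return 0;
}
```

After a successful call, `read(STDIN_FILENO, ...)` returns 0 right away, so a script that tries to prompt the user gets EOF instead of blocking on a tty.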
Vitaly Chikunov	36d9f39b47	Backport packaging '%pretrans' Lua scriptlets Based on rpm-4.4 with minor changes. Running functionality is not backported. Based-on: 73260d956c ("Implemented %pretrans and %posttrans script slots.") Signed-off-by: Vitaly Chikunov <vt@altlinux.org>	2022-10-11 08:05:12 +03:00
Colin Walters	8e0d85c61e	Add 'VCS' key Spec files have a lot of metadata about a project. However one of the most key components is the upstream version control system which was notably lacking. Resolve this by adding a "VCS" key. There is no specification for contents of this key, given that the set of version control systems (and features thereof) are not well-defined. However, recommendations are: * git: This URL should be in a form that can be passed to "git clone", with the additional feature that an optional fragment identifier "#foo" denotes a branch or tag.	2019-11-01 16:16:23 +00:00
Ivan Zakharyaschev	e805931cf3	remark: we don't print the disttag	2019-02-26 23:19:51 +03:00
Ivan Zakharyaschev	3c72c222d5	lib/depends.c: remark: There is no psp->problems->byDisttag. (TODO?)	2019-02-26 23:19:51 +03:00
Ivan Zakharyaschev	2b320b142d	lib/psm.c: make runScript() print the disttag on errors	2019-02-26 23:19:51 +03:00
Ivan Zakharyaschev	9dfcf26d97	lib/psm.c: print disttag to syslog if available	2019-02-26 23:19:50 +03:00
Ivan Zakharyaschev	dfb710403d	lib/depends.c: make headerMatchesDepFlags() aware of the disttag of the header	2019-02-26 23:19:50 +03:00
Ivan Zakharyaschev	ef382d2a80	add disttag to struct availablePackage (like buildtime; affects rpm -U & interdep.c)	2019-02-26 23:19:50 +03:00
Ivan Zakharyaschev	6be20da468	headerNVR() replaced by the new header{NVRD,Name{,Version}}() in trivial cases These are the cases where even the release was not needed (so the disttag is not needed either), or one case where the filename is constructed (and it doesn't include the disttag). Now grep -Ee 'headerNVR[^D]' will show the remaining non-trivial cases, where adapting to disttags may be needed.	2019-02-26 23:16:13 +03:00
Vladimir D. Seleznev	69ddb1de4c	lib/psm.c: hack to make upgrade packages between branches possible (cherry picked from commit c5c67cf171246d433290cb5b917ce3b6a8638880)	2019-02-26 22:17:52 +03:00
Vladimir D. Seleznev	e624792819	lib/depends.c: make rpmRangesOverlap() handle DistTag In a constraint (Requires, Conflicts), some components of E:V-R:D may be unspecified; here are the sensible possibilities: V E:V-R E:V-R:D Remember that the DistTag represents the ID of a particular build. (V represents a particular upstream version. E:V-R represents a particular source package release.) To satisfy a requirement, only the specified components must be checked. So, if the requirement doesn't specify a DistTag, then we don't have to check the DistTag to satisfy it. If the requirement does specify a DistTag, then if it is an "equals" requirement, then only the same DistTag can satisfy it. (I.e., we want that particular build.) A "less" (or "greater") requirement of a DistTag basically makes no sense, because the build IDs are not ordered. So, such a requirement cannot be satisfied, if it has come to checking the DistTag, i.e. the EVRs have been equal. (It cannot be satisfied by a package with an equal EVR and any DistTag value, but can be satisfied by a package with EVR which is strictly less or greater.) What does the last part mean for "Requires" and "Conflicts"? Requires: N>E:V-R:D would have the same effect as Requires: N>E:V-R Conflicts: N>E:V-R:D would have the same effect as Conflicts: N>E:V-R i.e., a conflict would not disallow packages with the specified E:V-R, but a different DistTag. (We can't do any better, unless there is a "not-equals" type of requirement in RPM.) Commit message author: Ivan Zakharyaschev <imz@altlinux.org> Commit is based on 6cb615d6112a2ca841481d8153ba652d512a2f23 of git://git.altlinux.org/gears/r/rpm.git (cherry picked from commit d169679410a0d02a731addb8b526ecbc8a3a56fc)	2019-02-26 22:17:52 +03:00
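The DistTag rules spelled out above can be condensed into a small decision function. This is a sketch, not the rpmRangesOverlap() code: the name `range_satisfied`, the `SENSE_*` constants and the reduction to a single precomputed EVR comparison are all assumptions made for illustration. In particular, treating `>=` and `<=` with an equal EVR like `=` is an assumption; the commit message only discusses `=`, `<` and `>`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sense bits, combined as in "<=" = LESS | EQUAL. */
enum { SENSE_LESS = 1 << 0, SENSE_GREATER = 1 << 1, SENSE_EQUAL = 1 << 2 };

/* evr_cmp: provided E:V-R compared with the required E:V-R (<0, 0, >0).
 * req_disttag is NULL when the requirement does not specify a DistTag. */
int range_satisfied(int evr_cmp, int req_sense,
                    const char *req_disttag, const char *prov_disttag)
{
    if (evr_cmp < 0)
        return (req_sense & SENSE_LESS) != 0;
    if (evr_cmp > 0)
        return (req_sense & SENSE_GREATER) != 0;

    /* EVRs are equal: a strict "<" or ">" can never be satisfied,
     * because build IDs are not ordered. */
    if (!(req_sense & SENSE_EQUAL))
        return 0;
    /* Unspecified DistTag: nothing more to check. */
    if (req_disttag == NULL)
        return 1;
    /* Specified DistTag: only that particular build matches. */
    return prov_disttag != NULL && strcmp(req_disttag, prov_disttag) == 0;
}
```

Under these assumptions, `N > E:V-R:D` behaves exactly like `N > E:V-R`, which matches the last part of the commit message.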
Ivan Zakharyaschev	ee2e436b00	Merge branch 'predisttag-fixes' into _BUILD/alt # Conflicts: # rpm-4_0.spec	2019-02-26 22:17:43 +03:00
Ivan Zakharyaschev	6162356696	rpmRangesOverlap() optimized with strdupa; simplified this place a bit An analogous strdupa optimization in rpm-4.13-alt6 gave around a 30% speed improvement when using this function. Simplified code (with variables available only in the scope where they are used and without extra variables whose value is not used anymore) is easier to understand and modify.	2019-02-26 22:02:56 +03:00
Dmitry V. Levin	4134fc39ac	Fix automake warnings Makefile.am:13: warning: 'INCLUDES' is the old name for 'AM_CPPFLAGS' (or '*_CPPFLAGS') build/Makefile.am:5: warning: 'INCLUDES' is the old name for 'AM_CPPFLAGS' (or '*_CPPFLAGS') lib/Makefile.am:5: warning: 'INCLUDES' is the old name for 'AM_CPPFLAGS' (or '*_CPPFLAGS') python/Makefile.am:7: warning: 'INCLUDES' is the old name for 'AM_CPPFLAGS' (or '*_CPPFLAGS') rpmdb/Makefile.am:5: warning: 'INCLUDES' is the old name for 'AM_CPPFLAGS' (or '*_CPPFLAGS') rpmio/Makefile.am:9: warning: 'INCLUDES' is the old name for 'AM_CPPFLAGS' (or '*_CPPFLAGS') tools/Makefile.am:5: warning: 'INCLUDES' is the old name for 'AM_CPPFLAGS' (or '*_CPPFLAGS')	2014-02-15 22:19:47 +00:00
Alexey Tourbin	f25f962fe6	set.c: fixed sentinel allocation	2012-12-24 16:24:15 +04:00
Dmitry V. Levin	a538345515	Change default %_tmppath value to %_tmpdir	2012-12-22 17:07:17 +00:00
Dmitry V. Levin	427a26e82a	Remove getdate.y Removal of rollback support by 4.0.4-alt100.35-1-g9e15c26 made getdate obsolete.	2012-08-31 20:11:21 +00:00
Alexey Tourbin	798ce0db28	set.c: implemented 4-byte and 8-byte steppers for rpmsetcmp main loop Provides versions, on average, are about 34 times longer than Requires versions. More precisely, if we consider all rpmsetcmp calls for "apt-shell <<<unmet" command, then sum(c1)/sum(c2)=33.88. This means that we can save some time and instructions by skipping intermediate bytes - in other words, by stepping a few bytes at a time. Of course, after all the bytes are skipped, we must recheck a few final bytes and possibly step back. Also, this requires more than one sentinel for proper boundary checking. This change implements two such "steppers" - 4-byte stepper for c1/c2 ratio below 16 and 8-byte stepper which is used otherwise. When stepping back, both steppers use bisecting. Note that replacing last two bisecting steps with a simple loop might be actually more efficient with respect to branch prediction and CPU's BTB. It is very hard to measure any user time improvement, though, even in a series of 100 runs. The improvement is next to none, at least on older AMD CPUs. And so I choose to keep bisecting. callgrind annotations for "apt-shell <<<unmet", previous commit: 2,279,520,414 PROGRAM TOTALS 646,107,201 lib/set.c:decode_base62_golomb 502,438,804 lib/set.c:rpmsetcmp 98,243,148 sysdeps/x86_64/memcmp.S:bcmp 93,038,752 sysdeps/x86_64/strcmp.S:__GI_strcmp callgrind annotations for "apt-shell <<<unmet", this commit: 2,000,254,692 PROGRAM TOTALS 642,039,009 lib/set.c:decode_base62_golomb 227,036,590 lib/set.c:rpmsetcmp 98,247,798 sysdeps/x86_64/memcmp.S:bcmp 93,047,422 sysdeps/x86_64/strcmp.S:__GI_strcmp	2012-03-15 07:02:09 +04:00
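The stepping and stepping-back scheme can be sketched for the 4-element case. This is an illustration rather than the set.c code: the name `skip_to4` is hypothetical, and it assumes the sorted array is followed by four `UINT_MAX` sentinels so that an overshoot of up to three elements still lands on valid memory.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical stepper: return a pointer to the first element >= key.
 * Precondition (assumed): the data is sorted and followed by four
 * UINT_MAX sentinels. */
const unsigned *skip_to4(const unsigned *p, unsigned key)
{
    if (*p >= key)
        return p;
    do
        p += 4;                 /* step four elements at a time */
    while (*p < key);
    /* Now p[-4] < key <= p[0]; bisect the last four candidates. */
    if (p[-2] < key)
        return p[-1] < key ? p : p - 1;
    return p[-3] < key ? p - 2 : p - 3;
}
```

The step-back costs two comparisons regardless of how far the stepper travelled, which is why the scheme pays off when Provides sets are much longer than Requires sets.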
Alexey Tourbin	d78a2cbf3d	set.c: increased cache size from 160 to 256 slots, 75% hit ratio Hit ratio for "apt-shell <<<unmet" command: 160 slots: hit=46813 miss=22862 67.2% 256 slots: hit=52238 miss=17437 75.0% So, we've increased the cache size by a factor of 256/160=1.6 or by 60%, and the number of misses has decreased by a factor of 22862/17437=1.31 or by 1-17437/22862=23.7%. This is not so bad, but it looks like we're paying more for less. The following analysis shows that this is not quite true, since the real memory usage has increased by a somewhat smaller factor. 160 slots, callgrind annotations: 2,406,630,571 PROGRAM TOTALS 795,320,289 lib/set.c:decode_base62_golomb 496,682,547 lib/set.c:rpmsetcmp 93,466,677 sysdeps/x86_64/strcmp.S:__GI_strcmp 91,323,900 sysdeps/x86_64/memcmp.S:bcmp 90,314,290 stdlib/msort.c:msort_with_tmp'2 83,003,684 sysdeps/x86_64/strlen.S:__GI_strlen 58,300,129 sysdeps/x86_64/memcpy.S:memcpy ... inclusive: 1,458,467,003 lib/set.c:rpmsetcmp 256 slots, callgrind annotations: 2,246,961,708 PROGRAM TOTALS 634,410,352 lib/set.c:decode_base62_golomb 492,003,532 lib/set.c:rpmsetcmp 95,643,612 sysdeps/x86_64/memcmp.S:bcmp 93,467,414 sysdeps/x86_64/strcmp.S:__GI_strcmp 90,314,290 stdlib/msort.c:msort_with_tmp'2 79,217,962 sysdeps/x86_64/strlen.S:__GI_strlen 56,509,877 sysdeps/x86_64/memcpy.S:memcpy ... inclusive: 1,298,977,925 lib/set.c:rpmsetcmp So the decoding routine now takes about 20% fewer instructions, and inclusive rpmsetcmp cost is reduced by about 11%. Note, however, that bcmp is now the third most expensive routine (due to higher hit ratio). Since recent glibc versions provide optimized memcmp implementations, I imply that total/inclusive improvement can be somewhat better than 11%. As per memory usage, the question "how much the cache takes" cannot be generally answered with a single number. 
However, if we simply sum the size of all malloc'd chunks on each rpmsetcmp invocation, using the piece of code with a few obvious modifications elsewhere, we can obtain the following statistics. if (hc == CACHE_SIZE) { int total = 0; for (i = 0; i < hc; i++) total += ev[i]->msize; printf("total %d\n", total); } 160 slots, memory usage: min=1178583 max=2048701 avg=1330104 dev=94747 q25=1266647 q50=1310287 q75=1369005 256 slots, memory usage: min=1670029 max=2674909 avg=1895076 dev=122062 q25=1828928 q50=1868214 q75=1916025 This indicates that average cache size is increased by about 42% from 1.27M to 1.81M; however, the third quartile is increased by about 40%, and the maximum size is increased only by about 31% from 1.95M to 2.55M. By which I conclude that extra 600K must be available even on low-memory machines like Raspberry Pi (256M RAM). * * * What's a good hit ratio? $ DepNames() { pkglist-query '[%{RequireName}\t%{RequireVersion}\n]' /var/lib/apt/lists/_ALT_Sisyphus_x86%5f64_base_pkglist.classic | fgrep set: |cut -f1; } $ DepNames |wc -l 34763 $ DepNames |sort -u |wc -l 2429 $ DepNames |sort |uniq -c |sort -n |awk '$1>1{print$1}' |Sum 33924 $ DepNames |sort |uniq -c |sort -n |awk '$1>1{print$1}' |wc -l 1590 $ DepNames |sort |uniq -c |sort -n |tail -256 |Sum 27079 $ We have 34763 set-versioned dependencies, which refer to 2429 sonames; however, only 33924 dependencies refer to 1590 sonames more than once, and the first reference is always a miss. Thus the best possible hit ratio (if we use at least 1590 slots) is (33924-1590)/34763=93.0%. What happens if we use only 256 slots? Assuming that dependencies are processed in random order, the best strategy must spend its cache slots on sonames with the most references. This way we can serve (27079-256) dependencies via cache hit, and so the best possible hit ratio for 256 slots is 77.2%.	2012-03-09 02:42:21 +04:00
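The two upper bounds quoted in this commit follow directly from the counts printed by the shell session. Spelled out, using only the numbers given above:

```latex
\begin{align*}
\text{best ratio, at least 1590 slots:}\quad
  \frac{33924 - 1590}{34763} &= \frac{32334}{34763} \approx 0.930 \\[4pt]
\text{best ratio, 256 slots:}\quad
  \frac{27079 - 256}{34763} &= \frac{26823}{34763} \approx 0.772
\end{align*}
```

In both cases the subtracted term counts the unavoidable first-reference misses (one per cached soname), and the denominator is the total number of set-versioned dependencies. The measured 75.0% for 256 slots is therefore close to the 77.2% bound.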
Alexey Tourbin	0af7afd2e5	set.c: precompute r mask for putbits coroutines callgrind annotations for "apt-shell <<<unmet", previous commit: 2,424,712,279 PROGRAM TOTALS 813,389,804 lib/set.c:decode_base62_golomb 496,701,778 lib/set.c:rpmsetcmp callgrind annotations for "apt-shell <<<unmet", this commit: 2,406,630,571 PROGRAM TOTALS 795,320,289 lib/set.c:decode_base62_golomb 496,682,547 lib/set.c:rpmsetcmp	2012-03-09 00:51:58 +04:00
Alexey Tourbin	80cec29464	set.c: improved cache_decode_set loop I am going to consider whether it is worthwhile to increase the cache size. Thus I have to ensure that the linear search won't be an obstacle for doing so. Particularly, its loop must be efficient in terms of both CPU instructions and memory access patterns. 1) On behalf of memory access patterns, this change introduces two separate arrays: hv[] with hash values and ev[] with actual cache entries. On x86-64, this saves 4 bytes per entry which have previously been wasted to align cache_hdr structures. This has some benefits on i686 as well: for example, ev[] is not accessed on a cache miss. 2) As per instructions, the loop has two branches: the first is for boundary checking, and the second is for matching hash condition. Since the boundary checking condition (cur->ent != NULL) relies on a sentinel, the loop cannot be unrolled; it takes 6 instructions per iteration. If we replace the condition with explicit boundary check (hp < hv + hc), the number of iterations becomes known upon entry to the loop, and gcc will unroll the loop; it now takes 3 instructions per iteration, plus some (smaller) overhead for boundary checking. This change also removes __thread specifiers, since gcc is apparently not very good at optimizing superfluous __tls_get_addr calls. Also, if we are to consider larger cache sizes, it becomes questionable whether each thread should possess its own cache only as a means of achieving thread safety. Anyway, currently I'm not aware of threaded applications which make concurrent librpm calls. callgrind annotations for "apt-shell <<<unmet", previous commit: 2,437,446,116 PROGRAM TOTALS 820,835,411 lib/set.c:decode_base62_golomb 510,957,897 lib/set.c:rpmsetcmp ... 23,671,760 for (cur = cache; cur->ent; cur++) { 1,114,800 => /usr/src/debug/glibc-2.11.3-alt7/elf/dl-tls.c:__tls_get_addr (69675x) 11,685,644 if (hash == cur->hash) { . 
ent = cur->ent; callgrind annotations for "apt-shell <<<unmet", this commit: 2,431,849,572 PROGRAM TOTALS 820,835,411 lib/set.c:decode_base62_golomb 496,682,547 lib/set.c:rpmsetcmp ... 10,204,175 for (hp = hv; hp < hv + hc; hp++) { 11,685,644 if (hash == *hp) { 189,344 i = hp - hv; 189,344 ent = ev[i]; Total improvement is not very impressive (6M instead of expected 14M), mostly due to memmove complications - hv[] cannot be shifted efficiently using 8-byte words. However, the code now scales better. Also, recent glibc versions supposedly provide much improved memmove implementation.	2012-03-08 02:35:59 +04:00
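The split into hv[] and ev[] can be sketched as below. The array names and the explicit `hp < hv + hc` bound come from the commit message; everything else (the `cache_ent` payload, `cache_lookup`, `cache_insert`, and plain move-to-front instead of the midpoint insertion used by set.c) is a simplifying assumption. A real lookup would also compare the set-string on a hash match.

```c
#include <stddef.h>
#include <string.h>

#define CACHE_SIZE 256

struct cache_ent { int id; };            /* hypothetical payload */

static unsigned hv[CACHE_SIZE];          /* hash values: the only array scanned */
static struct cache_ent *ev[CACHE_SIZE]; /* entries: touched only on a hit */
static int hc;                           /* number of slots in use */

struct cache_ent *cache_lookup(unsigned hash)
{
    /* Explicit bound: the trip count is known on entry, so the
     * compiler is free to unroll the loop. */
    for (unsigned *hp = hv; hp < hv + hc; hp++) {
        if (hash == *hp) {
            size_t i = (size_t)(hp - hv);
            struct cache_ent *ent = ev[i];
            /* Move to front: shift the leading i slots down by one. */
            memmove(hv + 1, hv, i * sizeof hv[0]);
            memmove(ev + 1, ev, i * sizeof ev[0]);
            hv[0] = hash;
            ev[0] = ent;
            return ent;
        }
    }
    return NULL;
}

void cache_insert(unsigned hash, struct cache_ent *ent)
{
    if (hc < CACHE_SIZE)
        hc++;                            /* otherwise the last slot is dropped */
    memmove(hv + 1, hv, (size_t)(hc - 1) * sizeof hv[0]);
    memmove(ev + 1, ev, (size_t)(hc - 1) * sizeof ev[0]);
    hv[0] = hash;
    ev[0] = ent;
}
```

On a miss the loop reads only hv[], which is the memory-access benefit the commit describes; the two memmove calls are the O(N) reordering cost it accepts in exchange.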
Alexey Tourbin	568fe52e61	set.c: reimplemented decode_base62_golomb using Knuth's coroutines Since the combined base62+golomb decoder is still the most expensive routine, I have to consider very clever tricks to give it a boost. In the routine, its "master logic" is executed on behalf of the base62 decoder: it makes bits from the string and passes them on to the "slave" golomb routine. The slave routine has to maintain its own state (doing q or doing r); after the bits are processed, it returns and base62 takes over. When the slave routine is invoked again, it has to recover the state and take the right path (q or r). These seemingly cheap state transitions can actually become relatively expensive, since the "if" clause involves branch prediction which is not particularly accurate on variable-length inputs. This change demonstrates that it is possible to get rid of the state-related instructions altogether. Roughly, the idea is that, instead of calling putNbits(), we can invoke "goto putNbits", and the pointer will dispatch either to putNbitsQ or putNbitsR label (we can do this with gcc's computed gotos). However, the goto will not return, and so the "putbits" guys will have to invoke "goto getbits", and so on. So it gets very similar to coroutines as described in [Knuth 1997, vol. 1, p. 194]. Furthermore, one must realize that computed gotos are not actually required: since the total number of states is relatively small - roughly (q^r)x(reg^esc,align) - it is possible to instantiate a few similar coroutines which pass control directly to the right labels. For example, the decoding is started with "get24q" coroutine - that is, we're in the "Q" mode and we try to grab 24 bits (for the sake of the example, I do not consider the initial align step). 
If 24 bits are obtained successfully, they are passed down to the "put24q" coroutine which, as its name suggests, takes over in the "Q" mode immediately; furthermore, in the "put24q" coroutine, the next call to get bits has to be either "get24q" or "get24r" (depending on whether Q or R is processed when no bits are left) - that is, the coroutine itself must "know" that there are no base62 complications at this point. The "get24r" is similar to "get24q" except that it will invoke "put24r" instead of "put24q". On the other hand, consider that, in the beginning, only 12 bits have been directly decoded (and the next 12 bits probably involve "Z"). We then pass control to "put12q", which will in turn call either "get12q" or "get12r" to handle irregular cases for the pending 12 bits (um, the names "get12q" and "get12r" are a bit of a misnomer). This change also removes another branch in golomb R->Q transition: r &= (1 << Mshift) - 1; *v++ = (q << Mshift) | r; q = 0; state = ST_VLEN; - if (left == 0) - return; bits >>= n - left; n = left; vlen: if (bits == 0) { q += n; return; } int vbits = __builtin_ffs(bits); ... This first "left no bits" check is now removed and performed implicitly by the latter "no need for bsf" check, with the result being far better than I expected. Perhaps it helps to understand that the condition "left exactly 0" rarely holds, but the CPU is stuck by the check. So, the Q and R processing steps each now have exactly one branch (that is, exactly one condition which completes the step). Also, in the "put" coroutines, I simply make a sequence of Q and R steps; this produces a clean sequence of instructions which branches only when absolutely necessary. 
callgrind annotations for "apt-cache <<<unmet", previous commit: 2,671,717,564 PROGRAM TOTALS 1,059,874,219 lib/set.c:decode_base62_golomb 509,531,239 lib/set.c:rpmsetcmp callgrind annotations for "apt-cache <<<unmet", this commit: 2,426,092,837 PROGRAM TOTALS 812,534,481 lib/set.c:decode_base62_golomb 509,531,239 lib/set.c:rpmsetcmp	2012-03-07 01:27:20 +04:00
Alexey Tourbin	4d55d9fad0	set.c: better estimation of encode_base62_size	2012-02-19 08:43:36 +04:00
Alexey Tourbin	17452dba48	set.c: reimplemented downsampling using merges Most of the time, downsampling is needed for Provides versions, which are expensive, and values are reduced by only 1 bit, which can be implemented without sorting the values again. Indeed, only a merge is required. The array v[] can be split into two parts: the first part v1[] and the second part v2[], the latter having values with high bit set. After the high bit is stripped, v2[] values are still sorted. It suffices to merge v1[] and v2[]. Note that, however, a merge cannot be done inplace, and also we have to support 2 or more downsampling steps. We also want to avoid copying. This requires careful buffer management - each version needs two alternate buffers. callgrind annotations for "apt-cache <<<unmet", previous commit: 2,743,058,808 PROGRAM TOTALS 1,068,102,605 lib/set.c:decode_base62_golomb 509,186,920 lib/set.c:rpmsetcmp 131,678,282 stdlib/msort.c:msort_with_tmp'2 93,496,965 sysdeps/x86_64/strcmp.S:__GI_strcmp 91,066,266 sysdeps/x86_64/memcmp.S:bcmp 83,062,668 sysdeps/x86_64/strlen.S:__GI_strlen 64,584,024 sysdeps/x86_64/memcpy.S:memcpy callgrind annotations for "apt-cache <<<unmet", this commit: 2,683,295,262 PROGRAM TOTALS 1,068,102,605 lib/set.c:decode_base62_golomb 510,261,969 lib/set.c:rpmsetcmp 93,692,793 sysdeps/x86_64/strcmp.S:__GI_strcmp 91,066,275 sysdeps/x86_64/memcmp.S:bcmp 90,080,205 stdlib/msort.c:msort_with_tmp'2 83,062,524 sysdeps/x86_64/strlen.S:__GI_strlen 58,165,691 sysdeps/x86_64/memcpy.S:memcpy	2012-02-17 14:14:25 +04:00
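A one-bit downsampling step done as a merge might look like the sketch below. The function name `downsample1`, its signature, and the removal of duplicates during the merge are assumptions for illustration; the buffer management with two alternate buffers mentioned in the commit is left out, and the caller simply supplies a separate output array.

```c
#include <stddef.h>

/* Hypothetical helper. v[0..n) holds sorted, unique values that fit in
 * bpp bits. Writes the sorted, unique (bpp-1)-bit values to out, which
 * must not overlap v and must have room for n values. Returns the count. */
size_t downsample1(const unsigned *v, size_t n, int bpp, unsigned *out)
{
    unsigned high = 1u << (bpp - 1);
    size_t mid = 0;

    /* v1[] = v[0..mid) has the high bit clear, v2[] = v[mid..n) has it set. */
    while (mid < n && v[mid] < high)
        mid++;

    const unsigned *a = v, *aend = v + mid;
    const unsigned *b = v + mid, *bend = v + n;
    unsigned *o = out;

    /* Merge v1[] with v2[] minus the high bit; both are already sorted. */
    while (a < aend && b < bend) {
        unsigned x = *a, y = *b - high;
        if (x < y) {
            *o++ = x; a++;
        } else if (y < x) {
            *o++ = y; b++;
        } else {                 /* equal after stripping: keep one copy */
            *o++ = x; a++; b++;
        }
    }
    while (a < aend) *o++ = *a++;
    while (b < bend) *o++ = *b++ - high;
    return (size_t)(o - out);
}
```

For example, with `bpp = 4` the input `{1, 4, 6, 9, 12, 15}` splits into `{1, 4, 6}` and `{9, 12, 15}`; stripping the high bit from the second part gives `{1, 4, 7}`, and the merge produces `{1, 4, 6, 7}`.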
Alexey Tourbin	692818eb72	set.c: combine and process 24 bits at a time callgrind annotations for "apt-shell <<<unmet", previous commit: 2,794,697,010 PROGRAM TOTALS 1,119,563,508 lib/set.c:decode_base62_golomb 509,186,920 lib/set.c:rpmsetcmp callgrind annotations for "apt-shell <<<unmet", this commit: 2,743,128,315 PROGRAM TOTALS 1,068,102,605 lib/set.c:decode_base62_golomb 509,186,920 lib/set.c:rpmsetcmp	2012-02-17 09:42:53 +04:00
Alexey Tourbin	7d414b68aa	set.c: use plain array to make linear search even simpler The only reason for using a linked list is to make LRU reordering O(1). This change replaces the linked list with a plain array. The inner loop is now very tight, but reordering involves memmove(3) and is O(N), since on average, half the array has to be shifted. Note, however, that the leading part of the array which is to be shifted is already there in L1 cache, and modern memmove(3) must be very efficient - I expect it to take much fewer instructions than the loop itself.	2012-02-17 09:42:00 +04:00
Alexey Tourbin	5d0932c8a0	set.c: use contiguous memory to facilitate linear search Recently I tried to implement another data structure similar to SVR2 buffer cache [Bach 1986], but the code got too complicated. So I still maintain that, for small cache sizes, linear search is okay. Dennis Ritchie famously argued that a linear search of a directory is efficient because it is bounded by the size of the directory [Ibid., p. 76]. Great minds think alike (and share similar views on a linear search). What can make the search slow, however, is not the loop per se, but rather memory loads: on average, about 67% of entries have to be loaded (assuming 67% hit ratio), checked for entry->hash, and most probably followed by entry->next. With malloc'd cache entries, memory loads can be slow. To facilitate the search, this change introduces new structure "cache_hdr", which has only 3 members necessary for the search. The structures are pre-allocated in contiguous memory block. This must play nice with CPU caches, resulting in fewer memory loads and faster searches. Indeed, based on some measurements of "apt-shell <<<unmet", this change can demonstrate about 2% overall improvement in user time. Using more sophisticated SVR2-like data structure further improves the result only by about 0.5%.	2012-02-11 06:44:55 +04:00
Alexey Tourbin	c3f705993b	set.c: fixed off-by-one error in barrier allocation	2012-02-11 04:46:23 +04:00
Dmitry V. Levin	3946369bfb	fsmStage: be careful with file permissions on package removal or upgrade Do not erase permissions from regular files on package removal or upgrade unless these files are both setXid and executable. It is legal to have regular system files linked somewhere, e.g. by chrooted installs, so we must be careful not to break these files.	2011-11-30 17:07:27 +00:00
Alexey Tourbin	55409f2b03	set.c: fixed assertion failure with malformed "empty set" set-string In decode_set_init(), we explicitly prohibit empty sets: // no empty sets for now if (*str == '\0') return -4; This does not validate *str character, since the decoder will check for errors anyway. However, this assumes that, otherwise, a non-empty set will be decoded. The assumption is wrong: it was actually possible to construct an "empty set" which triggered assertion failure. $ /usr/lib/rpm/setcmp yx00 yx00 setcmp: set.c:705: decode_delta: Assertion `c > 0' failed. zsh: abort /usr/lib/rpm/setcmp yx00 yx00 $ Here, the "00" part of the set-version yields a sequence of zero bits. Since trailing zero bits are okay, golomb decoding routine basically skips the whole sequence and returns 0. To fix the problem, we have to observe that only up to 5 trailing zero bits can be required to complete last base62 character, and the leading "0" sequence occupies 6 or more bits.	2011-10-03 05:28:00 +04:00
Alexey Tourbin	9e15c26f3f	removed support for repackaging and rollbacks (rpm.org)	2011-09-23 02:47:36 +04:00
Dmitry V. Levin	f74cea6470	Remove unsafe file permissions on package removal or upgrade Import rpm-4.2-owl-remove-unsafe-perms.diff from Owl, to remove unsafe file permissions (chmod'ing files to 0) on package removal or upgrade to prevent continued access to such files via hard-links possibly created by a user (CVE-2005-4889, CVE-2010-2059).	2011-09-07 21:37:40 +00:00
Alexey Tourbin	771548f6ec	set.c: increased cache size somewhat (128 -> 160) Below I use 'apt-shell <<<unmet' as a baseline for measurements. Cache performance with cache_size = 128: hit=39628 miss=22394 (64%) Cache performance with cache_size = 160: hit=42031 miss=19991 (68%) (11% fewer cache misses) Cache performance with cache_size = 160 pivot_size = 1 (plain LRU): hit=36172 miss=25850 (58%) Total number of soname set-versions which must be decoded at least once: miss=2173 (max 96%) callgrind annotations, 4.0.4-alt100.27: 3,904,042,289 PROGRAM TOTALS 1,378,794,846 decode_base62_golomb 1,176,120,148 rpmsetcmp 291,805,495 __GI_strcmp 162,494,544 __GI_strlen 162,222,530 msort_with_tmp'2 56,758,517 memcpy 53,132,375 __GI_strcpy ... callgrind annotations, this commit (rebuilt in hasher): 2,558,482,547 PROGRAM TOTALS 987,220,089 decode_base62_golomb 468,510,579 rpmsetcmp 162,222,530 msort_with_tmp'2 85,422,341 __GI_strcmp 82,063,609 bcmp 76,510,060 __GI_strlen 63,806,309 memcpy ... Inclusive rpmsetcmp annotation, this commit: 1,719,199,968 rpmsetcmp Typical execution time, 4.0.4-alt100.27: 1.87s user 0.29s system 96% cpu 2.242 total Typical execution time, this commit: 1.52s user 0.31s system 96% cpu 1.895 total Based on user time, this constitutes about 20% speed-up. For some reason, the speed-up is more noticeable on i586 architecture (27%). Note that the cache should not be further increased, for two reasons: 1) LRU search is linear - this is fixable; 2) cache memory cannot be reclaimed - this is unfixable. On average, the cache now takes 1.3M (max 2M). For small cache sizes, linear search is okay then (cache_decode_set costs about 20M Ir, which is less than memcmp). An interesting question is to what extent it is worth increasing the cache size, assuming that memory footprint is not an issue. A plausible answer is that decode_base62_golomb should cost no more than 1/2 of rpmsetcmp inclusive time, which is 987M Ir and 1,719M Ir respectively. 
So, ideally, the cache should be increased up to the point where decode_base62_golomb takes about 700M Ir. Note, however, that using the midpoint insertion technique seems to improve cache performance far more than simply increasing cache size.	2011-06-18 22:54:51 +04:00
Alexey Tourbin	d98cab549d	set.c: more redesign to avoid extra copying and strlen This partially reverts what's been introduced with previous commit. Realize that strlen() must be only called when allocating space for v[]. There is no reason to call strlen() for every Provides string, since most of them are decoded via the cache hit. Note, however, that now I have to use the following trick: memcmp(str, cur->str, cur->len + 1) == 0 I rely on the fact this works as expected even when str is shorter than cur->len. Namely, memcmp must start from lower addresses and stop at the first difference (i.e. memcmp must not read past the end of str, possibly except for a few trailing bytes on the same memory page); this is not specified by the standard, but this is how it must work. Also, since the cache now stores full decoded values, it is possible to avoid copying and instead to set the pointer to internal cache memory. Copying must be performed, however, when the set is to be downsampled. Note that average Provides set size is around 1024, which corresponds to base62 string length of about 2K and v[] of 4K. Saving strlen(2K) and memcpy(4K) on every rpmsetcmp call is indeed an improvement. callgrind annotations for "apt-cache unmet", 4.0.4-alt100.27 1,900,016,996 PROGRAM TOTALS 694,132,522 decode_base62_golomb 583,376,772 rpmsetcmp 106,136,459 __GI_strcmp 102,581,178 __GI_strlen 80,781,386 msort_with_tmp'2 38,648,490 memcpy 26,936,309 __GI_strcpy 26,918,522 regionSwab.clone.2 21,000,896 _int_malloc ... callgrind annotations for "apt-cache unmet", this commit (rebuilt in hasher): 1,264,977,497 PROGRAM TOTALS 533,131,492 decode_base62_golomb 230,706,690 rpmsetcmp 80,781,386 msort_with_tmp'2 60,541,804 __GI_strlen 42,518,368 memcpy 39,865,182 bcmp 26,918,522 regionSwab.clone.2 21,841,085 _int_malloc ...	2011-06-16 00:49:41 +04:00
Alexey Tourbin	91d560c35c	set.c: redesigned decode API to avoid extra strlen/cmp/cpy calls Now that string functions are expensive, the API is redesigned so that strlen is called only once, in rpmsetcmp. The length is then passed as an argument down to decoding functions. With the length argument, it is now possible to replace strcmp with memcmp and strcpy with memcpy.	2011-06-14 00:43:33 +04:00
Alexey Tourbin	4d6a444af4	set.c: minor cleanup and English fixes "Effectively avoided" means something like "prakticheski avoided" in Russian. Multiple escapes are not avoided "prakticheski", though; they are avoided altogether and "in principle". The right word does not come to mind.	2011-06-14 00:00:54 +04:00
Alexey Tourbin	68df596fd7	set.c: removed support for caching short deltas, shrunk cache Now that decode_base62_golomb is much cheaper, the question is: is it still worth storing short deltas, as opposed to storing full values at the expense of shrinking the cache? callgrind annotations for previous commit: 1,526,256,208 PROGRAM TOTALS 470,195,400 decode_base62_golomb 434,006,244 rpmsetcmp 106,137,949 __GI_strcmp 102,459,314 __GI_strlen ... callgrind annotations for this commit: 1,427,199,731 PROGRAM TOTALS 533,131,492 decode_base62_golomb 231,592,751 rpmsetcmp 103,476,056 __GI_strlen 102,008,203 __GI_strcmp ... So, decode_base62_golomb now takes more cycles, but the overall price goes down. This is because, when caching short deltas, two additional stages should be performed: 1) short deltas must be copied into unsigned v[] array; 2) decode_delta must be invoked to recover hash values. Both stages iterate on per-value basis and both are seemingly fast. However, they are not that fast when both of them are replaced with bare memcpy, which uses xmm registers or something like this.	2011-06-10 23:58:43 +04:00
Alexey Tourbin	3ff35a310c	set.c: improved rpmsetcmp main loop performance The loop is logically impeccable, but its main condition (v1 < v1end && v2 < v2end) is somewhat redundant: in two of the three cases, only one pointer gets advanced. To save instructions, the conditions are now handled within the cases. The loop is now a while (1) loop, a disguised form of goto. Also note that, when comparing Requires against Provides, the Requires is usually sparse: P: a b c d e f g h i j k l ... R: a c h j ... This means that a nested loop which skips intermediate Provides elements towards the next Requires element may improve performance. while (v1 < v1end && *v1 < *v2) v1++; However, note that the first condition (v1 < v1end) is also somewhat redundant. This kind of boundary checking can be partially omitted if the loop gets unrolled. There is a better technique, however, called the barrier: *v1end must contain the biggest element possible, so that the trailing v1 is never smaller than any of v2. The nested loop then becomes as simple as while (*v1 < *v2) v1++; callgrind annotations, 4.0.4-alt100.27: 1,899,657,916 PROGRAM TOTALS 694,132,522 decode_base62_golomb 583,376,772 rpmsetcmp 106,225,572 __GI_strcmp 102,459,314 __GI_strlen ... callgrind annotations, this commit (rebuilt in hasher): 1,526,256,208 PROGRAM TOTALS 470,195,400 decode_base62_golomb 434,006,244 rpmsetcmp 106,137,949 __GI_strcmp 102,459,314 __GI_strlen ... Note that rpmsetcmp also absorbs cache_decode_set and decode_delta; the loop is now about twice as fast.	2011-06-10 15:12:33 +04:00
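The barrier technique can be sketched with a function that counts how many Requires elements are found among the Provides elements. The name `count_common` and the counting interface are hypothetical (rpmsetcmp itself returns a comparison result), and the sketch assumes the caller has stored `UINT_MAX` at `v1[n1]`.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper. v1[0..n1) and v2[0..n2) are sorted and unique;
 * v1[n1] must hold the barrier value UINT_MAX. */
size_t count_common(const unsigned *v1, size_t n1,
                    const unsigned *v2, size_t n2)
{
    const unsigned *v1end = v1 + n1, *v2end = v2 + n2;
    size_t common = 0;

    while (v2 < v2end) {
        /* No bounds check needed: the barrier stops the scan. */
        while (*v1 < *v2)
            v1++;
        if (v1 == v1end)
            break;              /* only the barrier is left */
        if (*v1 == *v2) {
            common++;
            v1++;
        }
        v2++;
    }
    return common;
}
```

The inner loop has a single condition per element skipped, which is the saving the commit describes for sparse Requires sets.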
Alexey Tourbin	2651bb3246	set.c: unindented rpmsetcmp	2011-06-10 10:50:05 +04:00
Alexey Tourbin	0cfbd8401f	set.c: use __builtin_ffs to count vlen bits	2011-06-08 10:29:02 +04:00
Alexey Tourbin	292af70160	spec, lib/Makefile.am: compile and run set.c in -DSELF_TEST mode	2011-06-07 10:50:10 +04:00
Alexey Tourbin	57e25bb189	set.c: implemented two-bytes-at-a-time base62 decoding callgrind annotations, 4.0.4-alt100.27: 1,899,576,194 PROGRAM TOTALS 694,132,522 decode_base62_golomb 583,376,772 rpmsetcmp 106,136,459 __GI_strcmp 102,459,362 __GI_strlen ... callgrind annotations, this commit (built in hasher): 1,691,904,239 PROGRAM TOTALS 583,395,352 rpmsetcmp 486,433,168 decode_base62_golomb 106,122,657 __GI_strcmp 102,458,654 __GI_strlen	2011-06-07 10:49:48 +04:00
Alexey Tourbin	238e421ad3	set.c: use long subscript for table lookup, to avoid extra movslq instructions	2011-05-25 08:20:06 +04:00
Alexey Tourbin	97ff0102cd	set.c: improved base62_decode table lookup The whole point of using a table is not only that comparisons like (c >= 'a' && c <= 'z') can be eliminated; but also that conditional branches (the "ands" and "ifs") should be eliminated as well. The existing code, however, uses separate branches to check e.g. for the end of string; to check for an error; and to check for the (num6b < 61) common case. With this change, the table is restructured so that the common case will be handled with only a single instruction.	2011-05-25 08:18:40 +04:00
Alexey Tourbin	e061586385	build/files.c (finalizePkg): calculate RPMTAG_SIZE after optimizations Note that checkHardLinks function is now removed. It was unclear whether it was supposed to verify %lang attributes (returning non-zero on error) or indicate if all hardlinks are packaged within the package. It turns out that only a single package in our repo has PartialHardlinkSets dependency: $ cd /ALT/Sisyphus/files/x86_64/RPMS/ $ rpm -qp --qf '[%{NAME}\t%{REQUIRENAME}\n]' *.rpm |fgrep 'PartialHardlinkSets' $ cd /ALT/Sisyphus/files/noarch/RPMS/ $ rpm -qp --qf '[%{NAME}\t%{REQUIRENAME}\n]' *.rpm |fgrep 'PartialHardlinkSets' freeciv-common rpmlib(PartialHardlinkSets) $ This probably means that freeciv-common has hardlinks with different %lang attributes (which probably was supposed to be an error). So the whole issue should be reconsidered. I leave XXX marks in the code and suggest a new PartialHardlinkSets implementation (however, the dependency is not being added yet).	2011-02-05 03:49:54 +03:00
Alexey Tourbin	2d3c3cef27	removed ancient dependency loop whiteout mechanism (rpm.org)	2011-01-23 02:30:59 +03:00
Alexey Tourbin	42b139d1eb	removed --fileid query selector and Filemd5s rpmdb index (rpm.org)	2011-01-22 17:35:13 +03:00
Alexey Tourbin	fad9df878b	set.c: tweak LRU first-time insertion policy Pushing new elements to the front tends to assign extra weight to those elements, at the expense of other elements that are already in the cache. The idea is then to try first-time insertion somewhere in the middle. Further attempts suggest that the "pivot" should be closer to the end. Cache performance for "apt-shell <<<unmet", previous commit: hit=62375 miss=17252 Cache performance for "apt-shell <<<unmet", this commit: hit=65085 miss=14542	2011-01-07 06:45:38 +03:00
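
The first-time insertion policy described in fad9df878b can be illustrated with a minimal sketch. This is not the code from set.c: the struct, the function name lru_access, and the CACHE_SIZE and PIVOT values are all hypothetical and chosen only to keep the example small. It assumes a flat array ordered from most to least recently used, where a hit moves the entry to the front and a miss places the new entry at a pivot slot near the end rather than at the front.

```c
#include <assert.h>
#include <string.h>

#define CACHE_SIZE 8                /* hypothetical size, not taken from set.c */
#define PIVOT      (CACHE_SIZE - 2) /* hypothetical first-time insertion slot */

struct lru {
    int n;                /* number of occupied slots */
    int key[CACHE_SIZE];  /* key[0] is the most recently used entry */
};

/* Access a key; returns 1 on a cache hit and 0 on a miss. */
static int lru_access(struct lru *c, int key)
{
    int i, pos;

    assert(c->n >= 0 && c->n <= CACHE_SIZE);

    for (i = 0; i < c->n; i++)
        if (c->key[i] == key)
            break;

    if (i < c->n) {
        /* Hit: shift entries [0, i) down by one and move the key to the front. */
        memmove(c->key + 1, c->key, i * sizeof c->key[0]);
        c->key[0] = key;
        return 1;
    }

    /* Miss: grow if there is room; otherwise the last entry is overwritten. */
    if (c->n < CACHE_SIZE)
        c->n++;

    /* Insert at the pivot, or at the end while the cache is still short. */
    pos = c->n - 1 < PIVOT ? c->n - 1 : PIVOT;
    memmove(c->key + pos + 1, c->key + pos,
            (c->n - 1 - pos) * sizeof c->key[0]);
    c->key[pos] = key;
    return 0;
}
```

With this layout a new entry has to be hit at least once before it reaches the front, so a burst of one-off lookups only displaces entries at or behind the pivot. Where the pivot should sit is a tuning question; the commit message reports only that a position closer to the end worked better for its workload.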

322 Commits