- Build selinux support in dynamically linked objects only.
- %configure: export the -m* part of %optflags as ASFLAGS (for the assembler)
along with the other *FLAGS exported for compilers.
Some %optflags options, -m* in particular, have to be passed to the
assembler so that its output is consistent with the output produced by
the compilers.
In modern rpms, both %patch -F <N> and %patch -F<N> are valid option
calls, whereas the old -F implementation supported only the -F <N> syntax.
This patch adds support for the %patch -F<N> syntax (e.g. %patch1 -F2
becomes equivalent to %patch1 -F 2).
- debugedit doesn't support STABS, but there are some crazy cases,
like the PPC Linux kernel, which contain both STABS and DWARF debuginfo
sections, manually added. A better fix would be to error out
if we didn't find any usable debuginfo and to warn otherwise, but
this at least lets folks get their kernels built.
The previous "silently ignore" policy produces bogus debuginfo packages
on some architectures and fails with other mysterious errors on others;
better to just fail hard until (if ever) somebody adds STABS support.
* build/rpmspec.h (OpenFileInfo): Change readBuf to a pointer,
add readBufSize.
(freeOpenFileInfo): New prototype.
* build/spec.c (freeSpec): Initialize readBuf and readBufSize.
(freeOpenFileInfo): New function.
* build/parseSpec.c (readLine): Use getline and freeOpenFileInfo.
(closeSpec): Use freeOpenFileInfo.
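Not the actual rpm code - just a minimal sketch of the scheme the entry
above describes, assuming a simplified OpenFileInfo; only the
readBuf/readBufSize fields and the freeOpenFileInfo name come from the
entry, the rest is hypothetical:
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Simplified stand-in for OpenFileInfo: readBuf is now a pointer
 * managed by getline(3) instead of a fixed-size array, with
 * readBufSize tracking its capacity. */
struct OpenFileInfo {
    FILE *fd;
    char *readBuf;
    size_t readBufSize;
};

/* readLine core: getline grows readBuf as needed and reuses it
 * across calls; returns the line or NULL on EOF/error. */
static char *readLineSketch(struct OpenFileInfo *ofi)
{
    ssize_t n = getline(&ofi->readBuf, &ofi->readBufSize, ofi->fd);
    return n < 0 ? NULL : ofi->readBuf;
}

/* freeOpenFileInfo counterpart: release the buffer and reset the
 * fields so the structure can be reused. */
static void freeOpenFileInfoSketch(struct OpenFileInfo *ofi)
{
    free(ofi->readBuf);
    ofi->readBuf = NULL;
    ofi->readBufSize = 0;
}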
- set.c: Reimplemented base62+golomb decoder using Knuth's coroutines.
- set.c: Increased cache size from 160 to 256 slots, 75% hit ratio.
- set.c: Implemented 4-byte and 8-byte steppers for rpmsetcmp main loop.
Provides versions are, on average, about 34 times longer than Requires
versions. More precisely, if we consider all rpmsetcmp calls for the
"apt-shell <<<unmet" command, then sum(c1)/sum(c2)=33.88. This means
that we can save some time and instructions by skipping intermediate
bytes - in other words, by stepping a few bytes at a time. Of course,
after all the bytes are skipped, we must recheck a few final bytes and
possibly step back. Also, this requires more than one sentinel for
proper boundary checking.
This change implements two such "steppers": a 4-byte stepper for c1/c2
ratios below 16, and an 8-byte stepper which is used otherwise. When
stepping back, both steppers use bisecting (see the sketch below). Note
that replacing the last two bisecting steps with a simple loop might
actually be more efficient with respect to branch prediction and the
CPU's BTB. It is very hard to measure any user-time improvement, though,
even in a series of 100 runs. The improvement is next to none, at least
on older AMD CPUs, and so I choose to keep bisecting.
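Here is a minimal sketch of the 4-byte stepper idea (not the actual
set.c code): p advances through a sorted array in strides of 4 while
the whole stride is below val, then two bisecting steps locate the
first element >= val. The caller is assumed to place at least 4
sentinel values, no smaller than any lookup key, past the end of the
array so that the skipping loop cannot run off it.
/* 4-byte stepper sketch: skip 4 values at a time, bisect back. */
static inline const unsigned *step4(const unsigned *p, unsigned val)
{
    while (p[3] < val)          /* stride entirely below val: skip it */
        p += 4;
    /* two bisecting steps: first element >= val among p[0..3] */
    if (p[1] >= val)
        return p[0] >= val ? p : p + 1;
    return p[2] >= val ? p + 2 : p + 3;
}
The 8-byte stepper would be analogous, with one more bisecting level.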
callgrind annotations for "apt-shell <<<unmet", previous commit:
2,279,520,414 PROGRAM TOTALS
646,107,201 lib/set.c:decode_base62_golomb
502,438,804 lib/set.c:rpmsetcmp
98,243,148 sysdeps/x86_64/memcmp.S:bcmp
93,038,752 sysdeps/x86_64/strcmp.S:__GI_strcmp
callgrind annotations for "apt-shell <<<unmet", this commit:
2,000,254,692 PROGRAM TOTALS
642,039,009 lib/set.c:decode_base62_golomb
227,036,590 lib/set.c:rpmsetcmp
98,247,798 sysdeps/x86_64/memcmp.S:bcmp
93,047,422 sysdeps/x86_64/strcmp.S:__GI_strcmp
Hit ratio for "apt-shell <<<unmet" command:
160 slots: hit=46813 miss=22862 67.2%
256 slots: hit=52238 miss=17437 75.0%
So, we've increased the cache size by a factor of 256/160=1.6 or by 60%,
and the number of misses has decreased by a factor of 22862/17437=1.31
or by 1-17437/22862=23.7%. This is not so bad, but it looks like we're
paying more for less. The following analysis shows that this is not
quite true, since the real memory usage has increased by a somewhat
smaller factor.
160 slots, callgrind annotations:
2,406,630,571 PROGRAM TOTALS
795,320,289 lib/set.c:decode_base62_golomb
496,682,547 lib/set.c:rpmsetcmp
93,466,677 sysdeps/x86_64/strcmp.S:__GI_strcmp
91,323,900 sysdeps/x86_64/memcmp.S:bcmp
90,314,290 stdlib/msort.c:msort_with_tmp'2
83,003,684 sysdeps/x86_64/strlen.S:__GI_strlen
58,300,129 sysdeps/x86_64/memcpy.S:memcpy
...
inclusive:
1,458,467,003 lib/set.c:rpmsetcmp
256 slots, callgrind annotations:
2,246,961,708 PROGRAM TOTALS
634,410,352 lib/set.c:decode_base62_golomb
492,003,532 lib/set.c:rpmsetcmp
95,643,612 sysdeps/x86_64/memcmp.S:bcmp
93,467,414 sysdeps/x86_64/strcmp.S:__GI_strcmp
90,314,290 stdlib/msort.c:msort_with_tmp'2
79,217,962 sysdeps/x86_64/strlen.S:__GI_strlen
56,509,877 sysdeps/x86_64/memcpy.S:memcpy
...
inclusive:
1,298,977,925 lib/set.c:rpmsetcmp
So the decoding routine now takes about 20% fewer instructions, and the
inclusive rpmsetcmp cost is reduced by about 11%. Note, however, that
bcmp is now the third most expensive routine (due to the higher hit
ratio). Since recent glibc versions provide optimized memcmp
implementations, I expect that the total/inclusive improvement can be
somewhat better than 11%.
As for memory usage, the question "how much does the cache take" cannot
generally be answered with a single number. However, if we simply sum
the sizes of all malloc'd chunks on each rpmsetcmp invocation, using the
following piece of code (plus a few obvious modifications elsewhere), we
can obtain the following statistics.
if (hc == CACHE_SIZE) {
    /* sum the malloc'd size of every cached entry */
    int total = 0;
    for (i = 0; i < hc; i++)
        total += ev[i]->msize;
    printf("total %d\n", total);
}
160 slots, memory usage:
min=1178583
max=2048701
avg=1330104
dev=94747
q25=1266647
q50=1310287
q75=1369005
256 slots, memory usage:
min=1670029
max=2674909
avg=1895076
dev=122062
q25=1828928
q50=1868214
q75=1916025
This indicates that the average cache size has increased by about 42%,
from 1.27M to 1.81M; however, the third quartile has increased by about
40%, and the maximum size has increased by only about 31%, from 1.95M to
2.55M. From this I conclude that an extra 600K must be available even on
low-memory machines like the Raspberry Pi (256M RAM).
* * *
What's a good hit ratio?
$ DepNames() { pkglist-query '[%{RequireName}\t%{RequireVersion}\n]' \
/var/lib/apt/lists/_ALT_Sisyphus_x86%5f64_base_pkglist.classic |
fgrep set: |cut -f1; }
$ DepNames |wc -l
34763
$ DepNames |sort -u |wc -l
2429
$ DepNames |sort |uniq -c |sort -n |awk '$1>1{print$1}' |Sum
33924
$ DepNames |sort |uniq -c |sort -n |awk '$1>1{print$1}' |wc -l
1590
$ DepNames |sort |uniq -c |sort -n |tail -256 |Sum
27079
$
We have 34763 set-versioned dependencies, which refer to 2429 sonames;
however, only 33924 dependencies refer to a soname which is referenced
more than once (there are 1590 such sonames), and the first reference to
each soname is always a miss. Thus the best possible hit ratio (if we
use at least 1590 slots) is (33924-1590)/34763=93.0%.
What happens if we use only 256 slots? Assuming that dependencies are
processed in random order, the best strategy must spend its cache slots
on the sonames with the most references. The top 256 sonames account for
27079 references, so we can serve 27079-256 dependencies via cache hits,
and the best possible hit ratio for 256 slots is
(27079-256)/34763=77.2%.
In sort -R output, identical lines stick together. The manpage says that
-R sorts by a random hash of the keys, which means that the random hash
function, when applied to identical keys, yields the same hash value.
What we need instead is a random permutation of the input lines (which
is what shuf(1) produces).
I am going to consider whether it is worthwhile to increase the cache
size. Thus I have to ensure that the linear search won't be an obstacle
to doing so. In particular, its loop must be efficient in terms of both
CPU instructions and memory access patterns.
1) With regard to memory access patterns, this change introduces two
separate arrays: hv[] with the hash values and ev[] with the actual
cache entries. On x86-64, this saves 4 bytes per entry which were
previously wasted on aligning the cache_hdr structures. This has some
benefits on i686 as well: for example, ev[] is not accessed at all on a
cache miss.
2) As for instructions, the loop has two branches: the first is for
boundary checking, and the second is for matching the hash condition.
Since the boundary checking condition (cur->ent != NULL) relies on a
sentinel, the loop cannot be unrolled; it takes 6 instructions per
iteration. If we replace the condition with an explicit boundary check
(hp < hv + hc), the number of iterations becomes known upon entry to the
loop, and gcc will unroll it; it now takes 3 instructions per iteration,
plus some (smaller) overhead for boundary checking. Both loop shapes are
sketched below.
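For illustration, here are the two loop shapes side by side; the loop
bodies follow the callgrind annotations further down, while the
surrounding declarations are assumptions made for the sake of a
self-contained sketch.
#define CACHE_SIZE 256
struct cache_ent;                           /* opaque cache entry */
struct cache_hdr { unsigned hash; struct cache_ent *ent; };
static struct cache_hdr cache[CACHE_SIZE + 1];  /* NULL-ent sentinel */
static unsigned hv[CACHE_SIZE];             /* hash values, contiguous */
static struct cache_ent *ev[CACHE_SIZE];    /* entries, parallel to hv */
static int hc;                              /* current number of entries */

/* Before: sentinel-based termination; the trip count is unknown on
 * entry, so gcc cannot unroll the loop (6 instructions/iteration). */
static struct cache_ent *lookup_sentinel(unsigned hash)
{
    const struct cache_hdr *cur;
    for (cur = cache; cur->ent; cur++)
        if (hash == cur->hash)
            return cur->ent;
    return 0;
}

/* After: explicit boundary check; the trip count hc is known on entry,
 * and gcc unrolls the loop (3 instructions/iteration). */
static struct cache_ent *lookup_bounded(unsigned hash)
{
    const unsigned *hp;
    for (hp = hv; hp < hv + hc; hp++)
        if (hash == *hp)
            return ev[hp - hv];
    return 0;
}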
This change also removes the __thread specifiers, since gcc is
apparently not very good at optimizing away superfluous __tls_get_addr
calls. Also, if we are to consider larger cache sizes, it becomes
questionable whether each thread should possess its own cache merely as
a means of achieving thread safety. Anyway, I am currently not aware of
any threaded applications which make concurrent librpm calls.
callgrind annotations for "apt-shell <<<unmet", previous commit:
2,437,446,116 PROGRAM TOTALS
820,835,411 lib/set.c:decode_base62_golomb
510,957,897 lib/set.c:rpmsetcmp
...
23,671,760 for (cur = cache; cur->ent; cur++) {
1,114,800 => /usr/src/debug/glibc-2.11.3-alt7/elf/dl-tls.c:__tls_get_addr (69675x)
11,685,644 if (hash == cur->hash) {
. ent = cur->ent;
callgrind annotations for "apt-shell <<<unmet", this commit:
2,431,849,572 PROGRAM TOTALS
820,835,411 lib/set.c:decode_base62_golomb
496,682,547 lib/set.c:rpmsetcmp
...
10,204,175 for (hp = hv; hp < hv + hc; hp++) {
11,685,644 if (hash == *hp) {
189,344 i = hp - hv;
189,344 ent = ev[i];
The total improvement is not very impressive (6M instead of the expected
14M), mostly due to memmove complications: hv[] cannot be shifted
efficiently using 8-byte words (see the sketch below). However, the code
now scales better. Also, recent glibc versions supposedly provide a much
improved memmove implementation.
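For reference, here is a sketch of the kind of shift the memmove remark
refers to; the move-to-front-on-hit policy shown is an assumption, not
necessarily the actual set.c policy. Since hv[] entries are 4 bytes and
the shift is by one slot, the copy cannot be done with aligned 8-byte
words.
#include <string.h>

/* On a hit at index i, promote the entry to the front; hv[0..i-1] and
 * ev[0..i-1] shift up by one slot each. */
static void promote(unsigned hv[], void *ev[], int i)
{
    unsigned hash = hv[i];
    void *ent = ev[i];
    memmove(hv + 1, hv, i * sizeof hv[0]);  /* 4-byte elements */
    memmove(ev + 1, ev, i * sizeof ev[0]);  /* pointer-sized elements */
    hv[0] = hash;
    ev[0] = ent;
}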
Since the combined base62+golomb decoder is still the most expensive
routine, I have to consider very clever tricks to give it a boost.
In the routine, the "master" logic is executed on behalf of the base62
decoder: it makes bits from the string and passes them on to the "slave"
golomb routine. The slave routine has to maintain its own state (doing
q or doing r); after the bits are processed, it returns, and base62 takes
over. When the slave routine is invoked again, it has to recover the
state and take the right path (q or r). These seemingly cheap state
transitions can actually become relatively expensive, since the "if"
clause involves branch prediction, which is not particularly accurate on
variable-length inputs. This change demonstrates that it is possible to
get rid of the state-related instructions altogether.
Roughly, the idea is that, instead of calling putNbits(), we can invoke
"goto *putNbits", and the pointer will dispatch to either the putNbitsQ
or the putNbitsR label (we can do this with gcc's computed gotos).
However, the goto will not return, so the "putbits" guys will have to
invoke "goto getbits", and so on. So it gets very similar to coroutines
as described in [Knuth 1997, vol. 1, p. 194]. Furthermore, one must
realize that computed gotos are not actually required: since the total
number of states is relatively small - roughly (q^r)x(reg^esc,align) -
it is possible to instantiate a few similar coroutines which pass
control directly to the right labels.
For example, decoding starts with the "get24q" coroutine - that is,
we're in the "Q" mode and we try to grab 24 bits (for the sake of the
example, I do not consider the initial align step). If 24 bits are
obtained successfully, they are passed down to the "put24q" coroutine
which, as its name suggests, takes over in the "Q" mode immediately;
furthermore, in the "put24q" coroutine, the next call to get bits has to
be either "get24q" or "get24r" (depending on whether Q or R is being
processed when no bits are left) - that is, the coroutine itself must
"know" that there are no base62 complications at this point. The
"get24r" coroutine is similar to "get24q" except that it will invoke
"put24r" instead of "put24q". On the other hand, consider that, in the
beginning, only 12 bits have been directly decoded (and the next 12 bits
probably involve "Z"). We then pass control to "put12q", which will in
turn call either "get12q" or "get12r" to handle irregular cases for the
pending 12 bits (admittedly, the names "get12q" and "get12r" are a bit
of a misnomer). A toy model of this control structure is sketched below.
This change also removes another branch in the golomb R->Q transition:
    r &= (1 << Mshift) - 1;
    *v++ = (q << Mshift) | r;
    q = 0;
    state = ST_VLEN;
-   if (left == 0)
-       return;
    bits >>= n - left;
    n = left;
vlen:
    if (bits == 0) {
        q += n;
        return;
    }
    int vbits = __builtin_ffs(bits);
    ...
This first "left no bits" check is now removed and performed implicitly
by the latter "no need for bsf" check, with the result being far better
than I expected. Perhaps it helps to understand that the condition
"left exactly 0" rarely holds, but CPU is stuck by the check.
So, the Q and R processing steps each now have exactly one branch (that
is, exactly one condition which completes the step). Also, in the "put"
coroutines, I simply make a sequence of Q and R steps; this produces
a clean sequence of instructions which branches only when absolutely
necessary.
callgrind annotations for "apt-cache <<<unmet", previous commit:
2,671,717,564 PROGRAM TOTALS
1,059,874,219 lib/set.c:decode_base62_golomb
509,531,239 lib/set.c:rpmsetcmp
callgrind annotations for "apt-cache <<<unmet", this commit:
2,426,092,837 PROGRAM TOTALS
812,534,481 lib/set.c:decode_base62_golomb
509,531,239 lib/set.c:rpmsetcmp
- set.c: Fixed bad sentinel due to off-by-one error in alt100.28.
- set.c: Improved linear cache search by using contiguous memory block.
- set.c: Improved decoding by combining and processing 24 bits at a time.
- set.c: Reimplemented downsampling using merges instead of full qsort(3).
- cpp.req: Implemented a global/hierarchical mode in which subordinate
files are processed implicitly, resulting in fewer failures and a major
speed-up.
- cpp.req: Recovered missing refs lost to cpp's "once-only header"
optimization.