We don't include source4/selftest/provisions/ in source tarballs!
Signed-off-by: Andreas Schneider <asn@samba.org>
Reviewed-by: Volker Lendecke <vl@samba.org>
Autobuild-User(master): Volker Lendecke <vl@samba.org>
Autobuild-Date(master): Mon Dec 5 08:22:29 UTC 2022 on sn-devel-184
lib/compression/tests/test_lzx_huffman.c: In function ‘test_lzxpress_huffman_overlong_matches’:
lib/compression/tests/test_lzx_huffman.c:1013:35: error: ‘j’ may be used uninitialized [-Werror=maybe-uninitialized]
1013 | assert_int_equal(score, i * j);
| ^
lib/compression/tests/test_lzx_huffman.c:979:19: note: ‘j’ was declared here
979 | size_t i, j;
| ^
lib/compression/tests/test_lzx_huffman.c: In function ‘test_lzxpress_huffman_overlong_matches_abc’:
lib/compression/tests/test_lzx_huffman.c:1059:39: error: ‘k’ may be used uninitialized [-Werror=maybe-uninitialized]
1059 | assert_int_equal(score, i * j * k);
| ^
lib/compression/tests/test_lzx_huffman.c:1020:22: note: ‘k’ was declared here
1020 | size_t i, j, k;
| ^
lib/compression/tests/test_lzx_huffman.c:1059:35: error: ‘j’ may be used uninitialized [-Werror=maybe-uninitialized]
1059 | assert_int_equal(score, i * j * k);
| ^
lib/compression/tests/test_lzx_huffman.c:1020:19: note: ‘j’ was declared here
1020 | size_t i, j, k;
| ^
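The shape of the fix (a hedged sketch, not the actual patch; the loop bounds and helper function here are made up for illustration) is to give the counters a defined value on every path to the assertion:

    #include <assert.h>
    #include <stddef.h>

    /*
     * Hedged sketch of the class of fix: if a loop may run zero
     * times, a counter read after the loop must be initialised at
     * declaration, otherwise GCC's -Wmaybe-uninitialized fires.
     */
    static size_t product_of_final_counters(size_t n, size_t m)
    {
            size_t i, j = 0;   /* j = 0 covers the case where the
                                * inner loop never runs */
            for (i = 0; i < n; i++) {
                    for (j = 0; j < m; j++) {
                            /* loop body elided in this sketch */
                    }
            }
            return i * j;
    }

    int main(void)
    {
            assert(product_of_final_counters(3, 4) == 12);
            return 0;
    }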
Signed-off-by: Andreas Schneider <asn@samba.org>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Autobuild-User(master): Andreas Schneider <asn@cryptomilk.org>
Autobuild-Date(master): Sun Dec 4 09:12:30 UTC 2022 on sn-devel-184
These functions are now only called from check_chown in posix_acls.c
Signed-off-by: Christof Schmitt <cs@samba.org>
Reviewed-by: Volker Lendecke <vl@samba.org>
The same assignment is already done earlier, and nothing is changed in
between.
Signed-off-by: Christof Schmitt <cs@samba.org>
Reviewed-by: Volker Lendecke <vl@samba.org>
This uses the same hash table method as lzxpress_huffman, though the
code can't be directly reused as the sizes of the offsets are
different, and there is no block processing step here.
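As a rough illustration of the method (a hedged sketch, not the
committed code; the hash function here is a placeholder, and the
14-bit table size comes from the hash commit later in this log):
the table remembers where each 3-byte sequence was last seen, and a
candidate match is verified by comparing actual bytes, since hashes
collide.

    #include <stdint.h>
    #include <string.h>
    #include <sys/types.h>

    #define HASH_BITS 14   /* illustrative table size */

    /* placeholder hash over 3 bytes, not Samba's h() */
    static uint16_t hash3(const uint8_t *p)
    {
            uint32_t h = p[0] ^ (p[1] << 5) ^ (p[2] << 10);
            return (h ^ (h >> 7)) & ((1 << HASH_BITS) - 1);
    }

    /*
     * Remember where this position's 3-byte prefix was last seen;
     * return the previous position only if its bytes really match.
     */
    static ssize_t find_match(const uint8_t *data, size_t pos,
                              ssize_t table[1 << HASH_BITS])
    {
            uint16_t h = hash3(data + pos);
            ssize_t prev = table[h];

            table[h] = pos;
            if (prev >= 0 && memcmp(data + prev, data + pos, 3) == 0) {
                    return prev;   /* a genuine match to extend */
            }
            return -1;
    }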
This will worsen the compression ratio compared to the exhaustive
search we previously used, though we still perform better than
Windows. To put numbers on it, the test files used to compress to 0.91
of Windows' compression size, and now they compress to 0.96.
On the other hand this is many orders of magnitude faster. It is
difficult to say exactly how much faster -- while the testsuite time
has only improved 200-fold (from 7 minutes to 2 seconds), most of the
remaining 2 seconds is used in data generation and management, not
compression. OSSFuzz consistently finds new vectors that time out
after a minute; on these we'll see an improvement approaching ten
orders of magnitude.
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Autobuild-User(master): Joseph Sutton <jsutton@samba.org>
Autobuild-Date(master): Fri Dec 2 00:00:04 UTC 2022 on sn-devel-184
This makes it easier to rework the encoding decision to depend on a
hash table match rather than the current exhaustive search.
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
This will make it possible to move encoding operations into helper
functions, which will make it easier to restructure the code to use a
hash table for faster matching.
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
These are based on (i.e. copied and pasted from) the LZ77 + Huffman
tests.
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Everything that is in testdata/compression/lzxpress-huffman/ can also
be used for lzxpress plain tests, which is something we really need.
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
We are going to change from a slow exact match algorithm to a fast
heuristic search that will not always get the same results as the
exhaustive search.
To be precise, a million zeros will compress to 112 rather than 93 bytes.
We don't insist on an exact size, because that is not an issue here.
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Mainly so I can go

    make bin/test_lzxpress_plain && bin/test_lzxpress_plain
    valgrind bin/test_lzxpress_plain
    rr bin/test_lzxpress_plain
    rr replay

in a tight loop.
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
st/summary is useless: if you find anything, it'll be in st/subunit.
However, in case *something* useful ever ends up there, we still
mention it.
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
If compiled on Windows using Cygwin, MSYS2, or similar, this will output
compressed versions of files exactly as specified by MS-XCA, if the
following conditions are met:
1. The file is larger than 300 bytes.
2. The compressed file is smaller than the decompressed file.
Otherwise it returns the data unchanged, without warning; that's just
how the API works.
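In outline, the decision amounts to something like this (a hedged
sketch of the rule, not the tool's actual code):

    #include <stdbool.h>
    #include <stddef.h>
    #include <sys/types.h>

    /*
     * Hedged sketch: emit the compressed form only when both
     * conditions hold; otherwise the input is passed through
     * unchanged, with no indication of which case occurred.
     */
    static bool use_compressed(size_t original_len,
                               ssize_t compressed_len)
    {
            return original_len > 300 &&
                   compressed_len > 0 &&
                   (size_t)compressed_len < original_len;
    }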
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Compression uses a 3-byte hash to remember LZ77 matches in a 14-bit table.
This script runs the hash over all 16M combinations, then again over
all ASCII combinations, counting collisions to find hot-spots.
If you think you have a better hash, you are probably right, but you
should try it here -- alter h() -- before committing to it. This one is
literally the first one I thought of.
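A minimal sketch of that kind of collision count (in C here for
illustration; h() below is a placeholder for the hash under test, not
the committed one):

    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    #define HASH_BITS 14
    #define TABLE_SIZE (1 << HASH_BITS)

    /* placeholder for the hash under test; alter this before
     * committing to a new hash, as the commit message suggests */
    static uint16_t h(uint8_t a, uint8_t b, uint8_t c)
    {
            uint32_t x = a ^ (b << 5) ^ (c << 10);
            return (x ^ (x >> 7)) & (TABLE_SIZE - 1);
    }

    int main(void)
    {
            static uint32_t counts[TABLE_SIZE];
            uint32_t max = 0;

            /* run the hash over all 16M 3-byte combinations */
            for (uint32_t i = 0; i < (1 << 24); i++) {
                    counts[h(i & 0xff, (i >> 8) & 0xff, i >> 16)]++;
            }
            for (size_t j = 0; j < TABLE_SIZE; j++) {
                    if (counts[j] > max) {
                            max = counts[j];
                    }
            }
            /* a perfectly uniform hash would put 1024 in each bucket */
            printf("hottest bucket: %u hits\n", max);
            return 0;
    }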
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Huffman tree re-quantisation and perhaps other code paths are only
triggered by pathological data like this.
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
This compresses some data, decompresses it, and asserts that the
result is identical to the original string.
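In outline, such a target might look like this (a hedged sketch; the
decompress signature is the one documented later in this log, while
the compress-side name and the header path are assumptions):

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>
    #include <sys/types.h>
    #include <talloc.h>

    #include "lib/compression/lzxpress_huffman.h" /* path assumed */

    int LLVMFuzzerTestOneInput(const uint8_t *buf, size_t len)
    {
            TALLOC_CTX *mem_ctx = talloc_new(NULL);
            uint8_t *compressed = NULL;
            uint8_t *decompressed = NULL;
            ssize_t clen;

            /* compress-side name assumed, by analogy with the
             * documented decompress_talloc function */
            clen = lzxpress_huffman_compress_talloc(mem_ctx, buf, len,
                                                    &compressed);
            assert(clen >= 0);

            /* the decompressor must be told the exact original size */
            decompressed = lzxpress_huffman_decompress_talloc(
                    mem_ctx, compressed, clen, len);
            assert(decompressed != NULL);
            assert(memcmp(buf, decompressed, len) == 0);

            talloc_free(mem_ctx);
            return 0;
    }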
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
This differs from fuzz_lzxpress_huffman_round_trip (next commit) in
that the output buffer might be too small for the compressed data, in
which case we want to see an error and not a crash.
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Most strings will not successfully decompress, which is OK. What we
care about, of course, is memory safety.
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
With LZXHUFF_DEBUG_VERBOSE set, we measure the compression and
decompression rate relative to the decompressed size.
On reasonably long strings on my laptop, compiled with -O0, it turns
out to be between 20 and 500 MB/s, both ways, depending on the complexity
of the string. Very short strings are of course dominated by overhead.
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
If you need to see a Huffman tree (and sometimes you do), set
DEBUG_HUFFMAN_TREE to true at the top of lzxpress_huffman.c, and run:

    make bin/test_lzx_huffman && bin/test_lzx_huffman

Actually, that will show you hundreds of trees, and you'll be glad of
that if you are ever trying to understand this.
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Encoding without LZ77 matches is valid, and it is useful for isolating
bugs.
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
This compresses files as described in MS-XCA 2.2, and as decompressed
by the decompressor in the previous commit.
As with the decompressor, there are two public functions -- one that
uses a talloc context, and one that uses pre-allocated memory. The
compressor requires a tightly bound amount of auxiliary memory
(>220kB) in a few different buffers, which is all gathered together in
the public struct lzxhuff_compressor_mem. An instantiated but not
initialised copy of this struct is required by the non-talloc
function; it can be used over and over again.
Our compression speed is about the same as the decompression speed
(between 20 and 500 MB/s on this laptop, depending on the data), and
our compression ratio is very similar to that of Windows.
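A hedged sketch of driving the pre-allocated-memory variant (the
struct name comes from this commit; the function name, argument order,
and header path are assumptions):

    #include <stdint.h>
    #include <stdio.h>
    #include <sys/types.h>

    #include "lib/compression/lzxpress_huffman.h" /* path assumed */

    int main(void)
    {
            /* instantiated but not initialised, per the commit
             * message; at >220kB it is better off static than on
             * the stack, and can be reused across calls */
            static struct lzxhuff_compressor_mem cmp_mem;

            const uint8_t input[] = "hello hello hello hello hello";
            uint8_t output[1024];
            ssize_t clen;

            /* function name and signature assumed for illustration */
            clen = lzxpress_huffman_compress(&cmp_mem,
                                             input, sizeof(input),
                                             output, sizeof(output));
            if (clen < 0) {
                    fprintf(stderr, "compression failed\n");
                    return 1;
            }
            printf("compressed %zu bytes to %zd\n",
                   sizeof(input), clen);
            return 0;
    }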
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
This format is described in [MS-XCA] 2.1 and 2.2, with exegesis in
many posts on the cifs-protocol list[1].
The two public functions are:

    ssize_t lzxpress_huffman_decompress(const uint8_t *input,
                                        size_t input_size,
                                        uint8_t *output,
                                        size_t output_size);

    uint8_t *lzxpress_huffman_decompress_talloc(TALLOC_CTX *mem_ctx,
                                                const uint8_t *input_bytes,
                                                size_t input_size,
                                                size_t output_size);

In both cases the caller needs to know the *exact* decompressed size,
which is essential for decompression. The _talloc version allocates
the buffer for you, and uses the talloc context to allocate a 128k
working buffer. The non-talloc function will allocate the working
buffer on the stack.
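For example (a hedged usage sketch; the header path is assumed, and
original_size stands for whatever out-of-band mechanism conveyed the
exact decompressed size):

    #include <stdint.h>
    #include <stdio.h>
    #include <talloc.h>

    #include "lib/compression/lzxpress_huffman.h" /* path assumed */

    /*
     * Decompress a blob whose original size is known out of band,
     * as MS-XCA requires; returns NULL on failure.
     */
    static uint8_t *decompress_blob(TALLOC_CTX *mem_ctx,
                                    const uint8_t *blob,
                                    size_t blob_len,
                                    size_t original_size)
    {
            uint8_t *out = lzxpress_huffman_decompress_talloc(
                    mem_ctx, blob, blob_len, original_size);
            if (out == NULL) {
                    fprintf(stderr, "decompression failed\n");
            }
            return out;
    }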
This compression format gives better compression for messages of
several kilobytes than the "plain" LZXPRESS compression, but is
probably a bit slower to decompress and is certainly worse for very
short messages, having a fixed 256 byte overhead for the first Huffman
table.
Experiments show decompression rates between 20 and 500 MB per second,
depending on the compression ratio and data size, on an i5-1135G7 with
no compiler optimisations.
This compression format is used in AD claims and in SMB, but that
doesn't happen with this commit.
I will not try to describe LZ77 or Huffman encoding here. Don't expect
an answer in MS-XCA either; instead read the code and/or Wikipedia.
[1] Much of that starts here:
https://lists.samba.org/archive/cifs-protocol/2022-October/
but there's more earlier, particularly in June/July 2020, when
Aurélien Aptel was working on an implementation that ended up in
Wireshark.
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Pair-programmed-with: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Some of the decompressed files were found via fuzzing, some are public
domain texts, and some are designed to test one aspect or another of
the format. For example, some aspects of Huffman tree creation can
only be tested when there is an extreme imbalance in the frequency of
symbols.
See the README for what files are where.
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
We are going to have all kinds of rubbish there.
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
We are going to add more tests for lib/compression, and they can't all
be called "testsuite.c".
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Sometimes (e.g. in lzxpress Huffman encoding, and in some of our
tests; cf. https://lists.samba.org/archive/samba-technical/2018-March/126010.html)
we want a stable sort algorithm (meaning one that retains the previous
order of items that compare equal).
The GNU libc qsort() is *usually* stable, in that it first tries to
use a mergesort but reverts to quicksort if the necessary allocations
fail. That has led Samba developers to unthinkingly assume qsort() is
stable, which is not the case on many platforms, and might not always
be on GNU/Linuxes either.
This adds four functions. stable_sort() sorts an array, and requires
an auxiliary working array of the same size. stable_sort_talloc()
takes a talloc context so it can create a working array and call
stable_sort(). stable_sort_r() takes an opaque context blob that gets
passed to the compare function, like qsort_r() and ldb_qsort(). And
stable_sort_talloc_r() rounds out the quartet.
These are LGPL so that they can be used in ldb, which has problems with
unstable sort.
The tests are borrowed and extended from test_ldb_qsort.c.
When sorting non-trivial structs this is roughly as fast as GNU qsort,
but GNU qsort has optimisations for small items, using direct
assignment rather than memcpy where the size allows the item to be
cast as some kind of int.
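A hedged sketch of calling code (the header path and exact signature
are assumptions based on the description above):

    #include <stdio.h>

    #include "lib/util/stable_sort.h" /* path assumed */

    struct record {
            int key;
            int original_index;
    };

    /* comparator that looks only at key, so records with equal
     * keys must keep their original relative order -- the
     * property plain qsort() does not guarantee */
    static int cmp_key(const void *a, const void *b)
    {
            const struct record *ra = a;
            const struct record *rb = b;
            return (ra->key > rb->key) - (ra->key < rb->key);
    }

    int main(void)
    {
            struct record recs[] = {
                    {2, 0}, {1, 1}, {2, 2}, {1, 3},
            };
            struct record aux[4];  /* working array of the same size */

            /* signature assumed: array, aux, count, item size, cmp */
            stable_sort(recs, aux, 4, sizeof(struct record), cmp_key);

            /* expect 1:1, 1:3, 2:0, 2:2 -- equal keys in input order */
            for (int i = 0; i < 4; i++) {
                    printf("%d:%d\n", recs[i].key,
                           recs[i].original_index);
            }
            return 0;
    }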
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Remove knownfail.
Signed-off-by: Jeremy Allison <jra@samba.org>
Reviewed-by: Ralph Boehme <slow@samba.org>
Autobuild-User(master): Ralph Böhme <slow@samba.org>
Autobuild-Date(master): Thu Dec 1 16:04:07 UTC 2022 on sn-devel-184
The protocol allows the last read in a related compound to be split
off and possibly go async (and smbd soon will do this). If the
last read is split off, then the padding is different. By sending
3 reads and checking the padding on the 2nd read, we cope with
the smbd change and are still correctly checking the padding
on a compound related read.
Do this for the stream filename compound padding test.
Signed-off-by: Jeremy Allison <jra@samba.org>
Reviewed-by: Ralph Boehme <slow@samba.org>
The protocol allows the last read in a related compound to be split
off and possibly go async (and smbd soon will do this). If the
last read is split off, then the padding is different. By sending
3 reads and checking the padding on the 2nd read, we cope with
the smbd change and are still correctly checking the padding
on a compound related read.
Do this for the base filename compound padding test.
Signed-off-by: Jeremy Allison <jra@samba.org>
Reviewed-by: Ralph Boehme <slow@samba.org>
It doesn't hurt the fsync compound async tests, and we need this for
the next commits to ensure smb2_read/smb2_write compound tests take
longer than 500ms, so we can be sure the last read/write in the compound
will go async.
Signed-off-by: Jeremy Allison <jra@samba.org>
Reviewed-by: Ralph Boehme <slow@samba.org>
The timer for the timeout_cb() handler was created on a memory context
which doesn't get freed, so the timer was still valid when running
the next test and fired there. It was then writing into random memory,
leading to segfaults.
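The robust pattern (a hedged sketch using tevent's public API; the
test structure here is illustrative) is to parent the timer to a
context that dies with the test:

    #include <stdio.h>
    #include <talloc.h>
    #include <tevent.h>

    static void timeout_cb(struct tevent_context *ev,
                           struct tevent_timer *te,
                           struct timeval now,
                           void *private_data)
    {
            /* must only touch memory owned by the current test */
            int *fired = private_data;
            *fired = 1;
    }

    int main(void)
    {
            struct tevent_context *ev = tevent_context_init(NULL);
            TALLOC_CTX *test_ctx = talloc_new(ev); /* per-test context */
            int fired = 0;

            /* parenting the timer to test_ctx means freeing the
             * context also frees the timer, so it cannot fire
             * during a later test */
            tevent_add_timer(ev, test_ctx,
                             tevent_timeval_current_ofs(0, 1000),
                             timeout_cb, &fired);

            tevent_loop_once(ev);  /* runs the timer */
            printf("fired=%d\n", fired);

            talloc_free(test_ctx); /* cancels anything still pending */
            talloc_free(ev);
            return 0;
    }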
Signed-off-by: Andreas Schneider <asn@samba.org>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
Autobuild-User(master): Andreas Schneider <asn@cryptomilk.org>
Autobuild-Date(master): Thu Dec 1 15:03:19 UTC 2022 on sn-devel-184
When receiving a query for '..', hide the owner
and group sids, the inode, and the dev id.
Signed-off-by: David Mulder <dmulder@samba.org>
Reviewed-by: Volker Lendecke <vl@samba.org>