IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Compression uses a 3 byte hash remember LZ77 matches in a 14-bit table.
This script runs the hash over all 16M combinations, then again over
all ASCII combinations, counting collisions to find hot-spots.
If you think you have a better hash, you are probably right, but you
should try it here -- alter h() -- before committing to it. This one is
literally the first one I thought of.
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Huffman tree re-quantisation and perhaps other code paths are only
triggered by pathological data like this.
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
This compresses some data, decompresses it, and asserts that the
result is identical to the original string.
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
This differs from fuzz_lzxpress_huffman_round_trip (next commit) in
that the output buffer might be too small for the compressed data, in
which case we want to see an error and not a crash.
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Most strings will not successfully decompress, which is OK. What we
care about of course is memory safety.
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
With LZXHUFF_DEBUG_VERBOSE set, we measure the compression and
decompression rate relative to the decompressed size.
On reasonably long strings on my laptop, compiled with -O0, it turns
out to between 20 and 500 MB/s, both ways, depending on the complexity
of the string. Very short strings are of course dominated by overhead.
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
If you need to see a Huffman tree (and sometimes you do), set
DEBUG_HUFFMAN_TREE to true at the top of lzxpress_huffman.c, and run:
make bin/test_lzx_huffman && bin/test_lzx_huffman
Actually, that will show you hundreds of trees, and you'll be glad of
that if you are ever trying to understand this.
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Encoding without LZ77 matches is valid, and it is useful for isolating
bugs.
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
This compresses files as described in MS-XCA 2.2, and as decompressed
by the decompressor in the previous commit.
As with the decompressor, there are two public functions -- one that
uses a talloc context, and one that uses pre-allocated memory. The
compressor requires a tightly bound amount of auxillary memory
(>220kB) in a few different buffers, which is all gathered together in
the public struct lzxhuff_compressor_mem. An instantiated but not
initialised copy of this struct is required by the non-talloc
function; it can be used over and over again.
Our compression speed is about the same as the decompression speed
(between 20 and 500 MB/s on this laptop, depending on the data), and
our compression ratio is very similar to that of Windows.
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
This format is described in [MS-XCA] 2.1 and 2.2, with exegesis in
many posts on the cifs-protocol list[1].
The two public functions are:
ssize_t lzxpress_huffman_decompress(const uint8_t *input,
size_t input_size,
uint8_t *output,
size_t output_size);
uint8_t *lzxpress_huffman_decompress_talloc(TALLOC_CTX *mem_ctx,
const uint8_t *input_bytes,
size_t input_size,
size_t output_size);
In both cases the caller needs to know the *exact* decompressed size,
which is essential for decompression. The _talloc version allocates
the buffer for you, and uses the talloc context to allocate a 128k
working buffer. THe non-talloc function will allocate the working
buffer on the stack.
This compression format gives better compression for messages of
several kilobytes than the "plain" LXZPRESS compression, but is
probably a bit slower to decompress and is certainly worse for very
short messages, having a fixed 256 byte overhead for the first Huffman
table.
Experiments show decompression rates between 20 and 500 MB per second,
depending on the compression ratio and data size, on an i5-1135G7 with
no compiler optimisations.
This compression format is used in AD claims and in SMB, but that
doesn't happen with this commit.
I will not try to describe LZ77 or Huffman encoding here. Don't expect
an answer in MS-XCA either; instead read the code and/or Wikipedia.
[1] Much of that starts here:
https://lists.samba.org/archive/cifs-protocol/2022-October/
but there's more earlier, particularly in June/July 2020, when
Aurélien Aptel was working on an implementation that ended up in
Wireshark.
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Pair-programmed-with: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Some of the decompressed files were found via fuzzing, some are public
domain texts, and some are designed to test one aspect or another of
the format. For example, some aspects of Huffman tree creation can
only be tested when there is an extreme imbalance in the frequency of
symbols.
See the README for what files are where.
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
We are going to have all kinds of rubbish there.
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
We are going to add more tests for lib/compression, and they can't all
be called "testsuite.c".
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Sometimes (e.g. in lzxpress Huffman encoding, and in some of our
tests: c.f. https://lists.samba.org/archive/samba-technical/2018-March/126010.html)
we want a stable sort algorithm (meaning one that retains the previous
order of items that compare equal).
The GNU libc qsort() is *usually* stable, in that it first tries to
use a mergesort but reverts to quicksort if the necessary allocations
fail. That has led Samba developers to unthinkingly assume qsort() is
stable which is not the case on many platforms, and might not always
be on GNU/Linuxes either.
This adds four functions. stable_sort() sorts an array, and requires
an auxiliary working array of the same size. stable_sort_talloc()
takes a talloc context so it ca create a working array and call
stable_sort(). stable_sort_r() takes an opaque context blob that gets
passed to the compare function, like qsort_r() and ldb_qsort(). And
stable_sort_talloc_r() rounds out the quadrant.
These are LGPL so that the can be used in ldb, which has problems with
unstable sort.
The tests are borrowed and extended from test_ldb_qsort.c.
When sorting non-trivial structs this is roughly as fast as GNU qsort,
but GNU qsort has optimisations for small items, using direct
assignments of rather than memcpy where the size allows the item to be
cast as some kind of int.
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Remove knownfail.
Signed-off-by: Jeremy Allison <jra@samba.org>
Reviewed-by: Ralph Boehme <slow@samba.org>
Autobuild-User(master): Ralph Böhme <slow@samba.org>
Autobuild-Date(master): Thu Dec 1 16:04:07 UTC 2022 on sn-devel-184
The protocol allows the last read in a related compound to be split
off and possibly go async (and smbd soon will do this). If the
last read is split off, then the padding is different. By sending
3 reads and checking the padding on the 2nd read, we cope with
the smbd change and are still correctly checking the padding
on a compound related read.
Do this for the stream filename compound padding test.
Signed-off-by: Jeremy Allison <jra@samba.org>
Reviewed-by: Ralph Boehme <slow@samba.org>
The protocol allows the last read in a related compound to be split
off and possibly go async (and smbd soon will do this). If the
last read is split off, then the padding is different. By sending
3 reads and checking the padding on the 2nd read, we cope with
the smbd change and are still correctly checking the padding
on a compound related read.
Do this for the base filename compound padding test.
Signed-off-by: Jeremy Allison <jra@samba.org>
Reviewed-by: Ralph Boehme <slow@samba.org>
It doesn't hurt the fsync compound async tests, and we need this for
the next commits to ensure smb2_read/smb2_write compound tests take
longer than 500ms so can be sure the last read/write in the compound
will go async.
Signed-off-by: Jeremy Allison <jra@samba.org>
Reviewed-by: Ralph Boehme <slow@samba.org>
The timer for the timeout_cb() handler was created on a memory context
which doesn't get freed, so the timer was still valid when running
the next test and fired there. It was then writing into random memory
leading to segfaults.
Signed-off-by: Andreas Schneider <asn@samba.org>
Reviewed-by: Stefan Metzmacher <metze@samba.org>
Autobuild-User(master): Andreas Schneider <asn@cryptomilk.org>
Autobuild-Date(master): Thu Dec 1 15:03:19 UTC 2022 on sn-devel-184
When receiving a query for '..', hide the owner
and group sids, the inode, and the dev id.
Signed-off-by: David Mulder <dmulder@samba.org>
Reviewed-by: Volker Lendecke <vl@samba.org>
Cf MS-FSA 2.1.5.14.2
BUG: https://bugzilla.samba.org/show_bug.cgi?id=15252
Signed-off-by: Ralph Boehme <slow@samba.org>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
Autobuild-User(master): Ralph Böhme <slow@samba.org>
Autobuild-Date(master): Mon Nov 28 10:14:12 UTC 2022 on sn-devel-184
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Ralph Boehme <slow@samba.org>
Autobuild-User(master): Ralph Böhme <slow@samba.org>
Autobuild-Date(master): Fri Nov 25 06:07:32 UTC 2022 on sn-devel-184
For now we allow build warnings and only do some basic testing.
We also ignore timestamp related problems, as well as some charset
failures.
Over time we should try to address the situation by not allowing warnings
and verify if expected failures are harmless or not.
But it's already much better then having no 32bit testing at all!
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
Autobuild-User(master): Stefan Metzmacher <metze@samba.org>
Autobuild-Date(master): Thu Nov 24 12:05:26 UTC 2022 on sn-devel-184
We need to pass --mitkrb5 to selftest.pl in all cases we use
system mit kerberos not only when we also test the kdc.
We can't use a replay cache in selftest verifies the stat.st_uid
against getuid().
BTW: while debugging this on ubuntu 22.04 I exported
KRB5_TRACE="/dev/stderr", which means we get tracing into
the servers log file and into selftest_prefix/subunit for the client...
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
sizeof(struct share_mode_lock) is only 28 bytes instead of 32 bytes
on 32bit systems...
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Volker Lendecke <vl@samba.org>