IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
If you need to see a Huffman tree (and sometimes you do), set
DEBUG_HUFFMAN_TREE to true at the top of lzxpress_huffman.c, and run:
make bin/test_lzx_huffman && bin/test_lzx_huffman
Actually, that will show you hundreds of trees, and you'll be glad of
that if you are ever trying to understand this.
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Encoding without LZ77 matches is valid, and it is useful for isolating
bugs.
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
This compresses files as described in MS-XCA 2.2, and as decompressed
by the decompressor in the previous commit.
As with the decompressor, there are two public functions -- one that
uses a talloc context, and one that uses pre-allocated memory. The
compressor requires a tightly bound amount of auxillary memory
(>220kB) in a few different buffers, which is all gathered together in
the public struct lzxhuff_compressor_mem. An instantiated but not
initialised copy of this struct is required by the non-talloc
function; it can be used over and over again.
Our compression speed is about the same as the decompression speed
(between 20 and 500 MB/s on this laptop, depending on the data), and
our compression ratio is very similar to that of Windows.
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
This format is described in [MS-XCA] 2.1 and 2.2, with exegesis in
many posts on the cifs-protocol list[1].
The two public functions are:
ssize_t lzxpress_huffman_decompress(const uint8_t *input,
size_t input_size,
uint8_t *output,
size_t output_size);
uint8_t *lzxpress_huffman_decompress_talloc(TALLOC_CTX *mem_ctx,
const uint8_t *input_bytes,
size_t input_size,
size_t output_size);
In both cases the caller needs to know the *exact* decompressed size,
which is essential for decompression. The _talloc version allocates
the buffer for you, and uses the talloc context to allocate a 128k
working buffer. THe non-talloc function will allocate the working
buffer on the stack.
This compression format gives better compression for messages of
several kilobytes than the "plain" LXZPRESS compression, but is
probably a bit slower to decompress and is certainly worse for very
short messages, having a fixed 256 byte overhead for the first Huffman
table.
Experiments show decompression rates between 20 and 500 MB per second,
depending on the compression ratio and data size, on an i5-1135G7 with
no compiler optimisations.
This compression format is used in AD claims and in SMB, but that
doesn't happen with this commit.
I will not try to describe LZ77 or Huffman encoding here. Don't expect
an answer in MS-XCA either; instead read the code and/or Wikipedia.
[1] Much of that starts here:
https://lists.samba.org/archive/cifs-protocol/2022-October/
but there's more earlier, particularly in June/July 2020, when
Aurélien Aptel was working on an implementation that ended up in
Wireshark.
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Pair-programmed-with: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
We are going to add more tests for lib/compression, and they can't all
be called "testsuite.c".
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
A simple degenerate case for our compressor has been a large number of
repeated bytes that will match the maximum length (~64k) at all 8192
search positions, 8191 of which searches are in vain because the
matches are not of greater length than the first one.
Here we recognise the inevitable and reduce runtime proportionately.
Credit to OSS-Fuzz.
REF: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=47428
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
Autobuild-User(master): Douglas Bagnall <dbagnall@samba.org>
Autobuild-Date(master): Tue May 17 23:11:21 UTC 2022 on sn-devel-184
We get *very* slow when long runs of the bytes are the same. On this
laptop the test takes 18s; with the next commit it will be 0.006s.
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
You'll thank me if you're ever debugging these and wondering why
'lzxpress4' calls 'lzxpress2' (or is it the other way round?).
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
We set `uncompressed_pos = 0;` unconditionally, just ~10 lines up.
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
This mirrors the behaviour of lzxpress_compress, which "encodes" an
empty string as an empty string.
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
Every so often, lzxpress adds a 32-bit block of indicator flags to
help decode the next clump of 32 code words. A naive compressor (such
as we have) might do this at the very end for flags that aren't
actually used because there are no more bytes to decompress. If that
happens we need to stop processing, or we'll come to worse outcome at
the next CHECK_INPUT_BYTES.
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
Empty strings and trailing flag blocks.
(found with Honggfuzz and a round-trip fuzzer that aborts if the
strings differ).
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
This makes the code clearer.
Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
This is more consistent with the compression code.
Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
This makes the code clearer.
Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
This simplifies the code and makes it clearer.
Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
If indic_bit == 0, the shift amount of 32 - indic_bit == 32 will equal
the width of a 32-bit integer type, and these shifts will invoke
undefined behaviour, which is likely to cause incorrect output. Fix this
by not shifting a 32-bit integer type by 32 bits or more.
Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
We can simplify this code using the identity:
byte_left + uncompressed_pos = uncompressed_size
Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
This now matches the other use of indic_pos.
Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
If nibble_index is non-zero, we have already written to it, and so don't
need to check again that it is in bounds.
Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
Previously, we were setting this to the wrong value and overwriting
existing output data.
Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Previously, we were setting this to the wrong value and overwriting
existing output data.
Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
Signed-off-by: Matt Suiche <msuiche@comae.com>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Signed-off-by: Matt Suiche <msuiche@comae.com>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Signed-off-by: Matt Suiche <msuiche@comae.com>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Signed-off-by: Matt Suiche <msuiche@comae.com>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
MS-XCA contains examples, and we should at least get those right.
Signed-off-by: Matt Suiche <msuiche@comae.com>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
UBSAN:
runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
Credit to OSS-fuzz.
REF: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=22283
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Gary Lockyer <gary@catalyst.net.nz>
Autobuild-User(master): Jeremy Allison <jra@samba.org>
Autobuild-Date(master): Mon Aug 31 22:31:13 UTC 2020 on sn-devel-184
lib/compression/lzxpress.c:228 runtime error: store to misaligned
address 0x5631d53ca9fe for type 'uint32_t', which requires 4 byte
alignment
Signed-off-by: Andreas Schneider <asn@samba.org>
Reviewed-by: Gary Lockyer <gary@catalyst.net.nz>
This is consistent with the test names used by selftest, should
make the names less confusing and easier to integrate with other tools.
Autobuild-User: Jelmer Vernooij <jelmer@samba.org>
Autobuild-Date: Sat Dec 11 04:16:13 CET 2010 on sn-devel-104