1
0
mirror of https://github.com/samba-team/samba.git synced 2025-01-10 01:18:15 +03:00
Commit Graph

55 Commits

Author SHA1 Message Date
Douglas Bagnall
77048aaa61 lib/compression: debug routines for lzxpress-huffman
If you need to see a Huffman tree (and sometimes you do), set
DEBUG_HUFFMAN_TREE to true at the top of lzxpress_huffman.c, and run:

  make bin/test_lzx_huffman && bin/test_lzx_huffman

Actually, that will show you hundreds of trees, and you'll be glad of
that if you are ever trying to understand this.

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
2022-12-01 22:56:39 +00:00
Douglas Bagnall
955214ef6e lib/compression/lzhuff: add debug flag to skip LZ77
Encoding without LZ77 matches is valid, and it is useful for isolating
bugs.

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
2022-12-01 22:56:39 +00:00
Douglas Bagnall
d4e3f0c88e lib/compression: LZ77 + Huffman compression
This compresses files as described in MS-XCA 2.2, and as decompressed
by the decompressor in the previous commit.

As with the decompressor, there are two public functions -- one that
uses a talloc context, and one that uses pre-allocated memory. The
compressor requires a tightly bound amount of auxillary memory
(>220kB) in a few different buffers, which is all gathered together in
the public struct lzxhuff_compressor_mem. An instantiated but not
initialised copy of this struct is required by the non-talloc
function; it can be used over and over again.

Our compression speed is about the same as the decompression speed
(between 20 and 500 MB/s on this laptop, depending on the data), and
our compression ratio is very similar to that of Windows.

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
2022-12-01 22:56:39 +00:00
Douglas Bagnall
f86035c65b lib/compression: add LZ77 + Huffman decompression
This format is described in [MS-XCA] 2.1 and 2.2, with exegesis in
many posts on the cifs-protocol list[1].

The two public functions are:

ssize_t lzxpress_huffman_decompress(const uint8_t *input,
				    size_t input_size,
				    uint8_t *output,
				    size_t output_size);

uint8_t *lzxpress_huffman_decompress_talloc(TALLOC_CTX *mem_ctx,
					    const uint8_t *input_bytes,
					    size_t input_size,
					    size_t output_size);

In both cases the caller needs to know the *exact* decompressed size,
which is essential for decompression. The _talloc version allocates
the buffer for you, and uses the talloc context to allocate a 128k
working buffer. THe non-talloc function will allocate the working
buffer on the stack.

This compression format gives better compression for messages of
several kilobytes than the "plain" LXZPRESS compression, but is
probably a bit slower to decompress and is certainly worse for very
short messages, having a fixed 256 byte overhead for the first Huffman
table.

Experiments show decompression rates between 20 and 500 MB per second,
depending on the compression ratio and data size, on an i5-1135G7 with
no compiler optimisations.

This compression format is used in AD claims and in SMB, but that
doesn't happen with this commit.

I will not try to describe LZ77 or Huffman encoding here. Don't expect
an answer in MS-XCA either; instead read the code and/or Wikipedia.

[1] Much of that starts here:

https://lists.samba.org/archive/cifs-protocol/2022-October/

but there's more earlier, particularly in June/July 2020, when
Aurélien Aptel was working on an implementation that ended up in
Wireshark.

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Pair-programmed-with: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
2022-12-01 22:56:39 +00:00
Douglas Bagnall
f6cda06dfb lib/compression: move lzxpress_plain test into tests/
We are going to add more tests for lib/compression, and they can't all
be called "testsuite.c".

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
2022-12-01 22:56:39 +00:00
Volker Lendecke
605d646935 lib: Fix the 32-bit build
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
2022-07-23 23:29:38 +00:00
Douglas Bagnall
637e7cbdba lzxpress: compress shortcut if we've reached maximum length
A simple degenerate case for our compressor has been a large number of
repeated bytes that will match the maximum length (~64k) at all 8192
search positions, 8191 of which searches are in vain because the
matches are not of greater length than the first one.

Here we recognise the inevitable and reduce runtime proportionately.

Credit to OSS-Fuzz.

REF: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=47428

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>

Autobuild-User(master): Douglas Bagnall <dbagnall@samba.org>
Autobuild-Date(master): Tue May 17 23:11:21 UTC 2022 on sn-devel-184
2022-05-17 23:11:21 +00:00
Douglas Bagnall
04309bc682 lzxpress/test: time performance of long boring sequences
We get *very* slow when long runs of the bytes are the same. On this
laptop the test takes 18s; with the next commit it will be 0.006s.

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2022-05-17 22:13:35 +00:00
Douglas Bagnall
505d2879fa compression:tests: align test names with functions
You'll thank me if you're ever debugging these and wondering why
'lzxpress4' calls 'lzxpress2' (or is it the other way round?).

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2022-05-12 02:22:35 +00:00
Douglas Bagnall
05c760165b compression: add a few comments, including MS-XCA pointers.
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2022-05-12 02:22:35 +00:00
Douglas Bagnall
383a7cfed9 compression: remove always false constant comparison
We set `uncompressed_pos = 0;` unconditionally, just ~10 lines up.

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2022-05-12 02:22:35 +00:00
Douglas Bagnall
e36cb10b16 compression: lzxpress decompress empty string as empty string
This mirrors the behaviour of lzxpress_compress, which "encodes" an
empty string as an empty string.

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2022-05-12 02:22:35 +00:00
Douglas Bagnall
1ca4449294 compression: fix lzxpress decompress with trailing flags
Every so often, lzxpress adds a 32-bit block of indicator flags to
help decode the next clump of 32 code words. A naive compressor (such
as we have) might do this at the very end for flags that aren't
actually used because there are no more bytes to decompress. If that
happens we need to stop processing, or we'll come to worse outcome at
the next CHECK_INPUT_BYTES.

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2022-05-12 02:22:35 +00:00
Douglas Bagnall
d8a90d2a8f compression:tests: test lzxpress in some edge cases
Empty strings and trailing flag blocks.

(found with Honggfuzz and a round-trip fuzzer that aborts if the
strings differ).

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2022-05-12 02:22:35 +00:00
Joseph Sutton
075df819cc compression: Move maximum length calculation out of inner loop
This makes the code clearer.

Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
2022-05-12 02:22:35 +00:00
Joseph Sutton
877f007f32 compression: Use correct values for max len and offset
Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
2022-05-12 02:22:35 +00:00
Joseph Sutton
fe5fa7e197 compression: Replace divisions with shifts
This is more consistent with the compression code.

Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
2022-05-12 02:22:35 +00:00
Joseph Sutton
131eb75269 compression: Remove unneeded loop variable
Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2022-05-12 02:22:35 +00:00
Joseph Sutton
5b1f8ea8d3 compression: Reduce scope of variables
This makes the code clearer.

Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2022-05-12 02:22:35 +00:00
Joseph Sutton
1a964210d2 compression: Use PUSH_LE_U32 for first output buffer write
Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2022-05-12 02:22:35 +00:00
Joseph Sutton
41b88d35ce compression: Add bounds check for first output buffer write
Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
2022-05-12 02:22:35 +00:00
Joseph Sutton
0c813ee563 compression: Remove helper variables str1 and str2
This simplifies the code and makes it clearer.

Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
2022-05-12 02:22:35 +00:00
Joseph Sutton
430bcd7a08 compression: Fix writing output flags
If indic_bit == 0, the shift amount of 32 - indic_bit == 32 will equal
the width of a 32-bit integer type, and these shifts will invoke
undefined behaviour, which is likely to cause incorrect output. Fix this
by not shifting a 32-bit integer type by 32 bits or more.

Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
2022-05-12 02:22:35 +00:00
Joseph Sutton
bb9115e023 compression: Remove byte_left variable
We can simplify this code using the identity:
  byte_left + uncompressed_pos = uncompressed_size

Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
2022-05-12 02:22:35 +00:00
Joseph Sutton
417e0c914f compression: Remove redundant bounds check
Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
2022-05-12 02:22:35 +00:00
Joseph Sutton
6f3f1ba5b4 compression: Add range check for indic_pos
This now matches the other use of indic_pos.

Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
2022-05-12 02:22:35 +00:00
Joseph Sutton
b62fbc4a53 compression: Remove redundant nibble_index check
If nibble_index is non-zero, we have already written to it, and so don't
need to check again that it is in bounds.

Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
2022-05-12 02:22:35 +00:00
Joseph Sutton
52982c01a5 compression: Make use of PUSH_LE_Uxx macros
Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2022-05-12 02:22:35 +00:00
Joseph Sutton
f2ea8d4c05 compression: Simplify code by making indic_pos an index
Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
2022-05-12 02:22:35 +00:00
Joseph Sutton
b153445798 compression: Make use of CHECK_{IN,OUT}PUT_BYTES macros
Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
2022-05-12 02:22:35 +00:00
Joseph Sutton
ea42717cca compression: Simplify code by removing metadata_size variable
Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
2022-05-12 02:22:35 +00:00
Joseph Sutton
69244b52ed compression: Use correct value for indic_pos
Previously, we were setting this to the wrong value and overwriting
existing output data.

Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
2022-05-12 02:22:35 +00:00
Joseph Sutton
7fab9f90e8 compression: Use correct value for nibble_index
Previously, we were setting this to the wrong value and overwriting
existing output data.

Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
2022-05-12 02:22:35 +00:00
Joseph Sutton
f8feac11cb compression: Simplify redundant branches
Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
2022-05-12 02:22:35 +00:00
Joseph Sutton
d368fa61cf compression: Consistently use PUSH_LE_Uxx macros
Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2022-05-12 02:22:35 +00:00
Joseph Sutton
9516b26845 compression: Use explicit data sizes
Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2022-05-12 02:22:35 +00:00
Joseph Sutton
eb7f139dec compression tests: Add additional compression tests
Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2022-05-12 02:22:35 +00:00
Matt Suiche
3c2f1f03c1 compression: fix lzxpress-compress
Signed-off-by: Matt Suiche <msuiche@comae.com>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
2022-05-12 02:22:35 +00:00
Matt Suiche
8f7fbc5c8f compression: lzxpress_compress: fix no-op shift of 0
Signed-off-by: Matt Suiche <msuiche@comae.com>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
2022-05-12 02:22:35 +00:00
Matt Suiche
a8fb45247b compression: fix lzxpress_decompress
Signed-off-by: Matt Suiche <msuiche@comae.com>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
2022-05-12 02:22:35 +00:00
Matt Suiche
f67ff611e9 compression tests: add test for legacy compressed data
Signed-off-by: Matt Suiche <msuiche@comae.com>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2022-05-12 02:22:35 +00:00
Matt Suiche
4bcdc3bf30 compression tests: add LZXpress tests based on [MS-XCA]
MS-XCA contains examples, and we should at least get those right.

Signed-off-by: Matt Suiche <msuiche@comae.com>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2022-05-12 02:22:35 +00:00
Douglas Bagnall
0c461f3bd5 lzxpress: avoid technically undefined shift
UBSAN:

  runtime error: left shift of 1 by 31 places cannot be represented in type 'int'

Credit to OSS-fuzz.

REF: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=22283

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Gary Lockyer <gary@catalyst.net.nz>

Autobuild-User(master): Jeremy Allison <jra@samba.org>
Autobuild-Date(master): Mon Aug 31 22:31:13 UTC 2020 on sn-devel-184
2020-08-31 22:31:13 +00:00
Stefan Metzmacher
a97c78fb22 lzxpress: add bounds checking to lzxpress_decompress()
lzxpress_decompress() would wander past the end of the array in
numerous locations.

Credit to OSS-Fuzz.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14190
REF: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=19382
REF: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=20083
REF: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=22485
REF: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=22667

Signed-off-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>

Autobuild-User(master): Douglas Bagnall <dbagnall@samba.org>
Autobuild-Date(master): Sun Aug  9 00:30:26 UTC 2020 on sn-devel-184
2020-08-09 00:30:26 +00:00
Andreas Schneider
aab5034a9d lib:compression: Fix undefined behavior in lzxpress
lib/compression/lzxpress.c:228 runtime error: store to misaligned
address 0x5631d53ca9fe for type 'uint32_t', which requires 4 byte
alignment

Signed-off-by: Andreas Schneider <asn@samba.org>
Reviewed-by: Gary Lockyer <gary@catalyst.net.nz>
2018-11-22 22:13:27 +01:00
Stefan Metzmacher
3f5e05f024 lib/compression/tests: add missing #include "torture/local/proto.h"
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2014-04-02 09:03:44 +02:00
Christian Ambach
094747aaf3 lib/compression: fix a compiler warnings
about set but unused variable

Signed-off-by: Christian Ambach <ambi@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
2013-12-12 14:21:26 -08:00
Andreas Schneider
390734acca lib: Remove dead mszip code.
RIP, long live zlib.
2012-01-25 11:58:26 +01:00
Günther Deschner
56fe080d87 lib/compression: add shared wscript_build.
Guenther
2011-02-08 14:05:36 +01:00
Jelmer Vernooij
35fbc7bbda s4-smbtorture: Make test names lowercase and dot-separated.
This is consistent with the test names used by selftest, should
make the names less confusing and easier to integrate with other tools.

Autobuild-User: Jelmer Vernooij <jelmer@samba.org>
Autobuild-Date: Sat Dec 11 04:16:13 CET 2010 on sn-devel-104
2010-12-11 04:16:13 +01:00