1
0
mirror of https://github.com/samba-team/samba.git synced 2025-01-13 13:18:06 +03:00
Commit Graph

16 Commits

Author SHA1 Message Date
Anoop C S
0c2146eb00 lib/compression: Include missing stat header file
<sys/stat.h> was missing from compression library tests which resulted
in the following compile time error:

../../lib/compression/tests/test_lzx_huffman.c: In function
                                                   ‘datablob_from_file’:
../../lib/compression/tests/test_lzx_huffman.c:383:21: error:
                                         storage size of ‘s’ isn’t known
  383 |         struct stat s;
      |                     ^
../../lib/compression/tests/test_lzx_huffman.c:389:15: warning:
    implicit declaration of function ‘fstat’ [-Wimplicit-function-declaration]
  389 |         ret = fstat(fileno(fh), &s);
      |               ^~~~~

Signed-off-by: Anoop C S <anoopcs@samba.org>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Volker Lendecke <vl@samba.org>

Autobuild-User(master): Volker Lendecke <vl@samba.org>
Autobuild-Date(master): Tue Dec  6 11:39:16 UTC 2022 on sn-devel-184
2022-12-06 11:39:16 +00:00
Andreas Schneider
a451fa5ef9 lib:compression: Initialize variables
lib/compression/tests/test_lzx_huffman.c: In function ‘test_lzxpress_huffman_overlong_matches’:
lib/compression/tests/test_lzx_huffman.c:1013:35: error: ‘j’ may be used uninitialized [-Werror=maybe-uninitialized]
 1013 |         assert_int_equal(score, i * j);
      |                                   ^
lib/compression/tests/test_lzx_huffman.c:979:19: note: ‘j’ was declared here
  979 |         size_t i, j;
      |                   ^
lib/compression/tests/test_lzx_huffman.c: In function ‘test_lzxpress_huffman_overlong_matches_abc’:
lib/compression/tests/test_lzx_huffman.c:1059:39: error: ‘k’ may be used uninitialized [-Werror=maybe-uninitialized]
 1059 |         assert_int_equal(score, i * j * k);
      |                                       ^
lib/compression/tests/test_lzx_huffman.c:1020:22: note: ‘k’ was declared here
 1020 |         size_t i, j, k;
      |                      ^
lib/compression/tests/test_lzx_huffman.c:1059:35: error: ‘j’ may be used uninitialized [-Werror=maybe-uninitialized]
 1059 |         assert_int_equal(score, i * j * k);
      |                                   ^
lib/compression/tests/test_lzx_huffman.c:1020:19: note: ‘j’ was declared here
 1020 |         size_t i, j, k;
      |                   ^

Signed-off-by: Andreas Schneider <asn@samba.org>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>

Autobuild-User(master): Andreas Schneider <asn@cryptomilk.org>
Autobuild-Date(master): Sun Dec  4 09:12:30 UTC 2022 on sn-devel-184
2022-12-04 09:12:30 +00:00
Douglas Bagnall
e4066b2be6 lib/compression: more tests for lzxpress plain compression
These are based on (i.e. copied and pasted from) the LZ77 + Huffman
tests.

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
2022-12-01 22:56:40 +00:00
Douglas Bagnall
ce7ea07d07 testdata: move compression examples to re-use with lzxpress plain
Everything that is in testdata/compression/lzxpress-huffman/ can also
be used for lzxpress plain tests, which is something we really need.

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
2022-12-01 22:56:40 +00:00
Douglas Bagnall
9589f5282b lib/compression/lzx-plain: relax size requirements on long file
We are going to change from a slow exact match algorithm to a fast
heuristic search that will not always get the same results as the
exhaustive search.

To be precise, a million zeros will compress to 112 rather than 93 bytes.

We don't insist on an exact size, because that is not an issue here.

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
2022-12-01 22:56:39 +00:00
Douglas Bagnall
c2db7fda4e lib/comression: convert test_lzxpress_plain to cmocka
Mainly so I can go

 make bin/test_lzxpress_plain && bin/test_lzxpress_plain
 valgrind bin/test_lzxpress_plain
 rr bin/test_lzxpress_plain
 rr replay

in a tight loop.

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
2022-12-01 22:56:39 +00:00
Douglas Bagnall
e5f9deed0d lib/compression: add test scripts README
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
2022-12-01 22:56:39 +00:00
Douglas Bagnall
1a3d8da731 lib/compression: test util to generate fuzzing seeds
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
2022-12-01 22:56:39 +00:00
Douglas Bagnall
6a7c0ca23c lib/compression: Windows utility to generate test vectors
If compiled on Windows using Cygwin, MSYS2, or similar, this will output
compressed versions of files exactly as specified by MZ-XCA, if the
following conditions are met:

1. The file > 300 bytes.
2. The compressed file is smaller than the decompressed file.

Otherwise it returns the data unchanged. Without warning; that's just
how the API works.

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
2022-12-01 22:56:39 +00:00
Douglas Bagnall
7804570a37 lib/compression: script to test 3 byte hash
Compression uses a 3 byte hash remember LZ77 matches in a 14-bit table.
This script runs the hash over all 16M combinations, then again over
all ASCII combinations, counting collisions to find hot-spots.

If you think you have a better hash, you are probably right, but you
should try it here -- alter h() -- before committing to it. This one is
literally the first one I thought of.

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
2022-12-01 22:56:39 +00:00
Douglas Bagnall
dadecede54 lib/compression: helper script to make unbalanced data
Huffman tree re-quantisation and perhaps other code paths are only
triggered by pathological data like this.

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
2022-12-01 22:56:39 +00:00
Douglas Bagnall
bce33816ec lib/compression: add a debug script to describe headers
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
2022-12-01 22:56:39 +00:00
Douglas Bagnall
e795985067 lib/compression/tests: add lzhuffman timer functions
With LZXHUFF_DEBUG_VERBOSE set, we measure the compression and
decompression rate relative to the decompressed size.

On reasonably long strings on my laptop, compiled with -O0, it turns
out to between 20 and 500 MB/s, both ways, depending on the complexity
of the string. Very short strings are of course dominated by overhead.

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
2022-12-01 22:56:39 +00:00
Douglas Bagnall
d4e3f0c88e lib/compression: LZ77 + Huffman compression
This compresses files as described in MS-XCA 2.2, and as decompressed
by the decompressor in the previous commit.

As with the decompressor, there are two public functions -- one that
uses a talloc context, and one that uses pre-allocated memory. The
compressor requires a tightly bound amount of auxillary memory
(>220kB) in a few different buffers, which is all gathered together in
the public struct lzxhuff_compressor_mem. An instantiated but not
initialised copy of this struct is required by the non-talloc
function; it can be used over and over again.

Our compression speed is about the same as the decompression speed
(between 20 and 500 MB/s on this laptop, depending on the data), and
our compression ratio is very similar to that of Windows.

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
2022-12-01 22:56:39 +00:00
Douglas Bagnall
f86035c65b lib/compression: add LZ77 + Huffman decompression
This format is described in [MS-XCA] 2.1 and 2.2, with exegesis in
many posts on the cifs-protocol list[1].

The two public functions are:

ssize_t lzxpress_huffman_decompress(const uint8_t *input,
				    size_t input_size,
				    uint8_t *output,
				    size_t output_size);

uint8_t *lzxpress_huffman_decompress_talloc(TALLOC_CTX *mem_ctx,
					    const uint8_t *input_bytes,
					    size_t input_size,
					    size_t output_size);

In both cases the caller needs to know the *exact* decompressed size,
which is essential for decompression. The _talloc version allocates
the buffer for you, and uses the talloc context to allocate a 128k
working buffer. THe non-talloc function will allocate the working
buffer on the stack.

This compression format gives better compression for messages of
several kilobytes than the "plain" LXZPRESS compression, but is
probably a bit slower to decompress and is certainly worse for very
short messages, having a fixed 256 byte overhead for the first Huffman
table.

Experiments show decompression rates between 20 and 500 MB per second,
depending on the compression ratio and data size, on an i5-1135G7 with
no compiler optimisations.

This compression format is used in AD claims and in SMB, but that
doesn't happen with this commit.

I will not try to describe LZ77 or Huffman encoding here. Don't expect
an answer in MS-XCA either; instead read the code and/or Wikipedia.

[1] Much of that starts here:

https://lists.samba.org/archive/cifs-protocol/2022-October/

but there's more earlier, particularly in June/July 2020, when
Aurélien Aptel was working on an implementation that ended up in
Wireshark.

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Pair-programmed-with: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
2022-12-01 22:56:39 +00:00
Douglas Bagnall
f6cda06dfb lib/compression: move lzxpress_plain test into tests/
We are going to add more tests for lib/compression, and they can't all
be called "testsuite.c".

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
2022-12-01 22:56:39 +00:00