samba-mirror

mirror of https://github.com/samba-team/samba.git synced 2024-12-22 13:34:15 +03:00

Author	SHA1	Message	Date
Douglas Bagnall	41249302a3	lib/compression: add simple python bindings There are four functions, allowing compression and decompression in the two formats we support so far. The functions will accept bytes or unicode strings which are treated as utf-8. The LZ77+Huffman decompression algorithm requires an exact target length to decompress, so this is mandatory. The plain decompression algorithm does not need an exact length, but you can provide one to help it know how much space to allocate. As currently written, you can provide a short length and it will often succeed in decompressing to a different shorter string. These bindings are intended to make ad-hoc investigation easier, not for production use. This is reflected in the guesses about output size that plain_decompress() makes if you don't supply one -- either they are stupidly wasteful or ridiculously insufficient, depending on whether or not you were trying to decompress a 20MB string. >>> a = '12345678' >>> import compression >>> b = compression.huffman_compress(a) >>> b b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 #.... >>> len(b) 262 >>> c = compression.huffman_decompress(b, len(a)) >>> c b'12345678' # note, c is bytes, a is str >>> a '12345678' >>> d = compression.plain_compress(a) >>> d b'\xff\xff\xff\x0012345678' >>> compression.plain_decompress(d) # no size specified, guesses b'12345678' >>> compression.plain_decompress(d,5) b'12345' >>> compression.plain_decompress(d,0) # 0 for auto b'12345678' >>> compression.plain_decompress(d,1) b'1' >>> compression.plain_decompress(a,444) Traceback (most recent call last): compression.CompressionError: unable to decompress data into a buffer of 444 bytes. >>> compression.plain_decompress(b,444) b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 #... That last one decompresses the Huffman compressed file with the plain compressor; pretty much any string is valid for plain decompression. Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz> Reviewed-by: Jeremy Allison <jra@samba.org>	2022-12-22 19:50:33 +00:00
Douglas Bagnall	c2db7fda4e	lib/comression: convert test_lzxpress_plain to cmocka Mainly so I can go make bin/test_lzxpress_plain && bin/test_lzxpress_plain valgrind bin/test_lzxpress_plain rr bin/test_lzxpress_plain rr replay in a tight loop. Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz> Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>	2022-12-01 22:56:39 +00:00
Douglas Bagnall	f86035c65b	lib/compression: add LZ77 + Huffman decompression This format is described in [MS-XCA] 2.1 and 2.2, with exegesis in many posts on the cifs-protocol list[1]. The two public functions are: ssize_t lzxpress_huffman_decompress(const uint8_t input, size_t input_size, uint8_t output, size_t output_size); uint8_t lzxpress_huffman_decompress_talloc(TALLOC_CTX mem_ctx, const uint8_t input_bytes, size_t input_size, size_t output_size); In both cases the caller needs to know the exact* decompressed size, which is essential for decompression. The _talloc version allocates the buffer for you, and uses the talloc context to allocate a 128k working buffer. THe non-talloc function will allocate the working buffer on the stack. This compression format gives better compression for messages of several kilobytes than the "plain" LXZPRESS compression, but is probably a bit slower to decompress and is certainly worse for very short messages, having a fixed 256 byte overhead for the first Huffman table. Experiments show decompression rates between 20 and 500 MB per second, depending on the compression ratio and data size, on an i5-1135G7 with no compiler optimisations. This compression format is used in AD claims and in SMB, but that doesn't happen with this commit. I will not try to describe LZ77 or Huffman encoding here. Don't expect an answer in MS-XCA either; instead read the code and/or Wikipedia. [1] Much of that starts here: https://lists.samba.org/archive/cifs-protocol/2022-October/ but there's more earlier, particularly in June/July 2020, when Aurélien Aptel was working on an implementation that ended up in Wireshark. Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz> Pair-programmed-with: Joseph Sutton <josephsutton@catalyst.net.nz> Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>	2022-12-01 22:56:39 +00:00
Günther Deschner	56fe080d87	lib/compression: add shared wscript_build. Guenther	2011-02-08 14:05:36 +01:00

4 Commits