1
0
mirror of https://github.com/samba-team/samba.git synced 2024-12-22 13:34:15 +03:00
samba-mirror/lib/compression/lzxpress_huffman.h

Ignoring revisions in .git-blame-ignore-revs. Click here to bypass and see the normal blame view.

96 lines
2.7 KiB
C
Raw Permalink Normal View History

lib/compression: add LZ77 + Huffman decompression This format is described in [MS-XCA] 2.1 and 2.2, with exegesis in many posts on the cifs-protocol list[1]. The two public functions are: ssize_t lzxpress_huffman_decompress(const uint8_t *input, size_t input_size, uint8_t *output, size_t output_size); uint8_t *lzxpress_huffman_decompress_talloc(TALLOC_CTX *mem_ctx, const uint8_t *input_bytes, size_t input_size, size_t output_size); In both cases the caller needs to know the *exact* decompressed size, which is essential for decompression. The _talloc version allocates the buffer for you, and uses the talloc context to allocate a 128k working buffer. THe non-talloc function will allocate the working buffer on the stack. This compression format gives better compression for messages of several kilobytes than the "plain" LXZPRESS compression, but is probably a bit slower to decompress and is certainly worse for very short messages, having a fixed 256 byte overhead for the first Huffman table. Experiments show decompression rates between 20 and 500 MB per second, depending on the compression ratio and data size, on an i5-1135G7 with no compiler optimisations. This compression format is used in AD claims and in SMB, but that doesn't happen with this commit. I will not try to describe LZ77 or Huffman encoding here. Don't expect an answer in MS-XCA either; instead read the code and/or Wikipedia. [1] Much of that starts here: https://lists.samba.org/archive/cifs-protocol/2022-October/ but there's more earlier, particularly in June/July 2020, when Aurélien Aptel was working on an implementation that ended up in Wireshark. Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz> Pair-programmed-with: Joseph Sutton <josephsutton@catalyst.net.nz> Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
2022-11-17 04:24:52 +03:00
/*
* Samba compression library - LGPLv3
*
* Copyright © Catalyst IT 2022
*
* ** NOTE! The following LGPL license applies to this file.
* ** It does NOT imply that all of Samba is released under the LGPL
*
* This library is free software; you can redistribute it and/or
* modify it under the terms of the GNU Lesser General Public
* License as published by the Free Software Foundation; either
* version 3 of the License, or (at your option) any later version.
*
* This library is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* Lesser General Public License for more details.
*
* You should have received a copy of the GNU Lesser General Public
* License along with this library; if not, see <http://www.gnu.org/licenses/>.
*/
#ifndef HAVE_LZXPRESS_HUFFMAN_H
#define HAVE_LZXPRESS_HUFFMAN_H
struct huffman_node {
struct huffman_node *left;
struct huffman_node *right;
uint32_t count;
uint16_t symbol;
int8_t depth;
};
/*
* LZX_HUFF_COMP_HASH_BITS is how big to make the hash tables
* (12 means 4096, etc).
*
* A larger number (up to 16) will be faster on long messages (fewer
* collisions), but probably slower on short ones (more prep).
*/
#define LZX_HUFF_COMP_HASH_BITS 14
/*
* This struct just coalesces all the memory you need for LZ77 + Huffman
* compression together in one bundle.
*
* There are a few different things you want, you usually want them all, so
* this makes it easy to allocate them all at once.
*/
struct lzxhuff_compressor_mem {
struct huffman_node leaf_nodes[512];
struct huffman_node internal_nodes[512];
uint16_t symbol_values[512];
uint16_t intermediate[65536 + 6];
uint16_t hash_table1[1 << LZX_HUFF_COMP_HASH_BITS];
uint16_t hash_table2[1 << LZX_HUFF_COMP_HASH_BITS];
};
ssize_t lzxpress_huffman_compress(struct lzxhuff_compressor_mem *cmp,
const uint8_t *input_bytes,
size_t input_size,
uint8_t *output,
size_t available_size);
ssize_t lzxpress_huffman_compress_talloc(TALLOC_CTX *mem_ctx,
const uint8_t *input_bytes,
size_t input_size,
uint8_t **output);
lib/compression: add LZ77 + Huffman decompression This format is described in [MS-XCA] 2.1 and 2.2, with exegesis in many posts on the cifs-protocol list[1]. The two public functions are: ssize_t lzxpress_huffman_decompress(const uint8_t *input, size_t input_size, uint8_t *output, size_t output_size); uint8_t *lzxpress_huffman_decompress_talloc(TALLOC_CTX *mem_ctx, const uint8_t *input_bytes, size_t input_size, size_t output_size); In both cases the caller needs to know the *exact* decompressed size, which is essential for decompression. The _talloc version allocates the buffer for you, and uses the talloc context to allocate a 128k working buffer. THe non-talloc function will allocate the working buffer on the stack. This compression format gives better compression for messages of several kilobytes than the "plain" LXZPRESS compression, but is probably a bit slower to decompress and is certainly worse for very short messages, having a fixed 256 byte overhead for the first Huffman table. Experiments show decompression rates between 20 and 500 MB per second, depending on the compression ratio and data size, on an i5-1135G7 with no compiler optimisations. This compression format is used in AD claims and in SMB, but that doesn't happen with this commit. I will not try to describe LZ77 or Huffman encoding here. Don't expect an answer in MS-XCA either; instead read the code and/or Wikipedia. [1] Much of that starts here: https://lists.samba.org/archive/cifs-protocol/2022-October/ but there's more earlier, particularly in June/July 2020, when Aurélien Aptel was working on an implementation that ended up in Wireshark. Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz> Pair-programmed-with: Joseph Sutton <josephsutton@catalyst.net.nz> Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
2022-11-17 04:24:52 +03:00
ssize_t lzxpress_huffman_decompress(const uint8_t *input,
size_t input_size,
uint8_t *output,
size_t max_output_size);
uint8_t *lzxpress_huffman_decompress_talloc(TALLOC_CTX *mem_ctx,
const uint8_t *input_bytes,
size_t input_size,
size_t output_size);
/*
* lzxpress_huffman_max_compressed_size()
*
* Return the most bytes the compression can take, to allow
* pre-allocation.
*/
size_t lzxpress_huffman_max_compressed_size(size_t input_size);
lib/compression: add LZ77 + Huffman decompression This format is described in [MS-XCA] 2.1 and 2.2, with exegesis in many posts on the cifs-protocol list[1]. The two public functions are: ssize_t lzxpress_huffman_decompress(const uint8_t *input, size_t input_size, uint8_t *output, size_t output_size); uint8_t *lzxpress_huffman_decompress_talloc(TALLOC_CTX *mem_ctx, const uint8_t *input_bytes, size_t input_size, size_t output_size); In both cases the caller needs to know the *exact* decompressed size, which is essential for decompression. The _talloc version allocates the buffer for you, and uses the talloc context to allocate a 128k working buffer. THe non-talloc function will allocate the working buffer on the stack. This compression format gives better compression for messages of several kilobytes than the "plain" LXZPRESS compression, but is probably a bit slower to decompress and is certainly worse for very short messages, having a fixed 256 byte overhead for the first Huffman table. Experiments show decompression rates between 20 and 500 MB per second, depending on the compression ratio and data size, on an i5-1135G7 with no compiler optimisations. This compression format is used in AD claims and in SMB, but that doesn't happen with this commit. I will not try to describe LZ77 or Huffman encoding here. Don't expect an answer in MS-XCA either; instead read the code and/or Wikipedia. [1] Much of that starts here: https://lists.samba.org/archive/cifs-protocol/2022-October/ but there's more earlier, particularly in June/July 2020, when Aurélien Aptel was working on an implementation that ended up in Wireshark. Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz> Pair-programmed-with: Joseph Sutton <josephsutton@catalyst.net.nz> Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
2022-11-17 04:24:52 +03:00
#endif /* HAVE_LZXPRESS_HUFFMAN_H */