1
0
mirror of https://github.com/samba-team/samba.git synced 2025-01-11 05:18:09 +03:00
Commit Graph

252 Commits

Author SHA1 Message Date
Douglas Bagnall
997b72d79e util: charset:util_str: use NUMERIC_CMP in strncasecmp_m_handle
BUG: https://bugzilla.samba.org/show_bug.cgi?id=15625

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2024-04-10 22:56:33 +00:00
Douglas Bagnall
f07ae69907 util:charset:codepoints: codepoint_cmpi warning about non-transitivity
BUG: https://bugzilla.samba.org/show_bug.cgi?id=15625

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2024-04-10 22:56:33 +00:00
Douglas Bagnall
675fdeee3d util:charset:codepoints: condepoint_cmpi uses NUMERIC_CMP()
If these are truly unicode codepoints (< ~2m) there is no overflow,
but the type is defined as uint32_t.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=15625

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2024-04-10 22:56:33 +00:00
Douglas Bagnall
f788a39999 util:charset:util_str: use NUMERIC_CMP in strcasecmp_m_handle
BUG: https://bugzilla.samba.org/show_bug.cgi?id=15625

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2024-04-10 22:56:33 +00:00
Douglas Bagnall
a512759d7b torture:charset: test more of strcasecmp_m
We now test cases:

1. where the first string compares less
2. one of the strings ends before the other
3. the strings differ on a character other than the first.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=15625

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2024-04-10 22:56:33 +00:00
Douglas Bagnall
dda0bb6fc7 torture:charset: use < and > assertions for strncasecmp_m
strncasecmp_m is supposed to return a negative, zero, or positive
number, not necessarily the difference between the codepoints in
the first  character that differs, which we have been asserting up to
now.

This fixes a knownfail on 32 bit.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=15625

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2024-04-10 22:56:33 +00:00
Douglas Bagnall
ac0a8cd92c torture:charset: use < and > assertions for strcasecmp_m
strcasecmp_m is supposed to return a negative, zero, or positive
number, depending on whether the first argument is less than, equal to,
or greater than the second argument (respectively).

We have been asserting that it returns exactly the difference between
the codepoints in the first character that differs.

This fixes a knownfail on 32 bit.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=15625

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2024-04-10 22:56:33 +00:00
Joseph Sutton
346844b730 librpc: Change type of ‘u16string’ from ‘const uint16_t *’ to ‘const unsigned char *’
A u16string is supposed to contain UTF‐16 code units, but
ndr_pull_u16string() and ndr_push_u16string() fail to correctly ensure
this on big‐endian systems. Code that relies on the u16string array
containing correct values will then fail.

Fix ndr_pull_u16string() and ndr_push_u16string() to work on big‐endian
systems, ensuring that other code can use these strings without having
to worry about first encoding them to little‐endian.

Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2023-12-21 23:48:46 +00:00
Joseph Sutton
1947bd6d6d util/charset: Remove trailing whitespace
Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2023-12-08 02:28:33 +00:00
Joseph Sutton
4629fc7c61 util/charset: Have talloc_utf16_str[n]dup() accept NULL pointers
This is in line with ‘talloc_str[n]dup()’.

Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2023-11-20 21:50:32 +00:00
Joseph Sutton
939ceb233e util/charset: Add talloc_utf16_str[n]dup()
Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2023-11-16 05:18:36 +00:00
Joseph Sutton
b6ff89f6fb util/charset: Include missing headers
Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2023-11-16 05:18:36 +00:00
Joseph Sutton
3f0809f1ee util/charset: Remove unnecessary cast
Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2023-11-16 05:18:36 +00:00
Joseph Sutton
ec3e420840 util/charset: Prefer PULL_LE_U16() to older SVAL() macro
Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2023-11-15 22:07:36 +00:00
Joseph Sutton
99e0a0f21a util/charset/tests: Add tests for UTF‐16 string length functions
Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2023-11-15 22:07:36 +00:00
Joseph Sutton
a46746381b util/charset: Add utf16_len_n()
This function returns the length in bytes — at most ‘n’ — of a UTF‐16
string excluding the null terminator.

Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2023-11-15 22:07:36 +00:00
Joseph Sutton
74a5a3b74e util/charset: Include final UTF‐16 code unit in length calculation loop
Change ‘<’ to ‘<=’ so that we check the final UTF‐16 code unit in our
search for the null terminator. This makes no difference to the result:
if we’ve reached the final code unit without finding a terminator, the
final code unit will be included in the length whether it is a null
terminator or not.

Why make this change? We’re about to factor out this loop into a new
function, utf16_len_n(), where including the final code unit *will*
matter.

Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2023-11-15 22:07:36 +00:00
Joseph Sutton
516f35b5a1 util/charset: Add utf16_len()
This function returns the length in bytes of a UTF‐16 string excluding
the null terminator.

Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2023-11-15 22:07:36 +00:00
Joseph Sutton
16996d145b util/charset: Rename utf16_len() to utf16_null_terminated_len()
The new name indicates that — contrary to functions such as strnlen() —
the length may include the terminator.

Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2023-11-15 22:07:36 +00:00
Joseph Sutton
542e5a3039 util/charset: Rename utf16_len_n() to utf16_null_terminated_len_n()
The new name indicates that — contrary to functions such as strnlen() —
the length may include the terminator.

Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2023-11-15 22:07:36 +00:00
Joseph Sutton
982238e914 util/charset: Remove trailing whitespace
Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2023-11-15 22:07:36 +00:00
Douglas Bagnall
3960eabca7 libutil/iconv: avoid overflow in surrogate pairs
Consider the non-conforment utf-8 sequence "\xf5\x80\x80\x80", which
would encode 0x140000. We would set the high byte of the first
surrogate to 0xd8 | (0x130000 >> 18), or 0xdc, which is an invalid
start for a high surrogate, making the sequence as a whole invalid (as
you would expect -- the Unicode range was set precisely to that
covered by utf-16 surrogates).

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2023-10-26 01:24:32 +00:00
Douglas Bagnall
949fe57077 libutil/iconv: don't allow wtf-8 surrogate pairs
At present, if we meet a string like "hello \xed\xa7\x96 world", the
bytes in the middle will be converted into half of a surrogate pair,
and the UTF-16 will be invalid. It is better to error out immediately,
because the UTF-8 string is already invalid.

https://learn.microsoft.com/en-us/windows/win32/api/Stringapiset/nf-stringapiset-widechartomultibyte#remarks
is a citation for the statement about this being a pre-Vista
problem.

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2023-10-26 01:24:32 +00:00
Douglas Bagnall
d7481f94e0 util/charset/torture: test convert_string_talloc with emptyish strings
because it wasn't entirely obvious (a zero length string returns a
length 1 result).

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2023-10-26 01:24:32 +00:00
Douglas Bagnall
b5a728e81e util/convert string: remove inaccurate misspelt comment
Previous commit to the "embarrassing" line was ce10a7a673 "Fix
typo in comment", which did not completely fix the typo in the
comment.

But there are no gotos anymore, so no embarrassment, however spelt.

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2023-10-26 01:24:32 +00:00
Douglas Bagnall
df8ab7edfa util/charset: disambiguate docs for convert_string twins
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2023-10-26 01:24:32 +00:00
Douglas Bagnall
7cf4efe768 lib/util/charset: @param typos
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2023-10-26 01:24:32 +00:00
Douglas Bagnall
e4da279b1c util/str: helper to check for utf-8 validity
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2023-09-26 23:45:36 +00:00
Joseph Sutton
dd2b568721 lib:charset: Fix code spelling
Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2023-09-11 02:42:41 +00:00
Joseph Sutton
355fd3c7bf lib:charset: Update NUM_CHARSETS to reflect true value
CH_DISPLAY was removed in commit
125a2ff262, but NUM_CHARSETS was not
updated to match.

By assigning to NUM_CHARSETS the last enumeration value in charset_t, we
guard against its falling out of sync again.

Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2023-08-08 04:39:37 +00:00
Andreas Schneider
cfa53c8a80 lib:util: Fix code spelling
Signed-off-by: Andreas Schneider <asn@samba.org>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
2023-04-14 05:25:33 +00:00
Volker Lendecke
7fe12e79f9 lib: Fix a typo
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
2022-08-26 18:54:37 +00:00
Volker Lendecke
4171736339 lib: Stay ASCII-compatible for toupper_m/tolower_m
This is an alternative patch for MR2339: It seems that Windows AD in
turkish locale is ASCII-compatible with 'i'. Björn tells me that the
turkish locale is the only one where upper/lower casing letters in the
ASCII range is not compatible to ASCII.

Simplify our code by not calling the locale-specific standard
toupper/tolower for the ASCII range but rely on our tables.

Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Alexander Bokovoy <ab@samba.org>
Reviewed-by: Andreas Schneider <asn@samba.org>

Autobuild-User(master): Volker Lendecke <vl@samba.org>
Autobuild-Date(master): Mon Apr  4 11:45:24 UTC 2022 on sn-devel-184
2022-04-04 11:45:24 +00:00
Alex Richardson
2564e96e83 charset_macosxfs.c: fix compilation on macOS
The DEBUG macro was missing and the CFStringGetBytes() was triggering a
-Werror,-Wpointer-sign build failure.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14862

Signed-off-by: Alex Richardson <Alexander.Richardson@cl.cam.ac.uk>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
2021-10-13 01:42:35 +00:00
Douglas Bagnall
4711ad9e81 util/charset: warn loudly on unexpected E2BIG
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Jeremy Allison <jra@samba.org>

Autobuild-User(master): Jeremy Allison <jra@samba.org>
Autobuild-Date(master): Fri Jun 18 04:27:17 UTC 2021 on sn-devel-184
2021-06-18 04:27:16 +00:00
Douglas Bagnall
1ea1816629 util/iconv: reject improperly packed UTF-8
If we allow a string that encodes say '\0' as a multi-byte sequence,
we are open to confusion where we mix NUL terminated strings with
sized data blobs, which is to say EVERYWHERE.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14684

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Jeremy Allison <jra@samba.org>
2021-06-18 03:39:28 +00:00
Volker Lendecke
1c2460a87e lib: Fix 'charset' dependencies
With this, 'charset' could be a SAMBA_LIBRARY without any undefined symbols

Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>

Autobuild-User(master): Jeremy Allison <jra@samba.org>
Autobuild-Date(master): Tue Jan 12 01:19:26 UTC 2021 on sn-devel-184
2021-01-12 01:19:26 +00:00
Volker Lendecke
1701041d53 lib: Avoid "includes.h" in lib/util/charset/
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
2021-01-12 00:10:30 +00:00
Volker Lendecke
9de2c2c12d lib: Remove using talloc_stack from lib/util/charset/
'charset' should be as standalone as possible, and for this one use
talloc_stackframe() is not really necessary.

Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
2021-01-12 00:10:30 +00:00
Volker Lendecke
8b5eda7535 lib: Move utf16_len[_n]() to lib/util/charset/
util_unistr.c references it, avoid broken dependencies

Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
2021-01-12 00:10:30 +00:00
Volker Lendecke
3d0e55b6d9 build: Move weird.c and charset_macosxfs.c to ICONV_WRAPPER
iconv.c directly references them, it does not make sense to have it
without them.

Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
2021-01-12 00:10:30 +00:00
Volker Lendecke
8c02ebdbf8 lib: Simplify "weird" charset code
Don't depend on DEBUG. This is a pure developer module, the developer
should be able to figure out what's going on after this has abort()ed.

Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
2021-01-12 00:10:30 +00:00
Volker Lendecke
8f08390c28 lib: Move ucs2_align() to 'charset' subsystem
Fix a circular dependency: util_str_common.c depends on 'charset',
which depends on util_str_common.c. Fix that.

Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
2021-01-12 00:10:30 +00:00
Volker Lendecke
41e1b34026 lib: Use hex_byte() in ucs2hex_pull()
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Ralph Boehme <slow@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
2021-01-08 20:31:33 +00:00
Matthew DeVore
232054c09b lib/util: remove extra safe_string.h file
lib/util/safe_string.h is similar to source3/include/safe_string.h, but
the former has fewer checks. It is missing bcopy, strcasecmp, and
strncasecmp.

Add the missing elements to lib/util/safe_string.h remove the other
safe_string.h which is in the source3-specific path. To accomodate
existing uses of str(n?)casecmp, add #undef lines to source files where
they are used.

Signed-off-by: Matthew DeVore <matvore@google.com>
Reviewed-by: David Mulder <dmulder@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>

Autobuild-User(master): Jeremy Allison <jra@samba.org>
Autobuild-Date(master): Fri Aug 28 02:18:40 UTC 2020 on sn-devel-184
2020-08-28 02:18:40 +00:00
Ralph Boehme
276d280d27 lib/util: add talloc_alpha_strcpy()
Signed-off-by: Ralph Boehme <slow@samba.org>
Reviewed-by: Andreas Schneider <asn@samba.org>
2020-02-06 10:17:42 +00:00
Andrew Bartlett
34a8cee348 CVE-2019-14907 lib/util: Do not print the failed to convert string into the logs
The string may be in another charset, or may be sensitive and
certainly may not be terminated.  It is not safe to just print.

Found by Robert Święcki using a fuzzer he wrote for smbd.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14208
Signed-off-by: Andrew Bartlett <abartlet@samba.org>
2020-01-21 10:11:39 +00:00
Swen Schillig
84e519f365 util: Free memory in charset torture test to satisfy sanitizer
Signed-off-by: Swen Schillig <swen@linux.ibm.com>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
Reviewed-by: Matthias Dieter Wallnöfer <mdw@samba.org>
2019-08-08 10:08:32 +00:00
Ralph Boehme
2a90202052 charset: add tests for Unicode NFC <-> NFD conversion
Signed-off-by: Ralph Boehme <slow@samba.org>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>

Autobuild-User(master): Andrew Bartlett <abartlet@samba.org>
Autobuild-Date(master): Wed Aug  7 07:25:39 UTC 2019 on sn-devel-184
2019-08-07 07:25:39 +00:00
Ralph Boehme
107020793c charset: add support for Unicode normalisation with libicu
This adds a direct conversion hook using libicu to perform NFC <-> NFD
conversion on UTF8 strings. The defined charset strings are "UTF8-NFC" and
"UTF8-NFD", to convert from one to the other the caller calls smb_iconv_open()
with the desired source and target charsets, eg

  smb_iconv_open("UTF8-NFD", "UTF8-NFC");

for converting from NFC to NFD.

Signed-off-by: Ralph Boehme <slow@samba.org>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2019-08-07 06:07:28 +00:00