samba-mirror

mirror of https://github.com/samba-team/samba.git synced 2025-01-11 05:18:09 +03:00

Author	SHA1	Message	Date
Douglas Bagnall	997b72d79e	util: charset:util_str: use NUMERIC_CMP in strncasecmp_m_handle BUG: https://bugzilla.samba.org/show_bug.cgi?id=15625 Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz> Reviewed-by: Andrew Bartlett <abartlet@samba.org>	2024-04-10 22:56:33 +00:00
Douglas Bagnall	f07ae69907	util:charset:codepoints: codepoint_cmpi warning about non-transitivity BUG: https://bugzilla.samba.org/show_bug.cgi?id=15625 Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz> Reviewed-by: Andrew Bartlett <abartlet@samba.org>	2024-04-10 22:56:33 +00:00
Douglas Bagnall	675fdeee3d	util:charset:codepoints: condepoint_cmpi uses NUMERIC_CMP() If these are truly unicode codepoints (< ~2m) there is no overflow, but the type is defined as uint32_t. BUG: https://bugzilla.samba.org/show_bug.cgi?id=15625 Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz> Reviewed-by: Andrew Bartlett <abartlet@samba.org>	2024-04-10 22:56:33 +00:00
Douglas Bagnall	f788a39999	util:charset:util_str: use NUMERIC_CMP in strcasecmp_m_handle BUG: https://bugzilla.samba.org/show_bug.cgi?id=15625 Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz> Reviewed-by: Andrew Bartlett <abartlet@samba.org>	2024-04-10 22:56:33 +00:00
Douglas Bagnall	a512759d7b	torture:charset: test more of strcasecmp_m We now test cases: 1. where the first string compares less 2. one of the strings ends before the other 3. the strings differ on a character other than the first. BUG: https://bugzilla.samba.org/show_bug.cgi?id=15625 Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz> Reviewed-by: Andrew Bartlett <abartlet@samba.org>	2024-04-10 22:56:33 +00:00
Douglas Bagnall	dda0bb6fc7	torture:charset: use < and > assertions for strncasecmp_m strncasecmp_m is supposed to return a negative, zero, or positive number, not necessarily the difference between the codepoints in the first character that differs, which we have been asserting up to now. This fixes a knownfail on 32 bit. BUG: https://bugzilla.samba.org/show_bug.cgi?id=15625 Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz> Reviewed-by: Andrew Bartlett <abartlet@samba.org>	2024-04-10 22:56:33 +00:00
Douglas Bagnall	ac0a8cd92c	torture:charset: use < and > assertions for strcasecmp_m strcasecmp_m is supposed to return a negative, zero, or positive number, depending on whether the first argument is less than, equal to, or greater than the second argument (respectively). We have been asserting that it returns exactly the difference between the codepoints in the first character that differs. This fixes a knownfail on 32 bit. BUG: https://bugzilla.samba.org/show_bug.cgi?id=15625 Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz> Reviewed-by: Andrew Bartlett <abartlet@samba.org>	2024-04-10 22:56:33 +00:00
Joseph Sutton	346844b730	librpc: Change type of ‘u16string’ from ‘const uint16_t *’ to ‘const unsigned char *’ A u16string is supposed to contain UTF‐16 code units, but ndr_pull_u16string() and ndr_push_u16string() fail to correctly ensure this on big‐endian systems. Code that relies on the u16string array containing correct values will then fail. Fix ndr_pull_u16string() and ndr_push_u16string() to work on big‐endian systems, ensuring that other code can use these strings without having to worry about first encoding them to little‐endian. Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz> Reviewed-by: Andrew Bartlett <abartlet@samba.org>	2023-12-21 23:48:46 +00:00
Joseph Sutton	1947bd6d6d	util/charset: Remove trailing whitespace Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz> Reviewed-by: Andrew Bartlett <abartlet@samba.org>	2023-12-08 02:28:33 +00:00
Joseph Sutton	4629fc7c61	util/charset: Have talloc_utf16_str[n]dup() accept NULL pointers This is in line with ‘talloc_str[n]dup()’. Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz> Reviewed-by: Andrew Bartlett <abartlet@samba.org>	2023-11-20 21:50:32 +00:00
Joseph Sutton	939ceb233e	util/charset: Add talloc_utf16_str[n]dup() Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz> Reviewed-by: Andrew Bartlett <abartlet@samba.org>	2023-11-16 05:18:36 +00:00
Joseph Sutton	b6ff89f6fb	util/charset: Include missing headers Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz> Reviewed-by: Andrew Bartlett <abartlet@samba.org>	2023-11-16 05:18:36 +00:00
Joseph Sutton	3f0809f1ee	util/charset: Remove unnecessary cast Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz> Reviewed-by: Andrew Bartlett <abartlet@samba.org>	2023-11-16 05:18:36 +00:00
Joseph Sutton	ec3e420840	util/charset: Prefer PULL_LE_U16() to older SVAL() macro Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz> Reviewed-by: Andrew Bartlett <abartlet@samba.org>	2023-11-15 22:07:36 +00:00
Joseph Sutton	99e0a0f21a	util/charset/tests: Add tests for UTF‐16 string length functions Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz> Reviewed-by: Andrew Bartlett <abartlet@samba.org>	2023-11-15 22:07:36 +00:00
Joseph Sutton	a46746381b	util/charset: Add utf16_len_n() This function returns the length in bytes — at most ‘n’ — of a UTF‐16 string excluding the null terminator. Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz> Reviewed-by: Andrew Bartlett <abartlet@samba.org>	2023-11-15 22:07:36 +00:00
Joseph Sutton	74a5a3b74e	util/charset: Include final UTF‐16 code unit in length calculation loop Change ‘<’ to ‘<=’ so that we check the final UTF‐16 code unit in our search for the null terminator. This makes no difference to the result: if we’ve reached the final code unit without finding a terminator, the final code unit will be included in the length whether it is a null terminator or not. Why make this change? We’re about to factor out this loop into a new function, utf16_len_n(), where including the final code unit will matter. Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz> Reviewed-by: Andrew Bartlett <abartlet@samba.org>	2023-11-15 22:07:36 +00:00
Joseph Sutton	516f35b5a1	util/charset: Add utf16_len() This function returns the length in bytes of a UTF‐16 string excluding the null terminator. Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz> Reviewed-by: Andrew Bartlett <abartlet@samba.org>	2023-11-15 22:07:36 +00:00
Joseph Sutton	16996d145b	util/charset: Rename utf16_len() to utf16_null_terminated_len() The new name indicates that — contrary to functions such as strnlen() — the length may include the terminator. Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz> Reviewed-by: Andrew Bartlett <abartlet@samba.org>	2023-11-15 22:07:36 +00:00
Joseph Sutton	542e5a3039	util/charset: Rename utf16_len_n() to utf16_null_terminated_len_n() The new name indicates that — contrary to functions such as strnlen() — the length may include the terminator. Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz> Reviewed-by: Andrew Bartlett <abartlet@samba.org>	2023-11-15 22:07:36 +00:00
Joseph Sutton	982238e914	util/charset: Remove trailing whitespace Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz> Reviewed-by: Andrew Bartlett <abartlet@samba.org>	2023-11-15 22:07:36 +00:00
Douglas Bagnall	3960eabca7	libutil/iconv: avoid overflow in surrogate pairs Consider the non-conforment utf-8 sequence "\xf5\x80\x80\x80", which would encode 0x140000. We would set the high byte of the first surrogate to 0xd8 | (0x130000 >> 18), or 0xdc, which is an invalid start for a high surrogate, making the sequence as a whole invalid (as you would expect -- the Unicode range was set precisely to that covered by utf-16 surrogates). Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz> Reviewed-by: Andrew Bartlett <abartlet@samba.org>	2023-10-26 01:24:32 +00:00
Douglas Bagnall	949fe57077	libutil/iconv: don't allow wtf-8 surrogate pairs At present, if we meet a string like "hello \xed\xa7\x96 world", the bytes in the middle will be converted into half of a surrogate pair, and the UTF-16 will be invalid. It is better to error out immediately, because the UTF-8 string is already invalid. https://learn.microsoft.com/en-us/windows/win32/api/Stringapiset/nf-stringapiset-widechartomultibyte#remarks is a citation for the statement about this being a pre-Vista problem. Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz> Reviewed-by: Andrew Bartlett <abartlet@samba.org>	2023-10-26 01:24:32 +00:00
Douglas Bagnall	d7481f94e0	util/charset/torture: test convert_string_talloc with emptyish strings because it wasn't entirely obvious (a zero length string returns a length 1 result). Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz> Reviewed-by: Andrew Bartlett <abartlet@samba.org>	2023-10-26 01:24:32 +00:00
Douglas Bagnall	b5a728e81e	util/convert string: remove inaccurate misspelt comment Previous commit to the "embarrassing" line was `ce10a7a673` "Fix typo in comment", which did not completely fix the typo in the comment. But there are no gotos anymore, so no embarrassment, however spelt. Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz> Reviewed-by: Andrew Bartlett <abartlet@samba.org>	2023-10-26 01:24:32 +00:00
Douglas Bagnall	df8ab7edfa	util/charset: disambiguate docs for convert_string twins Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz> Reviewed-by: Andrew Bartlett <abartlet@samba.org>	2023-10-26 01:24:32 +00:00
Douglas Bagnall	7cf4efe768	lib/util/charset: @param typos Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz> Reviewed-by: Andrew Bartlett <abartlet@samba.org>	2023-10-26 01:24:32 +00:00
Douglas Bagnall	e4da279b1c	util/str: helper to check for utf-8 validity Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz> Reviewed-by: Andrew Bartlett <abartlet@samba.org>	2023-09-26 23:45:36 +00:00
Joseph Sutton	dd2b568721	lib:charset: Fix code spelling Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz> Reviewed-by: Andrew Bartlett <abartlet@samba.org>	2023-09-11 02:42:41 +00:00
Joseph Sutton	355fd3c7bf	lib:charset: Update NUM_CHARSETS to reflect true value CH_DISPLAY was removed in commit `125a2ff262`, but NUM_CHARSETS was not updated to match. By assigning to NUM_CHARSETS the last enumeration value in charset_t, we guard against its falling out of sync again. Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz> Reviewed-by: Andrew Bartlett <abartlet@samba.org>	2023-08-08 04:39:37 +00:00
Andreas Schneider	cfa53c8a80	lib:util: Fix code spelling Signed-off-by: Andreas Schneider <asn@samba.org> Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>	2023-04-14 05:25:33 +00:00
Volker Lendecke	7fe12e79f9	lib: Fix a typo Signed-off-by: Volker Lendecke <vl@samba.org> Reviewed-by: Jeremy Allison <jra@samba.org>	2022-08-26 18:54:37 +00:00
Volker Lendecke	4171736339	lib: Stay ASCII-compatible for toupper_m/tolower_m This is an alternative patch for MR2339: It seems that Windows AD in turkish locale is ASCII-compatible with 'i'. Björn tells me that the turkish locale is the only one where upper/lower casing letters in the ASCII range is not compatible to ASCII. Simplify our code by not calling the locale-specific standard toupper/tolower for the ASCII range but rely on our tables. Signed-off-by: Volker Lendecke <vl@samba.org> Reviewed-by: Alexander Bokovoy <ab@samba.org> Reviewed-by: Andreas Schneider <asn@samba.org> Autobuild-User(master): Volker Lendecke <vl@samba.org> Autobuild-Date(master): Mon Apr 4 11:45:24 UTC 2022 on sn-devel-184	2022-04-04 11:45:24 +00:00
Alex Richardson	2564e96e83	charset_macosxfs.c: fix compilation on macOS The DEBUG macro was missing and the CFStringGetBytes() was triggering a -Werror,-Wpointer-sign build failure. BUG: https://bugzilla.samba.org/show_bug.cgi?id=14862 Signed-off-by: Alex Richardson <Alexander.Richardson@cl.cam.ac.uk> Reviewed-by: Andrew Bartlett <abartlet@samba.org> Reviewed-by: Jeremy Allison <jra@samba.org>	2021-10-13 01:42:35 +00:00
Douglas Bagnall	4711ad9e81	util/charset: warn loudly on unexpected E2BIG Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz> Reviewed-by: Jeremy Allison <jra@samba.org> Autobuild-User(master): Jeremy Allison <jra@samba.org> Autobuild-Date(master): Fri Jun 18 04:27:17 UTC 2021 on sn-devel-184	2021-06-18 04:27:16 +00:00
Douglas Bagnall	1ea1816629	util/iconv: reject improperly packed UTF-8 If we allow a string that encodes say '\0' as a multi-byte sequence, we are open to confusion where we mix NUL terminated strings with sized data blobs, which is to say EVERYWHERE. BUG: https://bugzilla.samba.org/show_bug.cgi?id=14684 Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz> Reviewed-by: Jeremy Allison <jra@samba.org>	2021-06-18 03:39:28 +00:00
Volker Lendecke	1c2460a87e	lib: Fix 'charset' dependencies With this, 'charset' could be a SAMBA_LIBRARY without any undefined symbols Signed-off-by: Volker Lendecke <vl@samba.org> Reviewed-by: Jeremy Allison <jra@samba.org> Autobuild-User(master): Jeremy Allison <jra@samba.org> Autobuild-Date(master): Tue Jan 12 01:19:26 UTC 2021 on sn-devel-184	2021-01-12 01:19:26 +00:00
Volker Lendecke	1701041d53	lib: Avoid "includes.h" in lib/util/charset/ Signed-off-by: Volker Lendecke <vl@samba.org> Reviewed-by: Jeremy Allison <jra@samba.org>	2021-01-12 00:10:30 +00:00
Volker Lendecke	9de2c2c12d	lib: Remove using talloc_stack from lib/util/charset/ 'charset' should be as standalone as possible, and for this one use talloc_stackframe() is not really necessary. Signed-off-by: Volker Lendecke <vl@samba.org> Reviewed-by: Jeremy Allison <jra@samba.org>	2021-01-12 00:10:30 +00:00
Volker Lendecke	8b5eda7535	lib: Move utf16_len[_n]() to lib/util/charset/ util_unistr.c references it, avoid broken dependencies Signed-off-by: Volker Lendecke <vl@samba.org> Reviewed-by: Jeremy Allison <jra@samba.org>	2021-01-12 00:10:30 +00:00
Volker Lendecke	3d0e55b6d9	build: Move weird.c and charset_macosxfs.c to ICONV_WRAPPER iconv.c directly references them, it does not make sense to have it without them. Signed-off-by: Volker Lendecke <vl@samba.org> Reviewed-by: Jeremy Allison <jra@samba.org>	2021-01-12 00:10:30 +00:00
Volker Lendecke	8c02ebdbf8	lib: Simplify "weird" charset code Don't depend on DEBUG. This is a pure developer module, the developer should be able to figure out what's going on after this has abort()ed. Signed-off-by: Volker Lendecke <vl@samba.org> Reviewed-by: Jeremy Allison <jra@samba.org>	2021-01-12 00:10:30 +00:00
Volker Lendecke	8f08390c28	lib: Move ucs2_align() to 'charset' subsystem Fix a circular dependency: util_str_common.c depends on 'charset', which depends on util_str_common.c. Fix that. Signed-off-by: Volker Lendecke <vl@samba.org> Reviewed-by: Jeremy Allison <jra@samba.org>	2021-01-12 00:10:30 +00:00
Volker Lendecke	41e1b34026	lib: Use hex_byte() in ucs2hex_pull() Signed-off-by: Volker Lendecke <vl@samba.org> Reviewed-by: Ralph Boehme <slow@samba.org> Reviewed-by: Jeremy Allison <jra@samba.org>	2021-01-08 20:31:33 +00:00
Matthew DeVore	232054c09b	lib/util: remove extra safe_string.h file lib/util/safe_string.h is similar to source3/include/safe_string.h, but the former has fewer checks. It is missing bcopy, strcasecmp, and strncasecmp. Add the missing elements to lib/util/safe_string.h remove the other safe_string.h which is in the source3-specific path. To accomodate existing uses of str(n?)casecmp, add #undef lines to source files where they are used. Signed-off-by: Matthew DeVore <matvore@google.com> Reviewed-by: David Mulder <dmulder@samba.org> Reviewed-by: Jeremy Allison <jra@samba.org> Autobuild-User(master): Jeremy Allison <jra@samba.org> Autobuild-Date(master): Fri Aug 28 02:18:40 UTC 2020 on sn-devel-184	2020-08-28 02:18:40 +00:00
Ralph Boehme	276d280d27	lib/util: add talloc_alpha_strcpy() Signed-off-by: Ralph Boehme <slow@samba.org> Reviewed-by: Andreas Schneider <asn@samba.org>	2020-02-06 10:17:42 +00:00
Andrew Bartlett	34a8cee348	CVE-2019-14907 lib/util: Do not print the failed to convert string into the logs The string may be in another charset, or may be sensitive and certainly may not be terminated. It is not safe to just print. Found by Robert Święcki using a fuzzer he wrote for smbd. BUG: https://bugzilla.samba.org/show_bug.cgi?id=14208 Signed-off-by: Andrew Bartlett <abartlet@samba.org>	2020-01-21 10:11:39 +00:00
Swen Schillig	84e519f365	util: Free memory in charset torture test to satisfy sanitizer Signed-off-by: Swen Schillig <swen@linux.ibm.com> Reviewed-by: Andrew Bartlett <abartlet@samba.org> Reviewed-by: Matthias Dieter Wallnöfer <mdw@samba.org>	2019-08-08 10:08:32 +00:00
Ralph Boehme	2a90202052	charset: add tests for Unicode NFC <-> NFD conversion Signed-off-by: Ralph Boehme <slow@samba.org> Reviewed-by: Andrew Bartlett <abartlet@samba.org> Autobuild-User(master): Andrew Bartlett <abartlet@samba.org> Autobuild-Date(master): Wed Aug 7 07:25:39 UTC 2019 on sn-devel-184	2019-08-07 07:25:39 +00:00
Ralph Boehme	107020793c	charset: add support for Unicode normalisation with libicu This adds a direct conversion hook using libicu to perform NFC <-> NFD conversion on UTF8 strings. The defined charset strings are "UTF8-NFC" and "UTF8-NFD", to convert from one to the other the caller calls smb_iconv_open() with the desired source and target charsets, eg smb_iconv_open("UTF8-NFD", "UTF8-NFC"); for converting from NFC to NFD. Signed-off-by: Ralph Boehme <slow@samba.org> Reviewed-by: Andrew Bartlett <abartlet@samba.org>	2019-08-07 06:07:28 +00:00
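
Note on the NUMERIC_CMP commits (bug 15625) in the log above: they replace subtraction-style comparisons of uint32_t codepoints with a macro that only returns a sign. The following is a minimal sketch of why that matters, not the code from the tree. SKETCH_NUMERIC_CMP and the three functions are hypothetical names made up for illustration, and the example assumes a 32-bit int.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical local stand-in for the NUMERIC_CMP() macro named in the
 * log. The real definition lives in the Samba tree and may differ.
 * It yields -1, 0 or 1 and never subtracts the operands.
 */
#define SKETCH_NUMERIC_CMP(a, b) (((a) > (b)) - ((a) < (b)))

/*
 * Subtraction-style comparison. The difference is computed modulo 2^32,
 * so for operands far enough apart the sign of the result no longer
 * says which operand is smaller.
 */
static int cmp_by_subtraction(uint32_t a, uint32_t b)
{
	return (int)(a - b);
}

/* Comparison-style version: the sign is always meaningful. */
static int cmp_by_macro(uint32_t a, uint32_t b)
{
	return SKETCH_NUMERIC_CMP(a, b);
}

/* Small self-check showing the two approaches disagreeing. */
static void sketch_self_check(void)
{
	/* 0 - 0x80000001 wraps to 0x7fffffff: positive, although 0 is smaller. */
	assert(cmp_by_subtraction(0, 0x80000001u) > 0);
	/* The macro reports "less than" as expected. */
	assert(cmp_by_macro(0, 0x80000001u) < 0);
	/* Both agree for small values such as real Unicode codepoints. */
	assert(cmp_by_subtraction('a', 'b') < 0);
	assert(cmp_by_macro('a', 'b') < 0);
}
```

As the 675fdeee3d message says, genuine Unicode codepoints are too small to trigger the wraparound, so the change guards against out-of-range uint32_t values. It is also consistent with the torture-test commits, which assert only on the sign of the result rather than on an exact difference.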

252 Commits