1
0
mirror of https://github.com/samba-team/samba.git synced 2025-01-07 17:18:11 +03:00
Commit Graph

266 Commits

Author SHA1 Message Date
Volker Lendecke
edc1f99ffa lib: Move some R/W "data" segment to R/O "text"
Doesn't really matter for tests, but I just came across it.

Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Martin Schwenke <martin@meltin.net>
2024-12-02 04:53:33 +00:00
Earl Chew
1655413f12 Describe implication of upstream ICU-22610
Add commentary to link commit 86c7688 (MR !3447) to the upstream
fix for ICU-22610 in case there is subsequent breakage.

Signed-off-by: Earl Chew <earl_chew@yahoo.com>
Reviewed-by: Andreas Schneider <asn@samba.org>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>

Autobuild-User(master): Andrew Bartlett <abartlet@samba.org>
Autobuild-Date(master): Fri Nov  8 00:20:38 UTC 2024 on atb-devel-224
2024-11-08 00:20:38 +00:00
Douglas Bagnall
f914f53913 util:charset: s/the the\b/the/ in comments
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Volker Lendecke <vl@samba.org>
2024-11-06 10:57:35 +00:00
Joseph Sutton
228dd73cae util:charset: Remove unreachable code (CID 1272948)
Suppose that ‘slen’ is equal to (size_t)-1. A few lines up, we had:

    if (lastp != 0) goto slow_path;

Therefore, ‘lastp’ must evaluate to false.

Now suppose that ‘slen’ is not equal to (size_t)-1. In that case, we
would have executed:

    if (slen != 0) goto slow_path;

Therefore, ‘slen’ must evaluate to false.

Consequently, this code can be seen to be unreachable.

Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
2024-08-28 04:24:39 +00:00
Volker Lendecke
a8405ed15b lib: Remove unused strnrchr_w
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Andreas Schneider <asn@samba.org>
2024-07-04 15:26:36 +00:00
Douglas Bagnall
f9797950fd util:charset: strncasecmp_ldb avoids iconv for ASCII
This is a common case, and we can save a bit of work.

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2024-05-22 23:12:32 +00:00
Douglas Bagnall
55397514db util:charset: strncasecmp_ldb degrades to ASCII strncasecmp
If strncasecmp_ldb() encounters invalid utf-8 bytes, it compares those
as greater than any valid bytes (that is, it sorts them to the end of
the list).

If an invalid sequence is encountered in both strings at once, the
rest of the strings are now compared using the default ldb_comparison_fold
rules, as implemented in ldb_comparison_fold_ascii(). That is, each
byte is compared individually, [a-z] are translated to [A-Z], and runs of
spaces are collapsed into single spaces.

There is no perfect answer in this case, but this solution is stable,
fine-grained, and probably close to what is expected. This
byte-by-byte comparison is equivalent to a utf-8 comparison without
case-folding of multibyte codes.

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2024-05-22 23:12:32 +00:00
Douglas Bagnall
eb91e3437b util:charset: add strncasecmp_ldb()
This is a function for comparing strings in a way that suits a
case-insenstive syntaxes in LDB.

We have it here, rahter than in LDB itself, because it needs the
upcase table. By default uses ASCII-only comparisons. SSSD and
OpenChange use it in that configuration, but Samba replaces the
comparison and casefold functions with Unicode aware versions.

Until now Samba has done that in a bad way; this will allow it to do
better.

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2024-05-22 23:12:32 +00:00
Douglas Bagnall
f9fbc7a506 lib/util/charset: be explicit about INVALID_CODEPOINT value
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2024-05-22 23:12:32 +00:00
Earl Chew
68a1200f66 Restore empty string default for conf.env['icu-libs']
The reworked ICU libraries configuration code used [] as
default for conf.env['icu-libs']. This breaks dependency analysis
in samba_deps.py because SAMBA_SUBSYSTEM() expects deps to be
a string.

Signed-off-by: Earl Chew <earl_chew@yahoo.com>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Volker Lendecke <vl@samba.org>

Autobuild-User(master): Andreas Schneider <asn@cryptomilk.org>
Autobuild-Date(master): Tue May 14 14:44:06 UTC 2024 on atb-devel-224
2024-05-14 14:44:06 +00:00
Earl Chew
05807488fd Combine ICU libraries icu-i18n and icu-uc into a single dependency
Rather than probing for icu-i18n, icu-uc, and icudata libraries
separately, only probe for icu-i18n, and icu-uc, as direct dependencies
This avoids overlinking with icudata, and allows the package
to build even when ICU is not installed as a system library.

RN: Only use icu-i18n and icu-uc to express ICU dependency

BUG: https://bugzilla.samba.org/show_bug.cgi?id=15623

Signed-off-by: Earl Chew <earl_chew@yahoo.com>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
2024-05-10 00:26:35 +00:00
Earl Chew
363c331857 Augment library_flags() to return libraries
Extend library_flags() to return the libraries provided by
pkg-config --libs.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=15623

Signed-off-by: Earl Chew <earl_chew@yahoo.com>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
2024-05-10 00:26:35 +00:00
Douglas Bagnall
13af2cb021 lib:util: codepoint_cmpi: be transitive and case-insensitive
the less/greater conparisons were not case-sensitive, which made the whole
function non-transitive.

I think codepoint_cmpi() is currently only used for equality tests, so
nothing will change.

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2024-05-07 23:25:35 +00:00
Douglas Bagnall
310d59c7cc lib:util:tests: more tests for codepoint_cmpi
is codepoint_cmpi as case-insensitive as it claims when it comes to
inequalities? (no, it is not!).

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2024-05-07 23:25:35 +00:00
Douglas Bagnall
997b72d79e util: charset:util_str: use NUMERIC_CMP in strncasecmp_m_handle
BUG: https://bugzilla.samba.org/show_bug.cgi?id=15625

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2024-04-10 22:56:33 +00:00
Douglas Bagnall
f07ae69907 util:charset:codepoints: codepoint_cmpi warning about non-transitivity
BUG: https://bugzilla.samba.org/show_bug.cgi?id=15625

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2024-04-10 22:56:33 +00:00
Douglas Bagnall
675fdeee3d util:charset:codepoints: condepoint_cmpi uses NUMERIC_CMP()
If these are truly unicode codepoints (< ~2m) there is no overflow,
but the type is defined as uint32_t.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=15625

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2024-04-10 22:56:33 +00:00
Douglas Bagnall
f788a39999 util:charset:util_str: use NUMERIC_CMP in strcasecmp_m_handle
BUG: https://bugzilla.samba.org/show_bug.cgi?id=15625

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2024-04-10 22:56:33 +00:00
Douglas Bagnall
a512759d7b torture:charset: test more of strcasecmp_m
We now test cases:

1. where the first string compares less
2. one of the strings ends before the other
3. the strings differ on a character other than the first.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=15625

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2024-04-10 22:56:33 +00:00
Douglas Bagnall
dda0bb6fc7 torture:charset: use < and > assertions for strncasecmp_m
strncasecmp_m is supposed to return a negative, zero, or positive
number, not necessarily the difference between the codepoints in
the first  character that differs, which we have been asserting up to
now.

This fixes a knownfail on 32 bit.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=15625

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2024-04-10 22:56:33 +00:00
Douglas Bagnall
ac0a8cd92c torture:charset: use < and > assertions for strcasecmp_m
strcasecmp_m is supposed to return a negative, zero, or positive
number, depending on whether the first argument is less than, equal to,
or greater than the second argument (respectively).

We have been asserting that it returns exactly the difference between
the codepoints in the first character that differs.

This fixes a knownfail on 32 bit.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=15625

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2024-04-10 22:56:33 +00:00
Joseph Sutton
346844b730 librpc: Change type of ‘u16string’ from ‘const uint16_t *’ to ‘const unsigned char *’
A u16string is supposed to contain UTF‐16 code units, but
ndr_pull_u16string() and ndr_push_u16string() fail to correctly ensure
this on big‐endian systems. Code that relies on the u16string array
containing correct values will then fail.

Fix ndr_pull_u16string() and ndr_push_u16string() to work on big‐endian
systems, ensuring that other code can use these strings without having
to worry about first encoding them to little‐endian.

Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2023-12-21 23:48:46 +00:00
Joseph Sutton
1947bd6d6d util/charset: Remove trailing whitespace
Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2023-12-08 02:28:33 +00:00
Joseph Sutton
4629fc7c61 util/charset: Have talloc_utf16_str[n]dup() accept NULL pointers
This is in line with ‘talloc_str[n]dup()’.

Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2023-11-20 21:50:32 +00:00
Joseph Sutton
939ceb233e util/charset: Add talloc_utf16_str[n]dup()
Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2023-11-16 05:18:36 +00:00
Joseph Sutton
b6ff89f6fb util/charset: Include missing headers
Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2023-11-16 05:18:36 +00:00
Joseph Sutton
3f0809f1ee util/charset: Remove unnecessary cast
Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2023-11-16 05:18:36 +00:00
Joseph Sutton
ec3e420840 util/charset: Prefer PULL_LE_U16() to older SVAL() macro
Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2023-11-15 22:07:36 +00:00
Joseph Sutton
99e0a0f21a util/charset/tests: Add tests for UTF‐16 string length functions
Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2023-11-15 22:07:36 +00:00
Joseph Sutton
a46746381b util/charset: Add utf16_len_n()
This function returns the length in bytes — at most ‘n’ — of a UTF‐16
string excluding the null terminator.

Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2023-11-15 22:07:36 +00:00
Joseph Sutton
74a5a3b74e util/charset: Include final UTF‐16 code unit in length calculation loop
Change ‘<’ to ‘<=’ so that we check the final UTF‐16 code unit in our
search for the null terminator. This makes no difference to the result:
if we’ve reached the final code unit without finding a terminator, the
final code unit will be included in the length whether it is a null
terminator or not.

Why make this change? We’re about to factor out this loop into a new
function, utf16_len_n(), where including the final code unit *will*
matter.

Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2023-11-15 22:07:36 +00:00
Joseph Sutton
516f35b5a1 util/charset: Add utf16_len()
This function returns the length in bytes of a UTF‐16 string excluding
the null terminator.

Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2023-11-15 22:07:36 +00:00
Joseph Sutton
16996d145b util/charset: Rename utf16_len() to utf16_null_terminated_len()
The new name indicates that — contrary to functions such as strnlen() —
the length may include the terminator.

Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2023-11-15 22:07:36 +00:00
Joseph Sutton
542e5a3039 util/charset: Rename utf16_len_n() to utf16_null_terminated_len_n()
The new name indicates that — contrary to functions such as strnlen() —
the length may include the terminator.

Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2023-11-15 22:07:36 +00:00
Joseph Sutton
982238e914 util/charset: Remove trailing whitespace
Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2023-11-15 22:07:36 +00:00
Douglas Bagnall
3960eabca7 libutil/iconv: avoid overflow in surrogate pairs
Consider the non-conforment utf-8 sequence "\xf5\x80\x80\x80", which
would encode 0x140000. We would set the high byte of the first
surrogate to 0xd8 | (0x130000 >> 18), or 0xdc, which is an invalid
start for a high surrogate, making the sequence as a whole invalid (as
you would expect -- the Unicode range was set precisely to that
covered by utf-16 surrogates).

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2023-10-26 01:24:32 +00:00
Douglas Bagnall
949fe57077 libutil/iconv: don't allow wtf-8 surrogate pairs
At present, if we meet a string like "hello \xed\xa7\x96 world", the
bytes in the middle will be converted into half of a surrogate pair,
and the UTF-16 will be invalid. It is better to error out immediately,
because the UTF-8 string is already invalid.

https://learn.microsoft.com/en-us/windows/win32/api/Stringapiset/nf-stringapiset-widechartomultibyte#remarks
is a citation for the statement about this being a pre-Vista
problem.

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2023-10-26 01:24:32 +00:00
Douglas Bagnall
d7481f94e0 util/charset/torture: test convert_string_talloc with emptyish strings
because it wasn't entirely obvious (a zero length string returns a
length 1 result).

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2023-10-26 01:24:32 +00:00
Douglas Bagnall
b5a728e81e util/convert string: remove inaccurate misspelt comment
Previous commit to the "embarrassing" line was ce10a7a673 "Fix
typo in comment", which did not completely fix the typo in the
comment.

But there are no gotos anymore, so no embarrassment, however spelt.

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2023-10-26 01:24:32 +00:00
Douglas Bagnall
df8ab7edfa util/charset: disambiguate docs for convert_string twins
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2023-10-26 01:24:32 +00:00
Douglas Bagnall
7cf4efe768 lib/util/charset: @param typos
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2023-10-26 01:24:32 +00:00
Douglas Bagnall
e4da279b1c util/str: helper to check for utf-8 validity
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2023-09-26 23:45:36 +00:00
Joseph Sutton
dd2b568721 lib:charset: Fix code spelling
Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2023-09-11 02:42:41 +00:00
Joseph Sutton
355fd3c7bf lib:charset: Update NUM_CHARSETS to reflect true value
CH_DISPLAY was removed in commit
125a2ff262, but NUM_CHARSETS was not
updated to match.

By assigning to NUM_CHARSETS the last enumeration value in charset_t, we
guard against its falling out of sync again.

Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2023-08-08 04:39:37 +00:00
Andreas Schneider
cfa53c8a80 lib:util: Fix code spelling
Signed-off-by: Andreas Schneider <asn@samba.org>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
2023-04-14 05:25:33 +00:00
Volker Lendecke
7fe12e79f9 lib: Fix a typo
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
2022-08-26 18:54:37 +00:00
Volker Lendecke
4171736339 lib: Stay ASCII-compatible for toupper_m/tolower_m
This is an alternative patch for MR2339: It seems that Windows AD in
turkish locale is ASCII-compatible with 'i'. Björn tells me that the
turkish locale is the only one where upper/lower casing letters in the
ASCII range is not compatible to ASCII.

Simplify our code by not calling the locale-specific standard
toupper/tolower for the ASCII range but rely on our tables.

Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Alexander Bokovoy <ab@samba.org>
Reviewed-by: Andreas Schneider <asn@samba.org>

Autobuild-User(master): Volker Lendecke <vl@samba.org>
Autobuild-Date(master): Mon Apr  4 11:45:24 UTC 2022 on sn-devel-184
2022-04-04 11:45:24 +00:00
Alex Richardson
2564e96e83 charset_macosxfs.c: fix compilation on macOS
The DEBUG macro was missing and the CFStringGetBytes() was triggering a
-Werror,-Wpointer-sign build failure.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14862

Signed-off-by: Alex Richardson <Alexander.Richardson@cl.cam.ac.uk>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
2021-10-13 01:42:35 +00:00
Douglas Bagnall
4711ad9e81 util/charset: warn loudly on unexpected E2BIG
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Jeremy Allison <jra@samba.org>

Autobuild-User(master): Jeremy Allison <jra@samba.org>
Autobuild-Date(master): Fri Jun 18 04:27:17 UTC 2021 on sn-devel-184
2021-06-18 04:27:16 +00:00
Douglas Bagnall
1ea1816629 util/iconv: reject improperly packed UTF-8
If we allow a string that encodes say '\0' as a multi-byte sequence,
we are open to confusion where we mix NUL terminated strings with
sized data blobs, which is to say EVERYWHERE.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14684

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Jeremy Allison <jra@samba.org>
2021-06-18 03:39:28 +00:00