IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
If these are truly unicode codepoints (< ~2m) there is no overflow,
but the type is defined as uint32_t.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=15625
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
We now test cases:
1. where the first string compares less
2. one of the strings ends before the other
3. the strings differ on a character other than the first.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=15625
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
strncasecmp_m is supposed to return a negative, zero, or positive
number, not necessarily the difference between the codepoints in
the first character that differs, which we have been asserting up to
now.
This fixes a knownfail on 32 bit.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=15625
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
strcasecmp_m is supposed to return a negative, zero, or positive
number, depending on whether the first argument is less than, equal to,
or greater than the second argument (respectively).
We have been asserting that it returns exactly the difference between
the codepoints in the first character that differs.
This fixes a knownfail on 32 bit.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=15625
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
A u16string is supposed to contain UTF‐16 code units, but
ndr_pull_u16string() and ndr_push_u16string() fail to correctly ensure
this on big‐endian systems. Code that relies on the u16string array
containing correct values will then fail.
Fix ndr_pull_u16string() and ndr_push_u16string() to work on big‐endian
systems, ensuring that other code can use these strings without having
to worry about first encoding them to little‐endian.
Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
This is in line with ‘talloc_str[n]dup()’.
Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
This function returns the length in bytes — at most ‘n’ — of a UTF‐16
string excluding the null terminator.
Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
Change ‘<’ to ‘<=’ so that we check the final UTF‐16 code unit in our
search for the null terminator. This makes no difference to the result:
if we’ve reached the final code unit without finding a terminator, the
final code unit will be included in the length whether it is a null
terminator or not.
Why make this change? We’re about to factor out this loop into a new
function, utf16_len_n(), where including the final code unit *will*
matter.
Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
This function returns the length in bytes of a UTF‐16 string excluding
the null terminator.
Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
The new name indicates that — contrary to functions such as strnlen() —
the length may include the terminator.
Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
The new name indicates that — contrary to functions such as strnlen() —
the length may include the terminator.
Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
Consider the non-conforment utf-8 sequence "\xf5\x80\x80\x80", which
would encode 0x140000. We would set the high byte of the first
surrogate to 0xd8 | (0x130000 >> 18), or 0xdc, which is an invalid
start for a high surrogate, making the sequence as a whole invalid (as
you would expect -- the Unicode range was set precisely to that
covered by utf-16 surrogates).
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
At present, if we meet a string like "hello \xed\xa7\x96 world", the
bytes in the middle will be converted into half of a surrogate pair,
and the UTF-16 will be invalid. It is better to error out immediately,
because the UTF-8 string is already invalid.
https://learn.microsoft.com/en-us/windows/win32/api/Stringapiset/nf-stringapiset-widechartomultibyte#remarks
is a citation for the statement about this being a pre-Vista
problem.
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
because it wasn't entirely obvious (a zero length string returns a
length 1 result).
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
Previous commit to the "embarrassing" line was ce10a7a673 "Fix
typo in comment", which did not completely fix the typo in the
comment.
But there are no gotos anymore, so no embarrassment, however spelt.
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
CH_DISPLAY was removed in commit
125a2ff262, but NUM_CHARSETS was not
updated to match.
By assigning to NUM_CHARSETS the last enumeration value in charset_t, we
guard against its falling out of sync again.
Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
This is an alternative patch for MR2339: It seems that Windows AD in
turkish locale is ASCII-compatible with 'i'. Björn tells me that the
turkish locale is the only one where upper/lower casing letters in the
ASCII range is not compatible to ASCII.
Simplify our code by not calling the locale-specific standard
toupper/tolower for the ASCII range but rely on our tables.
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Alexander Bokovoy <ab@samba.org>
Reviewed-by: Andreas Schneider <asn@samba.org>
Autobuild-User(master): Volker Lendecke <vl@samba.org>
Autobuild-Date(master): Mon Apr 4 11:45:24 UTC 2022 on sn-devel-184
The DEBUG macro was missing and the CFStringGetBytes() was triggering a
-Werror,-Wpointer-sign build failure.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14862
Signed-off-by: Alex Richardson <Alexander.Richardson@cl.cam.ac.uk>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Jeremy Allison <jra@samba.org>
Autobuild-User(master): Jeremy Allison <jra@samba.org>
Autobuild-Date(master): Fri Jun 18 04:27:17 UTC 2021 on sn-devel-184
If we allow a string that encodes say '\0' as a multi-byte sequence,
we are open to confusion where we mix NUL terminated strings with
sized data blobs, which is to say EVERYWHERE.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14684
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Jeremy Allison <jra@samba.org>
With this, 'charset' could be a SAMBA_LIBRARY without any undefined symbols
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
Autobuild-User(master): Jeremy Allison <jra@samba.org>
Autobuild-Date(master): Tue Jan 12 01:19:26 UTC 2021 on sn-devel-184
'charset' should be as standalone as possible, and for this one use
talloc_stackframe() is not really necessary.
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
iconv.c directly references them, it does not make sense to have it
without them.
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
Don't depend on DEBUG. This is a pure developer module, the developer
should be able to figure out what's going on after this has abort()ed.
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
Fix a circular dependency: util_str_common.c depends on 'charset',
which depends on util_str_common.c. Fix that.
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
lib/util/safe_string.h is similar to source3/include/safe_string.h, but
the former has fewer checks. It is missing bcopy, strcasecmp, and
strncasecmp.
Add the missing elements to lib/util/safe_string.h remove the other
safe_string.h which is in the source3-specific path. To accomodate
existing uses of str(n?)casecmp, add #undef lines to source files where
they are used.
Signed-off-by: Matthew DeVore <matvore@google.com>
Reviewed-by: David Mulder <dmulder@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
Autobuild-User(master): Jeremy Allison <jra@samba.org>
Autobuild-Date(master): Fri Aug 28 02:18:40 UTC 2020 on sn-devel-184
The string may be in another charset, or may be sensitive and
certainly may not be terminated. It is not safe to just print.
Found by Robert Święcki using a fuzzer he wrote for smbd.
BUG: https://bugzilla.samba.org/show_bug.cgi?id=14208
Signed-off-by: Andrew Bartlett <abartlet@samba.org>
Signed-off-by: Ralph Boehme <slow@samba.org>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
Autobuild-User(master): Andrew Bartlett <abartlet@samba.org>
Autobuild-Date(master): Wed Aug 7 07:25:39 UTC 2019 on sn-devel-184
This adds a direct conversion hook using libicu to perform NFC <-> NFD
conversion on UTF8 strings. The defined charset strings are "UTF8-NFC" and
"UTF8-NFD", to convert from one to the other the caller calls smb_iconv_open()
with the desired source and target charsets, eg
smb_iconv_open("UTF8-NFD", "UTF8-NFC");
for converting from NFC to NFD.
Signed-off-by: Ralph Boehme <slow@samba.org>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>