1
0
mirror of https://github.com/samba-team/samba.git synced 2024-12-23 17:34:34 +03:00
Commit Graph

225 Commits

Author SHA1 Message Date
Douglas Bagnall
e4da279b1c util/str: helper to check for utf-8 validity
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2023-09-26 23:45:36 +00:00
Joseph Sutton
dd2b568721 lib:charset: Fix code spelling
Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2023-09-11 02:42:41 +00:00
Joseph Sutton
355fd3c7bf lib:charset: Update NUM_CHARSETS to reflect true value
CH_DISPLAY was removed in commit
125a2ff262, but NUM_CHARSETS was not
updated to match.

By assigning to NUM_CHARSETS the last enumeration value in charset_t, we
guard against its falling out of sync again.

Signed-off-by: Joseph Sutton <josephsutton@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2023-08-08 04:39:37 +00:00
Andreas Schneider
cfa53c8a80 lib:util: Fix code spelling
Signed-off-by: Andreas Schneider <asn@samba.org>
Reviewed-by: Joseph Sutton <josephsutton@catalyst.net.nz>
2023-04-14 05:25:33 +00:00
Volker Lendecke
7fe12e79f9 lib: Fix a typo
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
2022-08-26 18:54:37 +00:00
Volker Lendecke
4171736339 lib: Stay ASCII-compatible for toupper_m/tolower_m
This is an alternative patch for MR2339: It seems that Windows AD in
turkish locale is ASCII-compatible with 'i'. Björn tells me that the
turkish locale is the only one where upper/lower casing letters in the
ASCII range is not compatible to ASCII.

Simplify our code by not calling the locale-specific standard
toupper/tolower for the ASCII range but rely on our tables.

Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Alexander Bokovoy <ab@samba.org>
Reviewed-by: Andreas Schneider <asn@samba.org>

Autobuild-User(master): Volker Lendecke <vl@samba.org>
Autobuild-Date(master): Mon Apr  4 11:45:24 UTC 2022 on sn-devel-184
2022-04-04 11:45:24 +00:00
Alex Richardson
2564e96e83 charset_macosxfs.c: fix compilation on macOS
The DEBUG macro was missing and the CFStringGetBytes() was triggering a
-Werror,-Wpointer-sign build failure.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14862

Signed-off-by: Alex Richardson <Alexander.Richardson@cl.cam.ac.uk>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
2021-10-13 01:42:35 +00:00
Douglas Bagnall
4711ad9e81 util/charset: warn loudly on unexpected E2BIG
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Jeremy Allison <jra@samba.org>

Autobuild-User(master): Jeremy Allison <jra@samba.org>
Autobuild-Date(master): Fri Jun 18 04:27:17 UTC 2021 on sn-devel-184
2021-06-18 04:27:16 +00:00
Douglas Bagnall
1ea1816629 util/iconv: reject improperly packed UTF-8
If we allow a string that encodes say '\0' as a multi-byte sequence,
we are open to confusion where we mix NUL terminated strings with
sized data blobs, which is to say EVERYWHERE.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14684

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Jeremy Allison <jra@samba.org>
2021-06-18 03:39:28 +00:00
Volker Lendecke
1c2460a87e lib: Fix 'charset' dependencies
With this, 'charset' could be a SAMBA_LIBRARY without any undefined symbols

Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>

Autobuild-User(master): Jeremy Allison <jra@samba.org>
Autobuild-Date(master): Tue Jan 12 01:19:26 UTC 2021 on sn-devel-184
2021-01-12 01:19:26 +00:00
Volker Lendecke
1701041d53 lib: Avoid "includes.h" in lib/util/charset/
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
2021-01-12 00:10:30 +00:00
Volker Lendecke
9de2c2c12d lib: Remove using talloc_stack from lib/util/charset/
'charset' should be as standalone as possible, and for this one use
talloc_stackframe() is not really necessary.

Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
2021-01-12 00:10:30 +00:00
Volker Lendecke
8b5eda7535 lib: Move utf16_len[_n]() to lib/util/charset/
util_unistr.c references it, avoid broken dependencies

Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
2021-01-12 00:10:30 +00:00
Volker Lendecke
3d0e55b6d9 build: Move weird.c and charset_macosxfs.c to ICONV_WRAPPER
iconv.c directly references them, it does not make sense to have it
without them.

Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
2021-01-12 00:10:30 +00:00
Volker Lendecke
8c02ebdbf8 lib: Simplify "weird" charset code
Don't depend on DEBUG. This is a pure developer module, the developer
should be able to figure out what's going on after this has abort()ed.

Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
2021-01-12 00:10:30 +00:00
Volker Lendecke
8f08390c28 lib: Move ucs2_align() to 'charset' subsystem
Fix a circular dependency: util_str_common.c depends on 'charset',
which depends on util_str_common.c. Fix that.

Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
2021-01-12 00:10:30 +00:00
Volker Lendecke
41e1b34026 lib: Use hex_byte() in ucs2hex_pull()
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Ralph Boehme <slow@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
2021-01-08 20:31:33 +00:00
Matthew DeVore
232054c09b lib/util: remove extra safe_string.h file
lib/util/safe_string.h is similar to source3/include/safe_string.h, but
the former has fewer checks. It is missing bcopy, strcasecmp, and
strncasecmp.

Add the missing elements to lib/util/safe_string.h remove the other
safe_string.h which is in the source3-specific path. To accomodate
existing uses of str(n?)casecmp, add #undef lines to source files where
they are used.

Signed-off-by: Matthew DeVore <matvore@google.com>
Reviewed-by: David Mulder <dmulder@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>

Autobuild-User(master): Jeremy Allison <jra@samba.org>
Autobuild-Date(master): Fri Aug 28 02:18:40 UTC 2020 on sn-devel-184
2020-08-28 02:18:40 +00:00
Ralph Boehme
276d280d27 lib/util: add talloc_alpha_strcpy()
Signed-off-by: Ralph Boehme <slow@samba.org>
Reviewed-by: Andreas Schneider <asn@samba.org>
2020-02-06 10:17:42 +00:00
Andrew Bartlett
34a8cee348 CVE-2019-14907 lib/util: Do not print the failed to convert string into the logs
The string may be in another charset, or may be sensitive and
certainly may not be terminated.  It is not safe to just print.

Found by Robert Święcki using a fuzzer he wrote for smbd.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=14208
Signed-off-by: Andrew Bartlett <abartlet@samba.org>
2020-01-21 10:11:39 +00:00
Swen Schillig
84e519f365 util: Free memory in charset torture test to satisfy sanitizer
Signed-off-by: Swen Schillig <swen@linux.ibm.com>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
Reviewed-by: Matthias Dieter Wallnöfer <mdw@samba.org>
2019-08-08 10:08:32 +00:00
Ralph Boehme
2a90202052 charset: add tests for Unicode NFC <-> NFD conversion
Signed-off-by: Ralph Boehme <slow@samba.org>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>

Autobuild-User(master): Andrew Bartlett <abartlet@samba.org>
Autobuild-Date(master): Wed Aug  7 07:25:39 UTC 2019 on sn-devel-184
2019-08-07 07:25:39 +00:00
Ralph Boehme
107020793c charset: add support for Unicode normalisation with libicu
This adds a direct conversion hook using libicu to perform NFC <-> NFD
conversion on UTF8 strings. The defined charset strings are "UTF8-NFC" and
"UTF8-NFD", to convert from one to the other the caller calls smb_iconv_open()
with the desired source and target charsets, eg

  smb_iconv_open("UTF8-NFD", "UTF8-NFC");

for converting from NFC to NFD.

Signed-off-by: Ralph Boehme <slow@samba.org>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2019-08-07 06:07:28 +00:00
Noel Power
add47e288b lib/util/charset: clang: Fix Value stored to 'reason' is never read warning
Fixes:

lib/util/charset/convert_string.c:301:5: warning: Value stored to 'reason' is never read <--[clang]

Signed-off-by: Noel Power <noel.power@suse.com>
Reviewed-by: Gary Lockyer gary@catalyst.net.nz
2019-06-11 12:10:17 +00:00
Douglas Bagnall
ac9333cb91 util/charset/torture: ensure each cp850 high bytes is 3 utf8 bytes
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2019-05-15 04:03:37 +00:00
Douglas Bagnall
103d248bee util/charset/convert: do not pretend to realloc
It seems very likely that our clever attempts to dynamically realloc
the output buffer were never triggered. Two lines of reasoning lead to
this conclusion:

1. We allocate 3 * srclen to start with, but no conversion we use will
   more than that. To be precise, from 8-bit charsets we will only deal
   with codepoints in the Unicode basic multilingual plane (up to 0xFFFF).

   These can all be expressed as 3 or fewer utf-8 bytes. In UTF16 they
   are naturally 2 bytes, while in the DOS codes they are 1 byte.

   We have checked the code tables, and can not find a plausible
   (e.g. not EBCDIC) DOS code page or unix charset that is outside
   this range.  Clients cannot chose the code page, the only code
   pages we will use come from 'unix charset' and 'dos charset'
   smb.conf parameters.

   Therefore the worst that can possibly happen is we expand 1 byte into 3
   (specifically, when converting some e.g. CP850 codepoints to UTF-8).

2. If the reallocation was ever used, the results would have been
   catastrophically wrong, as the input pointer was not reset.

Therefore we skip the complication of the goto loop and let E2BIG be
just another impossible error to report.

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2019-05-15 04:03:37 +00:00
Douglas Bagnall
a6f47b4f75 util/charset/convert: when retrying, retry from the start
iconv() advances the inbuf pointer; if we decide to realloc and re-iconv,
we need to reset inbuf to the source string

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2019-05-15 04:03:37 +00:00
Douglas Bagnall
02507ebf10 util/charset/convert: do not overflow dest len in corner case
Now, if destlen were SIZE_MAX - 1, destlen * 2 would wrap to SIZE_MAX - 3,
which makes (destlen * 2 + 2) == SIZE_MAX - 1, the same number again.
So we need the <= comparison in this case.

As things stand, it is not actually possible for destlen to be
SIZE_MAX (because it is always an even number after the first round,
and the first round is constrained to be < SIZE_MAX / 2, but *if*
destlen was SIZE_MAX, destlen * 2 + 2 would be 0, so that case is OK.
Similarly the SIZE_MAX - 2 and smaller cases were covered by the
original formula.

We add the comment for people who are wondering WTF is going on with
all this destlen manipulation.

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2019-05-15 04:03:37 +00:00
Douglas Bagnall
265b3b0c6c util/charset/convert: do not overflow dest len
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2019-05-15 04:03:37 +00:00
Douglas Bagnall
09355b7855 util/charset/convert_string: always set length
In failure cases the destination string pointer is set to NULL, but
the size is not changed. Some callers have not been checking the
return value and passing the destination pointer and uninitialised
length onto other functions. We can curse and blame those callers, but
let's also keep them safe.

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2019-05-15 04:03:37 +00:00
Andreas Schneider
fe78ebcbc0 lib:util: Use C99 initializer for weird_table in charset
Signed-off-by: Andreas Schneider <asn@samba.org>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
2019-01-28 10:29:22 +01:00
Andreas Schneider
014a72c7da lib:util: Use C99 initializer for builtin_functions in iconv
Signed-off-by: Andreas Schneider <asn@samba.org>
Reviewed-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
2019-01-28 10:29:22 +01:00
Andreas Schneider
8075025303 lib:util: Use #ifdef instead of #if for config.h definitions
Signed-off-by: Andreas Schneider <asn@samba.org>
Reviewed-by: Gary Lockyer <gary@catalyst.net.nz>
2018-11-28 23:19:22 +01:00
Christof Schmitt via samba-technical
3430c9c3c2 lib:charset: Fix error messages from charset conversion
When e.g. trying to access a filename through Samba that does not adhere
to the encoding configured in 'unix charset', the log will show the
encoding problem, followed by "strstr_m: src malloc fail". The problem
is that strstr_m assumes that any failure from push/pull_ucs2_talloc is
a memory allocation problem, which is not correct.

Address this by removing the misleading messages and add a missing
message in convert_string_talloc_handle.

Signed-off-by: Christof Schmitt <cs@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
2018-07-07 13:41:09 +02:00
Douglas Bagnall
2157e8d83e util/charset/iconv: use read_hex_bytes rather than sscanf
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2018-05-31 01:57:16 +02:00
Andreas Schneider
8da568efea lib:util: Add FALL_THROUGH statements in charset/charset_macosxfs.c
Signed-off-by: Andreas Schneider <asn@samba.org>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>
2018-03-01 04:37:41 +01:00
Stefan Metzmacher
3ed9c90367 charset: fix str[n]casecmp_m() by comparing lower case values
The commits c615ebed6e3d273a682806b952d543e834e5630d^..f19ab5d334e3fb15761fb009e5de876dfc6ea785
replaced Str[n]CaseCmp() by str[n]casecmp_m().

The logic we had in str[n]casecmp_w() used to compare
the upper cased as well as the lower cased versions of the
characters and returned the difference between the lower cased versions.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=13018

Signed-off-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Ralph Boehme <slow@samba.org>

Autobuild-User(master): Ralph Böhme <slow@samba.org>
Autobuild-Date(master): Fri Sep 15 02:23:29 CEST 2017 on sn-devel-144
2017-09-15 02:23:29 +02:00
Stefan Metzmacher
9d99b640b9 charset/tests: also tests the system str[n]casecmp()
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13018

Signed-off-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Ralph Boehme <slow@samba.org>
2017-09-14 22:30:20 +02:00
Stefan Metzmacher
2a3d4fe0c9 charset/tests: add more str[n]casecmp_m() tests to demonstrate the bug
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13018

Signed-off-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Ralph Boehme <slow@samba.org>
2017-09-14 22:30:20 +02:00
Stefan Metzmacher
c18ecdecec charset/tests: assert the exact values of str[n]casecmp_m()
BUG: https://bugzilla.samba.org/show_bug.cgi?id=13018

Signed-off-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Ralph Boehme <slow@samba.org>
2017-09-14 22:30:20 +02:00
Jeremy Allison
bf8f7a36bf lib:charset: Remove use of talloc_autofree_context() for global_iconv_handle
All other callers use NULL here anyway, so there's no
need to use a special context for get_iconv_handle().

Signed-off-by: Jeremy Allison <jra@samba.org>
Reviewed-by: Andreas Schneider <asn@samba.org>
2017-04-18 11:47:17 +02:00
Jeremy Allison
35b23711e8 lib:charset: Make global_iconv_handle private
Signed-off-by: Jeremy Allison <jra@samba.org>
Reviewed-by: Andreas Schneider <asn@samba.org>
2017-04-18 11:47:17 +02:00
Jeremy Allison
c28e2c937a lib:charset: Add utility functions reinit_iconv_handle() and free_iconv_handle(void)
Not yet used. Will enable us to make global_iconv_handle private.

Signed-off-by: Jeremy Allison <jra@samba.org>
Reviewed-by: Andreas Schneider <asn@samba.org>
2017-04-18 11:47:17 +02:00
Volker Lendecke
b224b2033d lib: Avoid an includes.h
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
2017-03-28 17:45:19 +02:00
Volker Lendecke
2ad26a63c9 lib: Avoid an includes.h
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
2017-03-28 17:45:19 +02:00
Andreas Schneider
7fd3eb6c04 util:charset: Return EILSEQ in smb_iconv() if newer libc is detected
This is the behaviour of glibc 2.24 and newer.

Signed-off-by: Andreas Schneider <asn@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>

Autobuild-User(master): Jeremy Allison <jra@samba.org>
Autobuild-Date(master): Wed Feb  1 05:16:46 CET 2017 on sn-devel-144
2017-02-01 05:16:46 +01:00
Volker Lendecke
07d9a909ba lib/util/charset: Optimize next_codepoint for the ascii case
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Ralph Boehme <slow@samba.org>
2017-01-22 18:30:11 +01:00
Andreas Schneider
dce3f1fc60 util: Fix the documentation of push_utf8_talloc()
Signed-off-by: Andreas Schneider <asn@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
2016-09-09 00:32:12 +02:00
Volker Lendecke
93b982faad lib: Give base64.c its own .h
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Jeremy Allison <jra@samba.org>
2016-05-04 01:28:23 +02:00
Douglas Bagnall
538d305de9 CVE-2015-5330: next_codepoint_handle_ext: don't short-circuit UTF16 low bytes
UTF16 contains zero bytes when it is encoding ASCII (for example), so we
can't assume the absense of the 0x80 bit means a one byte encoding. No
current callers use UTF16.

Bug: https://bugzilla.samba.org/show_bug.cgi?id=11599

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Pair-programmed-with: Andrew Bartlett <abartlet@samba.org>
Reviewed-by: Ralph Boehme <slow@samba.org>
2015-12-09 17:19:53 +01:00