1
0
mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2024-12-25 23:21:26 +03:00
Commit Graph

182 Commits

Author SHA1 Message Date
Nick Wellnhofer
63ce5f9aed Make some globals const 2024-04-28 17:53:39 +02:00
Nick Wellnhofer
072facc49e encoding: Don't shrink input too early in xmlCharEncOutput
Some exotic encodings like ISO646-FR don't support '#' characters, so
encoding a character reference can actually fail. Don't skip the
offending input in this case so the error will be reported on the next
call.
2024-03-18 15:14:43 +01:00
Nick Wellnhofer
0821efc8ee encoding: Check whether encoding handlers support input/output
The "HTML" encoding handler doesn't support input which could lead to a
wrong error report.
2024-01-02 19:48:23 +01:00
Nick Wellnhofer
023aecc474 encoding: Support ASCII in xmlLookupCharEncodingHandler
Return our built-in ASCII handler. This was never implemented and
triggered the new and stricter error checks.
2023-12-13 23:58:45 +01:00
Nick Wellnhofer
bd5ad0308d encoding: Report malloc failures
Introduce new API functions that return a separate error code if a
memory allocation fails.

- xmlOpenCharEncodingHandler
- xmlLookupCharEncodingHandler

Fix a few places where malloc failures weren't reported.
2023-12-11 22:05:47 +01:00
Nick Wellnhofer
89d19534de encoding: Fix decoding of large chunks
After 95e81a36, we must support XML_ENC_ERR_SPACE when using built-in
encoding handlers.

Should fix #610.
2023-10-28 03:14:13 +02:00
Nick Wellnhofer
1734d27dca encoding: Suppress -Wcast-align warnings 2023-10-02 15:04:18 +02:00
Nick Wellnhofer
0533daf5d2 encoding: Fix infinite loop in xmlCharEncInput
Short-lived regression from 95e81a36.
2023-09-29 12:43:46 +02:00
Nick Wellnhofer
8c084ebdc7 doc: Make apibuild.py happy 2023-09-21 22:57:33 +02:00
Nick Wellnhofer
699299cae3 globals: Stop including globals.h 2023-09-20 22:07:40 +02:00
Nick Wellnhofer
7909ff08e2 include: Remove unnecessary includes
- Don't include tree.h from encoding.h
- Don't include parser.h from xmlIO.h
2023-09-20 22:06:49 +02:00
Nick Wellnhofer
507f11edf0 encoding: Remove debugging code 2023-08-16 19:50:36 +02:00
Nick Wellnhofer
95e81a360c parser: Decode all data in xmlCharEncInput
Even with flush set to true, xmlCharEncInput didn't guarantee to decode
all data. This complicated the push parser.

Remove the flush flag and always decode all available data.

Also fix ICU code where the flush flag has a different meaning. Always
set flush to false and retry even with empty input buffers.
2023-08-08 15:21:31 +02:00
Nick Wellnhofer
4ee0815514 encoding: Move rawconsumed accounting to xmlCharEncInput 2023-08-08 15:19:51 +02:00
Nick Wellnhofer
b236b7a588 parser: Halt parser when growing buffer results in OOM
Fix short-lived regression from previous commit.

It might be safer to make xmlBufSetInputBaseCur use the original buffer
even in case of errors.

Found by OSS-Fuzz.
2023-06-08 21:59:20 +02:00
Nick Wellnhofer
db21cd5db9 malloc-fail: Handle malloc failures in xmlAddEncodingAlias
Avoid memory errors if an allocation fails.

See #344. Fixes #553.
2023-06-06 14:25:30 +02:00
Nick Wellnhofer
2f12e3a938 encoding: Stop calling xmlEncodingErr
This invokes the global error handler which should be avoided.
2023-04-30 21:45:39 +02:00
Nick Wellnhofer
320f5084cd parser: Improve handling of encoding and IO errors
Make sure that xmlCharEncInput, xmlParserInputBufferPush and
xmlParserInputBufferGrow set the correct error code in the
xmlParserInputBuffer. Handle errors when calling these functions.
2023-04-30 21:31:54 +02:00
Nick Wellnhofer
3ff6abbf58 encoding: Rework error codes
Use an enum instead of magic numbers. Fix a few error codes. Simplify
handling of "space" and "partial" errors.

See #506.
2023-04-30 16:43:29 +02:00
Nick Wellnhofer
33fb297b36 encoding: Fix compiler warning in ICU build 2023-04-17 14:59:47 +02:00
Nick Wellnhofer
a6b9e55a9e encoding: Fix error code in asciiToUTF8
Use correct error code when invalid ASCII bytes are encountered.

Found by OSS-Fuzz.
2023-03-26 15:42:02 +02:00
Nick Wellnhofer
98840d40da parser: Rework EBCDIC code page detection
To detect EBCDIC code pages, we used to switch the encoding twice and
had to be very careful not to decode data after the XML declaration
before the second switch. This relied on a hard-coded expected size of
the XML declaration and was complicated and unreliable.

Now we convert the first 200 bytes to EBCDIC-US and parse the encoding
declaration manually.
2023-03-21 21:35:15 +01:00
Nick Wellnhofer
1c5e1fc194 malloc-fail: Check for malloc failure in xmlFindCharEncodingHandler
Don't return encoding handlers with a NULL name.

Found with libFuzzer, see #344.
2023-02-17 17:16:50 +01:00
Nick Wellnhofer
d18f9c1102 malloc-fail: Fix leak of xmlCharEncodingHandler
Also free handler if its name is NULL.

Found with libFuzzer, see #344.
2023-02-17 17:16:50 +01:00
Nick Wellnhofer
3cc900f098 encoding: Cast toupper argument to unsigned char
Fixes undefined behavior.

Also cast return value explicitly to fix implicit-integer-sign-change
checks.
2023-02-17 17:16:50 +01:00
Nick Wellnhofer
2355eac59e malloc-fail: Fix null deref if growing input buffer fails
Also add some error checks.

Found with libFuzzer, see #344.
2023-01-24 11:32:15 +01:00
Nick Wellnhofer
0f54af7494 encoding.c: Fix for documentation generator
Top-level macro invocations throw off the documentation parser.
2022-12-08 18:40:58 +01:00
Nick Wellnhofer
53ab38408d encoding: Make init function private 2022-11-27 02:11:07 +01:00
Nick Wellnhofer
3e9d5e4f7f encoding: Remove unused variable xmlDefaultCharEncodingHandler 2022-11-27 02:11:07 +01:00
Nick Wellnhofer
1406b20fe9 encoding: Allocate default handlers statically 2022-11-24 19:21:01 +01:00
Nick Wellnhofer
2059df5358 buf: Deprecate static/immutable buffers 2022-11-20 21:16:03 +01:00
Nick Wellnhofer
ad338ca737 Remove explicit integer casts
Remove explicit integer casts as final operation

- in assignments
- when passing arguments
- when returning values

Remove casts

- to the same type
- from certain range-bound values

The main motivation is that these explicit casts don't change the result
of operations and only render UBSan's implicit-conversion checks
useless. Removing these casts allows UBSan to detect cases where
truncation or sign-changes occur unexpectedly.

Document some explicit casts as truncating and add a few missing ones.
2022-09-01 02:33:57 +02:00
Nick Wellnhofer
0f568c0b73 Consolidate private header files
Private functions were previously declared

- in header files in the root directory
- in public headers guarded with IN_LIBXML
- in libxml.h
- redundantly in source files that used them.

Consolidate all private header files in include/private.
2022-08-26 02:11:56 +02:00
David Kilzer
c14cac8bba xmlBufAvail() should return length without including a byte for NUL terminator
* buf.c:
(xmlBufAvail):
- Return the number of bytes available in the buffer, but do not
  include a byte for the NUL terminator so that it is reserved.

* encoding.c:
(xmlCharEncFirstLineInput):
(xmlCharEncInput):
(xmlCharEncOutput):
* xmlIO.c:
(xmlOutputBufferWriteEscape):
- Remove code that subtracts 1 from the return value of
  xmlBufAvail().  It was implemented inconsistently anyway.
2022-05-25 18:25:19 -07:00
David Kilzer
21561e833a Mark more static data as const
Similar to 8f5710379, mark more static data structures with
`const` keyword.

Also fix placement of `const` in encoding.c.

Original patch by Sarah Wilkin.
2022-04-07 12:01:23 -07:00
Nick Wellnhofer
40483d0ce2 Deprecate module init and cleanup functions
These functions shouldn't be part of the public API. Most init
functions are only thread-safe when called from xmlInitParser. Global
variables should only be cleaned up by calling xmlCleanupParser.
2022-03-06 15:59:43 +01:00
Nick Wellnhofer
f2072a8b2f Fix memory leak in xmlFindCharEncodingHandler
Fix memory leak in an unlikely error condition. Thanks to Wentao Liang
for the report.

Fixes #342.
2022-03-05 18:27:12 +01:00
Nick Wellnhofer
21ddad5284 Remove ICONV_CONST test
We can simply cast the offending pointer to (void *).
2022-03-04 22:08:58 +01:00
Nick Wellnhofer
776d15d383 Don't check for standard C89 headers
Don't check for

- ctype.h
- errno.h
- float.h
- limits.h
- math.h
- signal.h
- stdarg.h
- stdlib.h
- string.h
- time.h

Stop including non-standard headers

- malloc.h
- strings.h
2022-03-02 00:43:54 +01:00
Nick Wellnhofer
b66ce0bba8 Don't include ICU headers in public headers
There's no need to make these implementation details public.
2022-03-01 13:02:49 +01:00
Nick Wellnhofer
c41bc10da3 Fix unused variable warnings with disabled features 2022-02-22 19:57:12 +01:00
Nick Wellnhofer
346c3a930c Remove elfgcchack.h
The same optimization can be enabled with -fno-semantic-interposition
since GCC 5. clang has always used this option by default.
2022-02-20 21:49:04 +01:00
Nick Wellnhofer
7abc6e6a24 Fix integer conversion warning in xmlIconvWrapper
Use size_t for return value of iconv(3) to avoid an UBSan integer
conversion warning.
2022-01-25 03:07:30 +01:00
Mohammad Razavi
eb4c1bf855 Fix random dropping of characters on dumping ASCII encoded XML
Fix a bug in xmlCharEncOutput return value which will cause
xmlNodeDumpOutput to drop characters randomly.

xmlCharEncOutput returns zero if the length of the input buffer is
zero but ignores the fact that it may already encoded the input buffer
and the input's length is zero due to the fact that xmlEncOutputChunk
returned -2 errors and underlying code tries to fix the error by
encoding the input.

xmlCharEncOutput is collecting the number of bytes written to the
output buffer but is returning zero instead of the total number of
bytes in this situation. This commit will fix this issue by returning
the total number of bytes instead. So the xmlNodeDumpOutput will also
continue writing and will not stop due to the fact that it mistakenly
thinks the output buffer is not changed in that iteration.

Fixes #314
2022-01-16 15:08:44 +01:00
David Kilzer
03bb929390 Fix parse failure when 4-byte character in UTF-16 BE is split across a chunk
This makes the logic in UTF16BEToUTF8() match UTF16LEToUTF8().

* encoding.c:
(UTF16LEToUTF8):
- Fix comment to describe what the code does.
(UTF16BEToUTF8):
- Fix undefined behavior which was applied to UTF16LEToUTF8() in
  2f9382033e.
- Add bounds check to while() loop which was applied to
  UTF16LEToUTF8() in be803967db.
- Do not return -2 when (in >= inend) to fix the bug.  This was
  applied to UTF16LEToUTF8() in 496a1cf592.
- Inline (<< 8) statements to match UTF16LEToUTF8().

Add the following tests and results:

  test/text-4-byte-UTF-16-BE-offset.xml
  test/text-4-byte-UTF-16-BE.xml
  test/text-4-byte-UTF-16-LE-offset.xml
  test/text-4-byte-UTF-16-LE.xml
2022-01-16 14:07:17 +01:00
David King
b92b16f659 Remove unused variable in xmlCharEncOutFunc
Fixes a compiler warning:

encoding.c: In function 'xmlCharEncOutFunc__internal_alias':
encoding.c:2632:9: warning: unused variable 'output' [-Wunused-variable]
 2632 |     int output = 0;

https://gitlab.gnome.org/GNOME/libxml2/-/issues/254
2021-05-23 11:55:32 +02:00
Nick Wellnhofer
dcb80b92da Fix slow parsing of HTML with encoding errors
Under certain circumstances, the HTML parser would try to guess and
switch input encodings multiple times, leading to slow processing of
documents with encoding errors. The repeated scanning of the input
buffer when guessing encodings could even lead to quadratic behavior.

The code htmlCurrentChar probably assumed that if there's an encoding
handler, it is guaranteed to produce valid UTF-8. This holds true in
general, but if the detected encoding was "UTF-8", the UTF8ToUTF8
encoding handler simply invoked memcpy without checking for invalid
UTF-8. This still must be fixed, preferably by not using this handler
at all.

Also leave a note that switching encodings twice seems impossible to
implement correctly. Add a check when handling UTF-8 encoding errors
in htmlCurrentChar to avoid this situation, even if encoders produce
invalid UTF-8.

Found by OSS-Fuzz.
2021-02-20 21:28:56 +01:00
Xiaoming Ni
649d02eaa4 encoding: fix memleak in xmlRegisterCharEncodingHandler()
The return type of xmlRegisterCharEncodingHandler() is void. The invoker
cannot determine whether xmlRegisterCharEncodingHandler() is executed
successfully. when nbCharEncodingHandler >= MAX_ENCODING_HANDLERS, the
"handler" is not added to the array "handlers". As a result, the memory
of "handler" cannot be managed and released: memory leakage.

so add "xmlfree(handler)" to fix memory leakage on the failure branch of
xmlRegisterCharEncodingHandler().

Reported-by: wuqing <wuqing30@huawei.com>
Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
2020-12-07 14:38:14 +01:00
Frederik Seiffert
b516ed189e Fix building with ICU 68.
ICU 68 no longer defines the TRUE macro.

Closes #204.
2020-11-19 18:10:32 +01:00
Nick Wellnhofer
1e41e4fa8e Fix return values and documentation in encoding.c
Make xmlEncInputChunk and xmlEncOutputChunk return 0 on success and
never a positive value.

Make xmlCharEncFirstLineInt, xmlCharEncFirstLineInt and
xmlCharEncOutFunc return the number of bytes written.
2020-07-06 15:06:13 +02:00