libxml2

mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2024-12-25 23:21:26 +03:00

Author	SHA1	Message	Date
Nick Wellnhofer	63ce5f9aed	Make some globals const	2024-04-28 17:53:39 +02:00
Nick Wellnhofer	072facc49e	encoding: Don't shrink input too early in xmlCharEncOutput Some exotic encodings like ISO646-FR don't support '#' characters, so encoding a character reference can actually fail. Don't skip the offending input in this case so the error will be reported on the next call.	2024-03-18 15:14:43 +01:00
Nick Wellnhofer	0821efc8ee	encoding: Check whether encoding handlers support input/output The "HTML" encoding handler doesn't support input which could lead to a wrong error report.	2024-01-02 19:48:23 +01:00
Nick Wellnhofer	023aecc474	encoding: Support ASCII in xmlLookupCharEncodingHandler Return our built-in ASCII handler. This was never implemented and triggered the new and stricter error checks.	2023-12-13 23:58:45 +01:00
Nick Wellnhofer	bd5ad0308d	encoding: Report malloc failures Introduce new API functions that return a separate error code if a memory allocation fails. - xmlOpenCharEncodingHandler - xmlLookupCharEncodingHandler Fix a few places where malloc failures weren't reported.	2023-12-11 22:05:47 +01:00
Nick Wellnhofer	89d19534de	encoding: Fix decoding of large chunks After `95e81a36`, we must support XML_ENC_ERR_SPACE when using built-in encoding handlers. Should fix #610.	2023-10-28 03:14:13 +02:00
Nick Wellnhofer	1734d27dca	encoding: Suppress -Wcast-align warnings	2023-10-02 15:04:18 +02:00
Nick Wellnhofer	0533daf5d2	encoding: Fix infinite loop in xmlCharEncInput Short-lived regression from `95e81a36`.	2023-09-29 12:43:46 +02:00
Nick Wellnhofer	8c084ebdc7	doc: Make apibuild.py happy	2023-09-21 22:57:33 +02:00
Nick Wellnhofer	699299cae3	globals: Stop including globals.h	2023-09-20 22:07:40 +02:00
Nick Wellnhofer	7909ff08e2	include: Remove unnecessary includes - Don't include tree.h from encoding.h - Don't include parser.h from xmlIO.h	2023-09-20 22:06:49 +02:00
Nick Wellnhofer	507f11edf0	encoding: Remove debugging code	2023-08-16 19:50:36 +02:00
Nick Wellnhofer	95e81a360c	parser: Decode all data in xmlCharEncInput Even with flush set to true, xmlCharEncInput didn't guarantee to decode all data. This complicated the push parser. Remove the flush flag and always decode all available data. Also fix ICU code where the flush flag has a different meaning. Always set flush to false and retry even with empty input buffers.	2023-08-08 15:21:31 +02:00
Nick Wellnhofer	4ee0815514	encoding: Move rawconsumed accounting to xmlCharEncInput	2023-08-08 15:19:51 +02:00
Nick Wellnhofer	b236b7a588	parser: Halt parser when growing buffer results in OOM Fix short-lived regression from previous commit. It might be safer to make xmlBufSetInputBaseCur use the original buffer even in case of errors. Found by OSS-Fuzz.	2023-06-08 21:59:20 +02:00
Nick Wellnhofer	db21cd5db9	malloc-fail: Handle malloc failures in xmlAddEncodingAlias Avoid memory errors if an allocation fails. See #344. Fixes #553.	2023-06-06 14:25:30 +02:00
Nick Wellnhofer	2f12e3a938	encoding: Stop calling xmlEncodingErr This invokes the global error handler which should be avoided.	2023-04-30 21:45:39 +02:00
Nick Wellnhofer	320f5084cd	parser: Improve handling of encoding and IO errors Make sure that xmlCharEncInput, xmlParserInputBufferPush and xmlParserInputBufferGrow set the correct error code in the xmlParserInputBuffer. Handle errors when calling these functions.	2023-04-30 21:31:54 +02:00
Nick Wellnhofer	3ff6abbf58	encoding: Rework error codes Use an enum instead of magic numbers. Fix a few error codes. Simplify handling of "space" and "partial" errors. See #506.	2023-04-30 16:43:29 +02:00
Nick Wellnhofer	33fb297b36	encoding: Fix compiler warning in ICU build	2023-04-17 14:59:47 +02:00
Nick Wellnhofer	a6b9e55a9e	encoding: Fix error code in asciiToUTF8 Use correct error code when invalid ASCII bytes are encountered. Found by OSS-Fuzz.	2023-03-26 15:42:02 +02:00
Nick Wellnhofer	98840d40da	parser: Rework EBCDIC code page detection To detect EBCDIC code pages, we used to switch the encoding twice and had to be very careful not to decode data after the XML declaration before the second switch. This relied on a hard-coded expected size of the XML declaration and was complicated and unreliable. Now we convert the first 200 bytes to EBCDIC-US and parse the encoding declaration manually.	2023-03-21 21:35:15 +01:00
Nick Wellnhofer	1c5e1fc194	malloc-fail: Check for malloc failure in xmlFindCharEncodingHandler Don't return encoding handlers with a NULL name. Found with libFuzzer, see #344.	2023-02-17 17:16:50 +01:00
Nick Wellnhofer	d18f9c1102	malloc-fail: Fix leak of xmlCharEncodingHandler Also free handler if its name is NULL. Found with libFuzzer, see #344.	2023-02-17 17:16:50 +01:00
Nick Wellnhofer	3cc900f098	encoding: Cast toupper argument to unsigned char Fixes undefined behavior. Also cast return value explicitly to fix implicit-integer-sign-change checks.	2023-02-17 17:16:50 +01:00
Nick Wellnhofer	2355eac59e	malloc-fail: Fix null deref if growing input buffer fails Also add some error checks. Found with libFuzzer, see #344.	2023-01-24 11:32:15 +01:00
Nick Wellnhofer	0f54af7494	encoding.c: Fix for documentation generator Top-level macro invocations throw off the documentation parser.	2022-12-08 18:40:58 +01:00
Nick Wellnhofer	53ab38408d	encoding: Make init function private	2022-11-27 02:11:07 +01:00
Nick Wellnhofer	3e9d5e4f7f	encoding: Remove unused variable xmlDefaultCharEncodingHandler	2022-11-27 02:11:07 +01:00
Nick Wellnhofer	1406b20fe9	encoding: Allocate default handlers statically	2022-11-24 19:21:01 +01:00
Nick Wellnhofer	2059df5358	buf: Deprecate static/immutable buffers	2022-11-20 21:16:03 +01:00
Nick Wellnhofer	ad338ca737	Remove explicit integer casts Remove explicit integer casts as final operation - in assignments - when passing arguments - when returning values Remove casts - to the same type - from certain range-bound values The main motivation is that these explicit casts don't change the result of operations and only render UBSan's implicit-conversion checks useless. Removing these casts allows UBSan to detect cases where truncation or sign-changes occur unexpectedly. Document some explicit casts as truncating and add a few missing ones.	2022-09-01 02:33:57 +02:00
Nick Wellnhofer	0f568c0b73	Consolidate private header files Private functions were previously declared - in header files in the root directory - in public headers guarded with IN_LIBXML - in libxml.h - redundantly in source files that used them. Consolidate all private header files in include/private.	2022-08-26 02:11:56 +02:00
David Kilzer	c14cac8bba	xmlBufAvail() should return length without including a byte for NUL terminator * buf.c: (xmlBufAvail): - Return the number of bytes available in the buffer, but do not include a byte for the NUL terminator so that it is reserved. * encoding.c: (xmlCharEncFirstLineInput): (xmlCharEncInput): (xmlCharEncOutput): * xmlIO.c: (xmlOutputBufferWriteEscape): - Remove code that subtracts 1 from the return value of xmlBufAvail(). It was implemented inconsistently anyway.	2022-05-25 18:25:19 -07:00
David Kilzer	21561e833a	Mark more static data as `const` Similar to `8f5710379`, mark more static data structures with `const` keyword. Also fix placement of `const` in encoding.c. Original patch by Sarah Wilkin.	2022-04-07 12:01:23 -07:00
Nick Wellnhofer	40483d0ce2	Deprecate module init and cleanup functions These functions shouldn't be part of the public API. Most init functions are only thread-safe when called from xmlInitParser. Global variables should only be cleaned up by calling xmlCleanupParser.	2022-03-06 15:59:43 +01:00
Nick Wellnhofer	f2072a8b2f	Fix memory leak in xmlFindCharEncodingHandler Fix memory leak in an unlikely error condition. Thanks to Wentao Liang for the report. Fixes #342.	2022-03-05 18:27:12 +01:00
Nick Wellnhofer	21ddad5284	Remove ICONV_CONST test We can simply cast the offending pointer to (void *).	2022-03-04 22:08:58 +01:00
Nick Wellnhofer	776d15d383	Don't check for standard C89 headers Don't check for - ctype.h - errno.h - float.h - limits.h - math.h - signal.h - stdarg.h - stdlib.h - string.h - time.h Stop including non-standard headers - malloc.h - strings.h	2022-03-02 00:43:54 +01:00
Nick Wellnhofer	b66ce0bba8	Don't include ICU headers in public headers There's no need to make these implementation details public.	2022-03-01 13:02:49 +01:00
Nick Wellnhofer	c41bc10da3	Fix unused variable warnings with disabled features	2022-02-22 19:57:12 +01:00
Nick Wellnhofer	346c3a930c	Remove elfgcchack.h The same optimization can be enabled with -fno-semantic-interposition since GCC 5. clang has always used this option by default.	2022-02-20 21:49:04 +01:00
Nick Wellnhofer	7abc6e6a24	Fix integer conversion warning in xmlIconvWrapper Use size_t for return value of iconv(3) to avoid an UBSan integer conversion warning.	2022-01-25 03:07:30 +01:00
Mohammad Razavi	eb4c1bf855	Fix random dropping of characters on dumping ASCII encoded XML Fix a bug in xmlCharEncOutput return value which will cause xmlNodeDumpOutput to drop characters randomly. xmlCharEncOutput returns zero if the length of the input buffer is zero but ignores the fact that it may already encoded the input buffer and the input's length is zero due to the fact that xmlEncOutputChunk returned -2 errors and underlying code tries to fix the error by encoding the input. xmlCharEncOutput is collecting the number of bytes written to the output buffer but is returning zero instead of the total number of bytes in this situation. This commit will fix this issue by returning the total number of bytes instead. So the xmlNodeDumpOutput will also continue writing and will not stop due to the fact that it mistakenly thinks the output buffer is not changed in that iteration. Fixes #314	2022-01-16 15:08:44 +01:00
David Kilzer	03bb929390	Fix parse failure when 4-byte character in UTF-16 BE is split across a chunk This makes the logic in UTF16BEToUTF8() match UTF16LEToUTF8(). * encoding.c: (UTF16LEToUTF8): - Fix comment to describe what the code does. (UTF16BEToUTF8): - Fix undefined behavior which was applied to UTF16LEToUTF8() in `2f9382033e`. - Add bounds check to while() loop which was applied to UTF16LEToUTF8() in `be803967db`. - Do not return -2 when (in >= inend) to fix the bug. This was applied to UTF16LEToUTF8() in `496a1cf592`. - Inline (<< 8) statements to match UTF16LEToUTF8(). Add the following tests and results: test/text-4-byte-UTF-16-BE-offset.xml test/text-4-byte-UTF-16-BE.xml test/text-4-byte-UTF-16-LE-offset.xml test/text-4-byte-UTF-16-LE.xml	2022-01-16 14:07:17 +01:00
David King	b92b16f659	Remove unused variable in xmlCharEncOutFunc Fixes a compiler warning: encoding.c: In function 'xmlCharEncOutFunc__internal_alias': encoding.c:2632:9: warning: unused variable 'output' [-Wunused-variable] 2632 | int output = 0; https://gitlab.gnome.org/GNOME/libxml2/-/issues/254	2021-05-23 11:55:32 +02:00
Nick Wellnhofer	dcb80b92da	Fix slow parsing of HTML with encoding errors Under certain circumstances, the HTML parser would try to guess and switch input encodings multiple times, leading to slow processing of documents with encoding errors. The repeated scanning of the input buffer when guessing encodings could even lead to quadratic behavior. The code htmlCurrentChar probably assumed that if there's an encoding handler, it is guaranteed to produce valid UTF-8. This holds true in general, but if the detected encoding was "UTF-8", the UTF8ToUTF8 encoding handler simply invoked memcpy without checking for invalid UTF-8. This still must be fixed, preferably by not using this handler at all. Also leave a note that switching encodings twice seems impossible to implement correctly. Add a check when handling UTF-8 encoding errors in htmlCurrentChar to avoid this situation, even if encoders produce invalid UTF-8. Found by OSS-Fuzz.	2021-02-20 21:28:56 +01:00
Xiaoming Ni	649d02eaa4	encoding: fix memleak in xmlRegisterCharEncodingHandler() The return type of xmlRegisterCharEncodingHandler() is void. The invoker cannot determine whether xmlRegisterCharEncodingHandler() is executed successfully. when nbCharEncodingHandler >= MAX_ENCODING_HANDLERS, the "handler" is not added to the array "handlers". As a result, the memory of "handler" cannot be managed and released: memory leakage. so add "xmlfree(handler)" to fix memory leakage on the failure branch of xmlRegisterCharEncodingHandler(). Reported-by: wuqing <wuqing30@huawei.com> Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>	2020-12-07 14:38:14 +01:00
Frederik Seiffert	b516ed189e	Fix building with ICU 68. ICU 68 no longer defines the TRUE macro. Closes #204.	2020-11-19 18:10:32 +01:00
Nick Wellnhofer	1e41e4fa8e	Fix return values and documentation in encoding.c Make xmlEncInputChunk and xmlEncOutputChunk return 0 on success and never a positive value. Make xmlCharEncFirstLineInt, xmlCharEncFirstLineInt and xmlCharEncOutFunc return the number of bytes written.	2020-07-06 15:06:13 +02:00

182 Commits