libxml2

mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2025-04-09 14:50:07 +03:00

Author	SHA1	Message	Date
Nick Wellnhofer	1b6362ea44	io: Fix memory lifetime issue with input buffers xmlParserInputBufferCreateMem must make a copy of the buffer. This fixes a regression from 2.11 which could cause reads from freed memory depending on the use case. Undeprecate xmlParserInputBufferCreateStatic which can avoid copying the whole buffer.	2023-12-12 23:58:33 +01:00
Nick Wellnhofer	11a1839ddd	globals: Move remaining globals back to correct header files This undoes a lot of damage.	2023-09-20 22:06:49 +02:00
Nick Wellnhofer	4e1c13ebfd	debug: Remove debugging code This is barely useful these days and only clutters the code base.	2023-09-19 17:35:09 +02:00
Nick Wellnhofer	95e81a360c	parser: Decode all data in xmlCharEncInput Even with flush set to true, xmlCharEncInput didn't guarantee to decode all data. This complicated the push parser. Remove the flush flag and always decode all available data. Also fix ICU code where the flush flag has a different meaning. Always set flush to false and retry even with empty input buffers.	2023-08-08 15:21:31 +02:00
Nick Wellnhofer	834b8123ef	parser: Stream data when reading from memory Don't create a copy of the whole input buffer. Read the data chunk by chunk to save memory. Historically, it was probably envisioned to read data from memory without additional copying. This doesn't work reliably with the current design of the XML parser which requires a terminating null byte at the end of input buffers. This led to xmlReadMemory interfaces, which expect pointer and size arguments, being changed to make a zero-terminated copy of the input buffer. Interfaces based on xmlReadDoc, which actually expect a zero-terminated string and would make zero-copy operation work, were then simplified to rely on xmlReadMemory, resulting in an unnecessary copy. To avoid temporarily copying (possibly gigabytes of) memory, we now stream in-memory input just like content read from files in a chunk-by-chunk fashion (using a somewhat outdated INPUT_CHUNK size of 250 bytes). As a side effect, we also avoid another copy of the whole input when handling non-UTF-8 data which was made possible by some earlier commits. Interfaces expecting zero-terminated strings now make use of strnlen which unfortunately isn't part of the standard C library and has only been mandated since POSIX 2008.	2023-08-08 15:21:28 +02:00
Nick Wellnhofer	4ee0815514	encoding: Move rawconsumed accounting to xmlCharEncInput	2023-08-08 15:19:51 +02:00
Nick Wellnhofer	ec7be50662	parser: Rework encoding detection Introduce XML_INPUT_HAS_ENCODING flag for xmlParserInput which is set when xmlSwitchEncoding is called. The parser can use the flag to reliably detect whether an encoding was already set via user override, BOM or other auto-detection. In this case, the encoding declaration won't be used to switch the encoding. Before, an inscrutable mix of ctxt->charset, ctxt->input->encoding and ctxt->input->buf->encoder was used. Introduce private helper functions to switch encodings used by both the XML and HTML parser: - xmlDetectEncoding which skips over the BOM, allowing to remove the BOM checks from other encoding functions. - xmlSetDeclaredEncoding, replacing htmlCheckEncodingDirect, which warns about encoding mismatches. If users override the encoding, store the declared instead of the actual encoding in xmlDoc. In this case, the actual encoding is known and the raw value from the doc is more useful. Also use the input flags to store the ISO-8859-1 fallback state. Restrict the fallback to cases where no encoding was specified. (The fallback is only useful in recovery mode and these days broken UTF-8 is probably more likely than ISO-8859-1, so it might eventually be removed completely.) The 'charset' member of xmlParserCtxt is now unused. The 'encoding' member of xmlParserInput is now unused. The 'standalone' member of xmlParserInput is renamed to 'flags'. A new parser state XML_PARSER_XML_DECL is added for the push parser.	2023-08-08 15:19:46 +02:00
Nick Wellnhofer	b230861dbd	xmlIO: Remove some calls to xmlIOErr The xmlIOErr functions use the global error handler and should be avoided if possible.	2023-04-30 21:45:39 +02:00
Nick Wellnhofer	320f5084cd	parser: Improve handling of encoding and IO errors Make sure that xmlCharEncInput, xmlParserInputBufferPush and xmlParserInputBufferGrow set the correct error code in the xmlParserInputBuffer. Handle errors when calling these functions.	2023-04-30 21:31:54 +02:00
Nick Wellnhofer	97086fd76b	malloc-fail: Fix memory leak in xmlParserInputBufferCreateMem Found with libFuzzer, see #344.	2023-02-17 17:16:50 +01:00
Nick Wellnhofer	2355eac59e	malloc-fail: Fix null deref if growing input buffer fails Also add some error checks. Found with libFuzzer, see #344.	2023-01-24 11:32:15 +01:00
Nick Wellnhofer	2059df5358	buf: Deprecate static/immutable buffers	2022-11-20 21:16:03 +01:00
Nick Wellnhofer	249cee4b2a	io: Fix a few integer overflows in I/O statistics There are still many places where arithmetic on "consumed" stats isn't checked for overflow, affecting platforms with a 32-bit long type.	2022-11-20 21:16:03 +01:00
Nick Wellnhofer	1ef4938fd0	io: Rework xmlParserInputBufferGrow with encodings Read data directly into the "raw" buffer when converting encodings. Make sure not to grow memory input buffers.	2022-11-20 21:16:03 +01:00
Nick Wellnhofer	46cd7d224e	io: Remove xmlInputReadCallbackNop In some cases, for example when using encoders, the read callback was set to NULL, in other cases it was set to xmlInputReadCallbackNop. xmlGROW only tested for xmlInputReadCallbackNop, resulting in errors when parsing large encoded content from memory. Always use a NULL callback for memory buffers to avoid ambiguities. Fixes #262.	2022-11-20 21:12:18 +01:00
Nick Wellnhofer	22d879bf0a	io: Fix "buffer full" error with certain buffer sizes Remove a useless check in xmlParserInputBufferGrow that could be triggered after changing xmlBufAvail in c14cac8b. Fixes #438.	2022-11-13 15:21:22 +01:00
Nick Wellnhofer	5bffa33a12	Stop including sys/types.h	2022-09-02 18:33:36 +02:00
Nick Wellnhofer	2cac626976	Don't use sizeof(xmlChar) or sizeof(char)	2022-09-01 03:35:19 +02:00
Nick Wellnhofer	ad338ca737	Remove explicit integer casts Remove explicit integer casts as final operation - in assignments - when passing arguments - when returning values Remove casts - to the same type - from certain range-bound values The main motivation is that these explicit casts don't change the result of operations and only render UBSan's implicit-conversion checks useless. Removing these casts allows UBSan to detect cases where truncation or sign-changes occur unexpectedly. Document some explicit casts as truncating and add a few missing ones.	2022-09-01 02:33:57 +02:00
Nick Wellnhofer	0f568c0b73	Consolidate private header files Private functions were previously declared - in header files in the root directory - in public headers guarded with IN_LIBXML - in libxml.h - redundantly in source files that used them. Consolidate all private header files in include/private.	2022-08-26 02:11:56 +02:00
David Kilzer	c14cac8bba	xmlBufAvail() should return length without including a byte for NUL terminator * buf.c: (xmlBufAvail): - Return the number of bytes available in the buffer, but do not include a byte for the NUL terminator so that it is reserved. * encoding.c: (xmlCharEncFirstLineInput): (xmlCharEncInput): (xmlCharEncOutput): * xmlIO.c: (xmlOutputBufferWriteEscape): - Remove code that subtracts 1 from the return value of xmlBufAvail(). It was implemented inconsistently anyway.	2022-05-25 18:25:19 -07:00
Mehltretter Karl	c1632fbd0a	fix typo in comment	2022-05-06 10:58:58 +02:00
David Kilzer	21561e833a	Mark more static data as `const` Similar to 8f5710379, mark more static data structures with `const` keyword. Also fix placement of `const` in encoding.c. Original patch by Sarah Wilkin.	2022-04-07 12:01:23 -07:00
Joey Arhar	b7b29df9c2	Add windows includes to xmlIO.c xmlIO.c calls read() and getcwd() which need io.h and direct.h respectively when compiling on Windows. Otherwise, a compiler error may be raised saying that read() and getcwd() were used implicitly. This regressed recently; I'm guessing it was due to the changes to win32config.h in commit 84085a26.	2022-03-31 01:11:06 +00:00
Nick Wellnhofer	d99ddd9bd5	Improve buffer allocation scheme In most places, we really need the double-it scheme to avoid quadratic behavior. The hybrid scheme still can cause many reallocations and the bounded scheme doesn't seem to provide meaningful protection in xmlreader.c.	2022-03-06 02:26:22 +01:00
Nick Wellnhofer	776d15d383	Don't check for standard C89 headers Don't check for - ctype.h - errno.h - float.h - limits.h - math.h - signal.h - stdarg.h - stdlib.h - string.h - time.h Stop including non-standard headers - malloc.h - strings.h	2022-03-02 00:43:54 +01:00
Nick Wellnhofer	b094e814fa	Remove broken Windows CE support	2022-03-01 00:02:59 +01:00
Nick Wellnhofer	655cf3f46f	Always fopen files with "rb" We never want translation of newlines when reading files, so it should be safe to always specify "rb". On sane platforms, the "b" flag is simply ignored.	2022-02-28 23:39:00 +01:00
Nick Wellnhofer	3f8655db97	Remove __DJGPP__ checks Drop broken support for DJGPP.	2022-02-28 23:22:50 +01:00
Nick Wellnhofer	2489c1d024	Remove useless __CYGWIN__ checks From what I can tell, some really early Cygwin versions from around 1998-2000 used to erroneously define _WIN32. This was eventually fixed, but these days, the `defined(_WIN32) && !defined(__CYGWIN__)` idiom is unnecessary. Now, we only check for __CYGWIN__ in xmlexports.h when deciding whether to use __declspec.	2022-02-28 22:58:35 +01:00
Nick Wellnhofer	c41bc10da3	Fix unused variable warnings with disabled features	2022-02-22 19:57:12 +01:00
Nick Wellnhofer	346c3a930c	Remove elfgcchack.h The same optimization can be enabled with -fno-semantic-interposition since GCC 5. clang has always used this option by default.	2022-02-20 21:49:04 +01:00
David King	d7f11fd066	Fix leak in __xmlOutputBufferCreateFilename Found by Coverity. https://bugzilla.redhat.com/show_bug.cgi?id=1938806	2022-01-16 14:26:14 +01:00
Nick Wellnhofer	dea91c97de	Fix buffering in xmlOutputBufferWrite Fix a regression introduced with commit a697ed1e which caused xmlOutputBufferWrite to flush internal buffers too late. Fixes #296.	2021-07-27 16:12:54 +02:00
Nick Wellnhofer	a697ed1e24	Fix return value of xmlCharEncOutput Commit 407b393d introduced a regression caused by xmlCharEncOutput returning 0 in case of success instead of the number of bytes written. Always use its return value for nbchars in xmlOutputBufferWrite. Fixes #166.	2020-06-15 15:23:38 +02:00
Nick Wellnhofer	20c60886e4	Fix typos Resolves #133.	2020-03-08 17:41:53 +01:00
Nick Wellnhofer	c2e09f445c	Add xmlPopOutputCallbacks Add function to pop a single set of output callbacks from the stack. This was only implemented for input callbacks before. Fixes #135.	2020-02-11 11:32:23 +01:00
Nick Wellnhofer	40e00bc517	Fix integer overflow when counting written bytes Check for integer overflow when updating the `written` member of struct xmlOutputBuffer in xmlIO.c. Closes #112. Resolves !54 and !55.	2019-10-14 17:06:20 +02:00
Jared Yanovich	2a350ee9b4	Large batch of typo fixes Closes #109.	2019-09-30 18:04:38 +02:00
Nick Wellnhofer	6705f4d28e	Remove executable bit from non-executable files	2019-09-16 15:48:59 +02:00
zhouzhongyuan	4f67dbb0a1	fix memory leak in xmlAllocOutputBuffer	2019-07-30 12:43:26 +02:00
Nick Wellnhofer	f824a4bd4d	Fix memory leak in xmlAllocOutputBufferInternal error path Thanks to Anish K Kurian for the report. Closes #60.	2019-05-20 13:38:22 +02:00
Nick Wellnhofer	407b393d80	Fix return value of xmlOutputBufferWrite When using memory buffers, the total size of the buffer was added again and again, potentially leading to an integer overflow. Found by OSS-Fuzz.	2019-05-15 13:01:52 +02:00
Nick Wellnhofer	7a1bd7f649	Revert "Change calls to xmlCharEncInput to set flush false" This reverts commit 6e6ae5daa6cd9640c9a83c1070896273e9b30d14 which broke decoding of larger documents with ICU. See https://bugs.chromium.org/p/chromium/issues/detail?id=820163	2018-03-17 00:03:24 +01:00
Joel Hockey	6e6ae5daa6	Change calls to xmlCharEncInput to set flush false when not final call. Having flush incorrectly set to true causes errors for ICU.	2018-01-08 19:57:53 +01:00
Nick Wellnhofer	cb5541c9f3	Fix libz and liblzma detection If libz or liblzma are detected with pkg-config, AC_CHECK_HEADERS must not be run because the correct CPPFLAGS aren't set. It is actually not required to have separate checks for LIBXML_ZLIB_ENABLED and HAVE_ZLIB_H. Only check for LIBXML_ZLIB_ENABLED and remove HAVE_ZLIB_H macro. Fixes bug 764657, bug 787041.	2017-11-27 14:33:37 +01:00
Nick Wellnhofer	86615e43bb	Fix IO callback signatures	2017-11-09 17:47:47 +01:00
Vlad Tsyrklevich	28f52fe89d	Refactor name and type signature for xmlNop Update xmlNop's name to xmlInputReadCallbackNop and its type signature to match xmlInputReadCallback. Fixes bug 786134.	2017-11-09 13:43:08 +01:00
Nick Wellnhofer	5672397477	Simplify Windows IO functions Remove "native" non-Unicode functions which were only needed for pre-NT systems like Windows 95/98. Don't redefine `stat` but use `struct _stat` and `_stat()` instead.	2017-10-09 16:52:14 +02:00
Nick Wellnhofer	e3890546d7	Fix the Windows header mess Don't include windows.h and wsockcompat.h from config.h but only when needed. Don't define _WINSOCKAPI_ manually. This was apparently done to stop windows.h from including winsock.h which is a problem if winsock2.h wasn't included first. But on MinGW, this causes compiler warnings. Define WIN32_LEAN_AND_MEAN instead which has the same effect. Always use the compiler-defined _WIN32 macro instead of WIN32.	2017-10-09 14:35:40 +02:00

268 Commits