libxml2

mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2024-12-31 17:17:37 +03:00

Author	SHA1	Message	Date
Nick Wellnhofer	1406b20fe9	encoding: Allocate default handlers statically	2022-11-24 19:21:01 +01:00
Nick Wellnhofer	2059df5358	buf: Deprecate static/immutable buffers	2022-11-20 21:16:03 +01:00
Nick Wellnhofer	ad338ca737	Remove explicit integer casts Remove explicit integer casts as final operation - in assignments - when passing arguments - when returning values Remove casts - to the same type - from certain range-bound values The main motivation is that these explicit casts don't change the result of operations and only render UBSan's implicit-conversion checks useless. Removing these casts allows UBSan to detect cases where truncation or sign-changes occur unexpectedly. Document some explicit casts as truncating and add a few missing ones.	2022-09-01 02:33:57 +02:00
Nick Wellnhofer	0f568c0b73	Consolidate private header files Private functions were previously declared - in header files in the root directory - in public headers guarded with IN_LIBXML - in libxml.h - redundantly in source files that used them. Consolidate all private header files in include/private.	2022-08-26 02:11:56 +02:00
David Kilzer	c14cac8bba	xmlBufAvail() should return length without including a byte for NUL terminator * buf.c: (xmlBufAvail): - Return the number of bytes available in the buffer, but do not include a byte for the NUL terminator so that it is reserved. * encoding.c: (xmlCharEncFirstLineInput): (xmlCharEncInput): (xmlCharEncOutput): * xmlIO.c: (xmlOutputBufferWriteEscape): - Remove code that subtracts 1 from the return value of xmlBufAvail(). It was implemented inconsistently anyway.	2022-05-25 18:25:19 -07:00
David Kilzer	21561e833a	Mark more static data as `const` Similar to `8f5710379`, mark more static data structures with `const` keyword. Also fix placement of `const` in encoding.c. Original patch by Sarah Wilkin.	2022-04-07 12:01:23 -07:00
Nick Wellnhofer	40483d0ce2	Deprecate module init and cleanup functions These functions shouldn't be part of the public API. Most init functions are only thread-safe when called from xmlInitParser. Global variables should only be cleaned up by calling xmlCleanupParser.	2022-03-06 15:59:43 +01:00
Nick Wellnhofer	f2072a8b2f	Fix memory leak in xmlFindCharEncodingHandler Fix memory leak in an unlikely error condition. Thanks to Wentao Liang for the report. Fixes #342.	2022-03-05 18:27:12 +01:00
Nick Wellnhofer	21ddad5284	Remove ICONV_CONST test We can simply cast the offending pointer to (void *).	2022-03-04 22:08:58 +01:00
Nick Wellnhofer	776d15d383	Don't check for standard C89 headers Don't check for - ctype.h - errno.h - float.h - limits.h - math.h - signal.h - stdarg.h - stdlib.h - string.h - time.h Stop including non-standard headers - malloc.h - strings.h	2022-03-02 00:43:54 +01:00
Nick Wellnhofer	b66ce0bba8	Don't include ICU headers in public headers There's no need to make these implementation details public.	2022-03-01 13:02:49 +01:00
Nick Wellnhofer	c41bc10da3	Fix unused variable warnings with disabled features	2022-02-22 19:57:12 +01:00
Nick Wellnhofer	346c3a930c	Remove elfgcchack.h The same optimization can be enabled with -fno-semantic-interposition since GCC 5. clang has always used this option by default.	2022-02-20 21:49:04 +01:00
Nick Wellnhofer	7abc6e6a24	Fix integer conversion warning in xmlIconvWrapper Use size_t for return value of iconv(3) to avoid an UBSan integer conversion warning.	2022-01-25 03:07:30 +01:00
Mohammad Razavi	eb4c1bf855	Fix random dropping of characters on dumping ASCII encoded XML Fix a bug in xmlCharEncOutput return value which will cause xmlNodeDumpOutput to drop characters randomly. xmlCharEncOutput returns zero if the length of the input buffer is zero but ignores the fact that it may have already encoded the input buffer and the input's length is zero due to the fact that xmlEncOutputChunk returned -2 errors and the underlying code tries to fix the error by encoding the input. xmlCharEncOutput is collecting the number of bytes written to the output buffer but is returning zero instead of the total number of bytes in this situation. This commit will fix this issue by returning the total number of bytes instead. So the xmlNodeDumpOutput will also continue writing and will not stop due to the fact that it mistakenly thinks the output buffer is not changed in that iteration. Fixes #314	2022-01-16 15:08:44 +01:00
David Kilzer	03bb929390	Fix parse failure when 4-byte character in UTF-16 BE is split across a chunk This makes the logic in UTF16BEToUTF8() match UTF16LEToUTF8(). * encoding.c: (UTF16LEToUTF8): - Fix comment to describe what the code does. (UTF16BEToUTF8): - Fix undefined behavior which was applied to UTF16LEToUTF8() in `2f9382033e`. - Add bounds check to while() loop which was applied to UTF16LEToUTF8() in `be803967db`. - Do not return -2 when (in >= inend) to fix the bug. This was applied to UTF16LEToUTF8() in `496a1cf592`. - Inline (<< 8) statements to match UTF16LEToUTF8(). Add the following tests and results: test/text-4-byte-UTF-16-BE-offset.xml test/text-4-byte-UTF-16-BE.xml test/text-4-byte-UTF-16-LE-offset.xml test/text-4-byte-UTF-16-LE.xml	2022-01-16 14:07:17 +01:00
David King	b92b16f659	Remove unused variable in xmlCharEncOutFunc Fixes a compiler warning: encoding.c: In function 'xmlCharEncOutFunc__internal_alias': encoding.c:2632:9: warning: unused variable 'output' [-Wunused-variable] 2632 \| int output = 0; https://gitlab.gnome.org/GNOME/libxml2/-/issues/254	2021-05-23 11:55:32 +02:00
Nick Wellnhofer	dcb80b92da	Fix slow parsing of HTML with encoding errors Under certain circumstances, the HTML parser would try to guess and switch input encodings multiple times, leading to slow processing of documents with encoding errors. The repeated scanning of the input buffer when guessing encodings could even lead to quadratic behavior. The code htmlCurrentChar probably assumed that if there's an encoding handler, it is guaranteed to produce valid UTF-8. This holds true in general, but if the detected encoding was "UTF-8", the UTF8ToUTF8 encoding handler simply invoked memcpy without checking for invalid UTF-8. This still must be fixed, preferably by not using this handler at all. Also leave a note that switching encodings twice seems impossible to implement correctly. Add a check when handling UTF-8 encoding errors in htmlCurrentChar to avoid this situation, even if encoders produce invalid UTF-8. Found by OSS-Fuzz.	2021-02-20 21:28:56 +01:00
Xiaoming Ni	649d02eaa4	encoding: fix memleak in xmlRegisterCharEncodingHandler() The return type of xmlRegisterCharEncodingHandler() is void. The invoker cannot determine whether xmlRegisterCharEncodingHandler() is executed successfully. when nbCharEncodingHandler >= MAX_ENCODING_HANDLERS, the "handler" is not added to the array "handlers". As a result, the memory of "handler" cannot be managed and released: memory leakage. so add "xmlfree(handler)" to fix memory leakage on the failure branch of xmlRegisterCharEncodingHandler(). Reported-by: wuqing <wuqing30@huawei.com> Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>	2020-12-07 14:38:14 +01:00
Frederik Seiffert	b516ed189e	Fix building with ICU 68. ICU 68 no longer defines the TRUE macro. Closes #204.	2020-11-19 18:10:32 +01:00
Nick Wellnhofer	1e41e4fa8e	Fix return values and documentation in encoding.c Make xmlEncInputChunk and xmlEncOutputChunk return 0 on success and never a positive value. Make xmlCharEncFirstLineInt, xmlCharEncFirstLineInt and xmlCharEncOutFunc return the number of bytes written.	2020-07-06 15:06:13 +02:00
Nick Wellnhofer	2f9382033e	Fix undefined behavior in UTF16LEToUTF8 Don't perform arithmetic on null pointer. Found with libFuzzer and UBSan.	2020-06-15 21:23:54 +02:00
Nick Wellnhofer	a697ed1e24	Fix return value of xmlCharEncOutput Commit `407b393d` introduced a regression caused by xmlCharEncOutput returning 0 in case of success instead of the number of bytes written. Always use its return value for nbchars in xmlOutputBufferWrite. Fixes #166.	2020-06-15 15:23:38 +02:00
Nick Wellnhofer	20c60886e4	Fix typos Resolves #133.	2020-03-08 17:41:53 +01:00
Jared Yanovich	2a350ee9b4	Large batch of typo fixes Closes #109.	2019-09-30 18:04:38 +02:00
Andrey Bienkowski	d2293cdbc8	Remove a misleading line from xmlCharEncOutput Closes: https://bugzilla.gnome.org/show_bug.cgi?id=793028 It seems this line was accidentally copied over from xmlCharEncOutFunc. In xmlCharEncOutput output is a pointer so incrementing it by ret can point it where it wasn't supposed to be pointing. Luckily the current implementation doesn't dereference the pointer after advancing it. Signed-off-by: Daniel Veillard <veillard@redhat.com>	2018-07-23 10:21:38 +08:00
Nick Wellnhofer	772c06487b	Fix unused parameter warning without ICU	2017-11-09 17:56:31 +01:00
Joel Hockey	0b19f236a2	Fixed ICU to set flush correctly and provide pivot buffer. By always setting flush=TRUE when doing multiple reads, ICU will not correctly handle truncated utf8 chars across read boundaries. The fix is to set flush=TRUE only on final read, and to provide a pivot buffer which is maintained by libxml between calls to ucnv_convertEx.	2017-11-04 15:25:31 +01:00
Nick Wellnhofer	e5107772ff	Fix pathological performance when outputting charrefs If a character can't be represented in the output encoding, it is converted to a character reference. This used to replace the character in the input stream by calling xmlBufAddHead or xmlBufferAddHead. These functions shifted the entire input array around, leading to quadratic performance when converting a run of non-representable characters. This is most pronounced when dumping to memory. Output the charref directly instead. Found with libFuzzer.	2017-06-19 16:06:21 +02:00
Nick Wellnhofer	c9ccbd6a6d	Deduplicate code in encoding.c Introduce static functions xmlEncInputChunk and xmlEncOutputChunk that handle the internal/iconv/ICU branching.	2017-06-19 16:06:21 +02:00
David Kilzer	4472c3a5a5	Fix some format string warnings with possible format string vulnerability For https://bugzilla.gnome.org/show_bug.cgi?id=761029 Decorate every method in libxml2 with the appropriate LIBXML_ATTR_FORMAT(fmt,args) macro and add some cleanups following the reports.	2016-05-23 15:01:07 +08:00
Gaurav	080a22c5ea	Avoid a possibility of dangling encoding handler For https://bugzilla.gnome.org/show_bug.cgi?id=711149 In Function: int xmlCharEncCloseFunc(xmlCharEncodingHandler *handler) If the freed handler is any one of the handlers[i] list, then it will leave that handlers[i] dangling. This may lead to crashes at places where handlers is read.	2013-11-29 23:10:50 +08:00
Denis Pauk	e28c8a1ace	#705267 - add additional defines checks for support "./configure --with-minimum" https://bugzilla.gnome.org/show_bug.cgi?id=705267	2013-08-03 22:00:17 +08:00
Daniel Veillard	bf058dce13	Fix the flushing out of raw buffers on encoding conversions https://bugzilla.gnome.org/show_bug.cgi?id=692915 The new set of converting functions tried to limit the encoding conversion of the raw buffer to the consumption one to work in a more progressive fashion. Unfortunately this was bad for performance and led to errors on progressive parsing when a very large chunk was close to the end of the document. Fix the new internal function and switch back to the old way of converting. Fix another bug in the process.	2013-02-13 18:19:42 +08:00
Petr Sumbera	6f49c73b53	Try IBM-037 when looking for EBCDIC handlers http://en.wikipedia.org/wiki/EBCDIC_037 as it is another variant of EBCDIC	2012-12-12 15:41:30 +08:00
Daniel Veillard	f8e3db0445	Big space and tab cleanup Remove all space before tabs and space and tabs at end of lines.	2012-09-11 13:26:36 +08:00
Daniel Veillard	28cc42d068	Regenerating docs and API files Various cleanups * configure.in: force regeneration of APIs in my environment * buf.c buf.h enc.h encoding.c include/libxml/tree.h include/libxml/xmlerror.h save.h tree.c: various comment cleanups pointed by apibuild * doc/apibuild.py: added the 3 new internal headers in the excludes * doc/libxml2-api.xml doc/libxml2-refs.xml: regenerated the API * doc/symbols.xml: listing new entry points for 2.9.0 * doc/devhelp/*: regenerated	2012-08-10 10:00:18 +08:00
Daniel Veillard	18d0db2503	Adding new encoding function to deal with the new structures * encoding.c: adds xmlCharEncFirstLineInput, xmlCharEncInput and xmlCharEncOutput * enc.h: the functions are not made public but added to this new header	2012-07-23 14:24:26 +08:00
Timothy Elliott	689408bd86	Prevent an infinite loop when dumping a node with encoding problems When a node is dumped with a new encoding, we may encounter characters that are not supported in the new encoding. libxml2 handles this by replacing the character with character references, but in some encodings this can result in an infinite loop when the character references themselves contain unsupported characters. This fixes the infinite loop by undoing a character reference substitution when it cannot be inserted, and returning an encoder error. This bug was noticed when looking into an infinite loop bug report for the Ruby Nokogiri project. The original bug report, "nokogiri process hangs on call to inner_html" is here: https://github.com/tenderlove/nokogiri/issues/400	2012-05-08 22:03:22 +08:00
Daniel Veillard	69f04562f7	Fix an off by one error in encoding This off by one error doesn't seem to reproduce on Linux but the error is real.	2011-08-19 11:05:04 +08:00
Giuseppe Iuculano	48f7dcb724	480323 add code to plug in ICU converters by default This is not configured in by default but after some serious massaging incorporate that patch from Chromium/Chrome.	2010-11-04 17:42:42 +01:00
Daniel Veillard	ad4f0a2dc8	630140 better fix for iso995x encoding error Changing semantic of xmlCharEncInFunc() wasn't the proper way to do this, better change UTF8ToISO8859x() appropriately	2010-11-03 20:40:46 +01:00
Daniel Veillard	1cc912ec7e	Various cleanups on encoding handling Done while chasing previous bug	2010-11-03 19:26:35 +01:00
Daniel Veillard	083caf5ec8	630140 fix iso995x encoding error https://bugzilla.gnome.org/show_bug.cgi?id=630140 Fix the bug, which happen when using the embedded converters and not iconv	2010-11-03 19:24:05 +01:00
Daniel Veillard	d44b936499	A few more safety cleanup raised by scan * SAX2.c encoding.c parser.c xmlschemas.c: a few more safety checks * relaxng.c: remove an unused initialization	2009-09-07 12:15:08 +02:00
Daniel Veillard	76d364583e	Fixing assorted potential problems raised by scan * encoding.c parser.c relaxng.c runsuite.c tree.c xmlreader.c xmlschemas.c: nothing really serious but better safe than sorry	2009-09-07 11:19:33 +02:00
Daniel Veillard	7e385bd4e2	566012 autodetected encoding and encoding conflict * encoding.c parser.c parserInternals.c: when we autodetect an encoding but it's actually not completely compatible with the one declared great care must be taken to not convert more than just the first line. Led to some refactoring, more private functions and a bit of cleanup.	2009-08-26 11:38:49 +02:00
Martin Kögler	c78988acb7	566012 Incomplete EBCDIC parsing support * encoding.c: the iconv converter is sometimes only found as "EBCDIC-US"	2009-08-24 16:47:48 +02:00
Daniel Veillard	e83e93e715	make a new kind of buffer where shrinking and adding in head can avoid * include/libxml/tree.h tree.c: make a new kind of buffer where shrinking and adding in head can avoid reallocation or full buffer memmoves * encoding.c xmlIO.c: use the new kind of buffers for output buffers Daniel svn path=/trunk/; revision=3787	2008-08-30 12:52:26 +00:00
Daniel Veillard	f124539f7a	buffer may not be large enough to convert to UCS4, patch from Christian * encoding.c: buffer may not be large enough to convert to UCS4, patch from Christian Fruth , fixes #504015 Daniel svn path=/trunk/; revision=3727	2008-04-03 09:46:34 +00:00
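
Commit 03bb929390 above concerns a 4-byte UTF-16 BE character (a surrogate pair) being split across an input chunk. The sketch below is not libxml2's UTF16BEToUTF8(); it is a hypothetical helper, decode_utf16be_chunk, written only to illustrate the general idea that a pair cut off at the end of a chunk should stop decoding without being reported as an error, so the caller can retry once more bytes arrive. The function name, signature, and return convention are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of libxml2: decode as many complete
 * code points as possible from one chunk of UTF-16BE input.
 *
 * Returns the number of code points stored in out, or -1 on an
 * unpaired surrogate. *consumed receives the number of input bytes
 * used; any unconsumed trailing bytes belong to a character that
 * continues in the next chunk (or were left because out is full).
 */
static int
decode_utf16be_chunk(const unsigned char *in, size_t inlen,
                     uint32_t *out, size_t outcap, size_t *consumed)
{
    size_t i = 0;
    size_t n = 0;

    assert(in != NULL && out != NULL && consumed != NULL);

    while (inlen - i >= 2 && n < outcap) {
        uint32_t c = ((uint32_t) in[i] << 8) | in[i + 1];

        if ((c & 0xFC00) == 0xD800) {
            uint32_t d;

            /* High surrogate: the low half may arrive in the next chunk. */
            if (inlen - i < 4)
                break;              /* incomplete pair, not an error */

            d = ((uint32_t) in[i + 2] << 8) | in[i + 3];
            if ((d & 0xFC00) != 0xDC00) {
                *consumed = i;
                return -1;          /* high surrogate without a low one */
            }
            c = 0x10000 + (((c & 0x3FF) << 10) | (d & 0x3FF));
            i += 4;
        } else if ((c & 0xFC00) == 0xDC00) {
            *consumed = i;
            return -1;              /* lone low surrogate */
        } else {
            i += 2;
        }
        out[n++] = c;
    }

    *consumed = i;
    return (int) n;
}
```

A caller would keep the unconsumed tail of the chunk, append the next chunk to it, and call the helper again; only a surrogate that is definitely unpaired is treated as malformed input.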

153 Commits
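
Commit e5107772ff in the log describes writing character references straight to the output instead of splicing them back into the input buffer. The following is a minimal sketch of that general approach for an ASCII target, not libxml2's implementation; write_ascii_with_charrefs and its signature are hypothetical names chosen for this example. Each non-representable code point costs a bounded amount of work, so a long run of them stays linear.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of libxml2: serialize code points as
 * ASCII, emitting a decimal character reference (&#N;) directly into
 * the output for anything outside the ASCII range.
 *
 * Returns the number of bytes written (excluding the terminating NUL),
 * or -1 if the output buffer is too small.
 */
static int
write_ascii_with_charrefs(const uint32_t *cps, size_t ncps,
                          char *out, size_t outcap)
{
    size_t pos = 0;
    size_t i;

    assert(cps != NULL && out != NULL);

    if (outcap == 0)
        return -1;

    for (i = 0; i < ncps; i++) {
        if (cps[i] < 0x80) {
            /* Need room for this byte plus the terminating NUL. */
            if (outcap - pos < 2)
                return -1;
            out[pos++] = (char) cps[i];
        } else {
            /* Write the charref in place; no input data is moved. */
            int len = snprintf(out + pos, outcap - pos, "&#%lu;",
                               (unsigned long) cps[i]);

            if (len < 0 || (size_t) len >= outcap - pos)
                return -1;
            pos += (size_t) len;
        }
    }

    out[pos] = '\0';
    return (int) pos;
}
```

The design point is that the reference is appended to the output at the current write position, so no existing data has to be shifted to make room for it.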