1
0
mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2025-04-09 14:50:07 +03:00

268 Commits

Author SHA1 Message Date
Nick Wellnhofer
1b6362ea44 io: Fix memory lifetime issue with input buffers
xmlParserInputBufferCreateMem must make a copy of the buffer.

This fixes a regression from 2.11 which could cause reads from freed
memory depending on the use case.

Undeprecate xmlParserInputBufferCreateStatic which can avoid copying
the whole buffer.
2023-12-12 23:58:33 +01:00
Nick Wellnhofer
11a1839ddd globals: Move remaining globals back to correct header files
This undoes a lot of damage.
2023-09-20 22:06:49 +02:00
Nick Wellnhofer
4e1c13ebfd debug: Remove debugging code
This is barely useful these days and only clutters the code base.
2023-09-19 17:35:09 +02:00
Nick Wellnhofer
95e81a360c parser: Decode all data in xmlCharEncInput
Even with flush set to true, xmlCharEncInput didn't guarantee to decode
all data. This complicated the push parser.

Remove the flush flag and always decode all available data.

Also fix ICU code where the flush flag has a different meaning. Always
set flush to false and retry even with empty input buffers.
2023-08-08 15:21:31 +02:00
Nick Wellnhofer
834b8123ef parser: Stream data when reading from memory
Don't create a copy of the whole input buffer. Read the data chunk by
chunk to save memory.

Historically, it was probably envisioned to read data from memory
without additional copying. This doesn't work reliably with the current
design of the XML parser which requires a terminating null byte at the
end of input buffers. This lead to xmlReadMemory interfaces, which
expect pointer and size arguments, being changed to make a
zero-terminated copy of the input buffer. Interfaces based on
xmlReadDoc, which actually expect a zero-terminated string and
would make zero-copy operation work, were then simplified to rely on
xmlReadMemoryi, resulting in an unnecessary copy.

To avoid copying (possibly gigabytes) of memory temporarily, we now
stream in-memory input just like content read from files in a
chunk-by-chunk fashion (using a somewhat outdated INPUT_CHUNK size of
250 bytes). As a side effect, we also avoid another copy of the whole
input when handling non-UTF-8 data which was made possible by some
earlier commits.

Interfaces expecting zero-terminated strings now make use of strnlen
which unfortunately isn't part of the standard C library and only
mandated since POSIX 2008.
2023-08-08 15:21:28 +02:00
Nick Wellnhofer
4ee0815514 encoding: Move rawconsumed accounting to xmlCharEncInput 2023-08-08 15:19:51 +02:00
Nick Wellnhofer
ec7be50662 parser: Rework encoding detection
Introduce XML_INPUT_HAS_ENCODING flag for xmlParserInput which is set
when xmlSwitchEncoding is called. The parser can use the flag to
reliably detect whether an encoding was already set via user override,
BOM or other auto-detection. In this case, the encoding declaration
won't be used to switch the encoding.

Before, an inscrutable mix of ctxt->charset, ctxt->input->encoding
and ctxt->input->buf->encoder was used.

Introduce private helper functions to switch encodings used by both the
XML and HTML parser:

- xmlDetectEncoding which skips over the BOM, allowing to remove the
  BOM checks from other encoding functions.
- xmlSetDeclaredEncoding, replacing htmlCheckEncodingDirect, which warns
  about encoding mismatches.

If users override the encoding, store the declared instead of the actual
encoding in xmlDoc. In this case, the actual encoding is known and the
raw value from the doc is more useful.

Also use the input flags to store the ISO-8859-1 fallback state.
Restrict the fallback to cases where no encoding was specified. (The
fallback is only useful in recovery mode and these days broken UTF-8 is
probably more likely than ISO-8859-1, so it might eventually be removed
completely.)

The 'charset' member of xmlParserCtxt is now unused. The 'encoding'
member of xmlParserInput is now unused.

The 'standalone' member of xmlParserInput is renamed to 'flags'.

A new parser state XML_PARSER_XML_DECL is added for the push parser.
2023-08-08 15:19:46 +02:00
Nick Wellnhofer
b230861dbd xmlIO: Remove some calls to xmlIOErr
The xmlIOErr functions use the global error handler and should be
avoided if possible.
2023-04-30 21:45:39 +02:00
Nick Wellnhofer
320f5084cd parser: Improve handling of encoding and IO errors
Make sure that xmlCharEncInput, xmlParserInputBufferPush and
xmlParserInputBufferGrow set the correct error code in the
xmlParserInputBuffer. Handle errors when calling these functions.
2023-04-30 21:31:54 +02:00
Nick Wellnhofer
97086fd76b malloc-fail: Fix memory leak in xmlParserInputBufferCreateMem
Found with libFuzzer, see #344.
2023-02-17 17:16:50 +01:00
Nick Wellnhofer
2355eac59e malloc-fail: Fix null deref if growing input buffer fails
Also add some error checks.

Found with libFuzzer, see #344.
2023-01-24 11:32:15 +01:00
Nick Wellnhofer
2059df5358 buf: Deprecate static/immutable buffers 2022-11-20 21:16:03 +01:00
Nick Wellnhofer
249cee4b2a io: Fix a few integer overflows in I/O statistics
There are still many places where arithmetic on "consumed" stats isn't
checked for overflow, affecting platforms with a 32-bit long type.
2022-11-20 21:16:03 +01:00
Nick Wellnhofer
1ef4938fd0 io: Rework xmlParserInputBufferGrow with encodings
Read data directly into the "raw" buffer when converting encodings.
Make sure not to grow memory input buffers.
2022-11-20 21:16:03 +01:00
Nick Wellnhofer
46cd7d224e io: Remove xmlInputReadCallbackNop
In some cases, for example when using encoders, the read callback was
set to NULL, in other cases it was set to xmlInputReadCallbackNop.
xmlGROW only tested for xmlInputReadCallbackNop, resulting in errors
when parsing large encoded content from memory.

Always use a NULL callback for memory buffers to avoid ambiguities.

Fixes #262.
2022-11-20 21:12:18 +01:00
Nick Wellnhofer
22d879bf0a io: Fix "buffer full" error with certain buffer sizes
Remove a useless check in xmlParserInputBufferGrow that could be
triggered after changing xmlBufAvail in c14cac8b.

Fixes #438.
2022-11-13 15:21:22 +01:00
Nick Wellnhofer
5bffa33a12 Stop including sys/types.h 2022-09-02 18:33:36 +02:00
Nick Wellnhofer
2cac626976 Don't use sizeof(xmlChar) or sizeof(char) 2022-09-01 03:35:19 +02:00
Nick Wellnhofer
ad338ca737 Remove explicit integer casts
Remove explicit integer casts as final operation

- in assignments
- when passing arguments
- when returning values

Remove casts

- to the same type
- from certain range-bound values

The main motivation is that these explicit casts don't change the result
of operations and only render UBSan's implicit-conversion checks
useless. Removing these casts allows UBSan to detect cases where
truncation or sign-changes occur unexpectedly.

Document some explicit casts as truncating and add a few missing ones.
2022-09-01 02:33:57 +02:00
Nick Wellnhofer
0f568c0b73 Consolidate private header files
Private functions were previously declared

- in header files in the root directory
- in public headers guarded with IN_LIBXML
- in libxml.h
- redundantly in source files that used them.

Consolidate all private header files in include/private.
2022-08-26 02:11:56 +02:00
David Kilzer
c14cac8bba xmlBufAvail() should return length without including a byte for NUL terminator
* buf.c:
(xmlBufAvail):
- Return the number of bytes available in the buffer, but do not
  include a byte for the NUL terminator so that it is reserved.

* encoding.c:
(xmlCharEncFirstLineInput):
(xmlCharEncInput):
(xmlCharEncOutput):
* xmlIO.c:
(xmlOutputBufferWriteEscape):
- Remove code that subtracts 1 from the return value of
  xmlBufAvail().  It was implemented inconsistently anyway.
2022-05-25 18:25:19 -07:00
Mehltretter Karl
c1632fbd0a fix typo in comment 2022-05-06 10:58:58 +02:00
David Kilzer
21561e833a Mark more static data as const
Similar to 8f5710379, mark more static data structures with
`const` keyword.

Also fix placement of `const` in encoding.c.

Original patch by Sarah Wilkin.
2022-04-07 12:01:23 -07:00
Joey Arhar
b7b29df9c2 Add windows includes to xmlIO.c
xmlIO.c calls read() and getcwd() which need io.h and direct.h
respectively when compiling on windows. Otherwise, a compiler error may
be raised saying that read() and getcwd() were used implicitly.

This was regressed recently, I'm guessing it was due to the changes to
win32config.h in commit 84085a26
2022-03-31 01:11:06 +00:00
Nick Wellnhofer
d99ddd9bd5 Improve buffer allocation scheme
In most places, we really need the double-it scheme to avoid quadratic
behavior. The hybrid scheme still can cause many reallocations and the
bounded scheme doesn't seem to provide meaningful protection in
xmlreader.c.
2022-03-06 02:26:22 +01:00
Nick Wellnhofer
776d15d383 Don't check for standard C89 headers
Don't check for

- ctype.h
- errno.h
- float.h
- limits.h
- math.h
- signal.h
- stdarg.h
- stdlib.h
- string.h
- time.h

Stop including non-standard headers

- malloc.h
- strings.h
2022-03-02 00:43:54 +01:00
Nick Wellnhofer
b094e814fa Remove broken Windows CE support 2022-03-01 00:02:59 +01:00
Nick Wellnhofer
655cf3f46f Always fopen files with "rb"
We never want translation of newlines when reading files, so it should
be safe to always specify "rb". On sane platforms, the "b" flag is
simply ignored.
2022-02-28 23:39:00 +01:00
Nick Wellnhofer
3f8655db97 Remove __DJGPP__ checks
Drop broken support for DJGPP.
2022-02-28 23:22:50 +01:00
Nick Wellnhofer
2489c1d024 Remove useless __CYGWIN__ checks
From what I can tell, some really early Cygwin versions from around
1998-2000 used to erroneously define _WIN32. This was eventually fixed,
but these days, the `defined(_WIN32) && !defined(__CYGWIN__)` idiom is
unnecessary.

Now, we only check for __CYGWIN__ in xmlexports.h when deciding whether
to use __declspec.
2022-02-28 22:58:35 +01:00
Nick Wellnhofer
c41bc10da3 Fix unused variable warnings with disabled features 2022-02-22 19:57:12 +01:00
Nick Wellnhofer
346c3a930c Remove elfgcchack.h
The same optimization can be enabled with -fno-semantic-interposition
since GCC 5. clang has always used this option by default.
2022-02-20 21:49:04 +01:00
David King
d7f11fd066 Fix leak in __xmlOutputBufferCreateFilename
Found by Coverity.

https://bugzilla.redhat.com/show_bug.cgi?id=1938806
2022-01-16 14:26:14 +01:00
Nick Wellnhofer
dea91c97de Fix buffering in xmlOutputBufferWrite
Fix a regression introduced with commit a697ed1e which caused
xmlOutputBufferWrite to flush internal buffers too late.

Fixes #296.
2021-07-27 16:12:54 +02:00
Nick Wellnhofer
a697ed1e24 Fix return value of xmlCharEncOutput
Commit 407b393d introduced a regression caused by xmlCharEncOutput
returning 0 in case of success instead of the number of bytes written.
Always use its return value for nbchars in xmlOutputBufferWrite.

Fixes #166.
2020-06-15 15:23:38 +02:00
Nick Wellnhofer
20c60886e4 Fix typos
Resolves #133.
2020-03-08 17:41:53 +01:00
Nick Wellnhofer
c2e09f445c Add xmlPopOutputCallbacks
Add function to pop a single set of output callbacks from the stack.
This was only implemented for input callbacks before.

Fixes #135.
2020-02-11 11:32:23 +01:00
Nick Wellnhofer
40e00bc517 Fix integer overflow when counting written bytes
Check for integer overflow when updating the `written` member of
struct xmlOutputBuffer in xmlIO.c.

Closes #112. Resolves !54 and !55.
2019-10-14 17:06:20 +02:00
Jared Yanovich
2a350ee9b4 Large batch of typo fixes
Closes #109.
2019-09-30 18:04:38 +02:00
Nick Wellnhofer
6705f4d28e Remove executable bit from non-executable files 2019-09-16 15:48:59 +02:00
zhouzhongyuan
4f67dbb0a1 fix memory leak in xmlAllocOutputBuffer 2019-07-30 12:43:26 +02:00
Nick Wellnhofer
f824a4bd4d Fix memory leak in xmlAllocOutputBufferInternal error path
Thanks to Anish K Kurian for the report. Closes #60.
2019-05-20 13:38:22 +02:00
Nick Wellnhofer
407b393d80 Fix return value of xmlOutputBufferWrite
When using memory buffers, the total size of the buffer was added
again and again, potentially leading to an integer overflow.

Found by OSS-Fuzz.
2019-05-15 13:01:52 +02:00
Nick Wellnhofer
7a1bd7f649 Revert "Change calls to xmlCharEncInput to set flush false"
This reverts commit 6e6ae5daa6cd9640c9a83c1070896273e9b30d14 which
broke decoding of larger documents with ICU.

See https://bugs.chromium.org/p/chromium/issues/detail?id=820163
2018-03-17 00:03:24 +01:00
Joel Hockey
6e6ae5daa6 Change calls to xmlCharEncInput to set flush false when not final call. Having flush incorrectly set to true causes errors for ICU. 2018-01-08 19:57:53 +01:00
Nick Wellnhofer
cb5541c9f3 Fix libz and liblzma detection
If libz or liblzma are detected with pkg-config, AC_CHECK_HEADERS must
not be run because the correct CPPFLAGS aren't set. It is actually not
required have separate checks for LIBXML_ZLIB_ENABLED and HAVE_ZLIB_H.
Only check for LIBXML_ZLIB_ENABLED and remove HAVE_ZLIB_H macro.

Fixes bug 764657, bug 787041.
2017-11-27 14:33:37 +01:00
Nick Wellnhofer
86615e43bb Fix IO callback signatures 2017-11-09 17:47:47 +01:00
Vlad Tsyrklevich
28f52fe89d Refactor name and type signature for xmlNop
Update xmlNop's name to xmlInputReadCallbackNop and its type signature
to match xmlInputReadCallback.

Fixes bug 786134.
2017-11-09 13:43:08 +01:00
Nick Wellnhofer
5672397477 Simplify Windows IO functions
Remove "native" non-Unicode functions which were only needed for
pre-NT systems like Windows 95/98.

Don't redefine `stat` but use `struct _stat` and `_stat()` instead.
2017-10-09 16:52:14 +02:00
Nick Wellnhofer
e3890546d7 Fix the Windows header mess
Don't include windows.h and wsockcompat.h from config.h but only when
needed.

Don't define _WINSOCKAPI_ manually. This was apparently done to stop
windows.h from including winsock.h which is a problem if winsock2.h
wasn't included first. But on MinGW, this causes compiler warnings.
Define WIN32_LEAN_AND_MEAN instead which has the same effect.

Always use the compiler-defined _WIN32 macro instead of WIN32.
2017-10-09 14:35:40 +02:00