mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2025-04-09 14:50:07 +03:00

6214 Commits

Author SHA1 Message Date
Nick Wellnhofer
8afd321abd parser: Missing checks for disableSAX 2023-10-06 12:28:59 +02:00
Nick Wellnhofer
6337a14a6b tests: Handle entities in SAX tests 2023-10-06 12:28:59 +02:00
Nick Wellnhofer
713ded60ad entities: Make xmlFreeEntity public 2023-10-06 10:47:07 +02:00
Nick Wellnhofer
97e99f4112 parser: Acknowledge that entities with namespaces are broken
Entities which reference out-of-scope namespaces have always been broken.
xmlParseBalancedChunkMemoryInternal tried to reuse the namespaces
currently in scope but these namespaces were ignored by the SAX handler.
Besides, there could be different namespaces in scope when expanding the
entity again. For example:

    <!DOCTYPE doc [
      <!ENTITY ent "<ns:elem/>">
    ]>
    <doc>
      <decl1 xmlns:ns="urn:ns1">
        &ent;
      </decl1>
      <decl2 xmlns:ns="urn:ns2">
        &ent;
      </decl2>
    </doc>

Add some comments outlining possible solutions to this problem.

For now, we stop copying namespaces to the temporary parser context
in xmlParseBalancedChunkMemoryInternal. This has never really worked
and the recent changes contained a partial fix which uncovered other
problems like a use-after-free with the XML Reader interface, found
by OSS-Fuzz.
2023-10-05 17:41:46 +02:00
Nick Wellnhofer
b8e03e13ed examples: Don't use sprintf
Avoids warnings on macOS.
2023-10-02 15:07:55 +02:00
Nick Wellnhofer
1734d27dca encoding: Suppress -Wcast-align warnings 2023-10-02 15:04:18 +02:00
Nick Wellnhofer
71aae4e98b dict: Compare strings with strncmp
Using memcmp can result in OOB reads.

Short-lived regression found by OSS-Fuzz.
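
The difference can be illustrated with a minimal sketch. It assumes a lookup that compares a key of known length against a stored null-terminated string that may be shorter. The helper name `entry_matches` is hypothetical and is not taken from libxml2's dict.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not libxml2's actual code.
 * `stored` is a null-terminated string of unknown length. `key` has at
 * least `len` readable bytes, none of which is a null byte.
 */
static int
entry_matches(const char *stored, const char *key, size_t len) {
    assert(stored != NULL && key != NULL);

    /*
     * memcmp(stored, key, len) is allowed to read `len` bytes from
     * `stored`, even past its terminator when `stored` is shorter.
     * strncmp stops at the terminator instead.
     */
    if (strncmp(stored, key, len) != 0)
        return 0;

    /* Reject stored strings that merely start with the key. */
    return stored[len] == '\0';
}
```

Reading `stored[len]` is in bounds here only because `strncmp` has already confirmed that the first `len` bytes of `stored` equal the non-null bytes of `key`.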
2023-10-02 14:52:40 +02:00
Nick Wellnhofer
eb69c1d39d parser: Fix initialization of namespace data
Move initialization to xmlInitSAXParserCtxt. Also add missing XML_HIDDEN
to xmlParserNsFree.

Fixes #597.
2023-10-02 12:33:29 +02:00
Nick Wellnhofer
fc49679316 parser: Fix error handling in xmlParseQNameHashed
Short-lived regression found by OSS-Fuzz.
2023-10-02 12:05:36 +02:00
Nick Wellnhofer
6dd87f5eef malloc-fail: Fix memory leak in xmlParseBalancedChunkMemoryInternal
Short-lived regression found by OSS-Fuzz.
2023-09-30 17:11:25 +02:00
Nick Wellnhofer
f0a703dac8 dict: Fix null-deref with empty subdict
Short-lived regression found by OSS-Fuzz.
2023-09-30 17:05:47 +02:00
Nick Wellnhofer
61e29b6949 malloc-fail: Grow hash tables before making allocations
Fix short-lived memory leak found by OSS-Fuzz.
2023-09-30 17:02:46 +02:00
Nick Wellnhofer
80a0580f23 xinclude: Expand comment about fuzz timeouts 2023-09-30 15:47:46 +02:00
Nick Wellnhofer
fa48187304 fuzz: Disable XML_PARSE_SAX1 option in xml fuzzer
There are no plans to fix quadratic behavior in the legacy SAX1 interface.
2023-09-30 14:45:53 +02:00
Nick Wellnhofer
5c150accba doc: Add notes about runtest to MAINTAINERS.md 2023-09-29 16:07:45 +02:00
Nick Wellnhofer
06e2f3a46e legacy: Add private declarations for stubs
Required after 8c084ebd.
2023-09-29 13:19:37 +02:00
Nick Wellnhofer
0533daf5d2 encoding: Fix infinite loop in xmlCharEncInput
Short-lived regression from 95e81a36.
2023-09-29 12:43:46 +02:00
Nick Wellnhofer
e0dd330b8f parser: Use hash tables to avoid quadratic behavior
Use a hash table to look up namespaces by prefix. The hash table stores
an index into the namespace table. Auxiliary data for namespaces is
stored in a separate array alongside the main namespace table.

Use a hash table to verify attribute uniqueness. The hash table stores
an index into the attribute table.

Reuse hash values from the dictionary to avoid computing them twice.

See #346.
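
The attribute uniqueness check can be sketched roughly as follows. This is an illustration only: the names `find_duplicate_attr`, `attr_hash` and `ATTR_TABLE_SIZE` are hypothetical, the table is a fixed size, it uses plain linear probing, and it hashes the names itself rather than reusing hash values from the dictionary.

```c
#include <assert.h>
#include <string.h>

#define ATTR_TABLE_SIZE 64u  /* power of two, larger than the attribute count */

/* FNV-1a, chosen only for brevity. */
static unsigned
attr_hash(const char *s) {
    unsigned h = 2166136261u;

    while (*s != '\0') {
        h ^= (unsigned char) *s++;
        h *= 16777619u;
    }
    return h;
}

/*
 * Returns the index of the first attribute whose name repeats an earlier
 * one, or -1 if all names are unique. The table holds indices into
 * `names` instead of copies of the names.
 */
static int
find_duplicate_attr(const char *const *names, int nattrs) {
    int table[ATTR_TABLE_SIZE];  /* -1 marks an empty slot */
    unsigned i;
    int a;

    assert(nattrs >= 0 && (unsigned) nattrs < ATTR_TABLE_SIZE);

    for (i = 0; i < ATTR_TABLE_SIZE; i++)
        table[i] = -1;

    for (a = 0; a < nattrs; a++) {
        unsigned pos = attr_hash(names[a]) & (ATTR_TABLE_SIZE - 1);

        /* Linear probing: walk until an empty slot or a match. */
        while (table[pos] != -1) {
            if (strcmp(names[table[pos]], names[a]) == 0)
                return a;
            pos = (pos + 1) & (ATTR_TABLE_SIZE - 1);
        }
        table[pos] = a;
    }
    return -1;
}
```

Each attribute is compared only against the entries it collides with, so the expected cost grows linearly with the number of attributes instead of quadratically as with pairwise comparison.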
2023-09-29 12:43:22 +02:00
Nick Wellnhofer
e48f3d8e0a tests: Add more tests for redefined attributes 2023-09-29 12:43:08 +02:00
Nick Wellnhofer
a873191cd2 parser: Introduce xmlParseQNameHashed 2023-09-29 12:43:08 +02:00
Nick Wellnhofer
cb927e8519 parser: Don't skip CR in xmlCurrentChar
Skip over carriage returns later in xmlNextChar.
2023-09-29 12:43:08 +02:00
Nick Wellnhofer
19161bab15 dict: Internal API to look up hash values 2023-09-29 12:43:08 +02:00
Nick Wellnhofer
d147f5644e dict: Rewrite dictionary hash table code
Rewrite the dictionary hash table to use open addressing with Robin Hood
probing. See previous commit.
2023-09-29 12:41:37 +02:00
Nick Wellnhofer
4a513d5667 hash: Rewrite hash table code
This is a complete rewrite of the code in hash.c.

Move from a chained hash table implementation to open addressing with
Robin Hood probing. This makes it possible to increase the maximum fill
factor and further reduce the growth factor, saving considerable amounts
of memory without sacrificing performance.

To make this work, hash values are now cached in the table entry,
which also avoids many key comparisons.

Tables are created lazily with a smaller minimum size.

Insertion functions now report an error if growing the table resulted in
a memory allocation failure.

Some string comparisons were optimized to call directly into libc
instead of using the xmlstring API.

The length of inserted keys is computed along with the hash, improving
allocation performance.

Bounds checking was made more robust.

In dictionary-based mode, unneeded interning of strings is avoided.
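
For readers unfamiliar with the technique, here is a minimal sketch of open addressing with Robin Hood probing and cached hash values. It is not the code in hash.c: all `rh_*` names are hypothetical, the table has a fixed capacity and never grows, keys are not copied, and removal is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RH_CAP 16u  /* power of two; fixed size to keep the sketch short */

typedef struct {
    const char *key;  /* NULL marks an empty slot */
    unsigned hash;    /* cached, so most mismatches skip strcmp */
    int value;
} rh_entry;

typedef struct {
    rh_entry slots[RH_CAP];
    unsigned count;
} rh_table;

static void rh_init(rh_table *t) {
    unsigned i;
    for (i = 0; i < RH_CAP; i++)
        t->slots[i].key = NULL;
    t->count = 0;
}

static unsigned rh_hash(const char *s) {
    unsigned h = 2166136261u;  /* FNV-1a, chosen only for brevity */
    while (*s != '\0') {
        h ^= (unsigned char) *s++;
        h *= 16777619u;
    }
    return h;
}

/* Distance of the entry stored at `pos` from its ideal slot. */
static unsigned rh_dist(unsigned hash, unsigned pos) {
    return (pos - hash) & (RH_CAP - 1);
}

static rh_entry *rh_find(rh_table *t, const char *key) {
    unsigned hash = rh_hash(key);
    unsigned dist;

    for (dist = 0; dist < RH_CAP; dist++) {
        unsigned pos = (hash + dist) & (RH_CAP - 1);
        rh_entry *e = &t->slots[pos];

        /* An empty slot or a resident closer to home means "absent". */
        if (e->key == NULL || rh_dist(e->hash, pos) < dist)
            return NULL;
        if (e->hash == hash && strcmp(e->key, key) == 0)
            return e;
    }
    return NULL;
}

/* Returns 0 on success, -1 if the table is full. Updates existing keys. */
static int rh_insert(rh_table *t, const char *key, int value) {
    rh_entry cur, *found;
    unsigned pos, dist = 0;

    assert(t != NULL && key != NULL);
    found = rh_find(t, key);
    if (found != NULL) {
        found->value = value;
        return 0;
    }
    if (t->count == RH_CAP)
        return -1;

    cur.key = key;
    cur.hash = rh_hash(key);
    cur.value = value;
    pos = cur.hash & (RH_CAP - 1);

    while (t->slots[pos].key != NULL) {
        unsigned other = rh_dist(t->slots[pos].hash, pos);
        if (other < dist) {
            /* The resident is closer to home, so it yields its slot. */
            rh_entry tmp = t->slots[pos];
            t->slots[pos] = cur;
            cur = tmp;
            dist = other;
        }
        pos = (pos + 1) & (RH_CAP - 1);
        dist++;
    }
    t->slots[pos] = cur;
    t->count++;
    return 0;
}
```

Because entries far from their ideal slot displace entries that are close to theirs, probe lengths stay short even at a high fill factor, and a lookup can stop as soon as it meets a resident that is closer to home than the probe itself.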
2023-09-29 02:25:57 +02:00
Nick Wellnhofer
4f221a7748 hash: Add hash table tests
Make sure to properly test removal from hash tables.
2023-09-29 00:15:40 +02:00
Nick Wellnhofer
1425d8f67b dict: Separate RNG code 2023-09-29 00:15:40 +02:00
Nick Wellnhofer
42a0bc6d96 tests: Add ATTRIBUTE_NO_SANITIZE_INTEGER macro 2023-09-29 00:15:40 +02:00
Nick Wellnhofer
845bd99f8b string: Fix UTF-8 validation in xmlGetUTF8Char 2023-09-29 00:15:40 +02:00
Nick Wellnhofer
3e7673bc2d malloc-fail: Report malloc failure in xmlFARegExec 2023-09-29 00:15:40 +02:00
Nick Wellnhofer
b31813e60c include: Add more missing stdio.h includes 2023-09-28 15:34:08 +02:00
Nick Wellnhofer
b8961a75e9 parser: Fix reinitialization 2023-09-27 17:24:46 +02:00
James Le Cuirot
c7ff438b83 cmake: Only use pkg-config for .pc files, not for building binaries
Using `pkg_check_modules(FOO IMPORTED_TARGET foo)` with
`target_link_libraries()` leads to `INTERFACE_LINK_LIBRARIES` in the
resulting export file having `\$<LINK_ONLY:PkgConfig::FOO>` rather than
the currently expected `\$<LINK_ONLY:FOO::FOO>`, leading to breakage.
This can be worked around like so:

    target_link_libraries(UseFoo
      PUBLIC "$<BUILD_INTERFACE:PkgConfig::FOO>"
      INTERFACE "$<INSTALL_INTERFACE:FOO::FOO>"
    )

However, following some discussion, it is preferable to primarily use
find modules as before and only use `pkg_check_modules` for correctly
populating the .pc file.

Also move `find_package()` calls earlier so that builds fail faster when
dependencies are missing.
2023-09-23 16:48:57 +01:00
James Le Cuirot
9d53452206 cmake: Check whether static linking dependencies found in config files
If they were required when building libxml2 then they will also be
required when statically linking against it. Failing to find them will
just lead to undefined references later so detect this early.
2023-09-23 16:48:54 +01:00
James Le Cuirot
8617d8aa10 cmake: Find threads dep early as it may be needed for later checks 2023-09-23 16:48:51 +01:00
Nick Wellnhofer
b7d56ef7f1 malloc-fail: Report malloc failure in xmlRegEpxFromParse
Also check whether malloc failures are reported when fuzzing.
2023-09-22 19:53:11 +02:00
Nick Wellnhofer
d94f0b0ba2 doc: Update MAINTAINERS and NEWS 2023-09-22 19:01:11 +02:00
Nick Wellnhofer
84e1ffc813 doc: Don't document internal macros in xmlversion.h 2023-09-22 19:01:11 +02:00
Nick Wellnhofer
b9db3d7d02 parser: Simplify xmlStringCurrentChar
Start to move away from using this function.
2023-09-22 19:01:11 +02:00
Nick Wellnhofer
f98fa86318 regexp: Fix status codes and handle invalid UTF-8
Fixes #561.
2023-09-22 19:01:11 +02:00
Nick Wellnhofer
b94283fbda regexp: Add missing include 2023-09-22 14:23:27 +02:00
Nick Wellnhofer
bc4e82ff42 globals: Don't use thread-local storage on Darwin
It seems that thread-local storage destructors are run before pthread
thread-specific data destructors on Darwin, defeating our scheme to use
TSD to clean up TLS.

Here's an example program that reports a use-after-free when compiled
with `-fsanitize=address` on macOS:

    #include <pthread.h>

    typedef struct {
        int v;
    } my_struct;

    static _Thread_local my_struct tls;
    pthread_key_t key;

    void dtor(void *tsd) {
        my_struct *s = (my_struct *) tsd;
        /*
         * This will crash ASan, apparently because
         * TLS has already been freed.
         */
        s->v = 1;
    }

    void *thread(void *p) {
        pthread_setspecific(key, &tls);
        return NULL;
    }

    int main(void) {
        pthread_key_create(&key, dtor);

        pthread_t handle;
        pthread_create(&handle, NULL, thread, NULL);
        pthread_join(handle, NULL);

        return 0;
    }
2023-09-22 13:37:28 +02:00
Nick Wellnhofer
45470611b0 error: Make xmlGetLastError return a const error
This is a slight break of the API, but users really shouldn't modify the
global error struct. The goal is to make xmlLastError use static buffers
for its strings eventually. This should warn people if they're abusing
the struct.
2023-09-22 13:29:07 +02:00
Nick Wellnhofer
fc26934eb0 memory: Fix memory debugging with Windows threads
On Windows, malloc hooks can be called after the final call to
xmlCleanupParser in various tests. This means that xmlMemMutex can still
be accessed if memory debugging is enabled, so the mutex should not be
cleaned.

This also means that tests may report spurious memory leaks on Windows.

The old implementation avoided the issue by keeping track of all
global state objects in a doubly linked list, so they could be cleaned
during xmlCleanupParser.

But as far as I can tell, all memory will be freed eventually, so this is
mostly an issue with our test suite.
2023-09-21 23:29:18 +02:00
Nick Wellnhofer
6eb2a00da4 tests: Update testapi.c 2023-09-21 22:58:02 +02:00
Nick Wellnhofer
8c084ebdc7 doc: Make apibuild.py happy 2023-09-21 22:57:33 +02:00
Nick Wellnhofer
e4091bcfea doc: Allow 'unsigned' without 'int' 2023-09-21 22:54:57 +02:00
Nick Wellnhofer
46d7aaecff doc: Add ignored tokens to apibuild.py 2023-09-21 22:54:30 +02:00
Nick Wellnhofer
6c4ea468b2 python: Fix tests
Revert part of commit 138213ac.
2023-09-21 21:31:52 +02:00
Nick Wellnhofer
05135536b1 globals: Fix build --with-threads --without-output
Fixes #593.
2023-09-21 20:40:32 +02:00
Nick Wellnhofer
c5890716a6 html: Fix logic in htmlAutoClose
Note that the function is never called with a NULL newtag.

Fixes #591.
2023-09-21 17:01:35 +02:00