1
0
mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2025-01-27 14:03:36 +03:00

132 Commits

Author SHA1 Message Date
Nick Wellnhofer
b0fc67aa22 build: Remove --with-tree configuration option
This option would allow for a smaller, but mostly useless minimal build.
But it complicates the symbol availability logic in an insane way and
requires specialized tools like our custom C parser in doc/apibuild.py.

See #717.
2024-06-16 18:47:12 +02:00
Nick Wellnhofer
8318b5a634 parser: Fix NULL checks for output arguments 2024-06-09 15:08:43 +02:00
Nick Wellnhofer
f0d891585d entities: Unconst predefined entities
Partial revert of commit 63ce5f9a. For some reason, Chromium and WebKit
set the etype member of predefined entities. This should be fixed first.
2024-06-01 15:41:43 +02:00
Nick Wellnhofer
e75e878e02 doc: Update and fix documentation 2024-05-20 14:23:39 +02:00
Nick Wellnhofer
63ce5f9aed Make some globals const 2024-04-28 17:53:39 +02:00
Nick Wellnhofer
ee0c1f87c0 fuzz: New tree API fuzzer 2024-03-15 19:54:27 +01:00
Nick Wellnhofer
edbf1eb63b entities: Don't allow null name in xmlNewEntity 2024-03-15 19:47:08 +01:00
Nick Wellnhofer
50816b8d1a entities: Check for illegal entity types in xmlAddEntity 2024-03-15 19:47:08 +01:00
Nick Wellnhofer
ab345338a4 valid: Report malloc failure in legacy DTD serialization 2024-03-15 19:47:08 +01:00
Nick Wellnhofer
fbe10a466f save: Move DTD serialization code to xmlsave.c 2024-02-04 14:33:19 +01:00
Nick Wellnhofer
c2b3294f60 fuzz: Abort on invalid UTF-8
The parser should never generate invalid UTF-8 these days even in
recovery mode.
2024-01-04 21:20:51 +01:00
Nick Wellnhofer
d025cfbb4b parser: Always copy content from entity to target.
Make sure that references from IDs are updated.

Note that if there are IDs with the same value in a document, the last
one will now be returned. IDs should be unique, but maybe this should be
addressed.
2023-12-29 01:22:11 +01:00
Nick Wellnhofer
a1f7ecaef8 entities: Report malloc failures
Fix places where malloc failures aren't reported.

Introduce new API function xmlAddEntity that returns separate error
codes.

Don't invoke global error handler for low-level errors which should be
handled by higher layers.

Invalid redelcaration warnings will be fixed later.
2023-12-11 22:05:47 +01:00
Nick Wellnhofer
713ded60ad entities: Make xmlFreeEntity public 2023-10-06 10:47:07 +02:00
Nick Wellnhofer
699299cae3 globals: Stop including globals.h 2023-09-20 22:07:40 +02:00
Nick Wellnhofer
9d80a2b134 entities: Don't change doc when encoding entities
doc->encoding shouldn't be touched by xmlEncodeEntitiesInternal.
2023-08-17 12:47:14 +02:00
Nick Wellnhofer
ce76ebfd13 entities: Stop counting entities
This was only used in the old version of xmlParserEntityCheck.
2022-12-21 20:19:10 +01:00
Nick Wellnhofer
463bbeeca1 entities: Rework entity amplification checks
This commit implements robust detection of entity amplification attacks,
better known as the "billion laughs" attack.

We now limit the size of the document after substitution of entities to
10 times the size before expansion. This guarantees linear behavior by
definition. There already was a similar check before, but the accounting
of "sizeentities" (size of external entities) and "sizeentcopy" (size of
all copies created by entity references) wasn't accurate.

We also need saturation arithmetic since we're historically limited to
"unsigned long" which is 32-bit on many platforms.

A maximum of 10 MB of substitutions is always allowed. This should make
use cases like DITA work which have caused problems in the past.

The old checks based on the number of entities were removed. This is
accounted for by adding a fixed cost to each entity reference.

Entity amplification checks are now enabled even if XML_PARSE_HUGE is
set. This option is mainly used to allow larger text nodes. Most users
were unaware that it also disabled entity expansion checks.

Some of the limits might be adjusted later. If this change turns out to
affect legitimate use cases, we can add a separate parser option to
disable the checks.

Fixes #294.
Fixes #345.
2022-12-21 20:19:10 +01:00
Nick Wellnhofer
f34f184f8e entities: Add "flags" member to struct xmlEntity
This will hold various flags and eventually replace the "checked"
member.
2022-12-19 15:24:53 +01:00
Nick Wellnhofer
2059df5358 buf: Deprecate static/immutable buffers 2022-11-20 21:16:03 +01:00
Nick Wellnhofer
644a89e080 [CVE-2022-40304] Fix dict corruption caused by entity reference cycles
When an entity reference cycle is detected, the entity content is
cleared by setting its first byte to zero. But the entity content might
be allocated from a dict. In this case, the dict entry becomes corrupted
leading to all kinds of logic errors, including memory errors like
double-frees.

Stop storing entity content, orig, ExternalID and SystemID in a dict.
These values are unlikely to occur multiple times in a document, so they
shouldn't have been stored in a dict in the first place.

Thanks to Ned Williamson and Nathan Wachholz working with Google Project
Zero for the report!
2022-10-14 15:02:06 +02:00
Nick Wellnhofer
2cac626976 Don't use sizeof(xmlChar) or sizeof(char) 2022-09-01 03:35:19 +02:00
Nick Wellnhofer
0f568c0b73 Consolidate private header files
Private functions were previously declared

- in header files in the root directory
- in public headers guarded with IN_LIBXML
- in libxml.h
- redundantly in source files that used them.

Consolidate all private header files in include/private.
2022-08-26 02:11:56 +02:00
Nick Wellnhofer
776d15d383 Don't check for standard C89 headers
Don't check for

- ctype.h
- errno.h
- float.h
- limits.h
- math.h
- signal.h
- stdarg.h
- stdlib.h
- string.h
- time.h

Stop including non-standard headers

- malloc.h
- strings.h
2022-03-02 00:43:54 +01:00
Nick Wellnhofer
f550977295 Fix documentation in entities.c 2022-02-20 22:06:16 +01:00
Nick Wellnhofer
346c3a930c Remove elfgcchack.h
The same optimization can be enabled with -fno-semantic-interposition
since GCC 5. clang has always used this option by default.
2022-02-20 21:49:04 +01:00
Nick Wellnhofer
ce0871e15c Only warn on invalid redeclarations of predefined entities
Downgrade the error message to a warning since the error was ignored,
anyway. Also print the name of redeclared entity. For a proper fix that
also shows filename and line number of the invalid redeclaration, we'd
have to

- pass the parser context to the entity functions somehow, or
- make these functions return distinct error codes.

Partial fix for #308.
2022-02-20 21:49:04 +01:00
Joel Hockey
bf22713507 Validate UTF8 in xmlEncodeEntities
Code is currently assuming UTF-8 without validating. Truncated UTF-8
input can cause out-of-bounds array access.

Adds further checks to partial fix in 50f06b3e.

Fixes #178
2021-04-22 11:57:32 +02:00
Nick Wellnhofer
cbe1212db6 Fix null deref introduced with previous commit
Found by OSS-Fuzz.
2021-02-09 17:07:21 +01:00
Nick Wellnhofer
01411e7c5e Check for invalid redeclarations of predefined entities
Implement section "4.6 Predefined Entities" of the XML 1.0 spec and
check whether redeclarations of predefined entities match the original
definitions.

Note that some test cases declared

    <!ENTITY lt "<">

But the XML spec clearly states that this is illegal:

> If the entities lt or amp are declared, they MUST be declared as
> internal entities whose replacement text is a character reference to
> the respective character (less-than sign or ampersand) being escaped;
> the double escaping is REQUIRED for these entities so that references
> to them produce a well-formed result.

Also fixes #217 but the connection is only tangential. The integer
overflow discovered by fuzzing was more related to the fact that various
parts of the parser disagreed on whether to prefer predefined entities
over their redeclarations. The whole situation is a mess and even
depends on legacy parser options. But now that redeclarations are
validated, it shouldn't make a difference.

As noted in the added comment, this is also one of the cases where
overly defensive checks can hide interesting logic bugs from fuzzers.
2021-02-08 21:51:26 +01:00
Nick Wellnhofer
20c60886e4 Fix typos
Resolves #133.
2020-03-08 17:41:53 +01:00
Jared Yanovich
2a350ee9b4 Large batch of typo fixes
Closes #109.
2019-09-30 18:04:38 +02:00
Nick Wellnhofer
e03f0a199a Fix hash callback signatures
Make sure that all parameters and return values of hash callback
functions exactly match the callback function type. This is required
to pass clang's Control Flow Integrity checks and to allow compilation
to asm.js with Emscripten.

Fixes bug 784861.
2017-11-09 16:42:47 +01:00
Stéphane Michaut
454e397eb7 Porting libxml2 on zOS encoding of code
First set of patches for zOS
- entities.c parser.c tree.c xmlschemas.c xmlschemastypes.c xpath.c xpointer.c:
  ask conversion of code to ISO Latin 1 to avoid having the compiler assume
  EBCDIC codepoint for characters.
- xmlmodule.c: make sure we have support for modules
- xmlIO.c: zOS path names are special avoid dsome of the expectstions from
  Unix/Windows
2017-08-28 14:30:43 +02:00
David Kilzer
4472c3a5a5 Fix some format string warnings with possible format string vulnerability
For https://bugzilla.gnome.org/show_bug.cgi?id=761029

Decorate every method in libxml2 with the appropriate
LIBXML_ATTR_FORMAT(fmt,args) macro and add some cleanups
following the reports.
2016-05-23 15:01:07 +08:00
Kurt Roeckx
95ebe53b50 Fix and add const qualifiers
For https://bugzilla.gnome.org/show_bug.cgi?id=689483

It seems there are functions that do use the const qualifier for some of the
arguments, but it seems that there are a lot of functions that don't use it and
probably should.

So I created a patch against 2.9.0 that makes as much as possible const in
tree.h, and changed other files as needed.

There were a lot of cases like "const xmlNodePtr node".  This doesn't actually
do anything, there the *pointer* is constant not the object it points to. So I
changed those to "const xmlNode *node".

I also removed some consts, mostly in the Copy functions, because those
functions can actually modify the doc or node they copy from
2014-10-13 16:06:21 +08:00
Daniel Veillard
0ab8ce5302 Switched comment in file to UTF-8 encoding 2013-03-30 22:33:05 +08:00
Daniel Veillard
7651606f31 Various cleanups to avoid compiler warnings 2012-09-11 14:02:08 +08:00
Daniel Veillard
f8e3db0445 Big space and tab cleanup
Remove all space before tabs and space and tabs at end of lines.
2012-09-11 13:26:36 +08:00
Daniel Veillard
7d4c529a33 Improve HTML escaping of attribute on output
Handle special cases of &{...} constructs as hinted in the spec
  http://www.w3.org/TR/html401/appendix/notes.html#h-B.7.1
and special values as comment <!-- ... --> used for server side includes
This is limited to attribute values in HTML content.
2012-09-05 12:11:43 +08:00
Aron Xu
baaf03f80f Fix an error in previous commit 2012-07-20 15:41:34 +08:00
Daniel Veillard
4f9fdc709c Fix entities local buffers size problems 2012-07-18 17:54:05 +08:00
Daniel Veillard
13cee4e37b Fix a bunch of scan 'dead increments' and cleanup
* HTMLparser.c c14n.c debugXML.c entities.c nanohttp.c parser.c
  testC14N.c uri.c xmlcatalog.c xmllint.c xmlregexp.c xpath.c:
  fix unused variables, or unneeded increments as well as a couple
  of space issues
* runtest.c: check for NULL before calling unlink()
2009-09-05 14:52:55 +02:00
Daniel Veillard
aa6de47ebf applied patch from Aswin to fix tree skipping fixed a comment and added a
* xmlreader.c: applied patch from Aswin to fix tree skipping
* include/libxml/entities.h entities.c: fixed a comment and
  added a new xmlNewEntity() entry point
* runtest.c: be less verbose
* tree.c: space and tabs cleanups
daniel

svn path=/trunk/; revision=3774
2008-08-25 14:53:31 +00:00
Daniel Veillard
f4f4e4853a rework the patch to avoid some ABI issue with people allocating entities
* include/libxml/entities.h entities.c SAX2.c parser.c: rework
  the patch to avoid some ABI issue with people allocating
  entities structure directly
Daniel

svn path=/trunk/; revision=3773
2008-08-25 08:57:48 +00:00
Daniel Veillard
4bf899bf1b fix for CVE-2008-3281 Daniel
* include/libxml/parser.h include/libxml/entities.h entities.c
  parserInternals.c parser.c: fix for CVE-2008-3281
Daniel

svn path=/trunk/; revision=3772
2008-08-20 17:04:30 +00:00
Daniel Veillard
a37a6ad91a trying to fix entities behaviour when using SAX, had to extend entities
* include/libxml/entities.h entities.c SAX2.c parser.c: trying to
  fix entities behaviour when using SAX, had to extend entities
  content and hack on the entities processing code, but that should
  fix the long standing bug #159219
Daniel
2006-10-10 20:05:45 +00:00
Daniel Veillard
2728f845c5 more cleanups based on coverity reports. Daniel
* SAX2.c catalog.c encoding.c entities.c example/gjobread.c
  python/libxml.c: more cleanups based on coverity reports.
Daniel
2006-03-09 16:49:24 +00:00
Daniel Veillard
5d4644ef6e revamped the elfgcchack.h format to cope with gcc4 change of aliasing
* doc/apibuild.py doc/elfgcchack.xsl: revamped the elfgcchack.h
  format to cope with gcc4 change of aliasing allowed scopes, had
  to add extra informations to doc/libxml2-api.xml to separate
  the header from the c module source.
* *.c: updated all c library files to add a #define bottom_xxx
  and reimport elfgcchack.h thereafter, and a bit of cleanups.
* doc//* testapi.c: regenerated when rebuilding the API
Daniel
2005-04-01 13:11:58 +00:00
Daniel Veillard
316a5c3989 added xmlHashCreateDict where the hash reuses the dictionnary for internal
* hash.c include/libxml/hash.h: added xmlHashCreateDict where
  the hash reuses the dictionnary for internal strings
* entities.c valid.c parser.c: reuse that new API, leads to a decent
  speedup when parsing for example DocBook documents.
Daniel
2005-01-23 22:56:39 +00:00