1
0
mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2025-01-12 09:17:37 +03:00
Commit Graph

66 Commits

Author SHA1 Message Date
Nick Wellnhofer
e75e878e02 doc: Update and fix documentation 2024-05-20 14:23:39 +02:00
Nick Wellnhofer
f313848bd8 hash: Report malloc failures
Introduce new API functions that return a separate error code if a
memory allocation fails.

- xmlHashAdd
- xmlHashCopySafe
2023-12-11 22:05:47 +01:00
Nick Wellnhofer
a2b5c90a44 hash: Fix deletion of entries during scan
Functions like xmlCleanSpecialAttr scan a hash table and possibly delete
entries in the callback. xmlHashScanFull must detect such deletions and
rescan the entry.

This regressed when rewriting the hash table code in 4a513d56.

Fixes #626.
2023-11-21 15:28:59 +01:00
Nick Wellnhofer
a7b037952f doc: Minor fixes for apibuild.py 2023-11-04 19:32:48 +01:00
Nick Wellnhofer
61e29b6949 malloc-fail: Grow hash tables before making allocations
Fix short-lived memory leak found by OSS-Fuzz.
2023-09-30 17:02:46 +02:00
Nick Wellnhofer
4a513d5667 hash: Rewrite hash table code
This is a complete rewrite of the code in hash.c

Move from a chained hash table implementation to open addressing with
Robin Hood probing. This allows to increase the maximum fill factor and
further reduce the growth factor, saving considerable amounts of memory
without sacrificing performance.

To make this work, hash values are now cached in the table entry
also avoiding many key comparisons.

Tables are created lazily with a smaller minimum size.

Insertion functions now report an error if growing the table resulted in
a memory allocation failure.

Some string comparisons were optimized to call directly into libc
instead of using the xmlstring API.

The length of inserted keys is computed along with the hash improving
allocation performance.

Bounds checking was made more robust.

In dictionary-based mode, unneeded interning of strings is avoided.
2023-09-29 02:25:57 +02:00
Nick Wellnhofer
699299cae3 globals: Stop including globals.h 2023-09-20 22:07:40 +02:00
Nick Wellnhofer
efcaeadc3e hash: Fix use-of-uninitialized-value
Short-lived regression.
2023-09-04 16:07:40 +02:00
Nick Wellnhofer
edc2dd48cb dict: Update hash function
Update hash function from classic Jenkins OAAT (dict.c) and a variant of
DJB2 (hash.c) to "GoodOAAT" taken from the SMHasher repo. This hash
function passes all SMHasher tests.
2023-09-04 16:07:23 +02:00
Nick Wellnhofer
57cfd221a6 dict: Use xoroshiro64** as PRNG
Stop using rand_r. This enables hash randomization on all platforms.
2023-09-01 14:52:04 +02:00
Nick Wellnhofer
6d7aaaa835 dict: Tune hash table growth
Introduce load factor as main trigger and increase MAX_HASH_LEN. This
should make growth behavior more predictable.

Raise size limit to INT_MAX. This avoids quadratic behavior with larger
tables.
2023-09-01 14:51:55 +02:00
Nick Wellnhofer
4b8f7cf05d hash: Fix integer overflow of nbElems 2023-09-01 14:43:08 +02:00
Nick Wellnhofer
06a2c251a1 hash: Fix possible startup crash with old libxslt versions
Call xmlInitParser in xmlHashCreate to make it work if the library
wasn't initialized yet.

Otherwise, exsltRegisterAll from libxslt 1.1.24 or older might cause
a crash.

See #534.
2023-05-06 15:28:13 +02:00
Nick Wellnhofer
8c2e508b5e gitlab-ci: Enable all "integer" sanitizers 2023-03-12 14:45:14 +01:00
Nick Wellnhofer
4499143a87 malloc-fail: Check for malloc failure in xmlHashAddEntry
Found with libFuzzer, see #344.
2023-02-27 17:18:05 +01:00
Nick Wellnhofer
ad338ca737 Remove explicit integer casts
Remove explicit integer casts as final operation

- in assignments
- when passing arguments
- when returning values

Remove casts

- to the same type
- from certain range-bound values

The main motivation is that these explicit casts don't change the result
of operations and only render UBSan's implicit-conversion checks
useless. Removing these casts allows UBSan to detect cases where
truncation or sign-changes occur unexpectedly.

Document some explicit casts as truncating and add a few missing ones.
2022-09-01 02:33:57 +02:00
Nick Wellnhofer
0f568c0b73 Consolidate private header files
Private functions were previously declared

- in header files in the root directory
- in public headers guarded with IN_LIBXML
- in libxml.h
- redundantly in source files that used them.

Consolidate all private header files in include/private.
2022-08-26 02:11:56 +02:00
Nick Wellnhofer
72119afe00 Don't check for standard C89 library functions
Don't check for

- fprintf
- localtime
- printf
- rand
- sprintf
- srand
- sscanf
- strftime
- time
- vfprintf
- vsprintf

If the C99 functions snprintf and vsnprintf are missing, Trio is
enabled.
2022-03-02 01:14:08 +01:00
Nick Wellnhofer
776d15d383 Don't check for standard C89 headers
Don't check for

- ctype.h
- errno.h
- float.h
- limits.h
- math.h
- signal.h
- stdarg.h
- stdlib.h
- string.h
- time.h

Stop including non-standard headers

- malloc.h
- strings.h
2022-03-02 00:43:54 +01:00
Nick Wellnhofer
346c3a930c Remove elfgcchack.h
The same optimization can be enabled with -fno-semantic-interposition
since GCC 5. clang has always used this option by default.
2022-02-20 21:49:04 +01:00
Nick Wellnhofer
67c2e78b81 Fix integer conversion warnings in hash.c
Use unsigned long for temporary variable to avoid integer conversion
warnings with UBSan.

Note that this does change the computation of hash values for input
bytes larger than 0x7F. Before, these bytes were first converted to a
(typically) signed char with a negative value, then to a large unsigned
long near ULONG_MAX. I doubt that this was intentional. Input bytes
larger than 0x7F are now converted to unsigned long unchanged.
2022-01-25 03:15:12 +01:00
Nick Wellnhofer
20c60886e4 Fix typos
Resolves #133.
2020-03-08 17:41:53 +01:00
Nick Wellnhofer
b88ae6d2e1 Avoid ignored attribute warnings under GCC
GCC doesn't support the unsigned-integer-overflow sanitizer.
2019-10-14 15:40:32 +02:00
Nick Wellnhofer
44e7a0d5f7 Annotate functions with __attribute__((no_sanitize)) 2019-05-20 13:38:22 +02:00
Nick Wellnhofer
fa3166c227 Disable hash randomization when fuzzing
Use the FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION macro proposed by
libFuzzer.
2019-04-12 12:06:34 +02:00
Nick Wellnhofer
e03f0a199a Fix hash callback signatures
Make sure that all parameters and return values of hash callback
functions exactly match the callback function type. This is required
to pass clang's Control Flow Integrity checks and to allow compilation
to asm.js with Emscripten.

Fixes bug 784861.
2017-11-09 16:42:47 +01:00
Nick Wellnhofer
8bbe4508ef Spelling and grammar fixes
Fixes bug 743172, bug 743489, bug 769632, bug 782400 and a few other
misspellings.
2017-06-17 16:34:23 +02:00
Gaurav Gupta
1811add768 Fix various Missing Null checks
For https://bugzilla.gnome.org/show_bug.cgi?id=732823
2014-07-14 17:50:27 +08:00
Daniel Franke
b123711189 Improve the hashing functions 2013-04-12 18:53:53 +08:00
Daniel Veillard
f8e3db0445 Big space and tab cleanup
Remove all space before tabs and space and tabs at end of lines.
2012-09-11 13:26:36 +08:00
Daniel Veillard
379ebc1d77 Cleanup on randomization
tsan reported that rand() is not thread safe, so create
a thread safe wrapper, use rand_r() if available.
Consolidate the function, initialization and cleanup in
dict.c and make sure it is initialized in xmlInitParser()
2012-05-18 15:41:31 +08:00
Daniel Veillard
8973d58b74 Add hash randomization to hash and dict structures
Following http://www.ocert.org/advisories/ocert-2011-003.html
it seems that having hash randomization might be a good idea
when using XML with untrusted data
* configure.in: lookup for rand, srand and time
* dict.c: add randomization to dictionaries hash tables
* hash.c: add randomization to normal hash tables
2012-02-13 11:11:01 +08:00
Daniel Veillard
594e5dfb48 Chasing dead assignments reported by clang-scan
* SAX2.c dict.c error.c hash.c nanohttp.c parser.c python/libxml.c
  relaxng.c runtest.c tree.c valid.c xinclude.c xmlregexp.c xmlsave.c
  xmlschemas.c xpath.c xpointer.c: mostly removing unneded affectations,
  but this led to a few real bugs and some part not yet understood
  (relaxng/interleave)
2009-09-07 14:58:47 +02:00
Daniel Veillard
ac4118d5ca handle a erroneous parsing of attributes in case said attribute has been
* parser.c: handle a erroneous parsing of attributes in 
  case said attribute has been redeclared in the DTD with a
  different type
* hash.c: fix the hash scanner to not crash if a first element
  from the hash list is been removed in the callback
Daniel

svn path=/trunk/; revision=3669
2008-01-11 05:27:32 +00:00
Daniel Veillard
5d4644ef6e revamped the elfgcchack.h format to cope with gcc4 change of aliasing
* doc/apibuild.py doc/elfgcchack.xsl: revamped the elfgcchack.h
  format to cope with gcc4 change of aliasing allowed scopes, had
  to add extra informations to doc/libxml2-api.xml to separate
  the header from the c module source.
* *.c: updated all c library files to add a #define bottom_xxx
  and reimport elfgcchack.h thereafter, and a bit of cleanups.
* doc//* testapi.c: regenerated when rebuilding the API
Daniel
2005-04-01 13:11:58 +00:00
Daniel Veillard
316a5c3989 added xmlHashCreateDict where the hash reuses the dictionnary for internal
* hash.c include/libxml/hash.h: added xmlHashCreateDict where
  the hash reuses the dictionnary for internal strings
* entities.c valid.c parser.c: reuse that new API, leads to a decent
  speedup when parsing for example DocBook documents.
Daniel
2005-01-23 22:56:39 +00:00
Daniel Veillard
e991fe958f change suggested by Anthony Carrico when unregistering a namespace prefix
* xpath.c: change suggested by Anthony Carrico when unregistering
  a namespace prefix to a context
* hash.c: be more careful about calling callbacks with NULL payloads.
Daniel
2003-10-29 11:18:37 +00:00
Daniel Veillard
092643b52d preparing a beta3 solving the ABI problems make sure the global variables
* configure.in: preparing a beta3 solving the ABI problems
* globals.c parser.c parserInternals.c testHTML.c HTMLparser.c SAX.c
  include/libxml/globals.h include/libxml/SAX.h: make sure the
  global variables for the default SAX handler are V1 ones to
  avoid ABI compat problems.
* xmlreader.c: cleanup of uneeded code
* hash.c: fix a comment
Daniel
2003-09-25 14:29:29 +00:00
Daniel Veillard
7a02cfe0d7 fixing some comments to avoid warnings from apibuild.py Daniel
* SAX2.c hash.c parser.c include/libxml/xmlexports.h
  include/libxml/xmlmemory.h include/libxml/xmlversion.h.in:
  fixing some comments to avoid warnings from apibuild.py
Daniel
2003-09-25 12:18:34 +00:00
Daniel Veillard
8e36e6a0be 2.6.0beta1 changes Fixing attribute normalization, might not be totally
* configure.in doc/* : 2.6.0beta1 changes
* SAX2.c hash.c parser.c parserInternals.c: Fixing attribute
  normalization, might not be totally fixed but this should
  make sure SAX event provide the right strings for attributes
  except entities for which libxml2 is different by default
  This should fix #109564
* result/attrib.xml.sax result/ent3.sax result/p3p.sax: minor changes
  in attribute callback values
* result/c14n/with-comments/example-4
  result/c14n/without-comments/example-4: this also fixes a subtle
  bug in the canonicalization tests.
Daniel
2003-09-10 10:50:59 +00:00
Daniel Veillard
e57ec790de Time to commit 3 days of work rewriting the parser internal,
fixing bugs and migrating to SAX2 interface by default. There
is some work letf TODO, like namespace validation and attributes
normalization (this break C14N right now)
* Makefile.am: fixed the test rules
* include/libxml/SAX2.h include/libxml/parser.h
  include/libxml/parserInternals.h SAX2.c parser.c
  parserInternals.c: changing the parser, migrating to SAX2,
  adding new interface to switch back to SAX1 or initialize a
  SAX block for v1 or v2. Most of the namespace work is done
  below SAX, as well as attribute defaulting
* globals.c: changed initialization of the default SAX handlers
* hash.c tree.c include/libxml/hash.h: added QName specific handling
* xmlIO.c: small fix
* xmllint.c testSAX.c: provide a --sax1 switch to test the old
  version code path
* result/p3p result/p3p.sax result/noent/p3p test/p3p: the new code
  pointed out a typo in a very old test namespace
Daniel
2003-09-10 10:50:59 +00:00
Daniel Veillard
6155d8aafa optimization when freeing hash tables. some tuning of buffer allocations
* dict.c hash.c: optimization when freeing hash tables.
* parser.c xmlIO.c include/libxml/tree.h: some tuning of buffer
  allocations
* parser.c parserInternals.c include/libxml/parser.h: keep a
  single allocated block for all the attributes callbacks,
  avoid useless malloc()/free()
* tree.c: do not realloc() when growing a buffer if the buffer
  ain't full, malloc/memcpy/free avoid copying memory.
Daniel
2003-08-19 15:01:28 +00:00
Daniel Veillard
01c13b5be2 code cleanup, especially the function comments. fixed a small bug when
* DOCBparser.c HTMLparser.c c14n.c debugXML.c encoding.c hash.c
  nanoftp.c nanohttp.c parser.c parserInternals.c testC14N.c
  testDocbook.c threads.c tree.c valid.c xmlIO.c xmllint.c xmlmemory.c
  xmlreader.c xmlregexp.c xmlschemas.c xmlschemastypes.c xpath.c:
  code cleanup, especially the function comments.
* tree.c: fixed a small bug when freeing nodes which are XInclude ones.
Daniel
2002-12-10 15:19:08 +00:00
Daniel Veillard
aeb258a9ca cosmetic cleanup started integrating a DTD validation layer based on the
* hash.c: cosmetic cleanup
* valid.c include/libxml/tree.h include/libxml/valid.h: started
  integrating a DTD validation layer based on the regexps
Daniel
2002-09-13 14:48:12 +00:00
Daniel Veillard
fdc9156a75 applied patch from Richard Jinks for the namespace axis + fixed a memory
* xpath.c: applied patch from Richard Jinks for the namespace
  axis + fixed a memory error.
* parser.c parserInternals.c: applied patches from Peter Jacobi
  removing ctxt->token for good.
* xmlschemas.c xmlschemastypes.c: fixed a few memory leaks
  popped out by the regression tests.
* Makefile.am: patch for threads makefile from Gary Pennington
Daniel
2002-07-01 21:52:03 +00:00
Daniel Veillard
153120c4b7 applied a patch from Peter Jacobi to solve a problem when compiling with
* hash.c: applied a patch from Peter Jacobi to solve a problem
  when compiling with the Watcom C on Win32
* result/schemas/*.err: the change of hashing algo generated
  permutations in the output
Daniel
2002-06-18 07:58:35 +00:00
Daniel Veillard
5f7f991ab7 applied patch from Sander Vesik improving the quality of the hash
* hash.c: applied patch from Sander Vesik improving the quality of
  the hash function.
Daniel
2002-06-17 17:03:00 +00:00
Daniel Veillard
34ce8bece2 preparing 2.4.18 updated and rebuilt the web site implement the new
* configure.in: preparing 2.4.18
* doc/*: updated and rebuilt the web site
* *.c libxml.h: implement the new IN_LIBXML scheme discussed with
  the Windows and Cygwin maintainers.
* parser.c: humm, changed the way the SAX parser work when
  xmlSubstituteEntitiesDefault(1) is set, it will then
  do the entity registration and loading by itself in case the
  user provided SAX getEntity() returns NULL.
* testSAX.c: added --noent to test the behaviour.
Daniel
2002-03-18 19:37:11 +00:00
Daniel Veillard
314cfa083c patch from Anthony Jones for hash.c allocation size trying to work around
* hash.c: patch from Anthony Jones for hash.c allocation size
* Makefile.am: trying to work around Yet Another Libtool Madness
  and build the 2.4.13 release finally ...
daniel
2002-01-14 17:58:01 +00:00
Daniel Veillard
cbaf399537 applied 42 documentation patches from Charlie Bozeman. Regenerated the
* *.c include/libxml/*.h doc/html/*: applied 42 documentation
  patches from Charlie Bozeman. Regenerated the HTML docs.
Daniel
2001-12-31 16:16:02 +00:00