1
0
mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2024-10-26 12:25:09 +03:00
Commit Graph

170 Commits

Author SHA1 Message Date
Nick Wellnhofer
ebb1797030 Remove unneeded #includes 2022-03-04 22:11:49 +01:00
Nick Wellnhofer
776d15d383 Don't check for standard C89 headers
Don't check for

- ctype.h
- errno.h
- float.h
- limits.h
- math.h
- signal.h
- stdarg.h
- stdlib.h
- string.h
- time.h

Stop including non-standard headers

- malloc.h
- strings.h
2022-03-02 00:43:54 +01:00
Nick Wellnhofer
2489c1d024 Remove useless __CYGWIN__ checks
From what I can tell, some really early Cygwin versions from around
1998-2000 used to erroneously define _WIN32. This was eventually fixed,
but these days, the `defined(_WIN32) && !defined(__CYGWIN__)` idiom is
unnecessary.

Now, we only check for __CYGWIN__ in xmlexports.h when deciding whether
to use __declspec.
2022-02-28 22:58:35 +01:00
Nick Wellnhofer
346c3a930c Remove elfgcchack.h
The same optimization can be enabled with -fno-semantic-interposition
since GCC 5. clang has always used this option by default.
2022-02-20 21:49:04 +01:00
Nick Wellnhofer
d7cb33cf44 Rework validation context flags
Use a bitmask instead of magic values to

- keep track whether the validation context is part of a parser context
- keep track whether xmlValidateDtdFinal was called

This allows to add addtional flags later.

Note that this deliberately changes the name of a public struct member,
assuming that this was always private data never to be used by client
code.
2022-02-20 21:49:04 +01:00
David King
328456bf29 Fix memory leak in xmlNewInputFromFile
Found by Coverity.

https://bugzilla.redhat.com/show_bug.cgi?id=1938806
2022-01-16 14:15:09 +01:00
Nick Wellnhofer
dcb80b92da Fix slow parsing of HTML with encoding errors
Under certain circumstances, the HTML parser would try to guess and
switch input encodings multiple times, leading to slow processing of
documents with encoding errors. The repeated scanning of the input
buffer when guessing encodings could even lead to quadratic behavior.

The code htmlCurrentChar probably assumed that if there's an encoding
handler, it is guaranteed to produce valid UTF-8. This holds true in
general, but if the detected encoding was "UTF-8", the UTF8ToUTF8
encoding handler simply invoked memcpy without checking for invalid
UTF-8. This still must be fixed, preferably by not using this handler
at all.

Also leave a note that switching encodings twice seems impossible to
implement correctly. Add a check when handling UTF-8 encoding errors
in htmlCurrentChar to avoid this situation, even if encoders produce
invalid UTF-8.

Found by OSS-Fuzz.
2021-02-20 21:28:56 +01:00
Nick Wellnhofer
438e595a8c Stop counting nbChars in parser context
The value was inaccurate and never used.
2020-08-09 15:01:45 +02:00
Nick Wellnhofer
20c60886e4 Fix typos
Resolves #133.
2020-03-08 17:41:53 +01:00
Jared Yanovich
2a350ee9b4 Large batch of typo fixes
Closes #109.
2019-09-30 18:04:38 +02:00
Nick Wellnhofer
3776cb4745 Fix memory leak in xmlSwitchInputEncodingInt error path
Found by OSS-Fuzz.
2018-11-22 16:28:46 +01:00
Nick Wellnhofer
7a1bd7f649 Revert "Change calls to xmlCharEncInput to set flush false"
This reverts commit 6e6ae5daa6 which
broke decoding of larger documents with ICU.

See https://bugs.chromium.org/p/chromium/issues/detail?id=820163
2018-03-17 00:03:24 +01:00
Joel Hockey
6e6ae5daa6 Change calls to xmlCharEncInput to set flush false when not final call. Having flush incorrectly set to true causes errors for ICU. 2018-01-08 19:57:53 +01:00
Nick Wellnhofer
cb5541c9f3 Fix libz and liblzma detection
If libz or liblzma are detected with pkg-config, AC_CHECK_HEADERS must
not be run because the correct CPPFLAGS aren't set. It is actually not
required have separate checks for LIBXML_ZLIB_ENABLED and HAVE_ZLIB_H.
Only check for LIBXML_ZLIB_ENABLED and remove HAVE_ZLIB_H macro.

Fixes bug 764657, bug 787041.
2017-11-27 14:33:37 +01:00
Nick Wellnhofer
e03f0a199a Fix hash callback signatures
Make sure that all parameters and return values of hash callback
functions exactly match the callback function type. This is required
to pass clang's Control Flow Integrity checks and to allow compilation
to asm.js with Emscripten.

Fixes bug 784861.
2017-11-09 16:42:47 +01:00
Nick Wellnhofer
e3890546d7 Fix the Windows header mess
Don't include windows.h and wsockcompat.h from config.h but only when
needed.

Don't define _WINSOCKAPI_ manually. This was apparently done to stop
windows.h from including winsock.h which is a problem if winsock2.h
wasn't included first. But on MinGW, this causes compiler warnings.
Define WIN32_LEAN_AND_MEAN instead which has the same effect.

Always use the compiler-defined _WIN32 macro instead of WIN32.
2017-10-09 14:35:40 +02:00
Nick Wellnhofer
69936b129f Revert "Print error messages for truncated UTF-8 sequences"
This reverts commit 79c8a6b which caused a serious regression in
streaming mode.

Also reverts part of commit 52ceced "Fix infinite loops with push
parser in recovery mode".

Fixes bug 786554.
2017-08-30 14:19:06 +02:00
Daniel Veillard
f19385a589 Fix a couple of misleading indentation errors
Raised by gcc as potential error, no semantic change needed but
fixed the indentation
2017-08-28 20:40:19 +02:00
Nick Wellnhofer
3aca7f31cb Fix unwanted warnings when switching encodings
Revert part of commit 46dc989 "Don't switch encoding for internal
parameter entities" that caused spurious warnings.

Fixes bug 786267.
2017-08-21 13:09:33 +02:00
Nick Wellnhofer
453dff1e3b Remove unnecessary calls to xmlPopInput
It's enough if xmlPopInput is called from xmlSkipBlankChars. Since the
replacement text of a parameter entity is surrounded with space
characters, that's the only place where the replacement can end in a
well-formed document.

This is also required to get rid of the "blanks wrapper" hack.
2017-06-20 13:19:47 +02:00
Nick Wellnhofer
aa267cd127 Simplify handling of parameter entity references
There are only two places where parameter entity references must be
handled. For the internal subset in xmlParseInternalSubset. For the
external subset or content from other external PEs in xmlSkipBlankChars.

Make sure that xmlSkipBlankChars skips over sequences of PEs and
whitespace. Rely on xmlSkipBlankChars instead of calling
xmlParsePEReference directly when in the external subset or a
conditional section.

xmlParserHandlePEReference is unused now.
2017-06-20 13:19:47 +02:00
Nick Wellnhofer
46dc989080 Don't switch encoding for internal parameter entities
This is only needed for external entities. Trying to switch the encoding
for internal entities could also cause a memory leak in recovery mode.
2017-06-17 13:23:40 +02:00
Nick Wellnhofer
79c8a6b105 Print error messages for truncated UTF-8 sequences
Before, truncated UTF-8 sequences at the end of a file were treated as
EOF. Create an error message containing the offending bytes.

xmlStringCurrentChar would also print characters from the input stream,
not the string it's working on.
2017-06-10 18:11:58 +02:00
Nick Wellnhofer
f9e7997e80 Reset parser input pointers on encoding failure
Call xmlBufResetInput before bailing out if switching the encoding
fails. Otherwise, the input pointers are left in an invalid state.
This would typically lead to an internal error in xmlGROW but could also
cause other unforeseen problems.
2017-06-10 17:50:27 +02:00
Nick Wellnhofer
45ce1ee399 Add TODO comment in xmlSwitchEncoding
It would be nice if we could recover from unsupported encodings in
external entities.
2017-06-10 17:32:44 +02:00
Nick Wellnhofer
0db8dc9ddc Stop parser on unsupported encodings
Otherwise, the push parser can loop infinitely in recover mode.

Found with libFuzzer.
2017-06-07 19:30:56 +02:00
Pranjal Jumde
0bcd05c5cd Heap-based buffer overread in htmlCurrentChar
For https://bugzilla.gnome.org/show_bug.cgi?id=758606

* parserInternals.c:
(xmlNextChar): Add an test to catch other issues on ctxt->input
corruption proactively.
For non-UTF-8 charsets, xmlNextChar() failed to check for the end
of the input buffer and would continuing reading.  Fix this by
pulling out the check for the end of the input buffer into common
code, and return if we reach the end of the input buffer
prematurely.
* result/HTML/758606.html: Added.
* result/HTML/758606.html.err: Added.
* result/HTML/758606.html.sax: Added.
* result/HTML/758606_2.html: Added.
* result/HTML/758606_2.html.err: Added.
* result/HTML/758606_2.html.sax: Added.
* test/HTML/758606.html: Added test case.
* test/HTML/758606_2.html: Added test case.
2016-05-23 15:01:07 +08:00
David Kilzer
4472c3a5a5 Fix some format string warnings with possible format string vulnerability
For https://bugzilla.gnome.org/show_bug.cgi?id=761029

Decorate every method in libxml2 with the appropriate
LIBXML_ATTR_FORMAT(fmt,args) macro and add some cleanups
following the reports.
2016-05-23 15:01:07 +08:00
David Kilzer
d433ea6c83 Integer signed/unsigned type mismatch in xmlParserInputGrow()
For https://bugzilla.gnome.org/show_bug.cgi?id=766635

* parserInternals.c:
(xmlParserInputGrow): Change 'ret' type to 'int' to match the
return type of xmlParserInputBufferGrow().
2016-05-22 09:49:50 +08:00
Daniel Veillard
fdfeecc1b7 Bug on creating new stream from entity
sometimes the entity could have a lenght of 0, i.e. it wasn't
parsed or used yet, and we ended up with an incoherent input state
2015-11-20 15:07:38 +08:00
Daniel Veillard
afd27c21f6 Avoid processing entities after encoding conversion failures
For https://bugzilla.gnome.org/show_bug.cgi?id=756527
and was also raised by Chromium team in the past

When we hit a convwersion failure when switching encoding
it is bestter to stop parsing there, this was treated as a
fatal error but the parser was continuing to process to extract
more errors, unfortunately that makes little sense as the data
is obviously corrupt and can potentially lead to unexpected behaviour.
2015-11-09 18:07:18 +08:00
Daniel Veillard
c35af8b18d Fixes for xmlInitParserCtxt
let's make sure that parser options are updated too when a corrsponding
global variable or other field of the context is set.
2014-06-11 17:00:39 +08:00
Daniel Veillard
ff76eb28c7 Clear up a potential NULL dereference
https://bugzilla.gnome.org/show_bug.cgi?id=705399

if ctxt->node_seq.buffer is null then ctxt->node_seq.maximum ought
to be zero but it's better to clarify the check in the code directly.
2013-08-03 22:25:13 +08:00
Daniel Veillard
23f05e0c33 Detect excessive entities expansion upon replacement
If entities expansion in the XML parser is asked for,
it is possble to craft relatively small input document leading
to excessive on-the-fly content generation.
This patch accounts for those replacement and stop parsing
after a given threshold. it can be bypassed as usual with the
HUGE parser option.
2013-02-19 10:21:49 +08:00
Daniel Veillard
bf058dce13 Fix the flushing out of raw buffers on encoding conversions
https://bugzilla.gnome.org/show_bug.cgi?id=692915

the new set of converting functions tried to limit the encoding
conversion of the raw buffer to the consumption one to work in
a more progressive fashion. Unfortunately this was bad for
performances and led to errors on progressive parsing when
a very large chunk was close to the end of the document. Fix
the new internal function and switch back to the old way of
converting. Fix another bug in the process.
2013-02-13 18:19:42 +08:00
Michael Wood
fb27e2cd20 Fix spelling of "length". 2012-10-30 10:18:49 +08:00
Daniel Veillard
f8e3db0445 Big space and tab cleanup
Remove all space before tabs and space and tabs at end of lines.
2012-09-11 13:26:36 +08:00
Daniel Veillard
52d8ade7a7 Introduce some default parser limits
Those can be overrided by the XML_PARSE_HUGE option, they
are just default limits for Name lenght, dictionary size limits
and maximum amount of parser lookup.
* include/libxml/parserInternals.h: define the limits
* include/libxml/xmlerror.h: add a new error
* parser.c parserInternals.c: implements the new limits
2012-07-30 10:08:45 +08:00
Daniel Veillard
61551a1eb7 Cleanup function xmlBufResetInput() to set input from Buffer
This was scattered in a number of modules, xmlParserInputPtr
have usually their base, cur and end pointer set from an
xmlBuf used as input.
* buf.c buf.h: add a new function implementing this setup
* parser.c HTMLparser.c catalog.c parserInternals.c xmlreader.c
  use the new function instead of digging into the buffer in
  all those modules
2012-07-23 14:24:27 +08:00
Daniel Veillard
768eb3b82d Convert XML parser to the new input buffers
The main changes are when the internal of the buffers structure
were adressed directly, we now use routines coming from buf.h
The routine xmlParserInputRead() which wasn't used anywhere is
deprecated too.
2012-07-23 14:24:26 +08:00
Daniel Veillard
0d51cfebc9 Fix a race in xmlNewInputStream
For https://bugzilla.gnome.org/show_bug.cgi?id=643148
Reported by Bill Clarke <llib@computer.org>, it used a global variable
as a counter for the input id and this was not thread safe. To avoid the
race without adding unneeded locking in the parser path, move the id to
the parser context instead.
2012-05-15 11:18:40 +08:00
Eugene Pimenov
615904f582 Switch the HTML parser to be non-recursive
* HTMLparser.c: new htmlParseElementInternal non recursive, with
  htmlParseContentInternal and new function to handle node info
  and element end.
* include/libxml/parser.h: add new stack for element info in parser
  context
* parserInternals.c: fee element info stack
2010-03-15 15:16:02 +01:00
Daniel Veillard
7e385bd4e2 566012 autodetected encoding and encoding conflict
* encoding.c parser.c parserInternals.c: when we autodetect an encoding
  but it's actually not completely compatible with the one declared
  great care must be taken to not convert more than just the first line.
  Led to some refactoring, more private functions and a bit of cleanup.
2009-08-26 11:38:49 +02:00
Daniel Veillard
33c76c8312 Fix end of buffer char being split in XML parser
* parserInternals.c: similar patch to previous, reset cur on GROW
  in xmlNextChar and xmlCurrentChar
2009-08-25 11:30:34 +02:00
Nick Wellnhofer
2f522dc68f Fix xmlKeepBlanksDefault to not break indent
* parserInternals.c: the old compatibility function xmlKeepBlanksDefault()
  should not reset xmlIndentTreeOutput to 0 because the default is 1
2009-08-20 12:11:17 +02:00
Daniel Veillard
4bf899bf1b fix for CVE-2008-3281 Daniel
* include/libxml/parser.h include/libxml/entities.h entities.c
  parserInternals.c parser.c: fix for CVE-2008-3281
Daniel

svn path=/trunk/; revision=3772
2008-08-20 17:04:30 +00:00
Daniel Veillard
87303e3c7c applied patch from Ashwin to avoid a potential double-free Daniel
* parserInternals.c: applied patch from Ashwin to avoid a potential
  double-free
Daniel

svn path=/trunk/; revision=3741
2008-04-28 18:07:29 +00:00
Daniel Veillard
b3edafd72d avoid a warning on 64bits introduced earlier make more checking on the
* parser.c: avoid a warning on 64bits introduced earlier
* parserInternals.c: make more checking on the UTF-8 input
Daniel

svn path=/trunk/; revision=3676
2008-01-11 08:00:57 +00:00
Daniel Veillard
5addfebd06 applied patch from Marius Konitzer to avoid leaking in
* parserInternals.c: applied patch from Marius Konitzer to avoid
  leaking in xmlNewInputFromFile() in case of HTTP redirection
Daniel
2006-10-17 20:32:22 +00:00
Daniel Veillard
30e7607b7a a bunch of small cleanups based on coverity reports. Daniel
* HTMLparser.c parser.c parserInternals.c pattern.c uri.c: a bunch
  of small cleanups based on coverity reports.
Daniel
2006-03-09 14:13:55 +00:00