mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2024-10-26 12:25:09 +03:00

144 Commits

Author SHA1 Message Date
Nick Wellnhofer
c34d0ae9cc html: Deprecate htmlIsBooleanAttr 2024-10-06 20:04:00 +02:00
Nick Wellnhofer
a530ff125d io: Always consume encoding handler when creating output buffers
Also free encoding handler in error case.

Remove xmlAllocOutputBufferInternal which was identical to
xmlAllocOutputBuffer.
2024-07-29 14:25:39 +02:00
Nick Wellnhofer
a221cd7849 buf: Rework xmlBuf code
Always use what the old implementation called the "IO" allocation
scheme, which allows the content pointer to move past the start of the
initial allocation. This is inexpensive and allows efficient shrinking.

Optimize xmlBufGrow, reusing shrunken memory as much as possible.

Simplify xmlBufAdd.

Make xmlBufBackToBuffer return an error on overflow.

Make "size" exclude the terminating NULL byte.

Always provide an initial size.

Reintroduce static buffers.

Remove xmlBufResize and several other functions.
2024-07-16 17:42:10 +02:00
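The allocation scheme described in this commit can be illustrated with a small sketch. It is a hypothetical, self-contained example, not libxml2's xmlBuf: the type `SketchBuf` and the functions `sketch_init`, `sketch_shrink`, `sketch_grow`, `sketch_add` and `sketch_free` are invented here. It shows how letting the content pointer move past the start of the allocation turns shrinking into a pointer bump, and how a later grow can reclaim that space with a memmove before falling back to a (doubling) realloc.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical buffer (not libxml2's xmlBuf). Invariant:
 *   (content - mem) + size + 1 == bytes allocated
 * size excludes the terminating NUL byte. */
typedef struct {
    char  *mem;     /* start of the allocation */
    char  *content; /* start of live data, mem <= content */
    size_t use;     /* bytes of live data */
    size_t size;    /* usable bytes starting at content */
} SketchBuf;

static int sketch_init(SketchBuf *b, size_t initial) {
    b->mem = malloc(initial + 1);
    if (b->mem == NULL)
        return -1;
    b->content = b->mem;
    b->use = 0;
    b->size = initial;
    b->content[0] = '\0';
    return 0;
}

/* Drop len bytes from the front in O(1) by advancing content. */
static void sketch_shrink(SketchBuf *b, size_t len) {
    if (len > b->use)
        len = b->use;
    b->content += len;
    b->use -= len;
    b->size -= len;
}

/* Ensure room for len more bytes: first reclaim the space freed by
 * earlier shrinks, and only realloc (doubling) if that is not enough. */
static int sketch_grow(SketchBuf *b, size_t len) {
    size_t start = (size_t)(b->content - b->mem);
    size_t newsize;
    char *mem;

    if (b->size - b->use >= len)
        return 0;
    if (start > 0) {
        /* Move live data plus its NUL back to the start of the allocation. */
        memmove(b->mem, b->content, b->use + 1);
        b->content = b->mem;
        b->size += start;
        if (b->size - b->use >= len)
            return 0;
    }
    /* Guard the doubling arithmetic below against overflow. */
    if (b->size > SIZE_MAX / 4 || len > SIZE_MAX / 4 - b->use)
        return -1;
    newsize = b->size * 2;
    if (newsize < b->use + len)
        newsize = b->use + len;
    mem = realloc(b->mem, newsize + 1);
    if (mem == NULL)
        return -1;
    b->mem = b->content = mem;
    b->size = newsize;
    return 0;
}

static int sketch_add(SketchBuf *b, const char *str, size_t len) {
    if (sketch_grow(b, len) < 0)
        return -1;
    memcpy(b->content + b->use, str, len);
    b->use += len;
    b->content[b->use] = '\0';
    assert(b->use <= b->size);
    return 0;
}

static void sketch_free(SketchBuf *b) {
    free(b->mem);
    b->mem = b->content = NULL;
    b->use = b->size = 0;
}
```

Doubling on realloc keeps a long series of appends amortized linear, while the memmove path reuses memory released by earlier shrinks without touching the allocator.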
Nick Wellnhofer
598ee0d2c6 error: Remove underscores from xmlRaiseError 2024-06-27 14:43:10 +02:00
Nick Wellnhofer
5b893fa999 encoding: Fix encoding lookup with xmlOpenCharEncodingHandler
Make xmlOpenCharEncodingHandler call xmlParseCharEncoding first so we
prefer our own handlers for names like "UTF8". Only UTF-16 needs an
exception.

Make callers check the return value. For UTF-8, a NULL encoding doesn't
mean an error.

Remove unnecessary UTF-8 check from htmlFindOutputEncoder. Don't try to
look up ASCII handler since the HTML handler is always available.

Fix return code of xmlParseCharEncoding.

Should fix #744.
2024-06-22 21:59:03 +02:00
Nick Wellnhofer
72e9267c32 html: Fix memory leak after malloc failure 2024-05-06 17:40:15 +02:00
Nick Wellnhofer
10c4ed1f2d html: Fix quadratic behavior in htmlNodeDump
Use an efficient buffer allocation scheme.
2024-03-15 19:47:08 +01:00
Nick Wellnhofer
3494aa4fd5 save: Cast return code of xmlBufNodeDump
Avoid implicit sign change.
2024-03-15 19:47:08 +01:00
Nick Wellnhofer
1d392fabb9 save: Check for output buffer errors
Report more error conditions.
2024-03-15 19:47:08 +01:00
Nick Wellnhofer
e314109ad1 save: Don't write directly to internal buffer
Make sure that OOM errors are reported.
2024-02-16 16:14:05 +01:00
Nick Wellnhofer
0821efc8ee encoding: Check whether encoding handlers support input/output
The "HTML" encoding handler doesn't support input, which could lead to a
wrong error report.
2024-01-02 19:48:23 +01:00
Nick Wellnhofer
bc1e030664 save: Improve error handling
Handle malloc failure from xmlRaiseError.

Use xmlRaiseMemoryError.

Stop using xmlGenericError.

Remove argument from memory error handler.

Remove TODO macro.
2023-12-21 15:02:24 +01:00
Nick Wellnhofer
abd74186f9 html: Report malloc failures
Fix many places where malloc failures aren't reported.

Stop checking for ctxt->instate.
2023-12-11 22:13:06 +01:00
Nick Wellnhofer
9b5cce7a71 include: Remove more unnecessary includes 2023-09-21 01:50:53 +02:00
Nick Wellnhofer
699299cae3 globals: Stop including globals.h 2023-09-20 22:07:40 +02:00
Nick Wellnhofer
76d6b0d768 html: Don't escape ASCII chars in href attributes
In several cases, href attributes can contain ASCII characters which are
illegal in URIs. Escaping them often does more harm than good.

Fixes #321.
2022-11-20 21:16:03 +01:00
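The escaping policy described in this commit can be illustrated with a small sketch. It is a hypothetical simplification, not libxml2's code: `escape_href` is an invented name, and the assumption that space, ASCII control characters and non-ASCII bytes are still percent-encoded is ours. Every other printable ASCII character, including ones that are illegal in URIs such as `{`, `}`, `<` and `>`, passes through unchanged.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical href escaper: percent-encode space, control characters
 * (0x00-0x1F, 0x7F) and bytes >= 0x80; copy every other ASCII byte as-is.
 * Returns a newly allocated string, or NULL on failure. */
static char *escape_href(const char *in) {
    static const char hex[] = "0123456789ABCDEF";
    size_t len = strlen(in);
    char *out;
    char *p;

    if (len > (SIZE_MAX - 1) / 3)
        return NULL; /* worst-case size would overflow */
    out = malloc(len * 3 + 1); /* worst case: every byte escaped */
    if (out == NULL)
        return NULL;
    p = out;
    for (; *in != '\0'; in++) {
        unsigned char c = (unsigned char)*in;
        if (c <= 0x20 || c >= 0x7F) {
            *p++ = '%';
            *p++ = hex[c >> 4];
            *p++ = hex[c & 0x0F];
        } else {
            *p++ = (char)c;
        }
    }
    *p = '\0';
    assert((size_t)(p - out) <= len * 3);
    return out;
}
```

Leaving printable ASCII alone means template syntax and server-side-include markers in href values survive serialization untouched.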
Nick Wellnhofer
ad338ca737 Remove explicit integer casts
Remove explicit integer casts as final operation

- in assignments
- when passing arguments
- when returning values

Remove casts

- to the same type
- from certain range-bound values

The main motivation is that these explicit casts don't change the result
of operations and only render UBSan's implicit-conversion checks
useless. Removing these casts allows UBSan to detect cases where
truncation or sign-changes occur unexpectedly.

Document some explicit casts as truncating and add a few missing ones.
2022-09-01 02:33:57 +02:00
Nick Wellnhofer
0f568c0b73 Consolidate private header files
Private functions were previously declared

- in header files in the root directory
- in public headers guarded with IN_LIBXML
- in libxml.h
- redundantly in source files that used them.

Consolidate all private header files in include/private.
2022-08-26 02:11:56 +02:00
David Kilzer
054e46b097 Restore behavior of htmlDocContentDumpFormatOutput()
Patch by J Pascoe of Apple.

* HTMLtree.c:
(htmlDocContentDumpFormatOutput):
- Prior to commit b79ab6e6d9, xmlDoc.type was set to
  XML_HTML_DOCUMENT_NODE before dumping the HTML output, then
  restored before returning.
2022-05-14 08:56:47 -07:00
David Kilzer
21561e833a Mark more static data as const
Similar to 8f5710379, mark more static data structures with
`const` keyword.

Also fix placement of `const` in encoding.c.

Original patch by Sarah Wilkin.
2022-04-07 12:01:23 -07:00
Nick Wellnhofer
776d15d383 Don't check for standard C89 headers
Don't check for

- ctype.h
- errno.h
- float.h
- limits.h
- math.h
- signal.h
- stdarg.h
- stdlib.h
- string.h
- time.h

Stop including non-standard headers

- malloc.h
- strings.h
2022-03-02 00:43:54 +01:00
Nick Wellnhofer
346c3a930c Remove elfgcchack.h
The same optimization can be enabled with -fno-semantic-interposition
since GCC 5. clang has always used this option by default.
2022-02-20 21:49:04 +01:00
Nick Wellnhofer
92d9ab4c28 Fix whitespace when serializing empty HTML documents
The old, non-recursive HTML serialization code would always terminate
the output with a newline. The new implementation omitted the newline
if the document node had no children. Re-add the newline when
serializing empty documents.

Fixes #266.
2021-06-07 15:09:53 +02:00
Nick Wellnhofer
85b1792e37 Work around lxml API abuse
Make xmlNodeDumpOutput and htmlNodeDumpFormatOutput work with corrupted
parent pointers. This used to work with the old recursive code but the
non-recursive rewrite required parent pointers to be set correctly.

Unfortunately, lxml relies on the old behavior and passes subtrees with
a corrupted structure. Fall back to a recursive function call if an
invalid parent pointer is detected.

Fixes #255.
2021-05-21 12:19:25 +02:00
Nick Wellnhofer
e6495e4789 Remove unused encoding parameter of HTML output functions
The encoding string is unused. Encodings are set by way of the output
buffer.
2021-02-07 14:39:55 +01:00
Nick Wellnhofer
0b3c64d9f2 Handle dumps of corrupted documents more gracefully
Check parent pointers for NULL after the non-recursive rewrite of the
serialization code. This avoids segfaults with corrupted documents
which can apparently be seen with lxml, see issue #187.
2020-09-29 18:08:37 +02:00
Nick Wellnhofer
c1ba6f54d3 Revert "Do not URI escape in server side includes"
This reverts commit 960f0e2756.

This commit introduced

- an infinite loop, found by OSS-Fuzz, which could be easily fixed.
- an algorithm with quadratic runtime
- a security issue, see
  https://bugzilla.gnome.org/show_bug.cgi?id=769760

A better approach is to add an option not to escape URLs at all,
which libxml2 should possibly have done in the first place.
2020-08-15 18:32:29 +02:00
Nick Wellnhofer
b79ab6e6d9 Make htmlNodeDumpFormatOutput non-recursive
Fixes stack overflow with deeply nested HTML documents.

Found by OSS-Fuzz.
2020-07-28 03:44:30 +02:00
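The idea behind a non-recursive serializer can be sketched as follows. This is a hypothetical illustration under stated assumptions: the `Node` type and the `append_tag` and `dump_tree` functions are invented here and are far simpler than libxml2's serializer, which also handles text, attributes, formatting and escaping. The walk keeps its position in a single `cur` pointer instead of on the call stack (descend via `children`, step via `next`, climb via `parent`), so nesting depth no longer consumes stack space. It also stops cleanly if it meets a NULL parent pointer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical minimal node type standing in for xmlNode. */
typedef struct Node {
    const char *name;
    struct Node *parent;
    struct Node *children; /* first child */
    struct Node *next;     /* next sibling */
} Node;

/* Append "<name>" or "</name>". Once the output is full, further writes
 * are dropped; snprintf keeps out NUL-terminated. */
static void append_tag(char *out, size_t cap, size_t *pos, int closing,
                       const char *name) {
    int n;
    if (*pos >= cap)
        return;
    n = snprintf(out + *pos, cap - *pos, closing ? "</%s>" : "<%s>", name);
    if (n > 0)
        *pos += (size_t)n;
}

/* Serialize root's subtree without recursion, never walking past root. */
static void dump_tree(const Node *root, char *out, size_t cap) {
    const Node *cur = root;
    size_t pos = 0;

    assert(out != NULL && cap > 0);
    out[0] = '\0';
    while (cur != NULL) {
        append_tag(out, cap, &pos, 0, cur->name);
        if (cur->children != NULL) {
            cur = cur->children; /* descend */
            continue;
        }
        /* Leaf: close it, then climb until a node has a next sibling. */
        for (;;) {
            append_tag(out, cap, &pos, 1, cur->name);
            if (cur == root)
                return;
            if (cur->next != NULL) {
                cur = cur->next;
                break;
            }
            cur = cur->parent;
            if (cur == NULL)
                return; /* corrupted tree: stop instead of crashing */
        }
    }
}
```

For a tree `a(b(d), c)`, `dump_tree` produces `<a><b><d></d></b><c></c></a>`, and starting at `b` serializes only that subtree.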
Nick Wellnhofer
20c60886e4 Fix typos
Resolves #133.
2020-03-08 17:41:53 +01:00
Jared Yanovich
2a350ee9b4 Large batch of typo fixes
Closes #109.
2019-09-30 18:04:38 +02:00
Nick Wellnhofer
d459831c1b Fix HTML serialization with UTF-8 encoding
If the encoding is specified as UTF-8, make sure to use a NULL encoding
handler.
2018-10-13 16:47:13 +02:00
Nick Wellnhofer
ee501f5449 Stop using doc->charset outside parser code
doc->charset does not specify the in-memory encoding which is always
UTF-8.
2018-10-13 16:47:01 +02:00
Shaun McCance
7607d9dd45 Allow HTML serializer to output HTML5 DOCTYPE
For https://bugzilla.gnome.org/show_bug.cgi?id=747301

Use simple HTML5 DOCTYPE for about:legacy-compat

HTML5 uses a DOCTYPE without a PUBLIC or SYSTEM identifier. It looks
like this:

<!DOCTYPE html>

I can't use XSLT to output this, because to get a DOCTYPE I have to
provide a PUBLIC or SYSTEM identifier. Luckily, the standards folks
recognized this and provided this semantically equivalent form for the
HTML DOCTYPE:

<!DOCTYPE html SYSTEM "about:legacy-compat">

But people don't like seeing the "legacy" identifier in their output.
They'd rather see the shiny new DOCTYPE. Since we know that
about:legacy-compat is defined by the W3C to be semantically equivalent
to the sans-SYSTEM DOCTYPE, we could just special-case it in the HTML
serializer in libxml2. So if you set the SYSTEM identifier to
"about:legacy-compat", you get an HTML5 short-form DOCTYPE.
2015-04-03 22:52:36 +08:00
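The special case proposed here can be sketched as a small function. It is a hypothetical, self-contained illustration: `write_doctype` is an invented name, identifiers are always wrapped in double quotes, and the assumption that the short form is used only when no PUBLIC identifier is present is ours rather than something the commit message states.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical DOCTYPE writer (not libxml2's API). Writes into out
 * (capacity cap) and returns snprintf's result. */
static int write_doctype(char *out, size_t cap, const char *name,
                         const char *public_id, const char *system_id) {
    assert(out != NULL && cap > 0 && name != NULL);
    if (public_id != NULL) {
        if (system_id != NULL)
            return snprintf(out, cap, "<!DOCTYPE %s PUBLIC \"%s\" \"%s\">",
                            name, public_id, system_id);
        return snprintf(out, cap, "<!DOCTYPE %s PUBLIC \"%s\">",
                        name, public_id);
    }
    /* about:legacy-compat is equivalent to no identifier: short form. */
    if (system_id != NULL && strcmp(system_id, "about:legacy-compat") != 0)
        return snprintf(out, cap, "<!DOCTYPE %s SYSTEM \"%s\">",
                        name, system_id);
    return snprintf(out, cap, "<!DOCTYPE %s>", name);
}
```

With `name = "html"` and `system_id = "about:legacy-compat"`, this writes `<!DOCTYPE html>`, which is what an XSLT stylesheet setting that SYSTEM identifier would then get.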
Romain Bondue
960f0e2756 Do not URI escape in server side includes 2013-04-23 20:44:55 +08:00
Daniel Veillard
f8e3db0445 Big space and tab cleanup
Remove all spaces before tabs, and all spaces and tabs at the end of lines.
2012-09-11 13:26:36 +08:00
Daniel Veillard
7d4c529a33 Improve HTML escaping of attribute on output
Handle the special case of &{...} constructs, as hinted in the spec
  http://www.w3.org/TR/html401/appendix/notes.html#h-B.7.1
and of special values such as comments <!-- ... --> used for server-side includes.
This is limited to attribute values in HTML content.
2012-09-05 12:11:43 +08:00
Daniel Veillard
7b9b07198f Convert the HTML tree module to the new buffers
The new input buffers required a couple of changes; the others
are related to the switch to xmlBuf in the saving routines.
2012-07-23 14:24:27 +08:00
Daniel Veillard
39d027cdb7 Fix html serialization error and htmlSetMetaEncoding()
For https://bugzilla.gnome.org/show_bug.cgi?id=630682
The Python tests were reporting errors. Some were due to
a small change in case encoding, but the main one was that
htmlSetMetaEncoding(doc, NULL) was broken because it no longer
removed the associated meta tag.
2012-05-11 12:38:23 +08:00
Daniel Veillard
c62efc847c Add options to ignore the internal encoding
For both XML and HTML, the document can provide an encoding
either in XMLDecl in XML, or as a meta element in HTML head.
This adds options to ignore those encodings if the encoding
is known in advance, for example if the content had been converted
before being passed to the parser.

* parser.c include/libxml/parser.h: add XML_PARSE_IGNORE_ENC option
  for XML parsing
* include/libxml/HTMLparser.h HTMLparser.c: adds the
  HTML_PARSE_IGNORE_ENC for HTML parsing
* HTMLtree.c: fix the handling of saving when an unknown encoding is
  defined in meta document header
* xmllint.c: add a --noenc option to activate the new parser options
2011-05-26 11:47:37 +08:00
Daniel Veillard
8d7c1b7ab2 582913 Fix htmlSetMetaEncoding() to be nicer
* HTMLtree.c: htmlSetMetaEncoding should not destroy existing meta
  encoding elements, plus it should not change things at all if the
  encoding is the same. Also fixed htmlSaveFileFormat() to ask for
  change if outputting to UTF-8.
2009-08-12 23:03:23 +02:00
Daniel Veillard
74eb54b5b7 575875 don't output charset=html
* HTMLtree.c: don't output charset=html in htmlSetMetaEncoding()
  as this is clearly a libxml2-only thing used for import only
2009-08-12 15:59:01 +02:00
Daniel Veillard
da3fee406d Borland C fix from Moritz Both regenerate, workaround a problem for buffer
* trionan.c: Borland C fix from Moritz Both
* testapi.c: regenerate, workaround a problem for buffer testing
* xmlIO.c HTMLtree.c: new internal entry point to hide even better
  xmlAllocOutputBufferInternal
* tree.c: harden the code around buffer allocation schemes
* parser.c: restore the warning when namespace names are not absolute
  URIs
* runxmlconf.c: continue regression tests if we get the expected
  number of errors
* Makefile.am: run the python tests on make check
* xmlsave.c: handle the HTML documents and trees
* python/libxml.c: convert python serialization to the xmlSave APIs
  and avoid some horrible hacks
Daniel

svn path=/trunk/; revision=3790
2008-09-01 13:08:57 +00:00
Daniel Veillard
fcd02adb71 htmlNodeDumpFormatOutput didn't handle XML_ATTRIBUTE_NODE fixes bug
* HTMLtree.c: htmlNodeDumpFormatOutput didn't handle XML_ATTRIBUTE_NODE,
  fixes bug #438390
Daniel

svn path=/trunk/; revision=3631
2007-06-12 09:49:40 +00:00
Rob Richards
417b74d0b1 Add linefeeds to error messages allowing for consistent handling.
* HTMLtree.c xmlsave.c: Add linefeeds to error messages allowing
  for consistent handling.
2006-08-15 23:14:24 +00:00
Rob Richards
77b92ff6a8 fix bug #322136 in xmlNodeBufGetContent when entity ref is a child of an
* tree.c: fix bug #322136 in xmlNodeBufGetContent when entity ref is
  a child of an element (fix by Oleksandr Kononenko).
* HTMLtree.c include/libxml/HTMLtree.h: Add htmlDocDumpMemoryFormat.
2005-12-20 15:55:14 +00:00
Daniel Veillard
b8c8016044 fixed bug #310333 with a patch close to the provided patch for HTML UTF-8
* HTMLtree.c: fixed bug #310333 with a patch close to the provided
  patch for HTML UTF-8 serialization
* result/HTML/script2.html: this changed the output of that test
Daniel
2005-08-08 13:46:45 +00:00
Daniel Veillard
5d4644ef6e revamped the elfgcchack.h format to cope with gcc4 change of aliasing
* doc/apibuild.py doc/elfgcchack.xsl: revamped the elfgcchack.h
  format to cope with gcc4 change of aliasing allowed scopes, had
  to add extra information to doc/libxml2-api.xml to separate
  the header from the c module source.
* *.c: updated all c library files to add a #define bottom_xxx
  and reimport elfgcchack.h thereafter, and a bit of cleanups.
* doc//* testapi.c: regenerated when rebuilding the API
Daniel
2005-04-01 13:11:58 +00:00
Daniel Veillard
aa9a983dbd fixing bug 168196, <a name=""> must be URI escaped too Daniel
* HTMLtree.c: fixing bug 168196, <a name=""> must be URI escaped too
Daniel
2005-03-29 20:30:17 +00:00
Daniel Veillard
d5cc0f7f51 augmented types supported a number of new bug fixes and documentation
* gentest.py testapi.c: augmented types supported
* HTMLtree.c tree.c xmlreader.c xmlwriter.c: a number of new
  bug fixes and documentation updates.
Daniel
2004-11-06 19:24:28 +00:00
Daniel Veillard
ce244ad595 fixed the way the generator works, extended the testing, especially with
* gentest.py testapi.c: fixed the way the generator works,
  extended the testing, especially with more real trees and nodes.
* HTMLtree.c tree.c valid.c xinclude.c xmlIO.c xmlsave.c: a bunch
  of real problems found and fixed.
* entities.c: fix error reporting to go through the new handlers
Daniel
2004-11-05 10:03:46 +00:00