mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2024-10-26 12:25:09 +03:00
Commit Graph

1260 Commits

Author SHA1 Message Date
Nick Wellnhofer
0bc4608c50 html: Use hash table to check for duplicate attributes 2024-10-06 20:04:00 +02:00
Nick Wellnhofer
c32397d51f html: Improve character class macros 2024-10-06 20:04:00 +02:00
Nick Wellnhofer
c34d0ae9cc html: Deprecate htmlIsBooleanAttr 2024-10-06 20:04:00 +02:00
Nick Wellnhofer
6040785ac4 html: Deprecate AutoClose API 2024-10-06 20:04:00 +02:00
Nick Wellnhofer
188cad68a4 html: Remove obsolete content model 2024-10-06 20:04:00 +02:00
Nick Wellnhofer
462bf0b7a5 html: Rework options
Introduce htmlCtxtSetOptions, see similar changes made to XML parser.

Add HTML_PARSE_HUGE alias. Support HTML_PARSE_BIG_LINES.
2024-10-06 20:04:00 +02:00
Nick Wellnhofer
e062a4a9b3 html: Add HTML5 parser option
This option passes tokenizer output directly to the SAX callbacks,
making it possible to test the tokenizer against the html5lib test
suite.

This will produce unbalanced calls to the startElement and endElement
callbacks, but it's the only way to support a SAX like interface for
HTML5. It can be used for filtering or rewriting HTML5, for example.

A HTML5 tree builder could then be implemented on top of the SAX
callbacks.
2024-10-06 18:13:05 +02:00
Nick Wellnhofer
f9ed30e972 html: HTML5 character data states 2024-10-06 18:13:05 +02:00
Nick Wellnhofer
b1c5aa6544 xpath: Deprecate xmlXPathNAN and xmlXPath*INF
Users should simply use the C99 macros.
2024-09-19 12:50:59 +02:00
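As a rough illustration of what "use the C99 macros" can look like in calling code, the sketch below relies only on `NAN`, `INFINITY`, `isnan` and `isinf` from `<math.h>`. The helper names (`xpath_nan`, `xpath_pos_inf`, `xpath_neg_inf`, `classify_number`) are hypothetical and not part of libxml2; the mapping in the comments assumes the deprecated globals are xmlXPathNAN, xmlXPathPINF and xmlXPathNINF.

```c
#include <math.h>

/* Hypothetical helpers, not part of libxml2: each returns the C99 value
 * that can stand in for the deprecated global named in the comment. */
static double xpath_nan(void)     { return NAN; }        /* was xmlXPathNAN  */
static double xpath_pos_inf(void) { return INFINITY; }   /* was xmlXPathPINF */
static double xpath_neg_inf(void) { return -INFINITY; }  /* was xmlXPathNINF */

/* Classify a double using only C99 facilities:
 * 0 = finite, 1 = +Infinity, -1 = -Infinity, 2 = NaN. */
static int classify_number(double d) {
    if (isnan(d))
        return 2;
    if (isinf(d))
        return d > 0 ? 1 : -1;
    return 0;
}
```

Note that NaN never compares equal to anything, including itself, so code should test with `isnan` rather than `==`.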
Nick Wellnhofer
c46b89e243 xpath: Deprecate xmlXPathEvalExpr
Also check the argument instead of crashing if there's no context.
2024-09-13 21:06:36 +02:00
Nick Wellnhofer
de10d4cd5f include: Check whether _MSC_VER is defined
Should fix #795.
2024-09-04 16:32:22 +02:00
makise-homura
a3043b478f threads: define _WIN32_WINNT as 0x0600 to use InitOnceExecuteOnce() 2024-08-16 22:26:07 +03:00
Nick Wellnhofer
a530ff125d io: Always consume encoding handler when creating output buffers
Also free encoding handler in error case.

Remove xmlAllocOutputBufferInternal which was identical to
xmlAllocOutputBuffer.
2024-07-29 14:25:39 +02:00
Nick Wellnhofer
aa6ca0b1d3 module: Deprecate module API
This was only used by libxslt which switched to a private
implementation.
2024-07-23 19:57:32 +02:00
Nick Wellnhofer
6a3c0b0d93 parser: Increase XML_MAX_DICTIONARY_LIMIT
This limit is somewhat arbitrary and can be reached when fuzzing
documents up to 1 MB.

Increase limit to 100 MB and disable limit if XML_PARSE_HUGE is set.
2024-07-22 12:53:00 +02:00
Nick Wellnhofer
4e93425a7f threads: Prefer Win32 over pthreads 2024-07-16 20:03:01 +02:00
Nick Wellnhofer
769e5a4a42 threads: Allocate global RMutexes statically
Avoid memory allocations during initialization.
2024-07-16 17:42:10 +02:00
Nick Wellnhofer
5d36664fc9 memory: Deprecate xmlGcMemSetup 2024-07-16 17:42:10 +02:00
Nick Wellnhofer
79e119954c error: Make xmlLastError const 2024-07-16 17:42:10 +02:00
Nick Wellnhofer
eb66d03ef7 io: Deprecate a few functions 2024-07-16 17:42:10 +02:00
Nick Wellnhofer
a6f54f055b io: Fine-tune initial IO buffer size 2024-07-16 17:42:10 +02:00
Nick Wellnhofer
34c9108f15 encoding: Add sizeOut argument to xmlCharEncInput
When push parsing, we want to convert as much of the input as possible.
When pull parsing memory buffers, we want to convert data chunk by chunk
to save memory.
2024-07-16 17:42:10 +02:00
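The trade-off described in this message can be sketched with a toy converter that takes an output budget. This is not the xmlCharEncInput signature, which the commit message does not spell out. The functions `latin1_to_utf8` and `convert_chunked` are hypothetical, and ISO-8859-1 to UTF-8 was chosen only because it is easy to write inline. A push-style caller would pass a large budget to convert everything available, while a pull-style caller passes a small one and loops.

```c
#include <stddef.h>

/*
 * Toy converter (ISO-8859-1 to UTF-8) standing in for a real encoding
 * handler. Writes at most out_cap bytes. On return, *in_len holds the
 * number of input bytes consumed. Returns the number of bytes written.
 */
static size_t latin1_to_utf8(const unsigned char *in, size_t *in_len,
                             unsigned char *out, size_t out_cap) {
    size_t i = 0, o = 0;

    while (i < *in_len) {
        unsigned char c = in[i];
        size_t need = (c < 0x80) ? 1 : 2;

        if (out_cap - o < need)
            break;                  /* output budget exhausted, stop early */
        if (c < 0x80) {
            out[o++] = c;
        } else {
            out[o++] = (unsigned char) (0xC0 | (c >> 6));
            out[o++] = (unsigned char) (0x80 | (c & 0x3F));
        }
        i++;
    }

    *in_len = i;
    return o;
}

/*
 * Pull-style driver: convert in bounded chunks so that at most chunk_size
 * bytes of converted output exist at any time. Returns the total number of
 * output bytes, or 0 if chunk_size is out of range.
 */
static size_t convert_chunked(const unsigned char *in, size_t in_len,
                              size_t chunk_size) {
    unsigned char chunk[64];
    size_t total = 0;

    if (chunk_size < 2 || chunk_size > sizeof(chunk))
        return 0;                   /* need room for one 2-byte sequence */

    while (in_len > 0) {
        size_t consumed = in_len;
        size_t written = latin1_to_utf8(in, &consumed, chunk, chunk_size);

        /* A real parser would hand chunk[0..written) to its tokenizer here. */
        total += written;
        in += consumed;
        in_len -= consumed;
    }
    return total;
}
```

Because the budget is at least two bytes, every loop iteration consumes at least one input byte, so the chunked driver always terminates.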
Nick Wellnhofer
8e871a31f8 buf: Rework xmlBuffer code
Port most changes made to the xmlBuf code in f3807d76, except that
"size" still includes the terminating NUL byte.

Make xmlSetBufferAllocationScheme, xmlBufferAllocScheme and
xmlDefaultBufferSize no-ops.

Deprecate a few functions.
2024-07-16 17:42:10 +02:00
Nick Wellnhofer
a221cd7849 buf: Rework xmlBuf code
Always use what the old implementation called the "IO" allocation
scheme, allowing to move the content pointer past the initial
allocation. This is inexpensive and allows efficient shrinking.

Optimize xmlBufGrow, reusing shrunken memory as much as possible.

Simplify xmlBufAdd.

Make xmlBufBackToBuffer return an error on overflow.

Make "size" exclude the terminating NUL byte.

Always provide an initial size.

Reintroduce static buffers.

Remove xmlBufResize and several other functions.
2024-07-16 17:42:10 +02:00
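The idea of moving the content pointer past the start of the allocation, and reusing that space when growing, can be sketched with a miniature buffer. This is not libxml2's xmlBuf code. The `MiniBuf` type and the `minibuf_*` functions are hypothetical, and the growth policy (allocate exactly what is needed) is a simplification chosen for brevity. As in the commit message, "size" here excludes the terminating NUL byte.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical miniature buffer, not libxml2's xmlBuf. */
typedef struct {
    unsigned char *mem;      /* start of the allocation */
    unsigned char *content;  /* first live byte, may be past mem after shrinking */
    size_t use;              /* live bytes, excluding the terminating NUL */
    size_t size;             /* capacity from content on, excluding the NUL */
} MiniBuf;

static int minibuf_init(MiniBuf *b, size_t size) {
    if (size > SIZE_MAX - 1)
        return -1;
    b->mem = malloc(size + 1);
    if (b->mem == NULL)
        return -1;
    b->content = b->mem;
    b->use = 0;
    b->size = size;
    b->mem[0] = 0;
    return 0;
}

/* Drop up to len bytes from the front in O(1) by advancing content. */
static size_t minibuf_shrink(MiniBuf *b, size_t len) {
    if (len > b->use)
        len = b->use;
    b->content += len;
    b->use -= len;
    b->size -= len;
    return len;
}

/* Ensure room for len more bytes, reusing the shrunken prefix first. */
static int minibuf_grow(MiniBuf *b, size_t len) {
    size_t start = (size_t) (b->content - b->mem);
    unsigned char *tmp;

    if (len <= b->size - b->use)
        return 0;                        /* already enough room */

    if (start > 0) {
        /* Reclaim the space in front of content before reallocating. */
        memmove(b->mem, b->content, b->use + 1);
        b->content = b->mem;
        b->size += start;
        if (len <= b->size - b->use)
            return 0;
    }

    if (len > SIZE_MAX - 1 - b->use)
        return -1;                       /* size computation would overflow */
    tmp = realloc(b->mem, b->use + len + 1);
    if (tmp == NULL)
        return -1;
    b->mem = b->content = tmp;
    b->size = b->use + len;
    return 0;
}

/* Append len bytes and keep the buffer NUL-terminated. */
static int minibuf_add(MiniBuf *b, const void *data, size_t len) {
    if (minibuf_grow(b, len) != 0)
        return -1;
    memcpy(b->content + b->use, data, len);
    b->use += len;
    b->content[b->use] = 0;
    return 0;
}
```

Shrinking costs nothing beyond pointer arithmetic, and the `memmove` in `minibuf_grow` only happens when the remaining room is too small, which is the case where the reclaimed prefix can save a reallocation.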
Nick Wellnhofer
1cfc5b8089 entities: Rework serialization of numeric character references 2024-07-16 17:42:10 +02:00
Nick Wellnhofer
8d1606265d entities: Rework text escaping 2024-07-16 17:42:10 +02:00
Nick Wellnhofer
e488695b1a save: Deprecate xmlSaveSet*Escape
xmlSaveSetAttrEscape never had an effect.
2024-07-16 17:42:10 +02:00
Nick Wellnhofer
e0494c0d43 io: Add some deprecation warnings 2024-07-15 16:33:38 +02:00
Nick Wellnhofer
728869809e error: Add helper functions to print errors and abort 2024-07-15 16:33:38 +02:00
Nick Wellnhofer
69f12d6d47 encoding: Deprecate xmlByteConsumed
This was only used by Chromium/WebKit to detect whether xmlParseContent
really succeeded. It's a horrible, overcomplicated hack.

See 8c5848bd and #767.
2024-07-13 15:42:02 +02:00
Nick Wellnhofer
8af55c8d20 parser: Rename new input API functions
These weren't made public yet.
2024-07-11 01:33:29 +02:00
Nick Wellnhofer
d74ca59491 parser: Rename internal xmlNewInput functions 2024-07-11 01:31:50 +02:00
Nick Wellnhofer
4f329dc524 parser: Implement xmlCtxtParseContent
This implements xmlCtxtParseContent, a better alternative to
xmlParseInNodeContext or xmlParseBalancedChunkMemory. It accepts a
parser context and a parser input, making it a lot more versatile.

xmlParseInNodeContext is now implemented in terms of
xmlCtxtParseContent. This makes sure that xmlParseInNodeContext never
modifies the target document, improving thread safety.
xmlParseInNodeContext is also more lenient now with regard to undeclared
entities.

Fixes #727.
2024-07-11 01:26:32 +02:00
Nick Wellnhofer
82e0455cf6 Undeprecate some symbols for now
- xmlKeepBlanksDefault is needed as a work-around for
  xmlParseBalancedChunk, see issue #727.
- ctxt->options already has an accessor and will be deprecated
  later.
- input->cur, input->base, input->end: See #762.
2024-07-06 20:19:51 +02:00
Nick Wellnhofer
38195cf596 parser: Don't produce names with invalid UTF-8 in recovery mode 2024-07-06 15:33:06 +02:00
Nick Wellnhofer
205e56dafe parser: Undeprecate ctxt->directory 2024-07-02 22:32:43 +02:00
Nick Wellnhofer
c127c89f98 catalog: Deprecate xmlCatalogSetDefaultPrefer 2024-07-02 22:06:53 +02:00
Nick Wellnhofer
606f410891 parser: Allow to disable catalogs with parser options
Implement XML_PARSE_NO_SYS_CATALOG and XML_PARSE_NO_CATALOG_PI.

Fixes #735.
2024-07-02 22:06:53 +02:00
Nick Wellnhofer
35146ff31c save: Implement xmlSaveSetIndentString
Allow to set indent string without using global xmlTreeIndentString.

See #736.
2024-07-02 20:03:23 +02:00
Nick Wellnhofer
7cc619d568 save: Implement save options for indenting
Implement XML_SAVE_NO_INDENT to disable and XML_SAVE_INDENT to enable
indenting regardless of the global xmlIndentTreeOutput.

Implement XML_SAVE_EMPTY to enable empty tags regardless of the global
xmlSaveNoEmptyTags.

See #736.
2024-07-02 20:03:23 +02:00
Nick Wellnhofer
30ef77554b parser: Don't use deprecated xmlCopyChar 2024-07-02 13:34:11 +02:00
Nick Wellnhofer
1167c3340e encoding: Don't include iconv.h from libxml/encoding.h 2024-07-01 18:05:40 +02:00
Nick Wellnhofer
95d3633350 encoding: Rework conversion error codes
This should match the old code more closely. Remove XML_ERR_PARTIAL.

It's unlikely that anyone is using these codes already.
2024-07-01 18:05:40 +02:00
Nick Wellnhofer
282ec1d548 encoding: Rework xmlCharEncodingHandler layout
Reuse some of the old members.

The "input" and "output" function pointers are actually of type
xmlCharEncConvFunc, accepting an additional argument. For default
handlers, this argument is unused, so this should work with most ABIs.
For iconv handlers, these function pointers used to be NULL but now
point to a function which requires the extra argument.

"iconv_in" and "iconv_out" are made void pointers. "uconv_in" and
"uconv_out" are renamed and made void pointers. This is unlikely to
cause issues.

We now expect that the built-in conversion functions correctly report
XML_ENC_ERR_SPACE. For UTF8ToHtml and the ISO-8859-X code, this will be
done in the following commits.
2024-07-01 18:05:40 +02:00
Nick Wellnhofer
501e5d195d encoding: Stop using XML_ENC_ERR_PARTIAL 2024-07-01 18:05:40 +02:00
Nick Wellnhofer
221df37529 parser: Support custom charset conversion implementations
Implement xmlCtxtSetCharEncConvImpl. I agree that the name is terrible.
2024-07-01 18:05:40 +02:00
Nick Wellnhofer
c59c24494d encoding: Support custom implementations 2024-07-01 18:05:40 +02:00
Nick Wellnhofer
1e3da9f4d4 encoding: Start with callbacks 2024-07-01 18:05:40 +02:00
Nick Wellnhofer
6d8427dc97 encoding: Rework encoding lookup
Add missing xmlCharEncoding enum values.

Simplify and speed up encoding lookup by using a table mapping names to
xmlCharEncoding enums and binary search. Rearrange the default handler
table to match the enum layout.

For some encodings we now only lookup the provided or most canonical
name instead of trying several names, expecting that iconv or ICU handle
aliases:

- IBM037 (EBCDIC)
- UCS-2
- UCS-4
- Shift_JIS
2024-07-01 18:05:40 +02:00
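A lookup of the kind described here, a sorted name table searched with binary search plus a second table indexed by the enum, might look roughly like the sketch below. The `Encoding` enum, the alias list and the function names are hypothetical and much smaller than libxml2's real tables. The comparison uses `toupper`, so it assumes the "C" locale for ASCII-only case folding.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical enum, far smaller than libxml2's xmlCharEncoding. */
typedef enum { ENC_NONE = 0, ENC_ASCII, ENC_LATIN1, ENC_UTF16, ENC_UTF8 } Encoding;

typedef struct {
    const char *name;
    Encoding enc;
} EncAlias;

/* Must stay sorted by name under ascii_casecmp for bsearch to work. */
static const EncAlias enc_aliases[] = {
    { "ASCII",      ENC_ASCII  },
    { "ISO-8859-1", ENC_LATIN1 },
    { "LATIN1",     ENC_LATIN1 },
    { "US-ASCII",   ENC_ASCII  },
    { "UTF-16",     ENC_UTF16  },
    { "UTF-8",      ENC_UTF8   },
    { "UTF8",       ENC_UTF8   },
};

/* Second table laid out in enum order, so the enum value is the index. */
static const char *const enc_canonical[] = {
    [ENC_NONE]   = NULL,
    [ENC_ASCII]  = "US-ASCII",
    [ENC_LATIN1] = "ISO-8859-1",
    [ENC_UTF16]  = "UTF-16",
    [ENC_UTF8]   = "UTF-8",
};

/* Case-insensitive comparison of two NUL-terminated strings. */
static int ascii_casecmp(const char *a, const char *b) {
    for (;; a++, b++) {
        int ca = toupper((unsigned char) *a);
        int cb = toupper((unsigned char) *b);

        if (ca != cb)
            return (ca < cb) ? -1 : 1;
        if (ca == 0)
            return 0;
    }
}

/* bsearch callback: key is the requested name, elem a table entry. */
static int cmp_alias(const void *key, const void *elem) {
    return ascii_casecmp((const char *) key, ((const EncAlias *) elem)->name);
}

/* Map a name to its enum value in O(log n), or ENC_NONE if unknown. */
static Encoding lookup_encoding(const char *name) {
    const EncAlias *hit;

    if (name == NULL)
        return ENC_NONE;
    hit = bsearch(name, enc_aliases,
                  sizeof(enc_aliases) / sizeof(enc_aliases[0]),
                  sizeof(enc_aliases[0]), cmp_alias);
    return (hit != NULL) ? hit->enc : ENC_NONE;
}
```

With this layout, `enc_canonical[lookup_encoding(name)]` resolves an alias to its canonical name in two steps. Names missing from the table yield `ENC_NONE`, at which point a real implementation could fall back to iconv or ICU.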
Nick Wellnhofer
16e7ecd478 xinclude: Check URI length
Don't report long URIs as OOM errors.
2024-07-01 18:03:06 +02:00