mirror of https://gitlab.gnome.org/GNOME/libxml2.git synced 2025-02-05 05:47:00 +03:00

6305 Commits

Author SHA1 Message Date
Nick Wellnhofer
d025cfbb4b parser: Always copy content from entity to target.
Make sure that references from IDs are updated.

Note that if there are IDs with the same value in a document, the last
one will now be returned. IDs should be unique, but maybe this should be
addressed.
2023-12-29 01:22:11 +01:00
Nick Wellnhofer
6337ff793b parser: Simplify control flow in xmlParseReference 2023-12-29 01:21:45 +01:00
Nick Wellnhofer
579186f2e0 parser: Remove xmlSetEntityReferenceFunc feature
This has been deprecated for a long time.
2023-12-29 01:20:51 +01:00
Nick Wellnhofer
b848338c5a parser: More refactoring of entity loading
This sets input->entity also for general entities.
2023-12-29 01:20:08 +01:00
Nick Wellnhofer
4ecc85d2cb parser: Push general entity input streams on the stack
This allows the error handler to give more context.
2023-12-29 01:20:08 +01:00
Nick Wellnhofer
a5dcf0f422 parser: Mark more parser context members as unused 2023-12-29 01:20:08 +01:00
Nick Wellnhofer
6a9a88a17f parser: Move progressive flag into input struct 2023-12-29 01:20:08 +01:00
Nick Wellnhofer
4f14fe9cf7 parser: Remove remaining ctxt->instate checks
Now ctxt->instate is only used for push parser states.
2023-12-29 01:20:08 +01:00
Nick Wellnhofer
d944a41515 parser: Fix in-parameter-entity and in-external-dtd checks
Use ctxt->input->entity instead of ctxt->inputNr to determine whether
we are inside a parameter entity.

Stop using ctxt->external to check whether we're in an external DTD.
This is signaled by ctxt->inSubset == 2.
2023-12-29 01:19:56 +01:00
Nick Wellnhofer
477a7ed82c html: Abort earlier on fatal errors 2023-12-28 19:43:48 +01:00
Nick Wellnhofer
5f319304c8 SAX2: Fix error code
Today I learned that the TSCII character encoding [1] can blow up the
size of text 12 times when converted to UTF-8:

    $ printf '\x82' |iconv -f TSCII -t UTF-8 |hexdump -C
    00000000  e0 ae b8 e0 af 8d e0 ae  b0 e0 af 80
    0000000c

[1] https://en.wikipedia.org/wiki/Tamil_Script_Code_for_Information_Interchange
2023-12-28 19:43:48 +01:00
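The expansion factor in the hexdump above can be checked by hand. The following is a minimal sketch, not code from libxml2: the helper name utf8_codepoints and the array name are made up for illustration, and the byte values are copied from the hexdump.

```c
#include <assert.h>
#include <stddef.h>

/* Count code points in a UTF-8 buffer by counting every byte that is not
 * a continuation byte (10xxxxxx). Assumes the input is valid UTF-8.
 * Hypothetical helper, not part of libxml2. */
static size_t utf8_codepoints(const unsigned char *buf, size_t len)
{
    size_t i, n = 0;

    assert(buf != NULL);
    for (i = 0; i < len; i++) {
        if ((buf[i] & 0xC0) != 0x80)
            n++;
    }
    return n;
}

/* The 12 bytes from the hexdump: the UTF-8 form of TSCII byte 0x82. */
static const unsigned char tscii_82_utf8[] = {
    0xe0, 0xae, 0xb8, 0xe0, 0xaf, 0x8d,
    0xe0, 0xae, 0xb0, 0xe0, 0xaf, 0x80
};
```

A single input byte maps to four code points of three bytes each, so an output buffer sized for this conversion would presumably need to allow for at least 12 output bytes per input byte.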
Nick Wellnhofer
ab63197149 uri: Keep fragment intact when resolving filesystem paths 2023-12-28 17:07:03 +01:00
Nick Wellnhofer
b8313b589f xpath: Rewrite substring-before and substring-after
Don't use buffers. Check malloc failures.
2023-12-28 16:47:45 +01:00
Nick Wellnhofer
3874e5d0ea tests: Remove unneeded error formatting code 2023-12-28 16:47:45 +01:00
Nick Wellnhofer
2a2fbe1e5b xinclude: Only set xml:base if necessary 2023-12-28 16:47:45 +01:00
Nick Wellnhofer
8a685a3dfc xinclude: Allow empty nodesets
There's no reason to treat an empty nodeset as error.
2023-12-28 16:47:45 +01:00
Nick Wellnhofer
f3fa34dcad parser: Fix general entity parsing
Clear namespace database.

Ignore non-fatal errors.
2023-12-28 16:47:41 +01:00
Nick Wellnhofer
ecfbcc8a52 parser: Rework general entity parsing
Don't create a new parser context but reuse the existing one.

This exposes bug #601 in a more obvious way.
2023-12-25 23:38:40 +01:00
Nick Wellnhofer
c2ef78f76e io: Fix close error handling
There's no way to report error codes from closing an output buffer yet.
2023-12-25 23:38:40 +01:00
Nick Wellnhofer
6d27c549e1 io: Fix read/write error handling
Handle short reads/writes from fd. Fix stdio error handling.
2023-12-25 23:38:40 +01:00
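For readers unfamiliar with short reads and writes: read(2) and write(2) may transfer fewer bytes than requested, so callers usually loop. The following is a generic POSIX sketch of that pattern, not the code from this commit; the names write_all and read_full are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Write exactly len bytes, retrying after short writes and EINTR.
 * Returns 0 on success, -1 on error (errno is set by write). */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    assert(buf != NULL);
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t) n;
    }
    return 0;
}

/* Read until len bytes have arrived or EOF is reached.
 * Returns the number of bytes read, or -1 on error. */
static ssize_t read_full(int fd, void *buf, size_t len)
{
    char *p = buf;
    size_t got = 0;

    assert(buf != NULL);
    while (got < len) {
        ssize_t n = read(fd, p + got, len - got);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            break; /* EOF */
        got += (size_t) n;
    }
    return (ssize_t) got;
}
```

A result from read_full that is smaller than len therefore means EOF, not an error.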
Nick Wellnhofer
0bef93bf24 io: More refactoring and unescaping fixes
Merge Windows wrappers into relevant functions.

Remove more unnecessary unescaping.

Merge *OpenW into *Open functions.

Use unbuffered IO for output.
2023-12-25 23:38:40 +01:00
Nick Wellnhofer
331dcd6200 error: Reenable full error reports to default handler
This should make console output include some information about nodes
again. Note that this extra information must be disabled if a custom
generic error handler was set. Many downstream test suites rely on this
behavior.
2023-12-25 23:38:40 +01:00
Nick Wellnhofer
c1bddd4c26 parser: Mark 'length' member of xmlParserInput as unused 2023-12-25 23:38:40 +01:00
Nick Wellnhofer
955c177f69 parser: Stop using 'directory' struct member
This was only used as a pointless fallback for URI resolution.
2023-12-25 23:38:40 +01:00
Nick Wellnhofer
60841beba6 parser: Make XML_IO_NETWORK_ATTEMPT behave as before
This is always reported to the generic error handler, not to the parser
context, for backward compatibility. Several downstream test suites rely on
this behavior.
2023-12-25 23:38:40 +01:00
Nick Wellnhofer
a26934105e io: Move some code from xmlIO.c to parserInternals.c
Move everything related to parser contexts to parserInternals.c.
2023-12-25 23:38:40 +01:00
Nick Wellnhofer
8ab1b122c4 Fix filename and URI handling
Many strings are passed to the library that could be either URIs or
filesystem paths. We now assume that strings are a URI if they contain
the substring "://". This means that they have a scheme and an
authority. Otherwise, URI resolution wouldn't make much sense.

Fix xmlBuildURI to work with filesystem paths. If the base URI doesn't
contain "://" it is treated as filename. The resolved URI is unescaped,
appended and the result is normalized. Rewrite xmlNormalizePath to
handle Windows quirks.

All special handling for Windows paths is removed in xmlCanonicPath.
If the path looks like a URI, only escape characters allowed in Legacy
Extended IRIs.

Make xmlPathToURI only call xmlCanonicPath. The additional round-trip
through URI parser and serializer seems useless.

Add a helper function xmlConvertUriToPath in xmlIO.c which checks for
file URIs and unescapes them.

Always process strings with xmlCanonicPath in xmlLoadExternalEntity.
This should be harmless now.

Should help with #334, #387, #611.
2023-12-25 23:38:40 +01:00
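The "://" heuristic described in the commit above amounts to a substring test. The following is a minimal sketch of it, not the library's actual code; the function name looks_like_uri is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Return 1 if str contains "://" (scheme plus authority), else 0.
 * Hypothetical helper illustrating the heuristic from the commit message. */
static int looks_like_uri(const char *str)
{
    assert(str != NULL);
    return strstr(str, "://") != NULL;
}
```

Under this rule, "http://example.com/a.xml" and "file:///tmp/a.xml" count as URIs, while "C:\docs\a.xml", "dir/a.xml" and scheme-only strings such as "mailto:user@example.com" are treated as filesystem paths.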
Nick Wellnhofer
28913232f6 uri: Clean up special parsing modes
Add function to handle unreserved check. Give flags meaningful names.
Add support to allow ucschars from Legacy Extended IRIs.
2023-12-25 23:38:40 +01:00
Nick Wellnhofer
6e3a2ac660 xinclude: Rework xml:base fixup
The xml:base fixup was broken in more complex cases.

Also avoid parsing and building the included URI multiple times.
2023-12-25 23:38:40 +01:00
Nick Wellnhofer
35a4bc50d0 xinclude: Report to xmlGenericError 2023-12-25 23:38:40 +01:00
Nick Wellnhofer
e8de3401b3 parser: Also set document properties when push parsing
Add new function xmlFinishDocument which invokes the endDocument SAX
handler and sets the document's properties.
2023-12-25 23:38:40 +01:00
Nick Wellnhofer
c73de050f5 include: Move non-generated parts from xmlversion.h.in
xmlexports.h originally only included symbol visibility macros but it's
a good place for other macros as well.
2023-12-25 23:38:40 +01:00
Nick Wellnhofer
a18d94168b Update NEWS 2023-12-24 22:11:49 +01:00
Nick Wellnhofer
229e5ff7f9 io: Remove support for HTTP POST
This feature is unlikely to be used these days.
2023-12-24 22:11:49 +01:00
Nick Wellnhofer
9c2c87b55d dict: Move local RNG state to global state
Don't use TLS variables directly.
2023-12-24 16:24:34 +01:00
Nick Wellnhofer
2e9e758d1e dict: Get random seed from system PRNG 2023-12-24 16:24:34 +01:00
Nick Wellnhofer
c49572e57d malloc-fail: Fix erroneous report in xmlStringGetNodeList
The parser can produce invalid attribute content in recovery mode.
Unless this is fixed, xmlStringGetNodeList should ignore such errors
silently.
2023-12-23 15:10:15 +01:00
Nick Wellnhofer
c8f1f4a280 doc: Improve documentation of error handlers 2023-12-21 17:36:17 +01:00
Nick Wellnhofer
882b3a8075 runtest: Fix return code in rngTest 2023-12-21 15:34:24 +01:00
Nick Wellnhofer
f0df3e6d00 tests: Try to fix RelaxNG test cases
These were added recently in ea695ac0 and 8074b881 but were a total mess
of symbolic links and apparently mixed up files.

Symbolic links don't work on Windows.

Try to salvage one of the tests.
2023-12-21 15:02:24 +01:00
Nick Wellnhofer
8cd563174a html: Don't close fd in htmlCtxtReadFd
Long-standing bug. The XML fix from 2003 was never ported to the HTML
parser. htmlReadFd was fixed with fe6890e2.
2023-12-21 15:02:24 +01:00
Nick Wellnhofer
0a658c0f0a io: Don't use "-" to read from stdin
To implement this feature on such a low level is a disaster waiting to
happen. Remove these checks from the IO code and move them to xmllint.

Note that the serialization API will still treat "-" as stdout.
2023-12-21 15:02:24 +01:00
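With the "-" check removed from the IO layer, an application that wants this convention has to map "-" to stdin itself. The following is a rough sketch of how a command-line tool might do that with plain POSIX calls. It is not xmllint's actual code, and open_input and its owned flag are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Map a command-line argument to a file descriptor.
 * "-" selects stdin; anything else is opened read-only.
 * *owned is set to 1 only if the caller should close the descriptor.
 * Returns -1 if open() fails. Hypothetical helper. */
static int open_input(const char *name, int *owned)
{
    assert(name != NULL && owned != NULL);
    if (strcmp(name, "-") == 0) {
        *owned = 0;          /* never close stdin on behalf of the user */
        return STDIN_FILENO;
    }
    *owned = 1;
    return open(name, O_RDONLY);
}
```

The resulting descriptor could then be handed to a function that reads from a file descriptor, such as xmlReadFd, and closed afterwards only when owned is set.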
Nick Wellnhofer
c9a46a91fe io: Rework initialization 2023-12-21 15:02:24 +01:00
Nick Wellnhofer
b75fc1ab33 io: Rearrange code 2023-12-21 15:02:24 +01:00
Nick Wellnhofer
130436917c parser: Rename xmlErrParser to xmlCtxtErr 2023-12-21 15:02:24 +01:00
Nick Wellnhofer
8d0aaf4b95 parser: Remove xmlErrEncoding
Use xmlFatalErr or xmlCtxtErrIO.
2023-12-21 15:02:24 +01:00
Nick Wellnhofer
9fbe46ba17 io: Consolidate error messages 2023-12-21 15:02:24 +01:00
Nick Wellnhofer
23345a1cb1 io: Report IO errors through xmlCtxtErrIO
This is also a new public API function to be used in external entity
loaders.
2023-12-21 15:02:24 +01:00
Nick Wellnhofer
e62b0dbde5 xzlib: Fix harmless unsigned integer overflow 2023-12-21 15:02:24 +01:00
Nick Wellnhofer
1ef3566362 io: Always use unbuffered input
Before, we often used unbuffered input via the lzma or gzip handlers,
more or less inadvertently.

Change the default file handlers from buffered (stdc FILE) to unbuffered
(POSIX fds).
2023-12-21 15:02:24 +01:00