1
0
mirror of https://github.com/samba-team/samba.git synced 2025-01-11 05:18:09 +03:00
Commit Graph

1 Commits

Author SHA1 Message Date
Douglas Bagnall
fccb105e07 pytests: check that we don't have bad format characters
Unicode has format control characters that affect the appearance —
including the apparent order — of other characters. Some of these,
like the bidi controls (for mixing left-to-right scripts with
right-to-left scripts) can be used make text that means one thing look
very much like it means another thing.

The potential for duplicity using these characters has recently been
publicised under the name “Trojan Source”, and CVE-2021-42694. A
specific example, as it affects the Rust language is CVE-2021-42574.

We don't have many format control characters in our code — in fact,
just the non-breaking space (\u200b) and the redundant BOM thing
(\ufeff), and this test aims to ensure we keep it that way.

The test uses a series of allow-lists and deny-lists to check most
text files for unknown format control characters. The filtering is
fairly conservative but not exhaustive. For example, XML and text
files are checked, but UTF-16 files are not.

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Ralph Boehme <slow@samba.org>
2021-11-17 04:36:36 +00:00