IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Unicode has format control characters that affect the appearance —
including the apparent order — of other characters. Some of these,
like the bidi controls (for mixing left-to-right scripts with
right-to-left scripts) can be used make text that means one thing look
very much like it means another thing.
The potential for duplicity using these characters has recently been
publicised under the name “Trojan Source”, and CVE-2021-42694. A
specific example, as it affects the Rust language is CVE-2021-42574.
We don't have many format control characters in our code — in fact,
just the non-breaking space (\u200b) and the redundant BOM thing
(\ufeff), and this test aims to ensure we keep it that way.
The test uses a series of allow-lists and deny-lists to check most
text files for unknown format control characters. The filtering is
fairly conservative but not exhaustive. For example, XML and text
files are checked, but UTF-16 files are not.
Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Ralph Boehme <slow@samba.org>