samba-mirror

sin/samba-mirror

mirror of https://github.com/samba-team/samba.git synced 2024-12-23 17:34:34 +03:00

Commit Graph

Author	SHA1	Message	Date
Douglas Bagnall

dab828f63c

pytest/source_char: check for mixed direction text

As pointed out in https://lwn.net/Articles/875964, forbidding bidi
marker characters is not always going to be enough to avoid
right-to-left vs left-to-right confusion. Consider this:

$ python -c's = "b = x  # 2 * n * m"; print(s); print(s.replace("x", "א").replace("n", "ח"))'

b = x  # 2 * n * m
b = א  # 2 * ח * m

Those two lines are semantically the same, with the Hebrew letters
"א" and "ח" replacing "x" and "n". But they look like they mean
different things.

It is not enough to say we only allow these scripts (or indeed
non-ascii) in strings and comments, as demonstrated in this example:

$ python -c's = "b = \"x#\"  #  n"; print(s); print(s.replace("x", "א").replace("n", "ח"))'

b = "x#"  #  n
b = "א#"  #  ח

where the second line is visually disordered but looks valid. Any series
of neutral characters between two RTL characters will be reversed (and
possibly mirrored).
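
As an illustration of what "mixed direction" means here, the sketch below
flags a line that contains both a strongly left-to-right and a strongly
right-to-left character, using the bidi classes from Python's standard
unicodedata module. The function names are hypothetical and the per-line
granularity is an assumption; this is not the code from the commit itself.

```python
import unicodedata

# Bidi classes that mark a character as strongly directional.
LTR_CLASSES = {"L"}
RTL_CLASSES = {"R", "AL"}  # Hebrew letters are "R", Arabic letters are "AL"


def line_directions(line):
    """Return the set of strong directions ('ltr', 'rtl') present in line."""
    found = set()
    for ch in line:
        bidi = unicodedata.bidirectional(ch)
        if bidi in LTR_CLASSES:
            found.add("ltr")
        elif bidi in RTL_CLASSES:
            found.add("rtl")
        # Neutral and weak classes (spaces, quotes, digits, '#') are ignored.
    return found


def is_mixed_direction(line):
    """True if the line mixes strong LTR and strong RTL characters."""
    return line_directions(line) == {"ltr", "rtl"}


print(is_mixed_direction('b = x  # 2 * n * m'))
print(is_mixed_direction('b = "א#"  #  ח'))
```

Under this assumption, both Hebrew examples above are flagged because the
Latin "b" shares a line with "א" and "ח", while the pure-ASCII lines are not.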

In practice this affects one file, which is a text file for testing
unicode normalisation.

I think, for the reasons shown above, we are unlikely to see legitimate
RTL code, except perhaps in documentation files. If we do, we can add
those files to the allow-list.
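
The allow-list idea can be sketched roughly as follows. The allow-list
entry, the function names and the dictionary-of-sources input are all
hypothetical, chosen only to keep the example self-contained; the real test
may locate and read files quite differently.

```python
import unicodedata

# Hypothetical allow-list: paths permitted to contain mixed-direction text.
ALLOWED_MIXED_DIRECTION = {"testdata/normalisation-sample.txt"}


def strong_directions(text):
    """Return the strong directions ('ltr', 'rtl') found in text."""
    dirs = set()
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            dirs.add("ltr")
        elif bidi in ("R", "AL"):
            dirs.add("rtl")
    return dirs


def find_offenders(sources, allowed=ALLOWED_MIXED_DIRECTION):
    """Return (path, line number) pairs for mixed-direction lines.

    sources maps a path to that file's text; allow-listed paths are skipped.
    """
    offenders = []
    for path, text in sorted(sources.items()):
        if path in allowed:
            continue
        for lineno, line in enumerate(text.splitlines(), 1):
            if strong_directions(line) == {"ltr", "rtl"}:
                offenders.append((path, lineno))
    return offenders


sources = {
    "a.py": 'b = x\nb = א  # n\n',
    "testdata/normalisation-sample.txt": "abc א",
}
print(find_offenders(sources))
```

Here only the second line of the hypothetical a.py is reported; the
allow-listed text file is skipped even though it mixes directions.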

Signed-off-by: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Reviewed-by: Andrew Bartlett <abartlet@samba.org>

Autobuild-User(master): Andrew Bartlett <abartlet@samba.org>
Autobuild-Date(master): Fri Dec  3 18:53:43 UTC 2021 on sn-devel-184

2021-12-03 18:53:43 +00:00
