Improve language and add region controls

Laurenz 2022-04-13 15:30:10 +02:00
parent d025854457
commit b274155c6d
11 changed files with 279 additions and 80 deletions

229
NOTICE

@@ -1,5 +1,67 @@
Licenses for third party components used by this project can be found below.
================================================================================
The MIT License applies to:
* The default color set defined in `src/geom/color.rs` which is adapted from
the colors.css project
(https://clrs.cc/)
The MIT License (MIT)
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
================================================================================
================================================================================
Alpha multiplication and source-over blending in `src/export/render.rs` are
ported from Skia code which can be found here:
https://skia.googlesource.com/skia/+/refs/heads/main/include/core/SkColorPriv.h
Copyright (c) 2011 Google Inc. All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
* Neither the name of the copyright holder nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
================================================================================
================================================================================
The SIL Open Font License Version 1.1 applies to:
@@ -103,34 +165,6 @@ FROM, OUT OF THE USE OR INABILITY TO USE THE FONT SOFTWARE OR FROM
OTHER DEALINGS IN THE FONT SOFTWARE.
================================================================================
================================================================================
The MIT License applies to:
* The default color set defined in `src/geom/color.rs` which is adapted from
the colors.css project
(https://clrs.cc/)
The MIT License (MIT)
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
================================================================================
================================================================================
The Apache License Version 2.0 applies to:
@@ -315,6 +349,111 @@ The Apache License Version 2.0 applies to:
END OF TERMS AND CONDITIONS
================================================================================
================================================================================
The Ubuntu Font License Version 1.0 applies to:
* Ubuntu fonts in fonts/Ubuntu*.ttf
(https://design.ubuntu.com/font/)
-------------------------------
UBUNTU FONT LICENCE Version 1.0
-------------------------------
PREAMBLE
This licence allows the licensed fonts to be used, studied, modified and
redistributed freely. The fonts, including any derivative works, can be
bundled, embedded, and redistributed provided the terms of this licence
are met. The fonts and derivatives, however, cannot be released under
any other licence. The requirement for fonts to remain under this
licence does not require any document created using the fonts or their
derivatives to be published under this licence, as long as the primary
purpose of the document is not to be a vehicle for the distribution of
the fonts.
DEFINITIONS
"Font Software" refers to the set of files released by the Copyright
Holder(s) under this licence and clearly marked as such. This may
include source files, build scripts and documentation.
"Original Version" refers to the collection of Font Software components
as received under this licence.
"Modified Version" refers to any derivative made by adding to, deleting,
or substituting -- in part or in whole -- any of the components of the
Original Version, by changing formats or by porting the Font Software to
a new environment.
"Copyright Holder(s)" refers to all individuals and companies who have a
copyright ownership of the Font Software.
"Substantially Changed" refers to Modified Versions which can be easily
identified as dissimilar to the Font Software by users of the Font
Software comparing the Original Version with the Modified Version.
To "Propagate" a work means to do anything with it that, without
permission, would make you directly or secondarily liable for
infringement under applicable copyright law, except executing it on a
computer or modifying a private copy. Propagation includes copying,
distribution (with or without modification and with or without charging
a redistribution fee), making available to the public, and in some
countries other activities as well.
PERMISSION & CONDITIONS
This licence does not grant any rights under trademark law and all such
rights are reserved.
Permission is hereby granted, free of charge, to any person obtaining a
copy of the Font Software, to propagate the Font Software, subject to
the below conditions:
1) Each copy of the Font Software must contain the above copyright
notice and this licence. These can be included either as stand-alone
text files, human-readable headers or in the appropriate machine-
readable metadata fields within text or binary files as long as those
fields can be easily viewed by the user.
2) The font name complies with the following:
(a) The Original Version must retain its name, unmodified.
(b) Modified Versions which are Substantially Changed must be renamed to
avoid use of the name of the Original Version or similar names entirely.
(c) Modified Versions which are not Substantially Changed must be
renamed to both (i) retain the name of the Original Version and (ii) add
additional naming elements to distinguish the Modified Version from the
Original Version. The name of such Modified Versions must be the name of
the Original Version, with "derivative X" where X represents the name of
the new work, appended to that name.
3) The name(s) of the Copyright Holder(s) and any contributor to the
Font Software shall not be used to promote, endorse or advertise any
Modified Version, except (i) as required by this licence, (ii) to
acknowledge the contribution(s) of the Copyright Holder(s) or (iii) with
their explicit written permission.
4) The Font Software, modified or unmodified, in part or in whole, must
be distributed entirely under this licence, and must not be distributed
under any other licence. The requirement for fonts to remain under this
licence does not affect any document created using the Font Software,
except any version of the Font Software extracted from a document
created using the Font Software may only be distributed under this
licence.
TERMINATION
This licence becomes null and void if any of the above conditions are
not met.
DISCLAIMER
THE FONT SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT OF
COPYRIGHT, PATENT, TRADEMARK, OR OTHER RIGHT. IN NO EVENT SHALL THE
COPYRIGHT HOLDER BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
INCLUDING ANY GENERAL, SPECIAL, INDIRECT, INCIDENTAL, OR CONSEQUENTIAL
DAMAGES, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF THE USE OR INABILITY TO USE THE FONT SOFTWARE OR FROM OTHER
DEALINGS IN THE FONT SOFTWARE.
================================================================================
================================================================================
The GUST Font License Version 1.0 applies to:
@@ -764,37 +903,3 @@ licenses.
Creative Commons may be contacted at creativecommons.org.
================================================================================
================================================================================
Alpha multiplication and source-over blending in `src/export/render.rs` are
ported from Skia code which can be found here:
https://skia.googlesource.com/skia/+/refs/heads/main/include/core/SkColorPriv.h
Copyright (c) 2011 Google Inc. All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in
the documentation and/or other materials provided with the
distribution.
* Neither the name of the copyright holder nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
================================================================================

BIN
fonts/Ubuntu-Regular.ttf Normal file

Binary file not shown.


@@ -1,24 +1,30 @@
use crate::eval::Value;
use crate::geom::Dir;
/// A natural language.
/// A code for a natural language.
#[derive(Debug, Copy, Clone, Eq, PartialEq, Ord, PartialOrd, Hash)]
pub struct Lang([u8; 2]);
pub struct Lang([u8; 3], u8);
impl Lang {
/// The code for the English language.
pub const ENGLISH: Self = Self(*b"en");
pub const ENGLISH: Self = Self(*b"en ", 2);
/// Construct a language from a two-byte ISO 639-1 code.
/// Construct a language from a two- or three-byte ISO 639-1/2/3 code.
pub fn from_str(iso: &str) -> Option<Self> {
let mut bytes: [u8; 2] = iso.as_bytes().try_into().ok()?;
bytes.make_ascii_lowercase();
Some(Self(bytes))
let len = iso.len();
if matches!(len, 2 ..= 3) && iso.is_ascii() {
let mut bytes = [b' '; 3];
bytes[.. len].copy_from_slice(iso.as_bytes());
bytes.make_ascii_lowercase();
Some(Self(bytes, len as u8))
} else {
None
}
}
/// Return the language code as a string slice.
/// Return the language code as an all lowercase string slice.
pub fn as_str(&self) -> &str {
std::str::from_utf8(&self.0).unwrap_or_default()
std::str::from_utf8(&self.0[.. usize::from(self.1)]).unwrap_or_default()
}
/// The default direction for the language.
@@ -35,5 +41,34 @@ castable! {
Lang,
Expected: "string",
Value::Str(string) => Self::from_str(&string)
.ok_or("expected two letter language code")?,
.ok_or("expected two or three letter language code (ISO 639-1/2/3)")?,
}
/// A code for a region somewhere in the world.
#[derive(Debug, Copy, Clone, Eq, PartialEq, Ord, PartialOrd, Hash)]
pub struct Region([u8; 2]);
impl Region {
/// Construct a region from its two-byte ISO 3166-1 alpha-2 code.
pub fn from_str(iso: &str) -> Option<Self> {
if iso.is_ascii() {
let mut bytes: [u8; 2] = iso.as_bytes().try_into().ok()?;
bytes.make_ascii_uppercase();
Some(Self(bytes))
} else {
None
}
}
/// Return the region code as an all uppercase string slice.
pub fn as_str(&self) -> &str {
std::str::from_utf8(&self.0).unwrap_or_default()
}
}
castable! {
Region,
Expected: "string",
Value::Str(string) => Self::from_str(&string)
.ok_or("expected two letter region code (ISO 3166-1 alpha-2)")?,
}


@@ -65,8 +65,10 @@ impl TextNode {
/// The bottom end of the text bounding box.
pub const BOTTOM_EDGE: TextEdge = TextEdge::Metric(VerticalFontMetric::Baseline);
/// An ISO 639-1 language code.
/// An ISO 639-1/2/3 language code.
pub const LANG: Lang = Lang::ENGLISH;
/// An ISO 3166-1 alpha-2 region code.
pub const REGION: Option<Region> = None;
/// The direction for text and inline objects. When `auto`, the direction is
/// automatically inferred from the language.
#[property(resolve)]


@@ -406,9 +406,9 @@ fn collect<'a>(
ParChild::Quote(double) => {
let prev = full.len();
if styles.get(TextNode::SMART_QUOTES) {
// TODO: Also get region.
let lang = styles.get(TextNode::LANG);
let quotes = Quotes::from_lang(lang.as_str(), "");
let region = styles.get(TextNode::REGION);
let quotes = Quotes::from_lang(lang, region);
let peeked = iter.peek().and_then(|(child, _)| match child {
ParChild::Text(text) => text.chars().next(),
ParChild::Quote(_) => Some('"'),


@@ -1,3 +1,4 @@
use super::{Lang, Region};
use crate::parse::is_newline;
/// State machine for smart quote substitution.
@@ -91,9 +92,10 @@ impl<'s> Quotes<'s> {
/// Norwegian.
///
/// For unknown languages, the English quotes are used.
pub fn from_lang(language: &str, region: &str) -> Self {
let (single_open, single_close, double_open, double_close) = match language {
"de" if matches!(region, "CH" | "LI") => ("", "", "«", "»"),
pub fn from_lang(lang: Lang, region: Option<Region>) -> Self {
let region = region.as_ref().map(Region::as_str);
let (single_open, single_close, double_open, double_close) = match lang.as_str() {
"de" if matches!(region, Some("CH" | "LI")) => ("", "", "«", "»"),
"cs" | "da" | "de" | "et" | "is" | "lt" | "lv" | "sk" | "sl" => {
("", "", "", "")
}
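The region-aware match above can be illustrated in isolation. This is a sketch under stated assumptions, not the crate's `Quotes::from_lang`: the function name `double_quotes` is hypothetical, it returns only the double quotes, and the low-high quotes used for non-Swiss German follow common typographic convention rather than anything shown in this diff. Only the `«`/`»` pair for `CH` and `LI` and the English fallback for unknown languages are taken from the code and doc comment above.

```rust
/// Pick (open, close) double quotes for a language and optional region.
/// Hypothetical helper for illustration only.
pub fn double_quotes(lang: &str, region: Option<&str>) -> (&'static str, &'static str) {
    match lang {
        // Swiss and Liechtenstein German use guillemets: « »
        "de" if matches!(region, Some("CH" | "LI")) => ("\u{00AB}", "\u{00BB}"),
        // Assumption: other German text uses low-high quotes: „ “
        "de" => ("\u{201E}", "\u{201C}"),
        // Unknown languages fall back to English quotes: “ ”
        _ => ("\u{201C}", "\u{201D}"),
    }
}
```

The guarded arm has to come before the plain `"de"` arm, because match arms are tried in order and the unguarded arm would otherwise shadow it.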


@@ -1,4 +1,5 @@
use std::ops::Range;
use std::str::FromStr;
use rustybuzz::{Feature, UnicodeBuffer};
@@ -372,6 +373,7 @@ fn shape_segment<'a>(
// Fill the buffer with our text.
let mut buffer = UnicodeBuffer::new();
buffer.push_str(text);
buffer.set_language(language(ctx.styles));
buffer.set_direction(match ctx.dir {
Dir::LTR => rustybuzz::Direction::LeftToRight,
Dir::RTL => rustybuzz::Direction::RightToLeft,
@@ -613,3 +615,14 @@ fn tags(styles: StyleChain) -> Vec<Feature> {
tags
}
/// Process the language and region of a style chain into a
/// rustybuzz-compatible BCP 47 language.
fn language(styles: StyleChain) -> rustybuzz::Language {
let mut bcp: EcoString = styles.get(TextNode::LANG).as_str().into();
if let Some(region) = styles.get(TextNode::REGION) {
bcp.push('-');
bcp.push_str(region.as_str());
}
rustybuzz::Language::from_str(&bcp).unwrap()
}
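The tag construction in `language` can be checked without rustybuzz. A minimal sketch, assuming a hypothetical helper `bcp47_tag` that takes plain string slices instead of a style chain and returns a `String` instead of an `EcoString`:

```rust
/// Join a language code and an optional region code into a tag such as
/// "de-CH". Hypothetical stand-in for the string building in `language`.
pub fn bcp47_tag(lang: &str, region: Option<&str>) -> String {
    let mut tag = String::from(lang);
    if let Some(region) = region {
        // The region subtag is separated from the language by a hyphen.
        tag.push('-');
        tag.push_str(region);
    }
    tag
}
```

For the smart-quote test case with `lang: "de"` and `region: "CH"`, this produces `de-CH`; without a region, the tag is just the language code.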

BIN
tests/ref/text/lang.png Normal file

Binary file not shown (new image, 4.2 KiB).

Binary file not shown (image changed: 58 KiB before, 64 KiB after).

39
tests/typ/text/lang.typ Normal file

@@ -0,0 +1,39 @@
// Test setting the document language.
---
// Ensure that setting the language does have effects.
#set text(hyphenate: true)
#grid(
columns: 2 * (20pt,),
gutter: 1fr,
text(lang: "en")["Eingabeaufforderung"],
text(lang: "de")["Eingabeaufforderung"],
)
---
// Test that the language passed to the shaper has an effect.
#set text("Ubuntu")
// Some lowercase letters are different in Serbian Cyrillic compared to other
// Cyrillic languages. Since there is only one set of Unicode codepoints for
// Cyrillic, these can only be seen when setting the language to Serbian and
// selecting one of the few fonts that support these letterforms.
Бб
#text(lang: "uk")[Бб]
#text(lang: "sr")[Бб]
---
// Error: 17-21 expected string, found none
#set text(lang: none)
---
// Error: 17-20 expected two or three letter language code (ISO 639-1/2/3)
#set text(lang: "ӛ")
---
// Error: 17-20 expected two or three letter language code (ISO 639-1/2/3)
#set text(lang: "😃")
---
// Error: 19-24 expected two letter region code (ISO 3166-1 alpha-2)
#set text(region: "hey")


@@ -1,7 +1,7 @@
// Test smart quotes.
---
#set page(width: 200pt)
#set page(width: 250pt)
// Test simple quotations in various languages.
#set text(lang: "en")
@@ -10,7 +10,10 @@
#set text(lang: "de")
"Das Pferd frisst keinen Gurkensalat" war der erste jemals am 'Fernsprecher' gesagte Satz.
#set text(lang: "fr")
#set text(lang: "de", region: "CH")
"Das Pferd frisst keinen Gurkensalat" war der erste jemals am 'Fernsprecher' gesagte Satz.
#set text(lang: "fr", region: none)
"Le cheval ne mange pas de salade de concombres" est la première phrase jamais prononcée au 'téléphone'.
#set text(lang: "fi")