28ba53c076
scripts/mkutf8data is used only when regenerating utf8data.h, which never happens in the normal kernel build. However, it is built regardless whenever CONFIG_UNICODE is enabled.

Moreover, there is no good reason for it to reside in the scripts/ directory since it is only used in fs/unicode/.

Hence, move it from scripts/ to fs/unicode/.

In some cases, we bypass build artifacts in the normal build. The conventional way to do so is to surround the code with ifdef REGENERATE_*.

For example,

 - 7373f4f83c71 ("kbuild: add implicit rules for parser generation")
 - 6aaf49b495b4 ("crypto: arm,arm64 - Fix random regeneration of S_shipped")

I rewrote the rule in a more kbuild'ish style.

In the normal build, utf8data.h is just shipped from the check-in file.

  $ make
    [ snip ]
    SHIPPED fs/unicode/utf8data.h
    CC      fs/unicode/utf8-norm.o
    CC      fs/unicode/utf8-core.o
    CC      fs/unicode/utf8-selftest.o
    AR      fs/unicode/built-in.a

If you want to generate utf8data.h based on UCD, put *.txt files into fs/unicode/, then pass REGENERATE_UTF8DATA=1 from the command line. The mkutf8data tool will be automatically compiled to generate the utf8data.h from the *.txt files.

  $ make REGENERATE_UTF8DATA=1
    [ snip ]
    HOSTCC  fs/unicode/mkutf8data
    GEN     fs/unicode/utf8data.h
    CC      fs/unicode/utf8-norm.o
    CC      fs/unicode/utf8-core.o
    CC      fs/unicode/utf8-selftest.o
    AR      fs/unicode/built-in.a

I renamed the check-in utf8data.h to utf8data.h_shipped so that this will work for the out-of-tree build.

You can update it based on the latest UCD like this:

  $ make REGENERATE_UTF8DATA=1 fs/unicode/
  $ cp fs/unicode/utf8data.h fs/unicode/utf8data.h_shipped

Also, I added entries to .gitignore and dontdiff.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
The utf8data.h file in this directory is generated from the Unicode Character Database for version 12.1.0 of the Unicode standard.

The full set of files can be found here:

  http://www.unicode.org/Public/12.1.0/ucd/

Note!

The URLs listed below are not stable. That's because Unicode 12.1.0 has not been officially released yet; it is scheduled to be released on May 8, 2019. We are taking Unicode 12.1.0 a few weeks early because it contains a new Japanese character which is required in order to specify Japanese dates after May 1, 2019, when Crown Prince Naruhito ascends to the Chrysanthemum Throne. (Isn't internationalization fun? The abdication of Emperor Akihito of Japan is requiring dozens of software packages to be updated with only a month's notice. :-)

We will update the URLs (and make any needed changes to the checksums) after the final Unicode 12.1.0 is released.

Individual source links:

  https://www.unicode.org/Public/12.1.0/ucd/CaseFolding-12.1.0d2.txt
  https://www.unicode.org/Public/12.1.0/ucd/DerivedAge-12.1.0d3.txt
  https://www.unicode.org/Public/12.1.0/ucd/extracted/DerivedCombiningClass-12.1.0d2.txt
  https://www.unicode.org/Public/12.1.0/ucd/DerivedCoreProperties-12.1.0d2.txt
  https://www.unicode.org/Public/12.1.0/ucd/NormalizationCorrections-12.1.0d1.txt
  https://www.unicode.org/Public/12.1.0/ucd/NormalizationTest-12.1.0d3.txt
  https://www.unicode.org/Public/12.1.0/ucd/UnicodeData-12.1.0d2.txt

md5sums (verify by running "md5sum -c README.utf8data"):

  900e76da1d822a160fd6b8c0b1d70094  CaseFolding.txt
  131256380bff4fea8ad4a851616f2f10  DerivedAge.txt
  e731a4089b30002144e107e3d6f8d1fa  DerivedCombiningClass.txt
  a47c9fbd7ff92a9b261ba9831e68778a  DerivedCoreProperties.txt
  fcab6dad15e440879d92f315978f93d3  NormalizationCorrections.txt
  f9ff1c55a60decf436100f791b44aa98  NormalizationTest.txt
  755f6af699f8c8d2d958da411f78f6c6  UnicodeData.txt

sha1sums (verify by running "sha1sum -c README.utf8data"):

  dc9245f6803c4ac99555c361f5052e0b13eb779b  CaseFolding.txt
  3281104f237184cdb5d869e86eb8573678ada7da  DerivedAge.txt
  2f5f995ccb96e0fa84b15151b35d5e2681535175  DerivedCombiningClass.txt
  5b8698a3fcd5018e1987f296b02e2c17e696415e  DerivedCoreProperties.txt
  cd83935fbc012345d8792d2c704f69497e753835  NormalizationCorrections.txt
  ea419aae505b337b0d99a83fa83fe58ddff7c19f  NormalizationTest.txt
  dc973c0fc93d6f09d9ab9f70d1c9f89c447f0526  UnicodeData.txt

To update to a newer version of the Unicode standard, the latest released version of the UCD can be found here:

  http://www.unicode.org/Public/UCD/latest/

Then, build under fs/unicode/ with REGENERATE_UTF8DATA=1:

  make REGENERATE_UTF8DATA=1 fs/unicode/

After sanity checking the newly generated utf8data.h file (the version generated from the 12.1.0 UCD should be 4,109 lines long, and have a total size of 324k) and/or comparing it with the older version of utf8data.h_shipped, rename it to utf8data.h_shipped.

If you are a kernel developer updating to a newer version of the Unicode Character Database, please update this README.utf8data file with the version of the UCD that was used and the md5sums and sha1sums of the *.txt files, before checking in the new versions of the utf8data.h and README.utf8data files.
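
The line-count sanity check mentioned above could be scripted along the following lines. This is only a sketch: check_line_count is a hypothetical helper name, not something that exists in the kernel tree, and it checks the line count only, not the file size.

```shell
#!/bin/sh
# Hypothetical helper (not part of the kernel tree): succeed only if FILE
# has exactly EXPECTED lines.
check_line_count() {
    file=$1
    expected=$2
    # Read from stdin so that wc prints only the number, not the file name.
    actual=$(wc -l < "$file") || return 1
    if [ "$actual" -eq "$expected" ]; then
        echo "OK: $file has $actual lines"
    else
        echo "MISMATCH: $file has $actual lines, expected $expected" >&2
        return 1
    fi
}
```

After regenerating from the 12.1.0 UCD, it might be called from the top of the kernel tree as "check_line_count fs/unicode/utf8data.h 4109". The expected count has to be adjusted for any other UCD version.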