
Why does GoNotoCurrent not render Korean glyphs whereas GoNotoCJKCore does? #39

Open
xplip opened this issue May 9, 2022 · 20 comments
Labels
bug (Something isn't working), workaround available (An alternative way to solve the issue)

Comments

@xplip

xplip commented May 9, 2022

Thank you for providing this great library!
I am currently trying to render text in various languages with the pygame library and it seems that when I am using GoNotoCurrent, I can render Japanese and Chinese glyphs just fine, but Korean glyphs are only rendered as empty boxes. When I am using GoNotoCJKCore, Korean is rendered properly as well, so I am wondering what the main difference between the two is.
I can work around the issue by rendering my text with the Pillow library and the libraqm layout engine, which builds on HarfBuzz, but this is horribly slow, so I'd prefer to keep using pygame and get it to work with GoNotoCurrent. Do you have an idea why rendering Korean might not work in my setup?

@satbyy satbyy added the bug (Something isn't working) label May 9, 2022
@satbyy
Owner

satbyy commented May 9, 2022

Hi Phillip, thanks for the bug report.

The reason is that GoNotoCurrent does not include the "Hangul Syllables" Unicode block (U+AC00 to U+D7AF), whereas GoNotoCJKCore does. This block contains 11,172 precomposed syllables (U+AC00 to U+D7A3) and needs at least as many glyphs. However, GoNotoCurrent is already at ~61000 glyphs, and the maximum is 64K (65,535, a limit imposed by the spec). Hence there is not enough "glyph space" to include all of the Hangul syllables, so there is really not much that can be done.
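A quick back-of-the-envelope check of these numbers, as a Python sketch. It assumes one glyph per precomposed syllable and uses the approximate ~61,000 figure quoted above rather than a measured glyph count; `MAX_GLYPHS` reflects the 16-bit `numGlyphs` field of the `maxp` table.

```python
# Back-of-the-envelope glyph budget for adding the full Hangul Syllables block.
# Assumption: one glyph per precomposed syllable; 61,000 is the approximate
# GoNotoCurrent glyph count quoted in the thread, not a measured value.

MAX_GLYPHS = 0xFFFF  # 'maxp' numGlyphs is a uint16, so at most 65,535 glyphs

# Assigned precomposed syllables run from U+AC00 to U+D7A3
# (19 leading consonants x 21 vowels x 28 trailing options, incl. "none").
HANGUL_FIRST, HANGUL_LAST = 0xAC00, 0xD7A3
hangul_syllables = HANGUL_LAST - HANGUL_FIRST + 1
assert hangul_syllables == 19 * 21 * 28

current_glyphs = 61_000  # approximate, from the comment above

total = current_glyphs + hangul_syllables
print(f"Hangul syllables: {hangul_syllables}")
print(f"Projected glyphs: {total} (limit {MAX_GLYPHS})")
print("fits" if total <= MAX_GLYPHS else f"over by {total - MAX_GLYPHS}")
```

With these inputs the projection is 72,172 glyphs, 6,637 over the limit, which is why the full block cannot simply be appended.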

One option is to find a smaller subset (say ~2500 glyphs) of the 11K codepoints and include them in GoNotoCurrent, still honouring the 64K limit. Obviously this leaves out a large chunk of the Korean repertoire, so it is of little practical utility.

@dscorbett
Collaborator

Many precomposed syllables are not actually used in Korean. You could use KS X 1001’s list of 2,350 common Hangul syllables.

@satbyy
Owner

satbyy commented May 9, 2022

@dscorbett I gave it a try on my local machine (KS X 1001 subset), but now we hit the 65,535-byte length limit of the cmap format 4 subtable. Such subsetting fragments the "Hangul Syllables" block (U+AC00 to U+D7AF) into many segments: the subset TTF's format 4 subtable is about 13,000 bytes long, whereas GoNotoCurrent's is already at 64,706, so the total (77,666) exceeds 65,535.
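To see why a sparse subset makes 'cmap' larger rather than smaller, here is a rough Python sketch that counts format 4 segments. Simplifying assumption (not necessarily how fontTools compiles the table): each run of consecutive codepoints mapped to consecutive glyph IDs becomes one segment encoded via idDelta, so no glyphIdArray entries are needed. The every-fifth-syllable range is a hypothetical stand-in for a scattered list like KS X 1001, not the real list.

```python
# Rough estimate of cmap format 4 fragmentation for a codepoint subset.
# Assumption: one idDelta segment per run of consecutive codepoints (with
# consecutive glyph IDs), so glyphIdArray stays empty.

def count_segments(codepoints):
    """Count runs of consecutive codepoints, i.e. format 4 segments
    before the mandatory final 0xFFFF segment."""
    cps = sorted(set(codepoints))
    if not cps:
        return 0
    segments = 1
    for prev, cur in zip(cps, cps[1:]):
        if cur != prev + 1:
            segments += 1
    return segments

def format4_length(seg_count):
    """Byte length of a format 4 subtable with only idDelta segments:
    a 14-byte header, 2 bytes of reservedPad, and four uint16 arrays of
    seg_count entries each (endCode, startCode, idDelta, idRangeOffset)."""
    return 14 + 2 + 4 * 2 * seg_count

full_block = range(0xAC00, 0xD7A4)   # all 11,172 assigned syllables
sparse = range(0xAC00, 0xD7A4, 5)    # hypothetical scattered subset

for name, cps in [("full block", full_block), ("sparse subset", sparse)]:
    segs = count_segments(cps) + 1   # +1 for the final 0xFFFF segment
    print(f"{name}: {segs} segments, ~{format4_length(segs)} bytes")
```

Under these assumptions the full block costs a single segment, while the every-fifth-syllable subset needs 2,236 segments (about 17,904 bytes), the same order of magnitude as the ~13,000 bytes reported above. Since the subtable's length field is a uint16, every segment eats into the same 65,535-byte budget.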

@satbyy
Owner

satbyy commented May 9, 2022

Or maybe I'm looking at this the wrong way. Attached below is the rendering of Korean wikipedia homepage, using GoNotoCurrent.ttf. It seems that the initial + final components are not combined/stacked correctly. Am I dropping some tables unknowingly?

(Image "korean-wiki": rendering of the Korean Wikipedia homepage using GoNotoCurrent.ttf)

@dscorbett
Collaborator

I forgot about 'cmap' fragmentation. I guess that idea won’t work.

The syllables are exploded because the lookups that join them together are not applied when the language system is Korean. I’m not sure why.

@xplip
Author

xplip commented May 10, 2022

Thanks a lot for the explanations and taking a stab at it already! I wasn’t aware Korean relied so heavily on the precomposed syllables. If the glyph limit is reached then I suppose there is not so much that can be done.

I think for my personal use case, having Korean in the font is more important than the Math, Music, and Symbol fonts, though. I quickly tried to rebuild the GoNotoCurrent font without those four (NotoSansSymbols-Regular.ttf, NotoSansSymbols2-Regular.ttf, NotoSansMath-Regular.ttf, NotoMusic-Regular.ttf) by removing them from categories.sh, and by passing this file https://raw.githubusercontent.com/sozysozbot/korean_hanja_sound/master/KSX1001.txt to pyftsubset via the --unicodes-file flag in create_korean_hangul_subset().

Out came a font file that seems to render my Korean sample texts fine. The command otfinfo -g GoNotoCurrent.ttf | wc -l returns 65251, so it looks like it didn't go over the glyph limit. I'm not really confident any of this was the correct approach, though, so I would appreciate it a lot if you could double-check this :)
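If otfinfo is not at hand, the glyph count can also be read straight from the `maxp` table. The stdlib-only Python sketch below parses the sfnt table directory; it assumes a single font file (not a .ttc collection), and its result should normally agree with `otfinfo -g ... | wc -l` when every glyph has a name.

```python
import struct

def num_glyphs_from_bytes(data):
    """Return maxp.numGlyphs from the raw bytes of a TrueType/OpenType font."""
    # sfnt header: sfntVersion (uint32), numTables (uint16), then 3 more uint16s
    (num_tables,) = struct.unpack_from(">H", data, 4)
    # Table records start at byte 12; each is 16 bytes: tag, checksum, offset, length
    for i in range(num_tables):
        tag, _checksum, offset, _length = struct.unpack_from(
            ">4sIII", data, 12 + 16 * i
        )
        if tag == b"maxp":
            # maxp starts with version (4 bytes), followed by numGlyphs (uint16)
            (count,) = struct.unpack_from(">H", data, offset + 4)
            return count
    raise ValueError("no 'maxp' table found")

def num_glyphs(path):
    """Read a single-font file from disk and return its glyph count."""
    with open(path, "rb") as f:
        return num_glyphs_from_bytes(f.read())
```

For example, `num_glyphs("GoNotoCurrent.ttf")` should report a value no greater than 65,535 for any valid build.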

@satbyy
Owner

satbyy commented May 10, 2022

@xplip Yes, that is a good approach and that's all there is to it. Enjoy your new font!

@satbyy satbyy added the workaround available (An alternative way to solve the issue) label May 10, 2022
@rxsto

rxsto commented Feb 26, 2023

Hey there, sorry for bringing this topic up again. I originally thought I could just follow the steps proposed by @xplip and generate a GoNotoCurrent file with increased support for Korean Hangul syllables. However, when I run temporal_fonts.sh after editing categories.sh (to remove the symbols, math and music fonts) and injecting KSX1001.txt via the unicodes file flag in helper.sh at line 254, the process crashes at seemingly random points.

The stacktrace is as follows:

Traceback (most recent call last):
  File "/mnt/c/Users/oskar/Storage/code/projects/hydra/go-noto-universal/venv_fonty/bin/pyftmerge", line 8, in <module>
    sys.exit(main())
  File "/mnt/c/Users/oskar/Storage/code/projects/hydra/go-noto-universal/venv_fonty/lib/python3.10/site-packages/fontTools/misc/loggingTools.py", line 372, in wrapper
    return func(*args, **kwds)
  File "/mnt/c/Users/oskar/Storage/code/projects/hydra/go-noto-universal/venv_fonty/lib/python3.10/site-packages/fontTools/merge/__init__.py", line 201, in main
    font.save(outfile)
  File "/mnt/c/Users/oskar/Storage/code/projects/hydra/go-noto-universal/venv_fonty/lib/python3.10/site-packages/fontTools/ttLib/ttFont.py", line 185, in save
    writer_reordersTables = self._save(tmp)
  File "/mnt/c/Users/oskar/Storage/code/projects/hydra/go-noto-universal/venv_fonty/lib/python3.10/site-packages/fontTools/ttLib/ttFont.py", line 225, in _save
    self._writeTable(tag, writer, done, tableCache)
  File "/mnt/c/Users/oskar/Storage/code/projects/hydra/go-noto-universal/venv_fonty/lib/python3.10/site-packages/fontTools/ttLib/ttFont.py", line 654, in _writeTable
    self._writeTable(masterTable, writer, done, tableCache)
  File "/mnt/c/Users/oskar/Storage/code/projects/hydra/go-noto-universal/venv_fonty/lib/python3.10/site-packages/fontTools/ttLib/ttFont.py", line 654, in _writeTable
    self._writeTable(masterTable, writer, done, tableCache)
  File "/mnt/c/Users/oskar/Storage/code/projects/hydra/go-noto-universal/venv_fonty/lib/python3.10/site-packages/fontTools/ttLib/ttFont.py", line 658, in _writeTable
    tabledata = self.getTableData(tag)
  File "/mnt/c/Users/oskar/Storage/code/projects/hydra/go-noto-universal/venv_fonty/lib/python3.10/site-packages/fontTools/ttLib/ttFont.py", line 680, in getTableData
    return self.tables[tag].compile(self)
  File "/mnt/c/Users/oskar/Storage/code/projects/hydra/go-noto-universal/venv_fonty/lib/python3.10/site-packages/fontTools/ttLib/tables/_g_l_y_f.py", line 132, in compile
    glyphData = glyph.compile(self, recalcBBoxes)
  File "/mnt/c/Users/oskar/Storage/code/projects/hydra/go-noto-universal/venv_fonty/lib/python3.10/site-packages/fontTools/ttLib/tables/_g_l_y_f.py", line 673, in compile
    data = data + self.compileComponents(glyfTable)
  File "/mnt/c/Users/oskar/Storage/code/projects/hydra/go-noto-universal/venv_fonty/lib/python3.10/site-packages/fontTools/ttLib/tables/_g_l_y_f.py", line 903, in compileComponents
    data = data + compo.compile(more, haveInstructions, glyfTable)
  File "/mnt/c/Users/oskar/Storage/code/projects/hydra/go-noto-universal/venv_fonty/lib/python3.10/site-packages/fontTools/ttLib/tables/_g_l_y_f.py", line 1469, in compile
    return struct.pack(">HH", flags, glyphID) + data
struct.error: 'H' format requires 0 <= number <= 65535

From what I can tell this exception gets thrown whilst trying to merge the base font files into the big single font file.

Since I am unfortunately pretty new to this field, I am quite clueless about how to fix this issue. The last log lines before the exception are different every time, so there's nothing that would help with debugging. My first guess was that there might somehow be too many glyphs to fit into the font file. Confusingly, the exception still occurred even after removing more fonts from categories.sh.

I am running temporal_fonts.sh on WSL2 (Ubuntu 22.04), and I think the problem could be related to that, since the crashes are so inconsistent.

Any help or hint on how to get this working would be greatly appreciated! Thanks so much for the awesome work :)

@rubiomiguel06

I am facing the same issue as @rxsto. I am running the script on macOS Ventura, and after following the steps proposed by @xplip, I get the exact same stacktrace. Were you able to fix it, @rxsto?

@rubiomiguel06

For the record:

I have managed to fix the issue I was facing. Basically, there were more glyphs than the spec allows (64K, i.e. 65,535), hence the error struct.error: 'H' format requires 0 <= number <= 65535.
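The error message comes from Python's `struct` module: as the traceback shows, fontTools packs each composite-glyph component's flags and glyph index as two big-endian unsigned 16-bit values (`">HH"`), so a glyph ID of 65,536 or more cannot be written. A minimal reproduction, independent of fontTools; the helper name `pack_component_header` is made up for illustration:

```python
import struct

def pack_component_header(flags, glyph_id):
    """Pack the flags and glyph index that start a composite-glyph component
    record. Both are uint16, using the same '>HH' format as in the traceback."""
    return struct.pack(">HH", flags, glyph_id)

pack_component_header(0, 65_535)  # largest value a uint16 field can hold: OK

try:
    # What a merged font with too many glyphs ends up attempting
    pack_component_header(0, 65_536)
except struct.error as exc:
    print("struct.error:", exc)
```

The exact wording of the message varies between Python versions, but the exception type is the same.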

@xplip's explanation is good, but to make it clearer and easier, I would change the following line:

... and with this file https://raw.githubusercontent.com/sozysozbot/korean_hanja_sound/master/KSX1001.txt passed to pyftsubset via the --unicodes-file flag in create_korean_hangul_subset().

for:

In helper.sh, inside the method create_korean_hangul_subset() add the following codepoints:

codepoints+="U+AC00-D7A3," # Hangul syllables

That way all the Hangul syllables are added to the Korean subset font, and (with the four fonts still removed, as in @xplip's approach) the glyph count limit is respected.
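To double-check what `U+AC00-D7A3` covers, the Python sketch below expands a comma-separated codepoint string of that shape and checks it against the Unicode character database via `unicodedata`. The parser is a hypothetical helper that only handles the `U+XXXX` and `U+XXXX-YYYY` forms seen here, not everything pyftsubset's `--unicodes` accepts.

```python
import unicodedata

def _hex(s):
    """Parse 'U+AC00' or 'AC00' as a hexadecimal codepoint."""
    s = s.strip()
    if s.upper().startswith("U+"):
        s = s[2:]
    return int(s, 16)

def parse_unicodes(spec):
    """Expand a string like 'U+0041,U+AC00-D7A3,' into a set of codepoints
    (hypothetical helper, limited to single values and ranges)."""
    result = set()
    for item in spec.split(","):
        if not item.strip():
            continue  # tolerate the trailing comma used in helper.sh
        start, _, end = item.partition("-")
        first = _hex(start)
        last = _hex(end) if end else first
        result.update(range(first, last + 1))
    return result

codepoints = "U+AC00-D7A3,"  # the range added in create_korean_hangul_subset()
cps = parse_unicodes(codepoints)

# Every codepoint in the range should be an assigned precomposed syllable...
all_syllables = all(
    unicodedata.name(chr(cp), "").startswith("HANGUL SYLLABLE") for cp in cps
)
# ...and the rest of the block (U+D7A4 to U+D7AF) is unassigned.
rest_unassigned = all(
    unicodedata.name(chr(cp), None) is None for cp in range(0xD7A4, 0xD7B0)
)
print(len(cps), all_syllables, rest_unassigned)
```

This only confirms the syllable block is complete; Hangul Jamo (U+1100 to U+11FF) and Hangul Compatibility Jamo (U+3130 to U+318F) live in other blocks and are not covered by this range.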

I hope I am not skipping any important glyphs for Korean. All my tests were successful, so I don't think so.

@stephen0z

stephen0z commented May 28, 2023

AFAIK, open-source font projects, especially large fonts with many glyphs, often release their fonts as two files.

Take "Hanazono fonts" as example:
https://osdn.net/projects/hanazono-font/

They release their font Hanazono in 2 files:
HanaMinA.ttf
HanaMinB.ttf

HanaMinA.ttf contains the more commonly used CJK glyphs, and HanaMinB.ttf contains the less commonly used ones.

Most systems nowadays (Windows, *nix, Android) can be set up to use them as a pair.

Two files with up to 65,535 glyphs each should be enough for daily use.
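The A/B idea can be sketched as a simple greedy split in Python: sources are listed from most to least commonly used, and each goes into file A if it still fits under the per-file glyph limit, otherwise into file B. The source names and glyph counts are invented placeholders, and the sketch ignores glyphs shared between sources (such as `.notdef`), which a real build would have to handle in each file.

```python
MAX_GLYPHS = 65_535  # per-file limit from the 16-bit numGlyphs field

def split_by_priority(sources, budget=MAX_GLYPHS):
    """Split (name, glyph_count) pairs, ordered from most to least commonly
    used, into two files. Each source goes into A if it still fits there,
    otherwise into B."""
    a, b = [], []
    used_a = used_b = 0
    for name, count in sources:
        if used_a + count <= budget:
            a.append(name)
            used_a += count
        elif used_b + count <= budget:
            b.append(name)
            used_b += count
        else:
            raise ValueError(f"{name} fits in neither file")
    return (used_a, a), (used_b, b)

# Invented placeholder counts, purely to show the mechanics.
sources = [("common_1", 30_000), ("common_2", 25_000),
           ("rare_1", 20_000), ("rare_2", 8_000)]
file_a, file_b = split_by_priority(sources)
print("A:", file_a)
print("B:", file_b)
```

Note that a later, smaller source (`rare_2` here) can still land in A; a stricter split that keeps A purely for common scripts would stop filling A after the first overflow.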

@satbyy
Owner

satbyy commented May 28, 2023

@stephen1864 Thanks, that is a good idea to create two "A" and "B" fonts, one with Korean glyphs and one without them. I could work on it in the coming days or weeks.

@user6905

I also ran into the issue of missing Korean glyphs.
But as I'm not very experienced with font creation (what must be in, what not), I could not follow all the discussion here.

I think xplip's workaround is the one I need (I can easily skip the Math, Music, and Symbol fonts, but I need Korean), but currently I have no idea how to create the font correctly.
Might it be an idea to provide that font too (or send it to me some other way)? That would help me very much.

On the other hand, splitting GoNotoCurrent into A and B fonts may help, if the A font is similar to GoNotoCurrent with Korean.

As I'd like to embed the font in a PDF, I don't think using a pair of fonts would work. I need a single TTF font.

@stephen0z

@user6905
For embedding in a PDF, it is best to include only what is needed, otherwise the PDF will grow extremely large. If you don't need the Math, Music, and Symbol fonts but do need complete Korean, you may go directly to Noto, the source of this project, and choose a suitable font:

https://fonts.google.com/noto/fonts?noto.lang=ko_Kore&noto.continent=Asia&noto.script=Kore

@user6905

@stephen: That does not help in my case.
It must be as universal as possible, because it is for international use, but it is limited to technical communication. For this reason I can skip the Math, Music, and Symbol fonts. Korean, however, I know is used, so a single Noto font makes no sense. I'm looking for a better replacement for Unifont, and GoNotoCurrent is perfect (much better than Unifont) if it supports Korean.
Generally, size is not that big a problem in a PDF: the font can be subsetted when embedded, and because of some pictures, the font is not the only reason the PDF gets a bit bigger.
So I really need a universal font like GoNotoCurrent with Korean glyphs.

@rubiomiguel06

rubiomiguel06 commented Jul 27, 2023

@user6905 here's the font I created back when I participated in this thread. Feel free to use it and test it in your specific scenario.
GoNotoCJKCore.zip

I don't remember the details of what IS and what IS NOT included. But you can check by yourself.

@user6905

user6905 commented Jul 27, 2023

Thank you Miguel.
Meanwhile I installed Ubuntu and was able to use your fix.

  • with adding
    codepoints+="U+AC00-D7A3," # Hangul syllables
  • and removing
    symbols, math and music fonts

from GoNotoCurrent, I built my own font based on it.

@satbyy: Could you consider including that font in your collection? I think it could be helpful for others too.

@satbyy: BTW, is GoNoto... a permissible name for the fonts? Under the OFL, I thought derivative fonts must not use Reserved Font Names (RFNs), and Noto is a trademark of Google.

@satbyy
Copy link
Owner

satbyy commented Jul 31, 2023

@user6905 and all, can you please download the font from the CI pipeline? Now there are two variants:

  • Go Noto Kurrent (with a K, for Korean) with full Hangul syllables but removes symbols/emoji/math.
  • Go Noto Current (existing as it is) with poor Korean support but includes symbols/emoji/math.

If you are satisfied, I will close this issue and make a new release.

@user6905

Generally the script works well on Ubuntu, and the created font includes the Korean glyphs; I can confirm that. Thanks a lot.
I don't have a full test suite, but everything looks fine for several Asian languages.

I only wonder why GoNotoCurrent-Regular.ttf from your zip file is only 14,669,722 bytes, while mine is 15,485,612 bytes with 64,623 glyphs.
I have not tested the font from the zip so far.

@evilaliv3

Amazing, thank you @satbyy and @xplip!

We are using this recipe, and specifically the Kurrent font, within the @globaleaks project together with the FPDF2 library.

This makes it possible for us to print PDFs that can render text coming from any international user!
