country.sys: codepage 667/790/991 (Mazovia) support #52

Open
mateuszviste opened this issue Feb 26, 2024 · 11 comments
Labels
enhancement New feature or request

Comments

@mateuszviste

The EDR country.sys supports Poland NLS under codepage 852. This mimics what MS-DOS did back in the day, but it wasn't a popular choice in Poland since Poles did not wait for Microsoft and created their own codepage named Mazovia that was far superior to the CP852 proposition.

FreeDOS (and SvarDOS) uses Mazovia for LANG=PL, so perhaps it would be possible to port this from their country.sys into the EDR country.sys? Alternatively, #51 would make this subject a non-issue.

@boeckmann added the enhancement label Feb 26, 2024

@boeckmann
Collaborator

As a short to medium term solution I will port this from the FreeDOS country.sys.

@mateuszviste
Author

I'm personally interested only in Mazovia, but there are probably other cases where the FreeDOS country.sys is better or more complete than the EDR ersatz. I wonder how difficult it would be to translate the FreeDOS country.sys into an EDR-format country.sys, either by parsing the compiled FreeDOS country.sys or by mangling its source code so that it outputs something EDR can understand?

@boeckmann
Collaborator

It would be awesome if country.sys could be generated in multiple formats from a database-like "thing". That would be an interesting project in its own right. All the information should be there already.

Regarding cp 667/790/991, can you show me the commands you use to configure your DOS environment? I am currently confused, because I have not found a reference to cp 667/790/991 in the FreeDOS country.sys source at https://github.com/FDOS/country/blob/master/country.asm.

@mateuszviste
Author

> Regarding cp 667/790/991, can you show me the commands you use to configure your DOS environment? I am currently confused, because I have not found a reference to cp 667/790/991 in FreeDOS country.sys

Ha, you got me. I checked it on the FreeDOS PC that is configured with LANG=PL and I only now realize that my configuration is dysfunctional. Until now I used this:

=== CONFIG.SYS ===
COUNTRY=048,,C:\FREEDOS\BIN\COUNTRY.SYS

=== AUTOEXEC.BAT ===
DISPLAY CON=(EGA,,1)
MODE CON CP PREP=((991) C:\FREEDOS\CPI\EGA10.CPX)
MODE CON CP SEL=991
KEYB PL

It boots all right, but on closer examination I see that sorting and up-casing are definitely not right. I guess it must default to either CP437 or CP852. I did not notice this sooner because this configuration happens to work for some of the Polish-specific glyphs, and since I'm not a huge fan of 8-bit characters in filenames I never hit the cases that do not work.

So the question would be: is there any reason to include in EDR something that does not exist in other DOSes? It's nothing really important, proof being that I used a malfunctioning configuration for decades without realizing it...

If you'd be keen on adding support for Mazovia, I'd be happy to prepare the necessary upcase & sorting tables.

@boeckmann
Collaborator

> It's nothing really important

It is a useful addition that does not cause much work, so we will add it :-) If you could help by providing the tables, that would be great. I will happily add this to the EDR country.sys. Do you plan to make these tables "by hand" or extract them from somewhere?

When that is done, there probably should also be an issue opened at the FreeDOS country repo.

@mateuszviste
Author

Cool, give me some time. I will make these tables by hand because I do not know of any source that already has them. Will be back soon. :)

@mateuszviste
Author

I worked on the tables today. Here are my results. Feel free to skip the boring parts. :)

Mazovia (CP 667 / CP 790) is based on CP437 and patches only a couple of bytes:

86h = ą
8Dh = ć
8Fh = Ą
90h = Ę
91h = ę
92h = ł
95h = Ć
98h = Ś
9Ch = Ł
9Eh = ś
A0h = Ź
A1h = Ż
A3h = Ó
A4h = ń
A5h = Ń
A6h = ź
A7h = ż

The "Mazovia-zł" variant (known as CP 991 in FreeDOS) redefines one extra character:
9Bh = zł (instead of the original "US cent" symbol)
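
A minimal Python sketch of the byte assignments listed above, for anyone who wants to inspect Mazovia-encoded text on a modern system. It assumes Python's built-in `cp437` codec as the base; the names `MAZOVIA_PATCH`, `build_decode_table` and `decode_mazovia` are made up for this illustration, and the "zł" glyph of the CP 991 variant is rendered as the two-character string "zł" because it is a single glyph in the DOS font.

```python
# Mazovia (CP 667 / CP 790) expressed as a patch over CP437.
# Byte values are taken from the list in this comment.
MAZOVIA_PATCH = {
    0x86: "ą", 0x8D: "ć", 0x8F: "Ą", 0x90: "Ę", 0x91: "ę", 0x92: "ł",
    0x95: "Ć", 0x98: "Ś", 0x9C: "Ł", 0x9E: "ś", 0xA0: "Ź", 0xA1: "Ż",
    0xA3: "Ó", 0xA4: "ń", 0xA5: "Ń", 0xA6: "ź", 0xA7: "ż",
}


def build_decode_table(zl_variant=False):
    """Return a 256-entry list mapping each byte to its text form."""
    # Start from CP437 (Python's built-in codec), then apply the patches.
    table = [bytes([b]).decode("cp437") for b in range(256)]
    for byte, char in MAZOVIA_PATCH.items():
        table[byte] = char
    if zl_variant:
        # "Mazovia-zł" (CP 991 in FreeDOS) replaces the cent sign at 9Bh.
        table[0x9B] = "zł"
    return table


def decode_mazovia(data, zl_variant=False):
    """Decode a bytes object that holds Mazovia-encoded text."""
    table = build_decode_table(zl_variant)
    return "".join(table[b] for b in data)


# 9Ch A2h 64h A6h spells the city name "Łódź" (A2h is ó, unchanged from CP437).
print(decode_mazovia(bytes([0x9C, 0xA2, 0x64, 0xA6])))
```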

This variant does not require any change to upcasing or sorting: the tables are identical to those for CP 667/790.

Regarding the COLLATE table, these are the differences from CP437 (ą, Ą, Ę, ń and Ń are already collated properly in CP437):

8Dh = ć -> collate as 'c' (67)
91h = ę -> collate as 'e' (69)
92h = ł -> collate as 'l' (76)
95h = Ć -> collate as 'c' (67)
98h = Ś -> collate as 's' (83)
9Ch = Ł -> collate as 'l' (76)
9Eh = ś -> collate as 's' (83)
A0h = Ź -> collate as 'z' (90)
A1h = Ż -> collate as 'z' (90)
A3h = Ó -> collate as 'o' (79)
A6h = ź -> collate as 'z' (90)
A7h = ż -> collate as 'z' (90)
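
To illustrate what these collate entries do, here is a small Python sketch that uses such a table as a sort key. It is not the real CP437 base table: the base here is a simplified assumption (identity mapping with a-z folded onto A-Z, plus the five letters noted above as already correct), and the names `build_collate_table` and `collate_key` are hypothetical. Each accented letter collates with its unaccented base letter, as the list describes.

```python
def build_collate_table():
    """Build a simplified 256-entry collate table (illustrative only)."""
    # Assumed base: identity, with lower-case ASCII folded to upper-case.
    table = list(range(256))
    for code in range(ord("a"), ord("z") + 1):
        table[code] = code - 32
    # Letters described as already collated properly in CP437.
    already_ok = {0x86: "A", 0x8F: "A", 0x90: "E", 0xA4: "N", 0xA5: "N"}
    # Mazovia-specific differences from the list above.
    mazovia = {
        0x8D: "C", 0x91: "E", 0x92: "L", 0x95: "C", 0x98: "S", 0x9C: "L",
        0x9E: "S", 0xA0: "Z", 0xA1: "Z", 0xA3: "O", 0xA6: "Z", 0xA7: "Z",
    }
    for byte, base in {**already_ok, **mazovia}.items():
        table[byte] = ord(base)
    return table


def collate_key(raw, table):
    """Map every byte through the table; the result is the sort key."""
    return bytes(table[b] for b in raw)


table = build_collate_table()
words = [
    b"zupa",
    bytes([0x9E]) + b"ruba",   # "śruba" in Mazovia
    b"tama",
    bytes([0x92]) + b"awka",   # "ławka" in Mazovia
    b"mama",
]
# A plain byte sort would push the accented words to the end of the list;
# the collate key places them next to L and S instead.
for word in sorted(words, key=lambda w: collate_key(w, table)):
    print(word)
```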

So the full COLLATE table is this (based on CP437 from FreeDOS COUNTRY.SYS):

pl_collate_maz db 0FFh,"COLLATE"
       dw 256
db   0,   1,   2,   3,   4,   5,   6,   7
db   8,   9,  10,  11,  12,  13,  14,  15
db  16,  17,  18,  19,  20,  21,  22,  23
db  24,  25,  26,  27,  28,  29,  30,  31
db  32,  33,  34,  35,  36,  37,  38,  39
db  40,  41,  42,  43,  44,  45,  46,  47
db  48,  49,  50,  51,  52,  53,  54,  55
db  56,  57,  58,  59,  60,  61,  62,  63
db  64,  65,  66,  67,  68,  69,  70,  71
db  72,  73,  74,  75,  76,  77,  78,  79
db  80,  81,  82,  83,  84,  85,  86,  87
db  88,  89,  90,  91,  92,  93,  94,  95
db  96,  65,  66,  67,  68,  69,  70,  71
db  72,  73,  74,  75,  76,  77,  78,  79
db  80,  81,  82,  83,  84,  85,  86,  87
db  88,  89,  90, 123, 124, 125, 126, 127
db  67,  85,  69,  65,  65,  65,  65,  67
db  69,  69,  69,  73,  73,  67,  65,  65
db  69,  69,  76,  79,  79,  67,  85,  85
db  83,  79,  85,  36,  76,  36,  83,  36
db  90,  90,  79,  79,  78,  78,  90,  90
db  63, 169, 170, 171, 172,  33,  34,  34
db 176, 177, 178, 179, 180, 181, 182, 183
db 184, 185, 186, 187, 188, 189, 190, 191
db 192, 193, 194, 195, 196, 197, 198, 199
db 200, 201, 202, 203, 204, 205, 206, 207
db 208, 209, 210, 211, 212, 213, 214, 215
db 216, 217, 218, 219, 220, 221, 222, 223
db 224,  83, 226, 227, 228, 229, 230, 231
db 232, 233, 234, 235, 236, 237, 238, 239
db 240, 241, 242, 243, 244, 245, 246, 247
db 248, 249, 250, 251, 252, 253, 254, 255

Now, UPCASING:

86h = ą -> 143 // already good in CP437
8Dh = ć -> 149
8Fh = Ą -> 143 // already good in CP437
90h = Ę -> 144 // already good in CP437
91h = ę -> 144
92h = ł -> 156
95h = Ć -> 149
98h = Ś -> 152
9Ch = Ł -> 156 // already good in CP437
9Eh = ś -> 152
A0h = Ź -> 160
A1h = Ż -> 161
A3h = Ó -> 163
A4h = ń -> 165 // already good in CP437
A5h = Ń -> 165 // already good in CP437
A6h = ź -> 160
A7h = ż -> 161

So the UPCASE table for Mazovia would look like this:

ucase_maz db 0FFh,"UCASE  "
  dw 128
db 128, 154,  69,  65, 142,  65, 143, 128
db  69,  69,  69,  73,  73, 149, 142, 143
db 144, 144, 156,  79, 153, 149,  85,  85
db 152, 153, 154, 155, 156, 157, 152, 159
db 160, 161,  79, 163, 165, 165, 160, 161
db 168, 169, 170, 171, 172, 173, 174, 175
db 176, 177, 178, 179, 180, 181, 182, 183
db 184, 185, 186, 187, 188, 189, 190, 191
db 192, 193, 194, 195, 196, 197, 198, 199
db 200, 201, 202, 203, 204, 205, 206, 207
db 208, 209, 210, 211, 212, 213, 214, 215
db 216, 217, 218, 219, 220, 221, 222, 223
db 224, 225, 226, 227, 228, 229, 230, 231
db 232, 233, 234, 235, 236, 237, 238, 239
db 240, 241, 242, 243, 244, 245, 246, 247
db 248, 249, 250, 251, 252, 253, 254, 255
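
For a quick sanity check of the table above outside of DOS, the following Python sketch applies it to a Mazovia-encoded byte string. The first five rows are copied from the table; the remaining rows (168 to 255) map each byte to itself, so they are generated. The handling of ASCII a-z is an assumption of this sketch (the UCASE table only covers bytes 80h to FFh), and the names `UCASE_HIGH` and `mazovia_upper` are hypothetical.

```python
# Upper half (80h..FFh) of the Mazovia upcase table shown above.
UCASE_HIGH = [
    128, 154,  69,  65, 142,  65, 143, 128,   # 80h..87h
     69,  69,  69,  73,  73, 149, 142, 143,   # 88h..8Fh
    144, 144, 156,  79, 153, 149,  85,  85,   # 90h..97h
    152, 153, 154, 155, 156, 157, 152, 159,   # 98h..9Fh
    160, 161,  79, 163, 165, 165, 160, 161,   # A0h..A7h
] + list(range(168, 256))                     # A8h..FFh map to themselves


def mazovia_upper(raw):
    """Upper-case a Mazovia-encoded bytes object."""
    out = bytearray()
    for b in raw:
        if b >= 0x80:
            out.append(UCASE_HIGH[b - 0x80])  # table lookup for high bytes
        elif ord("a") <= b <= ord("z"):
            out.append(b - 0x20)              # plain ASCII folding (assumed)
        else:
            out.append(b)
    return bytes(out)


# "łąka" is 92h 86h 6Bh 61h; upper-cased it becomes 9Ch 8Fh 4Bh 41h ("ŁĄKA").
print(mazovia_upper(bytes([0x92, 0x86, 0x6B, 0x61])).hex())
```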

Is this usable for you? Let me know if you need it in another format.

@boeckmann
Collaborator

Looks good :-) I'll include it this afternoon.

@boeckmann
Collaborator

I am almost done with it. Question: can I substitute zl for zł as the currency symbol in codepage 437?

@mateuszviste
Author

mateuszviste commented Feb 28, 2024

I'm not sure I understand the question. Is this about "what currency system should return when configured with COUNTRY=048,437,COUNTRY.SYS" ?

If yes, then instead of "zl" I would rather suggest either "PLZ" (if we want to keep living in the 90s) or "PLN".

The point is that I've never seen "zl" before, so it looks very strange, similar to seeing "€" replaced by "E" (although that one I have seen already).

@boeckmann
Collaborator

Yes, it is about the currency symbol. According to Wikipedia, PLN is the currency code, not the symbol. But I will happily follow your judgement here and set PLN as currency for 437.
