A version of this information for Russian speakers is further down the page.

Background

I was recently in a position to help someone convert a batch of old Mac Microsoft Word 5.1 documents to a more modern format. The catch was that they contained a lot of Cyrillic text, set in the STROGIJ font, which in turn uses the CYRILSCII encoding. This encoding never seemed to take off outside of the STROGIJ font.

Luckily we still had Mac Microsoft Word 5.1 available on an OS X box that had the OS 9 compatibility bits installed. Saving files with text in this encoding as RTF works fine in OS X if you install the STROGIJ font: you'll be able to see the text in TextEdit, Pages, or whatever. However, typing with the STROGIJ font selected yields Latin letters, and it malfunctions when you try to use Russian input mode. So you'll be able to see your text but not add new text in the same font; you'd need to add new text in a different font or copy and paste the old letters around. Beyond that, you won't be able to copy and paste text in the STROGIJ font into other programs or email, since it is in the unknown CYRILSCII encoding.

So, in an effort to prevent the data from rotting or becoming useless, a method of conversion was required.

Conversion Information

I used Word on the Mac to save the files as RTF, and then developed and ran a Python script on all the RTF files to convert the CYRILSCII character sequences into a proper encoding (CP-1251), so that they would load cleanly into WordPad (PC), Pages (Mac), TextEdit (Mac), etc., using a modern font and encoding.

I've made that Python script available here, with no guarantee that it is bug free or fit for the purpose. It worked fine on the OS X box and the Linux box I ran it from. By default it will write its output to stdout. If you have a bunch of RTF files you want to convert in a batch, do:

$ ./converter.py -p *.rtf

This will convert all the RTF files in place. Back them up first.
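For illustration only, here is a minimal Python sketch of the kind of recoding described in this section. It is not the converter script itself. It assumes the RTF stores each non-ASCII byte as a \'xx hex escape (the usual RTF convention) and that every such escape in the input belongs to STROGIJ text. The names CYRILLIC and recode_escapes are made up for this example. The code points are transcribed from the chart on this page.

```python
import re

# Cyrillic letters of CYRILSCII as Unicode code points, transcribed from the
# chart on this page: bytes 0x80-0xad, then bytes 0xe0-0xfd.
LOW_RUN = """
0410 0430 0411 0431 0412 0432 0413 0433 0490 0491 0414 0434 0415 0435 0401 0451
0404 0454 0416 0436 0417 0437 0418 0438 0406 0456 0407 0457 0419 0439 041a 043a
041b 043b 041c 043c 041d 043d 041e 043e 041f 043f 0420 0440 0421 0441
"""
HIGH_RUN = """
0422 0442 0423 0443 040e 045e 0424 0444 0425 0445 0426 0446 0427 0447 0428 0448
0429 0449 042a 044a 042b 044b 042c 044c 042d 044d 042e 044e 042f 044f
"""


def _table(start, run):
    """Map consecutive byte values, beginning at start, to characters."""
    return {start + i: chr(int(cp, 16)) for i, cp in enumerate(run.split())}


# Hypothetical lookup table: CYRILSCII byte value -> Unicode character.
CYRILLIC = {**_table(0x80, LOW_RUN), **_table(0xE0, HIGH_RUN)}

# Matches an RTF hex escape such as \'a8 (assumed form of non-ASCII bytes).
ESCAPE = re.compile(r"\\'([0-9a-fA-F]{2})")


def recode_escapes(rtf):
    """Rewrite hex escapes holding CYRILSCII letters as CP-1251 escapes."""
    def repl(match):
        ch = CYRILLIC.get(int(match.group(1), 16))
        if ch is None:
            return match.group(0)  # not a Cyrillic letter: leave untouched
        return "\\'%02x" % ch.encode("cp1251")[0]
    return ESCAPE.sub(repl, rtf)
```

A real converter would have more to do than this: restrict the recoding to runs that actually use the STROGIJ font, update the character set and font declarations in the RTF header, and cope with a literal backslash that happens to precede an apostrophe.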

CYRILSCII Chart

For reference, here is the upper half of the CYRILSCII encoding. The lower half is assumed to be identical to the MAC-Roman encoding, but I've not verified that. Note that it's basically in alphabetical order with alternating case, with gaps that are occupied by non-Russian Cyrillic characters and a large portion of the normal MAC-Roman character set.

      0         1         2         3         4         5         6         7         8         9         a         b         c         d         e         f
0x8.  А         а         Б         б         В         в         Г         г         Ґ         ґ         Д         д         Е         е         Ё         ё
      U+0410    U+0430    U+0411    U+0431    U+0412    U+0432    U+0413    U+0433    U+0490    U+0491    U+0414    U+0434    U+0415    U+0435    U+0401    U+0451
0x9.  Є         є         Ж         ж         З         з         И         и         І         і         Ї         ї         Й         й         К         к
      U+0404    U+0454    U+0416    U+0436    U+0417    U+0437    U+0418    U+0438    U+0406    U+0456    U+0407    U+0457    U+0419    U+0439    U+041a    U+043a
0xa.  Л         л         М         м         Н         н         О         о         П         п         Р         р         С         с         « (Æ)     » (Ø)
      U+041b    U+043b    U+041c    U+043c    U+041d    U+043d    U+041e    U+043e    U+041f    U+043f    U+0420    U+0440    U+0421    U+0441    U+00ab    U+00bb
0xb.  - (∞)     ±         ≤         Б (≥)     ¥         µ         ∂         ∑         ∏         π         ∫         ª         º         Ω         æ         ø
      U+002d    U+00b1    U+2264    U+0411    U+00a5    U+00b5    U+2202    U+2211    U+220f    U+03c0    U+222b    U+00aa    U+00ba    U+03a9    U+00e6    U+00f8
0xc.  NBSP (¿)  ¡         ¬         √         ƒ         ≈         Δ         «         »         …         NBSP      À         Ã         Õ         Œ         NBSP (œ)
      U+00a0    U+00a1    U+00ac    U+221a    U+0192    U+2248    U+0394    U+00ab    U+00bb    U+2026    U+00a0    U+00c0    U+00c3    U+00d5    U+0152    U+00a0
0xd.  –         —         “         ”         ‘         ’         ÷         ◊         ÿ         Ÿ         NBSP (⁄)  €         ‹         ›         ﬁ         ﬂ
      U+2013    U+2014    U+201c    U+201d    U+2018    U+2019    U+00f7    U+25ca    U+00ff    U+0178    U+00a0    U+20ac    U+2039    U+203a    U+fb01    U+fb02
0xe.  Т         т         У         у         Ў         ў         Ф         ф         Х         х         Ц         ц         Ч         ч         Ш         ш
      U+0422    U+0442    U+0423    U+0443    U+040e    U+045e    U+0424    U+0444    U+0425    U+0445    U+0426    U+0446    U+0427    U+0447    U+0428    U+0448
0xf.  Щ         щ         Ъ         ъ         Ы         ы         Ь         ь         Э         э         Ю         ю         Я         я         „ (˛)     ' (ˇ)
      U+0429    U+0449    U+042a    U+044a    U+042b    U+044b    U+042c    U+044c    U+042d    U+044d    U+042e    U+044e    U+042f    U+044f    U+201e    U+0027

Characters from 0xae to 0xdf are mostly MAC-Roman standard characters. Cells that differ from their normal MAC-Roman assignment (highlighted in red in the original chart) show the normal character in parentheses. If a cell appears empty, it holds a non-breaking space. For reference, here is a screenshot of the characters in the gaps. 0xb3 actually seems to be a small-statured Б, but I don't think a similar character exists in Unicode.
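The chart can also be turned directly into a lookup table. The following Python sketch decodes raw CYRILSCII bytes into a Unicode string using the code points listed above. The names CHART, UPPER_HALF and decode_cyrilscii are made up for this example. It relies on the unverified assumption, mentioned earlier, that the lower half matches MAC-Roman (which is plain ASCII below 0x80).

```python
# Code points for bytes 0x80-0xff, copied row by row from the chart above.
CHART = """
0410 0430 0411 0431 0412 0432 0413 0433 0490 0491 0414 0434 0415 0435 0401 0451
0404 0454 0416 0436 0417 0437 0418 0438 0406 0456 0407 0457 0419 0439 041a 043a
041b 043b 041c 043c 041d 043d 041e 043e 041f 043f 0420 0440 0421 0441 00ab 00bb
002d 00b1 2264 0411 00a5 00b5 2202 2211 220f 03c0 222b 00aa 00ba 03a9 00e6 00f8
00a0 00a1 00ac 221a 0192 2248 0394 00ab 00bb 2026 00a0 00c0 00c3 00d5 0152 00a0
2013 2014 201c 201d 2018 2019 00f7 25ca 00ff 0178 00a0 20ac 2039 203a fb01 fb02
0422 0442 0423 0443 040e 045e 0424 0444 0425 0445 0426 0446 0427 0447 0428 0448
0429 0449 042a 044a 042b 044b 042c 044c 042d 044d 042e 044e 042f 044f 201e 0027
"""

# Index 0 corresponds to byte 0x80, index 127 to byte 0xff.
UPPER_HALF = [chr(int(cp, 16)) for cp in CHART.split()]
assert len(UPPER_HALF) == 128


def decode_cyrilscii(data):
    """Decode CYRILSCII bytes to a Unicode string."""
    # Assumption: bytes below 0x80 match MAC-Roman, which is ASCII there.
    return "".join(chr(b) if b < 0x80 else UPPER_HALF[b - 0x80] for b in data)
```

For example, decode_cyrilscii(b"\xa8\xab\x97\x85\x8d\xe1") gives "Привет". Note that not every character in the 0xb. to 0xd. rows exists in CP-1251, so encoding the decoded text to CP-1251 can fail for those; the Cyrillic letters themselves all have CP-1251 equivalents.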

For Russian speakers

If anything on this page is written incorrectly, please forgive me: I have learned few technical words, and I am still studying Russian. If you have old documents in "Macintosh Microsoft Word 5.1" format that contain Cyrillic text in the "strogij" font, and you can convert the documents to "RTF" format, then you can convert their Cyrillic text to a modern font that works on PC and Mac. You need to download the script file.

After you have downloaded the script file (python), put the files you want to convert into an empty folder (and, by the way, please make a backup), and put the script file there as well. Then type at the command line (in the new folder):

$ ./converter.py -p *.rtf

I have used it successfully myself on PC (linux) and Mac (OS X). There is no need to run the script on the computer on which you created the "RTF" files.