Encryption for non alphabetic languages
Moderator: Alyrium Denryle
- mr friendly guy
- The Doctor
- Posts: 11235
- Joined: 2004-12-12 10:55pm
- Location: In a 1960s police telephone box somewhere in Australia
Encryption for non alphabetic languages
I subscribe to SciShow, and they had a nice video on encryption techniques. However, these applied to English and, by extension, to languages which have alphabets.
https://www.youtube.com/watch?v=-yFZGF8FHSg
How would one encrypt a language that doesn't have an alphabet, for example Chinese?
Never apologise for being a geek, because they won't apologise to you for being an arsehole. John Barrowman - 22 June 2014 Perth Supernova.
Countries I have been to - 14.
Australia, Canada, China, Colombia, Denmark, Ecuador, Finland, Germany, Malaysia, Netherlands, Norway, Singapore, Sweden, USA.
Always on the lookout for more nice places to visit.
Re: Encryption for non alphabetic languages
Just off the top of my head, without having viewed the link, encipher (frinstance) the Unicode representation - as far as encipherment's concerned, it's a string of bytes. Unicode imposes a mapping on top of that, so as long as it round trips - same bytes come out after decipherment that were fed into encipherment - afaik, you're good.
Of course, I could be talking utter bollocks - someone who knows more, please correct me.
A mad person thinks there's a gateway to hell in his basement. A mad genius builds one and turns it on. - CaptainChewbacca
Re: Encryption for non alphabetic languages
fnord wrote:Just off the top of my head, without having viewed the link, encipher (frinstance) the Unicode representation - as far as encipherment's concerned, it's a string of bytes. Unicode imposes a mapping on top of that, so as long as it round trips - same bytes come out after decipherment that were fed into encipherment - afaik, you're good.
Of course, I could be talking utter bollocks - someone who knows more, please correct me.
You have the right of it. To a computer everything is bits, and because of this encryption doesn't really care what the data going in means. You literally just do some transformative steps on the incoming string of bits, and then the other end reverses them with the decryption key.
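To make the round trip concrete, here is a minimal sketch in Python. It assumes UTF-8 as the byte encoding and uses a plain XOR with a random key as a stand-in for a real cipher; the message, the helper name xor_bytes and the choice of XOR are illustrative assumptions, not something taken from the video.

```python
import secrets

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each data byte with the key byte at the same position."""
    if len(key) < len(data):
        raise ValueError("key must be at least as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))

message = "你好，世界"                         # example plaintext (assumption)
plain_bytes = message.encode("utf-8")          # Unicode text -> plain bytes
key = secrets.token_bytes(len(plain_bytes))    # random key, same length as the data

cipher_bytes = xor_bytes(plain_bytes, key)     # encipher: the cipher only sees bytes
round_trip = xor_bytes(cipher_bytes, key).decode("utf-8")  # decipher, then bytes -> text

print(round_trip == message)
```

The cipher step never looks at what the bytes mean; only the encode and decode calls at either end know the text is Chinese.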
- mr friendly guy
- The Doctor
- Posts: 11235
- Joined: 2004-12-12 10:55pm
- Location: In a 1960s police telephone box somewhere in Australia
Re: Encryption for non alphabetic languages
So you guys are saying you can encrypt non-alphabetic languages with computers? OK, what about before computers? One of the encryption techniques mentioned was developed in the 16th century, and I can't see how that would apply to non-alphabetic languages.
Never apologise for being a geek, because they won't apologise to you for being an arsehole. John Barrowman - 22 June 2014 Perth Supernova.
Countries I have been to - 14.
Australia, Canada, China, Colombia, Denmark, Ecuador, Finland, Germany, Malaysia, Netherlands, Norway, Singapore, Sweden, USA.
Always on the lookout for more nice places to visit.
- Terralthra
- Requiescat in Pace
- Posts: 4741
- Joined: 2007-10-05 09:55pm
- Location: San Francisco, California, United States
Re: Encryption for non alphabetic languages
Ciphers in Chinese and other logographic languages could be encrypted with grille ciphers, secret sharing (cutting a message into vertical strips, thus breaking up sentences), mixing and matching the syllables involved in logographs to include a separate message (steganography, in other words), and a couple other techniques. You're right that alphabet-substitution techniques like Caesar and Vigenere ciphers would not work particularly well.
- mr friendly guy
- The Doctor
- Posts: 11235
- Joined: 2004-12-12 10:55pm
- Location: In a 1960s police telephone box somewhere in Australia
Re: Encryption for non alphabetic languages
Terralthra wrote:Ciphers in Chinese and other logographic languages could be encrypted with grille ciphers, secret sharing (cutting a message into vertical strips, thus breaking up sentences), mixing and matching the syllables involved in logographs to include a separate message (steganography, in other words), and a couple other techniques. You're right that alphabet-substitution techniques like Caesar and Vigenere ciphers would not work particularly well.
Before the advent of computers, how well would these techniques work? For example, if we have two equally good mathematicians, one who only knows English and the other who only knows Chinese, who would find it easier to crack a simple message of an equivalent number of words in their respective language, enciphered with a method like Caesar or Vigenere for English and one of those methods for Chinese?
I know it's going to be hard to answer, but I thought I would try.
Never apologise for being a geek, because they won't apologise to you for being an arsehole. John Barrowman - 22 June 2014 Perth Supernova.
Countries I have been to - 14.
Australia, Canada, China, Colombia, Denmark, Ecuador, Finland, Germany, Malaysia, Netherlands, Norway, Singapore, Sweden, USA.
Always on the lookout for more nice places to visit.
- Terralthra
- Requiescat in Pace
- Posts: 4741
- Joined: 2007-10-05 09:55pm
- Location: San Francisco, California, United States
Re: Encryption for non alphabetic languages
Eh... that is hard to answer. Vigenere polyalphabetic ciphers were effectively unbroken until the age of Charles Babbage and Kasiski, by which time the mathematical underpinnings of cryptanalysis (which would go on to be computer-assisted) began to be known. By the 1850s, a Vigenere with a short key could be broken by hand (and was), but a Vigenere whose key is random, as long as the message and never reused is effectively just another way of saying "one time pad", and is more or less unbreakable (and still is, barring weakness in the random number generator). Caesar ciphers were a solved problem, thanks to frequency analysis, by the 9th Century CE.
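For anyone who hasn't seen frequency analysis in action against a Caesar cipher, here is a rough Python sketch. It assumes lowercase English text and leans on the crude rule that the most common ciphertext letter stands for "e"; the sample sentence and the function names are made up for illustration, and very short messages can easily fool this rule.

```python
from collections import Counter
import string

def caesar(text: str, shift: int) -> str:
    """Shift each lowercase a-z letter by `shift`; leave other characters alone."""
    out = []
    for ch in text:
        if ch in string.ascii_lowercase:
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        else:
            out.append(ch)
    return "".join(out)

def guess_shift(ciphertext: str) -> int:
    """Guess the shift by assuming the most frequent letter stands for 'e'."""
    counts = Counter(ch for ch in ciphertext if ch in string.ascii_lowercase)
    most_common_letter = counts.most_common(1)[0][0]
    return (ord(most_common_letter) - ord("e")) % 26

plaintext = "meet me at the secret entrance here when the bell tolls seven"
ciphertext = caesar(plaintext, 3)                          # encipher with a shift of 3
recovered = caesar(ciphertext, -guess_shift(ciphertext))   # attacker never sees the key

print(recovered)
```

With 26 letters the skew in letter counts gives the game away quickly; with thousands of distinct characters the same idea needs far more ciphertext before the counts mean anything.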
Grille ciphers and steganography rely as much on the encryption being undetected as the actual message involved. If I give you a large piece of paper with a bunch of Chinese characters, and the characters are written in multiple colors, there are effectively an arbitrarily large number of messages hidden in it, based solely on the holes in the paper and what color filter is in those holes. How easy is that to solve? I dunno. Depends a lot on what outside information you have on what sort of knowledge you're seeking. Secret sharing likewise relies a lot on the words not being assembleable without outside knowledge - if you have all n strips and some basic idea what the message is, I can't imagine it being too hard. Steganography, likewise, if you know there's a message hidden, it's not hard to find.
Re: Encryption for non alphabetic languages
Terralthra wrote:Ciphers in Chinese and other logographic languages could be encrypted with grille ciphers, secret sharing (cutting a message into vertical strips, thus breaking up sentences), mixing and matching the syllables involved in logographs to include a separate message (steganography, in other words), and a couple other techniques. You're right that alphabet-substitution techniques like Caesar and Vigenere ciphers would not work particularly well.
Not quite mentioned, but before the advent of computers you could still find codebooks that translated your plaintext into another form. Sometimes they were secret, and sometimes not (commercial codes typically weren't). The usual output of a codebook would be in an alphabet of sorts. A common use for these would be telegraphy. In fact, there's a standard Chinese telegraph code, which maps characters to 4-digit numbers. From there, you could apply your standard encryption algorithms. Note: this is now equivalent to taking the Unicode code points of the characters and encrypting those, but that's because Unicode is a non-secret codebook.
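A rough sketch of that two-step idea in Python: look each character up in a public codebook to get 4 digits, then add secret key digits modulo 10. The four codebook entries and the key below are invented for illustration; they are NOT the real Chinese telegraph code values.

```python
# Made-up codebook for illustration only; NOT the real telegraph code values.
CODEBOOK = {"今": "0001", "晚": "0002", "出": "0003", "发": "0004"}
REVERSE = {code: ch for ch, code in CODEBOOK.items()}

def to_digits(text):
    """Public step: replace each character with its 4-digit code, as single digits."""
    return [int(d) for ch in text for d in CODEBOOK[ch]]

def from_digits(digits):
    """Public step reversed: regroup digits in fours and look the characters up."""
    s = "".join(str(d) for d in digits)
    return "".join(REVERSE[s[i:i + 4]] for i in range(0, len(s), 4))

def add_key(digits, key, sign=1):
    """Secret step: add (sign=1) or subtract (sign=-1) repeating key digits, mod 10."""
    return [(d + sign * key[i % len(key)]) % 10 for i, d in enumerate(digits)]

key = [3, 1, 4, 1, 5, 9, 2, 6]          # hypothetical secret key digits
plain = "今晚出发"

cipher_digits = add_key(to_digits(plain), key)
decoded = from_digits(add_key(cipher_digits, key, sign=-1))

print(cipher_digits)
print(decoded == plain)
```

Only the key has to stay secret; the codebook can be printed and sold, which is the same role Unicode plays on a computer.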
"preemptive killing of cops might not be such a bad idea from a personal saftey[sic] standpoint..." --Keevan Colton
"There's a word for bias you can't see: Yours." -- William Saletan
- Ziggy Stardust
- Sith Devotee
- Posts: 3114
- Joined: 2006-09-10 10:16pm
- Location: Research Triangle, NC
Re: Encryption for non alphabetic languages
Terralthra wrote:Ciphers in Chinese and other logographic languages could be encrypted with grille ciphers, secret sharing (cutting a message into vertical strips, thus breaking up sentences), mixing and matching the syllables involved in logographs to include a separate message (steganography, in other words), and a couple other techniques. You're right that alphabet-substitution techniques like Caesar and Vigenere ciphers would not work particularly well.
Not to be pedantic, but technically, in Chinese specifically, wouldn't secret sharing entail cutting the message into HORIZONTAL strips, since they traditionally wrote vertically?
- Terralthra
- Requiescat in Pace
- Posts: 4741
- Joined: 2007-10-05 09:55pm
- Location: San Francisco, California, United States
Re: Encryption for non alphabetic languages
Terralthra wrote:Ciphers in Chinese and other logographic languages could be encrypted with grille ciphers, secret sharing (cutting a message into vertical strips, thus breaking up sentences), mixing and matching the syllables involved in logographs to include a separate message (steganography, in other words), and a couple other techniques. You're right that alphabet-substitution techniques like Caesar and Vigenere ciphers would not work particularly well.
Ziggy Stardust wrote:Not to be pedantic, but technically in Chinese specifically wouldn't secret sharing entail cutting the message into HORIZONTAL strips, since they traditionally wrote vertically?
Yes, it would, and yes, it's pedantic.
- Sea Skimmer
- Yankee Capitalist Air Pirate
- Posts: 37390
- Joined: 2002-07-03 11:49pm
- Location: Passchendaele City, HAB
Re: Encryption for non alphabetic languages
Digital information is a 1 or a 0. What a human reads as a script is irrelevant to the encryption method; code in your operating system handles the conversion from the 1/0 crap to a language. The point is how you scramble the 1/0 stuff.
Now if you're talking about older pre-computer material then yeah, it can get annoying, but really all a language like Mandarin Chinese means is that your code book will be much thicker for any given method vs English. Any number of strategies will work (against a non-computerized enemy) to provide a useful cypher. Once computers are involved the language really doesn't matter; the complexity of possible cypher methods is far greater than that of the languages.
"This cult of special forces is as sensible as to form a Royal Corps of Tree Climbers and say that no soldier who does not wear its green hat with a bunch of oak leaves stuck in it should be expected to climb a tree"
— Field Marshal William Slim 1956
- Zixinus
- Emperor's Hand
- Posts: 6663
- Joined: 2007-06-19 12:48pm
- Location: In Seth the Blitzspear
- Contact:
Re: Encryption for non alphabetic languages
Computers have to "alphabetize" non-alphabetic characters like Chinese characters; they have to in order to render them at all. To a computer, they are just characters. Encryption, whether digital or not, should actually be relatively easier because you have more raw variety of information to jumble around (which is roughly what encryption is). It means messages would be bigger, but that's already a given with such writing systems.
Credo!
Chat with me on Skype if you want to talk about writing, ideas or if you want a test-reader! PM for address.
- Sea Skimmer
- Yankee Capitalist Air Pirate
- Posts: 37390
- Joined: 2002-07-03 11:49pm
- Location: Passchendaele City, HAB
Re: Encryption for non alphabetic languages
The message size is kinda irrelevant; good encryption methods always employed lots of padding so that the enemy cannot infer the message's meaning from its length or format, or easily exploit partly broken codes. The classic human-processed example of how to do that is to attach a bunch of names from the phone book to each end of the original text, easily ignored once decrypted. Prior to fully computerized systems, though, one's ability to use padding was more constrained, because for important communications encryption/decryption time begins to matter, say Morse radio communications between admirals at sea during operations. Errors also become a problem.
If you're working by hand or simple machine, Chinese etc. style characters are going to be a pain in the ass to work with as a practical matter, which will increase the probability of errors. Or of code operators doing things they shouldn't, like using the same code book page each day to make life easier. Japan was incredibly bad at this in WW2. The codes themselves were pretty good, but operator discipline was very poor, particularly on civilian ships. Amusingly, though, they were also pretty bad at precise navigation, so many US decrypts of ship positions proved to be useless to US submarines, because the ship was that wrong about where it was!
Once you go digital computer, problems like this are much lessened, but not eliminated.
"This cult of special forces is as sensible as to form a Royal Corps of Tree Climbers and say that no soldier who does not wear its green hat with a bunch of oak leaves stuck in it should be expected to climb a tree"
— Field Marshal William Slim 1956
Re: Encryption for non alphabetic languages
You could just make a list of characters that assigns a numeric code to each character, and then freely distribute that list, and just encrypt the sequence of numeric codes that make up a message. I would assume that making a list of all relevant characters would be a pain, but probably worth it for secure communications.
Wow, 4000 characters. The list is going to be a small book on its own.
I'm a cis-het white male, and I oppose racism, sexism, homophobia, and transphobia. I support treating all humans equally.
When fascism came to America, it was wrapped in the flag and carrying a cross.
That which will not bend must break and that which can be destroyed by truth should never be spared its demise.
Re: Encryption for non alphabetic languages
Zeropoint wrote:You could just make a list of characters that assigns a numeric code to each character, and then freely distribute that list, and just encrypt the sequence of numeric codes that make up a message. I would assume that making a list of all relevant characters would be a pain, but probably worth it for secure communications.
Wow, 4000 characters. The list is going to be a small book on its own.
Chinese telegraphic code? 7000 characters in a 100 page book. It's organized similarly to Chinese dictionaries.
"preemptive killing of cops might not be such a bad idea from a personal saftey[sic] standpoint..." --Keevan Colton
"There's a word for bias you can't see: Yours." -- William Saletan
Re: Encryption for non alphabetic languages
There was also the 'shared codebook' route, which would work for non-alphabetic languages. Use a famous book of poetry, or a treatise that wouldn't seem out of place in anyone's home or office. The code is a series of numbers that refers to the page and word/character on that page, or perhaps a whole phrase, which then has hidden meanings of its own.
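As a small illustration of the page-and-position idea, here is a Python sketch. The two "pages" are just the lines of a well-known Li Bai poem standing in for a shared book, and the function names and the 1-based (page, position) format are assumptions made for the example.

```python
# Stand-in for the shared book: each string plays the role of one page.
PAGES = [
    "床前明月光疑是地上霜",
    "举头望明月低头思故乡",
]

def encode(message, pages):
    """Replace each character with the 1-based (page, position) of its first occurrence."""
    refs = []
    for ch in message:
        for page_no, page in enumerate(pages, start=1):
            pos = page.find(ch)
            if pos != -1:
                refs.append((page_no, pos + 1))
                break
        else:
            # No page contained the character, so it cannot be sent with this book.
            raise ValueError(f"{ch!r} does not appear in the shared book")
    return refs

def decode(refs, pages):
    """Look each (page, position) pair back up in the same book."""
    return "".join(pages[page_no - 1][pos - 1] for page_no, pos in refs)

message = "明月思故乡"
refs = encode(message, PAGES)

print(refs)
print(decode(refs, PAGES) == message)
```

Nothing here depends on the script being alphabetic; the only requirement is that every character you want to send appears somewhere in the book.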
Nitram, slightly high on cough syrup: Do you know you're beautiful?
Me: Nope, that's why I have you around to tell me.
Nitram: You -are- beautiful. Anyone tries to tell you otherwise kill them.
"A life is like a garden. Perfect moments can be had, but not preserved, except in memory. LLAP" -- Leonard Nimoy, last Tweet
- U.P. Cinnabar
- Sith Marauder
- Posts: 3943
- Joined: 2016-02-05 08:11pm
- Location: Aboard the RCS Princess Cecile
Re: Encryption for non alphabetic languages
LadyTevar wrote:There was also the 'shared codebook' route, which would work for non-alphabetic languages. Use a famous book of poetry, or a treatise that wouldn't seem out of place in anyone's home or office. The code is a series of numbers that refers to the page and word/character on that page, or perhaps a whole phrase, which then has hidden meanings of its own.
"To Serve Man" is an excellent cookbook.
"Beware the Beast, Man, for he is the Devil's pawn. Alone amongst God's primates, he kills for sport, for lust, for greed. Yea, he will murder his brother to possess his brother's land. Let him not breed in great numbers, for he will make a desert of his home and yours. Shun him, drive him back into his jungle lair, for he is the harbinger of Death.."
—29th Scroll, 6th Verse of Ape Law
"Indelible in the hippocampus is the laughter. The uproarious laughter between the two, and their having fun at my expense.”
---Doctor Christine Blasey-Ford
Re: Encryption for non alphabetic languages
From a purely analytical standpoint, as was already said, generally as long as a language can be encoded into purely numeric sequences, then you can encrypt it, even prior to the advent of computers and formalized methods of representing text in binary sequences.
Once that encoding method is standardized, then encrypting any language would basically be the same.
So, for the example of the Vigenere algorithm, you'd simply need a method to encode any language into numbers, after which you can apply the same encryption algorithm.
Like Sea Skimmer said, this becomes increasingly impractical when fast encryption and decryption is required, because the additional time overhead performing the encoding and decoding (which is in addition to the encryption/decryption) could become a hindrance during military operations of any era. So, preferably, the encoding would have to be simple.
For example, it would be possible to simplify the language-to-numerals encoding by skipping written characters and assigning numeric values to spoken syllables (even taking into account intonation) for Chinese. However there is a chance of misinterpretation; with the recipient using the context of entire words, phrases and/or sentences, this is unlikely, but still possible.
I suppose the Chinese telegraphic code would be a lossless, but more time consuming (if done without computers) method of encoding?
For the English alphabet, the encoding method for letters is obvious and simple: replace each letter with its position number in the alphabet.
With the simple encoding out of the way, the actual encryption is the Vigenere algorithm itself (adding each plaintext character's sequence number to the number of the character at the corresponding position in the key text, wrapping around).
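Spelled out as a short Python sketch, with letters numbered from 0 (A=0 ... Z=25) rather than from 1 so the wrap-around is a plain modulo; the function names are illustrative, and the message and key are the usual textbook example.

```python
def letters_to_numbers(text):
    """Encoding step: map A..Z to 0..25 (position in the alphabet, counted from 0)."""
    return [ord(c) - ord("A") for c in text]

def numbers_to_letters(nums):
    """Decoding step: map 0..25 back to A..Z."""
    return "".join(chr(n + ord("A")) for n in nums)

def vigenere(nums, key, modulus, decrypt=False):
    """Add (or subtract) the repeating key numbers, wrapping around at `modulus`."""
    sign = -1 if decrypt else 1
    return [(n + sign * key[i % len(key)]) % modulus for i, n in enumerate(nums)]

plain = letters_to_numbers("ATTACKATDAWN")
key = letters_to_numbers("LEMON")

cipher = vigenere(plain, key, 26)
cipher_text = numbers_to_letters(cipher)
recovered = numbers_to_letters(vigenere(cipher, key, 26, decrypt=True))

print(cipher_text)   # LXFOPVEFRNHR
print(recovered)     # ATTACKATDAWN
```

The vigenere function only ever sees numbers and a modulus, so swapping in a bigger numbering scheme (say, 4-digit codebook entries with a modulus of 10000) changes the encoding step and nothing else.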
LadyTevar wrote:There was also the 'shared codebook' route, which would work for non-alphabetic languages. Use a famous book of poetry, or a treatise that wouldn't seem out of place in anyone's home or office. The code is a series of numbers that refers to the page and word/character on that page, or perhaps a whole phrase, which then has hidden meanings of its own.
Another good thing is that, as long as the language-to-numbers encoding method for each language is already established (either being well-known, or simply understood by both parties), the plaintext and the codebook don't even have to be in the same language (going as far as to use languages that don't even share the same written characters at all).
After all, once encoded, both plaintext and key text are both just sequences of numbers.
I imagine that in a real situation, this might add just a little bit more obfuscation to counter-intelligence and codebreaking efforts, particularly if one of the language encoding methods is not a commonly known accepted standard.
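A quick Python sketch of that mixed-language case: a Chinese plaintext with an English key text, both turned into numbers via their Unicode code points. Using code points as the shared numbering, the modulus of 0x110000 (the size of the Unicode code space), and the sample strings are all assumptions for illustration; any agreed numbering would do. The ciphertext is kept as a list of numbers rather than turned back into characters.

```python
MODULUS = 0x110000  # size of the Unicode code point space (assumed shared numbering)

def encode(text):
    """Language-to-numbers step: one Unicode code point per character."""
    return [ord(ch) for ch in text]

def decode(nums):
    """Numbers-to-language step."""
    return "".join(chr(n) for n in nums)

def shift(nums, key, sign):
    """Add (sign=+1) or subtract (sign=-1) the repeating key numbers, wrapping around."""
    return [(n + sign * key[i % len(key)]) % MODULUS for i, n in enumerate(nums)]

plain_text = "今晚出发"      # plaintext in Chinese
key_text = "LEMON"           # key text in English

cipher_nums = shift(encode(plain_text), encode(key_text), +1)
recovered = decode(shift(cipher_nums, encode(key_text), -1))

print(cipher_nums)
print(recovered == plain_text)
```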
"..history has shown the best defense against heavy cavalry are pikemen, so aircraft should mount lances on their noses and fly in tight squares to fend off bombers". - RedImperator
"ha ha, raping puppies is FUN!" - Johonebesus
"It would just be Unicron with pew pew instead of nom nom". - Vendetta, explaining his justified disinterest in the idea of the movie Allspark affecting the Death Star