The CS/DSP Nest of Kuei-Ti Lu: cryptography

The following ciphertext came from Edgar Allan Poe's The Gold-Bug.

53‡‡†305))6*;4826)4‡.)4‡);806*;48†8
¶60))85;;]8*;:‡*8†83(88)5*†;46(;88*96
*?;8)*‡(;485);5*†2:*‡(;4956*2(5*—4)8
¶8*;4069285);)6†8)4‡‡;1(‡9;48081;8:8‡
1;48†85;4)485†528806*81(‡9;48;(88;4
(‡?34;48)4‡;161;:188;‡?;

It was generated using a simple substitution algorithm. It is nothing new, but I tried solving it without looking at a known solution.

To decrypt this message, try using the relative frequencies of the letters. The relative frequencies of the letters are as follows (letter frequency):

The most frequently occurring letter in English is e. Therefore, assume that one of

8, ;, and 4

stands for e.

ee is among the most common repeated letters. Therefore, try using the counts of

88, ;;, and 44:

88 5
;; 1
44 0

Given the counts, assume

8 stands for e.

Replace

8 by e,

and the text now looks as follows:

53‡‡†305))6*the26)h‡.)h‡)te06*the†e
¶60))e5tt]e*t:‡*e†e3(ee)5*†th6(tee*96
*?te)*‡(the5)t5*†2:*‡(th956*2(5*—h)e
¶e*th0692e5)t)6†e)h‡‡t1(‡9the0e1te:e‡
1the†e5th)he5†52ee06*e1(‡9thet(eeth
(‡?3hthe)h‡t161t:1eet‡?t

The second most frequently occurring letter in English is t. Therefore, assume that one of

; and 4

stands for t.

The most frequently occurring trigram in English is "the." Therefore, try using the counts of trigrams starting with

; or 4

and ending with e:

;4e 7
;ee 1
;(e 1
;]e 1
4)e 1

Given the counts, assume

; stands for t and 4 stands for h.

Replace

; by t and by h,

and the text now looks as follows:

53‡‡†305))6*;h826)h‡.)h‡);806*;h8†8
¶60))85;;]8*;:‡*8†83(88)5*†;h6(;88*96
*?;8)*‡(;h85);5*†2:*‡(;h956*2(5*—h)8
¶8*;h069285);)6†8)h‡‡;1(‡9;h8081;8:8‡
1;h8†85;h)h85†528806*81(‡9;h8;(88;h
(‡?3h;h8)h‡;161;:188;‡?;

Based on English letter frequencies,

), ‡, *, 5, and 6

are likely to stand for a, o, i, n, s, or r (sorted by frequency). Moreover, excluding th, he, and er, the most frequently occurring digrams in English are in and an, both of which end with n. Therefore, try finding the letter standing for n. To do so, try using the counts of digrams containing only

), ‡, *, 5, and 6:

6* 5
5* 3
5) 3
)) 2
*‡ 2
‡‡ 2
)6 2
...(each of the rest has only one count)

Given the counts and letter frequencies, assume

* stands for n, 6 stands for i, and 5 stands for a.

Replace

* by n, 6 by i, and 5 by a,

and the text now looks as follows:

a3‡‡†30a))inthe2i)h‡.)h‡)te0inthe†e
¶i0))eatt]ent:‡ne†e3(ee)an†thi(teen9i
n?te)n‡(thea)tan†2:n‡(th9ain2(an—h)e
¶enth0i92ea)t)i†e)h‡‡t1(‡9the0e1te:e‡
1the†eath)hea†a2ee0ine1(‡9thet(eeth
(‡?3hthe)h‡t1i1t:1eet‡?t

Excluding the, tha, and ent, the most frequently occurring trigram in English is "and." Therefore, try using the counts of trigrams starting with an to find the letter standing for d:

an† 2
an— 1

d's relative frequency in English text is approximately 0.043. Therefore, assume

† stands for d.

Replace

† by d,

and the text now looks as follows:

a3‡‡d30a))inthe2i)h‡.)h‡)te0inthede
¶i0))eatt]ent:‡nede3(ee)andthi(teen9i
n?te)n‡(thea)tand2:n‡(th9ain2(an—h)e
¶enth0i92ea)t)ide)h‡‡t1(‡9the0e1te:e‡
1thedeath)heada2ee0ine1(‡9thet(eeth
(‡?3hthe)h‡t1i1t:1eet‡?t

Go back to the list of

), ‡, *, 5, and 6

and a, o, i, n, s, and r. Remove

*, 5, and 6

and a, i, and n.

) and ‡

are likely to stand for o, s, or r. Among o, s, and r, o is often seen in pairs, and so is s, while r is not. Therefore, try using the counts of

)) and ‡‡:

)) 2
‡‡ 2

Given the counts, assume that none of

) and ‡

stands for r.

Look at the first time when

)) or ‡‡

occurs:

a3‡‡d30a...

Assume the first a is an article. Given

‡‡d,

assume

‡ stands for o

instead of s.

Replace

‡ by o and ) by s,

and the text now looks as follows (with guessed spaces):

a 3ood 30ass in the 2isho.shoste0 in the de
¶i0sseatt]ent:onede3(ees and thi(teen9i
n?tesno(th east and 2:no(th9ain2(an—hse
¶enth0i92 east side shoot 1(o9the0e1te:eo
1 the deaths head a2ee0ine1(o9thet(eeth
(o?3h the shot 1i1t:1eeto?t

Look at

...thi(teen9i...

Assume

thi(teen

stands for thirteen - assume

( stands for r.

Replace

( by r,

and the text now looks as follows (with guessed spaces):

a 3ood 30ass in the 2isho.shoste0 in the de
¶i0sseatt]ent:onede3rees and thirteen 9i
n?tes northeast and 2: north 9ain2ran—hse
¶enth0i92 east side shoot 1ro9the0e1te:eo
1 the deaths head a2ee0ine1ro9thetreeth
ro?3h the shot 1i1t:1eeto?t

The most frequently occurring digram in English whose letters have not both be found is ou. Try using the counts of digrams starting with o:

o9 2
o? 2
o1 1
o. 1

Assume one of

9 or ?

stands for u. Look at

...9ain2ran—hse...

It is highly unlikely that

9 stands for u.

Therefore, assume

? stands for u.

Replace

? by u,

and the text now looks as follows(with guessed spaces):

a 3ood 30ass in the 2isho.shoste0 in the de
¶i0sseatt]ent:onede3rees and thirteen 9inutes
northeast and 2: north 9ain2ran—hse
¶enth0i92 east side shoot 1ro9 the 0e1te:eo
1 the deaths head a2ee0ine1ro9 the
tree throu3h the shot 1i1t:1eetout

Look at

...throu3h...

Assume

throu3h

stands for through - assume

3 stands for g.

Replace

3 by g,

and the text now looks as follows (with guessed spaces):

a good g0ass in the 2isho.shoste0 in the de
¶i0sseatt]ent:one degrees and thirteen 9inutes
northeast and 2: north 9ain2ran—hse
¶enth0i92 east side shoot 1ro9the0e1te:eo1
the deaths head a2ee0ine1ro9
the tree through the shot 1i1t:1eetout

Look at

...9inutes...

Assume

9inutes

stands for minutes - assume

9 stands for m.

Replace

9 by m,

and the text now looks as follows (with guessed spaces):

a good g0ass in the 2isho.shoste0 in the
de¶i0s seat t]ent: one degrees and thirteen minutes
northeast and 2: north main2ran—hse
¶enth0im2 east side shoot 1romthe0e1te:eo1
the deaths head a2ee0ine1rom
the tree through the shot 1i1t:1eetout

Look at

...one degrees...

Based on grammar, assume some part of

t]ent: one

stands for a number. Assume it stands for twenty-one (- not included in the text) - assume

] stands for w and : stands for y.

Replace

] by w and : by y,

and the text now looks as follows (with guessed spaces):

a good g0ass in the 2isho.shoste0 in the
de¶i0s seat twentyone degrees and thirteen minutes
northeast and 2y northmain2ran—hse
¶enth0im2 east side shoot 1rom the 0e1t eye o1
the deaths head a2ee0ine 1rom
the tree through the shot 1i1ty1eetout

Assume

1rom

stands for from - assume

1 stands for f.

Replace

1 by f,

and the text now looks as follows (with guessed spaces):

a good g0ass in the 2isho.shoste0 in the
de¶i0s seat twentyone degrees and thirteen minutes
northeast and 2y north main2ran—hse
¶enth0im2 east side shoot from the 0eft eye of
the deaths head a2ee0ine from
the tree through the shot fifty feet out

Assume

0eft

stands for left - assume

0 stands for l.

Replace

0 by l,

and the text now looks as follows (with guessed spaces):

a good glass in the 2isho.shostel in the
de¶ils seat twentyone degrees and thirteen minutes
northeast and 2y north main 2ran—hse
¶enthlim2 east side shoot from the left eye of
the deaths head a2eeline from
the tree through the shot fifty feet out

Assume

de¶ils

stands for devils - assume

¶ stands for v.

Replace

¶ by v,

and the text now looks as follows (with guessed spaces):

a good glass in the 2isho.shostel in the
devils seat twentyone degrees and thirteen minutes
northeast and 2y north main 2ran—h seventh
lim2 east side shoot from the left eye of
the deaths head a2eeline from
the tree through the shot fifty feet out

Given that there is "eye" in the text,

lim2

is likely to stand for limb. Therefore, assume

2 stands for b.

Replace

2 by b,

and the text now looks as follows (with guessed spaces):

a good glass in the bisho.s hostel in the
devils seat twentyone degrees and thirteen minutes
northeast and by north main bran—h seventh
limb east side shoot from the left eye of
the deaths head a bee line from
the tree through the shot fifty feet out

Assume

bisho.s stands for bishops

and

bran—h stands for branch -

assume

. stands for p and — stands for c.

Replace

. by p and — by c,

and we obtain the decoded message:

A good glass in the bishop's hostel in the devil's seat twenty-one degrees and thirteen minutes northeast and by north main branch seventh limb east side shoot from the left eye of the death's head a bee line from the tree through the shot fifty feet out.

Course Name:
Cryptography I
Link:
https://www.coursera.org/course/crypto
Institution:
Stanford University
Instructor:
Dan Boneh

-----

Some Personal Experience:

I have taken this course twice. The last time was in 2012, when I was still working toward my Bachelor's degree. I was busy with the schoolwork and had little time working on the course, but I still tried to at least learn the concepts. I did not complete any programming assignment except one and did not take the final. The result is:

The above is the certificate for the session back that time. Although the result met my personal goal at that time, it can clearly be seen that improvement can be made.

As a result, I took the course again. The result looks more brilliant than that in 2012. This time, I completed all programming assignments and took the final. The percentage I got for the course is now 91.7 %, with distinction (the new certificate can be found at https://drive.google.com/folderview?id=0B53jbj-6Ew8QNEtMbDJUbUdfSlk&usp=sharing).

Lectures:

The lectures for the two sessions I have taken are similar. The materials covered are similar. Ciphers (including some stream ciphers and block ciphers) and attacks were introduced. How to and how not to use the ciphers, as well as why, were covered. Some message integrity mechanisms were introduced, which were combined with block ciphers for authenticated encryption. Attacks on message integrity and incorrect use of authenticated encryption mechanisms were covered.

The flow for the ciphers and attacks was quite similar through the course, so it was easy to get used to the lecture flow. Something more out of the flow was the introduction to the number theory that would be used for public key encryption, which certainly could not follow the mechanism-attack flow (because the number theory is not a cipher at all).

The most difficult part, in my opinion, was the math involved, especially the number theory, but math is almost always cute in my opinion, so I did not have any issue with it throughout the course.

As for the professor, his pronunciation was clear, so with the captions, there was no issue for me to get what he said.

There were lecture slides to view online or download, which is convenient for visual learners.

Problem Sets and Final:

The problems sets included both multiple choice and non-multiple-choice problems. Usually, math was used to examine security of certain usages of ciphers. The problems usually required thinking (instead of simply memorizing and pick). However, since they were based on the lectures, it was not difficult to answer them if one understands well the materials covered.

The types of problems in the final was similar to those in the problem sets.

Programming Assignments:

They extended from the lectures. Most were about attacking (virtually) on something that was known broken in terms of security. One could choose the language he/she liked for the assignments. For the number theory programming assignment, some special algorithms were the tasks. The programming assignments, like the problem sets, were not difficult if one understands the materials covered. The instruction sometimes contained useful information. The most tricky part, in my opinion, was familiarizing oneself with the libraries of the language used. (DO NOT implement your own cipher.) However, using the Internet to find the documentation and other informational articles or discussions, one should be able to use those libraries in terms of completing the assignments.

For the number theory programming assignment, I used Java to make use of BigInteger class and encountered one issue, which I solved (I wrote about this in another article: http://csdspnest.blogspot.com/2014/03/bigintegersqrt.html).

Discussion Forums:

As in other MOOCs, discussion forums are often helpful. There were other students sharing useful information such as proving certain probability in the lectures that the professor did not prove in the lectures.

Summary:

The course was not very different from other MOOCs in terms of teaching methods. The video quality and quality of materials were fine.

(I do not recommend this course to anyone who does not have any programming knowledge unless the person does not plan to complete the programming assignments.)

The CS/DSP Nest of Kuei-Ti Lu

Pages

Sunday, January 11, 2015

My Solution to Decrypting a Ciphertext from Allan Poe's The Gold-Bug

Sunday, March 16, 2014

MOOC Review: Cryptography I