Sunday, January 11, 2015

My Solution to Decrypting a Ciphertext from Allan Poe's The Gold-Bug

The following ciphertext came from Edgar Allan Poe's The Gold-Bug.

53‡‡†305))6*;4826)4‡.)4‡);806*;48†8
¶60))85;;]8*;:‡*8†83(88)5*†;46(;88*96
*?;8)*‡(;485);5*†2:*‡(;4956*2(5*—4)8
¶8*;4069285);)6†8)4‡‡;1(‡9;48081;8:8‡
1;48†85;4)485†528806*81(‡9;48;(88;4
(‡?34;48)4‡;161;:188;‡?;

It was generated using a simple substitution algorithm. It is nothing new, but I tried solving it without looking at a known solution.

To decrypt this message, try using the relative frequencies of the letters. The relative frequencies of the letters are as follows (letter frequency): 

8 0.167
; 0.132
4 0.093
) 0.078
‡ 0.074
* 0.069
5 0.059
6 0.054
( 0.044
† 0.039
1 0.034
0 0.029
2 0.025
9 0.025
3 0.020
: 0.020
? 0.015
¶ 0.010
. 0.005
— 0.005
] 0.005

The most frequently occurring letter in English is e. Therefore, assume that one of
8, ;, and 4 
stands for e.

ee is among the most common repeated letters. Therefore, try using the counts of
88, ;;, and 44: 

88 5
;; 1
44 0

Given the counts, assume
8 stands for e. 

Replace
8 by e, 
and the text now looks as follows:

53‡‡†305))6*the26)h‡.)h‡)te06*the†e
¶60))e5tt]e*t:‡*e†e3(ee)5*†th6(tee*96
*?te)*‡(the5)t5*†2:*‡(th956*2(5*—h)e
¶e*th0692e5)t)6†e)h‡‡t1(‡9the0e1te:e‡
1the†e5th)he5†52ee06*e1(‡9thet(eeth
(‡?3hthe)h‡t161t:1eet‡?t

The second most frequently occurring letter in English is t. Therefore, assume that one of
; and 4 
stands for t.

The most frequently occurring trigram in English is "the." Therefore, try using the counts of trigrams starting with
; or 4 
and ending with e:

;4e 7
;ee 1
;(e 1
;]e 1
4)e 1

Given the counts, assume
; stands for t and 4 stands for h. 

Replace
; by t and by h, 
and the text now looks as follows:

53‡‡†305))6*;h826)h‡.)h‡);806*;h8†8
¶60))85;;]8*;:‡*8†83(88)5*†;h6(;88*96
*?;8)*‡(;h85);5*†2:*‡(;h956*2(5*—h)8
¶8*;h069285);)6†8)h‡‡;1(‡9;h8081;8:8‡
1;h8†85;h)h85†528806*81(‡9;h8;(88;h
(‡?3h;h8)h‡;161;:188;‡?;

Based on English letter frequencies,
), ‡, *, 5, and 6 
are likely to stand for a, o, i, n, s, or r (sorted by frequency). Moreover, excluding th, he, and er, the most frequently occurring digrams in English are in and an, both of which end with n. Therefore, try finding the letter standing for n. To do so, try using the counts of digrams containing only
), ‡, *, 5, and 6: 

6* 5
5* 3
5) 3
)) 2
*‡ 2
‡‡ 2
)6 2
...(each of the rest has only one count) 

Given the counts and letter frequencies, assume
* stands for n, 6 stands for i, and 5 stands for a. 

Replace
* by n, 6 by i, and 5 by a, 
and the text now looks as follows:

a3‡‡†30a))inthe2i)h‡.)h‡)te0inthe†e
¶i0))eatt]ent:‡ne†e3(ee)an†thi(teen9i
n?te)n‡(thea)tan†2:n‡(th9ain2(an—h)e
¶enth0i92ea)t)i†e)h‡‡t1(‡9the0e1te:e‡
1the†eath)hea†a2ee0ine1(‡9thet(eeth
(‡?3hthe)h‡t1i1t:1eet‡?t

Excluding the, tha, and ent, the most frequently occurring trigram in English is "and." Therefore, try using the counts of trigrams starting with an to find the letter standing for d:

an† 2
an— 1

d's relative frequency in English text is approximately 0.043. Therefore, assume
† stands for d. 

Replace
† by d, 
and the text now looks as follows:

a3‡‡d30a))inthe2i)h‡.)h‡)te0inthede
¶i0))eatt]ent:‡nede3(ee)andthi(teen9i
n?te)n‡(thea)tand2:n‡(th9ain2(an—h)e
¶enth0i92ea)t)ide)h‡‡t1(‡9the0e1te:e‡
1thedeath)heada2ee0ine1(‡9thet(eeth
(‡?3hthe)h‡t1i1t:1eet‡?t

Go back to the list of
), ‡, *, 5, and 6 
and a, o, i, n, s, and r. Remove
*, 5, and 6 
and a, i, and n.
) and ‡ 
are likely to stand for o, s, or r. Among o, s, and r, o is often seen in pairs, and so is s, while r is not. Therefore, try using the counts of
)) and ‡‡: 

)) 2
‡‡ 2

Given the counts, assume that none of
) and ‡ 
stands for r.

Look at the first time when
)) or ‡‡ 
occurs:

a3‡‡d30a... 

Assume the first a is an article. Given
‡‡d, 
assume
‡ stands for o 
instead of s.

Replace
‡ by o and ) by s, 
and the text now looks as follows (with guessed spaces):

a 3ood 30ass in the 2isho.shoste0 in the de
¶i0sseatt]ent:onede3(ees and thi(teen9i
n?tesno(th east and 2:no(th9ain2(an—hse
¶enth0i92 east side shoot 1(o9the0e1te:eo
1 the deaths head a2ee0ine1(o9thet(eeth
(o?3h the shot 1i1t:1eeto?t

Look at
...thi(teen9i... 
Assume
thi(teen 
stands for thirteen - assume
( stands for r. 

Replace
( by r, 
and the text now looks as follows (with guessed spaces):

a 3ood 30ass in the 2isho.shoste0 in the de
¶i0sseatt]ent:onede3rees and thirteen 9i
n?tes northeast and 2: north 9ain2ran—hse
¶enth0i92 east side shoot 1ro9the0e1te:eo
1 the deaths head a2ee0ine1ro9thetreeth
ro?3h the shot 1i1t:1eeto?t

The most frequently occurring digram in English whose letters have not both be found is ou. Try using the counts of digrams starting with o:

o9 2
o? 2
o1 1
o. 1

Assume one of
9 or ? 
stands for u. Look at
...9ain2ran—hse... 
It is highly unlikely that
9 stands for u. 
Therefore, assume
? stands for u. 

Replace
? by u, 
and the text now looks as follows(with guessed spaces):

a 3ood 30ass in the 2isho.shoste0 in the de
¶i0sseatt]ent:onede3rees and thirteen 9inutes
northeast and 2: north 9ain2ran—hse
¶enth0i92 east side shoot 1ro9 the 0e1te:eo
1 the deaths head a2ee0ine1ro9 the
tree throu3h the shot 1i1t:1eetout

Look at
...throu3h... 
Assume
throu3h 
stands for through - assume
3 stands for g. 

Replace
3 by g, 
and the text now looks as follows (with guessed spaces):

a good g0ass in the 2isho.shoste0 in the de
¶i0sseatt]ent:one degrees and thirteen 9inutes
northeast and 2: north 9ain2ran—hse
¶enth0i92 east side shoot 1ro9the0e1te:eo1
the deaths head a2ee0ine1ro9
the tree through the shot 1i1t:1eetout

Look at
...9inutes... 
Assume
9inutes
stands for minutes - assume
9 stands for m. 

Replace
9 by m, 
and the text now looks as follows (with guessed spaces):

a good g0ass in the 2isho.shoste0 in the
de¶i0s seat t]ent: one degrees and thirteen minutes
northeast and 2: north main2ran—hse
¶enth0im2 east side shoot 1romthe0e1te:eo1
the deaths head a2ee0ine1rom
the tree through the shot 1i1t:1eetout

Look at
...one degrees... 
Based on grammar, assume some part of
t]ent: one
stands for a number. Assume it stands for twenty-one (- not included in the text) - assume
] stands for w and : stands for y. 

Replace
] by w and : by y, 
and the text now looks as follows (with guessed spaces):

a good g0ass in the 2isho.shoste0 in the
de¶i0s seat twentyone degrees and thirteen minutes
northeast and 2y northmain2ran—hse
¶enth0im2 east side shoot 1rom the 0e1t eye o1
the deaths head a2ee0ine 1rom
the tree through the shot 1i1ty1eetout

Assume
1rom 
stands for from - assume
1 stands for f. 
Replace
1 by f, 
and the text now looks as follows (with guessed spaces):

a good g0ass in the 2isho.shoste0 in the
de¶i0s seat twentyone degrees and thirteen minutes
northeast and 2y north main2ran—hse
¶enth0im2 east side shoot from the 0eft eye of
the deaths head a2ee0ine from
the tree through the shot fifty feet out

Assume
0eft 
stands for left - assume
0 stands for l. 
Replace
0 by l, 
and the text now looks as follows (with guessed spaces):

a good glass in the 2isho.shostel in the
de¶ils seat twentyone degrees and thirteen minutes
northeast and 2y north main 2ran—hse
¶enthlim2 east side shoot from the left eye of
the deaths head a2eeline from
the tree through the shot fifty feet out

Assume
de¶ils 
stands for devils - assume
¶ stands for v. 
Replace
¶ by v, 
and the text now looks as follows (with guessed spaces):

a good glass in the 2isho.shostel in the
devils seat twentyone degrees and thirteen minutes
northeast and 2y north main 2ran—h seventh
lim2 east side shoot from the left eye of
the deaths head a2eeline from
the tree through the shot fifty feet out

Given that there is "eye" in the text,
lim2 
is likely to stand for limb. Therefore, assume
2 stands for b. 

Replace
2 by b, 
and the text now looks as follows (with guessed spaces):

a good glass in the bisho.s hostel in the
devils seat twentyone degrees and thirteen minutes
northeast and by north main bran—h seventh
limb east side shoot from the left eye of
the deaths head a bee line from
the tree through the shot fifty feet out

Assume
bisho.s stands for bishops 
and
bran—h stands for branch - 
assume
. stands for p and — stands for c. 

Replace
. by p and — by c, 
and we obtain the decoded message:

A good glass in the bishop's hostel in the devil's seat twenty-one degrees and thirteen minutes northeast and by north main branch seventh limb east side shoot from the left eye of the death's head a bee line from the tree through the shot fifty feet out. 

4 comments:

  1. 62218))6*;4826)4‡‡‡.)4ööô);806*;48†8¶60))85;;]8*;:/*8†ññ3 [88)5*;46([;88*96*?;8)*[;485);5*2:*;4956*2[5*)88* ;4069285);)68)4;1[9;48081;8:81;48†85;4)485528806*81[9;48;[88;4[?34;48)4;161;:188;?;

    how to decrypt it?

    ReplyDelete
  2. Very good information. It will be helpful for too many people that are looking for this topic. Keep doing awesome rock blog.

    Best PTE institute in ambala
    Study visa consultants in ambala,
    Best IELTS Institute in Ambala

    ReplyDelete