Computational Decipherment of Unknown Scripts Open Access
[Abstract] Algorithmic decipherment is a prime example of a truly unsupervised problem. This thesis presents several algorithms developed for the purpose of decrypting unknown alphabetic scripts representing unknown languages. We assume that symbols in scripts which contain no more than a few dozen unique characters roughly correspond to the phonemes of a language, and model such scripts as monoalphabetic substitution ciphers. We further allow that an unknown transposition scheme could have been applied to the enciphered text, resulting in arbitrary scrambling of letters within words (anagramming). We also consider the possibility that the underlying script is an abjad, in which only consonants are explicitly represented. Our decryption system is composed of three steps. The first step in the decipherment process is the identification of the encrypted language. We propose three methods for determining the source language of a document enciphered with a monoalphabetic substitution cipher. The best method achieves 97% accuracy on 380 languages. The second step is to map each symbol of the ciphertext to the corresponding letter in the identified language. We propose a novel approach to deciphering short monoalphabetic substitution ciphers which combines both character-level and word-level language models. Our method achieves a significant improvement over the state of the art on a benchmark suite of short ciphers. The third step is to decode the resulting anagrams into readable text, which may involve the recovery of unwritten vowels. Our approach obtains an average decryption word accuracy of 93% on a set of 50 ciphertexts in 5 languages. Finally, we apply our new techniques to the Voynich manuscript, a centuries-old document written in an unknown script, which has resisted decipherment despite decades of study.
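The enciphering model assumed in the abstract (a monoalphabetic substitution, optionally combined with within-word anagramming and an abjad-style omission of vowels) can be illustrated with a minimal Python sketch. This is not the thesis's code: the function names `make_key` and `encipher`, the Latin alphabet, and the vowel set `aeiou` are assumptions chosen only for illustration.

```python
import random

# Assumed vowel inventory for the illustration; a real language would need its own.
VOWELS = set("aeiou")

def make_key(alphabet, symbols, rng):
    """Build a random one-to-one mapping from plaintext letters to cipher symbols."""
    shuffled = list(symbols)
    rng.shuffle(shuffled)
    return dict(zip(alphabet, shuffled))

def encipher(text, key, rng, drop_vowels=False, anagram=False):
    """Encipher text word by word under the assumed model.

    drop_vowels: omit vowels before substitution (abjad-style writing).
    anagram: scramble the letters inside each word (unknown transposition).
    """
    words = []
    for word in text.lower().split():
        # Keep only characters covered by the key (drops punctuation and digits).
        letters = [c for c in word if c in key]
        if drop_vowels:
            letters = [c for c in letters if c not in VOWELS]
        if anagram:
            rng.shuffle(letters)
        words.append("".join(key[c] for c in letters))
    # Words that became empty (for example a lone vowel in abjad mode) are dropped.
    return " ".join(w for w in words if w)

if __name__ == "__main__":
    rng = random.Random(0)
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    key = make_key(alphabet, alphabet.upper(), rng)
    print(encipher("the quick brown fox", key, rng))
    print(encipher("the quick brown fox", key, rng, drop_vowels=True, anagram=True))
```

Decipherment in this setting means recovering both the key and the original letter order (and, for an abjad, the unwritten vowels) from the ciphertext alone, which is why the abstract separates symbol mapping from anagram decoding.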
[Publication date]  [Publisher] University of Alberta
[Validity level] Natural Language Processing [Subject classification] 
[Keywords]  [Timeliness] 