Last month, some remarkable news surfaced: the fabled Voynich manuscript had finally been decoded by artificial intelligence. But it turns out that the content of the cryptic 600-year-old, 240-page document—written in a language nobody has ever seen before or since—may still be a mystery.
Scholars and Voynich experts quickly sought to set the record straight, expressing doubts over the accuracy of the research methodology applied in the 2016 paper that claimed to have cracked the ancient puzzle.
Most scholars agree that the manuscript is written in a substation cipher, a simple code in which certain letters of the alphabet are interspersed with made-up ones. The problem that has confounded researchers for centuries is that nobody knows what language (or alphabet) the document was originally written in.
In Decoding Anagrammed Texts Written in an Unknown Language and Script, written by computer science professor Grzegorz Kondrak and graduate student Bradley Hauer of the University of Alberta in Canada, the researchers developed a methodology for finding the source language of ciphered texts and then tested the algorithm they developed on the ancient manuscript. In the end, Kondrak and Hauer concluded that the Voynich was originally written in Hebrew. But scholars are questioning the methodology they used to arrive at this conclusion.
Kondrak and Hauer wrote an algorithm to identify certain patterns in texts such as how often each letter or combination of letters appears. Next they used the Universal Declaration of Human Rights (which is translated into 380 languages) as a sample text to teach the algorithm to identify the original language of a text encrypted with substitution ciphers—which worked—but when they turned the algorithm on the Voynich manuscript, problems began to emerge with some of their underlying assumptions.
Lisa Fagin Davis, executive director of the Medieval Academy of America, told The Verge that an algorithm trained to identify modern languages cannot reliably be used to identify the language of a document that has been carbon dated to the 15th century. “The grammar, spelling, and vocabulary would have been quite different, especially for a manuscript like the Voynich that is scientific in nature,” Davis said.
In addition, Kondrak and Hauer’s algorithm merely produces suggestions for potential matches but doesn’t evaluate the likelihood of these matches. Beyond that, the researchers based their analysis on a debatable theory that the Voynich is also encoded in anagrams, a hypothesis that has been suggested before but which is not supported by scholarly consensus.
Kondrak and Hauer admit in the paper that they had to make some adjustments for the translation to make sense in Hebrew, writing that the first attempt was “not quite coherent.” The adjustments included making “a couple of spelling corrections” before using Google Translate to convert the text into English. “Any time you have to resort to Google Translate over someone who has actually studied the language, you’re going to lose some credibility,” Fagin said.
Speaking to The Verge, professor Shlomo Argamon, a computational linguist at the Illinois Institute of Technology, concluded that “their method… gives them huge latitude in doing this sort of impressionistic interpretation. They take this decoded sentence, squint at it through thick eyeglasses, and say that’s good enough for us.” Nick Pelling, a Voynich expert who’s written several books on the mysterious document goes even further, saying that the paper’s likelihood of being correct is “So close to 0% as makes no practical difference.”