More on Translation and Interpretation
MACHINE TRANSLATION
What is machine translation?
Machine translation (MT) is the use of computer software to translate text or speech from one natural language into another. Like translation done by humans, MT involves not simply substituting words in one language for words in another, but applying complex linguistic knowledge: morphology (how words are built from smaller units of meaning), syntax (grammar), semantics (meaning), and an understanding of concepts such as ambiguity.
Research and development of machine translation has been going on since the 1950s, engaging some of the best minds in computing, linguistics and artificial intelligence. Steve Silberman writes:
“The dream of translation by computer is older than the high tech industry itself. Before email, before word processing, before command-line interfaces, machine translation - or MT - was one of the first two computer applications designed to act upon words instead of numbers (the other was code breaking)… But it turns out that really good MT is so hard to pull off that the task exhausted the top-end computing resources of every generation attempting it. Regardless, machine translation R&D is going stronger than ever, fired up by the globalization of the Net. Today, all over the world, software designers, programmers, hardware engineers, neural-network experts, AI specialists, linguists, and cognitive scientists are enlisted in the effort to teach computers how to port words and ideas from language to language.” ("Hello, World," Wired, May 2000)
As our environment becomes more networked and connected internationally, the call for MT increases. Researchers predict that in the very near future English will no longer be the mother tongue of the majority of Internet users. According to Wired, Forrester Research estimates that by 2003 Americans will account for only one third of Internet users worldwide. ("Hello, World," Wired, May 2000)
Already the amount of material needed in different language versions is too vast for human translation alone, according to Systran, one of the oldest machine translation companies. MT is a long way from being able to replace human translation, and many experts feel it may never do so. But it can reduce the amount of work for human translators by taking over translations where accuracy is not essential, and by assisting humans with more important translation jobs.
MT offers some real advantages: according to Systran, MT is much faster than human translation (humans can translate 2000 - 3000 words a day, while Systran’s MT software can translate 3700 words a minute). MT is much cheaper than human translation. MT software has a better memory than human translators: it can store translated documents and re-use phrases that have already been translated.
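The "better memory" described above is the idea behind what is usually called a translation memory: segments that have already been translated are stored and returned when the same segment turns up again. The following is a minimal sketch of that idea only, not a description of how Systran's software works; the class name, the matching rule (ignore case and spacing) and the sample sentence pair are all illustrative assumptions.

```python
class TranslationMemory:
    """Toy store of previously translated segments (illustrative only)."""

    def __init__(self):
        self._segments = {}

    @staticmethod
    def _normalize(text):
        # Assumption: two segments match if they differ only in case or spacing.
        return " ".join(text.lower().split())

    def add(self, source, target):
        # Remember a finished translation for later re-use.
        self._segments[self._normalize(source)] = target

    def lookup(self, source):
        # Return the stored translation, or None if the segment is new.
        return self._segments.get(self._normalize(source))


memory = TranslationMemory()
memory.add("Close the valve.", "Fermez la vanne.")

reused = memory.lookup("close  the VALVE.")  # same segment, different case and spacing
unseen = memory.lookup("Open the valve.")    # never translated before
```

A segment found in memory needs no fresh translation; anything else would still have to go to the MT engine or to a human translator.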
The accuracy of MT is much lower than that of competent human translation, but it can be improved in certain ways, for example by ensuring that spelling and punctuation are all correct in the original text. When used in conjunction with human translators, to provide a first draft which is then given to a human for polishing, MT can save time and money.
The following resources offer a good general introduction to machine translation:
- Steve Silberman, "Talking to Strangers," Wired, May 2000. A good history of the conception, development and current state of machine translation.
- "Machine Translation’s Past and Future," Wired, May 2000. A timeline of the history and future of machine translation.
- "Universal Translators," Wired, May 2000. A listing of machine translation research and development hubs worldwide.
- D.J. Arnold, Lorna Balkan, Siety Meijer, R. Lee Humphreys and Louisa Sadler, Machine Translation: an Introductory Guide, London: Blackwells-NCC, 1994. A comprehensive book about machine translation, available online.
- Links on MT. Research centers, products and software.
- Links on MT. Research centers, companies and articles online.
How close are we?
For the last several years text-to-text translation software has been available from a variety of companies, and many offer free online services.
Wired has a directory of translation tools online: "Sites+Sounds," May 2000. Currently available software has limited functionality: the garbled English that results when a message is translated from English through a couple of other languages and back to English has provided material for many jokes.
Machine translation has proved useful in two fields primarily: as an aid for human translators, and for translating material on a restricted subject matter.
First, as an aid for human translators working on material which must be accurately translated, MT can save time by producing a first draft.
Second, MT can produce fairly accurate translations when the domain of discourse is highly restricted: when syntax is simplified, vocabulary is predictable and each word is likely to mean one and only one thing, as in technical documents, equipment maintenance manuals and weather reports. “The classic example of MT that works is the Météo system, developed in Montreal, which has been translating Canada's weather bulletins between English and French on a daily basis since 1977. In the world of Météo discourse, ‘front’ always means a weather system. The translation of forecasts was so boring that before Météo took over, the Canadian government had a hard time keeping translators on the job for more than a couple of months.” (Steve Silberman, "Talking to Strangers," Wired, May 2000)
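The effect of a restricted domain can be sketched in a few lines. The example below is not how Météo works internally (the source does not describe that); it only illustrates why a word such as "front" stops being a problem once the domain is fixed. The sense lists, the domain name and the function name are hypothetical.

```python
# Hypothetical sense inventory; the glosses are illustrative, not from a real dictionary.
GENERAL_SENSES = {
    "front": ["forward part of something", "battle line", "weather system"],
}

# In a restricted domain, each word is pinned to a single meaning.
DOMAIN_SENSES = {
    "weather": {"front": "weather system"},
}


def candidate_senses(word, domain=None):
    """Return the meanings a translation system would have to choose between."""
    restricted = DOMAIN_SENSES.get(domain, {})
    if word in restricted:
        return [restricted[word]]
    return GENERAL_SENSES.get(word, [])


open_text = candidate_senses("front")                       # three competing meanings
weather_text = candidate_senses("front", domain="weather")  # exactly one meaning
```

With only one candidate left, there is no choice to get wrong, which is the situation described for weather bulletins above.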
The current situation is far from the perfect and instantaneous translations most people have in mind when they think of MT. However, creating a “universal translator” may be impossible and the key to success in machine translation may lie in setting more realistic goals. Experts disagree about just what is possible:
“Many experts believe that instantaneous MT will arrive in less than 10 years, as humans coevolve with the technology and adapt to its inherent weaknesses. Others are convinced that only sweeping breakthroughs in computer architecture will turn our PCs and PDAs into Universal Translators. In the meantime - thanks to innovations in speech recognition products like Dragon Systems' NaturallySpeaking, even better MT technologies, and continuing R&D at places like AT&T and Carnegie Mellon - we are inching closer to the kind of seamless MT that was first envisioned nearly half a century ago.” (Steve Silberman, "Hello, World," Wired, May 2000)
Problems with machine translation
Machine translation works quite well for translating predictable technical texts, those which never go beyond the expected domain of discourse. But this is little help in the domains where people want translation the most: for spontaneous conversations, in person, on the telephone, and on the Internet.
Computers just do not have the ability to deal adequately with the various complexities of language that humans handle naturally: ambiguity, syntactic irregularity, multiple word meanings and the influence of context. A classic example is illustrated in the following pair of sentences:
Time flies like an arrow.
Fruit flies like an apple.
The sentence construction is parallel, but the meanings are entirely different: the first is a figure of speech (a simile) and the second is a literal description. And the identical words in the sentences, flies and like, belong to different grammatical categories: in the first sentence flies is a verb and like is a preposition, while in the second flies is part of a noun and like is a verb. A computer can be programmed to understand either of these examples, but not to distinguish between them.
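The ambiguity in these two sentences can be made concrete by listing every combination of grammatical categories that a dictionary would allow. The sketch below uses a deliberately tiny, hand-made lexicon (an assumption; a real dictionary would list many more categories) and does no parsing at all: it only shows how many word-by-word readings a program has to choose between.

```python
from itertools import product

# Toy lexicon: each word maps to the grammatical categories it may take (assumed).
LEXICON = {
    "time": ["NOUN", "VERB"],
    "fruit": ["NOUN"],
    "flies": ["NOUN", "VERB"],
    "like": ["VERB", "PREP"],
    "an": ["DET"],
    "arrow": ["NOUN"],
    "apple": ["NOUN"],
}


def possible_taggings(sentence):
    """List every assignment of categories to words that the lexicon permits."""
    words = sentence.lower().rstrip(".").split()
    options = [LEXICON[word] for word in words]
    return [list(zip(words, tags)) for tags in product(*options)]


arrow = possible_taggings("Time flies like an arrow.")
apple = possible_taggings("Fruit flies like an apple.")

# The readings a human picks without effort:
intended_arrow = [("time", "NOUN"), ("flies", "VERB"), ("like", "PREP"),
                  ("an", "DET"), ("arrow", "NOUN")]
intended_apple = [("fruit", "NOUN"), ("flies", "NOUN"), ("like", "VERB"),
                  ("an", "DET"), ("apple", "NOUN")]
```

Even with this small lexicon the first sentence has eight candidate readings and the second has four. Nothing in the lexicon itself says which reading is the intended one.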
A computer translation is similar to a translation done by a human without a deep knowledge of the target language. Grammatical rules can be memorised, or programmed. But without real knowledge of a language, a human or a computer simply looks up words in a dictionary and has no way to select between alternate meanings. Alan Melby, professor of linguistics at Brigham Young University, points out that “Being a native or near-native speaker involves more than just memorizing lots of facts about words. It includes having an understanding of the culture that is mixed with the language. It also includes an ability to deal with new situations appropriately. No dictionary can contain all the solutions since the problem is always changing as people use words in unusual ways.”
("Why Can’t a Computer Translate More Like a Person?")
Another classic example of the difficulties of MT was provided in 1960 by Bar-Hillel, an early machine translation researcher. With the seemingly simple sentence “The box is in the pen,” he pointed out “that to decide whether the sentence is talking about a writing instrument pen or a child's play pen, it would be necessary for a computer to know about the relative sizes of objects in the real world… The point is that accurate translation requires an understanding of the text, which includes an understanding of the situation and an enormous variety of facts about the world in which we live.” Computers cannot translate like humans because they do not learn like humans. (Alan Melby, "Why Can’t a Computer Translate More Like a Person?")
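Bar-Hillel's point about relative sizes can be illustrated with a toy comparison between a plain dictionary lookup and a lookup that has been handed one fact about the world. Everything in the sketch is an assumption made for illustration: the sense glosses, the sizes in metres and the function names.

```python
# Rough typical sizes in metres; every number here is an assumption.
PEN_SENSES = [
    ("writing instrument", 0.15),
    ("enclosure", 2.0),
]
TYPICAL_SIZE = {"box": 0.5}


def first_dictionary_sense(senses):
    # What a plain dictionary lookup does: take the first entry listed.
    return senses[0][0]


def sense_that_can_contain(senses, contents):
    # Use one piece of world knowledge: a container must be larger than its contents.
    needed = TYPICAL_SIZE[contents]
    for gloss, size in senses:
        if size > needed:
            return gloss
    return None


naive = first_dictionary_sense(PEN_SENSES)
informed = sense_that_can_contain(PEN_SENSES, "box")
```

The hand-coded size table settles this one sentence and nothing else. The argument quoted above is that the facts needed for translation in general are far too numerous and varied to be written down this way.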
Silberman quotes Martin Kay, an MT developer: MT is an “AI-complete problem.” You have to solve all of the various difficulties of imbuing computers with the kind of knowledge that humans naturally harvest from experience before you can tackle the essential problem of MT. “When you want to hire a translator,” Kay explains, “you ask, ‘How good is your Chinese? How good is your French?’ You don't ask, ‘Have you been around much in the world?’ The problem is, machines haven't. In order to understand a sentence, your knowledge of linguistics is a relatively minor matter. Your knowledge of the world is incredibly important.” ("Talking to Strangers," Wired, May 2000)
Computers not only lack the knowledge of the world to deal with word choice, but they also lack the knowledge necessary for cultural sensitivity. Melby writes that translation needs to be “sensitive to total context, including the intended audience of the translation. Meaning is not some abstract object that is independent of people and culture.” As an example of the damage that can be done by culturally ignorant and insensitive translation, even by humans, he describes his investigation of the translation of a remark made by Nikita Khrushchev in Moscow on November 19, 1956:
Khrushchev was then the head of the Soviet Union and had just given a speech on the Suez Canal crisis. Nasser of Egypt threatened to deny passage through the canal. The United States and France moved to occupy the canal. Khrushchev complained loudly about the West. Then, after the speech, Khrushchev made an off-hand remark to a diplomat in the back room. That remark was translated “We will bury you” and was burned into the minds of my generation as a warning that the Russians would invade the United States and kill us all if they thought they had a chance of winning… Several months ago, I became curious to find out what Russian words were spoken by Khrushchev and whether they were translated appropriately… In Soviet Communist rhetoric, it is common to claim that history is on the side of Communism, referring back to Marx who argued that Communism was historically inevitable. Khrushchev then added that Communism does not need to go to war to destroy Capitalism. Continuing with the thought that Communism is a superior system and that Capitalism will self-destruct, he said, rather than what was reported by the press, something along the lines of ‘Whether you like it or not, we will be present at your burial,’ clearly meaning that he was predicting that Communism would outlast Capitalism.
Although the words used by Khrushchev could be literally translated as “We will bury you,” (and, unfortunately, were translated that way) we have already seen that the context must be taken into consideration. The English translator who did not take into account the context of the remark, but instead assumed that the Russian word for “bury” could only be translated one way, unnecessarily raised tensions between the United States and the Soviet Union and perhaps needlessly prolonged the Cold War.
("Why Can’t a Computer Translate More Like a Person?")
Melby believes that the reason computers cannot translate like humans lies in one factor: their lack of agency. By agency, he means the capacity to make real choices by exercising our will, ethical choices for which we are responsible… A computer has no real choice in what it will do next. Its next action is an unavoidable consequence of the machine language it is executing and the values of data presented to it… Without agency, information is meaningless. So a computer that is to handle language like a human must first be given agency. But we should be careful, because if we give agency to a computer it may be hard to get it back and the computer, even if it chooses to learn a second language, may exercise its agency and refuse to translate for us. Douglas Robinson (1992) puts it well. He asks whether a machine translation system that can equal the work of a human might not “wake up some morning feeling more like watching a Charlie Chaplin movie than translating a weather report or a business letter.”
("Why Can’t a Computer Translate More Like a Person?")
Relevance to diplomacy
Translation software has an obvious relevance to diplomacy. However, because much diplomatic communication is sensitive, accuracy in translation is of high importance, and such translation is not likely to be trusted to computers in the near future. Machine translation systems designed to deal with limited domains of discourse and limited vocabulary may nevertheless soon become useful for diplomacy and international organisations. Silberman describes a hand-held machine translation system called “Diplomat,” under development by Carnegie Mellon for translating directions through a minefield. In this environment accuracy is a matter of life and death, but the vocabulary is highly limited.
“Diplomat is rapid-deployment speech-to-speech MT for the front lines in a world of volatile hot zones. Running on a lightweight Pentium notebook, Diplomat was Carnegie Mellon's answer to a challenge from Darpa to develop MT systems for new language pairs that could be up and running in a couple of weeks, when there's not enough time for constructing an elaborate world model or coding in thousands of linguistic rules. There was a particular language pair at the top of Darpa's agenda: Croatian and English. The system had to translate in both directions. It had to have a memory footprint small enough to be wedged into a portable device. And the interface had to be comprehensible by someone who had never seen a computer - a Bosnian farmer, for instance.” ("Talking to Strangers," Wired, May 2000)