The
Open Internet Lexicon
(OIL) is an initiative to build a dictionary of Web terms (words and short phrases) in many languages. Our goal is to reflect current Internet and Web usage in many countries. The dictionary will be open to all who are building multilingual or single-language web sites. Look at
what we have done so far
.
The combination of a language and a country is known as a "locale," e.g., Portuguese-Brazil or French-Canada. The Open Internet Lexicon initiative is looking for skilled translators in each locale who would like to work as "localizers." We hope to have at least one person in each locale translating a large collection of words and phrases, perhaps a few thousand.
Localizers should be fluent in English, because usage comments for each phrase are written in English. They must also know the idiomatic language of the locale, of course. Finally, they must be familiar with current Internet and Web terminology in that locale. Preferably they reside in the locale and have a good Internet connection there.
The Open Internet Lexicon runs on a database-backed web site using
skyBuilders.com
timeLines technology. skyBuilders
webWare
has an interface in which every text element - button labels, column headings, row names, hyperlink text, captions, etc. - is stored in the database in the languages of multiple locales. The multilingual text tables are part of the industry-standard Open DataBase Model (ODBM). For further information, go to
www.opendatabasemodel.com
.
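As a rough illustration of storing every interface string once per locale, the sketch below uses Python's built-in sqlite3 module. The table name ui_text, its columns, the element keys, and the sample phrases are all hypothetical; they are not taken from the Open DataBase Model or from skyBuilders timeLines. The lookup falls back to a default locale when the requested one has no entry.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory table of interface phrases keyed by element and locale."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ui_text ("
        " element_key TEXT NOT NULL,"
        " locale TEXT NOT NULL,"
        " phrase TEXT NOT NULL,"
        " PRIMARY KEY (element_key, locale))"
    )
    # Hypothetical sample rows, for illustration only.
    rows = [
        ("button.search", "en-US", "Search"),
        ("button.search", "fr-CA", "Rechercher"),
        ("button.search", "pt-BR", "Pesquisar"),
        ("link.home", "en-US", "Home"),
        ("link.home", "fr-CA", "Accueil"),
    ]
    conn.executemany("INSERT INTO ui_text VALUES (?, ?, ?)", rows)
    return conn


def label(conn, element_key, locale, fallback="en-US"):
    """Return the phrase for a locale, or the fallback locale's phrase, or None."""
    for candidate in (locale, fallback):
        row = conn.execute(
            "SELECT phrase FROM ui_text WHERE element_key = ? AND locale = ?",
            (element_key, candidate),
        ).fetchone()
        if row:
            return row[0]
    return None
```

With this layout, adding a locale means adding rows rather than changing page code: label(conn, "button.search", "fr-CA") reads the French-Canada phrase, while a locale with no row for "link.home" gets the English one.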
Comments, criticisms, and suggestions are welcome, especially for new words and phrases that may be needed by web developers who want their software to operate in many countries. Web developers and software developers are free to use the resulting text in their work. They may download the Open Internet Lexicon terminology database in a spreadsheet or text file format. Various database formats are also available, as well as a set of SQL commands that create and populate the database tables.
Open Internet Lexicon is no substitute for good dictionaries, machine translation programs, and human translators. It is simply a database repository for the short phrases that are needed on a web page to label buttons and indicate actions. The shorter the phrase the better. Particular words are likely to be highly idiomatic, even with the intense pressure to standardize web terminology around the world.
Dictionaries
There are several multilingual dictionaries on the web. Many have some web terminology. None include today's rapidly changing set of web page terms and phrases. The
EuroDicAutom
is a massive project of the European Commission. It has 5.5 million entries in twelve languages (Danish, Dutch, English, Finnish, French, German, Greek, Italian, Latin, Portuguese, Spanish, and Swedish) grouped by field (Agriculture, Aviation, etc.). The
Logos
dictionary is being compiled in over 30 languages by localizers around the world. It now has 7.5 million entries, but there are very few basic Web terms. Some excellent Internet/Web-specific efforts are the French-English
Terminologie d'Internet
, and
NetGlos
, which stopped development in 1997.
Glossaries
(subject-area specific dictionaries)
The
Human-Languages Page
, by Tyler Chambers, has links to nearly 2000 language-related web sites, including a few hundred glossaries. The
Translator's Home Companion
also has hundreds of glossaries.
Babylon.com
translates single words into 16 languages and has glossaries of many subjects in multiple languages.
GlossPost
is a searchable list of glossary web addresses (URLs) maintained by
Maria Eugênia Farré
.
A trilingual Internet-specific glossary is available at the
Canadian Bureau of Translation
. There have been many great software localization projects in the past, usually specific to a particular computer language or operating system. Their glossaries are also valuable references for Internet and Web terminology. Check out
Apple
,
Microsoft
,
SGI
, and
Sun
. There are language-specific localization tools for
Perl
, for C, and for
Java
.
Translators
The most comprehensive online source for information about translators is Gabe Bokor's
Translation Journal
. It has
hotlinks
to
translators' organizations
,
databases of translators
,
on-line glossaries and dictionaries
,
discussion groups
, and much more. The
American Translators Association
and the
Northern California Translators Association
list thousands of translators, searchable by language pair and subject area. The pioneer in translation memory tools, TRADOS, sponsors
www.translationzone.com
. It lists hundreds of freelance translators and localizers. In Europe,
Aquarius
is a portal to translators.
Glenn's Guide to Translation Agencies
lists a few hundred translation agencies.
Resources for Translators
is a database of translation rates, with testimonials and complaints about 5000 different translation agencies worldwide and reviews of translation tools.
Localizers
There are many companies that specialize in localizing web pages. Some have proprietary tools that produce multiple-language draft ("gist") machine translations (MT). All of them have many human translators (some in-house, some freelance) who finish the localizations. Among the world leaders are
Berlitz GlobalNET
,
Lionbridge
,
Lernout & Hauspie
, and
Bowne
. But the biggest of them has less than a one percent share of the $10 billion worldwide business of web localization and translation.
Yahoo
lists some 70 web translation services and tools.
Multilingual
magazine lists a similar number of language translation vendors. The smaller companies generally do not have proprietary tools. They use industry-standard tools like Trados, whose translation memory (TM) can be retained by their clients in TMX format, so the clients are not captive to any one localization vendor.
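To make the idea of a portable translation memory concrete, here is a minimal sketch that reads source/target pairs from a small TMX-style document with Python's standard xml.etree.ElementTree module. The sample document, its header attribute values, and the function name load_memory are assumptions made for illustration; a real TMX export from Trados or another tool will carry more metadata and may use inline markup that this sketch flattens to plain text.

```python
import xml.etree.ElementTree as ET

# The xml:lang attribute as ElementTree exposes it (predefined XML namespace).
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample in the general shape of a TMX 1.4 file.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0"
          segtype="phrase" o-tmf="none" adminlang="en"
          srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Search</seg></tuv>
      <tuv xml:lang="fr"><seg>Rechercher</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Home</seg></tuv>
      <tuv xml:lang="fr"><seg>Accueil</seg></tuv>
    </tu>
  </body>
</tmx>"""


def load_memory(tmx_text, source, target):
    """Map each source-language segment to its target-language segment."""
    memory = {}
    root = ET.fromstring(tmx_text)
    for unit in root.iter("tu"):
        segments = {}
        for variant in unit.findall("tuv"):
            seg = variant.find("seg")
            if seg is not None:
                # itertext() flattens any inline markup inside the segment.
                segments[variant.get(XML_LANG)] = "".join(seg.itertext())
        if source in segments and target in segments:
            memory[segments[source]] = segments[target]
    return memory
```

Because the memory is an ordinary XML file, a client can hand the same file to a different vendor or tool, which is the point the paragraph above makes about not being captive.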
Globalizers
Beyond localization, there is
internationalization
(abbreviated I18N, just as localization is abbreviated L10N). Internationalization implies one system that can respond to requests in a large number of locales. This capability is often marketed as Globalization (G11N). Globalization companies promise multilingual web site management that will some day handle every locale. Leading globalizers, like
eTranslate.com
,
Idiom Technologies
, and
GlobalSight
, use content management tools to distribute the translation workflow to their partner localizers around the world.
Surprisingly, only one of the large companies boasting globalization technology has a web site that responds to a browser request in a language other than English (and that one only in French,
try it
- you must
set your browser to French
).
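A browser states its language preference in the HTTP Accept-Language request header, for example "fr-CA,fr;q=0.8,en;q=0.5". The sketch below shows one simple way a server might pick a locale from that header. The function names and the fallback-to-primary-language rule are assumptions chosen for illustration, not a description of how any of the companies above do it, and the matching is deliberately simpler than the full rules in the HTTP specification.

```python
def parse_accept_language(header):
    """Return the language tags in an Accept-Language header, most preferred first."""
    prefs = []
    for position, item in enumerate(header.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag:
            continue
        quality = 1.0  # a tag without an explicit q value counts as 1.0
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if quality > 0:
            # Sort by descending quality, then by original position.
            prefs.append((-quality, position, tag))
    return [tag for _, _, tag in sorted(prefs)]


def choose_locale(header, available, default="en"):
    """Pick the first requested tag the site supports, trying the bare language too."""
    available_lower = {tag.lower(): tag for tag in available}
    for tag in parse_accept_language(header):
        if tag in available_lower:
            return available_lower[tag]
        primary = tag.split("-")[0]
        if primary in available_lower:
            return available_lower[primary]
    return default
```

For a site that offers only "en" and "fr", a browser set to French-Canada gets the French pages, and a browser set to German falls back to the default.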
There are 139 two-letter language codes in
ISO 639
and 239 country codes in
ISO 3166
. An expanded list of
three-letter language codes
is available at GlossPost. So there can be a very large number of locales. Probably 50 of them account for 98% of
web activity today
. Will web globalization technology be able to handle them all?
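The arithmetic behind "a very large number of locales" can be sketched as follows, using the code counts quoted above. The product is only an upper bound, since most language-country pairings do not correspond to a real community of speakers; the function names and the language-COUNTRY tag format are illustrative assumptions.

```python
LANGUAGE_CODES = 139  # two-letter language codes, as counted in the text
COUNTRY_CODES = 239   # country codes, as counted in the text


def locale_tag(language, country):
    """Join a language code and a country code into a tag such as 'pt-BR'."""
    return f"{language.lower()}-{country.upper()}"


def max_locales(languages=LANGUAGE_CODES, countries=COUNTRY_CODES):
    """Upper bound on locales: every language paired with every country."""
    return languages * countries
```

Here max_locales() gives 139 x 239 = 33,221 possible pairings, against the roughly 50 locales said to account for most web activity.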
The World Wide Web has broken down the barriers of space and time.
The only remaining barrier is language.
Babel
There are two directions we can go to break down the language barrier. One is to realize the panglossian dream of an ideal universal language understandable by all. The other is to translate everything of importance - at least the gist in time and maybe someday le mot juste in time - into every language that has interested readers. In
Europe today
some are lobbying for English as that universal language, in computers and communications especially, and in the economics and business of globalized world markets. Others look to simultaneous over-the-Internet machine-assisted translations into many languages.
Machine Translation
Babel, a joint I18N project between
The Internet Society
and
Alis Technologies
, is an ambitious effort to allow the browser to work in any language with just-in-time translations and character code conversions (to the 16-bit Unicode needed by non-Western languages). Alis "Gist-in-time®" is incorporated in the
Netscape 6 browser
, and their other components make the Windows OS multilingual.
Babelfish
(now
babelfish.altavista.com
) is the best known machine translation service on the web. It provides immediate online translations from French or English into 5 European languages. You can test their translations of some basic web phrases
here
. The underlying translation technology is by
Systran
. It can be purchased for use on your web site.
Transparent Language
also offers immediate online translations. Another online translation service is
InterTran
from TranslationExperts, Ltd. They offer a few dozen language pairs, and an interesting sentence diagram, with optional translations for key words and word rearrangement.
GPLTrans
is an open-source translation engine.
The best portal site with access to 22 machine translation sites working in more than 50 languages is
foreignword.com
. They also offer links to 178 online dictionaries, 1001 glossaries, and hundreds of translators.
Most of the free web translation sites also let you enter a URL and they will translate a whole web page for you. Many millions of web pages have been translated by all these free services. They will play a big role in the future of the multilingual Web.
At the other end of the spectrum from free web machine translations, large corporations are buying huge "enterprise translation servers." These will be centrally located and provide over-the-Internet translations to their users from any browser. Large translation companies hope to sell real-time web-based translation services, either by translating web pages on the fly as users request them, or by returning translated web pages to a smaller client company's web server. Typical rates are pennies per word for the raw machine translations, and $0.25 per word for human-corrected text.
These companies have armies (thousands) of translators, who will be able to receive the gist machine translation and return a human translation to the server (or to a client) with very fast turnaround times. (Note that many translators proficient in the source and target languages find the bland computer-speak gist a waste of their time.) The entire translation business - quotations, scheduling, translations, approvals, and billing - will be conducted over the web. Translation suppliers may never meet their customers. Translation will be a universal web application, working on web application servers, at application service providers (ASPs).
Systran Enterprise
costs about $5000 for a single language pair and five users, $32,000 for eight language pairs and 20 users.
Lernout & Hauspie
say their iTranslator Enterprise is coming soon. The
Transparent Language
MT engine is called TranscendRT. Their Enterprise Translation Server is priced starting at about $17,000. The
IBM WebSphere Translation Server
offers translations at 500 words per second from English to FIGS and CJK languages for $10,000 per language pair per CPU.
Wordstream
is developing a multilingual translator for the Internet called ClearText. It can translate a stream of text (like a news feed) into multiple languages in real time. No prices have been announced.
All these efforts at
machine translation
(MT) have received
mixed reviews
. Leading companies that have invested heavily in MT have
done badly
as the Internet frenzy has cooled and many venture capitalists have put dot-com investments on hold. Over a billion dollars has been invested in machine translation, mostly by the U.S. Defense Department and CIA. Much work has been done in universities.
Carnegie Mellon University
is a recognized leader in MT. The European Commission Systran MT (a distant relative of U.S.
Systran
) machine translates about a million pages a year into its 11 official languages. Apart from those with large government contracts, and science-fiction fans looking for
Douglas Adams' Babelfish
or
Star Trek
fans after a Universal Translator, knowledgeable observers doubt that machine translation will ever capture the subtle nuances of everyday language. Machines are now seen as aids to humans for
Computer-Aided Translations
(CAT). Machines can provide the gist of a document for a localizer who does not know the source language.
Desktop Translation Programs
Even the most inexpensive desktop translation products can usually provide the gist. They can also be installed in your web browser. Lernout & Hauspie
Power Translator Pro
(formerly from Globalink) translates into 7 languages.
LanguageForce Universal Translator 2000
translates into 40 languages, with software keyboard support for each one. At Open Internet Lexicon we used these low-cost ($150) desktop tools to make the first draft translation in each language. We also consulted Babelfish and our dictionaries frequently.
Helping us break the language barrier takes a lot of knowledge.
Try reading some of the
great books
on localization, internationalization, and multilingual software.
By comparison with all the above, Open Internet Lexicon is much lower-level technology. It is just a simple and immediately useful web dictionary. All the words and concise phrases are Internet terminology suitable for web pages. A second difference is that Open Internet Lexicon is being developed and supported over the web, by web developers, and for web developers. This makes it possible for native speakers familiar with the evolving web in their cultures to keep the dictionary fresh and relevant in Internet time. Third, our database-backed system can easily handle hundreds of locales. Finally, it's open and free for all to use.
To join us as a localizer and get your locale added to our efforts, you must register with Open Internet Lexicon. Then fill out an application indicating your skills and interest. If you are approved, you will be given editing privileges in the terminology database for your locale. You will also have a web page on our site which you can edit to describe your work. And you will have a listing in our searchable database of OIL localizers.
If you want to join a team of localizers for a popular locale, your application will be considered by the existing localizers. In any case, as an Open Internet Lexicon team member you will have privileges to comment on and criticize the work in any locale.
Important Localization/Translation References
W3C Translations
W3C Translators Mailing List
W3C Translators Mailing List Archives
Localization Industry Standards Association
LISA Web Localization SIGs
(dated but some good references)
Books and Magazines