
The University Research Program for Google Translate provides researchers in the field of machine translation with tools to compare against, and build on top of, Google's statistical machine translation system.
The research program provides a programmatic interface (API) allowing researchers to submit text for translation, and then receive back detailed results of that translation, including many hypothesis translations or word alignment information.
These detailed results let the research community compare Google's system closely with other research systems, and they give other researchers a platform on which to build.
The program supports all languages available publicly at translate.google.com.
To keep the service reliable, each approved application grants access to this service for 2 years. After that period, researchers are welcome to reapply.
For those who are eager to get started, use this quick start guide. For a better understanding of how the research API works, continue reading the rest of the guide.
unzip translate_api_java.zip
unzip gdata.java-1.12.0.zip
unzip javamail-1_4.zip
unzip jaf-1_1-fr.zip
Then go into the translate_api directory:
cd translate_api
javac *.java -classpath ../jaf-1.1/activation.jar:../javamail-1.4/mail.jar:../gdata/java/lib/gdata-client-1.0.jar:lib/translate-1.0.jar:.
java -classpath ../jaf-1.1/activation.jar:../javamail-1.4/mail.jar:../gdata/java/lib/gdata-client-1.0.jar:lib/translate-1.0.jar:. \
SimpleClient en ar "This is a test."
SimpleClient will ask you for your username and password. Use the Google account that was granted access to
the research API. Be sure to keep your Google account private. Do not share that account with others.
java -classpath ../jaf-1.1/activation.jar:../javamail-1.4/mail.jar:../gdata/java/lib/gdata-client-1.0.jar:lib/translate-1.0.jar:. \
DetailedClient en ar 10 "This is a test."
DetailedClient will ask you for your username and password. Use the Google account that was granted access to
the research API. Be sure to keep your Google account private. Do not share that account with others.
This document is intended for researchers participating in the University Research Program for Google Translate. Researchers in this program may use the research API described to write client applications that can interact with Google Translate.
Each example in this document first describes how the HTTP-level protocol works when requesting translations, then shows how to use the Java client library to issue those requests. If your client is written in another language, it is possible to use the HTTP-level protocol with any language that can handle HTTP requests and responses, but it may be more complicated to do so. For that reason, we highly recommend using the provided client library.
You agree to abide by the University Research Program for Google Translate Terms of Use when using the Google Translate Research API.
Translation requests are issued as a query (with a GET request) and the response is returned as an XML feed with either a single entry containing the translation (simple translations) or multiple entries containing each hypothesis translation (detailed translations).
This section will show you how to issue these queries and access the results, using either the HTTP-level protocol or the Google data API client library.
Using the Translate Research API:
Before you can access the Translate Research API you must create an account and request access to the Translate Research API for your account. Access to the research API is only granted to participants in the University Research Program for Google Translate. To participate in this program, please submit a proposal.
The Translate Research API uses Google Account Authentication to authenticate users and allow access to the research API only to approved participants in the University Research Program for Google Translate.
It is important to remember to keep your Google account secure when using the research API.
You can authenticate using the HTTP-level protocol or using the client library. The client library is highly recommended as it simplifies all aspects of using the research API.
To authenticate at the protocol level, send a POST to the following URL:
https://www.google.com/accounts/ClientLogin
The POST body should contain a set of query parameters, as
described in the following table. They should look like parameters passed
by an HTML form, using the application/x-www-form-urlencoded
content type.
| Parameter | Description |
|---|---|
| Email | The user's email address. |
| Passwd | The user's password. |
| source | Identifies your client application. Should take the form companyName-applicationName-versionID; below, we'll use the name exampleCo-exampleApp-1. |
| service | The string rs2, which is the service name for Google Translation. |
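The POST body described in the table above could be assembled along these lines. This is a minimal sketch that uses only the JDK; the class and method names (ClientLoginBody, build, encode) are illustrative and are not part of the client library.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of the client library.
class ClientLoginBody {

    // Encodes the four ClientLogin parameters as an
    // application/x-www-form-urlencoded string, in table order.
    static String build(String email, String passwd, String source) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("Email", email);
        params.put("Passwd", passwd);
        params.put("source", source);
        params.put("service", "rs2"); // service name given in the table above

        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> p : params.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(p.getKey()).append('=').append(encode(p.getValue()));
        }
        return body.toString();
    }

    // Percent-encodes a single value so that characters such as '&' and '@'
    // cannot break the form encoding.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("username@domain", "password", "exampleCo-exampleApp-1"));
    }
}
```

Encoding each value matters mainly for passwords, which may contain characters such as & or = that would otherwise be read as parameter separators.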
If the authentication request fails, you'll receive an HTTP 403
Forbidden status code.
If it succeeds, then the response from the service is an HTTP 200
OK status code, plus three long alphanumeric codes in the body of
the response: SID, LSID, and Auth. The Auth value is the authorization
token that you'll send to the Translate Research API with your request, so keep a copy of
that value. You can ignore the SID and LSID values.
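Extracting the Auth value from a successful response could look like the sketch below. It assumes the body consists of one NAME=value pair per line (SID, LSID, Auth); the class name ClientLoginResponse and the sample values are made up for illustration.

```java
// Hypothetical helper for illustration; not part of the client library.
class ClientLoginResponse {

    // Returns the value of the "Auth=" line, or null if the body has none.
    // Assumes one NAME=value pair per line.
    static String extractAuth(String body) {
        for (String line : body.split("\\r?\\n")) {
            if (line.startsWith("Auth=")) {
                return line.substring("Auth=".length()).trim();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Made-up values; real tokens are long alphanumeric strings.
        String sample = "SID=abc123\nLSID=def456\nAuth=ghi789\n";
        System.out.println("Authorization: GoogleLogin auth=" + extractAuth(sample));
    }
}
```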
Here is an example using curl to get an authentication
token:
curl -d "Email=username@domain&Passwd=password&service=rs2" https://www.google.com/accounts/ClientLogin
Make sure you remember to substitute in your username@domain and password. Also, be warned that your username and
password may be stored in your shell history file (e.g., .bash_history), and you should take precautions to
remove them when finished.
Using the Java client library, authentication is as simple as defining a TranslationService and adding your user credentials to that service.
// Create a TranslationService and include information about your
// application, "companyName-applicationName-versionID"
TranslationService service = new TranslationService("exampleCo-exampleApp-1");
// Create reader to get login info from user
BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
// Get login information
System.out.print("Login: ");
String login = br.readLine();
System.out.print("Password: ");
String password = br.readLine();
// specify the user credentials requesting access
service.setUserCredentials(login, password);
When setUserCredentials is called it issues the request to
https://www.google.com/accounts/ClientLogin and then stores
your Auth value for later use. Then, when using the TranslationService to make
requests later on (covered in the next section),
your Auth value is automatically included.
A simple translation request returns Google's best translation of your request text. You may issue simple translation requests using either the HTTP-level protocol or the client library.
Using the HTTP-level protocol, simple translation requests are performed by sending an HTTP GET request to the following URL:
http://translate.google.com/researchapi/translate
The GET request should contain a set of query parameters, as
described in the following table:
| Parameter | Description |
|---|---|
| sl | The language of the source text (en, ar, or zh). |
| tl | The requested language to which to translate (en, ar, or zh). |
| q | The source text to have translated. |
In addition to the standard query parameters to the GET request, authentication information must also be provided. Authentication information must be provided in the HTTP-header as follows:
Authorization: GoogleLogin auth=AUTH_TOKEN
where AUTH_TOKEN is the 160-character string returned from the ClientLogin URL as the Auth value, described above.
An example using curl to issue a translation request:
curl -H "Authorization: GoogleLogin auth=yourAuthToken" "http://translate.google.com/researchapi/translate?sl=en&tl=ar&q=This+is+a+test."
where yourAuthToken is the Auth value, described above.
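For clients that build the request themselves, the URL and header from the curl example could be produced as in this sketch. It uses only the JDK; SimpleRequest and its methods are hypothetical names, not part of the client library.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper for illustration; not part of the client library.
class SimpleRequest {
    static final String BASE = "http://translate.google.com/researchapi/translate";

    // Builds the GET URL for a simple translation request.
    static String url(String sl, String tl, String q) {
        return BASE + "?sl=" + enc(sl) + "&tl=" + enc(tl) + "&q=" + enc(q);
    }

    // Builds the value of the Authorization header from the Auth token.
    static String authorizationHeader(String authToken) {
        return "GoogleLogin auth=" + authToken;
    }

    // Percent-encodes one query parameter value (spaces become '+').
    private static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(url("en", "ar", "This is a test."));
        System.out.println("Authorization: " + authorizationHeader("yourAuthToken"));
    }
}
```

The resulting URL can then be passed to any HTTP client, with the header value set on the Authorization request header.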
Following a successful translation, the research API will return an HTTP
200 OK status code and a feed containing a single entry which
includes the translated text. An example response might look something like
this:
<feed>
<id>http://translate.google.com/researchapi/translate</id>
<updated>2006-08-31T17:32:11.434Z</updated>
<title type="text">Translation Feed</title>
<gt:translation lang="en">This is a test</gt:translation>
<entry>
<id>http://translate.google.com/researchapi/translate/do_not_use</id>
<updated>2006-08-31T17:32:12.615Z</updated>
<title type="text">Translation</title>
<gt:translation lang="ar">هذا هو الاختبار</gt:translation>
</entry>
</feed>
The feed above contains a gt:translation element that specifies the original source
text and language. Within that feed is an entry containing its own
gt:translation element, which in this case holds the
translation text.
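If you are not using the client library, the translated text can be pulled out of such a feed with the JDK's built-in DOM parser. The sketch below is one possible approach, not the library's implementation; SimpleFeedParser is a hypothetical name, and the sample in main is an abbreviated feed with placeholder text.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration; not part of the client library.
class SimpleFeedParser {

    // Returns the text of the gt:translation element inside the first entry,
    // or null if the feed has no entry or the entry has no translation.
    static String translation(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Namespace awareness is left off so that elements can be looked up
            // by their literal "gt:" name, as written in the sample feed.
            factory.setNamespaceAware(false);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList entries = doc.getElementsByTagName("entry");
            if (entries.getLength() == 0) {
                return null;
            }
            Element entry = (Element) entries.item(0);
            NodeList translations = entry.getElementsByTagName("gt:translation");
            return translations.getLength() == 0 ? null : translations.item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse translation feed", e);
        }
    }

    public static void main(String[] args) {
        String sample = "<feed>"
                + "<gt:translation lang=\"en\">This is a test</gt:translation>"
                + "<entry><title type=\"text\">Translation</title>"
                + "<gt:translation lang=\"ar\">PLACEHOLDER</gt:translation></entry>"
                + "</feed>";
        System.out.println("Translated text: " + translation(sample));
    }
}
```

Looking up gt:translation inside the entry, rather than on the whole document, avoids picking up the feed-level element that carries the source text.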
If your request fails for some reason, the research API may return a different status code; for information about the status codes, see the Protocol document - HTTP status codes.
Using the Java client library, you can easily issue queries and process the response without having to deal with HTTP requests and XML parsing.
First, you must specify the appropriate URL to which you'll issue the queries, create a TranslationService and specify your user credentials (as described above), and specify the type of feed the service will return:
// Create a TranslationService and include information about your application,
// "companyName-applicationName-versionID"
TranslationService service = new TranslationService("exampleCo-exampleApp-1");
// Create reader to get login info from user
BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
// Get login information
System.out.print("Login: ");
String login = br.readLine();
System.out.print("Password: ");
String password = br.readLine();
// specify the user credentials requesting access
service.setUserCredentials(login, password);
Once you've prepared the TranslationService, you can create a query and send that query to the service, retrieving a resulting feed containing one entry, with your translated text.
// Create a query with appropriate parameters
TranslateQuery query = new TranslateQuery("This is a test", "en", "ar");
// Send the query to the translation service, retrieving the resulting feed
TranslateFeed feed = service.query(query, TranslateFeed.class);
Now that the TranslateFeed has been retrieved for your translation request, verify it has at least one TranslateEntry, get that entry, and display relevant information from the entry.
// As long as there's at least one entry
if (feed.getEntries().size() > 0) {
// Get the entry
TranslateEntry entry = feed.getEntries().get(0);
// Display the translated string with other information
System.out.println("Feed title: " + feed.getTitle().getPlainText());
System.out.println("Feed updated: " + feed.getUpdated());
System.out.println("Entry title : " + entry.getTitle().getPlainText());
System.out.println("Entry updated: " + entry.getUpdated());
System.out.println("Translated Text: " + entry.getText() + "\n\n");
}
This code is included with the client toolkit as the
SimpleClient.
Two types of detailed translations can be requested. The first provides a list of hypothesis translations of your requested text. This N-best list of hypotheses includes the total cost of each hypothesis, as well as the cost of each feature that makes up that total cost. The list is ordered from best to worst hypothesis (lowest to highest cost).
The second type of detailed translation is to request alignment information. Alignment responses include information on which words in the resulting translation align to which words in the original text.
You may issue detailed translation requests using either the HTTP-level protocol or using the client library.
Using the HTTP-level protocol, detailed translation requests are performed by sending an HTTP GET request to the same URL as mentioned above:
http://translate.google.com/researchapi/translate
The GET request responds to the following query parameters:
| Parameter | Description |
|---|---|
| sl | The language of the source text (en, ar, or zh). |
| tl | The requested language to which to translate (en, ar, or zh). |
| q | The source text to have translated. |
| nbest (optional) | The number of hypothesis translations to request. |
| align (optional) | If this parameter is present (and not zero), alignment information is requested. |
The first three parameters are always required. Using either of the last two parameters constitutes a detailed translation request, either for a list of hypotheses or for alignment information, depending on which parameter is included.
Again, in addition to the standard query parameters to the GET request, authentication information must also be provided. Authentication information must be provided in the HTTP-header as follows:
Authorization: GoogleLogin auth=AUTH_TOKEN
where AUTH_TOKEN is the 160-character string returned from the ClientLogin URL as the Auth value, described above.
An example using curl to issue a translation request:
curl -H "Authorization: GoogleLogin auth=yourAuthToken" "http://translate.google.com/researchapi/translate?sl=en&tl=ar&q=This+is+a+test.&nbest=5&align=1"
where yourAuthToken is the Auth value, described above.
Following a successful translation, the research API will return an HTTP
200 OK status code and a feed containing one entry for each
hypothesis returned. Each entry contains the translated text as well as a set
of features describing the cost of that hypothesis. If requested, the
entries will also include alignment information. The number of entries
returned may be smaller than the number of hypothesis translations
requested, because for some translations (usually short ones) only a small number of
hypotheses are generated. Also, the research API currently allows
requests of up to 25 hypothesis translations only.
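A client that builds detailed request URLs itself might guard against the 25-hypothesis limit before sending anything. The following is a sketch under that assumption; DetailedRequest, MAX_NBEST, and the convention of passing 0 to omit nbest are illustrative choices, not part of the client library.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper for illustration; not part of the client library.
class DetailedRequest {
    static final String BASE = "http://translate.google.com/researchapi/translate";
    static final int MAX_NBEST = 25; // limit stated in this guide

    // Builds the GET URL for a detailed request.
    // nbest: number of hypotheses to request, or 0 to omit the parameter.
    // align: whether to request alignment information.
    static String url(String sl, String tl, String q, int nbest, boolean align) {
        if (nbest < 0 || nbest > MAX_NBEST) {
            throw new IllegalArgumentException("nbest must be between 0 and " + MAX_NBEST);
        }
        StringBuilder sb = new StringBuilder(BASE);
        sb.append("?sl=").append(enc(sl));
        sb.append("&tl=").append(enc(tl));
        sb.append("&q=").append(enc(q));
        if (nbest > 0) {
            sb.append("&nbest=").append(nbest);
        }
        if (align) {
            sb.append("&align=1");
        }
        return sb.toString();
    }

    // Percent-encodes one query parameter value (spaces become '+').
    private static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(url("en", "ar", "This is a test.", 5, true));
    }
}
```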
An example response for a detailed N-best translation might look something like this:
<feed>
<id>http://translate.google.com/researchapi/translate</id>
<updated>2006-09-05T23:59:37.924Z</updated>
<title type="text">Translation Feed</title>
<gt:translation lang="en">This is a test.</gt:translation>
<entry>
<id>http://translate.google.com/researchapi/translate/do_not_use</id>
<updated>2006-09-05T23:59:41.500Z</updated>
<title type="text">Translation</title>
<gt:translation lang="ar">هذا هو المحك.</gt:translation>
<gt:feature id="TOTAL" score="1.345591"/>
</entry>
... snip ...
<entry>
<id>http://translate.google.com/researchapi/translate/do_not_use</id>
<updated>2006-09-06T00:27:24.269Z</updated>
<title type="text">Translation</title>
<gt:translation lang="ar">وهذا الاختبار.</gt:translation>
<gt:feature id="TOTAL" score="1.521322"/>
</entry>
</feed>
The feed above contains a gt:translation element that specifies the original source
text and language. Within that feed, each entry contains its own
gt:translation element, which in this case holds the
hypothesis translation text. Each entry also contains
gt:feature elements, each identifying a feature (by its id)
and the score given to that feature.
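Without the client library, an N-best feed like the one above could be read into a list of hypotheses with their TOTAL cost, as sketched here with the JDK's DOM parser. NBestParser and Hypothesis are hypothetical names, and the sample in main uses placeholder translations.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration; not part of the client library.
class NBestParser {

    // One hypothesis translation and its TOTAL feature score.
    static class Hypothesis {
        final String text;
        final double totalCost;

        Hypothesis(String text, double totalCost) {
            this.text = text;
            this.totalCost = totalCost;
        }
    }

    // Reads every entry of the feed, in document order.
    // totalCost is NaN when an entry has no TOTAL feature.
    static List<Hypothesis> parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(false); // match literal "gt:" element names
            Element root = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            List<Hypothesis> result = new ArrayList<>();
            NodeList entries = root.getElementsByTagName("entry");
            for (int i = 0; i < entries.getLength(); i++) {
                Element entry = (Element) entries.item(i);
                String text = entry.getElementsByTagName("gt:translation").item(0).getTextContent();
                double total = Double.NaN;
                NodeList features = entry.getElementsByTagName("gt:feature");
                for (int j = 0; j < features.getLength(); j++) {
                    Element feature = (Element) features.item(j);
                    if ("TOTAL".equals(feature.getAttribute("id"))) {
                        total = Double.parseDouble(feature.getAttribute("score"));
                    }
                }
                result.add(new Hypothesis(text, total));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse N-best feed", e);
        }
    }

    public static void main(String[] args) {
        String sample = "<feed>"
                + "<entry><gt:translation lang=\"ar\">FIRST</gt:translation>"
                + "<gt:feature id=\"TOTAL\" score=\"1.345591\"/></entry>"
                + "<entry><gt:translation lang=\"ar\">SECOND</gt:translation>"
                + "<gt:feature id=\"TOTAL\" score=\"1.521322\"/></entry>"
                + "</feed>";
        for (Hypothesis h : parse(sample)) {
            System.out.println(h.totalCost + "\t" + h.text);
        }
    }
}
```

Because the feed is already ordered from lowest to highest cost, the first element of the returned list is the best hypothesis.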
If your request fails for some reason, the research API may return a different status code; for information about the status codes, see the Protocol document - HTTP status codes.
An example response for a detailed alignment translation might look something like this:
<feed>
<id>http://translate.google.com/researchapi/translate</id>
<updated>2006-09-05T23:59:37.924Z</updated>
<title type="text">Translation Feed</title>
<gt:translation lang="en">This is a test.</gt:translation>
<entry>
<id>http://translate.google.com/researchapi/translate/do_not_use</id>
<updated>2006-09-05T23:59:41.500Z</updated>
<title type="text">Translation</title>
<gt:translation lang="ar">هذا هو الاختبار.</gt:translation>
<gt:alignment word="هذا" position="0"/>
<gt:alignment word="هو" position="5"/>
<gt:alignment word="اختبار." position="8"/>
<gt:alignment word="اختبار." position="10"/>
</entry>
</feed>
The feed above contains a gt:translation element that specifies the original source
text and language. Within that feed is an entry containing a
gt:translation element as well as
gt:alignment elements, each identifying an alignment from a word in the
translation to a position in the original text.
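In the sample response, the positions 0, 5, 8, and 10 coincide with the character offsets at which "This", "is", "a", and "test." begin in the source text. Assuming position is a zero-based character offset into the source (an inference from this one sample, not something this guide states), the aligned source word could be recovered as sketched below. AlignmentLookup is a hypothetical name, and the whitespace-based word splitting would not suit unsegmented source text such as Chinese.

```java
// Hypothetical helper for illustration; not part of the client library.
class AlignmentLookup {

    // Returns the whitespace-delimited word of `source` that starts at
    // character offset `position`, or null if no word starts there.
    // Assumes `position` is a zero-based character offset.
    static String sourceWordAt(String source, int position) {
        if (position < 0 || position >= source.length()) {
            return null;
        }
        if (Character.isWhitespace(source.charAt(position))) {
            return null;
        }
        // A word start is either offset 0 or a character preceded by whitespace.
        if (position > 0 && !Character.isWhitespace(source.charAt(position - 1))) {
            return null;
        }
        int end = position;
        while (end < source.length() && !Character.isWhitespace(source.charAt(end))) {
            end++;
        }
        return source.substring(position, end);
    }

    public static void main(String[] args) {
        String source = "This is a test.";
        int[] positions = {0, 5, 8, 10}; // positions from the sample response
        for (int p : positions) {
            System.out.println(p + " -> " + sourceWordAt(source, p));
        }
    }
}
```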
If your request fails for some reason, the research API may return a different status code; for information about the status codes, see the Protocol document - HTTP status codes.
Using the Java client library, you can easily issue queries and process the response without having to deal with HTTP requests and XML parsing.
First, you must specify the appropriate URL to which you'll issue the queries, create a TranslationService and specify your user credentials (as described above), and specify the type of feed the service will return:
// Create a TranslationService and include information about your application,
// "companyName-applicationName-versionID"
TranslationService service = new TranslationService("exampleCo-exampleApp-1");
// Create reader to get login info from user
BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
// Get login information
System.out.print("Login: ");
String login = br.readLine();
System.out.print("Password: ");
String password = br.readLine();
// specify the user credentials requesting access
service.setUserCredentials(login, password);
Once you've prepared the TranslationService and the URL, you can build a query:
// Create a query with appropriate parameters
TranslateQuery query = new TranslateQuery("This is a test", "en", "ar");
Depending on whether you want many hypothesis translations or alignment information, use one of the following:
// request 10 hypothesis translations
query.setDetailedRequest("10");
or
// request alignment information
query.setAlignmentRequest();
Then, send that query to the service:
// Send the query to the translation service, retrieving the resulting feed
TranslateFeed feed = service.query(query, TranslateFeed.class);
The resulting feed will contain either many entries, one for each hypothesis translation, or a single translation with alignment information.
Now that the TranslateFeed has been retrieved for your translation request, either loop through each entry and display the relevant information for each translation hypothesis.
// For each entry
for (int i=0; i<feed.getEntries().size(); i++) {
// Get the entry
TranslateEntry entry = feed.getEntries().get(i);
// Display the translated string with other information
System.out.println("\nEntry title : " + entry.getTitle().getPlainText());
System.out.println("Entry updated: " + entry.getUpdated());
System.out.println("N-best Translated Text: " + entry.getText());
// Display the scoring features
System.out.print("Scoring features: ");
for (ScoringFeature feature : entry.getScoringFeatures()) {
System.out.print(feature.getID() + "=" + feature.getScore() + " ");
}
System.out.print("\n");
}
or display the alignment information:
// Get entry
TranslateEntry entry = feed.getEntries().get(0);
// Display alignment information
System.out.print("Alignments: ");
for (TranslationAlignment alignment : entry.getTranslationAlignments()) {
System.out.print(alignment.getWord() + " (" + alignment.getPosition() + ") ");
}
System.out.print("\n");
This code is included with the client toolkit as the
DetailedClient.
The Google Translate Research API client library is currently available for Java only and depends on the GData Java client library. To use the library you will have to download four packages.
To use the client library, 1.) download the Google Translate Research API java client library here.
Extract the zip file. By default, the zip extracts to the directory
translate_api, in which you will find:
| File | Description |
|---|---|
| README.txt | A README file describing how to build the provided example clients. |
| SimpleClient.java | Example client for simple translations. |
| DetailedClient.java | Example client for detailed translations. |
| lib/translate-1.0.jar | The Google Translate Research API java client library. |
Since this client library depends on the GData Java client library, you must 2.) download the GData Java client library here.
The GData library also depends on JavaMail and the JavaBeans Activation Framework (jaf), so you'll need to 3.) download the JavaMail API and 4.) download the JavaBeans Activation Framework.
Extract all of the libraries into the same directory:
unzip translate_api_java.zip
unzip gdata.java-1.12.0.zip
unzip javamail-1_4.zip
unzip jaf-1_1-fr.zip
Then, go into the translate_api directory:
cd translate_api
Then, to build the example clients, use the following (from within the
translate_api directory):
javac *.java -classpath ../gdata/java/lib/gdata-client-1.0.jar:lib/translate-1.0.jar:.
And then to run those example clients, use the following (from within the
translate_api directory):
java -classpath ../jaf-1.1/activation.jar:../javamail-1.4/mail.jar:../gdata/java/lib/gdata-client-1.0.jar:lib/translate-1.0.jar:. \
SimpleClient en ar "This is a test."
java -classpath ../jaf-1.1/activation.jar:../javamail-1.4/mail.jar:../gdata/java/lib/gdata-client-1.0.jar:lib/translate-1.0.jar:. \
DetailedClient en ar 10 "This is a test."
These example clients will ask you for your username and password. Use the Google account that was granted access to
the research API. Be sure to keep your Google account private. Do not share that account with others.