For Developers
The Semantic Experiences on this site are all based on fully learned end-to-end models that can be used for a wide variety of natural language understanding applications. We're excited to share these models with the community to see what else can be built with them. We know that what we're showing here is just the beginning...
Training the Models
The models we're sharing above are primarily trained on pairs of natural language inputs together with their responses (or some other semantic relationship, such as entailment). We share the details in this paper. A variety of semi-supervised data sources are used in the training process. In this case, the semi-supervision is typically the actual co-occurrence of one statement and an actual subsequent statement. The simplest example is the actual next sentence from a multi-sentence piece of text (a newspaper article, for example). From a Q/A dataset, an input may be: "Why don't you come to dinner tonight?" with its paired reply being: "Sorry, I can't." True pairs from the dataset are given as positive examples, while randomly paired inputs and responses provide the negative examples: "Why don't you come to dinner tonight?" paired with "The Mets won all three games." Again, the semi-supervision is simply the fact that the sentences or phrases co-occurred in a piece of training data. Using a variety of data sources (question/answer databases, threaded discussion forums, next-sentence pairs from newspaper articles) allows the models to learn proper pairing of phrases or sentences along a number of dimensions: syntactic consistency, general semantic similarity or consistency, topic consistency, and even some world-knowledge consistencies. These models are trained on English-language examples, but the same approach can be and has been used for other languages.
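To make the training objective concrete, here is a minimal sketch of a dual-encoder loss of the kind described above, written in TensorFlow. The encoders themselves are omitted; the function simply treats each true pair as the positive example and every other response in the batch as a random negative. Names like `in_batch_softmax_loss` are illustrative, not the actual training code.

```python
import tensorflow as tf

def in_batch_softmax_loss(input_vecs, response_vecs):
    """Dual-encoder loss sketch: the true pair is the positive example and
    every other response in the batch serves as a random negative."""
    # Score every input against every response in the batch.
    scores = tf.matmul(input_vecs, response_vecs, transpose_b=True)  # [B, B]
    # The correct response for input i sits at column i (the diagonal).
    labels = tf.range(tf.shape(scores)[0])
    loss = tf.keras.losses.sparse_categorical_crossentropy(
        labels, scores, from_logits=True)
    return tf.reduce_mean(loss)

# Toy check with random 500-dimensional vectors standing in for encoder output.
inputs = tf.math.l2_normalize(tf.random.normal([8, 500]), axis=-1)
responses = tf.math.l2_normalize(tf.random.normal([8, 500]), axis=-1)
print(in_batch_softmax_loss(inputs, responses).numpy())
```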
By being forced to learn the complex task of distinguishing a true phrase pair from foils, the system learns to represent natural language syntax, semantics, and world knowledge in a compact representation (500-dimensional vectors of real values). Inputs can be of variable length (although effectiveness falls as input length grows). These vectors can be used for semantic similarity tasks, Q/A tasks, natural language suggestion tasks, and many others.
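As a sketch of how the vectors are consumed downstream, the snippet below ranks a set of candidate vectors by cosine similarity to a query vector. The random numbers stand in for real 500-dimensional embeddings and are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
query_vec = rng.normal(size=500)               # embedding of the input phrase
candidate_vecs = rng.normal(size=(1000, 500))  # embeddings of candidate responses or passages

def cosine_similarity(a, b):
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

# Ranking candidates by similarity to the query is the core operation behind
# semantic similarity, Q/A retrieval, and suggestion tasks.
scores = cosine_similarity(query_vec[None, :], candidate_vecs)[0]
top_five = np.argsort(-scores)[:5]
print(top_five, scores[top_five])
```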
Using the Models
TensorFlow recently released TensorFlow Hub (TF Hub), a repository of pretrained graphs that can be imported and used for these and other tasks. A model similar to the ones used for these applications can be found and downloaded there. We provide several tutorials, including one on semantic similarity and another on text classification. The Universal Sentence Encoder model is very similar to what we're using in Talk to Books and Semantris, although those applications use a dual-encoder approach that maximizes response relevance, while the Universal Sentence Encoder is a single encoder that returns an embedding for an input rather than a score for an input pair.
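For illustration, a semantic-similarity computation with the Universal Sentence Encoder from TF Hub looks roughly like the following. The module URL and version are those listed publicly on tfhub.dev and may differ from the ones used in the tutorials.

```python
import numpy as np
import tensorflow_hub as hub

# Load the Universal Sentence Encoder from TF Hub (version number may vary).
embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

sentences = [
    "Why don't you come to dinner tonight?",
    "Sorry, I can't.",
    "The Mets won all three games.",
]
embeddings = embed(sentences).numpy()

# Pairwise cosine similarity: higher values indicate closer meaning.
normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
print(np.round(normed @ normed.T, 2))
```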
A Word About Biases in Language Understanding Models…
Language understanding models use billions of examples to automatically learn about the world. What they learn can be used to power all kinds of applications relevant to communication in society, and can also reflect human cognitive biases. Careful design decisions are critical to making use of these models.
In Semantris, the list of words we're showing is hand-curated and reviewed. To the extent possible, we've excluded topics and entities that we think particularly invite unwanted associations or can easily complement them as inputs. In Talk to Books, while we can't manually vet each sentence of 100,000 volumes, we use a popularity measure that increases the proportion of volumes published by professional publishing houses. There are additional measures that could be taken. For example, a toxicity classifier or sensitive-topics classifier could determine when the input or the output may be objectionable or party to an unwanted association. We recommend taking bias-impact mitigation steps when crafting end-user applications built with these models.
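The sketch below shows one way such a classifier gate might sit in front of an application. It is purely illustrative: `toxicity_score` and `rank_responses` are hypothetical callables a developer would supply, not part of the released models.

```python
TOXICITY_THRESHOLD = 0.8  # assumed application-specific cutoff

def safe_response(user_input, candidate_responses, toxicity_score, rank_responses):
    """Gate both the input and the selected output through a classifier."""
    if toxicity_score(user_input) > TOXICITY_THRESHOLD:
        return None  # decline to respond to an objectionable input
    for response in rank_responses(user_input, candidate_responses):
        if toxicity_score(response) <= TOXICITY_THRESHOLD:
            return response  # first acceptable response
    return None  # no acceptable response found
```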
For the AI experiments demonstrated here, we have not taken these bias-impact-mitigating steps. These experiences demonstrate the AI's full capabilities and weaknesses. It will be possible to find offensive associations within these experiences. We encourage you to report offensive associations using the feedback tool so that we can improve future models.
We don't yet (and may never) have a complete solution for identifying and mitigating unwanted associations. As Caliskan et al. point out in their recent paper "Semantics derived automatically from language corpora contain human-like biases", these associations are deeply entangled in natural language data. You can find the results of applying the Word Embedding Association Test (WEAT) from Caliskan et al., along with other approaches to evaluating associations in our model, in the article "Text Embedding Models Contain Bias. Here's Why That Matters". We hope this raises awareness of the limitations of these models and gives developers some design considerations to keep in mind when crafting applications. In addition to sharing our findings here, we continue to join the conversation on fairness in ML by participating in the FAT* conference and by maintaining our own resources on fairness in ML.
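For developers who want to run a similar check on their own models, here is a minimal sketch of the WEAT effect size as defined by Caliskan et al.; the target and attribute sets passed in are placeholders, not the actual test lists or our evaluation code.

```python
import numpy as np

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def association(w, A, B):
    """s(w, A, B): mean similarity to attribute set A minus attribute set B."""
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    """Effect size over target sets X, Y and attribute sets A, B
    (all given as lists of embedding vectors)."""
    x_assoc = [association(x, A, B) for x in X]
    y_assoc = [association(y, A, B) for y in Y]
    return (np.mean(x_assoc) - np.mean(y_assoc)) / np.std(x_assoc + y_assoc)
```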