Translation has become integrated into an end-to-end communication process. Information “architects” design a structured framework that writers fill with “content.” Search Engine Optimization (SEO) experts add “keywords” so that the content can be easily found in cyberspace. Forget word processing; today we use XML to adorn content with metadata so that it can be properly managed and “repurposed” later. Translators are at the receiving end. They “transcreate” the content, preserving all the properties added upstream.
It is worth noting here that the smallest repurposable unit of content is terminology. Perhaps that is why companies are increasingly developing their own terminology databases (termbases). But unlike in the past, the underlying motivation for doing so is not some admirable notion of social responsibility. It is purely economic: to compete in the global economy. In this context, what does “quality” mean for terminology resources? This article describes how the quality of terminology resources is increasingly being equated with return on investment (ROI) and less with the theoretical postulates of pre-computing times.
Let’s look at those theoretical postulates first. Terminologists continue to hold dear certain principles of the General Theory of Terminology (GTT),1 such as concept orientation, which many, including this author, consider untouchable. Yet other heretofore unchallengeable assumptions have become impossible to sustain. Probably the best example is the notion that terminology work involves creating structured concept systems. In 25 years of working as a terminologist, this author has needed to produce only one. The strict focus on concept analysis simply does not reflect terminology practice.
Thanks to advances in computing and the availability of large-scale corpora, new theories have emerged: the Communicative Theory,2 the Socio-Cognitive Theory,3 the Lexico-Semantic Theory,4 and the Textual Theory,5 all of which place greater emphasis on authentic communication and less on conceptual universals. When these theories are examined, it becomes obvious that there is also a theoretical basis for questioning some of the GTT’s postulates.
The inability of practicing terminologists to stick to traditional theoretical principles, together with the shifting foundations of terminology theory itself, indicates that the quality of production-oriented terminology resources cannot be measured according to those principles alone. What is needed is a new framework of best practices and quality metrics that can be applied effectively to meet today’s demands. What are the elements of such a framework?
Terminologists working in production-oriented environments need to focus on the core objectives and mission of the organization while respecting some well-founded theoretical principles as much as possible. When it comes down to getting approval of a budget, the most important measure of quality will be how well the terminology work actually supports those objectives.
For example, if the company plans to expand into a new geo-linguistic market, the terminologist should assign proportional resources to that language and be ready to provide the required support on time. In the case of acquisitions and mergers, the terminologist needs to harmonize terminology conflicts before the new content flows down the globalization pipeline. Marketing and branding efforts could suffer significant setbacks without this type of specialized terminology work. The terminologist needs to react to changes in the business environment.
Governmental termbases, such as TERMIUM and the BTQ, tend to have a normative focus: helping the community use correct terms and vocabulary in order to strengthen the national languages. Commercial enterprises are more interested in marketing, in customer satisfaction, and in employee productivity. In each case, elements such as the data model, term attributes, the notion of termhood, and workflows will differ considerably.
It seems clear that a high-quality terminology resource should have the following properties:

- It is repurposable across applications and user groups.
- It is extensible as new needs arise.
- It is interoperable with different computing environments.
- Its terms reflect actual usage in the organization’s corpus (termhood).
- It is concept-oriented.
Let’s take a closer look at these key properties.
The first three properties in the list above are related: to be repurposable and extensible, a termbase needs to work in different computing environments. A term commonly used in ROI studies, beginning with that of Guy Champagne in 2004,6 is repurposability: the degree (frequency and diversity of uses) to which a terminology record is used. The more it is used, the greater the ROI. Champagne referred to the number of times a record was consulted by a specialist such as a translator. He compared the time it takes to research a term with, and without, a termbase. The latter takes much longer than the former, and thus each time an entry is consulted, the termbase saves money.
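Champagne’s time-savings logic reduces to simple arithmetic: lookups per year, times the minutes saved per lookup, times an hourly rate, minus the cost of maintaining the termbase. The sketch below implements that calculation in Python; every figure in the example (lookup count, research times, rate, maintenance cost) is an invented assumption for illustration, not data from his study.

```python
# Illustrative Champagne-style ROI calculation. All numbers below are
# hypothetical; they are not figures from the 2004 study.

def termbase_roi(lookups_per_year: int,
                 minutes_without_termbase: float,
                 minutes_with_termbase: float,
                 hourly_rate: float,
                 annual_maintenance_cost: float) -> float:
    """Return net annual savings attributable to the termbase."""
    minutes_saved = minutes_without_termbase - minutes_with_termbase
    gross_savings = lookups_per_year * (minutes_saved / 60) * hourly_rate
    return gross_savings - annual_maintenance_cost

# Assumed scenario: 20,000 lookups/year, 20 min of research without a
# termbase vs. 1 min with one, at $50/hour, minus $60,000 in upkeep.
savings = termbase_roi(20000, 20.0, 1.0, 50.0, 60000.0)
print(f"Net annual savings: ${savings:,.0f}")
```

The point of the model is its sensitivity to the first parameter: the more often a record is consulted (or otherwise reused), the higher the return.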
Today, repurposability extends beyond that. Technical innovation and advances in natural language processing are leading to new uses of terminology resources. A case in point is what is referred to as controlled authoring. Yes, writers are finally catching up with translators by finding ways to leverage technology, in this case by using software that checks for grammar, style, and consistency.7 This software requires terminology resources in order to work properly. But a termbase that was designed for translators cannot simply be plugged into controlled authoring software. Because it was designed for a different purpose, such a termbase is structurally incompatible, and it lacks sufficient information about the source language (sometimes even the right terms) to support controlled authoring.
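As a rough illustration of the kind of check controlled-authoring software performs, the toy script below flags deprecated source-language terms and suggests the preferred alternatives. The term pairs are invented for the example; real tools apply far richer linguistic rules, which is exactly why they need source-language information that translator-oriented termbases rarely carry.

```python
# Toy source-language terminology check of the sort a controlled-authoring
# tool performs. The deprecated/preferred pairs are invented examples.
import re

PREFERRED = {"log in": "sign in", "e-mail": "email"}  # deprecated -> preferred

def check(sentence: str) -> list[str]:
    """Return a list of style issues found in the sentence."""
    issues = []
    for bad, good in PREFERRED.items():
        if re.search(re.escape(bad), sentence, flags=re.IGNORECASE):
            issues.append(f"use '{good}' instead of '{bad}'")
    return issues

print(check("Log in to read your e-mail."))
```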
Like it or not, machine translation (MT) is here to stay. As language professionals, we know its limitations and can guide our employers to use it appropriately (and even create opportunities for ourselves in the process). Here is yet another case where terminology resources can be repurposed. The quality of MT is higher when the content translated is confined to a domain and the MT engine is supplied with terminology from the same domain. This was clearly demonstrated by METEO, the system used by Environment Canada to translate weather reports and one of the first success stories in MT.8 Again, most termbases cannot be used “out of the box” by MT systems because they lack the necessary information, such as domain values and certain linguistic attributes. Many termbases even lack part of speech, which, according to TerminOrgs, an association of terminologists working in large organizations, is the most important attribute to include.
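One quick way to gauge how far a termbase is from MT-readiness is simply to count the entries that carry the attributes an MT integration typically needs, such as a domain value and part of speech. The sketch below assumes a minimal record layout with hypothetical field names (`domain`, `pos`); actual termbase exports will differ.

```python
# Screen termbase records for MT-readiness: keep only entries carrying
# a domain value and part of speech. Field names are assumptions.

records = [
    {"term": "boiler", "translation": "chaudière", "domain": "HVAC", "pos": "noun"},
    {"term": "run", "translation": "exécuter"},  # no domain, no POS
    {"term": "valve", "translation": "vanne", "domain": "HVAC", "pos": "noun"},
]

def mt_ready(record: dict) -> bool:
    """A record is usable by MT only if domain and POS are present."""
    return bool(record.get("domain")) and bool(record.get("pos"))

usable = [r["term"] for r in records if mt_ready(r)]
print(f"{len(usable)} of {len(records)} entries usable by MT:", usable)
```

In real termbases audited this way, the proportion of records failing such a filter is often large, which is the “out of the box” problem described above.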
Controlled authoring and MT are just two examples of how a terminology resource can be something more than a simple glossary for translators. Other applications, such as SEO, whose “keywords” are a form of terminology, are attracting significant interest. The problem is that, due to a lack of foresight, most termbases have low repurposability.
Termhood affects repurposability, so it needs some explanation. There are differing opinions about how to define termhood. A pragmatic view is that termhood is the degree to which a term candidate (linguistic unit) is deemed to be a “term” for the purposes at hand and therefore should be included in the termbase. While there are several universal criteria, how termhood is defined ultimately depends on what linguistic units actually need to be managed. Quite often, they are not “terms” according to traditionalists. Terminologists struggle to reconcile the apparent disconnect between what scholarly literature calls a “term” and what their company needs in the termbase.
Ronan Martin, a terminologist at SAS, a multinational software company, describes the challenges of choosing what to include in his termbase.9 To the classical conceptual basis for determining termhood, he adds factors that relate to repurposability: the nature of pre- and post-modifiers, term frequency, and term embeddedness (when terms are found embedded in larger terms). A combination of factors places term candidates on a scale of cost versus benefits. Terms that are easiest to identify in the company’s corpus and yet have the greatest repurposability in applications, such as controlled authoring and computer-assisted translation, are the low-hanging fruit that should be picked.
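A cost-versus-benefit ranking of term candidates can be prototyped as a simple weighted score. The sketch below is only in the spirit of the factors Martin describes; the weights, caps, and inclusion threshold are arbitrary assumptions, not the actual SAS criteria.

```python
# Toy cost-vs-benefit score for term candidates. Weights and the 0.5
# threshold are invented assumptions, not values from the SAS criteria.

def termhood_score(frequency: int, embedding_count: int,
                   has_free_modifiers: bool) -> float:
    """Higher score = cheaper to identify and more repurposable."""
    score = min(frequency, 100) / 100       # corpus frequency (capped)
    score += min(embedding_count, 10) / 20  # occurs inside longer terms
    if has_free_modifiers:
        score -= 0.25  # freely modified candidates are costlier to manage
    return score

candidates = {
    "data set": (87, 6, False),
    "fairly large data set": (3, 0, True),
}
for term, features in candidates.items():
    verdict = "include" if termhood_score(*features) >= 0.5 else "review"
    print(f"{term}: {verdict}")
```

Even a crude score like this makes the “low-hanging fruit” visible: frequent, embedded, stable candidates rise to the top of the inclusion queue.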
As just mentioned, a term’s frequency in the relevant corpus is one important measure of termhood. It is even a basic measure of Champagne’s ROI. One would think it quite obvious that the terms in a termbase should reflect, more or less, the terms used in the organization. Yet the “gap” between a termbase and the corpus it is supposed to represent can be surprisingly large.10 The reason? In practice, few terminologists have any experience with the use of corpora or are even aware that they can and should use them. The impact on termbase quality is worrying. To achieve sufficient representativity, terminologists can take inspiration from theories and practices that use corpora as the basis for terminology investigation, such as the Lexico-Semantic Theory and the Textual Theory. Using tools such as concordancing software to check termhood would, for example, vastly improve the quality of termbases.
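Checking a termbase against its corpus need not be elaborate. The sketch below uses naive single-word matching to surface both sides of the gap: termbase entries never attested in the corpus, and frequent corpus words absent from the termbase. A real workflow would add proper tokenization, lemmatization, and multi-word term matching; the corpus and termbase here are invented miniatures.

```python
# Minimal termbase-vs-corpus gap check on invented sample data.
from collections import Counter
import re

corpus = ("The installer copies the runtime library to the target host. "
          "Restart the runtime after the installer finishes. "
          "If the restart fails, contact support.")
termbase = {"runtime", "installer", "compiler"}

tokens = re.findall(r"[a-z]+", corpus.lower())
freq = Counter(tokens)

# Side 1 of the gap: termbase entries with zero corpus attestation.
unattested = {t for t in termbase if freq[t] == 0}
print("In the termbase but absent from the corpus:", unattested)

# Side 2: frequent, non-trivial corpus words missing from the termbase.
frequent_missing = {w for w, n in freq.items()
                    if n >= 2 and len(w) > 4 and w not in termbase}
print("Frequent corpus words missing from the termbase:", frequent_missing)
```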
Concept orientation is one of the key original principles of the GTT. In spite of the dramatic changes terminologists face, this principle remains valid – and possibly even more important – today.
Concept orientation is a key criterion for repurposability. Many potential applications of terminology resources, including controlled authoring and SEO, require knowledge of synonyms, and therefore a “synset” structure is essential. Simply put, a terminology resource that is not concept-oriented is not a terminology resource at all. By this standard, virtually all Excel spreadsheets and MS Word documents fail to qualify. A two-column bilingual glossary, widely used as the medium for so-called terminology work, does not come close to meeting the requirements laid out in this article.
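The difference between a concept-oriented entry and a glossary row is easiest to see in a data model. The sketch below loosely mimics TBX-style concept, language, and term levels; the field names are illustrative, not an actual schema. Note that each term carries part of speech and usage status, precisely the attributes that controlled authoring and MT applications need.

```python
# Sketch of a concept-oriented entry, in contrast to a two-column
# glossary row. Field names are illustrative, loosely modeled on
# TBX-style concept/language/term levels, not an actual schema.
from dataclasses import dataclass, field

@dataclass
class Term:
    text: str
    pos: str                  # part of speech, needed by CA and MT tools
    status: str = "admitted"  # e.g. "preferred", "admitted", "deprecated"

@dataclass
class ConceptEntry:
    domain: str
    definition: str
    terms_by_lang: dict[str, list[Term]] = field(default_factory=dict)

    def synonyms(self, lang: str) -> list[str]:
        """All terms for this concept in one language (a 'synset')."""
        return [t.text for t in self.terms_by_lang.get(lang, [])]

entry = ConceptEntry(
    domain="software",
    definition="A flaw causing a program to behave incorrectly.",
    terms_by_lang={
        "en": [Term("defect", "noun", "preferred"), Term("bug", "noun")],
        "fr": [Term("anomalie", "noun", "preferred")],
    },
)
print(entry.synonyms("en"))  # both English terms for the one concept
```

A flat two-column glossary cannot answer the question this structure answers trivially: which terms name the same concept?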
Quality must be measured in terms of usability and repurposability. We terminologists simply cannot afford to construct terminology resources according to decades-old tenets that could not foresee the conditions in which we work today and the increasing diversity of end users. We need a new framework of best practices to guide working terminologists, and a concerted effort is being made to build one.
Kara Warburton is the Director, Business Development for Asia Pacific, at Interverbum Technology.
1. Wüster, Eugen (1979). International Bibliography of Standardized Vocabularies / Bibliographie internationale de vocabulaires normalisés / Internationale Bibliographie der Normwörterbücher. Prepared by Helmut Felber, Magdalena Krommer-Benz, and Adrian Manu; edited by the International Information Centre for Terminology. 2nd enl. and completely rev. ed. München/New York: K. G. Saur.
2. Cabré, Maria Teresa (1999). La terminología: Representación y comunicación. Elementos para una teoría de base comunicativa y otros artículos. Barcelona: Institut Universitari de Lingüística Aplicada.
3. Temmerman, Rita (2000). Towards New Ways of Terminology Description: The Sociocognitive Approach. Amsterdam: John Benjamins Publishing Company.
4. L'Homme, Marie-Claude (2004). La terminologie : principes et techniques. Montréal: Les Presses de l'Université de Montréal.
5. Bourigault, Didier and Monique Slodzian (1999). “Pour une terminologie textuelle.” Terminologies nouvelles, Vol. 19, p. 29-32.
6. Champagne, Guy (2004). The Economic Value of Terminology: An Exploratory Study. Ottawa: Translation Bureau of Canada.
7. For example: Acrolinx.
8. Chandioux, John and Marie-France Guéraud (1981). “MÉTÉO : un système à l'épreuve du temps.” Meta: Translators' Journal, Vol. 26, No. 1, p. 18-22. Montréal: Les Presses de l'Université de Montréal.
9. Martin, Ronan (2011). Term Inclusion Criteria. Internal document, SAS Inc., Cary, N.C.
10. Warburton, Kara (2014). Narrowing the Gap Between Termbases and Corpora in Commercial Environments. PhD thesis, City University of Hong Kong. Available from www.termologic.com.