What’s the word on online lexical and terminological resources for English?

By Elizabeth Marshman

Finding useful, reliable information online can be a challenge for language professionals. We doubt the power of Google alone to immediately supply the best answer to questions of correct lexical choice and usage, but we require quick access to answers. How can we find our way through the masses of data to what we really need? Below, I briefly discuss some resources for primarily monolingual English lexical and terminological research, challenges of evaluating their quality and reliability, and research strategies.

Choosing the right resources

Although language professionals love a weighty dictionary, many have all but given up on the paper version for day-to-day research. Online dictionaries and term banks with speedy, flexible searches and adjustable displays are available freely online from reputable publishers. If free versions fail to meet our needs, we can subscribe for online access to full, up-to-date versions of general language and specialized dictionaries.

As broad as their coverage can be, however, conventional reference works are not all-inclusive. They are key sources of highly synthetic, (generally) carefully verified and structured information, but their compact format requires strict filtering of information that can fail to meet some needs. Text-based resources become invaluable complements. Many subject fields have one or more portal sites and information clearinghouses, which may be a gold mine of specialized resources for professionals in the field who have the background knowledge required to make the most of them. They are generally not, however, intended specifically for solving language issues. Corpora, collections of texts designed for linguistic research, are a boon for translators looking for equivalents and for information about meaning or correct and idiomatic use and co-occurrence of lexical items. Among the free corpus-based tools online are bilingual concordancers that query a wide (but largely pre-established) corpus of texts (e.g. from the websites of Canadian and international governments and other bi- and multilingual organizations). Monolingual corpora, provided their content is similar enough to what we are searching for, can also provide useful solutions. Each tool offers a different range of sources, documents and search functions.

If existing online corpora fail to meet our needs, we can use almost the entire Web as a corpus, with either a search engine’s snippet views or an online Web concordancer. Alternatively, many monolingual concordancers (free and commercial) can be installed on our workstations to query texts we gather ourselves and/or receive from our clients. While developing a personal corpus is time-consuming, adapting a collection of texts to a given subject, source, text type and/or client can increase both the speed of later research and the relevance of the results.
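For readers curious about what a monolingual concordancer does with a personal corpus, here is a minimal keyword-in-context sketch in Python. It is an illustration only, not how any particular tool works: the function name `concordance`, the `width` parameter and the sample text are all invented for this example, and real concordancers add sorting, wildcards and much more.

```python
import re

def concordance(text, term, width=30):
    """Return keyword-in-context lines for each whole-word match of term.

    Hypothetical helper for illustration; `width` is the number of
    characters of context shown on each side of the match.
    """
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the matches line up in a column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines

# Invented sample text standing in for a personal corpus.
sample = (
    "The term bank lists each term with a definition. "
    "A term record may also include context and sources."
)

for line in concordance(sample, "term"):
    print(line)
```

In practice, `sample` would be replaced by the contents of the texts gathered for a given subject field or client, read from files.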

Evaluating resources

No matter how wide a range of resources we have at our disposal, their quality and reliability are paramount. For reference works and sites, the author’s background, expertise and affiliation, as well as the publisher or host of the site, are typically excellent indicators of a reliable source online. A relatively recent publication and/or the update date helps to identify up-to-date information. If only authors and webmasters were as sensitive to the value of this information as translators looking for reliable sources! All too often on otherwise apparently useful sites we find this information difficult and time-consuming to locate, minimally detailed, or altogether absent. And as the common saying goes: “When in doubt, doubt!” (I checked that saying on Google and got 45,000 hits, so it must be true.)

Without these indicators, we rely even more on additional factors such as recommendations from other quality sources (including portal sites), as well as points that we can evaluate based on the content and the linguist’s fine-tuned intuition about linguistic quality. We may be reassured by introductory material containing an accurate and thorough description of how the content was chosen and prepared, and the goals and intended audience of the resource; evidence of systematic coverage and description of the contents; and references and other supporting material that in turn appear to be of high quality and reliable. Conversely, while the slick and flashy appearance of a site ultimately says little about its content and value, a carelessly presented site with broken links and missing information can say quite a bit—none of it very promising. All of this comes together in a first evaluation of a resource.

Using ready-made corpora as a lexical resource requires a different approach. With no truly practical way to evaluate all original sources directly and systematically (although we should be able to access some information for specific cases as needed), what we can reasonably evaluate is the judgment of the corpus builders and their criteria for including texts (which should be clearly described). Then, we depend on the strength of numbers in occurrences of our search terms, assuming that occasional errors and problems will be outweighed by correct, typical—and yes, likely less creative—solutions. (Of course, our judgment may allow us to spot hidden, rare gems in a corpus as well.)
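The "strength of numbers" idea can be sketched in a few lines of Python: count how often each competing candidate occurs across a set of texts and see which one dominates. This is only a rough illustration under simple assumptions (case-insensitive, whole-phrase matching); the function name `count_candidates`, the candidate terms and the sample texts are all made up for the example.

```python
import re
from collections import Counter

def count_candidates(texts, candidates):
    """Count whole-phrase, case-insensitive occurrences of each candidate.

    Hypothetical helper for illustration only.
    """
    counts = Counter({candidate: 0 for candidate in candidates})
    for text in texts:
        for candidate in candidates:
            pattern = r"\b" + re.escape(candidate) + r"\b"
            counts[candidate] += len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts

# Invented sample texts standing in for a corpus.
texts = [
    "The term bank was updated. Each term bank entry is reviewed.",
    "A terminology bank and a term bank differ in scope.",
    "Search the termbank online.",
]
candidates = ["term bank", "terminology bank", "termbank"]

counts = count_candidates(texts, candidates)
for candidate, n in counts.most_common():
    print(candidate, n)
```

Raw counts are only a starting point: a frequent form from a single unreliable source can still mislead, so the corpus builders' inclusion criteria and our own judgment remain part of the evaluation.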

Recent developments also require a different evaluation. Metadictionary sites that offer to search several dictionaries at once and collaborative resources such as TermWiki, which allow users to share resources such as term records online with others (who can often also give feedback and suggest additions), provide a wide and eclectic range of information. This inevitably makes evaluation more complex, simply because these resources can be constantly and unpredictably evolving, with many contributors adding information. Discarding them can mean missing out on genuinely useful data, but using them requires us to evaluate each individual item we use (rather than judging the source as a whole).

Happily, having extensive and varied information quickly and easily available allows us to compare different search results (even within the time restrictions imposed by this busy industry). Resources such as the Terminotix toolbar can help to keep a wide range of specifically chosen and evaluated resources at our fingertips as we work, allowing us to compare, contrast and confirm information in minutes.

In summary

With so many resources available, and some good instincts for evaluating potential solutions, we are better equipped than ever before to make lexical and terminological decisions. Once we have developed a network of resources appropriate for frequent clients and typical texts, research becomes more a matter of choosing a fairly direct path to an accurate and appropriate answer than of beating the bushes for any scrap of useful information. So what is the last word on online lexical and terminological resources? Quite simply: Explore!


Sincere thanks are extended to colleagues who shared their own ideas and practices with me.

