31 August 2007

Google vs British National Corpus

Tom Robb’s article on the Google corpus contrasts it with the British National Corpus (BNC)(1). It seems to me that the BNC is generally regarded as a cornerstone of English corpus work. It is a proper garden like the ones I saw in London, with flowers and bushes arranged in neat rows in front of everyone's home. Google, by contrast, is seen as a wild field where anything goes and grows.

I think teachers realize the shortcomings of Google as a corpus. But that does not mean the BNC can be used without caution. For example, the BNC website Tom recommends returns only 232 examples of "email"(2) but 283 examples of "telegram".

Relying solely on the BNC is like driving a car by looking in the rear-view mirror. Because the BNC samples English from the late 20th century, it will never reflect newly accepted language. Google will.



(1) "British National Corpus (BNC) is a 100 million word collection of samples of written and spoken language from a wide range of sources, designed to represent a wide cross-section of British English from the later part of the 20th century, both spoken and written." The written part makes up 90% of the corpus and the spoken part 10%.
(2) Actually 43 "email" + 189 "e-mail". A comparable search on Google returns 5.6 BILLION hits for "e-mail" or "email" and 11.8 million for "telegram".
