The International Corpus of English (ICE) is a unique linguistic and sociolinguistic project. When complete, it will consist of fifteen or more parallel corpora of spoken and written English drawn from countries where English is either a majority first language or an official second language.
Part I introduces the ICE project and a sub-project that investigates writing by advanced learners of English. Part II describes in detail the design of the corpora, the markup systems for speech and writing, the ICE tagset and parsing scheme, and the software packages that have been developed for automatic tagging and parsing and for retrieving lexical, grammatical, and sociolinguistic information. Part III discusses problems in compiling the corpora, illustrated by the experience of the teams in New Zealand, East Africa, and Hong Kong. Finally, Part IV considers some of the applications envisaged for the corpora: research in linguistics, sociolinguistics, and natural language processing; language teaching and language planning; and the establishment of norms for teaching and examining in second-language countries.