The Dante Project
The Dante database was created as the first product of compiling the New English-Irish Dictionary (NEID), due to be published in print and electronic form in 2012 by Foras na Gaeilge, Dublin. The NEID editing process was planned in three stages:
|Stage 1:||the source language analysis, entailing the creation of a highly detailed lexical database of English, to inform …|
|Stage 2:||the translation, during which a large number of Irish Gaelic equivalences were added to the database, and …|
|Stage 3:||the dictionary editing, when the bilingual material in each of the lexical entries was scrutinized, evaluated and compressed into regular dictionary entries.|
Dante is the output of Stage 1.
The NEID project is managed for Foras na Gaeilge by Cathal Convery. Stage 1 was the responsibility of the Lexicography MasterClass and their team. Stages 2 and 3 are being carried out under the guidance of Editor Pádraig Ó Mianáin and Assistant Editor Muiris Ó Raghallaigh, who lead a team of Irish translators and lexicographers. Their ongoing work may be viewed here.
The compilation was supported by:
- a 1.7 billion word lexicographic corpus;
- the Sketch Engine corpus query system with the corpus loaded, customized to meet the demands of the project;
- the IDM DPS configured according to the NEID DTD, style guide, schedule and data flow;
- a detailed user profile;
- headword selection principles and a 50,000 word list;
- a list of linguistic labels for marking register, style, domain etc.;
- a detailed description of entry structures needed for the dictionary;
- a detailed style guide for the English analysis process;
- 50 ‘template’ (model) entries, for specific lexical sets;
- 100 sample dictionary entries covering the full range of entry types.
The NEID English database, now Dante, was compiled by a 20-strong editing team which included several American and Irish lexicographers, the remainder being from the UK. The editorial work was led by Managing Editor Valerie Grundy, and the administration of the project was in the hands of Project Administrator Diana Rawlinson. LexMC’s directors Sue Atkins and Michael Rundell were in charge of the database design, quality control and project management, while Adam Kilgarriff was responsible for the Sketch Engine and oversight of the computational aspects.
Dante was begun in February 2007, and completed in September 2010.
The meticulous planning and sophisticated software resulted in significant improvements over many dictionary projects. These include:
- the theory of lexicographic relevance (Atkins et al. 2003) which underlies the design of the database;
- the detailed, formalized analysis of word behaviour (Chapters 8 & 9, Atkins and Rundell 2008);
- the improvement of the reliability of schedule and workflow by classifying, before the compiling started, over 40,000 headwords according to type and complexity (Atkins and Grundy 2006);
- the systematic use of over 60 model ‘template’ entries which accelerated the compilation and enhanced consistency;
- the customization of the Sketch Engine corpus query software, with a corpus of 1.7bn words (Kilgarriff et al. 2004, 2007), together with the use of the ‘GDEX’ algorithm (Kilgarriff et al. 2008) for detecting and foregrounding the ‘best’ example sentences in the corpus, combined with a seamless interface with the project’s dictionary-writing system;
- the customization of the DPS software;
- the rigorous quality control, combining conventional entry-editing by senior team members with the use of complex search scripts that list all entities of a specific type and allow rapid checking for accuracy.
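The kind of search script mentioned in the last point might look roughly like the sketch below. It is an illustration only, not the project's actual tooling: the element names (entry, hw, sense, label, example), the sample data and the list of allowed labels are all hypothetical and do not reflect the real NEID DTD. The script lists every element of a given type together with its headword, so that an editor can scan the whole set quickly, and flags labels that fall outside an assumed controlled list.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample data: element names and content are illustrative
# assumptions, not the actual NEID DTD or Dante entries.
SAMPLE = """
<database>
  <entry><hw>bank</hw>
    <sense><label type="domain">finance</label><example>She went to the bank.</example></sense>
    <sense><label type="domain">geog</label><example>the river bank</example></sense>
  </entry>
  <entry><hw>gig</hw>
    <sense><label type="register">informal</label><example>They played a gig.</example></sense>
    <sense><label type="domain">finanse</label></sense>
  </entry>
</database>
"""

# Assumed controlled vocabulary of linguistic labels.
ALLOWED_LABELS = {"finance", "geog", "informal"}


def list_entities(root, tag):
    """Return (headword, text) pairs for every element named `tag`."""
    found = []
    for entry in root.iter("entry"):
        headword = entry.findtext("hw", default="?")
        for element in entry.iter(tag):
            found.append((headword, (element.text or "").strip()))
    return found


def unknown_labels(root, allowed):
    """Return (headword, label) pairs whose label is not in `allowed`."""
    return [(hw, text) for hw, text in list_entities(root, "label")
            if text not in allowed]


root = ET.fromstring(SAMPLE)
labels = list_entities(root, "label")
label_counts = Counter(text for _, text in labels)
suspect = unknown_labels(root, ALLOWED_LABELS)

for headword, text in suspect:
    print(f"check label {text!r} in entry {headword!r}")
```

Listing all items of one type side by side makes a misspelt or non-standard label stand out in a way that reading entries one at a time does not.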
Atkins, B. T. Sue, Charles Fillmore and Christopher Johnson (2003)
‘Lexicographic relevance: selecting information from corpus evidence’, in International Journal of Lexicography, guest editor Thierry Fontenelle, Oxford, OUP: 16:3 251-280
Atkins, B. T. Sue and Valerie Grundy (2006)
‘Lexicographic profiling: An aid to consistency in dictionary entry design’. In Proceedings of the Twelfth EURALEX International Congress, EURALEX 2006, Alessandria, Italy: Edizioni dell’Orso. 1097-1107
Atkins, B. T. Sue and Michael Rundell (2008)
The Oxford Guide to Practical Lexicography, Oxford: Oxford University Press.
Kilgarriff, A., Rychly, P., Smrz, P. and Tugwell, D. (2004)
‘The Sketch Engine’, in Williams and Vessier (2004). 105-116. Reprinted in Fontenelle (2008)
Kilgarriff, A., Rundell, M. and Uí Dhonnchadha, E. (2007)
‘Efficient corpus development for lexicography: building the New Corpus for Ireland’, Language Resources and Evaluation 40:2. 127-152
Kilgarriff, A., Husák, M., McAdam, K., Rundell, M. and Rychly, P. (2008)
‘GDEX: Automatically Finding Good Dictionary Examples in a Corpus’, in Bernal, E. and DeCesaris, J. (Eds) Proceedings of the XIII EURALEX International Congress. Barcelona: Universitat Pompeu Fabra: 425-433