MODERN LANGUAGE ASSOCIATION OF AMERICA
COMMITTEE ON SCHOLARLY EDITIONS

Guidelines for Electronic Scholarly Editions

Introduction

These guidelines are intended to help scholarly editors, publishers, CSE consultants, and CSE reviewers in carrying out their respective functions; they reflect the principles articulated in the MLA brochure "Aims and Services of the Committee on Scholarly Editions," parts of which are quoted below. [NOTE: Copies of the CSE brochure may be ordered from the Committee on Scholarly Editions, Modern Language Association of America, 10 Astor Pl., New York, NY 10003-6981. An electronic version may be found at www.mla.org/cse/0000] The guidelines for electronic scholarly editions are closely based on the guidelines for printed editions. Their goal is to enhance the usability and reliability of scholarly editions by making full use of the capabilities of the computer. At this stage, the guidelines are phrased in terms of desiderata rather than requirements, since hardware and software capabilities are changing so rapidly and some desirable features are not yet technically or economically feasible. Because of this, the CSE encourages the greatest flexibility in carrying out the technical suggestions set forth below. While the CSE has over thirty years of experience with printed scholarly editions -- and the scholarly world at large has several centuries -- few useful models exist for electronic editions. Therefore experimentation and a variety of approaches are to be encouraged.

What cannot be compromised, however, is the scholarly quality of the edition. Exactly the same standards of accuracy, thoroughness, and detail must obtain for an electronic scholarly edition as for a printed one. In both, the reliability of the text is paramount.

The CSE does not prescribe a particular method of editing; the committee's position is that different approaches are appropriate in different situations. The CSE emphasizes that editors who are thoroughly acquainted with editorial options applicable to their materials and with the relevant documentary texts and who are sensitive to the circumstances attending the composition and production of all forms of the text are in a position to choose editorial procedures appropriate to their materials, carry out those procedures accurately and consistently, and explain exactly what they have done and why.

Standards for the "Approved Edition" and "Approved Text" Emblems (based on the 1991 CSE Statement of "Aims and Services")

The editorial standards that form the criteria for the award of the CSE "Approved Edition" emblem can be stated here in only the most general terms, as the range of editorial work that comes within the committee's purview makes it impossible to set forth a detailed, step-by-step editorial procedure. Whatever specific editorial theory and procedures may be used, the editor's basic task is to establish a reliable text. In an electronic edition, the provision of basic transcriptions and tools that allow alternative views of the text and that permit others to build upon existing editorial work is almost as important. Many, indeed most, scholarly editions include a general introduction -- either historical or interpretive -- as well as explanatory annotations to various words, passages, events, and historical figures. Although neither is essential to the editor's primary responsibility of establishing a text, both can add to the value, that is, the usefulness, of the edition. Whatever additional materials are included, however, the CSE considers the following essential for a scholarly edition:

  1. A textual essay, which sets forth the history of the text and its physical forms, describes or reports the authoritative or significant texts, explains how the text of the edition has been constructed or represented, gives the rationale for all decisions affecting its construction or representation, and discusses the verbal composition of the text as well as its punctuation, capitalization, and spelling. When it becomes technically feasible to do so, textual examples used in the essay might take the form of hypertext linkages to the edition itself rather than copies of the relevant passages. While it might not be possible to carry this practice through consistently for all texts cited (sources, analogues, translations, secondary bibliography, etc.), in principle it is highly desirable in order to avoid insofar as possible misquotations of those texts.

  2. An appropriate editorial apparatus and/or notes, or functional equivalent thereof, which (1) record authorial alterations and editorial emendations of the basic text(s) (e.g., a full-text transcription of the basic text(s) keyed to the edited text will make plain the alterations and emendations in the latter), (2) discuss problematical readings (if not treated in the textual essay), and (3) report variant substantive readings from all versions of the text that might carry authority (thus full-text transcriptions of all versions of the text that might carry authority obviate the need to report variant readings). These three kinds of information need not be presented in any specific arrangement, and not all obtain in every situation, but the CSE requires that, when applicable, they should appear either in each edition bearing the "Approved Edition" emblem or be otherwise available at the time of publication.

    If the apparatus is replaced by full-text transcriptions, mechanisms are needed to display passages selected in parallel and to create collated lists of textual variants in various categories (e.g., substantive, accidental).

  3. A proofreading plan that provides for meticulous proofreading at every stage of production so that the accuracy of the text, textual essay, and textual apparatus is not compromised. Automated proof-reading programs ("spell checkers"), word lists, and computerized collation or file comparing programs can be used to alleviate the burden, but they cannot substitute for manual proof-reading nor should they ever be allowed to make unverified changes in the text.

    In addition to the textual essay, editorial apparatus, and proof-reading plan, equally applicable to printed and electronic editions, the following requirements also obtain for any electronic edition seeking CSE approval:

    1. It must employ non-proprietary encoding standards.

    2. It must be self-describing.

    3. It must include retrieval software.

Each of these requirements is spelled out in more detail below.

 

Guidelines

The guidelines below suggest some considerations that the CSE regards as fundamental to the preparation and publication of useful, reliable scholarly editions. They cover the kinds of inquiries that an editor, reviewer, publisher, or informed critic needs to make in order to form a judgment about the accuracy and completeness of a scholarly edition, and they can therefore serve as a working checklist of matters that may demand attention in producing scholarly editions.

Just as no list of general guidelines can anticipate all of the special problems in a particular edition, so also many of the points mentioned below will not be applicable to every edition -- e.g., Section IV.C "Collations" would not be relevant to a diplomatic edition of a single text. The guidelines are intended only to provide a broad framework for identifying issues and for dealing with them reasonably.

For an electronic scholarly edition, perhaps the single most crucial decision is the choice of encoding standard. Internationally accepted and publicly defined norms, as set forth below, are preferable to proprietary systems. If the norms are chosen correctly, the edition can be migrated easily to new hardware and software platforms, thus preserving the work that has gone into it.

  1. Of paramount concern is the necessity of standardizing the character set, encoding norms, and documentation of the source documents and the electronic edition itself. These elements should be as machine- and software-independent as possible and in sufficiently widespread use that they can reasonably be expected to be ported into future systems without too much difficulty, since a well-prepared electronic edition will in all likelihood outlast the hardware and software environment in which it was produced. Editors must distinguish between the intellectual requirements of the edition and the requirements of its preparation, distribution, and use.

    1. Character set. For maximum portability the recommended character sets are ANSI standard X3.4-1986 (lower ASCII), with 128 characters, ISO 646 (82 characters), or UNICODE. In certain disciplines other coding schemes of long standing exist and may be used (e.g., beta coding for classical Greek). In some cases unique codes using these character sets may need to be devised in order to represent special characters. The character set should be explicitly declared as part of the edition itself (e.g., as Text Encoding Initiative [TEI] Writing System Declarations or as SGML entities).
    2. Encoding norms. It is preferable to use the implementation of Standard Generalized Markup Language (SGML) specifically devised for coding electronic texts, the Text Encoding Initiative (TEI). The choice of an alternate standard should be fully justified and explained.
    3. The text itself should be essentially self-describing, which means that the computer file that embodies it should contain a header with essential "metadata." The Guidelines for Electronic Text Encoding and Interchange (TEI P3), edited by C.M. Sperberg-McQueen and Lou Burnard (1994), offer detailed descriptions of the sorts of information that should be provided for the source document as well as the electronic text itself (see chap. 5 of the TEI guidelines). Metadata should include:
      1. A description of the file itself and the sources used in its preparation (although the description given here need not be as detailed as that found in the introductory essay) (File Description).
      2. The encoding system used (Encoding Description).
        1. The level of encoding should respond to the purpose of the edition. However, at a minimum any edition should encode elements which by any reasonable standard are of general importance and objectively determinable (e.g., the text structure itself -- chapters, acts, scenes). Any encoding scheme should be extensible in order to allow the later encoding of additional elements.

      3. Contextual information concerning the subject matter of the text as well as the basic information about the editor(s) (Profile Description).
      4. Information concerning the changes made to the file in the course of its preparation (Revision Description).
        1. Coupled with this is the necessity of a mechanism to authenticate the contents of the file (e.g., a hashing algorithm using a time-stamping mechanism to generate a unique id number). Because of the ease with which electronic texts can be changed, users must be able to satisfy themselves that the file in fact is what it purports to be.
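
The four header components listed above can be checked mechanically. The following is a minimal sketch in Python, not a definitive validator. It uses the TEI element names fileDesc, encodingDesc, profileDesc, and revisionDesc, and it assumes that the header is available as well-formed XML, which is an assumption made for illustration (TEI P3 is defined in SGML). The function name and the abbreviated sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

# The four header components named in the guidelines, as TEI element names.
REQUIRED_PARTS = ("fileDesc", "encodingDesc", "profileDesc", "revisionDesc")

def missing_header_parts(xml_text):
    """Return the required header components absent from the document."""
    root = ET.fromstring(xml_text)
    header = root.find("teiHeader")
    if header is None:
        return list(REQUIRED_PARTS)
    return [name for name in REQUIRED_PARTS if header.find(name) is None]

# Hypothetical, heavily abbreviated sample; a real header carries far more detail.
SAMPLE = """<TEI.2>
  <teiHeader>
    <fileDesc><titleStmt><title>Sample text</title></titleStmt></fileDesc>
    <encodingDesc><p>Chapters and paragraphs are encoded.</p></encodingDesc>
    <revisionDesc><change>First proofreading completed.</change></revisionDesc>
  </teiHeader>
  <text><body><p>Sample paragraph.</p></body></text>
</TEI.2>"""

print(missing_header_parts(SAMPLE))  # ['profileDesc']
```

A check of this kind establishes only that the components are present; whether their contents are adequate remains an editorial judgment.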

    4. Similarly, formats for other media included in the edition (sound, image, video) should conform to non-proprietary standards.
      1. While the format and content of electronic editions can, appropriately, vary as much as those of print editions, it seems clear that the possibility of digitized facsimiles of the original source materials, especially the copy text, would enhance the usability and reliability of virtually any electronic edition. Notionally, one can conceive of the utility of a hypermedia archive, comprising digitized facsimiles of all textual witnesses, encoded electronic transcriptions of each witness linked to it, and a critical text linked to those transcriptions, along with annotations, sources, analogues, etc. In practice, the cost of preparing such archives for long texts with many witnesses is likely to be prohibitive. Appropriate non-textual materials (e.g., illustrations, recordings of poetry read by the author or performances of dramatic works) can only enhance the scholarly value of the edition. In some cases, non-textual materials form an integral part of the edition. They should be treated with as much care and attention as the textual materials.
        1. Annotation of digitized facsimiles as well as linking of image to transcription at the line or word level would greatly facilitate scholarly use of such materials.
        2. Similarly, alignment of parallel texts (witnesses to a single text, translations) at least to the line level would also facilitate scholarly use. Line breaks in base transcriptions should be retained so that they may be shown (if desired) when the text is displayed in different-sized windows.

    5. Archival format: The "preservation form" of the text should be non-proprietary and as machine- and software-independent as possible (e.g., TEI conformant).
      1. The master digital archive should be maintained on a server, preferably network-accessible and ideally in the custody of an institution that can guarantee preservation of the archive and migration to suitable hardware and software platforms as technology changes (e.g., a university library or electronic text archive).
      2. A read-only version of the preservation form of the text should also be maintained (e.g., on a CD-ROM disk, digital linear tape, or other long-term storage medium).

  2. Delivery software involves both presentational and analytical software. Given the current existence of three widespread software platforms (MS-DOS/Windows, Macintosh, UNIX) and distribution on removable media (e.g., diskette, CD-ROM) or the Internet, it seems likely that most electronic editions will not be universally available to all users in their most sophisticated form.
    1. Presentational and analytical software should ideally be widely available (commercial, shareware, or public domain) for a variety of platforms and should have a reasonable life expectancy. Although electronic editions need not be published commercially, they should be made available in standardized formats, e.g., CD-ROM disks in ISO 9660 format or DVD, preferably not limited to a given computer platform. CD-ROM disks have the great advantage of fixing the form of the text at a given time, much like a traditional paper edition; but they do not allow for additions and corrections except through the release of a second edition.
    2. Network access from a central location, or text archive, although not essential, is highly desirable, both to minimize the proliferation of variant texts and to facilitate revisions. Network access may obviate the necessity of providing platform-specific versions, since Internet browsing tools exist already for each platform. The current (1997) HTML markup language is not adequate for serious scholarly purposes, since it is concerned with formatting, not the encoding of a text's logical structure, although HTML versions of SGML-marked-up text may be suitable delivery mechanisms. When later standards (e.g., XML) approach the capabilities of SGML, they may be considered as acceptable alternatives.
      1. Such momentary limitations can be overcome, however, by preserving the text in a more sophisticated archival form (e.g., SGML) and then converting it into other formats for presentation.
      2. Hypertext capabilities. The software chosen should allow for the use of hypertext, preferably with the capability to allow the user to add personal links as well as to annotate the text locally.
        1. The editorial principles should include a rationale of the kinds of hypertext links (two-way, one-way) used as well as of the categories of information that they are used to connect (e.g., sources, textual parallels, textual notes). The links themselves should include information to indicate their scholarly purpose and to facilitate searching by category (e.g., source).
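
The practice of preserving the text in an archival form and converting it for presentation can be illustrated with a short Python sketch. It assumes, for illustration only, that the archival text is well-formed XML using the structural elements div, head, and p; the function name and the sample text are hypothetical. The archival file is left untouched, and the HTML is generated from it on demand.

```python
import xml.etree.ElementTree as ET
from html import escape

def to_html(xml_text):
    """Render structurally encoded chapters as simple HTML for display."""
    root = ET.fromstring(xml_text)
    parts = []
    for div in root.iter("div"):
        head = div.find("head")
        if head is not None:
            parts.append("<h2>" + escape(head.text or "") + "</h2>")
        for p in div.findall("p"):
            parts.append("<p>" + escape(p.text or "") + "</p>")
    return "\n".join(parts)

# Hypothetical archival fragment with two chapters.
ARCHIVAL = """<body>
  <div type="chapter"><head>Chapter 1</head><p>It was a dark &amp; stormy night.</p></div>
  <div type="chapter"><head>Chapter 2</head><p>Morning came.</p></div>
</body>"""

print(to_html(ARCHIVAL))
```

Because the conversion runs in one direction only, structural information discarded in the HTML (here, the type of each division) survives in the archival form and remains available to later delivery formats.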

    3. Analytic software similarly should be widely available and not limited to a single platform.
        1. Analytic software might include:
          1. Retrieval software (e.g., TACT). Retrieval software frequently uses an indexed data base. Such a data base should include every individual word form as well as (preferably) access to lemmatized forms. The latter is particularly necessary for old spelling editions. Texts should also be available in a non-indexed form.
          2. Collation software (e.g., CASE, COLLATE, UNITE). If the editor has constructed a critical text on the basis of full text transcriptions, collation software allows the user to verify the editor's critical practice as well as vary the editorial assumptions (e.g., by selecting another version as a base text) and criteria (e.g., preservation of accidentals). Moreover, collation software allows the user to prepare a subedition of an individual family in a complex textual tradition, thereby facilitating reception studies.
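
The kind of indexed data base described above for retrieval software, holding every word form together with access to lemmatized forms, might be sketched as follows in Python. This is an illustration under stated assumptions, not a description of how TACT or any other package works: the lemma table, the function name, and the sample lines are all hypothetical.

```python
from collections import defaultdict
import re

# Hypothetical lemma table mapping old-spelling forms to headwords.
LEMMAS = {"loue": "love", "loued": "love", "vertue": "virtue"}

def build_index(lines):
    """Map each word form, and each lemma, to the line numbers where it occurs."""
    forms = defaultdict(list)
    lemmas = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        for word in re.findall(r"[a-z]+", line.lower()):
            forms[word].append(number)
            lemmas[LEMMAS.get(word, word)].append(number)
    return forms, lemmas

TEXT = ["Loue is a vertue", "She loued him well"]
forms, lemmas = build_index(TEXT)
print(forms["loued"])   # [2]
print(lemmas["love"])   # [1, 2]
```

A search on the lemma finds both old-spelling forms, while the form index still allows a search restricted to one particular spelling.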

    4. Insofar as possible, software should be used instead of manual techniques. Thus, instead of encoding, for example, morphological information at the word level, or lemmatizing texts manually, parsers, lemmatizers, or machine-readable dictionaries external to the text could be employed. Software of this sort is not yet widely available and, when it is, may not necessarily fulfil an edition's requirements for accuracy. It is likely that the development of sophisticated software tools will be the single most important factor in facilitating the creation of sophisticated electronic scholarly editions. Any such software should have the capability of specifying and storing rules for any actions it carries out and following them without exception.

  3. CONCEPTION AND PLAN OF EDITION. The content of an electronic edition differs little from that of a print edition. It should be appropriate, complete, and coherently conceived. The criteria for what is to be included in an electronic critical edition will generally be more expansive than those for a comparable printed edition, because of the computer's inherent ability to organize and manipulate large amounts of data. In addition to materials that form part of the edition itself, an electronic edition can also make use of existing electronic materials by linking to them. The considerations set forth above with regard to encoding schemes, formats, digitized facsimiles, etc., apply equally to all of the materials listed below. The contents should:
    1. include logically selected, manageable textual content -- e.g., an edition of a single work, a group of works generically or chronologically grouped;
    2. include, when appropriate, authorial documents in addition to basic text(s), such as adaptations, working notes, contracts, tables of contents, prefaces, abstracts;
    3. present appropriate second-party textual materials -- e.g., letters from respondents may be desirable in an edition of letters;
    4. include the editorial materials required by the kind of edition envisaged -- e.g., [1] prefaces and acknowledgments; [2] lists of sigla, symbols, and abbreviations; [3] textual essay; [4] textual apparatus (or the functional equivalent, e.g., hypertext links) and/or notes; [5] historical/interpretive essay(s); [6] illustrations or charts, diagrams, maps; [7] historical/explanatory notes; [8] appendices; [9] bibliography; [10] glossary; [11] index(es);
    5. be logically arranged and easy to use;
    6. include appropriate analytical and text retrieval tools, either as part of the edition itself or as part of the access package for which the edition is designed (e.g., network browsers).

  4. EDITORIAL METHODS AND PROCEDURES
    1. Materials
      1. A thorough census of all relevant materials should be conducted.
      2. Although editors may use reproductions (e.g., photocopies, microfilms, or digitized facsimiles) for preliminary editing, they should at some point verify the accuracy of their work against the original artifacts.

    2. Transcriptions
      1. Machine-readable transcriptions should be made according to an established rationale and policy, covering, e.g., such matters as expansion of abbreviations, use of special characters, and indication of medium. Except for exceptionally clear machine-printed modern texts, photocopied or original, scanners and OCR software have not as yet proved accurate enough to replace manual transcription.
        1. One very reliable method of manual transcription for printed materials is to input the same text twice, by two different people, who do not necessarily have to know the language involved, then use a collation or file compare program to find the differences.
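
The comparison step of this double-keying method can be sketched with the difflib module of the Python standard library. The sketch assumes that each keying is held as a plain-text string with one transcribed line per text line; the function name and the sample keyings are hypothetical.

```python
import difflib

def keying_differences(first, second):
    """List (line number, version A, version B) wherever two keyings disagree."""
    a, b = first.splitlines(), second.splitlines()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    report = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # i1 is zero-based; report the line number in the first keying.
            report.append((i1 + 1, a[i1:i2], b[j1:j2]))
    return report

KEYING_A = "Whan that Aprill\nwith his shoures soote\nThe droghte of March"
KEYING_B = "Whan that Aprill\nwith his shoures soote\nThe droghte of Marche"

for line, left, right in keying_differences(KEYING_A, KEYING_B):
    print(line, left, right)
```

The program only locates the disagreements. Each one must still be resolved by a person consulting the source, since either keying, or both, may be wrong.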

      2. Transcriptions should be double-checked and perfected by persons other than the transcriber, using appropriate manual and computerized proof-reading procedures.

    3. Collations
      1. All significant or potentially significant forms of the text(s) should be collated or included as machine-readable transcriptions of the witnesses.
      2. Accuracy of the collations should be verified by comparison of results obtained by different people using appropriate collation or file comparison software to supplement manual proofing. In the latter case, it may be assumed that the collations obtained through the use of that software will reflect faithfully the underlying transcriptions.
      3. Editorial policy for defining and recording variants should be clearly stated, preferably in the form of parameters established in the collation software. All items defined as variants should be recorded whether or not they are to be included in the completed edition. Such variants will be recorded automatically if complete transcriptions of the textual witnesses have been made and if the collation software has been programmed to list them.
        1. The collation software used should be capable of filtering out variants according to established categories (e.g., spelling, capitalization, punctuation) and of separating or grouping the resulting apparatus by those categories.
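
Filtering and grouping variants by category might look like the following Python sketch. It is not modeled on CASE, COLLATE, or UNITE. It assumes, as a simplification, that the two witnesses are already aligned word for word, and it distinguishes only three hypothetical categories: capitalization, punctuation, and "wording" (everything else, including spelling and substantive variants, which cannot be told apart without further information).

```python
import string
from collections import defaultdict

def bare(word):
    """Lowercase a word and strip surrounding punctuation."""
    return word.strip(string.punctuation).lower()

def classify(base, witness):
    """Assign a pair of readings to a category; None when they agree."""
    if base == witness:
        return None
    if base.lower() == witness.lower():
        return "capitalization"
    if bare(base) == bare(witness):
        return "punctuation"  # may also differ in case
    return "wording"

def collate(base_text, witness_text,
            include=("capitalization", "punctuation", "wording")):
    """Group variants by category, keeping only the categories requested."""
    apparatus = defaultdict(list)
    pairs = zip(base_text.split(), witness_text.split())
    for position, (base, witness) in enumerate(pairs, start=1):
        category = classify(base, witness)
        if category in include:
            apparatus[category].append((position, base, witness))
    return dict(apparatus)

BASE = "The Sea was calm, and the wind fell."
WITNESS = "The sea was calm and the winds fell."
print(collate(BASE, WITNESS))
print(collate(BASE, WITNESS, include=("wording",)))  # {'wording': [(7, 'wind', 'winds')]}
```

The include parameter plays the part of the editorial policy: changing it alters which classes of variant appear in the resulting apparatus without any change to the underlying transcriptions.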

    4. Quotations
      1. Sources of references and quotations in the authorial text(s) should be identified, and any textual problems raised should be addressed.
      2. Care should be taken that the text is accurately quoted in the textual essay, textual notes, historical essay, and explanatory notes, preferably by hypertext linking to the quoted passage rather than by copying it, when it becomes technically feasible to do so, so that any change in the text is reflected in the essay.

    5. Proofing at every stage to safeguard accuracy is of the highest importance.
    6. The editors should give serious thought to preserving and making available the record of their editorial deliberations and the rationale for editorial decisions.

  5. PARTS OF THE EDITION
    1. Text(s)
      1. The decision to use a single or multiple base- or copy-text, parallel texts, sequential versions, or a combination of these, should be appropriate to the goal of the edition. Sophisticated encoding and linkage will allow the greatest flexibility to both editor and user in deciding and altering the presentation format.
      2. The form of presentation of the texts -- whether in clear text, diplomatic transcription, facsimile, or in some other format -- should be consistent with announced principles. Detailed encoding combined with appropriate filtering mechanisms can allow the same base text to be presented in a variety of different ways; e.g., as an old spelling or a modernized edition.
      3. Inclusive text should use a clear and efficient system to symbolize or reproduce cancellations, interlineations, omissions, insertions, writeovers, etc.
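
The presentation of one encoded base text as either an old-spelling or a modernized text can be sketched in Python as follows. The markup is a simplified illustration in which each original form carries its regularized equivalent in a reg attribute; an actual edition would follow whatever encoding scheme its editors have declared. The function name and sample line are hypothetical.

```python
import xml.etree.ElementTree as ET

def render(xml_text, modernized=False):
    """Return the line in old spelling, or with regularized forms substituted."""
    root = ET.fromstring(xml_text)
    pieces = [root.text or ""]
    for child in root:
        if child.tag == "orig" and modernized:
            # Fall back to the original form when no regularization is recorded.
            pieces.append(child.get("reg", child.text or ""))
        else:
            pieces.append(child.text or "")
        pieces.append(child.tail or "")
    return "".join(pieces)

LINE = ('<l>Her <orig reg="virtue">vertue</orig> and her '
        '<orig reg="beauty">beautie</orig> shine</l>')
print(render(LINE))                   # Her vertue and her beautie shine
print(render(LINE, modernized=True))  # Her virtue and her beauty shine
```

Both views are derived from the same transcription, so a correction made once in the encoded text appears in every presentation of it.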

    2. Textual Essay
      1. The essay should provide a clear, convincing, and thorough statement of the edition's theoretical principles and practical methodology, covering such matters as:
        1. theory of copy-text adopted;
        2. description of alternative candidates, if any, for basic text (whether single, parallel, or sequential texts are presented) and justification of selection; instructions on how to use software to select alternative base texts;
        3. justification of form of presentation, whether clear text, diplomatic transcription, or other form, and instructions on how to convert the presentation of the text from one form to another;
        4. clear explanation of the policy of editorial emendation, covering all changes made in the basic text(s) or documents, whether or not such changes appear in the emendations list;
        5. rationale for including and excluding various classes of textual variants in the apparatus, or instructions on how to use the collation software to change the paradigms which select variants;
        6. explanation of treatment of ambiguously broken line-end compounds or possible compounds in source text(s);
        7. clear instructions for using the textual apparatus, or the accompanying collation programs;
        8. description of the character set and encoding scheme used;
        9. instructions for use of the text retrieval software.

      2. The discussion of the materials upon which the edition is based should include the following, where appropriate:
        1. a survey of all forms of the text(s) relevant to the edition, including an account of the provenance of such forms and/or artifacts;
        2. a record of locations of relevant manuscripts and unique printed texts;
        3. identification of the specific copies used for collations, preparation of printer's copy, etc.;
        4. bibliographical or codicological description of the relevant artifacts (printed copies, manuscripts, typescripts, tear-sheets, etc.). When possible this should be accompanied by complete digitized facsimiles of such artifacts.

      3. The account of the evolution of the text(s) should include:
        1. the history of composition and revision, whether by the author, scribes, editors, compositors, etc.;
        2. the history of publication of printed texts;
        3. for scribal texts, a profile of the copying habits, orthography, and dialect of manuscript scribes.

    3. Critical/textual apparatus (The term "apparatus" is used here in its broadest sense. The CSE does not require a standard format for the apparatus.)
        1. Design and Purpose of Apparatus
          1. The apparatus or collation software used in conjunction with the textual essay should enable thorough study of the composition and transmission of the text within the limits envisaged by the edition.
          2. The apparatus or collation software should distinguish, where possible, between what the author has done to the text and what was done by scribes, printers, compositors, advisors, and editors (including the present one).
          3. The record of textual variants should be logical, complete, and uncluttered; it should:
            1. conform to the principles announced in the textual essay;
            2. include variants from all authoritative or significant texts;
            3. make possible, when used in conjunction with the edited text(s), the recovery of all significant forms of the text, if such is consistent with the goals of the edition, preferably by display of the complete form of the transcription of the originals.

          4. Each part of the apparatus should be self-contained; cross-referencing of information between lists should be clear and simple to follow, a process that can be facilitated by appropriate use of hypertext links. Hypertext links should be coded to make clear the distinction between textual and non-textual material.
          5. Encoding of apparatus where there is not a complete transcription of all relevant witnesses should follow the TEI or other appropriate guidelines.

        2. Parts of the Apparatus
          1. Record of emendations: editorial emendations -- words, spelling, punctuation, and capitalization -- of the basic text(s) should be reported or adequately described in a manner consistent with the stated policy of emendation; if emendations are not individually reported, the policy must be justified and the classes of unreported emendations adequately described.
          2. Record of alterations: the author's alterations of the text should be recorded.
          3. Records of variants should follow the edition's stated principles of inclusion and exclusion and should make clear the history and/or permutations of the text. Collation software should allow the user to modify those principles to suit his or her own needs.
          4. Textual notes should identify the textual problems and adequately explain how the editors have dealt with them.
          5. Records of Word, Stanza, and Section Breaks
            1. All ambiguous line-end hyphenation of compounds or possibly compound words in printed texts used as basic texts should be recorded; a second list should indicate the way such compounds ambiguously broken in the new edition should be quoted. This process will be facilitated by the use of hard and soft (conditional) hyphens.
            2. Stanza, section, and verse paragraphs ambiguously broken at the ends of pages in the base or copy-text should be recorded.
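
The use of hard and soft hyphens mentioned above can be illustrated with a short Python sketch. It assumes that the transcription marks a purely typographical line-end break with the soft hyphen character (U+00AD) and a genuine compound with an ordinary hyphen; that convention, the function name, and the sample lines are assumptions made for illustration.

```python
SOFT_HYPHEN = "\u00ad"  # conditional hyphen: present only because of the line break

def join_lines(lines):
    """Rejoin transcribed lines, dropping soft hyphens and keeping hard ones."""
    text = ""
    for line in lines:
        if text.endswith(SOFT_HYPHEN):
            text = text[:-1] + line   # word merely broken at the line end
        elif text.endswith("-"):
            text = text + line        # genuine hyphenated compound
        elif text:
            text = text + " " + line
        else:
            text = line
    return text

LINES = ["the old water" + SOFT_HYPHEN, "fall beside the well-", "known mill"]
print(join_lines(LINES))  # the old waterfall beside the well-known mill
```

Once the editor has decided each ambiguous case and encoded the decision in this way, the text can be reflowed or quoted without the ambiguity recurring.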

    4. Extra-Textual Materials
      1. Historical or critical essays and analyses, explanatory notes, glosses, etc., should, if present:
        1. be clearly separated from the textual essay and complement rather than duplicate information in the textual essay;
        2. dovetail smoothly with the textual essay;
        3. conform to a reasoned policy for length, placement, and content;
        4. be complete.

      2. Glossaries and proper-name tables or indices
        1. The rationale for determining entries should be clear and appropriate both to the text and to the audience envisaged.
        2. The format should be clear and uncluttered.
        3. Cross-references should be provided for entries having alternate spellings.
        4. To the extent possible such tables should be electronically generated on the basis of encoding.
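
A proper-name table generated from the encoding, with alternate spellings gathered under one entry, might be produced along the following lines. The Python sketch assumes well-formed XML in which each name element carries a key attribute holding the normalized form and each chapter is an unnested div with a number in its n attribute; these conventions and the sample text are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def name_index(xml_text):
    """Map each normalized name to the sorted chapter numbers where it occurs."""
    root = ET.fromstring(xml_text)
    index = defaultdict(set)
    for div in root.iter("div"):
        for name in div.iter("name"):
            # Use the normalized key when present, else the form in the text.
            key = name.get("key", name.text)
            index[key].add(div.get("n"))
    return {key: sorted(chapters) for key, chapters in sorted(index.items())}

# Hypothetical sample with one name spelled two ways.
SAMPLE_TEXT = """<body>
  <div n="1"><p><name key="Chaucer, Geoffrey">Chaucer</name> met
    <name key="Gower, John">Gower</name>.</p></div>
  <div n="2"><p><name key="Chaucer, Geoffrey">Geffrey Chaucer</name> wrote.</p></div>
</body>"""

print(name_index(SAMPLE_TEXT))
```

Here the two spellings "Chaucer" and "Geffrey Chaucer" fall under a single entry because both are encoded with the same key.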

  6. PREPARATION FOR PUBLICATION
    1. All necessary permissions to publish the material must be obtained from the owners and copyright holders.
    2. The editor and the publisher should agree on the encoding scheme and software to be used and the publisher should at an early stage see a sample.
    3. The editor and the publisher should understand one another's special requirements for publishing electronic scholarly editions, including:
      1. the particular design requirements of the formatted edition and, if applicable, the format of the series as a whole;
      2. special aspects of the production schedule, including:
        1. the amount of time to be allowed for multiple proofreadings and for necessary final collations.

    4. Proofreading
      1. Final responsibility for maintaining the accuracy of the text during production must be clearly assigned.
      2. Adequate resources should be allotted, and a comprehensive plan for proofreading should be developed, taking into account:
        1. how proof will be read -- by whom, how many times, and against what;
        2. which stages of proof will be read by the editor(s).

      3. Final collations or checks should be carried out to ensure that no unauthorized changes have been made in the final electronic files in proof. Spell checkers and word lists are useful for spotting anomalies but all changes must be verified.
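
One way to check that a final file is identical to the approved one is to compare message digests. The Python sketch below uses SHA-256 from the standard hashlib module; the choice of algorithm is an assumption of this example, and the time-stamping component mentioned earlier in these guidelines under Revision Description is not shown. The function names and sample text are hypothetical.

```python
import hashlib

def digest(data: bytes) -> str:
    """Return the SHA-256 digest of a file's contents as hexadecimal text."""
    return hashlib.sha256(data).hexdigest()

def unchanged(data: bytes, recorded_digest: str) -> bool:
    """True when the data still matches the digest recorded at approval."""
    return digest(data) == recorded_digest

APPROVED = b"Whan that Aprill with his shoures soote\n"
recorded = digest(APPROVED)  # stored when the text was approved

altered = APPROVED.replace(b"soote", b"sote")
print(unchanged(APPROVED, recorded))  # True
print(unchanged(altered, recorded))   # False
```

A mismatch shows only that something has changed. Locating the change still requires a collation or file comparison against the approved text.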

    5. Use of Electronic Files
      1. Since electronic files will be used for the formatted edition, the editor and publisher should agree about:
        1. the choice of software and platform, bearing in mind problems such as the linking of notes with text, nonstandard characters, etc. (ideally, an edition should be available on as many platforms as possible);
        2. the extent to which the encoding scheme chosen will allow or facilitate subsequent publication in other formats, e.g., print;
        3. who is responsible for inserting final changes or corrections in the file -- the editor, the publisher, or third-party technical staff.

      2. Arrangements should be made for retaining and archiving the electronic files.
      3. Consideration should be given to publication of the edition in a variety of formats, including print.
        1. If the electronic files are to be translated to a system that will drive the typesetting machinery for a subsidiary printed edition, the resulting proofs should be checked as they normally would.

    6. Indexing: in addition to full text retrieval software, consideration should be given to the encoding of items to be indexed (e.g., proper names), and appropriate software for retrieval of indexed items should be included.
    7. Reformatting: To facilitate reformatting, editors and publishers should consider:
      1. making archived electronic files available for reformatting;
      2. encoding the apparatus and editorial materials in such a way that they can easily be omitted, if desired, from reformatted versions;
      3. licensing libraries to extract data in order to integrate it into locally-based electronic text collections;
      4. facilitating extraction of the text in a variety of formats (e.g., SGML, non-encoded ASCII) so that scholars may use the text with other software packages or tools.
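
Extraction of the text as non-encoded ASCII might be sketched as follows in Python. The example assumes a well-formed XML source and reduces accented letters to their unaccented base letters, which is a lossy simplification chosen for illustration: characters with no ASCII decomposition are silently dropped, so an edition would need to document a fuller policy. The function name and sample are hypothetical.

```python
import unicodedata
import xml.etree.ElementTree as ET

def plain_ascii(xml_text):
    """Strip markup and reduce the text to unaccented 7-bit ASCII."""
    root = ET.fromstring(xml_text)
    # Concatenate all character data, then normalize runs of white space.
    text = " ".join("".join(root.itertext()).split())
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")

ENCODED = ("<p>The <name>Bront\u00eb</name> sisters\n"
           "   wrote <title>Jane Eyre</title>.</p>")
print(plain_ascii(ENCODED))  # The Bronte sisters wrote Jane Eyre.
```

The extracted text is a convenience copy for use with other tools; the encoded file remains the authoritative form of the edition.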

PLEASE SEND COMMENTS TO

The Electronic Scholarly Editions Listserv:

ese@ra.msstate.edu

For the Committee on Scholarly Editions

Charles B. Faulhaber
The Bancroft Library
University of California
Berkeley, CA 94720-6000

December 1, 1997

cfaulhab@library.berkeley.edu