Multext - Document MSG 1. Mtseg. Version 1.1. Last modified 05/05/1996

MtSeg

The Multext multilingual segmenter tools

Philippe Di Cristo, CNRS

The purpose of the segmenter is to split a text into words and special tokens such as abbreviations and numbers, as well as certain multi-word units, and to detect and mark sentence boundaries.
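
As a rough illustration of the tasks listed above, the following Python sketch splits a text into typed tokens, merges a known multi-word unit, and groups the tokens into sentences. It is not MtSeg's algorithm or interface: the function names, the token type labels, and the small abbreviation and multi-word lists are all hypothetical, chosen only to make the example self-contained.

```python
import re

# Hypothetical sample resources, for illustration only.
ABBREVIATIONS = {"Dr.", "e.g.", "etc."}
MULTIWORD_UNITS = [("in", "spite", "of")]
SENTENCE_END = {".", "!", "?"}

NUMBER_RE = re.compile(r"\d+(?:[.,]\d+)*")
PIECE_RE = re.compile(r"\d+(?:[.,]\d+)*|\w+|[^\w\s]")


def tokenize(text):
    """Return (type, text) pairs: ABBR, NUM, WORD or PUNCT."""
    tokens = []
    for chunk in text.split():
        # A whitespace-delimited chunk found in the list is kept whole.
        if chunk in ABBREVIATIONS:
            tokens.append(("ABBR", chunk))
            continue
        for piece in PIECE_RE.findall(chunk):
            if NUMBER_RE.fullmatch(piece):
                tokens.append(("NUM", piece))
            elif re.fullmatch(r"\w+", piece):
                tokens.append(("WORD", piece))
            else:
                tokens.append(("PUNCT", piece))
    return tokens


def merge_multiword(tokens):
    """Collapse token runs that match a listed multi-word unit."""
    merged, i = [], 0
    while i < len(tokens):
        for unit in MULTIWORD_UNITS:
            span = tokens[i:i + len(unit)]
            if tuple(t[1].lower() for t in span) == unit:
                merged.append(("MWU", " ".join(t[1] for t in span)))
                i += len(unit)
                break
        else:
            # No unit starts here, so keep the token unchanged.
            merged.append(tokens[i])
            i += 1
    return merged


def split_sentences(tokens):
    """Close a sentence after each sentence-final punctuation token."""
    sentences, current = [], []
    for tok in tokens:
        current.append(tok)
        if tok[0] == "PUNCT" and tok[1] in SENTENCE_END:
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    return sentences


def segment(text):
    return split_sentences(merge_multiword(tokenize(text)))


for sentence in segment("Dr. Smith paid 3.50 in spite of it. He left!"):
    print(sentence)
```

Because "Dr." is matched against the abbreviation list before any punctuation is split off, its full stop is not treated as a sentence boundary. A real segmenter needs far richer, language-specific resources than these toy lists.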

Annexes

Recently

News and Changes (last changes 05/05/1996)

Credits

The segmenter has been developed in the context of the MULTEXT project.

Author: Philippe Di Cristo

Laboratoire Parole et Langage

CNRS & Université de Provence
29, Avenue Robert Schuman
13621 Aix-en-Provence Cedex 1, France
tel: (+33) 04 42 95 36 34
fax: (+33) 04 42 59 50 96
e-mail: multext@lpl.univ-aix.fr
Other contributors: Various people have contributed to the conception, improvement and documentation of the segmenter.

Please send comments and suggestions to multext@lpl.univ-aix.fr.

This document is only a draft and should be cited as such. Creators of WWW documents pointing to it are warned that its content and location may change without notice. This document is provided as is, without any express or implied warranties. While every effort has been made to ensure the accuracy of the information it contains, the authors assume no responsibility for errors or omissions, or for damages resulting from the use of that information. Permission is granted to make and distribute verbatim copies of this document for non-commercial purposes, provided this copyright, disclaimer and permission notice are preserved on all copies.
