Multext - Document MSG 1. Mtseg. Version 1.1. Last modified 05/05/1996



The Multext multilingual segmenter tools

Philippe Di Cristo, CNRS

The purpose of the segmenter is to split a text into words and special tokens such as abbreviations and numbers, as well as certain multi-word units, and to detect and mark sentence boundaries.





The segmenter has been developed in the context of the MULTEXT project.


Philippe Di Cristo

Laboratoire Parole et Langage

CNRS & Université de Provence
29, Avenue Robert Schuman
13621 Aix-en-Provence Cedex 1, France
Tel: (+33) 04 42 95 36 34
Fax: (+33) 04 42 59 50 96

Other contributors

Various people have contributed to the conception, improvement and documentation of the segmenter.

Please send comments and suggestions to

This document is also available as a .
Mirror copies can be made if they respect the terms of the permission notice below.

Copyright © Centre National de la Recherche Scientifique, 1996.

This document is only a draft and should be cited as such. Creators of WWW documents pointing to it are warned that its content and location may change without notice. This document is provided as is without any express or implied warranties. While every effort has been taken to ensure the accuracy of the information contained, the authors assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein. Permission is granted to make and distribute verbatim copies of this document for non-commercial purposes provided this copyright, disclaimer and permission notice are preserved on all copies.


