Archive for the ‘Resources’ Category

Chinese Segmenter and Annotation Tool (Perl and Java)

Thursday, June 17th, 2010

I have also made available a Java version of the segmenter that works with text files encoded in Big5, GB2312, or UTF-8.

Usage: java -jar segmenter.jar [-b|-g|-8] inputfile.txt
Encoding flags: -b (Big5), -g (GB2312), -8 (UTF-8)
Segmented text will be saved to inputfile.txt.seg
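
For anyone wrapping the segmenter in their own scripts, here is a minimal sketch of how the three encoding flags and the output file name from the usage line could be mirrored in Java. It is not the code inside segmenter.jar: the class SegmenterArgs and its methods charsetForFlag and outputPathFor are hypothetical names chosen for illustration, and the mapping of -b and -g to Java's "Big5" and "GB2312" charsets is an assumption based on the flag descriptions above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of segmenter.jar.
class SegmenterArgs {

    // Map an encoding flag from the usage line to a Java Charset (assumed mapping).
    static Charset charsetForFlag(String flag) {
        switch (flag) {
            case "-b": return Charset.forName("Big5");
            case "-g": return Charset.forName("GB2312");
            case "-8": return StandardCharsets.UTF_8;
            default:
                throw new IllegalArgumentException("Unknown encoding flag: " + flag);
        }
    }

    // The segmenter saves its result next to the input, with ".seg" appended.
    static String outputPathFor(String inputPath) {
        return inputPath + ".seg";
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("Usage: java SegmenterArgs [-b|-g|-8] inputfile.txt");
            return;
        }
        Charset cs = charsetForFlag(args[0]);
        System.out.println("Read " + args[1] + " as " + cs.name()
                + ", expect output in " + outputPathFor(args[1]));
    }
}
```

Big5 and GB2312 are among the JDK's extended charsets, so a stripped-down runtime may not include them; in that case Charset.forName throws UnsupportedCharsetException.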
