<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>HFU AI Lab.</title>
	<atom:link href="http://ailab.mtir.net/?feed=rss2" rel="self" type="application/rss+xml" />
	<link>http://ailab.mtir.net</link>
	<description>AI, NLP, MT, IR</description>
	<lastBuildDate>Fri, 18 Jun 2010 08:09:36 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Protected: Chinese Word Segmentation Program</title>
		<link>http://ailab.mtir.net/?p=33</link>
		<comments>http://ailab.mtir.net/?p=33#comments</comments>
		<pubDate>Fri, 18 Jun 2010 08:09:36 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://ailab.mtir.net/?p=33</guid>
		<description><![CDATA[No excerpt is generated for protected posts.]]></description>
			<content:encoded><![CDATA[<form action="http://ailab.mtir.net/wp-pass.php" method="post">
<p>This post is password protected. Enter your password to read it.</p>
<p><label for="pwbox-33">Password:<br />
<input name="post_password" id="pwbox-33" type="password" size="20" /></label><br />
<input type="submit" name="Submit" value="Submit" /></p></form>
]]></content:encoded>
			<wfw:commentRss>http://ailab.mtir.net/?feed=rss2&amp;p=33</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Chinese Segmenter and Annotation Tool (Perl and Java)</title>
		<link>http://ailab.mtir.net/?p=27</link>
		<comments>http://ailab.mtir.net/?p=27#comments</comments>
		<pubDate>Thu, 17 Jun 2010 11:29:21 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[NLP]]></category>
		<category><![CDATA[Resources]]></category>
		<category><![CDATA[Chinese]]></category>
		<category><![CDATA[Java]]></category>
		<category><![CDATA[Perl]]></category>
		<category><![CDATA[Word Segmentation]]></category>

		<guid isPermaLink="false">http://ailab.mtir.net/?p=27</guid>
		<description><![CDATA[I have also made available a Java version of the segmenter that works with Big5, GB, and UTF-8 encoded text files.

Usage: java -jar segmenter.jar [-b&#124;-g&#124;-8] inputfile.txt
-b Big5, -g GB2312, -8 UTF-8
Segmented text will be saved to inputfile.txt.seg
]]></description>
			<content:encoded><![CDATA[<p>URL: <a href="http://www.mandarintools.com/segmenter.html">http://www.mandarintools.com/segmenter.html</a></p>
<p>You can download <a href="http://www.mandarintools.com/download/segment.zip">the zip file</a>, which contains four files. The first is the Perl script segment.pl, which takes one argument: the name of the source file to segment. It expects the file name to end with &#8220;.txt&#8221;. It needs the library file segmenter.pl, which contains the actual segmentation code. The program also expects to find the lexicon file wordlist.txt in the directory it is run from (though this is easily modified). It writes a new segmented file with &#8220;.txt&#8221; replaced by &#8220;.seg&#8221;. Right now it only works on GB-encoded files, but a Big5 version (converting to GB, segmenting, and using the segmented file to segment the original Big5 file) would not be hard to add. Also included is a convenience file, segment.bat, for people working in Windows. It runs perl on segment.pl and expects a file name as an argument.</p>
<p>The segmenter requires <a href="http://www.perl.com/">Perl</a> to run. It is a free and easily downloaded program.</p>
<p>I have also made available a <a href="http://www.mandarintools.com/download/segmenter.jar">Java version of the segmenter</a> that works with Big5, GB, and UTF-8 encoded text files.</p>
<blockquote><p>Usage: java -jar segmenter.jar [-b|-g|-8] inputfile.txt<br />
-b Big5, -g GB2312, -8 UTF-8<br />
Segmented text will be saved to inputfile.txt.seg</p></blockquote>
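<p>To call the Java segmenter from a script, the usage line above can be wrapped in a few lines of Python. This is only a sketch: the helper names (<code>build_command</code>, <code>expected_output</code>, <code>segment</code>) and the dictionary keys are made up for illustration, and it assumes <code>java</code> is on the PATH and segmenter.jar is in the current directory. Only the three flags and the .seg output name come from the usage text.</p>

```python
import subprocess
from pathlib import Path

# Flags come from the usage text; the dictionary keys are our own labels.
ENCODING_FLAGS = {"big5": "-b", "gb2312": "-g", "utf-8": "-8"}


def build_command(input_file: str, encoding: str, jar: str = "segmenter.jar") -> list[str]:
    """Return the argument list for one segmenter run."""
    flag = ENCODING_FLAGS[encoding.lower()]
    return ["java", "-jar", jar, flag, input_file]


def expected_output(input_file: str) -> Path:
    """The usage text says output is saved to the input name plus '.seg'."""
    return Path(input_file + ".seg")


def segment(input_file: str, encoding: str) -> Path:
    """Run the segmenter (requires Java and the jar) and return the output path."""
    subprocess.run(build_command(input_file, encoding), check=True)
    return expected_output(input_file)
```

<p>For example, <code>segment("inputfile.txt", "utf-8")</code> would run the jar with <code>-8</code> and return the path inputfile.txt.seg.</p>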
<p>Words can be added or deleted directly from the lexicon file. The segmenter has algorithms for grouping together the characters in a name, especially for Chinese and Western names, but Japanese and South-east Asian names may not work well yet.</p>
<p>The segmentation process is also a perfect time to identify interesting &#8220;entities&#8221; in the text. These could include dates, times, person names, locations, money amounts, organization names, and percentages. This collection of interesting nouns is often referred to as &#8220;named entities&#8221; and the process of identifying them as &#8220;named entity extraction&#8221;. There is already code to identify person names and number amounts in the segmenter, and I will be adding more code to find the rest in the future.</p>
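<p>As a rough illustration of how a few of these entity types could be spotted, here is a small regular-expression sketch in Python. It is not the code used in the segmenter; the patterns, labels, and the <code>find_entities</code> name are assumptions chosen for the example, and they cover only simple numeric dates, percentages, and money amounts.</p>

```python
import re

# Illustrative patterns only; real text needs far broader coverage.
PATTERNS = [
    ("DATE", re.compile(r"\d{4}年\d{1,2}月\d{1,2}日")),
    ("PERCENT", re.compile(r"\d+(?:\.\d+)?%")),
    ("MONEY", re.compile(r"\d+(?:\.\d+)?元")),
]


def find_entities(text: str) -> list[tuple[str, str, int]]:
    """Return (label, matched text, start offset) tuples sorted by offset."""
    found = []
    for label, pattern in PATTERNS:
        for match in pattern.finditer(text):
            found.append((label, match.group(), match.start()))
    return sorted(found, key=lambda item: item[2])
```

<p>Person, location, and organization names cannot be handled this way; they need lexicon lookups or name-specific rules such as the ones the segmenter already has for person names.</p>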
<p>The segmenter works with a version of the maximal matching algorithm. When looking for words, it attempts to match the longest word possible. This simple algorithm is surprisingly effective, given a large and diverse lexicon, but there also need to be ways of dealing with ambiguous word divisions, unknown proper names, and other words not in the lexicon. I currently have algorithms for finding names, and am researching ways to better handle ambiguous word boundaries and unknown words. One useful piece of additional knowledge would be a list of characters marked as bound or unbound: a segmentation that leaves a bound character by itself would not be allowed. A statistical way of choosing among ambiguous segmentations would also be useful.</p>
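<p>The core idea of maximal matching can be sketched in a few lines of Python. This is a plain greedy forward variant written for illustration, not the segmenter&#8217;s actual code: it has no name detection or ambiguity handling, and the function name <code>max_match</code> and the toy lexicon are invented for the example.</p>

```python
def max_match(text: str, lexicon: set[str]) -> list[str]:
    """Greedy forward maximal matching: take the longest lexicon word at each position."""
    longest = max([1] + [len(word) for word in lexicon])
    words = []
    pos = 0
    while pos != len(text):
        # Try the longest candidate first, shrinking until a lexicon word matches.
        for size in range(min(longest, len(text) - pos), 0, -1):
            candidate = text[pos:pos + size]
            # A single character is always accepted so unknown text still advances.
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                pos += size
                break
    return words


# Toy lexicon, invented for this example.
LEXICON = {"中文", "斷詞", "程式", "中文斷詞"}
print(" ".join(max_match("中文斷詞程式", LEXICON)))
```

<p>With this lexicon the greedy choice prefers the four-character entry over the two shorter words it contains. That is exactly where ambiguous word divisions come from, and why a larger lexicon alone does not remove the need for the extra handling described above.</p>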
<p>More information on segmenting Chinese text can be found at <a href="http://www.chinesecomputing.com/" target="_top">ChineseComputing.com</a>.</p>
<p>Contact Erik Peterson at <a href="http://www.mandarintools.com/contact.html">this contact page</a> with questions or comments. Please visit <a href="http://www.mandarintools.com/" target="_top">Online Chinese Tools</a> for many more useful Chinese-related software tools.</p>
]]></content:encoded>
			<wfw:commentRss>http://ailab.mtir.net/?feed=rss2&amp;p=27</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Techno Trade CGI Archive (PERL cgi scripts)</title>
		<link>http://ailab.mtir.net/?p=25</link>
		<comments>http://ailab.mtir.net/?p=25#comments</comments>
		<pubDate>Thu, 17 Jun 2010 11:20:03 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Source Code]]></category>
		<category><![CDATA[CGI]]></category>
		<category><![CDATA[Perl]]></category>

		<guid isPermaLink="false">http://ailab.mtir.net/?p=25</guid>
		<description><![CDATA[Techno Trade CGI Archive
URL: http://www.technotrade.com/cgi/
The following is a list of some Perl CGI scripts that we&#8217;ve created. Feel free to browse through them and see some examples of the scripts in action. Some are free; others require a license fee. For more information on installing these scripts on your server, please check the FAQ page.



This script [...]]]></description>
			<content:encoded><![CDATA[<p>Techno Trade CGI Archive</p>
<p>URL: <a href="http://www.technotrade.com/cgi/">http://www.technotrade.com/cgi/</a></p>
<p>The following is a list of some Perl CGI scripts that we&#8217;ve created. Feel free to browse through them and see some examples of the scripts in action. Some are free; others require a license fee. For more information on installing these scripts on your server, please check the FAQ page.</p>
<table border="0" cellspacing="0" cellpadding="9" width="100%">
<tbody>
<tr>
<td align="left" valign="top"><a href="http://www.technotrade.com/htaccess.html">htaccess</a></td>
<td align="left">This script manages multiple usernames and passwords for .htaccess/.htpasswd directory protection. It works with the Apache web server (tested on Unix systems) and can handle multiple password-protected directories.</td>
</tr>
<tr>
<td align="left" valign="top"><a href="http://www.technotrade.com/cgi/search.html"><span style="font-family: ARIAL, 'TIMES NEW ROMAN'; color: #a50000; font-size: small;"><strong>URL<br />
Search Engine</strong></span></a></td>
<td align="left">Feel like starting your own little Yahoo? That is basically what this script does. It searches a text database file for one or more keywords entered by the user and displays all the URLs that match the search query. It can also be adapted to search any type of text database file.</td>
</tr>
<tr>
<td align="left" valign="top"><a href="http://technotrade.com/password"><span style="font-family: ARIAL, 'TIMES NEW ROMAN'; color: #a50000; font-size: small;"><strong>Password<br />
Protector</strong></span></a></td>
<td align="left">This script is a <strong>very</strong> simple form of web site password protection. Instead of directing people to your main HTML file, you use this script as a filter that checks the user&#8217;s password before granting access. If the password is accepted, the script automatically takes the browser to the &#8220;protected&#8221; web page.</td>
</tr>
<tr>
<td align="left" valign="top"><a href="http://www.technotrade.com/cgi/engines/"><span style="font-family: ARIAL, 'TIMES NEW ROMAN'; color: #a50000; font-size: small;"><strong>Search Engine<br />
Redirector</strong></span></a></td>
<td align="left">Lets your visitors search the web from your site by selecting the search engine they want to use. It can also log what they&#8217;re searching for on each engine.</td>
</tr>
<tr>
<td align="left" valign="top"><a href="http://technotrade.com/board/index.html"><span style="font-family: ARIAL, 'TIMES NEW ROMAN'; color: #a50000; font-size: small;"><strong>Web Based<br />
Message Board</strong></span></a></td>
<td align="left">The Message Board script lets users who come to your site join a discussion group by reading and posting messages. Users get a friendly graphical interface, with icons for navigation, posting, and so on, that is very simple to use. The board doesn&#8217;t make a mess on your server by creating tons of directories and files each time a posting is made. Instead, it uses one text file for all the postings in a discussion group.</td>
</tr>
<tr>
<td align="left" valign="top"><a href="http://www.technotrade.com/cgi/jump.html"><span style="font-family: ARIAL, 'TIMES NEW ROMAN'; color: #a50000; font-size: small;"><strong>URL Jumper</strong></span></a></td>
<td align="left">Do you hate having to clog up or redesign your page each time you add a new URL link to another site or to another one of your web pages? The URL Jumper fixes that problem by organizing your links in a selection box, where the user just clicks on one and hits Go.</td>
</tr>
<tr>
<td align="left" valign="top"></td>
<td>Looking for more scripts?<br />
Be sure to visit <a href="http://www.cgi-resources.com/">The CGI Resource Index</a>.</td>
</tr>
</tbody>
</table>
]]></content:encoded>
			<wfw:commentRss>http://ailab.mtir.net/?feed=rss2&amp;p=25</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>UCI Machine Learning Repository</title>
		<link>http://ailab.mtir.net/?p=22</link>
		<comments>http://ailab.mtir.net/?p=22#comments</comments>
		<pubDate>Wed, 19 May 2010 03:37:09 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[ML]]></category>
		<category><![CDATA[DM]]></category>

		<guid isPermaLink="false">http://ailab.mtir.net/?p=22</guid>
		<description><![CDATA[UCI Machine Learning Repository
Welcome to the UC Irvine Machine Learning Repository!
We currently maintain 189 data sets as a service to the machine learning community. You may view all data sets through our searchable interface. Our old web site is still available, for those who prefer the old format. For a general overview of the Repository, please visit our About [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.ics.uci.edu/~mlearn/MLRepository.html">UCI Machine Learning Repository</a></p>
<p><strong>Welcome to the UC Irvine Machine Learning Repository!</strong></p>
<p>We currently maintain 189 data sets as a service to the machine learning community. You may <strong><a href="http://archive.ics.uci.edu/ml/datasets.html">view all data sets</a></strong> through our searchable interface. Our <a href="http://mlearn.ics.uci.edu/MLRepository.html">old web site</a> is still available, for those who prefer the old format. For a general overview of the Repository, please visit our <a href="http://archive.ics.uci.edu/ml/about.html">About page</a>. For information about citing data sets in publications, please read our <a href="http://archive.ics.uci.edu/ml/citation_policy.html">citation policy</a>. If you wish to donate a data set, please consult our <a href="http://archive.ics.uci.edu/ml/donation_policy.html">donation policy</a>. For any other questions, feel free to <a href="http://archive.ics.uci.edu/ml/contact.html">contact the Repository librarians</a>. We have also set up a <a href="http://mlr.cs.umass.edu/ml/">mirror site</a> for the Repository.</p>
]]></content:encoded>
			<wfw:commentRss>http://ailab.mtir.net/?feed=rss2&amp;p=22</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Data Mining Tools See5 and C5.0</title>
		<link>http://ailab.mtir.net/?p=20</link>
		<comments>http://ailab.mtir.net/?p=20#comments</comments>
		<pubDate>Wed, 19 May 2010 03:33:31 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[ML]]></category>
		<category><![CDATA[DM]]></category>

		<guid isPermaLink="false">http://ailab.mtir.net/?p=20</guid>
		<description><![CDATA[Data Mining Tools See5 and C5.0

URL: http://www.rulequest.com/see5-info.html
sample applications
tutorial
published applications

]]></description>
			<content:encoded><![CDATA[<h1><span style="color: navy;">Data Mining Tools See5 and C5.0</span></h1>
<ul>
<li><span style="color: #000080;">URL: <a title="http://www.rulequest.com/see5-info.html" href="http://www.rulequest.com/see5-info.html">http://www.rulequest.com/see5-info.html</a></span></li>
<li><a href="http://www.rulequest.com/see5-examples.html">sample applications</a></li>
<li><a href="http://www.rulequest.com/see5-win.html">tutorial</a></li>
<li><a href="http://www.rulequest.com/see5-pubs.html">published applications</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://ailab.mtir.net/?feed=rss2&amp;p=20</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Weka 3: Data Mining Software in Java</title>
		<link>http://ailab.mtir.net/?p=12</link>
		<comments>http://ailab.mtir.net/?p=12#comments</comments>
		<pubDate>Wed, 19 May 2010 03:11:30 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[ML]]></category>
		<category><![CDATA[DM]]></category>

		<guid isPermaLink="false">http://ailab.mtir.net/?p=12</guid>
		<description><![CDATA[Weka


http://www.cs.waikato.ac.nz/ml/weka/
http://sourceforge.net/projects/weka/


DataSet


http://www.cs.waikato.ac.nz/~ml/weka/index_datasets.html
http://www.public.asu.edu/~sji03/resources/index.html
http://datam.i2r.a-star.edu.sg/datasets/krbd/


Repository for Epitope Datasets (RED)


http://ailab.cs.iastate.edu/red/


Tutorial


Weka Chinese Forum ‧ View topic &#8211; [Original] WEKA beginner&#8217;s tutorial

Weka-ExplorerGuide-3.5.5

]]></description>
			<content:encoded><![CDATA[<div id="_mcePaste">Weka</div>
<div id="_mcePaste">
<ul>
<li>http://www.cs.waikato.ac.nz/ml/weka/</li>
<li>http://sourceforge.net/projects/weka/</li>
</ul>
</div>
<div id="_mcePaste">DataSet</div>
<div id="_mcePaste">
<ul>
<li>http://www.cs.waikato.ac.nz/~ml/weka/index_datasets.html</li>
<li>http://www.public.asu.edu/~sji03/resources/index.html</li>
<li>http://datam.i2r.a-star.edu.sg/datasets/krbd/</li>
</ul>
</div>
<div id="_mcePaste">Repository for Epitope Datasets (RED)</div>
<div id="_mcePaste">
<ul>
<li>http://ailab.cs.iastate.edu/red/</li>
</ul>
</div>
<div id="_mcePaste">Tutorial</div>
<div id="_mcePaste">
<ul>
<li><a title="Weka Chinese Forum - [Original] WEKA beginner's tutorial" href="http://forum.wekacn.org/viewtopic.php?f=2&amp;t=9&amp;sid=3e11f64d53cf134215bd69450412cdb91">Weka Chinese Forum ‧ View topic &#8211; [Original] WEKA beginner&#8217;s tutorial</a></li>
</ul>
<p><a href="http://ailab.mtir.net/wp-content/uploads/2010/05/Weka-ExplorerGuide-3.5.5.pdf">Weka-ExplorerGuide-3.5.5</a></p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://ailab.mtir.net/?feed=rss2&amp;p=12</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Six Laundry Detergents That Passed the Consumers' Foundation Tests</title>
		<link>http://ailab.mtir.net/?p=8</link>
		<comments>http://ailab.mtir.net/?p=8#comments</comments>
		<pubDate>Sun, 03 Jan 2010 07:43:24 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[Laundry Detergent]]></category>
		<category><![CDATA[Consumers' Foundation]]></category>
		<category><![CDATA[Survey Report]]></category>

		<guid isPermaLink="false">http://ailab.mtir.net/?p=8</guid>
		<description><![CDATA[On May 16, 2008, the Consumers&#8217; Foundation released a survey report on laundry detergents.]]></description>
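			<content:encoded><![CDATA[<p>On May 16, 2008, the Consumers&#8217; Foundation released a survey report on laundry detergents, and the report offers a good deal of information. The foundation tested 20 laundry detergents in total; the findings are summarized below:</p>
<p>1. Laundry detergents that passed testing with no problems (safe to buy):</p>
<ul>
<li>一匙靈 color-brightening laundry liquid (NT$130)</li>
<li>藍寶 low-suds concentrated laundry liquid (NT$109)</li>
<li>白蘭 phosphate-free ultra-concentrated laundry liquid (NT$119)</li>
<li>妙管家 concentrated laundry lotion (NT$109)</li>
<li>白鴿 anti-dust-mite natural concentrated laundry liquid (NT$159)</li>
<li>台麗 laundry powder (NT$34)</li>
</ul>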
<p>cf: <a href="http://www.sharecool.org/archives/410">http://www.sharecool.org/archives/410</a></p>
]]></content:encoded>
			<wfw:commentRss>http://ailab.mtir.net/?feed=rss2&amp;p=8</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Meeting Materials</title>
		<link>http://ailab.mtir.net/?p=3</link>
		<comments>http://ailab.mtir.net/?p=3#comments</comments>
		<pubDate>Sun, 03 Jan 2010 07:08:06 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Meeting]]></category>
		<category><![CDATA[Notice]]></category>
		<category><![CDATA[Meetings]]></category>
		<category><![CDATA[Papers]]></category>

		<guid isPermaLink="false">http://ailab.mtir.net/?p=3</guid>
		<description><![CDATA[Meeting Materials

Title: 2010&#8242;0102 Paper title
Uploads: paper file (doc/pdf), slides (ppt), &#8230;
Link: URL

]]></description>
			<content:encoded><![CDATA[<p>Meeting Materials</p>
<ul>
<li>Title: 2010&#8242;0102 Paper title</li>
<li>Uploads: paper file (doc/pdf), slides (ppt), &#8230;</li>
<li>Link: URL</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://ailab.mtir.net/?feed=rss2&amp;p=3</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Hello world!</title>
		<link>http://ailab.mtir.net/?p=1</link>
		<comments>http://ailab.mtir.net/?p=1#comments</comments>
		<pubDate>Sat, 02 Jan 2010 19:22:44 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[News]]></category>

		<guid isPermaLink="false">http://ailab.mtir.net/?p=1</guid>
		<description><![CDATA[Welcome to WordPress. This is your first post. Edit or delete it, then start blogging!
]]></description>
			<content:encoded><![CDATA[<p>Welcome to WordPress. This is your first post. Edit or delete it, then start blogging!</p>
]]></content:encoded>
			<wfw:commentRss>http://ailab.mtir.net/?feed=rss2&amp;p=1</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
