A compact Chinese word-segmentation lexicon I compiled over the past few days, available for download

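I have been optimizing my Chinese word segmenter for the past few days, and the lexicon has always been a problem, so I re-compiled several popular lexicons from the web. Here is a fairly compact one to start with (92,984 entries), with part-of-speech tags and tf/idf frequency statistics.
The rough set of part-of-speech tags can be read from the code below. The compiled lexicon file can be downloaded from the attachment; it is encoded in UTF-8 and is well suited to small projects. Larger lexicons are still being compiled.
Download file: 带词性和tf/idf词频统计小巧中文分词词库.rar (771.84 KB, downloaded 4455 times)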
            HtPos.Add("a", "1073741824");
            HtPos.Add("b", "536870912");
            HtPos.Add("c", "268435456");
            HtPos.Add("d", "134217728");
            HtPos.Add("e", "67108864");
            HtPos.Add("f", "33554432");
            HtPos.Add("i", "16777216");
            HtPos.Add("l", "8388608");
            HtPos.Add("m", "4194304");
            HtPos.Add("mq", "2097152");
            HtPos.Add("n", "1048576");
            HtPos.Add("o", "524288");
            HtPos.Add("p", "262144");
            HtPos.Add("q", "131072");
            HtPos.Add("r", "65536");
            HtPos.Add("s", "32768");
            HtPos.Add("t", "16384");
            HtPos.Add("u", "8192");
            HtPos.Add("v", "4096");
            HtPos.Add("x", "1024");
            HtPos.Add("y", "512");
            HtPos.Add("z", "256");
            HtPos.Add("nr", "128");
            HtPos.Add("ns", "64");
            HtPos.Add("nt", "32");
            HtPos.Add("nz", "8");
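
Every value in the HtPos table above is a distinct power of two, so each tag occupies one bit of an integer. The following is a minimal Python sketch (not the author's C# code) of how such flags could be combined and tested. The helper names `encode_pos` and `has_pos` are hypothetical, and the idea that a lexicon entry stores several tags as one OR-ed mask is an assumption that the post does not confirm.

```python
# POS tag -> bit flag, copied from the HtPos table (strings parsed as integers).
POS_FLAGS = {
    "a": 1073741824, "b": 536870912, "c": 268435456, "d": 134217728,
    "e": 67108864, "f": 33554432, "i": 16777216, "l": 8388608,
    "m": 4194304, "mq": 2097152, "n": 1048576, "o": 524288,
    "p": 262144, "q": 131072, "r": 65536, "s": 32768,
    "t": 16384, "u": 8192, "v": 4096, "x": 1024,
    "y": 512, "z": 256, "nr": 128, "ns": 64, "nt": 32, "nz": 8,
}


def is_single_bit(value):
    """Return True if exactly one bit is set, i.e. value is a power of two."""
    return value > 0 and (value & (value - 1)) == 0


def encode_pos(tags):
    """Hypothetical helper: OR the flags of several tags into one mask."""
    mask = 0
    for tag in tags:
        mask |= POS_FLAGS[tag]
    return mask


def has_pos(mask, tag):
    """Hypothetical helper: test whether the bit for `tag` is set in `mask`."""
    return (mask & POS_FLAGS[tag]) != 0


if __name__ == "__main__":
    noun_or_verb = encode_pos(["n", "v"])  # assumed: a word usable as noun and verb
    print(noun_or_verb, has_pos(noun_or_verb, "n"), has_pos(noun_or_verb, "a"))
```

Because no two tags share a bit, a mask built this way can be decoded without ambiguity.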
 
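            HtPosCsw.Add("a", "形容词");      // adjective
            HtPosCsw.Add("b", "区别词");      // distinguishing word
            HtPosCsw.Add("c", "连词");        // conjunction
            HtPosCsw.Add("d", "副词");        // adverb
            HtPosCsw.Add("e", "叹词");        // interjection
            HtPosCsw.Add("f", "方位词");      // locative word
            HtPosCsw.Add("i", "成语");        // idiom
            HtPosCsw.Add("l", "习惯用语");    // set phrase
            HtPosCsw.Add("m", "数词");        // numeral
            HtPosCsw.Add("mq", "数量词");     // numeral-classifier compound
            HtPosCsw.Add("n", "名词");        // noun
            HtPosCsw.Add("o", "拟声词");      // onomatopoeia
            HtPosCsw.Add("p", "介词");        // preposition
            HtPosCsw.Add("q", "量词");        // classifier (measure word)
            HtPosCsw.Add("r", "代词");        // pronoun
            HtPosCsw.Add("s", "处所词");      // place word
            HtPosCsw.Add("t", "时间词");      // time word
            HtPosCsw.Add("u", "助词");        // particle
            HtPosCsw.Add("v", "动词");        // verb
            HtPosCsw.Add("x", "非语素字");    // non-morpheme character
            HtPosCsw.Add("y", "语气词");      // modal particle
            HtPosCsw.Add("z", "状态词");      // stative word
            HtPosCsw.Add("nr", "人名");       // personal name
            HtPosCsw.Add("ns", "地名");       // place name
            HtPosCsw.Add("nt", "机构团体");   // organization
            HtPosCsw.Add("nz", "其他专名");   // other proper noun
            HtPosCsw.Add("un", "未知词性");   // unknown part of speech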
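
Read together, the two tables above map a tag to a bit flag and a tag to a Chinese display name. Here is a small Python sketch (not the author's C# code) that decodes an integer mask back into tags and names. The function name `describe_pos` is hypothetical. Falling back to "un" when no known bit is set is also an assumption, based only on "un" having a name in HtPosCsw but no flag in HtPos.

```python
# (tag, flag, Chinese name) triples merged from the HtPos and HtPosCsw tables.
POS_TABLE = [
    ("a", 1073741824, "形容词"), ("b", 536870912, "区别词"),
    ("c", 268435456, "连词"), ("d", 134217728, "副词"),
    ("e", 67108864, "叹词"), ("f", 33554432, "方位词"),
    ("i", 16777216, "成语"), ("l", 8388608, "习惯用语"),
    ("m", 4194304, "数词"), ("mq", 2097152, "数量词"),
    ("n", 1048576, "名词"), ("o", 524288, "拟声词"),
    ("p", 262144, "介词"), ("q", 131072, "量词"),
    ("r", 65536, "代词"), ("s", 32768, "处所词"),
    ("t", 16384, "时间词"), ("u", 8192, "助词"),
    ("v", 4096, "动词"), ("x", 1024, "非语素字"),
    ("y", 512, "语气词"), ("z", 256, "状态词"),
    ("nr", 128, "人名"), ("ns", 64, "地名"),
    ("nt", 32, "机构团体"), ("nz", 8, "其他专名"),
]

# "un" has a display name but no bit flag; using it as the fallback is an assumption.
UNKNOWN = ("un", "未知词性")


def describe_pos(mask):
    """Hypothetical helper: list (tag, name) pairs whose bit is set in `mask`."""
    found = [(tag, name) for tag, flag, name in POS_TABLE if mask & flag]
    return found or [UNKNOWN]


if __name__ == "__main__":
    for tag, name in describe_pos(1048576 | 4096):  # noun bit plus verb bit
        print(tag, name)
    print(describe_pos(0))
```

Results come back in table order, so the highest flag value is listed first.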

Comments: 11 | Trackbacks: 0 | Views: 8796
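jszj [ 2008-12-01 15:02 ]
What are the things like 13.53 and 8.30 in the lexicon?
DGQ2010 [ 2010-03-04 02:14 ]
How do I use your lexicon? Any guidance would be appreciated! My email: dinggangqiang@163.com
zhen [ 2010-04-13 17:42 ]
Hello, your lexicon is very good, but I can't quite make sense of the part-of-speech tags. Could you explain them?
hanyu332@163.com [ 2010-08-25 16:51 ]
Same question as above: could you explain what the numbers attached to the words are for?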