Additional dictionaries for Mozc

(last updated: 20160524)

Index

Mozc UT Dictionary (20160524)
Mozc NEologd UT Dictionary (20160524)
Note: You can't use these dictionaries at the same time.

Mozc UT Dictionary

It adds over 580,000 words to Mozc.

It is built from modified versions of the dictionaries listed below.
Many thanks to their authors and maintainers.

■ Default dictionaries
・alt-cannadic
・Japanese names (I wrote it)
・hatena keywords
・SKK-JISYO.L
・EDICT
・station names

・katakana-English dictionary generated from EDICT
 e.g.
 Type "いんたーねっと" and press space ⇨ Internet
 If you don't want to use it, run
 $ /usr/lib/mozc/mozc_tool --mode=config_dialog
 and uncheck "Katakana to English conversion" in the "Dictionary" tab.

・zip code dictionary and place names generated from Japan Post's zip code data
 e.g.
 Type "920-2368" and press space ⇨ 石川県白山市出合町
 Type "はくさんしで" ⇨ 白山市出合町 will be suggested.
 If you need the latest zip codes and place names,
 apply the mozcdic-ut patch to the official Mozc (see "Advanced").
 Doing so downloads the latest zip code file from Japan Post.

■ Optional dictionaries (See "Advanced")
・English-Japanese dictionary generated from Japanese WordNet
 e.g.
 Press Caps Lock, type "dolphin" and press Tab.

 If you need a text dictionary file, check this page.
 http://www.geocities.jp/ep3797/wordnet-ejdic_01.html

・niconico daihyakka IME dictionary

■ License
altcanna, jinmei, skk: GPL
hatena: unknown
EDICT: Creative Commons Attribution-ShareAlike Licence (V3.0)
ekimei: redistributable
zip code: public domain
Japanese WordNet: http://nlpwww.nict.go.jp/wn-ja/license.txt
niconico: unknown
ruby/shell scripts: GPL

I think we can redistribute hatena's yomigana-hyouki (reading and written form) pairs,
but I don't believe we can redistribute niconico's pairs.
If you want to make a redistributable mozc-ut,
leave #NICODIC="true" commented out in generate-mozc-ut.sh.
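
Before packaging a build for redistribution, it may be worth checking that the niconico option is still disabled. This is only a minimal sketch: it assumes the option appears in generate-mozc-ut.sh at the start of a line, exactly as NICODIC="true" with or without a leading "#". The function name nicodic_enabled is made up for this example and is not part of mozcdic-ut.

```shell
# Hypothetical helper: exit status 0 if NICODIC="true" is active in the
# given script, i.e. the line starts without a leading "#".
nicodic_enabled() {
    grep -q '^NICODIC="true"' "$1"
}

# Usage, run next to generate-mozc-ut.sh:
#   nicodic_enabled generate-mozc-ut.sh && echo "do not redistribute this build"
```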

■ Download
Patched source code (EJDIC="false", NICODIC="false"):
mozc-ut-2.17.2322.102.20160524.tar.xz

Patch:
mozcdic-ut-20160524.tar.bz2

■ Install
See mozc's official "LinuxBuildInstructions".

If you are using Arch Linux (tested on Antergos Linux),
you can make and install packages as follows:
$ mkdir mozc-tmp
$ mv mozc-ut-2.17.2322.102.20160524.tar.xz mozc-tmp/
$ cd mozc-tmp/
$ tar xf mozc-ut-2.17.2322.102.20160524.tar.xz
$ cp mozc-ut-2.17.2322.102.20160524/PKGBUILD .
$ makepkg -f
$ makepkg -i

■ Advanced: Generate your mozc-ut
1. Get the official Mozc source files.
$ mkdir mozc-tmp
$ mv mozcdic-ut-20160524.tar.bz2 mozc-tmp/
$ cd mozc-tmp/
$ tar xf mozcdic-ut-20160524.tar.bz2
$ cd mozcdic-ut-20160524
$ ./get-latest-mozc.sh
$ mv mozc-2.17.2322.102.tar.bz2 ..

2. Select optional dictionaries.
Open "generate-mozc-ut.sh".

If you want to use an English-Japanese dictionary,
uncomment the following line.
#EJDIC="true"

If you want to use a niconico dictionary,
uncomment the following line.
#NICODIC="true"
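
If you prefer not to edit the file by hand, the options can also be switched on from the shell. This is a minimal sketch, assuming GNU sed and that each option sits at the start of its own line exactly as shown above. enable_option is a hypothetical helper name, not something shipped with mozcdic-ut.

```shell
# Hypothetical helper: uncomment a line such as #EJDIC="true" in a script.
# Usage: enable_option NAME FILE
enable_option() {
    # Remove the leading "#" only from a line that starts with #NAME="true".
    sed -i "s/^#\($1=\"true\"\)/\1/" "$2"
}

# Usage, run in the directory that holds generate-mozc-ut.sh:
#   enable_option EJDIC generate-mozc-ut.sh
#   enable_option NICODIC generate-mozc-ut.sh
```

Lines that do not match are left untouched, so running the helper twice is harmless.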

3. Generate a mozc-ut dictionary.
You need ruby > 1.9.
$ ./generate-mozc-ut.sh
Wait for a few minutes.

4. Install mozc-ut.
$ cd ../mozc-ut-2.17.2322.102.20160524/
Then build mozc-ut (see "Install").

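■ Inclusion criteria
I look up the search hit count of each word and include the words that get at least a certain number of hits.
Besides the hit count, I apply various other conditions,
so that the dictionary gains more words while breaking Mozc's conversion as little as possible.

The criteria for personal names are loose so that as many names as possible are included,
but other words are included conservatively.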

■ Dictionary format
Fields, in order: reading, part of speech, hit count, written form
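
To illustrate how the four fields can be used, here is a rough sketch that keeps only the entries whose hit count reaches a minimum. It assumes one entry per line with tab-separated fields, which is a guess about the file layout. The name filter_by_hits and the threshold are hypothetical and are not the values used to build mozc-ut.

```shell
# Hypothetical helper: print entries whose hit count (3rd field) is at
# least MIN. Assumes one entry per line and tab-separated fields.
# Usage: filter_by_hits MIN < dictionary_file
filter_by_hits() {
    awk -F '\t' -v min="$1" '$3 + 0 >= min + 0'
}
```

The "+ 0" forces a numeric comparison, so a count such as 900 is not compared as a string against 1000.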

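Hit counts are normalized to the scale on which "キーボード" (keyboard) gets 750,000 hits.
For example, if "冷蔵庫" (refrigerator) gets 4,000,000 hits while "キーボード" gets 1,500,000 hits,
the count for "冷蔵庫" is corrected to 2,000,000.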
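
The correction is a simple rescaling: the raw count times 750,000, divided by the raw count for "キーボード". A minimal sketch in shell arithmetic follows. normalize_hits is a hypothetical name, and the real scripts may compute this differently (for example with rounding instead of integer truncation).

```shell
# Hypothetical helper: rescale a raw hit count so that the reference word
# "キーボード" would score 750,000 hits. Integer division truncates.
# Usage: normalize_hits WORD_HITS KEYBOARD_HITS
normalize_hits() {
    echo $(( $1 * 750000 / $2 ))
}

normalize_hits 4000000 1500000   # prints 2000000
```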

The hit counts obtained vary widely over time,
so they are only good for excluding obscure words.



Mozc NEologd UT Dictionary

I modified mecab-ipadic-NEologd's "user-dict" for Mozc.
https://github.com/neologd/mecab-ipadic-neologd/tree/master/seed

The "user-dict" is updated automatically twice a week(!), on Monday and Thursday.
mecab-ipadic-NEologd was written by Toshinori Sato (@overlast).
Thank you so much!

mozcdic-neologd-ut will add over 690,000 words.

■ Download
Patched source code:
mozc-neologd-ut-2.17.2322.102.20160524.1.tar.xz
It includes zip code data.
e.g. "110-0001" is converted to "東京都台東区谷中".

Patch:
mozcdic-neologd-ut-20160524.1.tar.bz2

■ Install
See mozc's official "LinuxBuildInstructions".

If you are using Arch Linux (tested on Antergos Linux),
you can make and install packages as follows:
$ mkdir mozc-tmp
$ mv mozc-neologd-ut-2.17.2322.102.20160524.1.tar.xz mozc-tmp/
$ cd mozc-tmp/
$ tar xf mozc-neologd-ut-2.17.2322.102.20160524.1.tar.xz
$ cp mozc-neologd-ut-2.17.2322.102.20160524.1/PKGBUILD .
$ makepkg -f
$ makepkg -i

■ Advanced: Generate your mozc-neologd-ut
1. Get the latest mecab-user-dict-seed.*.csv.xz
https://github.com/neologd/mecab-ipadic-neologd/tree/master/seed

2. Put it into mecab-ipadic-neologd/
$ rm mecab-ipadic-neologd/mecab-user-dict-seed.20160524.csv.xz
$ mv mecab-user-dict-seed.*.csv.xz mecab-ipadic-neologd/

3. Get the latest Mozc
$ ./get-latest-mozc.sh
$ mv mozc-*.tar.bz2 ..

4. Change version numbers
$ leafpad generate-mozc-neologd-ut.sh
Change the "MOZCVER" and "DICVER" version numbers.
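
The same change can be made without opening an editor. This is only a sketch, assuming GNU sed and that both variables are assigned at the start of a line as MOZCVER=... and DICVER=... in the script; I have not shown the real lines here, so check them first. set_var is a hypothetical helper name.

```shell
# Hypothetical helper: rewrite a top-level NAME=... assignment in a script.
# VALUE must not contain "/" or "&", which are special to sed.
# Usage: set_var NAME VALUE FILE
set_var() {
    sed -i "s/^$1=.*/$1=\"$2\"/" "$3"
}

# Usage with placeholder values; substitute the versions you downloaded:
#   set_var MOZCVER "<mozc version>" generate-mozc-neologd-ut.sh
#   set_var DICVER "<seed date>" generate-mozc-neologd-ut.sh
```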

5. Generate mozc-neologd-ut
$ ./generate-mozc-neologd-ut.sh
