Mozc UT2 Dictionary

20171002

Mozc NEologd UT Dictionary is here.
Mozc UT Dictionary (Discontinued) is here.


I lost a disk partition that includes tools for making mozc-ut dictionary.
I used yahoo or google's "hit numbers" to sort words in mozc-ut1,
but I can't do it again.
They don't provide free fast search API now.
I wrote mozc-ut2 from scratch.
I splitted Wikipedia's articles into 1 million files and got hit numbers by Hyper Estraier.
mozc-ut2 will add over 500,000 words.

Default entries

My big thanks go to the authors/maintainers.

Optional entries (See "Advanced: Add optional entries")

License


I think we can redistribute hatena's yomigana-hyouki pairs,
but I can't believe we can redistribute niconico's pairs.
If you want to make redistributable mozc-ut,
don't uncomment #NICODIC="true" in generate-dictionary.sh.

Download

https://osdn.net/users/utuhiro/pf/utuhiro/files/
Patched source code:
mozc-ut2-ver.date.tar.xz
Patch:
mozcdic-ut2-date.tar.bz2

Install

See mozc's official Build Instructions.
If you are using Arch Linux (tested on Antergos Linux),
you can make and install packages as follows:

$ mkdir mozc-tmp
$ mv mozc-ut2-ver.date.tar.xz mozc-tmp/
$ cd mozc-tmp/
$ tar xf mozc-ut2-ver.date.tar.xz
$ cp mozc-ut2-ver.date/PKGBUILD .
$ makepkg -f
$ makepkg -i

Advanced: Add optional entries

  1. Get the latest Mozc.

    $ mkdir mozc-tmp
    $ mv mozcdic-ut2-date.tar.bz2 mozc-tmp/
    $ cd mozc-tmp/
    $ tar xf mozcdic-ut2-date.tar.bz2
    $ cd mozcdic-ut2-date/src/
    $ sh get-latest-mozc.sh
    $ mv mozc-ver.tar.bz2 ../..
    $ cd ..
  2. Choose optional entries.

    $ leafpad generate-dictionary.sh

    If you want to use an English-Japanese dictionary, uncomment the following line.

    #EJDIC="true"

    If you want to use a niconico dictionary, uncomment the following line.

    #NICODIC="true"
  3. Generate mozc-ut.
    You need ruby > 1.9.

    $ ./generate-dictionary.sh

Advanced: Refresh hit numbers with the latest Japanese Wikipedia articles

  1. Install ruby and gcc-5.4.0.
    estcmd built with gcc-7.2.0 caused segfault.
    I sent mails to the author, but I couldn't get a reply.

    $ pacman -S ruby gcc5
  2. Install QDBM and Hyper Estraier.
    I use Hyper Estraier to get hit numbers.

    $ wget http://fallabs.com/qdbm/qdbm-1.8.78.tar.gz
    $ tar xf qdbm-1.8.78.tar.gz
    $ cd qdbm-1.8.78/
    $ ./configure --prefix=/usr --enable-zlib
    $ make -j4 CC=/usr/bin/gcc-5
    $ sudo make install
    $ wget http://fallabs.com/hyperestraier/hyperestraier-1.4.13.tar.gz
    $ tar xf hyperestraier-1.4.13.tar.gz
    $ cd hyperestraier-1.4.13/
    $ ./configure --prefix=/usr --enable-zlib
    $ make -j4 CC=/usr/bin/gcc-5
    $ sudo make install
    $ cd ../..
  3. Put mozcdic-ut2 into mozc-tmp.

    $ mkdir -p mozc-tmp
    $ mv mozcdic-ut2-date.tar.bz2 mozc-tmp/
    $ cd mozc-tmp/
    $ tar xf mozcdic-ut2-date.tar.bz2
  4. Get alt-cannadic.
    Get alt-cannadic-110208.tar.bz2.

    $ mv alt-cannadic-110208.tar.bz2 mozcdic-ut2-date/alt-cannadic/
  5. Change SEEDVER of mecab-user-dict-seed.
    Check mecab-user-dict-seed.yyyymmdd.csv.xz and
    change SEEDVER in neologd/generate-dictionary.sh.

    $ cd mozcdic-ut2-date/neologd/
    $ leafpad generate-dictionary.sh

    Change SEEDVER.

  6. Change MOZCVER and DICVER.

    $ cd ../
    $ leafpad generate-dictionary.sh

    Change MOZCVER and DICVER.

    $ cd src/
    $ leafpad generate-release.sh

    Change DICVER.

  7. Refresh hit numbers with the latest Japanese Wikipedia articles.

    $ sh update-dictionary.sh


HOME