Mozc UT2 Dictionary

20171002

Mozc NEologd UT Dictionary is here.
Mozc UT Dictionary (Discontinued) is here.

Contents

Second mozc-ut
Default entries
Optional entries
License
Download
Install
Advanced: Add optional entries
Advanced: Refresh hit numbers with the latest Japanese Wikipedia articles

Second mozc-ut

I lost the disk partition that held the tools for building the mozc-ut dictionary.
For mozc-ut1 I sorted words by the “hit numbers” returned by Yahoo and Google,
but I can’t do that again because they no longer provide a free search API.
So I wrote mozc-ut2 from scratch.
I split Wikipedia’s articles into 1 million files and got hit numbers with Hyper Estraier.
mozc-ut2 adds over 500,000 words.
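
The hit-number idea can be sketched in a few lines of shell. This is not the tooling shipped with mozc-ut2: it swaps Hyper Estraier for plain grep, and the function names count_hits and rank_words are hypothetical. It only illustrates counting how many article files contain each word and ordering the words by that count.

```shell
#!/bin/sh
# Illustrative sketch only: plain grep stands in for Hyper Estraier here.

# count_hits WORD DIR: print how many files under DIR contain WORD.
# -F matches WORD as a fixed string, -l lists each matching file once.
# -r is not in POSIX but is supported by GNU, BSD and BusyBox grep.
count_hits() {
    { grep -rlF -- "$1" "$2" 2>/dev/null || true; } | wc -l | tr -d ' '
}

# rank_words DIR: read one word per line from stdin and print
# "hits<TAB>word", highest hit number first.
rank_words() {
    dir=$1
    while IFS= read -r word; do
        [ -n "$word" ] || continue
        printf '%s\t%s\n' "$(count_hits "$word" "$dir")" "$word"
    done | sort -t "$(printf '\t')" -k1,1nr -k2,2
}
```

With the articles stored one per file under a directory, something like `rank_words articles/ < words.txt` would list the candidate words from most to least frequent. A real index such as Hyper Estraier is much faster than grep at the scale of 1 million files.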

Default entries

Many thanks to the authors and maintainers.

Optional entries

License


I think we can redistribute Hatena’s yomigana-hyouki (reading and spelling) pairs,
but I doubt we can redistribute niconico’s pairs.
If you want to build a redistributable mozc-ut,
do not uncomment #NICODIC="true" in generate-dictionary.sh.
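
Before redistributing a build, it may help to confirm that the niconico option is still commented out. The following is a minimal sketch; the function name nicodic_enabled is hypothetical, and it assumes the option appears on its own line as NICODIC="true" once enabled.

```shell
#!/bin/sh
# nicodic_enabled FILE: succeed if FILE contains an active (uncommented)
# NICODIC="true" line. A line starting with "#" does not match.
nicodic_enabled() {
    grep -Eq '^[[:space:]]*NICODIC="?true"?' "$1"
}
```

For example, `nicodic_enabled generate-dictionary.sh && echo "not redistributable"` prints a warning only when the option is active.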

Download

https://osdn.net/users/utuhiro/pf/utuhiro/files/
Patched source code:
mozc-ut2-ver.date.tar.xz
Patch:
mozcdic-ut2-date.tar.bz2

Install

See mozc’s official Build Instructions.
If you are using Arch Linux (tested on Antergos Linux),
you can make and install packages as follows:

$ mkdir mozc-tmp
$ mv mozc-ut2-ver.date.tar.xz mozc-tmp/
$ cd mozc-tmp/
$ tar xf mozc-ut2-ver.date.tar.xz
$ cp mozc-ut2-ver.date/PKGBUILD .
$ makepkg -f
$ makepkg -i

Advanced: Add optional entries

  1. Get the latest Mozc.

     $ mkdir mozc-tmp
     $ mv mozcdic-ut2-date.tar.bz2 mozc-tmp/
     $ cd mozc-tmp/
     $ tar xf mozcdic-ut2-date.tar.bz2
     $ cd mozcdic-ut2-date/src/
     $ sh get-latest-mozc.sh
     $ mv mozc-ver.tar.bz2 ../..
     $ cd ..
  2. Choose optional entries.

     $ leafpad generate-dictionary.sh

    If you want to use an English-Japanese dictionary, uncomment the following line.

     #EJDIC="true"

    If you want to use a niconico dictionary, uncomment the following line.

     #NICODIC="true"
  3. Generate mozc-ut.
    You need ruby > 1.9.

     $ ./generate-dictionary.sh
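
Step 2 can also be done without an editor. The sketch below uncomments one option with sed; the function name enable_option is hypothetical, and it assumes the option is written exactly as #NAME="true" at the start of a line, as shown above.

```shell
#!/bin/sh
# enable_option FILE NAME: turn a line '#NAME="true"' into 'NAME="true"'.
# The result goes through a temporary file because "sed -i" is not portable;
# "cat > FILE" keeps the original file's permissions.
enable_option() {
    file=$1
    name=$2
    tmp=$(mktemp) || return 1
    sed "s/^#\\($name=\"true\"\\)/\\1/" "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

For example, `enable_option generate-dictionary.sh EJDIC` enables the English-Japanese dictionary and leaves the NICODIC line untouched.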

Advanced: Refresh hit numbers with the latest Japanese Wikipedia articles

  1. Install ruby and gcc-6.4.1.
    estcmd built with gcc-7.2.0 segfaults.
    I emailed the author but did not get a reply.

     $ sudo pacman -S ruby gcc6
  2. Install QDBM and Hyper Estraier.
    I use Hyper Estraier to get hit numbers.

     $ wget http://fallabs.com/qdbm/qdbm-1.8.78.tar.gz
     $ tar xf qdbm-1.8.78.tar.gz
     $ cd qdbm-1.8.78/
     $ ./configure --prefix=/usr --enable-zlib
     $ make -j4 CC=/usr/bin/gcc-6
     $ sudo make install
     $ wget http://fallabs.com/hyperestraier/hyperestraier-1.4.13.tar.gz
     $ tar xf hyperestraier-1.4.13.tar.gz
     $ cd hyperestraier-1.4.13/
     $ ./configure --prefix=/usr --enable-zlib
     $ make -j4 CC=/usr/bin/gcc-6
     $ sudo make install
     $ cd ../..
  3. Put mozcdic-ut2 into mozc-tmp.

     $ mkdir -p mozc-tmp
     $ mv mozcdic-ut2-date.tar.bz2 mozc-tmp/
     $ cd mozc-tmp/
     $ tar xf mozcdic-ut2-date.tar.bz2
  4. Get alt-cannadic.
    Get alt-cannadic-110208.tar.bz2.

     $ mv alt-cannadic-110208.tar.bz2 mozcdic-ut2-date/alt-cannadic/
  5. Change SEEDVER of mecab-user-dict-seed.
    Check mecab-user-dict-seed.yyyymmdd.csv.xz and
    change SEEDVER in neologd/generate-dictionary.sh.

     $ cd mozcdic-ut2-date/neologd/
     $ leafpad generate-dictionary.sh

    Change SEEDVER.

  6. Change MOZCVER and DICVER.

     $ cd ../
     $ leafpad generate-dictionary.sh

    Change MOZCVER and DICVER.

     $ cd src/
     $ leafpad generate-release.sh

    Change DICVER.

  7. Refresh hit numbers with the latest Japanese Wikipedia articles.

     $ sh update-dictionary.sh
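
The version edits in steps 5 and 6 can be scripted instead of done in an editor. This is a sketch under assumptions: the function name set_var is hypothetical, and it assumes each variable is assigned at the start of a line as NAME=value in the scripts, which has not been checked against the actual files.

```shell
#!/bin/sh
# set_var FILE NAME VALUE: rewrite the line that starts with NAME= so that it
# reads NAME="VALUE". VALUE must not contain "/", "&" or backslashes, because
# it is pasted into a sed replacement.
set_var() {
    file=$1
    name=$2
    value=$3
    tmp=$(mktemp) || return 1
    sed "s/^$name=.*/$name=\"$value\"/" "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

For example, `set_var neologd/generate-dictionary.sh SEEDVER yyyymmdd` would set SEEDVER, with yyyymmdd replaced by the date in the seed file name. Lines that do not start with the given name are left as they are.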
