Mozc UT2 Dictionary

20171002

Mozc NEologd UT Dictionary is here. Mozc UT Dictionary (Discontinued) is here.

Second mozc-ut

I lost the disk partition that held the tools for building the mozc-ut dictionary. For mozc-ut1 I sorted words by Yahoo's and Google's "hit numbers", but I can't do that again because neither provides a free search API now. So I wrote mozc-ut2 from scratch: I split Wikipedia's articles into 1 million files and got hit numbers with Hyper Estraier. mozc-ut2 adds over 500,000 words.
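To make "hit number" concrete: it is the number of article files in which a word appears. The sketch below illustrates the idea with plain grep, not with Hyper Estraier, which is what mozc-ut2 actually uses. The function name hit_count and the directory argument are assumptions made for this illustration only.

```shell
#!/bin/sh
# hit_count WORD DIR
# Print how many files under DIR contain WORD at least once.
# grep-based stand-in for illustration; mozc-ut2 itself uses Hyper Estraier.
hit_count() {
    word=$1
    dir=$2
    # -r: recurse, -l: list each matching file once, -F: fixed string
    grep -rlF -e "$word" "$dir" 2>/dev/null | wc -l | tr -d ' '
}

# Example: hit_count "東京" ./articles
```

A word that occurs many times in one file still counts as one hit, so the number reflects how widely a word is used across articles.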

Default entries

Many thanks to the authors and maintainers.

Optional entries

License

I think we can redistribute hatena's yomigana-hyouki pairs, but I doubt we can redistribute niconico's pairs. If you want to make a redistributable mozc-ut, don't uncomment #NICODIC="true" in generate-dictionary.sh.

Download

https://osdn.net/users/utuhiro/pf/utuhiro/files/

Install

See mozc's official Build Instructions. If you are using Arch Linux (tested on Antergos Linux), you can build and install the package as follows:

mkdir mozc-tmp
mv mozc-ut2-ver.date.tar.xz mozc-tmp/
cd mozc-tmp/
tar xf mozc-ut2-ver.date.tar.xz
cp mozc-ut2-ver.date/PKGBUILD .
makepkg -f
makepkg -i

Advanced: Add optional entries

  1. Get the latest Mozc.

    mkdir mozc-tmp
    mv mozcdic-ut2-date.tar.bz2 mozc-tmp/
    cd mozc-tmp/
    tar xf mozcdic-ut2-date.tar.bz2
    cd mozcdic-ut2-date/src/
    sh get-latest-mozc.sh
    mv mozc-ver.tar.bz2 ../..
    cd ..
  2. Choose optional entries.

    leafpad generate-dictionary.sh
  3. If you want to use an English-Japanese dictionary, uncomment the following line.

    #EJDIC="true"
  4. If you want to use a niconico dictionary, uncomment the following line.

    #NICODIC="true"
  5. Generate mozc-ut. You need ruby > 1.9.

    ./generate-dictionary.sh
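Steps 3 and 4 can also be done without an editor. The following is a minimal sketch, assuming each option line in generate-dictionary.sh starts at column 1 and reads exactly #NAME="true"; the helper name enable_option is hypothetical and not part of mozc-ut2.

```shell
#!/bin/sh
# enable_option NAME FILE
# Uncomment a line of the exact form  #NAME="true"  in FILE.
# Assumes the line starts at column 1; other lines are left unchanged.
enable_option() {
    name=$1
    file=$2
    tmp=$(mktemp) || return 1
    # Strip the leading '#' only from the exact option line.
    # Writing back with cat keeps the file's executable bit.
    sed "s/^#\(${name}=\"true\"\)\$/\1/" "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return $status
}

# Example: enable_option EJDIC generate-dictionary.sh
```

As noted under License, leave NICODIC commented out if you intend to redistribute the result.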

Advanced: Refresh hit numbers with the latest Japanese Wikipedia articles

You need 35 GB of disk space (use an SSD), and the process takes about 8 hours.

This will download the latest edict/hatena/niconico/skk-jisyo files, and refresh hit numbers with the latest Japanese Wikipedia articles.
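Since the run is long, it may be worth checking free space before starting. This is a small sketch using POSIX df; the helper name has_free_space is hypothetical, and 35 is the figure quoted above.

```shell
#!/bin/sh
# has_free_space REQUIRED_GB [PATH]
# Succeed (exit 0) if the filesystem holding PATH has at least
# REQUIRED_GB gigabytes available. PATH defaults to the current directory.
has_free_space() {
    required_gb=$1
    path=${2:-.}
    # df -P prints one stable line per filesystem; -k reports 1024-byte blocks.
    avail_kb=$(df -Pk "$path" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $((required_gb * 1024 * 1024)) ]
}

# Example: has_free_space 35 . || echo "Less than 35 GB free" >&2
```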

  1. Install ruby and gcc-6.4.1.

    estcmd built with gcc-7.2.0 segfaulted. I emailed the author but did not get a reply.

    pacman -S ruby gcc6
  2. Install QDBM and Hyper Estraier.

    I use Hyper Estraier to get hit numbers.

    wget http://fallabs.com/qdbm/qdbm-1.8.78.tar.gz
    tar xf qdbm-1.8.78.tar.gz
    cd qdbm-1.8.78/
    ./configure --prefix=/usr --enable-zlib
    make -j4 CC=/usr/bin/gcc-6
    sudo make install
    wget http://fallabs.com/hyperestraier/hyperestraier-1.4.13.tar.gz
    tar xf hyperestraier-1.4.13.tar.gz
    cd hyperestraier-1.4.13/
    ./configure --prefix=/usr --enable-zlib
    make -j4 CC=/usr/bin/gcc-6
    sudo make install
    cd ../..
  3. Put mozcdic-ut2 into mozc-tmp.

    mkdir -p mozc-tmp
    mv mozcdic-ut2-date.tar.bz2 mozc-tmp/
    cd mozc-tmp/
    tar xf mozcdic-ut2-date.tar.bz2
  4. Get alt-cannadic.

    Get alt-cannadic-110208.tar.bz2.

    mv alt-cannadic-110208.tar.bz2 mozcdic-ut2-date/alt-cannadic/
  5. Change SEEDVER of mecab-user-dict-seed.

    Check the date of the latest mecab-user-dict-seed.yyyymmdd.csv.xz and change SEEDVER in neologd/generate-dictionary.sh accordingly.

    cd mozcdic-ut2-date/neologd/
    leafpad generate-dictionary.sh
  6. Change MOZCVER and DICVER.

    cd ../
    leafpad generate-dictionary.sh
  7. Change DICVER.

    cd src/
    leafpad generate-release.sh
  8. Refresh hit numbers with the latest Japanese Wikipedia articles.

    sh update-dictionary.sh
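For step 5 above, the value for SEEDVER is the yyyymmdd part of the seed file name. The sketch below extracts it with shell parameter expansion; the function name seed_version is hypothetical, and it assumes the file name follows the mecab-user-dict-seed.yyyymmdd.csv.xz pattern exactly.

```shell
#!/bin/sh
# seed_version FILENAME
# Print the yyyymmdd part of mecab-user-dict-seed.yyyymmdd.csv.xz.
seed_version() {
    name=${1##*/}                        # drop any leading directory
    name=${name#mecab-user-dict-seed.}   # drop the fixed prefix
    printf '%s\n' "${name%.csv.xz}"      # drop the fixed suffix
}

# Example: seed_version path/to/mecab-user-dict-seed.yyyymmdd.csv.xz
```

The printed value still has to be written into SEEDVER by hand, because the exact form of that line in neologd/generate-dictionary.sh is not shown here.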
