(last updated: 20081005)
Adding word frequencies from Yahoo search
I added Yahoo hit counts to EDICT.
To find out which expressions are in everyday use, I measured word
frequencies on Goo Blog. Goo Blog is a popular blog service, so its posts
show the expressions that modern Japanese people actually use.
Get "edict-freq-20081002" from
ftp.monash.edu.au.
It includes a program that collects the word frequencies automatically,
using queries like this one:
http://search.yahoo.co.jp/search?p="室内"&vs=blog.goo.ne.jp
The word "室内" on blog.goo.ne.jp:
147,000 hits.
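As an illustration only, a query URL of the form shown above could be built as follows. This is a minimal sketch, not the bundled program: the function names `build_query_url` and `parse_hits` are hypothetical, and the sketch does not contact Yahoo or parse a real result page.

```python
from urllib.parse import urlencode

def build_query_url(word, site="blog.goo.ne.jp"):
    """Build a phrase query for `word`, restricted to `site` (hypothetical helper)."""
    # The word is wrapped in double quotes, as in the example URL above.
    params = {"p": f'"{word}"', "vs": site}
    return "http://search.yahoo.co.jp/search?" + urlencode(params)

def parse_hits(text):
    """Turn a displayed hit count such as '147,000' into an integer."""
    return int(text.replace(",", ""))

url = build_query_url("室内")
hits = parse_hits("147,000")
```

`urlencode` percent-encodes the quotes and the UTF-8 bytes of the word, so the URL is safe to pass to any HTTP client.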
Original EDICT
室内 [しつない] /(n,adj-no) indoor/inside the room/(P)/
EDICT-freq
室内 [しつない] /(n,adj-no) indoor/inside the room/(P)/###147000/
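The only difference between the two lines above is the trailing "###147000/" field. A minimal sketch of writing and reading that field, assuming it is always the last field on the line (the helper names are hypothetical, not taken from the bundled program):

```python
import re

def append_frequency(entry, hits):
    """Append a '###<hits>/' field to an EDICT line that ends with '/'."""
    return f"{entry.rstrip()}###{hits}/"

def read_frequency(entry):
    """Return the hit count from an EDICT-freq line, or None if it has none."""
    match = re.search(r"###(\d+)/$", entry.rstrip())
    return int(match.group(1)) if match else None

original = "室内 [しつない] /(n,adj-no) indoor/inside the room/(P)/"
with_freq = append_frequency(original, 147000)
```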
It takes 4 days to analyze EDICT's 160,000 words.
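As a rough check of that figure, the two numbers above imply an average rate of one query every couple of seconds. The calculation assumes the program runs continuously and issues one query per word, which the text does not state.

```python
WORDS = 160_000          # EDICT entries to look up
DAYS = 4                 # total running time

seconds_per_word = DAYS * 24 * 60 * 60 / WORDS   # average time per query
words_per_day = WORDS / DAYS                     # average daily throughput

print(f"{seconds_per_word:.2f} s per word, {words_per_day:.0f} words per day")
```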
NOTES:
1. The word frequencies are not always correct.
e.g.
滑り [すべり] /(n) sliding/slipping/
滑り [ぬめり] /(n) (uk) viscous liquid/slime/mucus/(P)/
Yahoo search returns the same number for both words, because
suberi and numeri are spelled the same way.
But they are pronounced differently and have different meanings.
2. Yahoo search might not detect short words properly.
e.g.
滑り台で滑りました。 ("I slid down the slide.")
Yahoo search might detect one "滑り台" and two "滑り".
"滑り台" and "滑り" share characters, but they are different words.
So Yahoo search will return inflated numbers for short words.
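Both notes can be reproduced on the examples above. The sketch below models the search as plain substring matching on the written form. That is an assumption made for illustration; how Yahoo actually matches words is not known here.

```python
# Note 1: two EDICT entries with the same written form but different readings.
entries = [
    "滑り [すべり] /(n) sliding/slipping/",
    "滑り [ぬめり] /(n) (uk) viscous liquid/slime/mucus/(P)/",
]

def headword(entry):
    """Return the written form, i.e. the text before the first space."""
    return entry.split(" ", 1)[0]

# A query keyed on the written form cannot tell the two entries apart.
same_query = headword(entries[0]) == headword(entries[1])

# Note 2: substring counting on the example sentence.
sentence = "滑り台で滑りました。"
long_hits = sentence.count("滑り台")    # the compound itself
short_hits = sentence.count("滑り")     # also matches inside 滑り台

print(same_query, long_hits, short_hits)
```

Under this model the short word is counted twice although it appears as a separate word only once, which is the inflation described in note 2.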
Romaji EDICT
I generated a romaji version of EDICT.
edict-romaji-20081002.tar.bz2
It includes a conversion program.
NOTE:
The conversion takes over 20 minutes.
Original EDICT
100円ショップ [ひゃくえんショップ] /(n) hundred-yen store/
Romaji EDICT
100円ショップ [hyakuensyoppu] /(n) hundred-yen store/
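The conversion in this example can be sketched as below. This is not the bundled conversion program: the function names are hypothetical, and the kana tables are deliberately tiny, covering only the kana in this one entry. The spelling "syo" (rather than Hepburn "sho") follows the output shown above.

```python
import re

# Tiny illustrative tables; a real converter needs the full kana set.
DIGRAPHS = {"ひゃ": "hya", "しょ": "syo"}
MONOGRAPHS = {"く": "ku", "え": "e", "ん": "n", "ぷ": "pu"}

def kata_to_hira(text):
    """Map katakana to hiragana; the two blocks are offset by 0x60."""
    return "".join(chr(ord(c) - 0x60) if "ァ" <= c <= "ヶ" else c for c in text)

def kana_to_romaji(kana):
    """Convert a kana reading to romaji using the tables above."""
    hira = kata_to_hira(kana)
    out, i, double_next = [], 0, False
    while i < len(hira):
        if hira[i] == "っ":               # small tsu doubles the next consonant
            double_next = True
            i += 1
            continue
        if hira[i:i + 2] in DIGRAPHS:
            syllable, i = DIGRAPHS[hira[i:i + 2]], i + 2
        else:
            # Kana missing from the tables are passed through unchanged.
            syllable, i = MONOGRAPHS.get(hira[i], hira[i]), i + 1
        if double_next:
            syllable = syllable[0] + syllable
            double_next = False
        out.append(syllable)
    return "".join(out)

def romanize_entry(entry):
    """Replace the bracketed reading of an EDICT line with romaji."""
    return re.sub(r"\[([^\]]+)\]",
                  lambda m: "[" + kana_to_romaji(m.group(1)) + "]",
                  entry, count=1)

converted = romanize_entry("100円ショップ [ひゃくえんショップ] /(n) hundred-yen store/")
```

Only the reading in brackets is converted; the headword and the glosses are left as they are.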