(last updated: 20081005)
Adding word frequencies from Yahoo search
I added Yahoo hit counts to EDICT.
To find out which expressions are in everyday use, I measured word frequencies on Goo Blog.
Goo Blog is a popular blog service, so its text reflects the expressions that
Japanese people use today.
Get "edict-freq-20081002" in
It includes the program that collects the word frequencies.
The program appends each word's hit count to its entry automatically:
The word "室内" in blog.goo.ne.jp - 147,000
室内 [しつない] /(n,adj-no) indoor/inside the room/(P)/
室内 [しつない] /(n,adj-no) indoor/inside the room/(P)/###147000/
Analyzing all 160,000 words in EDICT takes about 4 days.
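The tagging step shown above can be sketched in Python. This is only an illustration, not the program shipped in the download: the function names parse_hits and append_frequency are hypothetical, and the hit count is passed in by hand instead of being fetched from a search engine.

```python
def parse_hits(text: str) -> int:
    """Turn a displayed hit count such as '147,000' into an integer."""
    return int(text.replace(",", ""))


def append_frequency(entry: str, hits: int) -> str:
    """Append a hit count to an EDICT entry in the ###N/ form used above."""
    entry = entry.rstrip("\n")
    if not entry.endswith("/"):
        # Assumption: every EDICT entry line ends with a slash.
        raise ValueError("EDICT entries are expected to end with '/'")
    return f"{entry}###{hits}/"


line = "室内 [しつない] /(n,adj-no) indoor/inside the room/(P)/"
print(append_frequency(line, parse_hits("147,000")))
```

Keeping the count after a ### marker at the end of the line leaves the original entry text unchanged, so the frequency can be stripped again by cutting at the marker.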
1. The word frequencies are not always correct.
滑り [すべり] /(n) sliding/slipping/
滑り [ぬめり] /(n) (uk) viscous liquid/slime/mucus/(P)/
Yahoo search returns the same number for both entries, because
suberi and numeri are written the same way.
Their pronunciations and meanings differ, but a text search cannot tell them apart.
2. Yahoo search might not count short words properly.
Where "滑り台" and "滑り" each appear once, Yahoo search might count one "滑り台" and two "滑り".
"滑り台" contains "滑り", but they are different words.
As a result, Yahoo search tends to return inflated numbers for short words.
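Both caveats above can be reproduced with a small Python sketch. It uses plain substring counting on a made-up sentence, which is only a stand-in for whatever matching Yahoo search actually does, and the helper names shared_headwords and substring_hits are hypothetical.

```python
from collections import defaultdict


def shared_headwords(entries):
    """Return headwords that occur with more than one reading.

    A spelling-based search cannot separate such entries, so they
    end up with the same hit count (caveat 1).
    """
    readings = defaultdict(list)
    for entry in entries:
        headword, rest = entry.split(" ", 1)
        if not rest.startswith("["):
            continue  # kana-only entries carry no bracketed reading
        readings[headword].append(rest[1:rest.index("]")])
    return {h: r for h, r in readings.items() if len(r) > 1}


def substring_hits(text, word):
    """Count non-overlapping occurrences of word as a plain substring."""
    return text.count(word)


entries = [
    "滑り [すべり] /(n) sliding/slipping/",
    "滑り [ぬめり] /(n) (uk) viscous liquid/slime/mucus/(P)/",
]
print(shared_headwords(entries))  # 滑り is listed with both readings

# Made-up sample text: one "滑り台" and one standalone "滑り" (caveat 2).
text = "公園の滑り台で遊んだ。床の滑りが悪い。"
print(substring_hits(text, "滑り台"), substring_hits(text, "滑り"))
```

In the sample text the short word is counted twice, once on its own and once inside "滑り台", which is the inflation described in caveat 2.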
I generated a romaji version of EDICT.
It includes the conversion program.
The conversion takes over 20 minutes.
１００円ショップ [ひゃくえんショップ] /(n) hundred-yen store/
１００円ショップ [hyakuensyoppu] /(n) hundred-yen store/
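The conversion in the example above can be sketched as follows. This is a minimal illustration, not the conversion program included in the package: the KANA table is an assumption that covers only the kana in this one reading, the spellings are chosen to match the output shown above ("syo", not "sho"), and the names kana_to_romaji and romanize_entry are hypothetical.

```python
import re

# Assumed mini table: only the kana needed for this example.
# A real converter needs the full hiragana and katakana syllabaries.
KANA = {
    "ひゃ": "hya", "ショ": "syo",
    "く": "ku", "え": "e", "ん": "n", "プ": "pu",
}
SOKUON = {"っ", "ッ"}  # small tsu: doubles the following consonant


def kana_to_romaji(reading: str) -> str:
    out = []
    i = 0
    double_next = False
    while i < len(reading):
        if reading[i] in SOKUON:
            double_next = True
            i += 1
            continue
        # Try two-character combinations (e.g. ひゃ) before single kana.
        for size in (2, 1):
            chunk = reading[i:i + size]
            if chunk in KANA:
                break
        else:
            raise KeyError(f"no romaji for {reading[i]!r}")
        roma = KANA[chunk]
        if double_next:
            roma = roma[0] + roma  # assumes the next syllable starts with a consonant
            double_next = False
        out.append(roma)
        i += len(chunk)
    return "".join(out)


def romanize_entry(entry: str) -> str:
    """Replace the bracketed kana reading of an EDICT entry with romaji."""
    return re.sub(r"\[([^\]]+)\]",
                  lambda m: "[" + kana_to_romaji(m.group(1)) + "]",
                  entry, count=1)


print(romanize_entry("１００円ショップ [ひゃくえんショップ] /(n) hundred-yen store/"))
```

Only the reading inside the brackets is converted, so the headword and the English glosses stay exactly as they are in EDICT.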