At 180,000 entries, Jim Breen’s freeware Japanese dictionary is still growing

by Daniel Morales

Contributing Writer

We all know Jim. He casts a vigilant eye over thousands of us each day as we track down definitions for confusing kanji and decipher new vocabulary.

Jim’s bearded 34-by-45-pixel countenance sits atop his WWWJDIC online dictionary and supervises searches at a rate of 130,000 queries a day.

You may have figured out Jim’s identity by this point: Yes, it’s Jim Breen, research fellow at Monash University and father of the major freeware Japanese dictionary JMdict/EDICT. Even if you haven’t visited his site directly, you’ve undoubtedly come across his dictionary material in some form.

“If it wasn’t for Jim’s data, and I think to a certain extent how helpful he is as a person, we wouldn’t have the plethora of Japanese language learning apps we do,” notes Kim Ahlstrom of Jisho.org. “It’s had a profound impact.”

Jim’s JMdict dictionary is the backbone of many of the open-source apps and online dictionaries out there, including Jisho.org, Imiwa and Rikaikun, in addition to his own WWWJDIC. The dictionary recently reached 180,000 entries, up from 170,000 in August 2013.

Breen is very much old-school internet, in the best of ways. He maintains a detailed, 1990s-style homepage with a self-introduction and an explanation of his background, and a specialized Japanese page that includes extensive links to various online 用語集 (yōgoshū, glossaries), other 辞書 (jisho, dictionaries) and general Japanese resources.

This is fitting because the history of JMdict is, in a way, a history of the internet. A telecommunications engineer by trade, Breen began interacting online via Usenet in the 1980s and had a passing interest in Japanese, particularly in getting Western computers to display Japanese text.

This interest in Japan, he says, “was just coincidental.” His children and wife studied musical instruments using the Suzuki method and the family went on a two-month study-abroad vacation in Matsumoto, Nagano Prefecture, in 1981, giving him the opportunity to pick up some of the language.

In 1991, he encountered a DOS-based dictionary created by Mark Edwards at the University of Wisconsin. The dictionary had 2,000 entries and could display Japanese text. Breen was smitten. “I didn’t realize I was going to be trapped in it for the rest of my life,” he says.

These were not easy programs to use. “Back in the early ’90s, handling multiple languages in text files was actually a fairly heroic thing to do,” Breen says. “Japanese text coding was an awful mess. And a major problem all the time was 文字化け (mojibake, corrupted text), because stuff would get screwed up.”
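For readers curious what mojibake looks like in practice, here is a minimal Python sketch. It assumes the text was saved in Shift_JIS, one of several Japanese encodings in use at the time, and then read by software expecting a Western encoding (Latin-1 here). The choice of encodings is illustrative only and is not taken from Breen's own programs.

```python
# Illustrative sketch: how Japanese text turns into mojibake when the
# reader assumes the wrong encoding. The encodings are assumptions.

text = "文字化け"

# Save the text as Shift_JIS bytes (two bytes per character here).
raw = text.encode("shift_jis")

# Read those bytes back as if they were Latin-1: every byte becomes
# one unrelated Western character, so the result is garbage.
garbled = raw.decode("latin-1")

# The damage is reversible only if you know both encodings involved.
restored = garbled.encode("latin-1").decode("shift_jis")

print(repr(garbled))
print(restored)
```

The bytes themselves are never damaged in this sketch; only the interpretation is wrong, which is why guessing the right encoding was so much of the battle.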

Breen took the dictionary, expanded it and created his own DOS dictionary that he distributed for free.

Before the web, he says, “we exchanged dictionary material by email, by FTP (file transfer protocol), by whatever methods you could find for moving stuff around. Gradually, the web became the vehicle for doing that.”

Twenty-seven years later, Breen has moved around enough material to solve some of the most vexing problems for students of Japanese, including the need to jump back and forth between dictionaries.

In the 1990s, without the ability to コピペ (kopipe, copy and paste) the word, students had to carry a 漢字辞典 (kanji jiten, kanji dictionary) to find individual characters and a regular dictionary to find the compound if it wasn’t listed in the kanji dictionary.

“Students these days just don’t know what it was like 30 years ago,” Breen says. “And it’s not just foreigners!” His dictionary gets 50,000 searches a day from Japan. Breen isn’t sure what proportion of those searches are performed by Japanese users, but many of the referral sites are Japanese.

The dictionary itself began simply as text files and Breen taking submissions via email. Rene Malenfant, a biology professor at the University of New Brunswick, was working on translations from Japanese at the time and admits he might have created a lot of this work for Breen: “In those early days, Jim was manually making all the edits himself, and I was often submitting more than 100 entries per day, so I’m sure it was rather taxing on him.”

Now the dictionary outputs XML files and there is an interactive system that allows anyone to submit a change or a new entry, which is then verified by one of several active editors.
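As a rough illustration of how an app might consume such XML, the Python sketch below parses a tiny hand-written sample and looks up a word. The element names (`entry`, `ent_seq`, `keb`, `reb`, `gloss`) follow the publicly distributed JMdict format as best understood here; the sample itself, its sequence numbers and the helper names `parse_entries` and `lookup` are made up for this example, and real entries carry far more detail.

```python
import xml.etree.ElementTree as ET

# Hand-written, heavily simplified sample. The sequence numbers are
# placeholders, not real JMdict identifiers.
SAMPLE = """<JMdict>
  <entry>
    <ent_seq>1</ent_seq>
    <k_ele><keb>自撮り</keb></k_ele>
    <r_ele><reb>じどり</reb></r_ele>
    <sense><gloss>selfie</gloss></sense>
  </entry>
  <entry>
    <ent_seq>2</ent_seq>
    <r_ele><reb>セルフィー</reb></r_ele>
    <sense><gloss>selfie</gloss></sense>
  </entry>
</JMdict>"""


def parse_entries(xml_text):
    """Collect written forms, readings and glosses for each entry."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.iter("entry"):
        entries.append({
            "kanji": [e.text for e in entry.iter("keb")],
            "readings": [e.text for e in entry.iter("reb")],
            "glosses": [e.text for e in entry.iter("gloss")],
        })
    return entries


def lookup(entries, word):
    """Return entries whose written form or reading matches exactly."""
    return [e for e in entries
            if word in e["kanji"] or word in e["readings"]]


entries = parse_entries(SAMPLE)
for hit in lookup(entries, "自撮り"):
    print(hit["readings"], hit["glosses"])
```

A production app would load the full file, resolve its part-of-speech entities and index the entries rather than scanning a list, but the basic shape of the data is the same.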

Entries are complex enough to include commentary Breen likens to a “blog post” attached to each entry. “It’s lexicography in the open,” he says.

For example, the entry for 自撮り (jidori, selfie) shows an initial conversation in 2012 when the term was added, as well as some fiddling over the English definition, which was amended in 2014 to cover taking video in addition to pictures. セルフィー (serufii, selfie), on the other hand, wasn’t added until 2017.

There is also a Yahoo Groups mailing list where conversations continue at further length, and sometimes go on for years! This results in sentences like this: “The 国語s often show the inflection class, for example 大辞林 tags 働く as ‘動カ五,’ and 話す as ‘動サ五.’”

It’s hard not to love Breen’s use of the English plural on 国語 (kokugo, literally, “national language,” i.e., Japanese), which is an abbreviation for 国語辞典 (kokugo jiten, Japanese dictionary). The rest of the sentence explains how the well-respected dictionary 大辞林 (Daijirin) labels the inflection of the verbs 働く (hataraku, to work) and 話す (hanasu, to speak).

JMdict isn’t growing as rapidly as it once did. Entries are culled or combined. For example, ソフトウェア著作権の侵害 (sofutowea chosakuken no shingai, piracy of software) was deleted, but 著作権侵害 (chosakuken shingai, piracy) remains. Nevertheless, Jim and his team forge on, always looking to solve the next problem.

They are even solving linguistic issues the community may not have been aware of.

“I had a 電子辞書 (denshi jisho, electronic dictionary) in Japan,” says Ahlstrom. “I brought it back to Sweden and it broke because it was too cold. The cord between the keyboard and the screen had physically split in two.”

This was Ahlstrom’s equally coincidental start on his path to Jisho.org.

Breen’s dictionary has two entries that seem to apply to both his and Ahlstrom’s inspiration for their projects: 窮すれば通ず (Kyū sureba tsūzu) and 必要は発明の母 (Hitsuyō wa hatsumei no haha). They both mean “Necessity is the mother of invention.”