Post by Ｓｒｉｎ・Ｔｕａｒ Post by DawnF
So if anyone is working on JMDICT and has hints or tips, i can't bribe
you, but i'd appreciate it ;-)
Maybe. A small C or perl program using expat should be more than enough.
Just come up with a properly normalized schema that captures all the
aspects of the jmdict that you care about and it could be maybe 30mins
worth of effort to write the converter.
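
A minimal sketch of the expat approach suggested above, written in Python (via its standard xml.parsers.expat binding) rather than C or Perl. The element names used here (entry, keb, reb, gloss) and the flat row layout are assumptions for illustration; check them against the JMdict DTD and your own schema. The function name convert and the on_entry callback are hypothetical.

```python
import xml.parsers.expat

def convert(xml_bytes, on_entry):
    """Parse JMdict-style XML with expat and call on_entry(row) per <entry>.

    Each row is a dict with lists under "keb", "reb" and "gloss"
    (element names assumed, not taken from the post).
    """
    wanted = {"keb", "reb", "gloss"}
    current = None   # row being built, or None when outside an <entry>
    text = []        # character data collected for the element being read

    def start(name, attrs):
        nonlocal current
        if name == "entry":
            current = {"keb": [], "reb": [], "gloss": []}
        text.clear()

    def end(name):
        nonlocal current
        if current is not None and name in wanted:
            current[name].append("".join(text).strip())
        elif name == "entry":
            on_entry(current)   # hand the finished row to the caller
            current = None
        text.clear()

    def chars(data):
        # expat may deliver one text node in several pieces, so accumulate.
        text.append(data)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(xml_bytes, True)
```

Because rows are handed to the callback one at a time, nothing forces the whole dictionary into memory. For a real file, parser.ParseFile(open(path, "rb")) could replace the Parse call so the input is read in chunks.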
Quite likely expat would be able to parse it OK, although I think it
would take some fiddly programming for someone not very familiar
with XML and expat.
My XMLish colleagues swear by XSLT for these things, so I thought I'd
try it to generate flat EDICT-ish extracts. Sadly, like a lot of
XML utilities, libxslt loads the whole file into an inefficient memory
structure. The result was that on my Linux box with 256M of RAM and 512M
of swap, I was out of memory in seconds. (With JMdict being "only" 36M,
this is rather serious inflation. I'm told the problem is just as
severe with Windows-based XML utilities.)
The trick in this case is to do it in a script that extracts one entry at
a time from the JMdict file, then hits it with the XSLT stuff. I gave up
before trying this step.
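
The one-entry-at-a-time extraction described above might look roughly like the following sketch, which uses Python's standard xml.etree.ElementTree.iterparse. This is an untested-against-JMdict illustration, not the script that was abandoned: the function name iter_entries is hypothetical, and the element name entry is an assumption about the file's structure.

```python
import xml.etree.ElementTree as ET

def iter_entries(source):
    """Yield each <entry> element as a standalone XML string.

    source is a filename or a binary file object. Finished entries are
    discarded as parsing proceeds, so memory use stays roughly at the
    size of one entry instead of the whole file.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)          # first event is the start of the root
    for event, elem in context:
        if event == "end" and elem.tag == "entry":
            elem.tail = None         # drop whitespace following the entry
            yield ET.tostring(elem, encoding="unicode")
            root.clear()             # release entries already handed out
```

Each yielded string is a small, self-contained fragment that could then be passed to an XSLT processor on its own, which is the step the whole-file approach ran out of memory on.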
(Maybe you can see why I don't edit JMdict in native XML.
I can *just* get xmllint to work over JMdict without blowing my memory,
but that's about all.)
Jim Breen http://www.csse.monash.edu.au/~jwb/
Computer Science & Software Engineering,
Monash University, VIC 3800, Australia