Forum: MemoQ support
Topic: Import EDICT or other dictionary into memoQ as TM or other resource
Poster: Michael J.W. Beijer
Post title: short on time at the moment, but ...
[quote]pmzeitler wrote:
I'm a relative neophyte to translation memory software, but I've tinkered with the memoQ trial long enough to know I want to go forward using that. However, I can't help but shake the feeling that I'm either missing some element of the software that would make my life easier, or that I'm using the software incorrectly.
Here's my current process: (SL Japanese, TL American English)
1. Scan the page using a flatbed scanner, and OCR it in FineReader. Repeat for all pages in the project.
2. Load the OCR output (plaintext) into memoQ and import my existing TM files.
3. Line by line read the SL text and compose a translation based on suggestions provided by the TM and/or going out to Bing Translate API (non-optimal solution as it's metered access and I'm poor at the moment after shelling out for memoQ, plus it requires an Internet connection).
3a. If I encounter a kanji I'm not familiar with (there are a few of these, given that I'm in the third year of a 4-year college level Japanese program), copy it to the clipboard and paste into a dictionary app (outside of memoQ, local to the computer) for a definition.
3b. Manually create a new TM entry (CTRL-Q?) for the kanji and any variants of its appearance (e.g. for adjectivals, both the -i and -ku forms).
4. Refine the translation for readability.
Where I think the Doing It Wrong is in steps 3a and 3b. I think there may be a way to have MemoQ do the lookup based on a dictionary file stored locally. Such a file exists-- Jim Breen's EDICT file-- but it is in a format that if memoQ can read it, it is not obvious to me. EDICT is in a somewhat proprietary XML format but Mr. Breen does provide the DTD, so I can probably extract data out of it if necessary to transform it to a well-known import format that memoQ can digest.
(Another post on this topic, which yielded no answers to how this extraction could be done, was looking to extract the data from WWWDIC, which violated that package's licensing. Based on my reading of the EDICT license, I believe that transforming the data does NOT violate the license, and that even if I were to distribute the data in memoQ-readable format, I would still not be in violation of the EDICT license assuming I provided the appropriate attribution. If I am in error on this point as well, please say so.)
This may seem like I'm too deep in the weeds on this. I expect I am, actually. However, even with the manual entry into the TM, using memoQ has dramatically accelerated the pace at which I can produce work. For a ten-page project I had to complete last semester, it took me roughly twenty hours to do the translation by hand, with only an electronic dictionary available to me. I chewed through a lot of time in kanji identification and lookup. As a test of memoQ, I loaded three pages (of a different source text of equivalent difficulty) in, and was through two of them in less than an hour, even with the copy-lookup method. I'm sticking with the software, but if it can give me that little push I need to stay in one program, I'll be even happier with it.
Thank you for your time and insight. [/quote]
No time right now, but it ought to be relatively simple to convert the data (e.g. this file: [url removed]) into a format memoQ can import. You should also ask over at [url removed]
I also vaguely remember that quite a few Japanese/English translators hang out over at [url removed] #!forum/felix-users. You might want to ask there too, as someone may already have converted the data.
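For what it's worth, here is a rough sketch of the kind of conversion I mean, not a tested production script. It assumes the plain-text EDICT layout (`KANJI [KANA] /gloss/gloss/.../`, one entry per line; the XML variant is JMdict and would need an XML parser instead) and writes tab-separated text, which I believe memoQ's term base import accepts. The function names and the column headers `Japanese`, `Reading` and `English` are my own placeholders, not anything memoQ requires.

```python
import csv
import io
import re

# Assumed EDICT line layout: KANJI [KANA] /gloss 1/gloss 2/.../
# Kana-only entries have no bracketed reading.
EDICT_LINE = re.compile(
    r"^(?P<head>\S+)\s+(?:\[(?P<reading>[^\]]+)\]\s+)?/(?P<glosses>.*)/\s*$"
)


def parse_edict_line(line):
    """Return (headword, reading, glosses) or None if the line does not match."""
    match = EDICT_LINE.match(line)
    if match is None:
        return None
    glosses = [
        g for g in match.group("glosses").split("/")
        # Drop empty fields, the "(P)" priority marker and "EntL..." entry ids.
        if g and g != "(P)" and not g.startswith("EntL")
    ]
    return match.group("head"), match.group("reading") or "", glosses


def edict_to_tsv(lines, out):
    """Write one tab-separated row per gloss; return the number of rows written."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["Japanese", "Reading", "English"])  # hypothetical headers
    count = 0
    for line in lines:
        parsed = parse_edict_line(line.rstrip("\n"))
        if parsed is None:
            continue  # skip anything that does not look like an entry
        head, reading, glosses = parsed
        for gloss in glosses:
            writer.writerow([head, reading, gloss])
            count += 1
    return count


if __name__ == "__main__":
    sample = [
        "食べる [たべる] /(v1,vt) to eat/(P)/\n",
        "あっさり /(adv,n) easily/readily/\n",
    ]
    buffer = io.StringIO()
    edict_to_tsv(sample, buffer)
    print(buffer.getvalue())
```

For the real file you would pass an open file object instead of the sample list. As far as I know the classic EDICT download is EUC-JP encoded, so something like `open(path, encoding="euc_jp")` for reading and a UTF-8 file for writing is probably what you want, but check the file you actually have. Part-of-speech tags such as `(v1,vt)` are left inside the gloss here; you may prefer to strip them before importing.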
Michael