Channel: ProZ.com Translation Forums

Is anyone else having problems with MemoQ's handling of html or xml?

Forum: MemoQ support
Topic: Is anyone else having problems with MemoQ's handling of html or xml?
Poster: Thomas T. Frost

I used MemoQ 2015 to translate a page on my own website, and I have other pages to translate. However, after exporting the first translation, I noticed a number of problems with MQ's handling of html:

1. Code formatting destroyed

Html and xml code have two levels of formatting:
1. The formatting rendered to the end user by (typically) a browser.
2. The formatting of the code, made by the programmer so that he or she can get an overview of it. Line breaks and indentation are used for this.

Although line breaks and indentation are largely insignificant to browsers and parsers, html and xml source code is read and edited line by line. Editor windows often show only about 80 characters of a line, although some editors have a 'wrap lines' option. Hence, many programmers try to keep their lines short enough to be seen on the screen.

MQ does not respect the level 2 formatting (the formatting of the code) at all. Here are some examples:

Example 1

Source:

Export:

Example 2

Source:

Export:

Example 3

Source:

Export:

The carefully organised formatting intended to make the code easy to maintain is destroyed.
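One rough way to detect this kind of damage automatically is to compare the indentation of every line that begins with a tag in the source and in the export, ignoring the translated text itself. This is a minimal sketch, assuming the source keeps tag-led lines on their own lines; `layout` and `layout_changed` are hypothetical helpers, not memoQ features, and a translation that only reflows plain text lines would not be flagged.

```python
def layout(markup: str) -> list[str]:
    """Return the indentation of every line that starts with a tag.

    Text content is ignored, so translating the text alone should not change this.
    """
    result = []
    for line in markup.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("<"):
            result.append(line[: len(line) - len(stripped)])
    return result


def layout_changed(source: str, exported: str) -> bool:
    """True if the export no longer has the source's line breaks and indentation."""
    return layout(source) != layout(exported)


source = "<ul>\n  <li>One</li>\n  <li>Two</li>\n</ul>\n"
print(layout_changed(source, "<ul>\n  <li>Uno</li>\n  <li>Dos</li>\n</ul>\n"))  # False
print(layout_changed(source, "<ul><li>Uno</li><li>Dos</li></ul>"))  # True
```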

Another problem appears when one translates the texts: MQ outputs a line as long as necessary to hold the translated text within a given pair of tags. The line then cannot be displayed on the programmer's screen, and he has to insert as many line breaks as necessary by hand to be able to read the text.
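Lines that have grown too long can at least be found automatically before a file is delivered. A minimal Python sketch, assuming the 80-character limit mentioned above; `long_lines` is a hypothetical helper, not a memoQ feature:

```python
def long_lines(text: str, limit: int = 80) -> list[tuple[int, int]]:
    """Return (line number, length) for every line longer than `limit` characters."""
    return [
        (number, len(line))
        for number, line in enumerate(text.splitlines(), start=1)
        if len(line) > limit
    ]


exported = "<p>\n" + "Lorem ipsum " * 10 + "\n</p>\n"
for number, length in long_lines(exported):
    print(f"line {number}: {length} characters")  # line 2: 120 characters
```

Re-wrapping the reported lines is usually harmless in ordinary html text, where a line break renders as a space, but not inside `<pre>` blocks or attribute values.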

Kilgray's default html filter has the following option checked:
"Break segment at preserved newline characters: Check this check box to make memoQ start a new segment whenever it encounters a newline character in the HTML text, so that the newline character will be preserved in all cases."

But that's not how MQ behaves.

I reported this to Kilgray support on 25 July. So far, they haven't even admitted that it's a bug.

2. Html symbols not preserved

In example 3 above, one can see that the html symbol (character entity) &copy; has been replaced with ©. That is indeed how it should be rendered, but keeping such characters as symbols in the code can prevent poor rendering by software at the other end that does not respect all the standards. In any case, it is not the translator's or the CAT tool's job to decide to change the symbols. There is an option to export every character that can be represented as a symbol as a symbol, but that outputs é as &eacute; etc., not just the characters that were originally symbols.
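A post-processing step could restore only the symbols that the source actually used, leaving every other character alone. A minimal Python sketch, assuming the source uses named entities from the set covered by Python's standard `html.entities.name2codepoint` table; `restore_entities` is a hypothetical helper. It treats the file as plain text, so it would also touch attribute values, scripts and comments, which a real tool would need to skip.

```python
import re
from html.entities import name2codepoint

NAMED_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")
# Markup-significant entities are excluded so that '<', '>', '&' and '"' in the
# tags themselves are never rewritten.
MARKUP_ENTITIES = {"amp", "lt", "gt", "quot"}


def restore_entities(source: str, exported: str) -> str:
    """Write as named entities only those characters the source wrote as entities."""
    wanted = {}
    for name in NAMED_ENTITY.findall(source):
        if name in name2codepoint and name not in MARKUP_ENTITIES:
            wanted[chr(name2codepoint[name])] = name
    return "".join(f"&{wanted[ch]};" if ch in wanted else ch for ch in exported)


print(restore_entities("<p>&copy; 2015</p>", "<p>© 2015, café</p>"))
# <p>&copy; 2015, café</p>
```

Here é stays as it is, because the source never wrote it as &eacute;.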

Kilgray has not acknowledged that this is undesirable either.

3. Code page changed

I noticed that even though the source had a code page declaration, MQ changed it to UTF-8 in the export.

Their default filter does not have the following option checked:

"Use this codepage even if there is a different declaration in the file: Check this check box to enforce the import codepage selection. Use this when you suspect that the encoding declaration in the HTML file is incorrect or inconsistent. This check box is not checked by default."

So this looks like another bug, but they haven't admitted that yet.

UTF-8 may well be a better choice; it's just not the CAT tool's role to decide that.
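If a file has already been converted, it can be put back into the source's code page afterwards. A minimal Python sketch, assuming the declaration sits in an html `<meta>` tag (either `<meta charset=...>` or the `charset=` part of a Content-Type declaration); `declared_charset` and `restore_codepage` are hypothetical helpers. Characters the original code page cannot represent are written as numeric character references rather than being lost.

```python
import re

# Group 1: everything up to the encoding name; group 2: the encoding name itself.
CHARSET = r"""(<meta[^>]*?charset=["']?)([A-Za-z0-9._:-]+)"""


def declared_charset(raw: bytes, default: str = "utf-8") -> str:
    """Return the encoding declared in an html file's <meta> tag, or `default`."""
    match = re.search(CHARSET.encode("ascii"), raw, re.IGNORECASE)
    return match.group(2).decode("ascii") if match else default


def restore_codepage(source_raw: bytes, exported_raw: bytes) -> bytes:
    """Re-encode an export in the source's declared code page and restore its declaration."""
    charset = declared_charset(source_raw)
    text = exported_raw.decode(declared_charset(exported_raw))
    # Put the original code page name back into the first <meta> declaration.
    text = re.sub(CHARSET, lambda m: m.group(1) + charset, text,
                  count=1, flags=re.IGNORECASE)
    # Characters the code page cannot hold become references such as &#8364;.
    return text.encode(charset, errors="xmlcharrefreplace")


source = (b'<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">'
          b"<p>caf\xe9</p>")
export = ('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
          "<p>café €</p>").encode("utf-8")
print(restore_codepage(source, export))
```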

4. Tag syntax changed

By default, MQ changes old-style empty html tags into xml-style empty tags, which end in "/>". That is correct syntax for xml but not for html.

However, unchecking the option

"Enforce empty tags: Check this check box to treat old-style tags as empty tags. Normally, these would be imported as opening tags, but with this setting memoQ will import them as XML-style empty tags in all cases – so you won't get rogue XML warnings when confirming segments in the document."

fixes this, even though the explanation is totally cryptic and incomprehensible.
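For files already exported with that option checked, the change can be undone afterwards. A minimal Python sketch, assuming the file is html and that only void elements (the elements html defines as never having a closing tag) were turned into xml-style empty tags; `html_empty_tags` is a hypothetical helper:

```python
import re

# HTML void elements: these never take a closing tag.
VOID = ("area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr")
_NAMES = "|".join(VOID)
# \b stops "col" from matching "colgroup"; group 2 keeps any attributes.
SELF_CLOSING = re.compile(rf"<({_NAMES})\b([^>]*?)\s*/>", re.IGNORECASE)


def html_empty_tags(markup: str) -> str:
    """Turn xml-style empty void tags such as <br/> back into html-style <br>."""
    return SELF_CLOSING.sub(r"<\1\2>", markup)


print(html_empty_tags('<p>a<br/>b<img src="x.png" /></p>'))
# <p>a<br>b<img src="x.png"></p>
```

Non-void elements written as empty tags are left untouched, since simply dropping the slash would leave them unclosed.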

Am I the only one bothered by this, and does any other CAT tool do these things properly?

Fortunately, this happened on my own website, but as a paying client I would have been very unhappy to receive the mess MQ created, and I would require the translator to clean it up and re-establish the original formatting. Depending on the number of files, this clean-up could take several hours. It took me one to two hours to clean up just one page of my own site.

In practice, this makes MQ useless for html and xml, as I could not return such a shambles to a client.

Has anyone else had such problems?

How do other CAT tools handle html?

[Edited at 2015-08-10 13:47 GMT]

