In-house translation – outside the box

Tag: Trados

Generating bilingual EUR-LEX web addresses in Python
Following on from the Python courses that I recently completed, I wrote a little bit of code that generates the URLs for finding bilingual versions of EU Directives and Regulations in Eur-Lex. This gives a translator access to a bilingual version, which can then be aligned in Trados using the Bilingual Excel file type.

How does it work?

It asks you to specify the language versions you wish to use: one language to appear on the left-hand side and one on the right-hand side. This determines the order of the columns for alignment using the Bilingual Excel file type in Trados.

You are asked to specify the year and the item number (e.g. the CRR, which is Regulation (EU) 575/2013 (note the old order, with the number before the year), or the IFD, which is Directive (EU) 2019/2034), and whether you are dealing with a Directive or a Regulation. Future iterations will handle Decisions etc. The itemtype ensures the correct “filler” letter in the URL.

In case the item number (e.g. 575 or 36) is not four digits in length, item.zfill(4) pads it to four digits (i.e. 0575 or 0036).

If you want the original version as it first appeared in the Official Journal, select Original (O). Consolidated versions are chosen using (C). In the latter case you are asked for the date of the consolidated version (note that this is the date on which it enters into force, not the date of the OJ publication). The URL also needs either a 3 (for legislation) or a 0 (for consolidated legislation) in front of the year.

The Code:
```
#filename bits
urlstart = "https://eur-lex.europa.eu/legal-content/"
urlcelex = "/TXT/?uri=CELEX:"
exportfiletype = ".txt"
sep = "-"

# input language, year number, item number and whether Directive or Regulation
sourcelang = input("Enter left language (EN = English, DE = German):")
targetlang = input("Enter right language (EN = English, DE = German):")
year = input("Enter year:")
fileyear = year + "_"
item = input("Enter item:")
paditem = item.zfill(4)
fileitem = paditem
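rlvo = input("Enter type (D for Directive, R for Regulation):").strip().upper()

# map the type to the letter used in the CELEX number
if rlvo == "D":
    itemtype = "L"
elif rlvo == "R":
    itemtype = "R"
else:
    # without this, itemtype would be undefined further down
    raise SystemExit("Type must be D or R")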

# consolidated
consolidated = input("Consolidated version (C) or Original version (O):")

if consolidated == "O":
    legislation = 3
    unconsurl = urlstart + sourcelang + sep + targetlang + urlcelex + str(legislation) + str(year) + itemtype + str(paditem)
    print(unconsurl)

elif consolidated == "C":
    legislation = 0
    consolyear = input("Enter year of consolidated version as YYYY:")
    consolmonth = input("Enter month of consolidated version:")
    padconsolmonth = consolmonth.zfill(2)
    consolday = input("Enter day of consolidated version:")
    padconsolday = consolday.zfill(2)
    consurl = urlstart + sourcelang + sep + targetlang + urlcelex + str(legislation) + str(year) + itemtype + str(paditem) + sep + consolyear + padconsolmonth + padconsolday
    print(consurl)
```
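The same logic can also be wrapped in a function, which makes it easier to reuse or test without typing answers at a prompt. What follows is a minimal sketch and not a replacement for the script above: the function name `eurlex_url` and its parameter names are illustrative assumptions, while the URL pieces and the 3/0 prefixes are taken from the script. The consolidation date in the second call is only a placeholder.

```python
# Sketch only: eurlex_url and its parameter names are illustrative assumptions;
# the URL pieces and the 3/0 prefixes come from the script above.
URL_START = "https://eur-lex.europa.eu/legal-content/"
URL_CELEX = "/TXT/?uri=CELEX:"
ITEM_TYPES = {"D": "L", "R": "R"}  # Directive -> L, Regulation -> R


def eurlex_url(left, right, year, item, kind, consolidated_date=None):
    """Build a bilingual Eur-Lex URL.

    consolidated_date is a (year, month, day) tuple for a consolidated
    version, or None for the original Official Journal version.
    """
    itemtype = ITEM_TYPES[kind.upper()]  # raises KeyError for anything but D/R
    prefix = "3" if consolidated_date is None else "0"
    celex = prefix + str(year) + itemtype + str(item).zfill(4)
    if consolidated_date is not None:
        y, m, d = consolidated_date
        celex += "-" + str(y) + str(m).zfill(2) + str(d).zfill(2)
    return URL_START + left.upper() + "-" + right.upper() + URL_CELEX + celex


# CRR, original version, English on the left and German on the right
print(eurlex_url("EN", "DE", 2013, 575, "R"))
# IFD, consolidated version (placeholder date, not a real consolidation date)
print(eurlex_url("DE", "EN", 2019, 2034, "D", (2024, 1, 1)))
```

Passing the date as a tuple keeps the zero-padding in one place, in the same way the script pads the month and day separately.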
3 December 2022
Stripping non-breaking hyphens in Word
The non-breaking hyphen is a useful device. It is a special character that ensures that “Greco-Roman” in “Greco-Roman wrestling” never splits across two lines. Similarly, it helps avoid “gemischte EU-Mutterfinanzholdinggesellschaften” splitting as gemischte EU- / Mutterfinanzholdinggesellschaften (see e.g. Article 1, para. 1 no. 4 BaSAG for a typical usage!) or similar compound nouns in legal translations. In the latter case, if hyphenation is turned on in narrow columns, you end up with some strange-looking lines.

However, for translators using CAT tools, it can be annoying, with non-breaking hyphens rendered as tags in the source text. This can cause problems in the CAT tool as it will flag the target text as missing a tag. Fortunately there is a simple way in Word to search and replace all non-breaking hyphens, to get around this problem. Once you have mastered this for non-breaking hyphens there are other use cases for other special characters.

What to do in Word

There is a wildcard (similar to a regular expression) in Word for finding non-breaking hyphens, which is ^~. To use it, expand the Find and Replace dialogue (Ctrl + H) by clicking on “More > >” and then select the wildcards option. Alternatively, you can add this wildcard character through the “Special” button at the bottom of the dialogue. Then do a search/replace for all instances.

Naturally, going through a lot of documents can become a time-consuming process. Help is at hand in the form of a macro. If you do this regularly, it is definitely worth having a Word macro for it, like the one below.
```
Sub ChangeNonBreakHyphen()
'
' ChangeNonBreakHyphen Macro
' Converts non-breaking hyphens into normal hyphens.
'
    Selection.Find.ClearFormatting
    Selection.Find.Replacement.ClearFormatting
    With Selection.Find
        .Text = "^~"
        .Replacement.Text = "-"
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchAllWordForms = False
        .MatchSoundsLike = False
        .MatchWildcards = True
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
End Sub
```
Add it to your macros in Word through the Developer tab. I even have it added to my ribbon tab containing all my translation macros. You may find other special characters that can be treated in a similar manner.
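For a whole folder of files, the same clean-up can also be sketched outside Word in Python. This is a hedged example and not a tested replacement for the macro: it assumes that a .docx stores a non-breaking hyphen as the empty element `<w:noBreakHyphen/>` inside a run, that the XML parts are UTF-8 encoded, and it also catches the Unicode character U+2011 in case one was pasted in as plain text. The function names are hypothetical.

```python
import zipfile

# Assumption: Word writes a non-breaking hyphen as this empty element in a run.
NB_HYPHEN_TAG = "<w:noBreakHyphen/>"
NB_HYPHEN_CHAR = "\u2011"  # Unicode non-breaking hyphen, handled as a fallback


def replace_nonbreaking_hyphens(xml):
    """Return WordprocessingML text with non-breaking hyphens made plain."""
    xml = xml.replace(NB_HYPHEN_TAG, "<w:t>-</w:t>")
    return xml.replace(NB_HYPHEN_CHAR, "-")


def clean_docx(src_path, dst_path):
    """Copy a .docx, rewriting the XML parts under word/ (hypothetical helper)."""
    with zipfile.ZipFile(src_path) as src, \
            zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            # body, headers, footers and footnotes all live in word/*.xml
            if info.filename.startswith("word/") and info.filename.endswith(".xml"):
                text = data.decode("utf-8")  # assumption: parts are UTF-8
                data = replace_nonbreaking_hyphens(text).encode("utf-8")
            dst.writestr(info, data)
```

A call such as `clean_docx("source.docx", "source_clean.docx")` writes a cleaned copy and leaves the original untouched; it is worth opening the copy in Word once to confirm it looks right before adding it to a project.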

If you discover a slew of non-breaking hyphens in the middle of your source text in Trados, there is an easy solution. You can clean up the Word document, add it to your translation project again, and run PerfectMatch afterwards.
20 November 2022
ETUG 2022 – incorporating eTranslation into Trados workflows
I recently moderated a session on incorporating the European Commission’s eTranslation MT solution into Trados workflows at ETUG 2022. ETUG 2022 is the annual meeting of the European Trados Users Group. It brings together translators, language technologists from corporate language services and representatives from RWS to talk about all aspects of Trados. RWS unveil roadmaps for their products and there are use case presentations, like in my session.

Catherine Lane and Daniel García-Magariños from the Language Technology and Innovation Unit within DG Translation at the European Central Bank demonstrated how they have approached incorporating eTranslation into Trados workflows.

For me, as a banking supervision translator, it made moderating the session simpler, but interventions from the floor from the automotive industry also provided valuable inputs on some of the considerations for use of MT in workflows.

The ECB Experience

The presentation was split into two parts. Catherine addressed setting up the Finance engine, including the QA of imported data, language combinations and available engines. Daniel demonstrated the tool developed by the ECB for importing machine-translated TMX files.

Catherine dealt with how the ECB applies rules about the level of confidentiality of documents that can be sent to eTranslation. Mitigations are in place (e.g. files are downloaded from eTranslation rather than sent by e-mail, and are deleted from the system immediately after download). These measures ensure that the files do not remain in the system for any longer than is necessary.

Catherine also addressed the onboarding of translators, where an eLearning module had been used to handle some of the training. Currently eTranslation is still an additional aid to complement existing server-based human translation TMs, not a direct replacement. It serves more as a starting point where existing TMs do not include good fuzzy matches for sentences.

Currently translations are only delivered for one engine at a time. However, it is possible to have translations into multiple languages. I meant to ask about pivot languages for exotic combinations – e.g. for Finnish-Maltese does MT output involve an intermediate step through English?

MT’s Top Model

Another consideration is about which engines to use. For my area of work, I would probably use 2-3 engines (e.g. Bundesbank Neural, Finance, Formal). This would require running the process three times at the moment. Depending on the text type, however, the Formal engine (e.g. for legal texts) might prove the most useful. The Finance engine would prove more useful for financial texts.

As Daniel explained, processing power also means that there is currently no direct way to access eTranslation from inside Trados. Instead, eTranslation translates the document and the output is made available to download. Downloaded files are imported into a separate Translation Memory for MT results. In the translation project a 20% penalty applies to this TM. The TM settings are “lookup” and “concordance” enabled, but “update” disabled. This essentially means it is a read-only translation memory.
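The effect of the penalty can be illustrated with a few lines of Python. This is a simplified sketch under stated assumptions, not a description of Trados internals: the names are hypothetical, match scores run from 0 to 100, and the penalty is modelled as percentage points subtracted from every hit that comes from the penalised TM.

```python
# Sketch only: names and scoring are illustrative assumptions, not Trados internals.
MT_PENALTY = 20  # the 20% penalty described above


def best_match(hits, penalties):
    """Return the hit with the highest score after penalties, or None.

    hits: list of (tm_name, raw_score, target_text) tuples.
    penalties: dict mapping tm_name to a penalty in percentage points.
    """
    scored = [(raw - penalties.get(tm, 0), tm, text) for tm, raw, text in hits]
    return max(scored, default=None)


penalties = {"eTranslation": MT_PENALTY}
hits = [
    ("Human TM", 85, "human fuzzy match"),
    ("eTranslation", 100, "machine translation"),
]
# The 85% human match wins because the MT hit drops to 100 - 20 = 80
print(best_match(hits, penalties))
```

With no usable hit in the human TM, the MT segment is still offered, just at the reduced score.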

The ECB’s “eTranslator importer” helps ensure that the files land in the right place and that domain-specific fields are appended to each TU. This includes extra field content about the engine used. The Translation Memory is cleared regularly.
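The ECB tool itself was only demonstrated in the session, so the following is merely a rough sketch of the general idea of appending fields to each TU in a TMX file. It assumes the fields are stored as TMX `<prop>` elements; the function name and the field names `x-engine` and `x-origin` are hypothetical placeholders, not the ECB’s.

```python
import xml.etree.ElementTree as ET


def tag_translation_units(tmx_text, fields):
    """Add one <prop> per field to every <tu> in a TMX document.

    Hypothetical helper: the field names are placeholders, not the ECB's.
    """
    root = ET.fromstring(tmx_text)
    for tu in root.iter("tu"):
        # TMX expects <prop> elements before the <tuv> variants,
        # so the new elements are inserted at the front of the <tu>
        for position, (name, value) in enumerate(fields.items()):
            prop = ET.Element("prop", {"type": name})
            prop.text = value
            tu.insert(position, prop)
    return ET.tostring(root, encoding="unicode")


# Minimal sample document, cut down for illustration
sample = (
    '<tmx version="1.4"><header/><body>'
    '<tu><tuv xml:lang="en"><seg>Total assets</seg></tuv>'
    '<tuv xml:lang="de"><seg>Bilanzsumme</seg></tuv></tu>'
    '</body></tmx>'
)
print(tag_translation_units(sample, {"x-engine": "Finance", "x-origin": "MT"}))
```

Recording the engine on each TU in this way is what would later allow MT segments from different engines to be told apart in the TM.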

Averse – Ambivalent – Evangelist

Three attitudes towards MT emerged in the discussion about the uptake among translators. I called them “averse”, i.e. those who oppose the use of MT, “ambivalent”, i.e. nice to have but not a deal-breaker, and “evangelist”. There has been some movement away from “averse” towards “ambivalent”. Possibly this is due to the emergence of NMT, which has overcome some of the aversion displayed towards statistical machine translation.

Participants from a similar project in the automotive industry mentioned that they had only given access to more experienced translators. Less-experienced translators might lack the depth of knowledge to identify that a fluent-sounding TU was in fact incorrect.

I am in the “ambivalent” camp. The potential uses for eTranslation in my setting in banking supervision are evident. I am aware that considerable post-editing of the MT output is still required. My direct concern is the need to pseudonymise all mentions of the entities in question, and likewise any placeable values (e.g. total assets), since doing so negates the productivity gain.

I find MT output is very rigid in its word order, whereas I like to invert sentences to avoid the need for a passive in English.

However, I can understand and appreciate that texts carefully prepared for translation (check out search results for “writing for translation” to get an idea) mean a greater productivity gain. This might in turn reduce unnecessary verbosity and lead to clearer writing.

Takeaways from the session

A few takeaways from the break-out session on integrating eTranslation into the translation workflow of the European Central Bank:
- Any institution, agency and authority with access to eTranslation can use this approach.
- There are a number of domain-specific engines. Currently eTranslation only uses a single engine per request. Different engines seem better suited to different text types.
- eTranslation works for all EU official languages, and there are also some non-EU languages (e.g. Ukrainian, Chinese, Arabic).
- ECB used language data from central banks and supervisory authorities to build the Finance engine.
- A 20% penalty to MT output means that eTranslation output only comes into play where there are no human-translated and verified TUs.
- From an assessment of translation quality for pure MT output, language combinations with the largest number of TUs achieve the best results.
- Translators fall into three camps: “averse”, “ambivalent” and “evangelist”. Some sceptics (averse) are becoming more enthusiastic, partially due to the advent of NMT.
- Future developments include tools for anonymisation or pseudonymisation, which are essential when using names of entities etc.
- Translator experience level may contribute to gains from these workflows.
18 September 2022