How RegEx pairs help me to prepare WordPress content for translation

Three little RegEx pairs used to strip out tag pairs in WordPress posts and pages prior to translation.

One of my regular tasks is to translate news items (WordPress posts) on my employer’s website. If contributions contain ABBR and LANG tags for accessibility, I usually need to remove them before I translate. The same approach also works for pages in WordPress. Content generated in Word might contain other hidden tags inserted in the text. The example in this blog post is a tag pair that formats phone numbers so that they can be dialled from a VOIP phone system.

Removing ABBR Tags

Accessibility requires spelling out abbreviations using the following ABBR tag pairs.

[abbr title="Bankwesengesetz"]BWG[/abbr]

In displayed posts, the abbreviation’s full form (the text inside title="…", which precedes the abbreviation) appears in a tooltip when you hover over the abbreviation between the tag pair.

I’ve recorded a macro in Notepad++ that runs a search and replace for each of the following two regular expressions. In both cases the replace field is empty. I can run the macro with a single keyboard shortcut, which saves a lot of time.

\[abbr.*?\]
\[\/abbr\]
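Outside Notepad++, the same pair could also be applied in a script. The following is only a minimal sketch in Python, assuming the standard re module; the function name strip_abbr is made up for illustration, and the sample is the tag pair shown above.

```python
import re

# The two patterns from the post. The replacement is an empty string,
# mirroring the empty replace field in Notepad++.
OPENING_ABBR = re.compile(r"\[abbr.*?\]")
CLOSING_ABBR = re.compile(r"\[\/abbr\]")


def strip_abbr(text: str) -> str:
    """Remove [abbr ...] and [/abbr] tags, keeping the abbreviation itself."""
    text = OPENING_ABBR.sub("", text)
    return CLOSING_ABBR.sub("", text)


sample = '[abbr title="Bankwesengesetz"]BWG[/abbr]'
print(strip_abbr(sample))  # BWG
```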

After translating in Trados, I usually add the necessary abbreviations back into the English version from a file I have saved in Notepad++, then copy and paste the full file into the code view of the post in WordPress.

Removing LANG tags

LANG tags mark words, phrases or sentences that are in a language other than the page language, so that screen readers read them in that language.

For example, take the following sentence:

Article 38 of the Bankwesengesetz addresses banking secrecy requirements, commonly referred to in Austria as Bankgeheimnis.

A sample sentence showing an English sentence containing some German words.

The code view will show:

Article 38 of the [lang title="DE"]Bankwesengesetz[/lang] addresses banking secrecy requirements, commonly referred to in Austria as [lang title="DE"]Bankgeheimnis[/lang].

To remove these tags, I run a search and replace for each of the following two patterns. The first matches the opening tag in front of the words, phrases or sentences to be read by a screen reader in another language. The second matches the closing tag of the pair.

\[lang.*?\]
\[\/lang\]
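As a worked example, here is the sample sentence from the code view run through both patterns. This is a sketch in Python rather than Notepad++, assuming the standard re module; the variable names are only for illustration.

```python
import re

tagged = (
    'Article 38 of the [lang title="DE"]Bankwesengesetz[/lang] addresses '
    'banking secrecy requirements, commonly referred to in Austria as '
    '[lang title="DE"]Bankgeheimnis[/lang].'
)

# Same two-step search and replace as above, with an empty replacement.
without_opening = re.sub(r"\[lang.*?\]", "", tagged)
plain = re.sub(r"\[\/lang\]", "", without_opening)

print(plain)
```

The non-greedy .*? matters here: a greedy .* in the first pattern would match from the first [lang through to the last ] on the line and delete the text in between.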

Removing proprietary tag pairs

This example removes the tags inserted around a telephone number, e.g. in a mail signature, so that the number can be dialled directly. The tag pair may be visible in the code view of the post text, typically in the contact details of a media spokesperson in a press release. Its purpose is to allow a VOIP telephony system to dial the phone number. This may not work correctly, so it makes sense to remove the tag pair from the source code.

To do that, I use the following pair of entries in the search/replace function of Notepad++.

<avaya.*?>
</avaya.*?>
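All three pairs in this post could also be chained in one pass, much like replaying a recorded macro. The sketch below is in Python and assumes the standard re module; the helper name strip_tags is made up, and the sample markup (attribute name and phone number) is invented purely for illustration.

```python
import re

# Search patterns quoted in this post; each match is replaced with nothing.
TAG_PATTERNS = [
    r"\[abbr.*?\]", r"\[\/abbr\]",
    r"\[lang.*?\]", r"\[\/lang\]",
    r"<avaya.*?>", r"</avaya.*?>",
]


def strip_tags(text: str, patterns=TAG_PATTERNS) -> str:
    """Apply every pattern in turn, like replaying a recorded macro."""
    for pattern in patterns:
        text = re.sub(pattern, "", text)
    return text


# Hypothetical markup: the attribute name and the phone number are invented.
sample = 'Phone: <avaya number="+43 1 0000000">+43 1 0000000</avaya>'
print(strip_tags(sample))  # Phone: +43 1 0000000
```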

Further uses

There are many more uses beyond the cases above. One that I use quite often is to remove SPAN tags that appear in a post or page when content is copy-pasted out of MS Word. Typically, this is where someone has used the format painter, thereby creating some tag soup in the source text.

Why do I do this? SPAN tags can bloat the post/page code unnecessarily, which can be disruptive when translating the text of a page/post.
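A SPAN pair could look like the sketch below. Note that these two patterns are my assumption, modelled on the pairs above, and the pasted markup is an invented example; the code is Python with the standard re module.

```python
import re

# Assumed patterns, modelled on the pairs above.
OPENING_SPAN = r"<span.*?>"
CLOSING_SPAN = r"</span>"

# Invented example of Word-style markup.
pasted = '<p>Press <span style="font-size:11pt">release</span> text</p>'

cleaned = re.sub(CLOSING_SPAN, "", re.sub(OPENING_SPAN, "", pasted))
print(cleaned)  # <p>Press release text</p>
```

In Python the dot does not match a line break by default, so an opening tag that wraps across lines would need the re.DOTALL flag.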

2 responses to “How RegEx pairs help me to prepare WordPress content for translation”

  1. L

    Thanks for your blog post.
    I share your enthusiasm for regular expressions.

    On a similar note, I would like to add that Trados also offers some advanced possibilities when it comes to regular expressions:
    The versatile regex based text filter in Trados Studio…

    (I think there are similar options for other CAT tools.)

    1. t9natno5

      Hi L! Good to hear from you – we met in Bonn in 2019 – that seems a long time ago! Paul Filkin’s blog has some great RegEx posts and one use I have is in the QA tool – I learned about it from a speaker from the Commission at ETUG 2023 – a really entertaining presentation and I took away a lot from it. Michael
