MENU

Carruse

viernes, 18 de enero de 2019

Converting TMX into bilingual Excel file without special tools

Not all translators have all of the complex software tools available. Imagine you have a TMX file and you need to extract its contents into a bilingual Excel file. This article describes how to perform this ?data digging? with Microsoft Word and Unicode text editor.

Pre-requisites:

Unicode text editor, such as Notepad 2, PS Pad, or similar
Microsoft Word
Microsoft Excel
Procedure:

Open the TMX file in Unicode editor.
Select all content and copy the same into a new Word file.
Select all text in Word file (Ctrl+A) and set it as hidden (Ctrl+D, select Hidden checkbox).
Press Ctrl-H (Search and Replace) and click the More button.
Select the Use wildcards checkbox.
Enter ?\<seg\>*\</seg\>? (without double quotes) into Find input field.
Click into the ?Replace with? field and leave it empty. Click Format button and then select the Font option. Uncheck the Hidden checkbox). Click Replace All button.
Press Ctrl-H (Search and Replace).
Uncheck the Use wildcards checkbox.
Enter ?<seg>? (without double quotes) into Find input field.
Click the ?Replace with? field and enter ?<seg>?. Click the More button, click Format button and then select the Font option. Check the Hidden checkbox). Click Replace All button.
Press Ctrl-H (Search and Replace). Keep the Use wildcards checkbox unchecked.
Enter ?</seg>? (without double quotes) into Find input field.
Click the ?Replace with? field and enter ?</seg>^p?. Click the More button, click Format button and then select the Font option. Uncheck the Hidden checkbox). Click Replace All button.
Enter ?</seg>? (without double quotes) into Find input field.
Click the ?Replace with? field and enter ?</seg>?. Click the More button, click Format button and then select the Font option. Check the Hidden checkbox). Click Replace All button.
Save the Word file.
Now we have a Word file, where all texts other than source and target are hidden.

Copy all text in the Word file and paste into a new text document in Unicode text editor.
Save a new text file.
Open Microsoft Excel.
Open the text file you have just created in Excel.
Import the TXT following screen instructions. On pages 1 and 2 of the import wizard, click Next button. On the last page, select the Text radio button and click the Finish button.
Now we have an Excel file with our source and translation text. However, pairs are arranged into a single column. As the last step, we need to transpose every second row into a column. For this, we will use the following macro in Excel.

Copy the below code.
Press the keys ALT + F11 to open the Visual Basic Editor.
Press the keys ALT + I to activate the Insert menu.
Press M to insert a Standard Module.
Paste the code by pressing the keys CTRL + V.
Press the keys ALT + Q to exit the Editor, and return to Excel.
To run the macro from Excel, press ALT + F8 to display the Run Macro Dialog. Double Click the macro’s name to Run it.
Important:

You have to change the sheet name in the code below to whatever your sheet name is. Example: If your sheet name is ?List1? (instead of default Sheet1), you need to replace With Sheets(“Sheet1”) with With Sheets(“List1”).

WE ARE DONE. Now we have a bilingual Excel file created from source TMX.

Here is the code:

Option Explicit
Sub ReorgData()
Dim a As Variant, b As Variant
Dim lr As Long, lc As Long
Dim i As Long, ii As Long, c As Long, nc As Long
With Sheets("Sheet1")
lr = .Cells.Find("*", , xlValues, xlWhole, xlByRows, xlPrevious, False).Row
lc = .Cells.Find("*", , xlValues, xlWhole, xlByColumns, xlPrevious, False).Column
a = .Range(.Cells(1, 1), .Cells(lr, lc))
ReDim b(1 To UBound(a, 1), 1 To UBound(a, 2) * 2)
For i = 1 To UBound(a, 1) Step 2
ii = ii + 1
nc = 1
For c = 1 To UBound(a, 2)
b(ii, nc) = a(i, c)
b(ii, nc + 1) = a(i + 1, c)
nc = nc + 2
Next c
Next i
.Range(.Cells(1, 1), .Cells(lr, lc)).ClearContents
.Cells(1, 1).Resize(UBound(b, 1), UBound(b, 2)) = b
.Columns.AutoFit
End With
End Sub