www.loctimize.com Loctimize GmbH
Localizing dynamic websites created from open source content management systems
memoQfest 2012, May 10, 2012, Budapest
Daniel Zielinski, Martin Beuster
Agenda
• Open source content management systems
• The localization challenges
• General localization strategies
• Conclusions
Open source content management systems
Challenges
© 2012 Loctimize – All rights reserved
• Create / update content
• Identify content
• Extract content
• Prepare content
• Translate content
• Integrate translated content
• Test localization
• Fix bugs
• Publish localized website
Identify content - Database
• Most of the content is stored in databases
• Databases are made up of related tables
• The tables are made up of rows and columns
• The fields contain the content in different formats (text, HTML, XML, proprietary formats)
• The fields also contain metadata used for identifying/filtering the relevant content, e.g.
translate = 0
language = 2
published = 1
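A minimal sketch of how such metadata could be used to filter the relevant rows. It assumes a hypothetical `content` table with `translate`, `language` and `published` columns; real CMS schemas differ, and the meaning of each flag value depends on the CMS, so the values are passed in as parameters rather than interpreted here.

```python
import sqlite3


def find_rows(conn, translate, language, published):
    """Return (id, title, body) for rows whose metadata matches the given values."""
    query = (
        "SELECT id, title, body FROM content "
        "WHERE translate = ? AND language = ? AND published = ? "
        "ORDER BY id"
    )
    return conn.execute(query, (translate, language, published)).fetchall()


def build_demo_db():
    """Create an in-memory database with a hypothetical content table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE content ("
        "id INTEGER PRIMARY KEY, title TEXT, body TEXT, "
        "translate INTEGER, language INTEGER, published INTEGER)"
    )
    conn.executemany(
        "INSERT INTO content VALUES (?, ?, ?, ?, ?, ?)",
        [
            (1, "Home", "<p>Welcome</p>", 0, 2, 1),
            (2, "Draft", "<p>Not ready</p>", 0, 2, 0),
            (3, "About", "<p>About us</p>", 1, 2, 1),
            (4, "Startseite", "<p>Willkommen</p>", 0, 1, 1),
        ],
    )
    return conn
```

With the demo data, `find_rows(build_demo_db(), translate=0, language=2, published=1)` selects only the row with id 1.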
Identify content - Database
[Screenshot labels: HTML content, Text]
Identify content - File system
• Template files (HTML, CSS, JPG, PNG, GIF)
• Configuration files (INI, PHP, PROPERTIES, TXT…)
• Localization files (XLIFF, XML, PHP…)
Identify content - Template files (HTML)
Translatable content?
Identify content - Configuration files – INI Files
• Some of the content is stored in INI files
• It is stored in key-value pairs
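As an illustration, a short sketch that reads key-value pairs from an INI-style language file. The sample keys and the quoting convention are assumptions; INI dialects vary between CMSs, so a real script would need to follow the rules of the CMS in question.

```python
def parse_ini_strings(text):
    """Map each key to its value, skipping blank lines, comments and section headers."""
    strings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith((";", "#", "[")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key-value line
        # Surrounding double quotes are treated as delimiters, not content.
        strings[key.strip()] = value.strip().strip('"')
    return strings


# Hypothetical language file content
sample = '''; hypothetical language file
COM_EXAMPLE_TITLE="Welcome to our site"
COM_EXAMPLE_SUBMIT = Send
'''

for key, value in parse_ini_strings(sample).items():
    print(key, "->", value)
```

Only the values are translatable; the keys must be left untouched so the CMS can still look them up.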
Identify content - Configuration files – PHP Files
• Some of the content is stored in PHP files
• It is stored in key-value pairs or arrays
Identify content - Localization files - XML
[Screenshot labels: UI strings, IDs, Language groups]
Extract content – Database
• Manually by copying
• Available extensions
– that understand the I18N/L10N logic of the CMS
– that extract and export into a translatable exchange format
• Develop scripts and exchange formats
– to extract and export the content into a translatable exchange format
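A minimal sketch of such an export script. The XML layout (`export`, `unit`, `id`, `field`) is a made-up exchange format for illustration, not XLIFF and not the output of any of the extensions mentioned; the row ID and field name are kept on each unit so the translation can be written back to the right place later.

```python
import xml.etree.ElementTree as ET


def export_rows(rows, source_lang):
    """Serialize (row_id, field, text) tuples into a simple XML exchange file."""
    root = ET.Element("export", {"source-language": source_lang})
    for row_id, field, text in rows:
        unit = ET.SubElement(root, "unit", {"id": str(row_id), "field": field})
        # ElementTree escapes markup, so HTML content survives as text.
        unit.text = text
    return ET.tostring(root, encoding="unicode")


# Hypothetical rows as they might come out of a CMS database
rows = [(12, "title", "Welcome"), (12, "body", "<p>Hello</p>")]
print(export_rows(rows, "en"))
```

Because HTML content is escaped inside the XML, the translation tool needs a second (cascading) filter to turn it back into tags.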
Extract content – Database
• Joomla!: Joom!Fish Plus, Jolomea (XML, XLIFF, PO)
• TYPO3: Localization Manager (XML)
• Drupal: i18n, Translation Management (XML, XLIFF)
• WordPress: Easy Translator Pro (PO)
• WordPress: WPML (XLIFF)
Extract content – Database
[Screenshot labels: Metadata, Page content]
Extract content – Database
Extract content – Files
• Copy files
• Know the file structure of the CMS
• FTP access
• Access to CMS backend with appropriate rights
Automate workflow?
• Use content connector and/or API to pass on the localisable content to memoQ.
Prepare files
• Defining non-translatable content
– Add additional tags
– Defining filter settings
• XML filter
• HTML filter
• RegEx text filter
• Cascading filters
• RegEx tagger
• Joining files
Translate content
• Lack of context
– Translation of content deltas (updates)
– Translation without visual information (XML, INI)
• Placeholders like %1, $2, {1}, $VAR, \n, \t
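A small sketch of a placeholder check that compares source and target segments. The regular expression covers only the placeholder styles listed above and is an assumption about their syntax; a real project would adapt it to the CMS in question.

```python
import re
from collections import Counter

# %1, $2, {1}, $VAR and the literal escape sequences \n and \t
PLACEHOLDER = re.compile(r"%\d+|\$\d+|\{\d+\}|\$[A-Z_][A-Z0-9_]*|\\[nt]")


def placeholders(text):
    """Count each placeholder occurring in the text."""
    return Counter(PLACEHOLDER.findall(text))


def placeholder_mismatches(source, target):
    """Return {placeholder: (count_in_source, count_in_target)} where counts differ."""
    src, tgt = placeholders(source), placeholders(target)
    return {
        p: (src[p], tgt[p])
        for p in sorted(set(src) | set(tgt))
        if src[p] != tgt[p]
    }
```

For example, comparing `r"Hello %1, you have {1} new messages in $VAR.\n"` with a translation that drops `{1}` reports `{"{1}": (1, 0)}`.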
Translate content - HTML
• HTML files are added to memoQ using the standard filter.
• Tags and attributes cannot be configured (localized hyperlinks).
Translate content - HTML
[Screenshot labels: Preview, Lookup results, Editor]
Translate content – XML files
• Add the XML files to memoQ using a pre-defined XML filter (and a cascading HTML/RegEx text filter).
• Content is grouped by page
• Source URL in comments field
Translate content - XML
[Screenshot labels: Preview, Lookup results, Editor]
Translate content – INI files
• Add the INI files to memoQ using a Regex text filter and a cascading HTML filter.
• The Regex text filter defines paragraphs as
Post-processing translated content – Convert to SQL
• A script converts the translated HTML files to SQL files.
• The IDs extracted from the tags in the HTML are used to update the correct rows.
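A sketch of such a conversion, under stated assumptions: each database field is wrapped in a `<div id="content-N">` marker (a hypothetical convention, with no nested `div` inside the content), and the target is a hypothetical `content` table with a `body` column. Quoting rules differ between database systems, so the escaping shown (doubling single quotes) is only the standard SQL form.

```python
import re

# Hypothetical marker: <div id="content-12">...</div>, without nested <div> elements
BLOCK = re.compile(r'<div id="content-(\d+)">(.*?)</div>', re.DOTALL)


def sql_quote(value):
    """Quote a string literal by doubling single quotes (standard SQL)."""
    return "'" + value.replace("'", "''") + "'"


def html_to_sql(html, table="content", column="body"):
    """Build one UPDATE statement per marked block, keyed by the extracted row ID."""
    statements = []
    for row_id, body in BLOCK.findall(html):
        statements.append(
            f"UPDATE {table} SET {column} = {sql_quote(body.strip())} "
            f"WHERE id = {int(row_id)};"
        )
    return statements
```

Where the import is done by a script rather than from an SQL file, passing the values as bound parameters avoids manual quoting altogether.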
Integrate localised content
• Manually by copying & pasting
• Available extensions
– that understand the I18N/L10N logic of the CMS
– that import the localized content
• Develop scripts
– to import the localized content
Importing localised content - Database
• Preview links with login information
• Overwrite mode
Importing localised content - Database
• The translated SQL file is imported into the database.
• The table rows are updated with the translated content along with other settings.
– Original text
– Modified date
– Published flag
– Hashed value
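A rough sketch of what such an update might look like when done through a script. The `translations` table, its column names, and the use of an MD5 hash of the original text are all assumptions for illustration; each CMS or extension defines its own schema and hashing scheme.

```python
import hashlib
import sqlite3
from datetime import datetime, timezone


def import_translation(conn, row_id, original, translated):
    """Store the translation together with its bookkeeping fields; return rows updated."""
    cursor = conn.execute(
        "UPDATE translations SET value = ?, original_text = ?, original_hash = ?, "
        "modified = ?, published = 1 WHERE id = ?",
        (
            translated,
            original,
            # Assumed scheme: hash of the source text, usable for detecting later changes
            hashlib.md5(original.encode("utf-8")).hexdigest(),
            datetime.now(timezone.utc).isoformat(timespec="seconds"),
            row_id,
        ),
    )
    conn.commit()
    return cursor.rowcount
```

A return value of 0 means no row with that ID exists, which is worth logging because it usually indicates that the content changed in the CMS after extraction.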
Importing localised content – INI files
• Translated INI files are exported from memoQ
• These are stored in the appropriate folders on
Automate workflow?
• Watch export folders and use CMS API/script to import localized files
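One possible way to watch an export folder is simple polling, sketched below. The `import_file` callback stands in for whatever CMS API call or import script is available (it is a hypothetical hook, not a real CMS function), and the `*.ini` pattern is just an example.

```python
import time
from pathlib import Path


def process_new_files(folder, seen, import_file, pattern="*.ini"):
    """Call import_file for each new or modified matching file; return the handled paths."""
    handled = []
    for path in sorted(Path(folder).glob(pattern)):
        mtime = path.stat().st_mtime
        if seen.get(path) != mtime:  # new file, or re-exported since the last scan
            import_file(path)
            seen[path] = mtime
            handled.append(path)
    return handled


def watch(folder, import_file, interval=5.0):
    """Poll the export folder until interrupted."""
    seen = {}
    while True:
        process_new_files(folder, seen, import_file)
        time.sleep(interval)
```

Polling may pick up a file that is still being written; exporting to a temporary name and renaming afterwards is one common way to avoid that.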
Test localised content
• Find the localised content in the website (frontend)
• Proof-read
• Layout check
Fix localization bugs
• Find where content to be updated came from
• Update content in CMS
• Update bilingual files and/or translation memory
• Modify stylesheets (CSS)
• …
Publish localized page
Conclusion
• Complex processes
• Interaction of a lot of people
• No standard procedures
• Need to develop processes and tools
• Risk of losing/missing data when trying to mimic CMS core functionality
Conclusion
Translation Service Provider
• Time consuming
• Scoping is a non-trivial first step
• Expertise in CMS, web technology, databases
• Develop tools
• Educate client and web developers
• Sponsor development!
Client
• Choose CMS wisely
• I18N & L10N strategy
• Expect additional costs for localisation engineering and/or development
• Time consuming!