What do you need for your localization business: a generic MT or a custom MT?
A few months ago, I bought a new mountain bike; my other bike showed signs of aging after accompanying me since 2007.
When I bought my old bike, I went to the store and told the manager, "I want a mountain bike," and a few minutes later I left the store with it. Cycling.
Easy and fast.
But it's incredible how much the MTB world has evolved lately.
This time, when I told the store manager I wanted a new bike, he started a round of questions that I hadn't counted on.
Do you want an aluminum or carbon frame?
V-brakes or disc brakes?
Traditional or tubeless wheels?
Manual or clipless pedals?
Single or double suspension?
And the list of custom items went on and on and on.
To come away with a suitable "machine" for me, I had to customize it, since different users need different components.
Logical.
Customization is important, whether it's a bike or an MT ☺️
One of the most common mistakes I see when newbies enter the localization industry and take their first steps with MT is that they use a generic MT engine.
They then rely on those results to decide whether MT is a good solution for their localization program or not.
But just as you can customize a mountain bike, you can (and should) customize an MT.
But how do you know if you need a custom MT or if, on the other hand, you can get by with a generic MT?
In the following paragraphs, I'll give you some ideas that I hope will be useful to determine if you need a custom MT or if a generic MT will work for you.
Let's start with a little bit of history about what has happened in the MT world recently.
In the last few years, enterprises and tech LSPs started to fill a growing need in the localization industry: improving the throughput and quality of content processed via MT.
This is when numerous companies started to build their own custom MTs based on the TMs (translation memories) of their translated content. These custom MTs gradually improved, but outside the content niche for which they were trained, their performance was quite poor.
LSPs marketed custom MTs optimized for a specific type of content, for example, product documentation, but then customers (buyers) applied the same custom MT to other types of content (e.g., marketing documentation) to streamline their workflow; that's when frustration set in for buyers.
The root problem is that custom engines are rather clumsy.
Although they can produce content of fairly acceptable quality for the niche in which they have been trained, the quality drops a lot when those engines are run on other types of content.
In fact, outside that niche, a generic MT can often beat a custom MT that has been trained on niche content.
What is a generic Machine Translation engine?
Generic engines are the best-known translation engines. For many people, especially those outside the localization industry, a generic machine translation engine means Google Translate. Google's engine is the one that usually catches the eye of neophytes. But it is not the only popular engine. Microsoft Translator and Amazon Translate are also popular products.
These engines are competitively priced and, in many cases, free versions are available. The good thing (or bad thing, depending on how you look at it) is that these engines are not customized for a specific industry (or type of content), so they can be used in a wide variety of environments.
When are generic MT engines a good choice?
If you have some content in a language you don't understand, and you want to get a rough idea of what it says, Google Translate, Amazon Translate or Microsoft Translator will do the job.
For example, when you need to localize player support or user-generated content, a generic MT will do. However, with a generic MT engine, there is a high chance of context errors and occasional grammatical errors. But if this is something you can afford and you are not aiming for excellent quality, a generic MT engine can be a useful addition to your localization workflow.
What is a Custom Machine Translation engine?
A custom MT engine is a machine translation system that has been trained for a specific field. Custom MT engines are highly recommended in localization verticals such as medicine or pharmaceuticals, as these sectors have a lower tolerance for errors than others (e.g., gaming).
A custom machine translation engine will eventually produce a higher quality of content translation than a generic engine. The more you customize your machine translation engine, the sooner you will reach your desired quality levels.
Keep in mind that if you choose the custom machine translation path, you will see the fruits in the long term. In the short term, you should be aware that you must invest time providing your engine with large volumes of source content and existing translations from the domain relevant to your organization.
These custom engines learn by trial and error, so this task must be carried out by a team that can teach the "machines" well.
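To make that data-preparation step a bit more concrete, here is a minimal Python sketch of how a team might pull source/target sentence pairs out of a TMX translation memory export before handing them to an engine for training. It is only an illustration under some assumptions: the function name `tmx_to_pairs`, the sample sentences, and the language codes are all made up for this example, the sample TMX is trimmed down to the bare elements, and each MT vendor has its own upload format and requirements, so check their documentation first.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical, trimmed-down TMX content (a real export carries more header metadata).
SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="en-US"/>
  <body>
    <tu>
      <tuv xml:lang="en-US"><seg>Tighten the clamp bolt.</seg></tuv>
      <tuv xml:lang="es-ES"><seg>Apriete el tornillo de la abrazadera.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en-US"><seg>Tighten the clamp bolt.</seg></tuv>
      <tuv xml:lang="es-ES"><seg>Apriete el tornillo de la abrazadera.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en-US"><seg>Check the tire pressure.</seg></tuv>
      <tuv xml:lang="es-ES"><seg>Compruebe la presión de los neumáticos.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en-US"><seg>Replace the brake pads.</seg></tuv>
      <tuv xml:lang="es-ES"><seg></seg></tuv>
    </tu>
  </body>
</tmx>"""


def tmx_to_pairs(tmx_text, source_lang, target_lang):
    """Return unique (source, target) pairs, skipping units with a missing side."""
    pairs = []
    seen = set()
    root = ET.fromstring(tmx_text)
    for unit in root.iter("tu"):
        segments = {}
        for variant in unit.findall("tuv"):
            # Older TMX versions used a plain "lang" attribute instead of xml:lang.
            lang = (variant.get(XML_LANG) or variant.get("lang") or "").lower()
            seg = variant.find("seg")
            if seg is not None:
                # itertext() also collects text inside inline tags within <seg>.
                segments[lang] = "".join(seg.itertext()).strip()
        source = segments.get(source_lang.lower())
        target = segments.get(target_lang.lower())
        if source and target and (source, target) not in seen:
            seen.add((source, target))
            pairs.append((source, target))
    return pairs


pairs = tmx_to_pairs(SAMPLE_TMX, "en-US", "es-ES")
for source, target in pairs:
    print(f"{source}\t{target}")
```

In this sample, the duplicated unit and the unit with an empty translation are dropped, which is the kind of basic cleanup that tends to matter because the engine learns from whatever you feed it.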
When should you use a custom MT engine?
A custom MT engine shines when the content belongs to a specific industry (e.g., manufacturing) and there is a lot of content (millions of words) to translate with tight deadlines.
In that environment, a custom MT engine can put a (human) translator in a tight spot. In the example below, we can see how Microsoft's custom engine is starting to outperform even the human translator in certain localization verticals.
It is important to note that actual quality improvement depends on customer data quality, training dataset size, and domain coverage.
Figure: BLEU translation quality scores for sample domains when using standard Translator, Custom Translator V1, and Custom Translator V2.
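BLEU, the metric in the chart caption above, measures how much the n-grams (word sequences) in an MT output overlap with a human reference translation, usually reported on a 0 to 100 scale. If you want to run this kind of generic vs. custom comparison on your own content, the idea can be sketched in a few lines of Python. This is a simplified, single-reference BLEU with whitespace tokenization and no smoothing, not the exact implementation behind the chart; the function names and the sample sentences are invented for illustration, and for real evaluations a standard tool such as sacreBLEU is a safer choice.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus-level BLEU (0-100), one reference per segment.

    No smoothing: the score is 0.0 if any n-gram order has no matches.
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_tokens, ref_tokens = hyp.lower().split(), ref.lower().split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp_tokens, n)
            ref_counts = ngrams(ref_tokens, n)
            # Clipping: an n-gram is credited at most as often as it appears in the reference.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(totals) == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: punishes output that is shorter than the reference.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


# Hypothetical human reference translations and engine outputs.
references = [
    "tighten the clamp bolt to the specified torque",
    "replace the brake pads when the wear indicator is visible",
]
generic_output = [
    "tighten the screw of the clamp to the specified torque",
    "change the brake pads when the wear indicator is visible",
]
custom_output = [
    "tighten the clamp bolt to the specified torque",
    "replace the brake pads when the wear indicator shows",
]

generic_score = corpus_bleu(generic_output, references)
custom_score = corpus_bleu(custom_output, references)
print(f"Generic engine BLEU: {generic_score:.1f}")
print(f"Custom engine BLEU:  {custom_score:.1f}")
```

With these made-up sentences the custom output scores higher because it reuses the reference terminology. On real projects, you would run the comparison on a held-out sample of your own content that the custom engine did not see during training.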
Examples of custom MT engines include Kantan MT, Yandex, and Globalese, to name a few.
Conclusion?
Just as not all bikes are the same, not all MT engines are the same. If your goal is to take cycling a little more seriously, you will get the most out of a $3000 custom mountain bike; likewise, to get the most out of MT, you'll also need some customization.
The first thing you will need to do is find the best engine, then train it and then find a way to integrate it with your TMS.
Good luck with your customization!
@yolocalizo