
Welcome to the gpt-subtrans wiki!

How to use gpt-subtrans

The easiest and most flexible way to use gpt-subtrans is with the GUI, gui-subtrans. See the readme for installation and setup instructions.

How does it work?

gpt-subtrans takes subtitles in a source language, divides them up into batches (grouped into scenes) and politely asks an AI Language Model to translate them to the target language.

It then attempts to extract the translated lines from the response and map them to the source lines. Basic validation is applied to the translation to check for some of the errors AI tends to introduce, and if any are found a reply is sent noting the issues and asking it to try again. The AI is usually able to correct its mistakes when they are pointed out.

Each batch is treated as a new conversation, with some details condensed from preceding batches to try to help the AI understand the context of the lines it is translating. Additional context for the translation can be provided via project options, e.g. a short synopsis of the film and names of major characters.

Requests are constructed from the file instructions.txt (unless overridden), and have been tweaked to minimise the chances of the AI desyncing or going off the rails and inventing its own plot. The prompts use an xml-like syntax that helps the AI understand the request and provide a structured response.
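
As a rough illustration, a request built this way might look something like the following Python sketch. This is a hypothetical example - the actual tag names and layout are defined by instructions.txt and may differ.

    # Hypothetical sketch of wrapping a batch of source lines in xml-like tags.
    # The real tag names and layout are defined by instructions.txt and may differ.
    def build_request(lines: dict[int, str], context: str = "") -> str:
        body = "\n".join(f'<line number="{n}">{text}</line>' for n, text in lines.items())
        return f"<context>{context}</context>\n<subtitles>\n{body}\n</subtitles>"

    print(build_request({1: "Where are we going?", 2: "Somewhere safe."},
                        context="Two characters are fleeing the city at night."))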

How good is it?

Modern LLMs like ChatGPT are able to follow context across a scene, which marks a significant advance over earlier machine-translation tools. However, they have not actually watched the film, so the result is still likely to fall short of what a human translator would produce. The intention is not to replace translators but to open up accessibility to content that is unlikely to receive a professional translation. The goal is a translation that is good enough to follow the dialogue, but corrections will usually be needed if you intend to share the translation more widely.

Since the AI does not actually watch the video it is wholly reliant on the quality of the source subtitles, and will still make mistakes such as misunderstanding who is speaking a particular line or misinterpreting a remark that needs visual context to be understood.

I recommend passing the output through Subtitle Edit's "Fix Common Errors" function, which can automatically fix things like line breaks and short durations to make the subtitles more readable.

How long does it take?

The answer is definitely "it depends". Firstly, if you are using a free OpenAI trial account then requests are severely restricted (three per minute at the time of writing). Each batch of subtitles is one request, as are any retranslations or retries, so it can take multiple hours to translate a full movie.

Paid OpenAI accounts are much less restricted, though the rate limit remains fairly low for the first 48 hours after signing up (after that a limit still applies, but you are very unlikely to hit it with gpt-subtrans). Gemini runs at a similar speed, but you may want to set a rate limit to remain in the free tier (currently 15 requests per minute). Claude Haiku runs fast but can hit API rate limits quickly.

The translation time can vary considerably depending on the size of each request. Larger batches take disproportionately longer to process, but fewer requests are needed to cover the whole file, so there is a "sweet spot" - though there is no easy way to predict where it is. Larger batches are also more likely to encounter errors that force a retry, at least doubling the processing time for that batch. Experimenting with different batch sizes is encouraged to find settings that work well for your use case.

Whilst batches in a scene are always processed sequentially, multiple scenes can be processed in parallel (this can currently only be done in the GUI). This dramatically decreases the total processing time, often being bound only by the length of the longest scene. This should not be attempted if you are on a rate-limited trial plan.

How much does it cost?

Cost depends on:

  • Which translation provider is used
  • Which model is used as the translator
  • How many lines of subtitles there are
  • How many batch requests are needed
  • How many retries are required

Translating an average movie of up to 1,000 lines of dialogue with gpt-3.5-turbo should cost between $0.10 and $0.50. With GPT-4 the cost is approximately ten times higher. Gemini is free up to a certain request rate, which used to be a lot more generous but is currently 15 requests per minute, or 2 RPM for Gemini 1.5 Pro. Anthropic Claude models are priced per million tokens, and a million tokens should be enough to translate several movies - keep in mind that translation produces a lot of output tokens, which are more expensive than prompt tokens. The Haiku model has similar costs to gpt-3.5-turbo.

Is it safe?

All subtitles need to be sent to the translation provider's servers for processing, so don't use the app to translate anything you wouldn't want to be used in future training data for their models.

I can make no guarantees about the accuracy or suitability of the translations produced, and I can't guarantee that the app won't do something catastrophically stupid with your data so please make sure you have backups. This is not a commercial product and does not come with any sort of warranty or guarantees.

Which provider should I use?

The app was developed to interface with OpenAI GPT models, so that is the most tested provider by a long way and is a sensible default choice.

The Google Gemini API has expanded to most geographic regions but please note that there is no free usage tier in the EU. It tends to produce the best translations (possibly beaten by GPT-4o) and handles poor quality source material well (e.g. subtitles transcribed with Whisper), so it is a good option, though it quite often produces translations with errors that prevent parsing. With the lower free usage levels it is a less attractive option than it once was.

Anthropic Claude Haiku produces similar translations to Gemini 1.0 Pro but with more errors; it is perhaps a good option if Gemini is unavailable in your region and/or you don't want to use GPT.

Which model should I use?

For OpenAI the gpt-3.5-turbo model is probably the best choice at the time of writing, as it is fast and cheap and supports a large batch size (around 80-100 lines for average movie dialogue).

The gpt-instruct models are fine-tuned for following instructions rather than conversation. In theory this can produce better translations, but in practice the results are mixed - some lines come out better, some worse. They are better at using provided names consistently, but since they have a smaller token limit and a higher per-token cost, I don't recommend them as a default.

gpt-4 is better at understanding and summarizing content and has a better grasp of nuance and tone, so it may produce better translations for some material, and it is much better at following specific instructions. In most cases the difference is likely to be small though, and not always positive, so at ten times the cost it is probably better to save this option for selective retranslation when a cheaper, faster model has not done a good enough job.

The new gpt-4o model looks very promising, with excellent multilingual benchmark scores and lower costs than previous GPT-4 models, and is likely to be the new best choice for some language pairs.

For Google Gemini the Gemini 1.0 Pro model provides very good results at low cost, so it is not clear that there is a good reason to use Gemini 1.5 Pro with its much lower rate limits and higher costs. However, the new Gemini 1.5 Flash model seems very promising - it costs less than 1.0 Pro, handles larger batches easily, and seems good at following instructions. It's also very fast (set a rate limit if you want to stay in the free use tier).

For Anthropic Claude the Haiku model does a good job at a much lower price than the larger Sonnet and Opus models, so those should probably be reserved for selective retranslation of batches where Haiku failed.

Instructions

The instructions tell the AI what its task is and how it should approach the task. The instructions are sent as a system message. You've probably heard of prompt engineering - this is where it happens.

Several sets of instructions are provided, but the default instructions.txt should work for most use cases.

If you have specific needs or the source material has special characteristics, consider editing the instructions to guide the model. Instructions can be edited on a per-project basis, but if you are likely to use them with multiple source files consider saving them as instructions (something).txt in the same location as instructions.txt. They will then become available to select in the project or global settings.

Keep in mind that LLMs work best when they have an example of how they should respond.

A couple of examples of custom instructions are included, e.g. "improve quality", which can be used to fix spelling and grammatical errors without changing the language (it's important to modify the prompt as well in this case).

Prompt

This is the instruction given as the user at the start of each batch. The default prompt should work for most cases:

  • Please translate these subtitles[ for movie][ to language].

The [ for movie] and [ to language] tags are automatically replaced with the movie name and target language specified for the project. Setting names in square brackets are also replaced with their values, e.g. you may want to write the prompt in the target language to help focus the model:

  • Per favore, traduci questi sottotitoli per [movie_name] in [target_language]
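
A minimal sketch of how these tags might be filled in follows - the substitution logic shown here is illustrative, not the actual gpt-subtrans code:

    # Hypothetical sketch of prompt tag substitution.
    def fill_prompt(template: str, movie_name: str = "", target_language: str = "") -> str:
        # Optional phrase tags are dropped entirely when no value is set
        template = template.replace("[ for movie]", f" for {movie_name}" if movie_name else "")
        template = template.replace("[ to language]", f" to {target_language}" if target_language else "")
        # Bracketed setting names are replaced directly with their values
        template = template.replace("[movie_name]", movie_name)
        template = template.replace("[target_language]", target_language)
        return template

    print(fill_prompt("Please translate these subtitles[ for movie][ to language].",
                      movie_name="La Dolce Vita", target_language="English"))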

Description

This is a per-project setting where you can provide some context about the source material to help guide the translator. If you add a synopsis, keep it short (one or two sentences), otherwise the AI can start using it to make up its own plot rather than following the dialogue. A short description does encourage it to produce batch and scene summaries, which are useful for giving each batch context about what came before, and it can also help to mention, for example, that the film is a comedy.

Names

In theory, providing a list of names should help the translator use names consistently in the translation. In practice, it depends on the AI model whether this is used. gpt-3.5 is not really smart enough to understand when the names should be used, though this is one area where the turbo-instruct model shows noticeably better performance. gpt-4 is, predictably, better still and most likely to use the provided names as given.

Substitutions

If you have certain terms (like character names) that should have a specific translation you can add a substitution to replace specific text in either the input or the translation.

In practice you will probably need to have translated the material before you know that a substitution would have been helpful, in which case it is usually easier to do a search and replace on the output than to retranslate the whole thing. Substitutions are therefore most useful when translating multiple episodes of a series, where the same terms will recur.
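
If you do go the search-and-replace route, a few lines of Python are usually enough - a minimal sketch, not part of gpt-subtrans itself (the file names and terms are examples):

    # Minimal sketch: apply fixed substitutions to a translated .srt file.
    substitutions = {"Jon": "John", "Nightcity": "Night City"}

    with open("translated.srt", "r", encoding="utf-8") as f:
        text = f.read()

    for source, replacement in substitutions.items():
        text = text.replace(source, replacement)

    with open("translated.fixed.srt", "w", encoding="utf-8") as f:
        f.write(text)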

What do the other options do?

Scene threshold

Subtitles will be divided into scenes whenever a gap longer than this threshold is encountered, regardless of the number of lines. An appropriate threshold depends on the source material; as a rule of thumb, aim for a threshold that produces around 10 scenes for a feature film, as this maximises the context available to later batches.
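
Conceptually the split works something like this sketch, which assumes subtitle lines are simple (start, end, text) tuples - the real data structures and logic differ:

    from datetime import timedelta

    # Hypothetical sketch: start a new scene whenever the gap between
    # consecutive lines exceeds the scene threshold.
    def split_scenes(lines, threshold=timedelta(seconds=30)):
        scenes = []
        for line in lines:
            if scenes and line[0] - scenes[-1][-1][1] <= threshold:
                scenes[-1].append(line)   # small gap: same scene
            else:
                scenes.append([line])     # long gap (or first line): new scene
        return scenes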

Maximum batch size

It is important to set a maximum batch size that is suitable for the model. There isn't a hard answer for the maximum size as it depends on the content of the subtitles, but empirically speaking these batch sizes work well:

  • gpt-3.5-turbo: 60-120 lines (larger batches are much more likely to have desyncs)
  • gpt-4 : 60-70 lines (despite the large context size this model is limited to 4K output tokens)
  • gpt-4-turbo : currently unknown, please share your findings!
  • gpt-4o : 100+ lines (upper limit unknown)
  • gemini 1.0 pro : 50-60 lines
  • gemini 1.5 flash: 90+ lines (upper limit unknown)
  • claude haiku : currently unknown, please share your findings!

Keep in mind that if batches are too large for the model there is a higher chance of errors that require a retry, which is self-defeating.

Minimum batch size

This is less important as batches over the maximum size are divided at the longest gap to encourage coherence, but specifying a minimum size may prevent an excessive number of small batches if the source has sections of sparse dialogue. Around 8-10 lines is probably a good balance.
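
The way an oversized scene is divided can be pictured roughly like this recursive sketch, using the same (start, end, text) tuple assumption as above - again, not the actual implementation:

    # Hypothetical sketch: recursively split an oversized scene into batches
    # at the longest gap between lines, avoiding undersized chunks.
    def split_batches(lines, max_batch_size=100, min_batch_size=8):
        if len(lines) <= max_batch_size:
            return [lines]

        candidates = range(min_batch_size, len(lines) - min_batch_size + 1)
        split_at = max(candidates, key=lambda i: lines[i][0] - lines[i - 1][1])

        return (split_batches(lines[:split_at], max_batch_size, min_batch_size)
                + split_batches(lines[split_at:], max_batch_size, min_batch_size))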

Allow retranslations

LLMs can be unpredictable, and sometimes return a response that can't be processed or that fails to complete the task. Some simple validations are performed on the response, and if any of them fail the batch will be retried if this setting is enabled. Enabling it is strongly advised.

Enforce line parity

This validates that the number of subtitles in the translation matches the number of lines in the source text. Some models have a tendency to desync when a single sentence spans multiple lines, so this check will catch that and trigger a retranslation request. When the problem is pointed out the AI can usually fix it, so it is strongly advised to enable this check (note: this seems to be less effective than it used to be, new mitigation strategies are being worked on).

Max characters

Any translated line that exceeds this number of characters will cause the batch to be sent back for retranslation. There is no guarantee that the result will respect the limit though, and if the source material has a lot of long lines it could just result in a lot of pointless retries, so consider raising the value to something that suggests a total failure (e.g. 200 characters) if you see this happening a lot in the logs.

Max newlines

More than 1 or 2 newlines generally indicates that the response was not in the correct format and multiple lines have been grouped together, so the batch will be retried. If the source material has longer lines the translation may legitimately have them too, so raise the limit if necessary.
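
Taken together, the checks described above (line parity, max characters, max newlines) amount to something like this simplified sketch - the real validation code differs:

    # Hypothetical sketch of post-translation validation for a batch.
    # Any reported errors would trigger a retranslation request.
    def validate_batch(source_lines, translated_lines, max_characters=120, max_newlines=2):
        errors = []
        if len(translated_lines) != len(source_lines):
            errors.append(f"Expected {len(source_lines)} lines but got {len(translated_lines)}")
        for index, line in enumerate(translated_lines, start=1):
            if len(line) > max_characters:
                errors.append(f"Line {index} is longer than {max_characters} characters")
            if line.count("\n") > max_newlines:
                errors.append(f"Line {index} has more than {max_newlines} newlines")
        return errors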

Max threads

It is usually best to process a translation sequentially (the "Play" button on the toolbar), since this means a summary of the previous scene will be available to pass as context when translating the next scene. If you're in a hurry and are not too worried about translation accuracy you can process multiple scenes in parallel (the "Fast Forward" button). 3 or 4 parallel translations are usually enough; one or two long scenes will probably be the bottleneck, and a couple of extra threads can finish the rest before those are completed.
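
Under the hood the parallel mode behaves roughly like this sketch (translate_scene is a stand-in for the real translation logic, which lives elsewhere in the code):

    from concurrent.futures import ThreadPoolExecutor

    # Hypothetical sketch: scenes run in parallel, while the batches within
    # each scene are still processed sequentially so context carries forward.
    def translate_scene(scene):
        for batch in scene:
            ...  # one request per batch, sent in order

    def translate_all(scenes, max_threads=4):
        with ThreadPoolExecutor(max_workers=max_threads) as executor:
            # Total time is bound roughly by the longest scene
            list(executor.map(translate_scene, scenes))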

How can I help?

Report any issues here on GitHub and I'll do my best to investigate them. Even better, fix them yourself and create a pull request!

If there are any features or changes that would make the app easier to use or more effective the same logic applies.

Spread the word if you find gpt-subtrans helpful, or at least unhelpful in interesting ways.