Automatic Translation

Kototoro includes a reader-side OCR + translation workflow designed for practical Android use. This page explains what the feature does, how to set it up, and which choices matter first.

What This Feature Does

OCR happens inside the reading workflow
Translation can stay local-first
Model management is built into app settings
Results are rendered back into the page as an overlay

Start Here

Translation settings: Settings -> Reader -> Translation
Model management: Settings -> Reader -> Translation -> Manage models

End-to-End Flow

Detect text from page images.
Recognize text with the selected OCR engine and models.
Translate locally or through a configured API.
Render translated text back onto the page.
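
The four stages above can be read as a simple pipeline. The following is a minimal Python sketch of that shape only. The `Region` type and the `detect`, `recognize`, `translate`, and `render` callables are hypothetical placeholders, not Kototoro's internal API.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Rect = Tuple[int, int, int, int]  # left, top, right, bottom in page pixels


@dataclass
class Region:
    """One detected text region (hypothetical type, for illustration only)."""
    rect: Rect
    source_text: str = ""
    translated_text: str = ""


def run_pipeline(
    page,
    detect: Callable[[object], List[Rect]],
    recognize: Callable[[object, Rect], str],
    translate: Callable[[str], str],
    render: Callable[[object, List[Region]], object],
):
    """Run detect -> recognize -> translate -> render over one page image."""
    regions = [Region(rect=r) for r in detect(page)]
    for region in regions:
        region.source_text = recognize(page, region.rect)
        # Skip translation for regions where OCR returned no usable text.
        if region.source_text.strip():
            region.translated_text = translate(region.source_text)
    return render(page, regions)
```

Passing the stages in as callables keeps the sketch independent of any particular OCR engine or translation backend, which mirrors the way the settings let you swap them.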

First-Time Setup

Open translation settings.
Enable translation.
Choose the OCR engine you want to start with.
Open Manage models.
Download the OCR and translation-related models you need.
Set source language and target language.
Choose local-first or API mode.

Main User Choices

OCR Engine

The current settings UI is centered around the most practical user-facing option, ML Kit-based OCR:

Fast setup
Good starting point for lightweight use cases
Depends on device-side ML Kit support

Hybrid OCR is also available when you want a stronger model-backed workflow and are willing to download local assets.

Translation Mode

LOCAL_ONLY keeps the workflow device-local where possible.
LOCAL_FIRST prefers local processing and falls back to a configured API when needed.
API_ONLY is for users who want remote translation every time.
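
One way to read the three modes is as a fallback policy. The sketch below is an illustration under stated assumptions: the mode names come from this page, but `translate_with_mode`, `local_translate`, and `api_translate` are hypothetical, and "when needed" is assumed here to mean "the local path returned no result".

```python
from enum import Enum
from typing import Callable, Optional


class TranslationMode(Enum):
    LOCAL_ONLY = "LOCAL_ONLY"
    LOCAL_FIRST = "LOCAL_FIRST"
    API_ONLY = "API_ONLY"


def translate_with_mode(
    text: str,
    mode: TranslationMode,
    local_translate: Callable[[str], Optional[str]],
    api_translate: Optional[Callable[[str], Optional[str]]] = None,
) -> Optional[str]:
    """Return a translation, or None if no permitted backend produced one."""
    if mode is TranslationMode.API_ONLY:
        return api_translate(text) if api_translate else None

    # LOCAL_ONLY and LOCAL_FIRST both try the device-local path first.
    result = local_translate(text)
    if result is not None or mode is TranslationMode.LOCAL_ONLY:
        return result

    # LOCAL_FIRST: fall back only if an API has been configured.
    return api_translate(text) if api_translate else None
```

Under this reading, LOCAL_FIRST without a configured API behaves the same as LOCAL_ONLY, so it is a safe default while a remote provider is still being set up.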

Languages

Set the source and target languages before tuning advanced behavior. This avoids chasing OCR or layout issues that are actually language mismatches.

Configurable Settings

Depending on the current build and selected mode, users can configure:

OCR engine
Detection / recognition model selection
Source language
Target language
API endpoint
API key
API provider preset
API model name
Bubble grouping behavior
Overlay compactness
Debug logging

Current Render Behavior

Recent work on the reader overlay has focused on a specific class of problems:

OCR text is complete
translation text is complete
final overlay text is still clipped, offset, or visually inconsistent across nearby regions

The current render path now behaves more defensively than earlier builds:

single-region overlays are solved through a unified fit pass instead of ad-hoc text-size retries
horizontal and vertical text are drawn from measured layout bounds, which reduces the classic "background box looks right but text is shifted left/right" failure
local expansion is applied around the current text region when the first fit still overflows
multi-region bubbles can be rendered through a conservative segmented flow instead of blindly stretching one merged rectangle across unrelated OCR fragments
heavily overlapping sparse bubbles are filtered before drawing, which reduces obviously wrong large blank overlays caused by tiny text such as page numbers or decoration
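
To make the "unified fit pass" and "local expansion" ideas concrete, here is a minimal Python sketch. It is not the app's renderer: the text measurement is a crude fixed-width approximation (the real path draws from measured layout bounds), and every name and constant below is an assumption chosen for illustration.

```python
import math
from typing import Optional, Tuple

Rect = Tuple[float, float, float, float]  # left, top, right, bottom


def text_height(text: str, font_size: float, box_width: float) -> float:
    """Crude measurement: fixed-width glyphs with character-level wrapping."""
    char_w = font_size * 0.6   # assumed average glyph width
    line_h = font_size * 1.2   # assumed line height
    chars_per_line = max(1, int(box_width // char_w))
    return math.ceil(len(text) / chars_per_line) * line_h


def fit_font_size(text: str, rect: Rect,
                  min_size: float = 8.0, max_size: float = 40.0) -> Optional[float]:
    """Largest size in [min_size, max_size] whose wrapped text fits rect, else None."""
    width, height = rect[2] - rect[0], rect[3] - rect[1]
    if text_height(text, min_size, width) > height:
        return None
    lo, hi = min_size, max_size
    for _ in range(20):  # binary search: height only grows with font size
        mid = (lo + hi) / 2
        if text_height(text, mid, width) <= height:
            lo = mid
        else:
            hi = mid
    return lo


def expand_within_page(rect: Rect, page: Rect, step: float = 4.0) -> Rect:
    """Grow rect by step on every side, clamped to the page rectangle."""
    return (max(page[0], rect[0] - step), max(page[1], rect[1] - step),
            min(page[2], rect[2] + step), min(page[3], rect[3] + step))


def fit_with_expansion(text: str, rect: Rect, page: Rect, max_steps: int = 5):
    """Run the fit pass; if the minimum size still overflows, expand locally and retry."""
    size = fit_font_size(text, rect)
    steps = 0
    while size is None and steps < max_steps:
        grown = expand_within_page(rect, page)
        if grown == rect:  # already at the page bounds on every side
            break
        rect, steps = grown, steps + 1
        size = fit_font_size(text, rect)
    return rect, size
```

A single search over one "does it fit" test replaces repeated retries at guessed sizes. When `fit_with_expansion` returns a size of `None`, the region could not be made to fit even after expansion, which corresponds to the residual clipping cases described on this page.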

This does not mean every edge case is solved. The hardest failures now usually come from:

incorrect merge quality upstream
unstable source content rects from OCR grouping
detector regions that are valid for recognition but still awkward for translated overlay geometry
very dense mixed-layout bubbles where several small regions compete inside one large speech area

In practical terms, the main priority of the current branch is:

show the whole translation first,
then keep font size and box size visually reasonable.

That tradeoff is intentional.

Model Management

The model management screen is used to inspect and download built-in OCR and ONNX-related models.

Current categories include:

Recognition models
Detection models
Classic translation models
General LLM models
Bubble detector models

Downloaded status is shown per model, and missing models can be downloaded on demand from inside the app.

Practical Recommendations

Start simple: enable translation, choose an OCR engine, and download only the models you need.
Use LOCAL_FIRST if you want local behavior without losing a remote fallback path.
Test on a few representative pages before tuning compactness or bubble grouping.
If you depend on offline use, finish downloading models before you travel or rely on an unstable network.

Common Problems

The translation overlay does not appear

Confirm that translation is enabled.
Confirm source and target languages are set correctly.
Confirm the required local models were downloaded.

The model list is visible but downloads fail

Check network connectivity.
Check available storage.
Reopen the screen and retry the download.

API translation is not working

Confirm endpoint, API key, and model name.
If model discovery fails, enter the model name manually.
Prefer LOCAL_FIRST while validating a new remote provider.
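
The first check above can be done mechanically before sending any request. This is a small Python sketch of such a pre-flight check. `check_api_config` is a hypothetical helper, not part of the app, and it only catches obviously malformed values, not wrong credentials.

```python
from typing import List
from urllib.parse import urlparse


def check_api_config(endpoint: str, api_key: str, model_name: str) -> List[str]:
    """Return a list of human-readable problems; an empty list means nothing obvious is wrong."""
    problems = []
    parsed = urlparse(endpoint.strip())
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("endpoint must be a full http(s) URL")
    if not api_key.strip():
        problems.append("API key is empty")
    if not model_name.strip():
        problems.append("model name is empty; enter it manually if discovery failed")
    return problems
```

A common case this catches is an endpoint pasted without its scheme, such as `example.com/v1` instead of `https://example.com/v1`.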

The overlay background is correct but some translated text is still clipped

This is now more likely to be a layout-fit or source-rect problem than a missing translation problem.
Small residual clipping can still happen when local expansion hits page bounds or when the grouped source regions are already inconsistent.
If debug logging is enabled, compare source rect, content rect, prepared rect, and final content area before assuming OCR dropped text.
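
When comparing the four rectangles by hand, it helps to look for the stage where the usable area suddenly shrinks. The Python sketch below shows one way to do that. It assumes nothing about the app's actual log format: you copy the rectangle values in yourself, and the stage names and the 0.8 threshold are illustrative assumptions.

```python
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # left, top, right, bottom


def size(r: Rect) -> Tuple[float, float]:
    """Width and height of a rect, never negative."""
    return (max(0.0, r[2] - r[0]), max(0.0, r[3] - r[1]))


def compare_stages(stages: Dict[str, Rect], shrink_alert: float = 0.8) -> List[str]:
    """Flag consecutive stages where width or height falls below shrink_alert of the previous one."""
    notes = []
    names = list(stages)  # dicts keep insertion order, so this is pipeline order
    for prev, cur in zip(names, names[1:]):
        pw, ph = size(stages[prev])
        cw, ch = size(stages[cur])
        if pw > 0 and cw / pw < shrink_alert:
            notes.append(f"{prev} -> {cur}: width {pw:g} -> {cw:g}")
        if ph > 0 and ch / ph < shrink_alert:
            notes.append(f"{prev} -> {cur}: height {ph:g} -> {ch:g}")
    return notes
```

If the only flagged transition is late in the chain, the OCR source region was probably fine and the clipping was introduced during layout.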

The same speech bubble contains normal text in one region and oversized text in another

This usually points to fragment grouping quality rather than translation quality.
The current build reduces aggressive short-text auto-enlargement and keeps segmented rendering conservative, but a bad merge upstream can still produce uneven visual weight.

Advanced Notes

The project contains multiple OCR runtime paths internally, including model-backed pipelines for users who need stronger offline or hybrid workflows. The exact user-facing choices may evolve across builds, but model management remains the entry point for downloading the required local assets.
