[Python + Tesseract] How to choose tesseract_layout

February 17, 2022, 16:55

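Driving Tesseract from Python to do OCR

The whole series is collected in this magazine. Many of the posts are notes to myself, so they are not very reader-friendly, but my motto is "better than nothing."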

from PIL import Image
import pyocr
import pyocr.builders

def ocr(image_2d):
    # image_2d: the image as a NumPy array
    tools = pyocr.get_available_tools()
    tool = tools[0]  # use the first OCR tool found (Tesseract here)
    image = Image.fromarray(image_2d)
    builder = pyocr.builders.TextBuilder(tesseract_layout=1)
    text = tool.image_to_string(image, lang="jpn", builder=builder)
    return text

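That is roughly what the OCR flow looks like. This time I want to look at this line:

builder = pyocr.builders.TextBuilder(tesseract_layout=1)

and properly check what the different tesseract_layout values passed here mean.

I had been setting it more or less at random, and the recognition accuracy was terrible.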

Checking which options are available and what they do

Reference: https://kazu-oji.com/python-pyocr_base/#toc8

$ tesseract --help-extra

prints the full list of options.

In that output, the "Page segmentation modes:" section lists the values that tesseract_layout accepts.

 0    Orientation and script detection (OSD) only.
 1    Automatic page segmentation with OSD.
 2    Automatic page segmentation, but no OSD, or OCR. (not implemented)
 3    Fully automatic page segmentation, but no OSD. (Default)
 4    Assume a single column of text of variable sizes.
 5    Assume a single uniform block of vertically aligned text.
 6    Assume a single uniform block of text.
 7    Treat the image as a single text line.
 8    Treat the image as a single word.
 9    Treat the image as a single word in a circle.
10    Treat the image as a single character.
11    Sparse text. Find as much text as possible in no particular order.
12    Sparse text with OSD.
13    Raw line. Treat the image as a single text line,
      bypassing hacks that are Tesseract-specific.
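
To stop these numbers from becoming magic constants, one option is to give them names. The sketch below is my own illustration and not part of pyocr: `PageSegMode` and `tesseract_command` are hypothetical names, `receipt.png` is a made-up file, and the command line assumes a Tesseract 4 or later CLI, where the flag is spelled `--psm`.

```python
from enum import IntEnum


class PageSegMode(IntEnum):
    """Names for the numbers printed by `tesseract --help-extra` (hypothetical helper)."""
    OSD_ONLY = 0
    AUTO_OSD = 1
    AUTO_ONLY = 2
    AUTO = 3
    SINGLE_COLUMN = 4
    SINGLE_BLOCK_VERT_TEXT = 5
    SINGLE_BLOCK = 6
    SINGLE_LINE = 7
    SINGLE_WORD = 8
    CIRCLE_WORD = 9
    SINGLE_CHAR = 10
    SPARSE_TEXT = 11
    SPARSE_TEXT_OSD = 12
    RAW_LINE = 13


def tesseract_command(image_path, mode, lang="jpn"):
    """Build the equivalent `tesseract` CLI call as an argument list."""
    mode = PageSegMode(mode)  # raises ValueError for numbers outside 0-13
    return ["tesseract", image_path, "stdout", "--psm", str(int(mode)), "-l", lang]


# "receipt.png" is a placeholder file name.
print(tesseract_command("receipt.png", PageSegMode.SINGLE_LINE))
```

With pyocr, the same value would be passed as `tesseract_layout=int(PageSegMode.SINGLE_LINE)`.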

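............help me, DeepL-sensei!

0 Orientation and script detection (OSD) only.
1 Automatic page segmentation and OSD.
2 Automatic page segmentation, but without OSD or OCR. (Not implemented)
3 Fully automatic page segmentation, but without OSD. (Default)
4 Assumes text of varying sizes arranged in a single column.
5 Assumes a uniform block of vertically written text.
6 Assumes a uniform block of text.
7 Treats the image as a single line of text.
8 Treats the image as a single word.
9 Treats the image as a single word inside a circle.
10 Treats the image as a single character.
11 Sparse text. Finds as much text as possible, in no particular order.
12 Makes the text sparse with OSD.
13 Raw line. Treats the image as a single line of text.
Tesseract-specific hacks can be avoided.

(DeepL's Japanese output, left as it came out and rendered literally here.)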

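"OSD" stands for "Orientation and script detection." "Orientation" is the rotation of the page (0, 90, 180, or 270 degrees) and "script" is the writing system (Latin, Japanese, and so on), so OSD is essentially a preprocessing step that works out which way up the text is and what kind of characters it contains before recognition starts.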
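
pyocr also exposes OSD on its own through `detect_orientation`, which makes it easier to see what that step does. This is a minimal sketch, assuming Tesseract's `osd` traineddata is installed. `page_rotation` and `main` are hypothetical helper names, and the tool is passed in as an argument so the helper can be tried without Tesseract. The call may raise `pyocr.PyocrException` when detection fails; the sketch does not handle that.

```python
def page_rotation(tool, image, lang=None):
    """Return the rotation angle OSD reports for `image`, or None if unsupported.

    `tool` is expected to be a pyocr tool, e.g. pyocr.get_available_tools()[0].
    """
    if not tool.can_detect_orientation():
        return None
    result = tool.detect_orientation(image, lang=lang)
    # pyocr returns a dict with "angle" and "confidence" entries.
    return result["angle"]


def main(path):
    # Imported here so page_rotation() itself has no hard dependency on pyocr.
    from PIL import Image
    import pyocr

    tools = pyocr.get_available_tools()
    if not tools:
        raise RuntimeError("no OCR tool found; is Tesseract installed?")
    print("rotation:", page_rotation(tools[0], Image.open(path)))
```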

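Also, "sparse" in mode 11 does not mean the text is faint in color. It means the text is scattered thinly around the image rather than laid out in blocks or paragraphs.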

And in the end, whichever one I try, fixing one thing breaks another, and I am tearing my hair out............ hmm, hmm.
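
Since no single mode wins everywhere, it may help to run the same image through several modes and read the results side by side. This is a rough sketch under my own assumptions: `compare_layouts` and `pyocr_recognizer` are hypothetical helpers, and the default set of modes is only a starting point (modes 0 and 2 are left out because they do not return recognized text).

```python
def compare_layouts(recognize, layouts=(1, 3, 4, 6, 7, 11, 12)):
    """Call `recognize(layout)` for each layout and collect the text or the error."""
    results = {}
    for layout in layouts:
        try:
            results[layout] = recognize(layout)
        except Exception as exc:  # keep going so one failing mode does not hide the rest
            results[layout] = f"<failed: {exc}>"
    return results


def pyocr_recognizer(image, lang="jpn"):
    """Return a function that OCRs `image` with a given tesseract_layout."""
    # Imported here so compare_layouts() can be used without pyocr installed.
    import pyocr
    import pyocr.builders

    tools = pyocr.get_available_tools()
    if not tools:
        raise RuntimeError("no OCR tool found; is Tesseract installed?")
    tool = tools[0]

    def recognize(layout):
        builder = pyocr.builders.TextBuilder(tesseract_layout=layout)
        return tool.image_to_string(image, lang=lang, builder=builder)

    return recognize


# Usage, given a PIL image named `image`:
# for layout, text in compare_layouts(pyocr_recognizer(image)).items():
#     print(layout, repr(text))
```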
