Toddling AI [The "Crushing Defeat Before We Even Got to AI Translation" Episode]

May 8, 2024, 02:28

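This trending post showed up on my timeline:

Translation is the profession that will die out before animators and manga artists do.

There will no longer be any environment for training people who can proofread the mistakes in AI translation, so once the people who can do it today retire, it's over. 😑 https://t.co/HoEgA1YLWW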
— Kataho@フランクフルト🌭 (@sakaikataho) May 6, 2024

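Oh, I can do it all right!

So, before attempting an English translation, I first tried having the AI read the content. The source material is this:

(Max difficulty right off the bat)

First, let's have it read the title of chapter 1 on the title page. (I can't repost the image, so please look at it on the official site.)

Heh heh heh, I happen to have a secret trick up my sleeve: Japanese OCR.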

# Since we previously encountered an error, I will reattempt the OCR, ensuring to reference the correct language files
# First, we'll need to update the configuration to use the Japanese training data provided.

# Update Tesseract to use the custom Japanese training data path
custom_config = r'--tessdata-dir /mnt/data/'

# Reattempt the OCR with the specified custom configuration
text_with_custom_data = pytesseract.image_to_string(img, lang='jpn', config=custom_config)
text_with_custom_data

Result
''

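In the AI's defense, the OCR attempt itself does appear to have run. On something like an image taken from an ordinary horizontally written PDF, it reads with reasonable accuracy. This time, though, it returned an empty string.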
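One possible reason for the empty result is text orientation. This is only a guess: the post does not say whether the title is written vertically, but manga lettering often is. Tesseract ships a separate `jpn_vert` model and a page segmentation mode for vertical blocks (`--psm 5`), so a retry could look roughly like the sketch below. The helper names `build_ocr_args` and `ocr_japanese` are made up for illustration, and the sketch assumes `jpn_vert.traineddata` is present in the tessdata directory.

```python
from typing import Optional


def build_ocr_args(vertical: bool, tessdata_dir: Optional[str] = None) -> dict:
    """Build keyword arguments for pytesseract.image_to_string (hypothetical helper)."""
    # jpn_vert is Tesseract's model for vertically written Japanese.
    lang = "jpn_vert" if vertical else "jpn"
    # psm 5: single uniform block of vertically aligned text.
    # psm 6: single uniform block of (horizontal) text.
    psm = 5 if vertical else 6
    config = f"--psm {psm}"
    if tessdata_dir:
        config = f"--tessdata-dir {tessdata_dir} {config}"
    return {"lang": lang, "config": config}


def ocr_japanese(image_path: str, vertical: bool = True,
                 tessdata_dir: Optional[str] = None) -> str:
    """Run OCR on one image; needs Tesseract, pytesseract and Pillow installed."""
    # Imported here so the argument builder stays usable without the OCR stack.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as img:
        gray = img.convert("L")  # grayscale is a common, cheap preprocessing step
        return pytesseract.image_to_string(gray, **build_ocr_args(vertical, tessdata_dir))
```

Even with these settings, stylized title lettering on a title page may still come back empty; the point is only that the plain `jpn` model with default settings is tuned for ordinary horizontal body text.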

Next, then, let's have it tell us what is drawn in the picture.