How to Turn a Scanned PDF into a Text-Searchable PDF in Ubuntu – Install OCRmyPDF on Ubuntu

Post last modified: August 19, 2024
Post category: Linux
Post author: Manikandan

OCRmyPDF is a free and open-source OCR (optical character recognition) application for Linux. It is released under the GNU General Public License v3.0 and written in Python. It adds an OCR text layer to scanned PDF files, so you can search the text of the PDF and copy and paste it. In other words, OCRmyPDF converts a scanned PDF into a text-searchable PDF. Among its features, it keeps the exact resolution of the original images in the output and validates the input and output PDF files. It uses the Tesseract OCR engine to recognize text and supports more than 100 languages.

Install OCRmyPDF on Ubuntu

You can install OCRmyPDF on Ubuntu with the commands below. Open your terminal application (Ctrl+Alt+T) and run:

sudo apt update

sudo apt install ocrmypdf

Enter your Ubuntu user password if needed.
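To confirm that the installation worked, you can check that both `ocrmypdf` and the underlying `tesseract` engine are on your PATH. The script below is a minimal sketch of such a check; `check_cmd` is a hypothetical helper name, and the package names in its hint (`ocrmypdf`, `tesseract-ocr`) are the Ubuntu package names.

```shell
#!/usr/bin/env bash
# check_cmd NAME PACKAGE (hypothetical helper): report whether NAME is on PATH.
# Returns 0 if found; otherwise prints the apt package to install and returns 1.
check_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
        return 0
    fi
    echo "missing: $1 (try: sudo apt install $2)"
    return 1
}

# Print the versions only when the commands are actually installed.
# Older tesseract releases print their version on stderr, hence 2>&1.
check_cmd ocrmypdf ocrmypdf && ocrmypdf --version
check_cmd tesseract tesseract-ocr && tesseract --version 2>&1
```

If either line reports "missing", re-run the install commands above before continuing.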

Install additional language packs:

In the terminal, run this command to list all available Tesseract language packs:

apt-cache search tesseract-ocr

For example, to install the Tamil language pack from that list, run:

sudo apt install tesseract-ocr-tam
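Once a language pack is installed, you tell OCRmyPDF which language(s) a document uses with its `-l` option (long form `--language`); several languages can be combined with `+`, for example `-l eng+tam` for pages that mix English and Tamil. The sketch below wraps this in a hypothetical helper, `ocr_with_langs`, which first checks each requested code against the output of `tesseract --list-langs`, so a missing pack is reported before OCR starts. The `tesseract-ocr-<code>` package-name pattern in its hint is an assumption based on packs like `tesseract-ocr-tam`.

```shell
#!/usr/bin/env bash
# ocr_with_langs LANGS INPUT OUTPUT (hypothetical helper)
# LANGS is one Tesseract code ("tam") or several joined by '+' ("eng+tam").
ocr_with_langs() {
    local langs="$1" input="$2" output="$3" lang installed
    local -a wanted
    # `tesseract --list-langs` prints a header line, then one code per line.
    # Some versions write it to stderr, so both streams are captured.
    installed="$(tesseract --list-langs 2>&1)" || true
    # Split "eng+tam" on '+' and require every code to be installed.
    IFS='+' read -r -a wanted <<< "$langs"
    for lang in "${wanted[@]}"; do
        if ! grep -qx -- "$lang" <<< "$installed"; then
            # Package names use '-' where codes use '_' (e.g. chi_sim -> chi-sim).
            echo "language '$lang' not installed (try: sudo apt install tesseract-ocr-${lang//_/-})" >&2
            return 1
        fi
    done
    ocrmypdf -l "$langs" "$input" "$output"
}

# Example: OCR a scanned document that mixes English and Tamil text.
# ocr_with_langs eng+tam scanned.pdf scanned_ocr.pdf
```

Checking the installed languages first is optional; OCRmyPDF itself also stops with an error when a requested language is unavailable, but the check gives a more direct hint about which package to install.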

Convert Scanned PDF to Text Searchable PDF:

Syntax:

ocrmypdf input.pdf output.pdf

Replace input.pdf with the name of your scanned file and output.pdf with the name you want for the new file.
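If you OCR files often, the same syntax can be wrapped in a small shell function. The sketch below is one way to do it, under these assumptions: the helper names `ocr_output_name` and `ocr_pdf` are hypothetical, the default output name `<name>_ocr.pdf` is only a convention, and it passes OCRmyPDF's `--skip-text` option, which leaves pages that already contain text as they are instead of stopping with an error (by default OCRmyPDF refuses to process a page that already has text).

```shell
#!/usr/bin/env bash
# ocr_output_name INPUT (hypothetical helper): "name.pdf" -> "name_ocr.pdf".
# Note: only a lowercase ".pdf" suffix is stripped.
ocr_output_name() {
    local input="$1"
    echo "${input%.pdf}_ocr.pdf"
}

# ocr_pdf INPUT [OUTPUT] (hypothetical wrapper around ocrmypdf).
# If OUTPUT is omitted, it defaults to ocr_output_name INPUT.
ocr_pdf() {
    local input="$1"
    local output="${2:-$(ocr_output_name "$input")}"
    if [ ! -f "$input" ]; then
        echo "no such file: $input" >&2
        return 1
    fi
    # --skip-text: keep pages that already have a text layer unchanged.
    ocrmypdf --skip-text "$input" "$output"
}

# Example: creates scanned_ocr.pdf next to scanned.pdf.
# ocr_pdf scanned.pdf
```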


Example: First change to the folder that contains your scanned PDF. If the file is in your Downloads folder, run this in the terminal:

cd Downloads

Then run

ocrmypdf scanned.pdf newgeneratedfilename.pdf

Here, “scanned.pdf” is the name of my PDF file in the Downloads folder. When the command finishes, a new file named “newgeneratedfilename.pdf” is created in the same Downloads folder. Open the new file: its text is now searchable, and you can copy and paste it.
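If your Downloads folder holds many scanned files, the single command above can be repeated in a loop. This is a minimal sketch, assuming you want each `name.pdf` saved as `name_ocr.pdf` next to it; `ocr_folder` and the `OCR_CMD` variable are hypothetical names, and setting `OCR_CMD=echo` simply prints the commands instead of running them, as a dry run.

```shell
#!/usr/bin/env bash
# ocr_folder DIR (hypothetical helper): OCR every *.pdf in DIR into *_ocr.pdf.
# Set OCR_CMD=echo to print the commands instead of running ocrmypdf.
ocr_folder() {
    local dir="${1:-.}" cmd="${OCR_CMD:-ocrmypdf}" f out failed=0
    for f in "$dir"/*.pdf; do
        [ -e "$f" ] || continue                   # no PDFs: glob stays unexpanded
        case "$f" in *_ocr.pdf) continue ;; esac  # skip earlier outputs
        out="${f%.pdf}_ocr.pdf"
        [ -e "$out" ] && { echo "exists, skipping: $out" >&2; continue; }
        echo "OCR: $f -> $out" >&2
        "$cmd" --skip-text "$f" "$out" || failed=$((failed + 1))
    done
    # Succeed only if every file was processed without error.
    [ "$failed" -eq 0 ]
}

# Usage: preview first, then run for real.
# OCR_CMD=echo ocr_folder ~/Downloads
# ocr_folder ~/Downloads
```

To spot-check a result, `pdftotext newgeneratedfilename.pdf - | head` (pdftotext comes from the poppler-utils package) should print the recognized text; empty output suggests the page has no text layer.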