OCR PDF in C# | .NET OCR Library

Overview

The Syncfusion .NET optical character recognition (OCR) library is used to extract text from scanned PDFs and images. With just a few lines of C# code, a scanned PDF document containing a raster image is converted into a searchable and selectable PDF document. You can save the OCR result as text, structured data, or searchable PDF documents. The .NET OCR library uses the powerful Tesseract OCR engine.

The OCR feature works seamlessly across platforms, including Windows, macOS, and Linux, in any .NET-based application, such as ASP.NET Core, ASP.NET MVC, Blazor, WinForms, WPF, and WinUI.

Convert scanned PDF to a searchable PDF in C#

The following code converts a scanned PDF into a searchable PDF.

c#
using System.IO;
using Syncfusion.OCRProcessor;
using Syncfusion.Pdf.Parsing;

// Initialize the OCR processor
using (OCRProcessor processor = new OCRProcessor())
{
    // Load the existing PDF document
    using (FileStream stream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read))
    {
        PdfLoadedDocument pdfLoadedDocument = new PdfLoadedDocument(stream);
        // Set OCR language to process
        processor.Settings.Language = Languages.English;
        // Process OCR by providing the PDF document
        processor.PerformOCR(pdfLoadedDocument);
        // Save the OCRed document to a file stream
        using (FileStream outputFileStream = new FileStream("Output.pdf", FileMode.Create, FileAccess.ReadWrite))
        {
            pdfLoadedDocument.Save(outputFileStream);
        }
        // Close the document
        pdfLoadedDocument.Close(true);
    }
}

Key features of the OCR library

The OCR processor library provides the following features for text extraction, language recognition, and document processing.

Create a searchable PDF

Perform OCR on an entire scanned PDF document and convert it into a searchable PDF document.

Extract text from an image

Extract text from a single scanned image or multi-page TIFF images.
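As a rough sketch of this feature, the snippet below reads a scanned image and writes the recognized text to a file. It assumes the `PerformOCR` overload that takes an image stream and returns the recognized text as a string, as in recent versions of the library; older versions may also require a Tesseract data path argument. The file names `Input.jpg` and `OCRText.txt` are placeholders, not part of the original sample.

```csharp
using System.IO;
using Syncfusion.OCRProcessor;

// Initialize the OCR processor
using (OCRProcessor processor = new OCRProcessor())
{
    // Open the scanned image (placeholder file name)
    using (FileStream imageStream = new FileStream("Input.jpg", FileMode.Open, FileAccess.Read))
    {
        // Set OCR language to process
        processor.Settings.Language = Languages.English;
        // Assumed overload: accepts an image stream and returns the recognized text
        string ocrText = processor.PerformOCR(imageStream);
        // Write the recognized text to a file (placeholder file name)
        File.WriteAllText("OCRText.txt", ocrText);
    }
}
```

The returned string can equally be passed to a search index or parser instead of being written to disk.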

Perform OCR on a rotated page

Extract the text from a scanned, rotated page of a PDF document and convert it to a searchable PDF document.

Convert image to searchable PDF/A

Make images searchable and selectable by converting them to PDF or PDF/A documents using OCR.

Zonal text extraction

Extract data from PDFs and images by restricting OCR to a particular region in a PDF or image.
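A minimal sketch of restricting OCR to a region might look like the following. It assumes the `PageRegion` type and the `Settings.Regions` property of the OCR processor, and that `RectangleF` comes from `Syncfusion.Drawing` on cross-platform .NET (on .NET Framework it would be `System.Drawing`). The rectangle coordinates and file names are purely illustrative.

```csharp
using System.Collections.Generic;
using System.IO;
using Syncfusion.Drawing; // assumption: RectangleF namespace on cross-platform .NET
using Syncfusion.OCRProcessor;
using Syncfusion.Pdf.Parsing;

using (OCRProcessor processor = new OCRProcessor())
using (FileStream stream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read))
{
    PdfLoadedDocument pdfLoadedDocument = new PdfLoadedDocument(stream);
    processor.Settings.Language = Languages.English;

    // Describe the zone to recognize: one rectangle on the first page (illustrative values)
    PageRegion region = new PageRegion();
    region.PageIndex = 0;
    region.PageRegions = new RectangleF[] { new RectangleF(0, 100, 950, 150) };

    // Restrict OCR to the listed regions only
    processor.Settings.Regions = new List<PageRegion> { region };

    // Process OCR on the document; only the configured region is recognized
    processor.PerformOCR(pdfLoadedDocument);

    // Save the result (placeholder file name)
    using (FileStream output = new FileStream("Output.pdf", FileMode.Create, FileAccess.ReadWrite))
    {
        pdfLoadedDocument.Save(output);
    }
    pdfLoadedDocument.Close(true);
}
```

Limiting recognition to a known zone, such as an invoice number field, tends to reduce processing time and avoids picking up unrelated text from the rest of the page.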

Post-processing

After performing OCR, you can programmatically highlight, underline, and strike through the text of the resulting PDF document. You can also redact, edit, and digitally sign the PDF document.

Explore references for the .NET OCR library

Discover valuable resources from our blog and knowledge base on using the OCR library.

Blog: Easiest Way to OCR Process PDF Documents in ASP.NET Core

Blog: Optical Character Recognition (OCR) Made Easy with the .NET PDF Library in C#

Blog: OCR in .NET MAUI: Building an Image Processing Application

Documentation: Perform OCR in Linux

Documentation: Get image rotation from OCR processor

Documentation: Perform OCR with different OCR engine mode

Syncfusion .NET PDF Library Resources

Explore these resources for comprehensive guides, knowledge base articles, insightful blogs, and ebooks.

Learning

Product Updates

Technical Support

Our comprehensive competitor comparison of PDF frameworks will guide you to the perfect choice.

20+ Conversions support

50+ interactive demos

1.7M+ downloads

Explore Complete PDF Comparison

Frequently Asked Questions

What is OCR?

OCR stands for optical character recognition. It is a technology used to convert different types of documents, such as scanned paper documents, PDF files, or images captured by a digital camera, into editable and searchable data.

What types of documents can OCR process?

OCR can process various types of documents, including scanned paper documents, PDF files, images, screenshots, and photographs containing printed or handwritten text.

How does OCR work for PDF and images?

The OCR library analyzes the shapes, patterns, and structures within an image or PDF file to identify and extract text. It then converts this extracted text into a format that can be edited, searched, and manipulated.

What are the benefits of using OCR on PDF documents and images?

OCR lets you convert non-editable documents, such as scanned images or PDF files, into editable and searchable formats. This allows for easier document management, text extraction, content indexing, and accessibility improvements.

Our Customers Love Us

Having an excellent set of tools and a great support team, Syncfusion reduces customers’ development time.
Here are some of their experiences.

What a Deal….Syncfusion Essential Studio.

With very few lines of code I can generate Excel, Word or PDFs. I can customize the look and feel of the CRUD presentation easily. I am looking forward to trying out the other control suites that Syncfusion provides.

Ibrahim M,

Contractor and CEO

Syncfusion Essential Studio Review

We use Syncfusion Essential Studio for RAD purpose. It has been used for Blazor and .NET Core Razor UI implementations. It really saved time on Web UI development. Additionally PDF & EXCEL components saved time in back-end development as well.

Jaish Mathews,

Chief Applications Architect

Rated by users across the globe

4.5/5

(500+ Reviews)

Want to create, view, and edit PDF files in C# or VB.NET?

Start a free 30-day evaluation today!

DOWNLOAD FREE TRIAL

No credit card required.

Awards

Greatness—it’s one thing to say you have it, but it means more when others recognize it. Syncfusion is proud to hold the following industry awards.

Syncfusion is trusted by the world’s leading companies
