Overview

The Syncfusion .NET optical character recognition (OCR) library extracts text from scanned PDFs and images. With just a few lines of C# code, you can convert a scanned PDF document containing raster images into a searchable, selectable PDF document. You can save the OCR result as text, structured data, or a searchable PDF document. The library is built on the Tesseract OCR engine.

The OCR feature works across Windows, macOS, and Linux in .NET applications such as ASP.NET Core, ASP.NET MVC, Blazor, WinForms, WPF, and WinUI.

Convert a scanned PDF to a searchable PDF in C#

The following code converts a scanned PDF into a searchable PDF.

using System.IO;
using Syncfusion.OCRProcessor;
using Syncfusion.Pdf.Parsing;

// Initialize the OCR processor.
using (OCRProcessor processor = new OCRProcessor())
{
    // Set the OCR language.
    processor.Settings.Language = Languages.English;
    // Load the existing scanned PDF document.
    using (FileStream stream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read))
    {
        PdfLoadedDocument pdfLoadedDocument = new PdfLoadedDocument(stream);
        // Perform OCR on the loaded PDF document.
        processor.PerformOCR(pdfLoadedDocument);
        // Save the OCR-processed document to a file stream.
        using (FileStream outputFileStream = new FileStream("Output.pdf", FileMode.Create, FileAccess.ReadWrite))
        {
            pdfLoadedDocument.Save(outputFileStream);
        }
        // Close the document and release its resources.
        pdfLoadedDocument.Close(true);
    }
}

Key features of the OCR library

The OCR processor library provides the following features for text extraction, language recognition, and document processing.

Create a searchable PDF

Perform OCR on an entire scanned PDF document and convert it into a searchable PDF document.

Extract text from an image

Extract text from a single scanned image or from a multi-page TIFF image.

Perform OCR on a rotated page

Extract the text from a scanned, rotated page of a PDF document and convert it to a searchable PDF document.
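How the library detects page rotation is not described here. As a rough, hypothetical sketch of one common approach, the code below tries all four right-angle orientations of a grayscale page image and keeps the one that a caller-supplied scoring function rates highest; in practice the score might be an OCR engine's average word confidence. The `OrientationPicker` class and its methods are illustrative assumptions, not part of the Syncfusion API.

```csharp
using System;

// Hypothetical helper (not a Syncfusion type): chooses the clockwise rotation
// (0, 90, 180 or 270 degrees) that a caller-supplied scorer rates highest.
public static class OrientationPicker
{
    // Rotates a row-major grayscale image 90 degrees clockwise.
    // A rows x cols image becomes a cols x rows image.
    public static byte[,] Rotate90(byte[,] src)
    {
        int rows = src.GetLength(0), cols = src.GetLength(1);
        var dst = new byte[cols, rows];
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
                dst[c, rows - 1 - r] = src[r, c];
        return dst;
    }

    // Returns the clockwise angle to apply to the input so that the score is highest.
    // The scorer is an assumption, e.g. mean OCR confidence for that orientation.
    public static int BestRotation(byte[,] image, Func<byte[,], double> score)
    {
        int bestAngle = 0;
        double bestScore = double.NegativeInfinity;
        byte[,] current = image;
        for (int angle = 0; angle < 360; angle += 90)
        {
            double s = score(current);
            if (s > bestScore)
            {
                bestScore = s;
                bestAngle = angle;
            }
            current = Rotate90(current);
        }
        return bestAngle;
    }
}
```

Trying every orientation costs four recognition passes per page, so a real implementation would usually prefer a dedicated orientation detector and fall back to this brute-force check only when needed.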

Convert image to searchable PDF/A

Make images searchable and selectable by converting them to PDF or PDF/A documents using OCR.

Zonal text extraction

Extract data from a specific region of a PDF page or image by restricting OCR to that region.
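The library restricts OCR itself to the chosen region. A related but different technique, sketched here under stated assumptions, is to run OCR on the whole page and then keep only the words whose bounding boxes fall inside a zone. The `OcrWord` and `Zone` records and the `ZonalExtractor` class are hypothetical stand-ins for word-level OCR output with coordinates; they are not Syncfusion types.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical word-level OCR result: recognized text plus its bounding box
// in page coordinates (origin at the top-left, y grows downward).
public record OcrWord(string Text, double X, double Y, double Width, double Height);

// Hypothetical rectangular zone in the same coordinate system.
public record Zone(double X, double Y, double Width, double Height)
{
    // True when the word's box lies entirely inside this zone.
    public bool Contains(OcrWord w) =>
        w.X >= X && w.Y >= Y &&
        w.X + w.Width <= X + Width &&
        w.Y + w.Height <= Y + Height;
}

public static class ZonalExtractor
{
    // Keeps only the words inside the zone and joins them in a simple reading
    // order: by top edge, then by left edge.
    public static string Extract(IEnumerable<OcrWord> words, Zone zone) =>
        string.Join(" ", words
            .Where(zone.Contains)
            .OrderBy(w => w.Y)
            .ThenBy(w => w.X)
            .Select(w => w.Text));
}
```

Sorting by the exact top edge is a simplification: words on the same text line whose boxes start at slightly different heights would need to be grouped into lines first.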

Post-processing

After performing OCR, you can programmatically highlight, underline, and strike through the text of the resulting PDF document. You can also redact, edit, and digitally sign the PDF document.

Explore references for the .NET OCR library

Find articles from our blog and documentation on using the OCR library.

Blog

Easiest Way to OCR Process PDF Documents in ASP.NET Core

Blog

Optical Character Recognition (OCR) Made Easy with the .NET PDF Library in C#

Blog

OCR in .NET MAUI: Building an Image Processing Application

Documentation

Perform OCR in Linux

Documentation

Get image rotation from OCR processor

Documentation

Perform OCR with different OCR engine mode

Frequently Asked Questions

OCR stands for optical character recognition. It is a technology used to convert different types of documents, such as scanned paper documents, PDF files, or images captured by a digital camera, into editable and searchable data.

OCR can process various types of documents, including scanned paper documents, PDF files, images, screenshots, and photographs containing printed or handwritten text.

The OCR library analyzes the shapes, patterns, and structures within an image or PDF file to identify and extract text. It then converts this extracted text into a format that can be edited, searched, and manipulated.
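The answer above describes recognition only at a high level. As a deliberately simplified illustration of comparing shapes against known patterns (not how Tesseract works; its modern recognizer is based on an LSTM neural network), the sketch below classifies a binarized glyph by comparing it pixel by pixel with stored character templates and choosing the template with the smallest Hamming distance. The `TemplateMatcher` class and the 3x3 glyphs are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical toy classifier: template matching on binary glyphs.
public static class TemplateMatcher
{
    // Builds a binary glyph from text rows, where '#' marks an ink pixel.
    public static bool[,] FromRows(params string[] rows)
    {
        var glyph = new bool[rows.Length, rows[0].Length];
        for (int r = 0; r < rows.Length; r++)
            for (int c = 0; c < rows[r].Length; c++)
                glyph[r, c] = rows[r][c] == '#';
        return glyph;
    }

    // Counts the pixels on which two equally sized glyphs disagree.
    public static int HammingDistance(bool[,] a, bool[,] b)
    {
        if (a.GetLength(0) != b.GetLength(0) || a.GetLength(1) != b.GetLength(1))
            throw new ArgumentException("Glyphs must have the same size.");
        int diff = 0;
        for (int r = 0; r < a.GetLength(0); r++)
            for (int c = 0; c < a.GetLength(1); c++)
                if (a[r, c] != b[r, c]) diff++;
        return diff;
    }

    // Returns the character whose template is closest to the glyph.
    public static char Classify(bool[,] glyph, IReadOnlyDictionary<char, bool[,]> templates)
    {
        char best = '?';
        int bestDistance = int.MaxValue;
        foreach (var pair in templates)
        {
            int d = HammingDistance(glyph, pair.Value);
            if (d < bestDistance)
            {
                bestDistance = d;
                best = pair.Key;
            }
        }
        return best;
    }
}
```

For example, a noisy "I" drawn as `.#.`, `.#.`, `##.` differs from an "I" template (`.#.` three times) in one pixel and from an "L" template (`#..`, `#..`, `###`) in five, so it is classified as "I".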

OCR lets you convert non-editable documents, such as scanned images or PDF files, into editable and searchable formats. This allows for easier document management, text extraction, content indexing, and accessibility improvements.
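To illustrate the content-indexing use case, here is a minimal sketch that builds an inverted index from per-page OCR text, mapping each word to the pages on which it appears. It assumes you have already obtained the recognized text of each page as a string; the `PageIndex` class is hypothetical and uses only the .NET base class library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical inverted index over per-page OCR text (not a Syncfusion type).
public class PageIndex
{
    // Word -> sorted set of page numbers; lookups ignore case.
    private readonly Dictionary<string, SortedSet<int>> _index =
        new Dictionary<string, SortedSet<int>>(StringComparer.OrdinalIgnoreCase);

    // Simple word separators; real text may need more thorough tokenization.
    private static readonly char[] Separators =
        { ' ', '\t', '\r', '\n', '.', ',', ';', ':', '!', '?' };

    // Adds the recognized text of one page to the index.
    public void AddPage(int pageNumber, string text)
    {
        foreach (string word in text.Split(Separators, StringSplitOptions.RemoveEmptyEntries))
        {
            if (!_index.TryGetValue(word, out SortedSet<int> pages))
            {
                pages = new SortedSet<int>();
                _index[word] = pages;
            }
            pages.Add(pageNumber);
        }
    }

    // Returns the pages that contain the word, in ascending order.
    public IReadOnlyList<int> Find(string word) =>
        _index.TryGetValue(word, out SortedSet<int> pages)
            ? pages.ToList()
            : new List<int>();
}
```

For example, after indexing page 1 ("Invoice total: 120") and page 2 ("Total due, invoice 7"), `Find("invoice")` returns pages 1 and 2, because lookups ignore case. A production search index would typically also handle stemming and hyphenated words.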
