The Syncfusion .NET optical character recognition (OCR) library is used to extract text from scanned PDFs and images. With just a few lines of C# code, a scanned PDF document containing a raster image is converted into a searchable and selectable PDF document. You can save the OCR result as text, structured data, or searchable PDF documents. The .NET OCR library uses the powerful Tesseract OCR engine.
The OCR feature works across Windows, macOS, and Linux in any .NET-based application, including ASP.NET Core, ASP.NET MVC, Blazor, WinForms, WPF, and WinUI.
The following code demonstrates how to convert a scanned PDF into a searchable PDF.
using System.IO;
using Syncfusion.OCRProcessor;
using Syncfusion.Pdf.Parsing;

// Initialize the OCR processor.
using (OCRProcessor processor = new OCRProcessor())
{
    // Open the scanned PDF file for reading.
    using (FileStream stream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read))
    {
        // Load the existing PDF document from the stream.
        PdfLoadedDocument pdfLoadedDocument = new PdfLoadedDocument(stream);
        // Set the language the OCR engine should recognize.
        processor.Settings.Language = Languages.English;
        // Run OCR on the document so its text becomes searchable and selectable.
        processor.PerformOCR(pdfLoadedDocument);
        // Create the output file and save the OCR-processed document to it.
        using (FileStream outputFileStream = new FileStream("Output.pdf", FileMode.Create, FileAccess.ReadWrite))
        {
            pdfLoadedDocument.Save(outputFileStream);
        }
        // Close the document and release its resources.
        pdfLoadedDocument.Close(true);
    }
}
Discover the features of our OCR processor library to enhance text extraction, language recognition, and document processing for seamless integration.
Perform OCR on an entire scanned PDF document and convert it into a searchable PDF document.
Extract text from a single scanned image or multi-page TIFF images.
Extract the text from a scanned, rotated page of a PDF document and convert it to a searchable PDF document.
Make images searchable and selectable by converting them to a PDF or PDF/A document using OCR.
Extract data from PDFs and images by restricting OCR to a particular region in a PDF or image.
After performing OCR, you can programmatically highlight, underline, and strike through the text of the resulting PDF document. You can also redact, edit, and digitally sign the PDF document.
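As an illustration of the image text extraction feature listed above, here is a minimal sketch. It assumes the `PerformOCR` overload that accepts an image stream and returns the recognized text as a string. That overload should be verified against the API reference for the package version in use. The file names `Input.jpg` and `OCRText.txt` and the class name `ImageOcrSketch` are placeholders, not part of the library.

```csharp
using System.IO;
using Syncfusion.OCRProcessor;

class ImageOcrSketch
{
    static void Main()
    {
        // Initialize the OCR processor.
        using (OCRProcessor processor = new OCRProcessor())
        {
            // Open the scanned image; "Input.jpg" is a placeholder file name.
            using (FileStream imageStream = new FileStream("Input.jpg", FileMode.Open, FileAccess.Read))
            {
                // Set the language the OCR engine should recognize.
                processor.Settings.Language = Languages.English;
                // Assumption: PerformOCR(Stream) returns the recognized text as a string.
                string ocrText = processor.PerformOCR(imageStream);
                // Save the extracted text; "OCRText.txt" is a placeholder file name.
                File.WriteAllText("OCRText.txt", ocrText);
            }
        }
    }
}
```

The sketch writes plain text only. To produce a searchable PDF from an image instead, the PDF conversion shown earlier is the closer starting point.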
OCR stands for optical character recognition. It is a technology used to convert different types of documents, such as scanned paper documents, PDF files, or images captured by a digital camera, into editable and searchable data.
OCR can process various types of documents, including scanned paper documents, PDF files, images, screenshots, and photographs containing printed or handwritten text.
The OCR library analyzes the shapes, patterns, and structures within an image or PDF file to identify and extract text. It then converts this extracted text into a format that can be edited, searched, and manipulated.
OCR lets you convert non-editable documents, such as scanned images or PDF files, into editable and searchable formats. This allows for easier document management, text extraction, content indexing, and accessibility improvements.