Newest 'text-extraction' Questions

-1 votes

0 answers

34 views

How can I extract text with coordinates from a PDF using JavaScript? [closed]

I’m working on a project where I need to extract text along with its coordinates from a PDF document using JavaScript. I’m familiar with PDF.js, but I’m unsure how to use it to get the coordinates of ...

Vikram Ray

1,164

asked Sep 16 at 10:32

0 votes

1 answer

74 views

Order arrangement of texts of docx documents in the document.xml

I am trying to extract text from docx files, but the text comes out in the wrong order: text at the bottom of the page or in a random text box is extracted first, and then the text from ...

vignesh

3

asked Aug 28 at 12:27

1 vote

0 answers

39 views

Extracting text from PDF files with different structures fails: only a portion of the text is extracted

I am trying to extract text from a CV in PDF format. I came up with this script, but I have a problem: it does not extract all the text, and I have trouble identifying the different blocks of the ...

emma

343

asked Aug 26 at 17:01

0 votes

0 answers

47 views

Capturing Formatted Numbering from DOCX Files in Python

I'm working on a Python project where I need to extract text from DOCX files, preserving the formatted numbering. I've encountered a peculiar issue that I'm hoping someone can help me solve. The ...

Anshuman Sharma

19

asked Aug 23 at 23:43

0 votes

0 answers

43 views

403 Client Error: Forbidden for url: https://something.org/anotherthing/more%20things%20andfile.pdf

I was scraping a website and tried to open a URL to a PDF file to extract text from its pages. Unfortunately, I keep getting the following error message: 403 Client Error: Forbidden for url: https://...

Bacha

11

asked Aug 14 at 0:30

0 votes

0 answers

59 views

Guidance on Extracting Compliance Items from PDF documents by fine-tuning an LLM

Need some guidance on extracting large compliance items from raw PDF documents. I have a CSV with these compliance items and I want to fine-tune an LLM such that if it reads any new PDF documents it can ...

Daremitsu

643

asked Aug 1 at 16:23

3 votes

1 answer

83 views

Parsing formulas efficiently using regex and Polars

I am trying to parse a series of mathematical formulas and need to extract variable names efficiently using Polars in Python. Regex support in Polars seems to be limited, particularly with look-around ...

Oyibo

97

asked Jul 23 at 21:34

-1 votes

1 answer

51 views

Extracting Text from PDFs with Python Without Including Comments

I have been trying to extract text from PDF files to automate a significant and tedious part of my job using Python. With the help of ChatGPT, I have written multiple lines of code. However, I am ...

MDMT

1

asked Jul 8 at 12:42

1 vote

1 answer

92 views

Accurately Detecting randomly rotated Text in Images

I'm trying to detect text from items, which may be rotated in various directions. I've tried using Tesseract, EasyOCR, and EAST for text detection and extraction, but I am encountering issues with ...

Agura

11

asked Jul 2 at 19:34

1 vote

0 answers

53 views

AWS Textract With AWS Signature Version 4 Using Go Lang

I have three credentials from AWS: host, acckey, and secretkey. I am using the AWS Signature Version 4 method, and I want to use the Textract feature from AWS with Golang. I have built the code and have a ...

Hafi Ihza Farhana

21

asked Jul 2 at 11:20

0 votes

2 answers

79 views

How to convert a string in python to separate strings [closed]

I have a pandas dataframe with only one column containing symbols. I need to separate those symbols in groups of 13 and 39 inside a single string. symbol 3IINFOTECH 3MINDIA 3PLAND 20MICRONS 3RDROCK ...

Hamza Ahmed

1,751

asked Jun 27 at 9:46

0 votes

0 answers

51 views

How to extract data from a PDF and their position?

Currently, I'm using Google's Vision AI to extract information about dates and prices from pdf files. I proceed with the following steps: Extract the text from the PDF. The result received from ...

Đạt Vũ Trọng

1

asked Jun 19 at 4:47

0 votes

0 answers

35 views

Extracting structured data from user query

I want to extract structured data from the query provided by the user. For example, user query: I need data for females above the age of 3 output : { min_age: 3, max_age: None, sex: female } These are ...

llms_query

1

asked Jun 18 at 11:27
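
The example in the question above (query "I need data for females above the age of 3", output min_age 3, max_age None, sex female) can be approximated with a small rule-based parser. This is only a minimal sketch under stated assumptions: the function name `parse_query`, the word table, and the regular expressions are hypothetical illustrations, not something taken from the question, and a real system would need far broader phrase coverage or a language model.

```python
import re
from typing import Optional, TypedDict


class QueryFilter(TypedDict):
    min_age: Optional[int]
    max_age: Optional[int]
    sex: Optional[str]


# Hypothetical lookup table: maps surface words to a normalized sex value.
_SEX_WORDS = {
    "female": "female", "females": "female", "woman": "female", "women": "female",
    "male": "male", "males": "male", "man": "male", "men": "male",
}

# Assumed phrasings for a lower bound ("above the age of 3", "over 3").
_MIN_AGE = re.compile(
    r"\b(?:above|over|older than)\s+(?:the\s+)?(?:age\s+(?:of\s+)?)?(\d+)",
    re.IGNORECASE,
)
# Assumed phrasings for an upper bound ("below the age of 10", "under 10").
_MAX_AGE = re.compile(
    r"\b(?:below|under|younger than)\s+(?:the\s+)?(?:age\s+(?:of\s+)?)?(\d+)",
    re.IGNORECASE,
)


def parse_query(query: str) -> QueryFilter:
    """Extract age bounds and sex from a free-text query using fixed rules."""
    result: QueryFilter = {"min_age": None, "max_age": None, "sex": None}

    # Match whole words only, so "female" is never mistaken for "male".
    for word in re.findall(r"[a-z]+", query.lower()):
        if word in _SEX_WORDS:
            result["sex"] = _SEX_WORDS[word]
            break

    min_match = _MIN_AGE.search(query)
    if min_match:
        result["min_age"] = int(min_match.group(1))

    max_match = _MAX_AGE.search(query)
    if max_match:
        result["max_age"] = int(max_match.group(1))

    return result


if __name__ == "__main__":
    print(parse_query("I need data for females above the age of 3"))
```

Fields that the query does not mention stay as None, which mirrors the max_age value in the question's sample output.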

1 vote

0 answers

77 views

Improving OCR accuracy with pytesseract for processing manga images

def get_string(img_path):
    img = cv2.imread(img_path)
    img = cv2.resize(img, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)
    gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    ...

Myat Thet

19

asked Jun 18 at 5:38

2 votes

0 answers

115 views

lopdf RUST PDF - only getting text

Brand new to Rust and am trying to read a PDF file with lopdf. Trying out various examples, but I am just getting characters. I need all the chars like spaces, tabs, line breaks, etc. for regex. Is ...

diogenes

2,067

asked Jun 9 at 5:15
