Newest 'cjk+nlp' Questions

0 votes · 0 answers · 53 views

Issues with Sokuon Conversion in pykakasi Library for Japanese to Romaji Translation in Python

I am attempting to use the pykakasi library in Python to convert Japanese text to Romaji. However, I am encountering issues with the conversion of sokuon (促音). Here is the code I am using: import ...

asked Nov 25, 2023 at 17:04 by Gao Burning (reputation 1)

1 vote · 2 answers · 1k views

Python, using pdfplumber, pdfminer packages extract text from pdf, bolded characters duplicates

Goal: extract Chinese financial report text. Implementation: use the Python pdfplumber/pdfminer packages to extract PDF text to txt. Problem: for PDF text in bold, the corresponding extracted text in the txt duplicates ...

asked Apr 10, 2023 at 8:17 by user19560886 (reputation 39)

1 vote · 1 answer · 447 views

Pinyin packages: accuracy and efficiency

I am looking to get the pinyin of Simplified Mandarin characters, and have come across two packages: pinyin 0.4.0 which is 6 years old (GitHub repo here) pinyin_jyutping_sentence which is 2> years ...

asked Jul 19, 2022 at 20:09 by Thomas (reputation 261)

1 vote · 0 answers · 168 views

How to parse and encode Chinese Characters in Jupyter Notebook?

I want to train a really basic NLP model but using Chinese characters. Read_csv doesn't really work. I was also wondering if there is any way to extract the different parts of the character, like for ...

asked Feb 16, 2022 at 15:30 by SaltyGamer (reputation 149)

2 votes · 2 answers · 753 views

Tokenizing Chinese text with keras.preprocessing.text.Tokenizer

keras.preprocessing.text.Tokenizer doesn't work correctly with Chinese text. How can I modify it to work on Chinese text? from keras.preprocessing.text import Tokenizer def fit_get_tokenizer(data, ...

asked Jan 28, 2022 at 7:55 by Shahed Islam (reputation 33)

0 votes · 2 answers · 60 views

Identify elements in a specific language, e.g. Chinese

I have a dataset that looks simplified similar to this: call_id<- c("001","002","003","004","005","012","024") transcript <- ...

asked Jun 23, 2021 at 17:52 by wiwi123 (reputation 3)

0 votes · 0 answers · 255 views

Spell Check/DidYouMean for Japanese language

Looking for ideas for implementing Spellcheck/DidYouMean for (mostly) the Japanese language. The target for spellcheck is search queries, with the search engine built on Solr, but the solution is not bound to ...

asked Apr 15, 2021 at 6:46 by aTan (reputation 21)

-4 votes · 5 answers · 239 views

Is there R function to extract number amounts from string of Chinese characters?

I have a string like d: d <- c("您尾号1234卡11月11日00:03转入人民币1,500.00元，余额人民币1,501.12元", "您尾号3256卡11月11日00:03转出人民币678.12元，余额人民币1,501.12元", "您尾号7894卡11月11日00:03取现0....

asked Dec 16, 2020 at 2:05 by zhangnan (reputation 11)

0 votes · 1 answer · 103 views

Where to find resource of Japanese - Chinese dictionary

Hey, I am trying to provide Japanese-Chinese translation functionality for my project. I have found Rikaichan, which is a Chrome plugin that provides popup Japanese-English translation. Rikaichan ...

asked Jun 14, 2020 at 4:27 by RioAraki (reputation 554)

1 vote · 2 answers · 1k views

Module import issue with a Japanese Tokenizer

I am trying to get the JapaneseTokenizer working in python, but I am having trouble with one of the modules it depends on. Here is the trace of the errors I am getting: /Users/home/PycharmProjects/...

asked Dec 11, 2018 at 6:02 by Terry Rozmus (reputation 396)

0 votes · 2 answers · 659 views

RASA: how to use Japanese (Tokenization-MeCab)

RASA is known to be an effective bot framework. Its stack, such as RASA NLU and RASA Core, is really useful. I tried it hands-on and found that it's amazing, especially with English text. I give another ...

asked Oct 26, 2018 at 4:36 by Stev Jane (reputation 53)

2 votes · 2 answers · 1k views

How to split CJK text into words?

I use JavaScript to create a transliteration. I am wondering whether it is possible to split CJK text into a sequence of words, defined according to some word segmentation standard. Any alternative? ...

asked Apr 19, 2018 at 12:30 by Sidal (reputation 129)

6 votes · 3 answers · 5k views

Spacy Japanese Tokenizer

I am trying to use Spacy's Japanese tokenizer. import spacy Question= 'すぺいんへいきました。' nlp(Question.decode('utf8')) I am getting the below error, TypeError: Expected unicode, got spacy.tokens.token....

asked Nov 1, 2017 at 11:22 by MenorcanOrange (reputation 2,805)

2 votes · 1 answer · 528 views

C# Japanese morphological analyzers

I can't find any Japanese morphological analyzers for C#. Can anyone please suggest one?

asked Jul 29, 2012 at 21:23 by user1561543 (reputation 21)

1 vote · 2 answers · 403 views

Determine whether a romanized name is Japanese or not, preferably in Ruby

How can I determine whether a romanized name is likely, or unlikely, to be a Japanese name? "Yukihiro Matsumoto".likely_to_be_japanese? # => true "John Smith".likely_to_be_japanese? # => false ...

asked Jul 10, 2012 at 23:16 by Andrew Grimm (reputation 81.1k)
