All Questions

0 votes
0 answers
53 views

Issues with Sokuon Conversion in pykakasi Library for Japanese to Romaji Translation in Python

I am attempting to use the pykakasi library in Python to convert Japanese text to Romaji. However, I am encountering issues with the conversion of sokuon (促音). Here is the code I am using: import ...
Gao Burning
1 vote
2 answers
1k views

Python: extracting PDF text with pdfplumber/pdfminer duplicates bolded characters

Goal: extract Chinese financial report text. Implementation: Python pdfplumber/pdfminer package to extract PDF text to txt. Problem: for PDF text in bold, the corresponding extracted text in txt duplicates ...
user19560886
1 vote
1 answer
447 views

Pinyin packages: accuracy and efficiency

I am looking to get the pinyin of Simplified Mandarin characters, and have come across two packages: pinyin 0.4.0, which is 6 years old (GitHub repo here), and pinyin_jyutping_sentence, which is >2 years ...
Thomas
  • 261
1 vote
0 answers
168 views

How to parse and encode Chinese Characters in Jupyter Notebook?

I want to train a really basic NLP model, but using Chinese characters. read_csv doesn't really work. I was also wondering if there is any way to extract the different parts of the character, like for ...
SaltyGamer
2 votes
2 answers
753 views

Tokenizing Chinese text with keras.preprocessing.text.Tokenizer

keras.preprocessing.text.Tokenizer doesn't work correctly with Chinese text. How can I modify it to work on Chinese text? from keras.preprocessing.text import Tokenizer def fit_get_tokenizer(data, ...
Shahed Islam
0 votes
2 answers
60 views

Identify elements in a specific language, e.g. Chinese

I have a dataset that, simplified, looks similar to this: call_id<- c("001","002","003","004","005","012","024") transcript <- ...
wiwi123
0 votes
0 answers
255 views

Spell Check/DidYouMean for Japanese language

Looking for ideas for implementing Spellcheck/DidYouMean for the Japanese language (mostly). The target for spellcheck is search queries; the search engine is built on Solr, but the solution is not bound to ...
aTan
  • 21
-4 votes
5 answers
239 views

Is there an R function to extract numeric amounts from a string of Chinese characters?

I have a string vector like d: d <- c("您尾号1234卡11月11日00:03转入人民币1,500.00元,余额人民币1,501.12元", "您尾号3256卡11月11日00:03转出人民币678.12元,余额人民币1,501.12元", "您尾号7894卡11月11日00:03取现0....
zhangnan
0 votes
1 answer
103 views

Where to find a Japanese-Chinese dictionary resource

I am trying to provide Japanese-Chinese translation functionality for my project. I have found Rikaichan, which is a Chrome plugin that provides popup Japanese-English translation. Rikaichan ...
RioAraki
  • 554
1 vote
2 answers
1k views

Module import issue with a Japanese Tokenizer

I am trying to get the JapaneseTokenizer working in Python, but I am having trouble with one of the modules it depends on. Here is the trace of the errors I am getting: /Users/home/PycharmProjects/...
Terry Rozmus
0 votes
2 answers
659 views

RASA: how to use Japanese (tokenization with MeCab)

RASA is known to be an effective bot framework. Components such as RASA NLU and RASA Core are really useful. Having tried it hands-on, I found that it's amazing, especially with English text. I give another ...
Stev Jane
2 votes
2 answers
1k views

How to split CJK text into words?

I use JavaScript to create a transliteration. I am wondering whether it is possible to split CJK text into a sequence of words, defined according to some word segmentation standard. Any alternative? ...
Sidal
  • 129
6 votes
3 answers
5k views

Spacy Japanese Tokenizer

I am trying to use Spacy's Japanese tokenizer. import spacy Question= 'すぺいんへ いきました。' nlp(Question.decode('utf8')) I am getting the below error, TypeError: Expected unicode, got spacy.tokens.token....
MenorcanOrange
2 votes
1 answer
528 views

C# Japanese morphological analyzers

I can't find any Japanese morphological analyzers for C#. Can anyone please suggest one?
user1561543
1 vote
2 answers
403 views

Determine whether a romanized name is Japanese or not, preferably in Ruby

How can I determine whether a romanized name is likely, or unlikely, to be a Japanese name? "Yukihiro Matsumoto".likely_to_be_japanese? # => true "John Smith".likely_to_be_japanese? # => false ...
Andrew Grimm
  • 81.1k