A Chainer data loader for text datasets, built on chainer.dataset.DatasetMixin. It can be used with Japanese, Chinese, and English text.
git clone https://github.com/matasukef/chainer-seq2seq-dataloader
cd chainer-seq2seq-dataloader
pip install -r requirements.txt
Download the NLTK punkt tokenizer data from a Python session:
python
import nltk
nltk.download('punkt')
The example notebook, example.ipynb, uses the small_parallel_enja dataset. Download and preprocess it as follows:
git clone https://github.com/odashi/small_parallel_enja
sh ./preprocess_data.sh
For usage, see example.ipynb.
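
As a rough illustration of the chainer.dataset.DatasetMixin pattern this loader is built on, the sketch below defines a toy parallel-text dataset. It is not the repository's implementation: the class name ParallelTextDataset, the "<unk>" token, and the whitespace tokenization are assumptions made for this example only. DatasetMixin requires a subclass to provide __len__ and get_example; indexing then goes through get_example. A small stand-in class is used if Chainer is not installed, so the sketch runs either way.

```python
try:
    from chainer.dataset import DatasetMixin
except ImportError:
    # Stand-in used only when Chainer is not installed (assumption for this
    # sketch). It mimics integer indexing only, not slicing.
    class DatasetMixin:
        def __getitem__(self, index):
            return self.get_example(index)


class ParallelTextDataset(DatasetMixin):
    """Hypothetical dataset that yields (source_ids, target_ids) pairs."""

    UNK_ID = 0  # ID reserved for the assumed "<unk>" token

    def __init__(self, source_sentences, target_sentences):
        if len(source_sentences) != len(target_sentences):
            raise ValueError("source and target must have the same length")
        # Assumes sentences are already tokenized with whitespace.
        self.pairs = [
            (src.split(), tgt.split())
            for src, tgt in zip(source_sentences, target_sentences)
        ]
        self.source_vocab = self._build_vocab(src for src, _ in self.pairs)
        self.target_vocab = self._build_vocab(tgt for _, tgt in self.pairs)

    @classmethod
    def _build_vocab(cls, tokenized_sentences):
        # Assign IDs in order of first appearance, after the "<unk>" entry.
        vocab = {"<unk>": cls.UNK_ID}
        for tokens in tokenized_sentences:
            for token in tokens:
                vocab.setdefault(token, len(vocab))
        return vocab

    def __len__(self):
        return len(self.pairs)

    def get_example(self, i):
        # Called by DatasetMixin.__getitem__ for each requested index.
        src, tgt = self.pairs[i]
        src_ids = [self.source_vocab.get(t, self.UNK_ID) for t in src]
        tgt_ids = [self.target_vocab.get(t, self.UNK_ID) for t in tgt]
        return src_ids, tgt_ids


dataset = ParallelTextDataset(
    ["i am here", "i am a student"],
    ["私 は ここ", "私 は 学生"],
)
print(len(dataset))
print(dataset[0])
```

Because the class follows the DatasetMixin interface, an instance like this could in principle be passed to a Chainer iterator; the actual classes and arguments provided by this repository are shown in example.ipynb.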