chatbot-trainer

A Chrome extension and related code for training the chatbot LightBlue.

A demo video: https://youtu.be/flt2GLLF8so

Workflow

The overall workflow is:

Extract vocabulary words from the HTML page and save them to voc.txt
Analyze the vocabulary with word2vec and plot the relationships between words
Preprocess the raw training data
Manually revise the processed training data
Check the vocabulary used in the training data
Train the chatbot with the Chrome extension

Details will be explained below.

Vocabulary Extraction

Input HTML file: /chatbot_metadata/voc.html
Expected output file: voc.txt
Uncommon words are saved to: voc_special.txt

Run:

python extract_voc.py
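The extraction step can be sketched with Python's standard-library HTML parser. This is a minimal sketch, not the contents of extract_voc.py: it assumes the vocabulary words are simply the visible text of voc.html, and that "uncommon" is decided by a caller-supplied set of common words (the real criterion is not described in this README). The names TextCollector and extract_vocabulary are hypothetical.

```python
import re
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects the visible text chunks of an HTML document (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0  # nesting depth inside <script>/<style>, whose text is not vocabulary

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def extract_vocabulary(html, common_words):
    """Return (vocabulary, special).

    vocabulary: sorted unique lowercase words found in the page text.
    special: the subset of vocabulary that is not in common_words (assumed criterion).
    """
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    # A "word" here is a run of letters with an optional apostrophe suffix (e.g. "don't").
    words = sorted(set(re.findall(r"[a-z]+(?:'[a-z]+)?", text)))
    special = [w for w in words if w not in common_words]
    return words, special
```

Writing the two lists to voc.txt and voc_special.txt (for example one word per line) is then a matter of joining them with newlines.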

Vocabulary analysis

Run:

sh setup.txt
cd word2vec
python word2vec.py

This TensorFlow program downloads the text8 training corpus from http://mattmahoney.net/dc/text8 and trains a word2vec network to obtain word embedding vectors for the 50,000 most common words. It then plots the words in voc.txt so you can see how they relate to one another. A sample plot is at /word2vec/tsne_1000.png.
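The "50,000 most common words" part follows the build_dataset step of the borrowed word2vec_basic.py: keep the most frequent words, map every other word to an UNK token, then generate (center, context) pairs for skip-gram training. The sketch below is a simplified, dependency-free illustration rather than the repository's code; unlike word2vec_basic.py, it emits every context word in the window instead of randomly sampling a subset, and skipgram_pairs is a hypothetical name.

```python
import collections


def build_dataset(words, n_words):
    """Map tokens to integer ids, keeping only the n_words - 1 most common words.

    Every other word is mapped to id 0 ("UNK"), as in word2vec_basic.py.
    Returns (data, count, dictionary).
    """
    count = [("UNK", 0)]
    count.extend(collections.Counter(words).most_common(n_words - 1))
    dictionary = {word: i for i, (word, _) in enumerate(count)}
    data = [dictionary.get(word, 0) for word in words]
    count[0] = ("UNK", data.count(0))  # record how many tokens fell outside the vocabulary
    return data, count, dictionary


def skipgram_pairs(data, window):
    """Yield (center, context) id pairs for every word within `window` positions of the center."""
    for i, center in enumerate(data):
        lo, hi = max(0, i - window), min(len(data), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield center, data[j]
```

With n_words=50000 the vocabulary matches the size mentioned above; the embedding network is then trained to predict each context id from its center id.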

Datasets

My training data comes from eslfast.com.

Put the raw training data at /chrome_ext/data/topics_original/$INDEX$
The formatted data will be saved to /chrome_ext/data/topics/$INDEX$

Set startind in data_processing/eslfast/main.py to $INDEX$, then run:

cd data_processing/eslfast
python main.py
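The preprocessing step turns each raw conversation into the $QUESTION$>>>$ANSWER$ format the extension reads. The sketch below assumes a raw format that this README does not specify: one turn per line, optionally prefixed by a single-letter speaker label such as "A:", with turns alternating question and answer. It only does the pairing; it does not reproduce any text normalization the real main.py may perform, and format_conversation is a hypothetical name.

```python
import re

# Assumed raw format: an optional single-letter speaker label such as "A:" at the start of a line.
SPEAKER = re.compile(r"^\s*[A-Z]\s*:\s*")


def format_conversation(raw_lines):
    """Turn alternating dialogue turns into QUESTION>>>ANSWER lines, closed by <sss>."""
    turns = [SPEAKER.sub("", line).strip() for line in raw_lines if line.strip()]
    if len(turns) % 2:
        turns = turns[:-1]  # drop an unpaired final turn; it has no answer to pair with
    lines = [f"{q}>>>{a}" for q, a in zip(turns[0::2], turns[1::2])]
    lines.append("<sss>")  # session separator expected by the extension
    return lines
```

The output still goes through the manual revision step before it is used for training.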

Vocabulary Checking

The input is the manually revised conversations at /chrome_ext/data/topics/$INDEX$

Run:

python check_voc.py

to list the words that are not in voc.txt.
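The check amounts to a set difference between the words used in the training lines and the words in voc.txt. A minimal sketch, assuming voc.txt holds whitespace-separated words (for example one per line) and that a word is a run of letters with an optional apostrophe suffix; words_in_training and missing_words are hypothetical names, not the API of check_voc.py.

```python
import re


def words_in_training(lines):
    """Collect lowercase words from QUESTION>>>ANSWER lines, skipping <sss> separators."""
    words = set()
    for line in lines:
        line = line.strip()
        if not line or line == "<sss>":
            continue
        # ">>>" contains no letters, so it never ends up in the word set.
        words.update(re.findall(r"[a-z]+(?:'[a-z]+)?", line.lower()))
    return words


def missing_words(training_lines, vocabulary):
    """Return the sorted training words that are absent from the vocabulary."""
    vocab = {w.strip().lower() for w in vocabulary if w.strip()}
    return sorted(words_in_training(training_lines) - vocab)
```

To check real files, the vocabulary could be read with open("voc.txt").read().split() and the lines of a topic file passed as training_lines.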

Chrome Extension

Load the unpacked extension from the /chrome_ext folder into Chrome, following the official guide
Choose a training file
Click Start

What does the extension do during training? For each Q&A pair, it will:

Input the $QUESTION$
Modify the chatbot's response to $ANSWER$ and submit the change
Like the revised response
Move on to the next pair

The extension starts a new session whenever it meets <sss> in the training document. Each Q&A pair should be formatted as $QUESTION$>>>$ANSWER$. Sample training paragraph:

welcome, smallblue, come on in!>>>hi, derek! what a nice home!
we enjoy it too!>>>how long have you live here?
about four year now.>>>well, it is very beautiful.
smallblue, have a seat and I will get us something to drink.>>>good! I am really hot. you know it really is hot outside!
I have different drink.>>>thank you!
<sss>
smallblue, welcome to my home!>>>it is so nice to see you. what a wonderful home!
we really like stay in this neighborhood.>>>how long have you have this house?
we just move here last year.>>>it is a beautiful home.
I get some drink for us in the kitchen.>>>that would be wonderful. it is really hot out.
I can offer you drink.>>>thank you!
<sss>
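The extension itself runs in Chrome, but the training-file format is simple enough to parse in Python, which can be handy for validating a file before loading it. This sketch follows only the rules stated above (<sss> ends a session, every other line is $QUESTION$>>>$ANSWER$); it is not the extension's code, and parse_training_document is a hypothetical name.

```python
def parse_training_document(text):
    """Split a training document into sessions, each a list of (question, answer) pairs.

    A line equal to <sss> closes the current session; every other non-empty line
    must have the form QUESTION>>>ANSWER. Raises ValueError on malformed lines.
    """
    sessions, current = [], []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        if line == "<sss>":
            if current:
                sessions.append(current)
            current = []
            continue
        # Split at the first ">>>" only, so an answer may itself contain ">>>".
        question, sep, answer = line.partition(">>>")
        if not sep:
            raise ValueError(f"line {lineno}: missing '>>>' separator: {line!r}")
        current.append((question.strip(), answer.strip()))
    if current:  # tolerate a final session without a trailing <sss>
        sessions.append(current)
    return sessions
```

Running it over a topic file and checking the number of sessions is a quick way to catch a missing separator before training.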

Author

Derek Mingyu MA, derek.ma, [email protected]

Acknowledgements

Most of the word2vec implementation in TensorFlow is borrowed from https://github.com/tensorflow/tensorflow/blob/r1.4/tensorflow/examples/tutorials/word2vec/word2vec_basic.py.

The word2vec paper is available at: https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf

derekmma/chatbot-trainer