Package: new release coming soon.
OnPoint is a question answering service that leverages product user reviews. OnPoint saves you time when you are looking for a product detail by providing a short answer in seconds.
This repository explores the application of XLNet to a question answering service based on user reviews. The base model and algorithm were inspired by and based on XLNet: Generalized Autoregressive Pretraining for Language Understanding link and the renatoviolin/xlnet link repo.
- onpoint : contains all the source code
- tst : contains all the unit tests
- data : contains data for unit tests
- configs : contains config files for hyperparameters during finetuning and evaluation
git clone https://github.com/hairong-wang/OnPoint.git
cd OnPoint
- tensorflow==1.15
- absl-py==0.8.0
- Flask==1.1.1
- sentencepiece
- pandas==0.25.1
- numpy==1.17.2
- nltk
Linux
Optional: if you have multiple GPUs on your machine, it's recommended that you restrict the run to a single GPU. Without this configuration, the process may take unnecessary memory on the additional GPUs even though they are not actually doing any work.
export CUDA_VISIBLE_DEVICES=0
Run:
pip install -r requirement.txt
The datasets used in this project are:
- The SQuAD 2.0 dataset.
- The manually sampled and labeled AmazonQA dataset, preprocessed and available in the Google Cloud Storage bucket xlnet_squad2/data/amazon; you can access the bucket from here. Download the datasets for finetuning by running the following:
cd onpoint
bin/data_ingestion
- The model checkpoints are available in the Google Cloud Storage bucket xlnet_squad2/experiment/squad_and_amazon_8000steps_1000warmup; you can access the bucket from here. So far, the top-performing checkpoint is 'model.ckpt-4000'. Download the model checkpoints by running:
bin/model_download
python3 app.py
Open your browser, and enter:
localhost:6001
Now you can paste the context you want to use into the left text box and type your question into the right text box.
Please log in to a GCP Compute Engine instance for the following steps. For finetuning, you'll need a TPU instance and a GCP storage bucket.
### Step 1: Data processing
If you want to try another dataset, it first needs to be converted to SQuAD format using squad_converter.py:
# Change the INFILE and OUTFILE path
python3 squad_converter.py
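For reference, a minimal SQuAD 2.0-style record looks like the sketch below. The field names follow the public SQuAD format; the exact output of squad_converter.py may differ, and the example product text is made up:

```python
import json

# A minimal SQuAD 2.0-style record (sketch). answer_start is the
# character offset of the answer span inside the context.
squad_example = {
    "version": "v2.0",
    "data": [{
        "title": "Example Product",
        "paragraphs": [{
            "context": "The battery lasts about ten hours on a full charge.",
            "qas": [{
                "id": "q1",
                "question": "How long does the battery last?",
                "answers": [{"text": "about ten hours", "answer_start": 18}],
                "is_impossible": False,
            }],
        }],
    }],
}

# Serialize to JSON, the on-disk format expected for finetuning.
print(json.dumps(squad_example)[:20])
```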
Multiprocessing is available in bin/data_processing; change 'NUM_PROC=' to the number of cores you'll use. Replace the storage bucket below with your own GCP storage bucket:
cd onpoint
export STORAGE_BUCKET=${YOUR GCP STORAGE BUCKET}
bin/data_processing
The model_building script contains two parts; the second part, for finetuning on AmazonQA, is currently commented out. If you want to finetune on AmazonQA, comment out the first part and uncomment the second. A v3-8 TPU is recommended; run the following to start your TPU engine:
ctpu up --tpu-size=v3-8
Please replace with your own TPU name:
export TPU_NAME=${YOUR TPU NAME}
export STORAGE_BUCKET=${YOUR GCP STORAGE BUCKET}
bin/model_building
The model_analysis script evaluates the model finetuned on SQuAD. If you'd like to evaluate other checkpoints, please modify the variables:
export STORAGE_BUCKET=${YOUR GCP STORAGE BUCKET}
export TPU_NAME=${YOUR TPU NAME}
bin/model_analysis
Model inference takes two arguments: the first is the path of the test dataset in JSON format, the second is the name of the output (prediction) directory. You can find the following files in the tmp folder:
- null_odds.json: the no-answer probability for each question
- nbest_predictions.json: the top n results; n can be set in run_squad.py:
    flags.DEFINE_integer("n_best_size", default=5,
        help="n best size for predictions")
- predictions.json: the top 1 prediction results
bin/model_inference ${TEST DATASET JSON PATH NAME} ${OUTPUT FOLDER NAME}
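As a sketch of how these output files could be consumed (the data below is synthetic; the real files map question IDs to strings and scores as written by run_squad.py, and the threshold value is hypothetical and should be tuned on a dev set), a prediction can be kept or replaced with an empty "no answer" by thresholding the null odds:

```python
# Synthetic stand-ins for predictions.json and null_odds.json.
predictions = {"q1": "about ten hours", "q2": "red"}
null_odds = {"q1": -4.2, "q2": 3.1}  # higher = "no answer" more likely

NULL_THRESHOLD = 0.0  # hypothetical value; tune on a dev set

# Keep the predicted span only when the no-answer score is low enough.
final_answers = {
    qid: (answer if null_odds[qid] < NULL_THRESHOLD else "")
    for qid, answer in predictions.items()
}
```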
The PyTorch BERT implementation with preloading takes 0.1 to 0.5 seconds per inference, depending on the length of the context. The current TensorFlow version doesn't support preloading directly and takes around 20 seconds. Moving the preloading part into global scope in app.py didn't work; TensorFlow Serving will be tried as a next step.
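One generic way to get the preloading effect is to build the expensive object once, lazily, and reuse it across requests. This is a sketch of the pattern only, not the actual app.py code; in the real app the cached object would be the TensorFlow graph and restored checkpoint:

```python
import functools

@functools.lru_cache(maxsize=1)
def get_model():
    # Placeholder for the expensive step: in app.py this would be
    # building the TensorFlow graph and restoring the checkpoint.
    return {"loaded": True}

# Every request handler calls get_model(); only the first call pays
# the load cost, later calls return the same cached object.
model_a = get_model()
model_b = get_model()
```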
Model | Finetune Dataset | Validation Dataset | AmazonQA Sample Coverage | F1 |
---|---|---|---|---|
BERT-Large | SQuAD 2.0 | Augmented AmazonQA | 30% | 67.34 |
XLNet-Large | SQuAD 2.0 | Augmented AmazonQA | 40% | 66.20 |
XLNet-Large | Augmented AmazonQA | Augmented AmazonQA | 0% | 66.67 |
XLNet-Large | SQuAD 2.0 + Augmented AmazonQA | Augmented AmazonQA | 50% | 69.27 |
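The F1 column above is the SQuAD-style token-overlap F1 between predicted and gold answers. A minimal sketch of that metric (the official SQuAD script additionally normalizes punctuation and articles, which this simplified version omits):

```python
from collections import Counter

def squad_f1(prediction: str, gold: str) -> float:
    """Simplified SQuAD-style token-level F1 between two answer strings."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    # Count tokens shared between prediction and gold answer.
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(squad_f1("about ten hours", "ten hours"))  # 0.8
```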