Package: new release coming soon.
OnPoint is a question answering service that leverages product user reviews. OnPoint saves you time when you are looking for a product detail by providing a short answer in seconds.
This repository explores the application of XLNet to a question answering service based on user reviews. The base model and algorithm were inspired by and based on XLNet: Generalized Autoregressive Pretraining for Language Understanding link and the renatoviolin/xlnet link repo.
- onpoint : contains all the source code
- tst : contains all the unit tests
- data : contains data for unit tests
- configs : contains config files for hyperparameters during finetuning and evaluation
git clone https://github.com/hairong-wang/OnPoint.git
cd OnPoint
- tensorflow==1.15
- absl-py==0.8.0
- Flask==1.1.1
- sentencepiece
- pandas==0.25.1
- numpy==1.17.2
- nltk
Linux
Optional: if you have multiple GPUs on your machine, it's recommended that you restrict the run to a single GPU. Without this configuration, the process may take unnecessary memory on the additional GPUs even though they are not actually doing any work.
export CUDA_VISIBLE_DEVICES=0
Run:
pip install -r requirement.txt
The datasets used in this project are:
- The SQuAD 2.0 dataset.
- The manually sampled and labeled AmazonQA dataset, preprocessed and available in the Google Cloud Storage bucket xlnet_squad2/data/amazon; you can access the bucket from here. Download the datasets for finetuning by running the following:
cd onpoint
bin/data_ingestion
- The model checkpoints are available in the Google Cloud Storage bucket xlnet_squad2/experiment/squad_and_amazon_8000steps_1000warmup; you can access the bucket from here. So far, the top-performing checkpoint is 'model.ckpt-4000'. Download the model checkpoints by running:
bin/model_download
python3 app.py
Open your browser, and enter:
localhost:6001
Now you can paste the context you want to use into the left text box and type your question into the right text box.
Please log in to a GCP Compute Engine instance for the following steps. For finetuning, you'll need a TPU instance and a GCP storage bucket.
### Step 1: Data processing
If you want to try another dataset, it first needs to be converted to SQuAD format using squad_converter.py:
# Change the INFILE and OUTFILE path
python3 squad_converter.py
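For reference, a minimal SQuAD 2.0-style record looks like the sketch below. The field names follow the public SQuAD format; the exact output of squad_converter.py may differ, and the example product text is made up:

```python
import json

# A minimal SQuAD 2.0-style record (sketch). answer_start is the
# character offset of the answer span inside the context.
squad_example = {
    "version": "v2.0",
    "data": [{
        "title": "Example Product",
        "paragraphs": [{
            "context": "The battery lasts about ten hours on a full charge.",
            "qas": [{
                "id": "q1",
                "question": "How long does the battery last?",
                "answers": [{"text": "about ten hours", "answer_start": 18}],
                "is_impossible": False,
            }],
        }],
    }],
}

# Serialize to JSON, the on-disk format expected for finetuning.
print(json.dumps(squad_example)[:20])
```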
Multiprocessing is available in bin/data_processing; change 'NUM_PROC=' to the number of cores you'll use. Replace the storage bucket below with your own GCP storage bucket:
cd onpoint
export STORAGE_BUCKET=${YOUR GCP STORAGE BUCKET}
bin/data_processing
The model_building script contains two parts; the second part, for finetuning on AmazonQA, is currently commented out. If you want to finetune on AmazonQA, comment out the first part and uncomment the second. A v3-8 TPU is recommended; run the following to start your TPU engine:
ctpu up --tpu-size=v3-8
Please replace with your own TPU name:
export TPU_NAME=${YOUR TPU NAME}
export STORAGE_BUCKET=${YOUR GCP STORAGE BUCKET}
bin/model_building
The model_analysis script evaluates the model finetuned on SQuAD. If you'd like to evaluate other checkpoints, please modify the variables:
export STORAGE_BUCKET=${YOUR GCP STORAGE BUCKET}
export TPU_NAME=${YOUR TPU NAME}
bin/model_analysis
Model inference takes two arguments: the first is the path of the test dataset in JSON format, the second is the name of the output (prediction) directory. You can find the following files in the tmp folder:
- null_odds.json: the no-answer probability for each question
- nbest_predictions.json: the top n results; n can be set in run_squad.py:
    flags.DEFINE_integer("n_best_size", default=5,
        help="n best size for predictions")
- predictions.json: the top 1 prediction results
bin/model_inference ${TEST DATASET JSON PATH NAME} ${OUTPUT FOLDER NAME}
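As a sketch of how these output files could be consumed (the data below is synthetic; the real files map question IDs to strings and scores as written by run_squad.py, and the threshold value is hypothetical and should be tuned on a dev set), a prediction can be kept or replaced with an empty "no answer" by thresholding the null odds:

```python
# Synthetic stand-ins for predictions.json and null_odds.json.
predictions = {"q1": "about ten hours", "q2": "red"}
null_odds = {"q1": -4.2, "q2": 3.1}  # higher = "no answer" more likely

NULL_THRESHOLD = 0.0  # hypothetical value; tune on a dev set

# Keep the predicted span only when the no-answer score is low enough.
final_answers = {
    qid: (answer if null_odds[qid] < NULL_THRESHOLD else "")
    for qid, answer in predictions.items()
}
```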
The PyTorch BERT implementation with preloading takes 0.1 to 0.5 seconds per inference, depending on the length of the context. The current TensorFlow version doesn't support preloading directly and takes around 20 seconds. Moving the preloading part into global scope in app.py didn't work; TensorFlow Serving will be tried as a next step.
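One generic way to get the preloading effect is to build the expensive object once, lazily, and reuse it across requests. This is a sketch of the pattern only, not the actual app.py code; in the real app the cached object would be the TensorFlow graph and restored checkpoint:

```python
import functools

@functools.lru_cache(maxsize=1)
def get_model():
    # Placeholder for the expensive step: in app.py this would be
    # building the TensorFlow graph and restoring the checkpoint.
    return {"loaded": True}

# Every request handler calls get_model(); only the first call pays
# the load cost, later calls return the same cached object.
model_a = get_model()
model_b = get_model()
```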
Model | Finetune Dataset | Validation Dataset | AmazonQA Sample Coverage | F1 |
---|---|---|---|---|
BERT-Large | SQuAD 2.0 | Augmented AmazonQA | 30% | 67.34 |
XLNet-Large | SQuAD 2.0 | Augmented AmazonQA | 40% | 66.20 |
XLNet-Large | Augmented AmazonQA | Augmented AmazonQA | 0% | 66.67 |
XLNet-Large | SQuAD 2.0 + Augmented AmazonQA | Augmented AmazonQA | 50% | 69.27 |
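The F1 column above is the SQuAD-style token-overlap F1 between predicted and gold answers. A minimal sketch of that metric (the official SQuAD script additionally normalizes punctuation and articles, which this simplified version omits):

```python
from collections import Counter

def squad_f1(prediction: str, gold: str) -> float:
    """Simplified SQuAD-style token-level F1 between two answer strings."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    # Count tokens shared between prediction and gold answer.
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(squad_f1("about ten hours", "ten hours"))  # 0.8
```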