Concerns About Vaccines with Explanations and Summaries (CAVES)

This repository contains the datasets for the paper "CAVES: A Dataset to facilitate Explainable Classification and Summarization of Concerns towards COVID Vaccines", which was accepted at ACM SIGIR 2022 (Resource Track). A preprint version is available on: arXiv.

NOTE: The dataset has been updated since the publication of the paper. The updated details are given in the preprint version.

Data Description

The "gold_summaries" folder contains summaries for each class, written by 3 different annotators. The "labelled_tweets" folder contains the labels and tweet IDs in standard CSV format, and the label-explanation tuples in standard JSON format. The "start" and "end" indices in the explanations are token positions in the tweet text when it is split on whitespace only.

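For example, with a start index of 2 and an end index of 6, the explanation for the tweet "They are making huge $$$ profits ! won't take it!" is "making huge $$$ profits". Tokens are counted from zero and the end index is exclusive, so the span covers tokens 2 through 5.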
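
The index convention can be sketched in a few lines of Python. This is a minimal illustration, not code from the repository: the function name `extract_explanation` is hypothetical, and it assumes that tokens are counted from zero and that the end index is exclusive, which is what the example above implies.

```python
def extract_explanation(text: str, start: int, end: int) -> str:
    """Return the explanation span for a tweet.

    Assumptions (inferred from the README example, not confirmed by the
    dataset authors): tokens come from splitting on whitespace only,
    indices are zero-based, and `end` is exclusive.
    """
    tokens = text.split()  # whitespace-only tokenization
    return " ".join(tokens[start:end])


tweet = "They are making huge $$$ profits ! won't take it!"
explanation = extract_explanation(tweet, 2, 6)
print(explanation)
```

Because the tokenization is a plain whitespace split, punctuation that is attached to a word (such as "it!") stays part of that token, while free-standing punctuation (such as the lone "!") counts as a token of its own.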

For queries please mail me at: Email ID

If you use our data, please cite the following paper:

@inproceedings{poddar2022caves,
  title={CAVES: A dataset to facilitate Explainable Classification and Summarization of Concerns towards COVID Vaccines},
  author={Poddar, Soham and Samad, Azlaan Mustafa and Mukherjee, Rajdeep and Ganguly, Niloy and Ghosh, Saptarshi},
  booktitle={Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval},
  year={2022}
}

Classification Models on the Dataset

MuLX-QA is a method that identifies multiple label-explanation tuples from social media posts. It was accepted for publication in the ACM Transactions on the Web (TWEB) journal in 2024. Link to paper Link to Github
Cov-Gen is a method that uses a Flan-T5 model for multi-label classification of vaccine concerns. It is part of the paper "How COVID-19 has Impacted the Anti-Vaccine Discourse: A Large-Scale Twitter Study Spanning Pre-COVID and Post-COVID Era", accepted at the 18th International AAAI Conference on Web and Social Media (ICWSM). Link to paper Link to Github
