Skip to content

nusnlp/LLM-TLS

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

LLM-TLS

Repo for paper "From Moments to Milestones: Incremental Timeline Summarization Leveraging Large Language Models".

Event-TLS

Dataset

The tweet ID and summary data we used in ./data is from CrisisLTLSum. Due to tweet privacy policy, we cannot release the content of the tweet. We recommend to download the tweet content through https://www.tweepy.org/ using the tweet ID.

Workflow

  1. To prepare data for tweet timeline membership classification training, please refer to prepare_cls_data.py
python prepare_cls_data.py --input "./data" --output "./data/cls"
  1. To prepare data for timeline summarization training, please refer to preprocess_timelines.py and prepare_tls_data.py
python preprocess_timelines.py
python prepare_tls_data.py --input "./data" --output "./data/ft"

In our implementation, we host the classification model through vLLM using the below command in a separate process:

python -m vllm.entrypoints.openai.api_server --model "CLS MODEL PATH" --port 8000
  1. Perform incremental clustering process by generate_clusters.py
python generate_clusters.py --input "./data" --output "./cluster_output" --dataset_type "test" --cls_model_name "[CLS MODEL PATH]" --is_retrieval
  1. Perform timeline summarization and evaluation by cluster_tls_eval.py
python cluster_tls_eval.py --input "./data" --output "./cluster_output" --dataset_type "test" --sum_model_base "SUMMARIZATION MODEL PATH" --sum_model_lora_checkpoint "LORA CHECKPOINT PATH" --is_retrieval

Topic-TLS

Download Dataset

To download datasets(T17, Crisis, Entities), please refer to complementizer/news-tls .

Workflow

  1. To preprocess dataset articles to certain format, please refer to preprocess_articles.py
python preprocess_articles.py --ds_path "./datasets" --dataset "entities" --save_path "./corpus"

In our implementation, we host the model through vLLM using the below command in a separate process:

python -m vllm.entrypoints.openai.api_server --model "meta-llama/Llama-2-13b-hf" --port 8000
  1. Perform LLM for event generation by generate_events.py.
python generate_events.py --dataset entities --model "meta-llama/Llama-2-13b-hf" --extraction_path "./event_outputs"
  1. Perform incremental clustering process by generate_clusters.py
python generate_clusters.py \
    --model meta-llama/Llama-2-13b-hf \
    --output ./timelines_output \
    --input ./event_outputs/entities/Llama-2-13b-hf \
    --top_n 20 \
    --dataset entities \
    --incremental \
    --keyword all
  1. Perform timeline summarization and evaluation by cluster_tls_eval.py
python cluster_tls_eval.py \
    --timelines_path ./timelines_output/entities/Llama-2-13b-hf \
    --events_path ./event_outputs/entities/Llama-2-13b-hf \
    --output ./result/entities \
    --dataset entities \
    --text_rank

If you encounter any problem with the code running, please contact [email protected] .


License

The source code in this repository is licensed under the GNU General Public License Version 3. For commercial use of this code, separate commercial licensing is also available. Please contact:

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages