Maintaining Consistency in extended Multiple Text generation

In the present study, we examine the question how can we prompt LLMs to maintain consistency (plausibility and coherence) over extended multiple text generation and we particularly focus on medical personas. We find that a Cascaded Generator-Summarizer architecture is highly effective, including using MMR (Maximal Marginal Relevance) for summarization and Llama2-7b for generation. We conduct an experiment by creating ten parametrizable medical personas with asthma. We also investigate the question about how far into the future LLMs can maintain consistency (timescale) and to this end, we propose 5 new plausibility metrics. These are:- Original Correctness/Retrieval, New Facts per turn, and Novel Correctness/Retrieval.

This project was part of my Directed Study completed with Prof. Carolyn Rose at LTI, CMU. For more details refer to the report and the slides from my talk!

Main Contributions

We propose a Cascaded Summarizer-Generator architecture to generate extended texts in the form of medical personas (see figure below).
We propose 5 new Plausibility metrics:
- Original Retrieval: How many original facts does it retrieve in the blog?
- New Facts per turn: How many new facts does it introduce in the blog PER TURN
- Novel Correctness: What proportion of new facts PER TURN is consistent with the original? (Rewards Novelty and Penalizes Contradictions)
- Novel Retrieval: How many Novel facts does it retrieve in the blog? (Rewards Novelty, Penalizes Forgetting)
We propose an ontology of errors to categorize the most common mistakes made by the LLM while generating consistent texts.
We create a list of ten detailed medical personas (different severities of asthma), with highly detailed specifications of their condition, medication, allergens, and demographic variables.

Architecture

The cascading summarizer-generator architecture used to generate the blogs for the personas. S and T stand for the System and Task prompts fed to the generator model respectively, and Sum n stands for the nth blog summary.

Results

The newly proposed metrics averaged across eight distinct personas (to reduce noise in the data). OC = Original Correctness, OR = Original Recall, NF = New Facts per turn, NC = Novel Correctness, NR = Novel Retrieval.

Metric	Turn 1	Turn 2	Turn 3	Turn 4	Turn 5	Turn 6	Turn 7	Turn 8	Turn 9	Turn 10	Turn 11	Turn 12
OC	0.66	0.53	0.48	0.54	0.48	0.43	0.44	0.52	0.41	0.42	0.48	0.42
OR	0.45	0.36	0.31	0.29	0.34	0.27	0.24	0.33	0.27	0.29	0.24	0.24
NF	1.13	1.25	0.88	1.38	1.13	1.75	1.25	1.38	1.88	1.38	1.25	0.88
NC	0.5	1	0.63	0.81	1	0.58	0.56	0.83	0.84	0.63	0.46	0.44
NR	0.5	0.78	0.25	0.39	0.3	0.35	0.14	0.16	0.22	0.16	0.08	0.07

We observe that the Original Correctness and Retrieval stay stable with only a slightly decreasing trend, while the Novel Correctness and Retrieval metrics are much noisier and have a sharper decline. The Original metrics results are encouraging, serving as an indication that LLMs are capable of maintaining consistency over text generation, making this a promising area for future research.

For a more detailed results and discussion section refer to the report.

Directory Structure

The files are organized as follows:

generations_mmr_summary - Examples of long-form multi-turn generation using our architecture with MMR summarization. Contains 10 txt files with the blog post generations corresponding to the 10 personas.
generations_no_summary - Examples of long-form multi-turn generation without using our architecture.
metrics_no_summary - The Original precision metric for 10 personas (generated without summaries)
logs - Logs from previous runs.
old_blog_generation_code - Previous versions of the generate_parametrics_persona_blog_posts.py script which allows for customization of parameters like condition/blog length/past look over/time frame and so on.

generate_parametric_persona_blog_posts.py - Main script which generates the personas and summaries using llama2-7b-chat-hf and a summarizer model of the user's choice.
persona_variables.py - Contains ten distinct parametric medical personas to micro-average metrics across.

llama_summarizer.py - Abstracts the llama summarization function using zero-shot and few-shot prompts for llama2-7b-chat-hf.
mmr_summarizer.py - Abstracts the Maximal Marginal Relevance summarization function for the blog posts.
summarizer_module.py - Abstracts the BART and Flan-T5 summarization function for the blog posts.

question_answering_module.py - Abstracts the QA function used in metric.py
plausibility_metric.py - Contains the "Original Correctness" metric calculation function using flant5-large as a QA model.

environment.txt - Explicitly lists all the packages (with versions) in the virtual environment.
report.pdf - The report containing details about experimental design and results.
talk_slides.pdf - The slides to my presentation on this study.

Run Commands

Run generate_personas.py using the bash script. The arguments can take the following values:

summarizer_type : ["bart", "flant5", "mmr", "llama"] (llama and mmr are recommended for best performance).
persona_id: Any integer from 1 to 10 (inclusive). (Refer to persona_variables.py for more details)

Run Command: python generate_parametric_persona_blog_posts.py <summarizer_type> <persona_id> > persona_<persona_id>.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Maintaining Consistency in extended Multiple Text generation

Main Contributions

Architecture

Results

Directory Structure

Run Commands

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 24 Commits
generations_mmr_summary		generations_mmr_summary
generations_no_summary		generations_no_summary
logs		logs
metrics_no_summary		metrics_no_summary
README.md		README.md
environment.txt		environment.txt
generate_parametric_persona_blog_posts.py		generate_parametric_persona_blog_posts.py
llama_summarizer.py		llama_summarizer.py
mmr_summarizer.py		mmr_summarizer.py
persona_variables.py		persona_variables.py
plausibility_metric.py		plausibility_metric.py
question_answering_module.py		question_answering_module.py
report.pdf		report.pdf
run_1.sh		run_1.sh
summarizer_module.py		summarizer_module.py
talk_slides.pdf		talk_slides.pdf

Aadit3003/llm-medical-personas

Folders and files

Latest commit

History

Repository files navigation

Maintaining Consistency in extended Multiple Text generation

Main Contributions

Architecture

Results

Directory Structure

Run Commands

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages