diff --git a/content/01.abstract.md b/content/01.abstract.md index 0ffd905..e500e64 100644 --- a/content/01.abstract.md +++ b/content/01.abstract.md @@ -1,10 +1,9 @@ ## Abstract {.page_break_before} -In this work, we investigate how models with advanced natural language processing capabilities can be used to reduce the time-consuming process of writing and revising scholarly manuscripts. -To this end, we integrate large language models into the Manubot publishing ecosystem to suggest revisions for scholarly text. -Our AI-based revision workflow uses a prompt generator that integrates metadata from the manuscript into prompt templates to generate section-specific instructions for the language model. -Then, the model generates a revised version of each paragraph that the human author can review. -We tested our AI-based revision workflow in three case studies of existing manuscripts, including the present one. -Our results suggest that these models can capture the concepts in the scholarly text and produce high-quality revisions that improve clarity. -All changes to the manuscript are tracked using a version control system, providing transparency into the human or machine origin of text. -Given the amount of time that researchers put into crafting prose, we anticipate that this advance will significantly improve the type of knowledge work performed by academics. +This paper explores how advanced natural language processing models can be used to streamline the time-consuming process of scholarly manuscript writing and revision. +Our proposed solution integrates large language models into the Manubot publishing ecosystem to suggest revisions for scholarly text. +Our AI-based revision workflow uses a prompt generator that incorporates manuscript metadata into prompt templates to generate section-specific instructions for the language model. +The model then generates a revised version of each paragraph that the human author can review. +We tested our AI-based revision workflow in three case studies of existing manuscripts, including the present one, and found that the models can capture the concepts in the scholarly text and produce high-quality revisions that improve clarity. +All changes to the manuscript are tracked using a version control system, providing transparency into the origin of text. +This advance in scholarly publishing infrastructure has the potential to significantly improve the efficiency of knowledge work performed by academics. diff --git a/content/02.introduction.md b/content/02.introduction.md index ef32a5a..52427b4 100644 --- a/content/02.introduction.md +++ b/content/02.introduction.md @@ -1,23 +1,22 @@ ## Introduction -Manuscripts have been around for thousands of years, but scientific journals have only been around for about 350 years [@isbn:0810808447]. -External peer review, which is used by many journals, is even more recent, having been around for less than 100 years [@doi:10/d26d8b]. -Most manuscripts are written by humans or teams of humans working together to describe new advances, summarize existing literature, or argue for changes in the status quo. -However, scholarly writing is a time-consuming process where results of a study are presented using a specific style and format. -Academics can sometimes be long-winded in getting to key points, making writing more impenetrable to their audience [@doi:10.1038/d41586-018-02404-4]. 
+While manuscripts have been in existence for thousands of years, scientific journals have only been around for approximately 350 years [@isbn:0810808447]. +External peer review, a common practice amongst journals, is even more recent, having been in use for less than 100 years [@doi:10/d26d8b]. +Most manuscripts are written by humans or teams of humans who work together to describe new advances, summarize existing literature, or argue for changes in the status quo. +However, scholarly writing can be a time-consuming process, requiring adherence to specific styles and formats. +Academics may also be prone to verbosity, leading to writing that is difficult for their audience to understand [@doi:10.1038/d41586-018-02404-4]. +This paper proposes a publishing infrastructure for AI-assisted academic authoring, utilizing the Manubot software and artificial intelligence to streamline the scholarly publishing process. -Recent advances in computing capabilities and the widespread availability of text, images, and other data on the internet have laid the foundation for artificial intelligence (AI) models with billions of parameters. -Large language models, in particular, are opening the floodgates to new technologies with the capability to transform how society operates [@arxiv:2102.02503]. -OpenAI's models, for instance, have been trained on vast amounts of data and can generate human-like text [@arxiv:2005.14165]. -These models are based on the transformer architecture which uses self-attention mechanisms to model the complexities of language. -The most well-known of these models is the Generative Pre-trained Transformer 3 (GPT-3), which have been shown to be highly effective for a range of language tasks such as generating text, completing code, and answering questions [@arxiv:2005.14165]. -Scientists are already using these tools to improve scientific writing [@doi:10.1038/d41586-022-03479-w]. -This technology has the potential to revolutionize how scientists write and revise scholarly manuscripts, saving time and effort and enabling researchers to focus on more high-level tasks such as data analysis and interpretation. +In recent years, the development of artificial intelligence (AI) has been facilitated by the availability of large amounts of data on the internet and increasing computing power. +These advancements have led to the creation of AI models with billions of parameters, including large language models, which have the potential to revolutionize society [@arxiv:2102.02503]. +OpenAI's transformer-based models, such as the Generative Pre-trained Transformer 3 (GPT-3), are particularly noteworthy as they can produce human-like text and have been shown to be effective for various language tasks [@arxiv:2005.14165]. +Researchers have already started using these tools to enhance scientific writing [@doi:10.1038/d41586-022-03479-w]. +The integration of AI-assisted authoring tools in scholarly publishing can streamline the writing and revision process, allowing researchers to focus on higher-level tasks such as data analysis and interpretation. -We present a novel AI-assisted revision tool that envisions a future where authors collaborate with large language models in the writing of their manuscripts. -This workflow builds on the Manubot infrastructure for scholarly publishing [@doi:10.1371/journal.pcbi.1007128], a platform designed to enable both individual and large-scale collaborative projects [@doi:10.1098/rsif.2017.0387; @pmid:34545336].
-Our workflow involves parsing the manuscript, utilizing a large language model with section-specific prompts for revision, and then generating a set of suggested changes to be integrated into the main document. -These changes are presented to the user through the GitHub interface for review. -To evaluate our workflow, we conducted a case study with three Manubot-authored manuscripts that included sections of varying complexity. -Our findings indicate that, in most cases, the models were able to maintain the original meaning of text, improve the writing style, and even interpret mathematical expressions. -Our AI-assisted writing workflow can be incorporated into any Manubot manuscript, and we anticipate it will help authors more effectively communicate their work. +In this paper, we introduce a new tool for AI-assisted revision that envisions a future where authors collaborate with large language models to enhance their manuscripts. +Our workflow is based on the Manubot infrastructure for scholarly publishing [@doi:10.1371/journal.pcbi.1007128], which enables individual and large-scale collaborative projects [@doi:10.1098/rsif.2017.0387; @pmid:34545336]. +Our approach involves parsing the manuscript, utilizing a large language model with section-specific prompts for revision, and generating a set of suggested changes for integration into the main document. +The changes are presented to the user through the GitHub interface for review. +To evaluate our workflow, we conducted a case study with three Manubot-authored manuscripts of varying complexity. +Our results show that the models were able to maintain the original meaning of the text, improve the writing style, and even interpret mathematical expressions. +Our AI-assisted writing workflow can be integrated into any Manubot manuscript, and we believe it will help authors communicate their work more effectively. diff --git a/content/03.methods.md b/content/03.methods.md index 282fe6c..816684a 100644 --- a/content/03.methods.md +++ b/content/03.methods.md @@ -11,90 +11,158 @@ The prompt for the Methods section includes the formatting of equations with ide All sections' prompts include these instructions: *"the text grammar is correct, spelling errors are fixed, and the text has a clear sentence structure"*, although these are only shown for abstracts. ](images/figure_1.svg "AI-based revision applied on a Manubot manuscript"){#fig:ai_revision width="85%"} -We implemented an AI-based revision infrastructure in Manubot [@doi:10.1371/journal.pcbi.1007128], a tool for collaborative writing of scientific manuscripts. -Manubot integrates with popular version control platforms such as GitHub, allowing authors to easily track changes and collaborate on writing in real time. -Furthermore, Manubot automates the process of generating a formatted manuscript (such as HTML, PDF, DOCX; Figure {@fig:ai_revision}a shows the HTML output). -Built on this modern and open paradigm, our AI-based revision software was developed using GitHub Actions, which allows the user to easily trigger an automated revision task on the entire manuscript or specific sections of it. +In this study, we present an AI-based revision infrastructure implemented in Manubot [@doi:10.1371/journal.pcbi.1007128]. +Manubot is a collaborative writing tool for scientific manuscripts that integrates with popular version control platforms like GitHub.
+This feature enables authors to track changes and collaborate on writing in real-time. +Additionally, Manubot automates the process of generating formatted manuscripts, including HTML, PDF, and DOCX (Figure {@fig:ai_revision}a presents the HTML output). +Leveraging this modern and open paradigm, we developed our AI-based revision software using GitHub Actions. +This software allows users to trigger an automated revision task on the entire manuscript or specific sections of it with ease. -When the user triggers the action, the manuscript is parsed by section and then by paragraph (Figure {@fig:ai_revision}b) and passed to the language model along with a set of custom prompts. -The model then returns a revised version of the text. -Our workflow then uses the GitHub API to generate a new pull request, allowing the user to review and modify the output before merging the changes into the manuscript. -This workflow attributes text to either the human user or to the AI language model, which may be important in light of potential future legal decisions that alter the copyright landscape around the outputs of generative models. -We used the [OpenAI API](https://openai.com/api/) for access to these models. -Since this API incurs a cost with each run that depends on manuscript length, we implemented a workflow in GitHub Actions that can be manually triggered by the user. -Our implementation allows users to tune the costs to their needs by allowing them to select specific sections to be revised instead of the entire manuscript. -Additionally, several model parameters can be adjusted to tune costs even further, such as the language model version (including Davinci and Curie, and potentially newly published ones), how much risk the model will take, or the "quality" of the completions. -For instance, using Davinci models (the most complex and capable ones), the cost per run is under $0.50 for most manuscripts. +To initiate the revision process, the manuscript is first parsed by section and then by paragraph, as shown in Figure {@fig:ai_revision}b. +The parsed text is then fed into a language model, along with a set of custom prompts, to generate a revised version of the text. +To ensure transparency and accountability, our workflow attributes the revised text to either the human user or the AI language model. +This attribution may become increasingly important in the future, as potential legal decisions may alter the copyright landscape around the outputs of generative models. +To facilitate the review and modification of the revised text, our workflow uses the GitHub API to create a new pull request, which enables the user to merge the changes into the manuscript.
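The core of this per-paragraph loop can be sketched in a few lines of Python. The block below is an illustration only, not the actual manubot-ai-editor code: the helper names (`SECTION_PROMPTS`, `revise_paragraph`) are hypothetical, and the request uses the legacy OpenAI completions interface through which these models were served.

```python
# Illustrative sketch of the per-paragraph revision request; helper names are
# hypothetical, and the real implementation lives in manubot-ai-editor.
# Assumes `openai.api_key` is configured and the legacy (pre-1.0) openai package.
import openai

# Section-specific prompt templates filled with manuscript metadata.
SECTION_PROMPTS = {
    "abstract": "Revise the abstract of the paper titled '{title}', with keywords '{keywords}': ",
    "introduction": "Revise this introduction paragraph of the paper titled '{title}', keeping citations: ",
}
DEFAULT_PROMPT = "Revise this paragraph of the paper titled '{title}': "


def revise_paragraph(paragraph: str, section: str, title: str, keywords: str) -> str:
    """Build a section-specific prompt and ask the model for a revised paragraph."""
    template = SECTION_PROMPTS.get(section, DEFAULT_PROMPT)
    prompt = template.format(title=title, keywords=keywords) + paragraph
    response = openai.Completion.create(
        model="text-davinci-003",              # default model used by the workflow
        prompt=prompt,
        temperature=0.5,                       # default sampling temperature
        max_tokens=2 * (len(paragraph) // 4),  # twice the estimated paragraph tokens
    )
    return response.choices[0].text.strip()
```

Each paragraph revised this way is written back into the Markdown source and then proposed through the pull request described above.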
+To access the required models, we utilized the [OpenAI API](https://openai.com/api/). +However, as the API cost is dependent on the length of the manuscript, we designed a workflow in GitHub Actions that can be manually triggered by the user. +This implementation enables users to select specific sections to be revised, rather than the entire manuscript, thereby allowing them to customize the cost according to their needs. +Additionally, to further fine-tune the costs, we incorporated the option to adjust several model parameters, such as the language model version (including Davinci and Curie, as well as newly published ones), the level of risk the model will take, or the quality of the completions. +For example, using the Davinci models (the most sophisticated and capable ones), the cost per run is typically less than $0.50 for most manuscripts. ### Implementation details -Our tools are comprised of Python scripts that perform the AI-based revision ([https://github.com/greenelab/manubot-ai-editor](https://github.com/greenelab/manubot-ai-editor)) and a GitHub Actions workflow integrated with Manubot. -To run the workflow, the user must specify the branch that will be revised, select the files/sections of the manuscript (optional), specify the language model to use (`text-davinci-003` by default), and provide the output branch name. -For more advanced users, it is also possible to change most of the tool's behavior or the language model parameters. +The AI-assisted authoring infrastructure we present in this paper utilizes Python scripts for AI-based revision, available at [https://github.com/greenelab/manubot-ai-editor](https://github.com/greenelab/manubot-ai-editor), and a GitHub Actions workflow integrated with Manubot. +To initiate the workflow, the user must specify the branch to be revised, select the manuscript files/sections (optional), define the language model to be used (default is `text-davinci-003`), and provide the output branch name. +Advanced users can modify the tool's behavior or language model parameters. -When the workflow is triggered, it downloads the manuscript by cloning the specified branch. -It revises all of the manuscript files, or only some of them if the user specifies a subset. -Next, each paragraph in the file is read and submitted to the OpenAI API for revision. -If the request is successful, the tool will write the revised paragraph in place of the original one, using one sentence per line (which is the recommended format for the input text).
-If the request fails, the tool might try again (up to five times by default) if it is a common error (such as "server overloaded") or a model-specific error that requires changing some of its parameters. -If the error cannot be handled or the maximum number of retries is reached, the original paragraph is written instead with an HTML comment at the top explaining the cause of the error. + +When the workflow is initiated, it clones the specified branch to download the manuscript. +It then proceeds to revise all manuscript files, or a subset of them as specified by the user. +Subsequently, each paragraph in the file is read and submitted to the OpenAI API for revision. +If the request is successful, the revised paragraph is written in place of the original paragraph using one sentence per line, which is the recommended format for input text. +In case of a failed request, the tool attempts to retry (up to five times by default) if it is a common error (such as "server overloaded") or a model-specific error that requires changing some of its parameters. +If the error cannot be resolved or the maximum number of retries is reached, the original paragraph is written instead, with an HTML comment at the top explaining the cause of the error. This allows the user to debug the problem and attempt to fix it if desired. -As shown in Figure {@fig:ai_revision}b, each API request comprises a prompt (the instructions given to the model) and the paragraph to be revised. -The prompt uses the manuscript title and keywords, so both must be accurate to obtain the best revision outcomes. -The other key component to process a paragraph is its section. -For instance, the abstract is a set of sentences with no citations, whereas a paragraph from the Introduction section has several references to other scientific papers. -A paragraph in the Results section has fewer citations but many references to figures or tables, and must provide enough details about the experiments to understand and interpret the outcomes. -The Methods section is more dependent on the type of paper, but in general it has to provide technical details and sometimes mathematical formulas and equations. -Therefore, we designed section-specific prompts, which we found led to the most useful suggestions. -Figures and tables captions, as well as paragraphs that contain only one or two sentences and less than sixty words, are not processed and are copied directly to the output file. +Figure {@fig:ai_revision}b illustrates that an API request consists of a prompt, which provides instructions to the model, and the paragraph that requires revision. +To achieve optimal revision outcomes, the prompt should accurately reflect the manuscript's title and keywords. +The paragraph's section is also a critical component in the revision process. +For example, the abstract is a group of sentences that lack citations, while the Introduction section includes multiple references to other scientific papers. +The Results section has fewer citations but plenty of references to figures or tables and must provide sufficient experimental details to comprehend and interpret the outcomes. +The Methods section is more dependent on the paper type but generally requires technical details and mathematical formulas and equations.
+Therefore, we created section-specific prompts, which yielded the most valuable suggestions. +Paragraphs that contain only one or two sentences and less than sixty words, as well as figures and tables captions, are not processed and are directly copied to the output file. -The section of a paragraph is automatically inferred from the file name using a simple strategy, such as if "introduction" or "methods" is part of the file name. -If the tool fails to infer a section from the file, then the user is still able to specify which section the file belongs to. -The section can be a standard one (abstract, introduction, results, methods, or discussion) for which a specific prompt is used (Figure {@fig:ai_revision}b), or a non-standard one for which a default prompt is used to instruct the model to perform basic revision (minimizing the use of jargon, ensuring text grammar is correct, fixing spelling errors, and making sure the text has a clear sentence structure). +The section of a paragraph is automatically determined from the file name using a straightforward approach. +For instance, if the file name includes "introduction" or "methods," the tool will infer the appropriate section. +In cases where the tool fails to infer a section from the file name, the user can manually specify the correct section. +Standard sections, such as abstract, introduction, results, methods, or discussion, have a specific prompt that the model uses to perform revisions (as shown in Figure {@fig:ai_revision}b). +Non-standard sections use a default prompt that instructs the model to perform basic revisions, such as minimizing the use of jargon, ensuring text grammar is correct, fixing spelling errors, and making sure the text has a clear sentence structure. ### Properties of language models -Our AI-based revision workflow uses [text completion](https://beta.openai.com/docs/guides/completion) to process each paragraph. -We tested our tool using Davinci and Curie models, including `text-davinci-003`, `text-davinci-edit-001` and `text-curie-001`. -Davinci models are the most powerful GPT-3 model, whereas Curie ones are less capable but faster and less expensive. -We mainly focused on the completion endpoint, as the edits endpoint is currently in beta. -All models can be fine-tuned using different parameters (see [OpenAI - API Reference](https://beta.openai.com/docs/api-reference/completions)), and the most important ones can be easily adjusted using our tool. +Our AI-based revision workflow utilizes [text completion](https://beta.openai.com/docs/guides/completion) to process each paragraph. +Our tool was tested using both Davinci and Curie models, including `text-davinci-003`, `text-davinci-edit-001`, and `text-curie-001`. +The Davinci models are the most powerful GPT-3 models, while the Curie models are less capable but faster and less expensive. +Our main focus was on the completion endpoint, as the edits endpoint is currently in beta. +All models can be fine-tuned using different parameters, which are detailed in the [OpenAI - API Reference](https://beta.openai.com/docs/api-reference/completions). +Our tool allows for easy adjustment of the most important parameters.
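To make the endpoint distinction concrete, the sketch below shows both request shapes as exposed by the legacy OpenAI Python package; the instruction and paragraph strings are illustrative, not the workflow's actual prompts.

```python
# Sketch of the two endpoints discussed above (legacy, pre-1.0 openai package);
# the strings are illustrative, not the workflow's actual prompts.
import openai

instruction = "Revise the paragraph so the grammar is correct and the sentence structure is clear."
paragraph = "Our AI-based revision workflow utilizes text completion to process each paragraph."

# Completion endpoint: instructions and paragraph travel in a single prompt.
completion = openai.Completion.create(
    model="text-davinci-003",
    prompt=instruction + "\n\n" + paragraph,
    temperature=0.5,
    max_tokens=200,
)
print(completion.choices[0].text.strip())

# Edits endpoint (beta): instructions and input text are separate fields.
edit = openai.Edit.create(
    model="text-davinci-edit-001",
    instruction=instruction,
    input=paragraph,
)
print(edit.choices[0].text.strip())
```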
-Language models for text completion have a context length that indicates the limit of tokens they can process (tokens are common character sequences in text). -This limit includes the size of the prompt and the paragraph, as well as the maximum number of tokens to generate for the completion (parameter `max_tokens`). +Language models used for text completion have a specific context length that limits the number of tokens they can process. +Tokens are common character sequences in text, and this limit includes the size of the prompt and paragraph, as well as the maximum number of tokens to generate for completion (parameter `max_tokens`). For instance, the context length of Davinci models is 4,000 and 2,048 for Curie (see [OpenAI - Models overview](https://beta.openai.com/docs/models/overview)). -Therefore, it is not possible to use the entire manuscript as input, not even entire sections. -To address this limitation, our AI-assisted revision software processes each paragraph of the manuscript with section-specific prompts, as shown in Figure {@fig:ai_revision}b. -This approach allows us to process large manuscripts by breaking them into small chunks of text. -However, since the language model only processes a single paragraph from a section, it can potentially lose important context to produce a better output. -Nonetheless, we find that the model still produces high-quality revisions (see [Results](#sec:results)). -Additionally, the maximum number of tokens (parameter `max_tokens`) is set as twice the estimated number of tokens in the paragraph (one token approximately represents four characters, see [OpenAI - Tokenizer](https://beta.openai.com/tokenizer]). +As a result, it is not feasible to use the entire manuscript or even entire sections as input. +To overcome this limitation, our AI-assisted revision software processes each paragraph of the manuscript with section-specific prompts, as illustrated in Figure {@fig:ai_revision}b. +By breaking the manuscript into smaller chunks of text, we can process large manuscripts. +However, since the language model processes only a single paragraph from a section, it may lose important context to produce a better output. +Nonetheless, the model still produces high-quality revisions (see [Results](#sec:results)). +Additionally, the maximum number of tokens (parameter `max_tokens`) is set as twice the estimated number of tokens in the paragraph. +One token represents approximately four characters (see [OpenAI - Tokenizer](https://beta.openai.com/tokenizer)). The tool automatically adjusts this parameter and performs the request again if a related error is returned by the API. +The user can also force the tool to use a fixed value for `max_tokens` for all paragraphs or change the fraction of maximum tokens based on the estimated paragraph size (two by default).
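The `max_tokens` estimation and the retry behavior described in this section can be summarized as follows; this is a hypothetical simplification of the tool's logic, not its actual code.

```python
# Hypothetical simplification of the max_tokens estimation and retry logic
# described above (legacy, pre-1.0 openai package; not the tool's actual code).
import openai


def revise_with_retries(prompt: str, paragraph: str, fraction: float = 2.0, retries: int = 5):
    # One token is roughly four characters; the completion budget defaults to
    # twice the paragraph's estimated token count (`fraction=2.0`).
    max_tokens = int(fraction * len(paragraph) / 4)
    for _ in range(retries):
        try:
            response = openai.Completion.create(
                model="text-davinci-003",
                prompt=prompt + "\n\n" + paragraph,
                temperature=0.5,
                max_tokens=max_tokens,
            )
            return response.choices[0].text.strip()
        except openai.error.InvalidRequestError:
            # Likely a context-length problem: shrink the budget and retry.
            max_tokens = int(max_tokens * 0.8)
        except openai.error.OpenAIError:
            # Transient errors such as "server overloaded": simply retry.
            continue
    # Signal the caller to keep the original paragraph and emit an
    # explanatory HTML comment, as the workflow does.
    return None
```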
-The language models used are stochastic, meaning they generate a different revision for the same input paragraph each time. -This behavior can be adjusted by using the "sampling temperature" or "nucleus sampling" parameters (we use `temperature=0.5` by default). -Although we selected default values that worked well across multiple manuscripts, these parameters can be changed to make the model more deterministic. -The user can also instruct the model to generate several completions and select the one with the highest log probability per token, which can improve the quality of the revision. -Our proof-of-concept implementation generates only one completion (parameter `best_of=1`) to avoid potentially high costs for the user. -Additionally, our workflow allows the user to process either the entire manuscript or individual sections. -This allows for more cost-effective control while focusing on a single piece of text, wherein the user can run the tool several times and pick the preferred revised text. +The language models used in this study are stochastic, generating a different revision for the same input paragraph each time. +However, this behavior can be adjusted by modifying the "sampling temperature" or "nucleus sampling" parameters. +Our default setting for the temperature parameter is `0.5`, which we found to be effective across multiple manuscripts. +Nevertheless, users can change these parameters to increase the model's determinism. +Moreover, the model can generate multiple completions, and the user can select the one with the highest log probability per token to enhance the quality of the revision. +To avoid potential high costs for the user, our proof-of-concept implementation generates only one completion (parameter `best_of=1`). +Our workflow also enables the user to process either the entire manuscript or specific sections, providing cost-effective control while focusing on a single piece of text. +Users can run the tool several times and choose the preferred revised text. ### Installation and use -We have contributed our workflow ([https://github.com/manubot/rootstock/pull/484](https://github.com/manubot/rootstock/pull/484)) to the standard Manubot template manuscript, which is called rootstock and available at [https://github.com/manubot/rootstock](https://github.com/manubot/rootstock). -Users who wish to use the workflow only need to follow the standard procedures to install Manubot. -The section "AI-assisted authoring", in the file `USAGE.md` of the rootstock repository, explains how to enable the tool. -After that, the workflow (named `ai-revision`) will be available and ready to use under the Actions tab of the user's manuscript repository. +We have made a contribution to the standard Manubot template manuscript, known as rootstock, by introducing our workflow ([https://github.com/manubot/rootstock/pull/484](https://github.com/manubot/rootstock/pull/484)). +The rootstock manuscript is accessible at [https://github.com/manubot/rootstock](https://github.com/manubot/rootstock), and users interested in utilizing our workflow only need to follow the standard installation procedures for Manubot. +The `USAGE.md` file in the rootstock repository contains a section titled "AI-assisted authoring," which provides instructions on how to enable the tool.
+Once enabled, the workflow (named `ai-revision`) will be accessible and ready to use under the Actions tab in the user's manuscript repository. diff --git a/content/04.results.md b/content/04.results.md index f2637c9..f8ba62a 100644 --- a/content/04.results.md +++ b/content/04.results.md @@ -2,11 +2,11 @@ ### Evaluation setup -We evaluated our AI-assisted revision workflow using three GPT-3 models from OpenAI: `text-davinci-003`, `text-davinci-edit-001`, and `text-curie-001`. -The first two are based on the most capable Davinci models (see [OpenAI - GPT-3 models](https://beta.openai.com/docs/models/gpt-3)). -Whereas `text-davinci-003` is a production-ready model for the completion endpoint, `text-davinci-edit-001` is used for the edits endpoint and is still in beta. -The latter provides a more natural interface for revising manuscripts, as it takes two inputs: instructions and the text to revise. -Model `text-curie-001` is faster and cheaper than Davinci models, and is defined as "very capable" by its authors (see [OpenAI - GPT-3 models](https://beta.openai.com/docs/models/gpt-3)). +We assessed the effectiveness of our AI-assisted revision workflow using three GPT-3 models from OpenAI: `text-davinci-003`, `text-davinci-edit-001`, and `text-curie-001`. +The first two are based on the powerful Davinci models (refer to [OpenAI - GPT-3 models](https://beta.openai.com/docs/models/gpt-3)). +`text-davinci-003` is a fully operational model for the completion endpoint, while `text-davinci-edit-001` is still in beta and is utilized for the edits endpoint. +The latter model provides a more intuitive interface for manuscript revisions as it requires two inputs: instructions and the text to be revised. +The `text-curie-001` model is a faster and more economical option than the Davinci models, and its creators describe it as "very capable" (refer to [OpenAI - GPT-3 models](https://beta.openai.com/docs/models/gpt-3)). | Manuscript ID | Title | Keywords | @@ -18,25 +18,22 @@ Table: **Manuscripts used to evaluate the AI-based revision workflow.** The title and keywords of a manuscript are used in prompts for revising paragraphs. IDs are used in the text to refer to them, and they link to their GitHub repositories. {#tbl:manuscripts} -Assessing the performance of an automated revision tool is not straightforward, since a review of a revision will necessarily be subjective. -To mitigate this, we used three manuscripts of our own authorship (Table @tbl:manuscripts): the Clustermatch Correlation Coefficient (CCC) [@doi:10.1101/2022.06.15.496326], PhenoPLIER [@doi:10.1101/2021.07.05.450786], and Manubot-AI (this manuscript). -CCC is a new correlation coefficient evaluated in transcriptomic data, while PhenoPLIER is a framework that comprises three different methods applied in the field of genetic studies. -CCC is in the field of computational biology, whereas PhenoPLIER is in the field of genomic medicine. -CCC describes one computational method applied to one data type (correlation to gene expression).
-PhenoPLIER describes a framework that comprises three different approaches (regression, clustering and drug-disease prediction) using data from genome-wide and transcription-wide association studies (GWAS and TWAS), gene expression, and transcriptional responses to small molecule perturbations. -Therefore, CCC has a simpler structure, whereas PhenoPLIER is a more complex manuscript with more figures and tables and a Methods section including equations. -The third manuscript, Manubot-AI, provides an example with a simpler structure, and it was written and revised using our tool before submission, which provides a more real AI-based revision use case. -Using these manuscripts, we tested and improved our prompts. -Our findings are reported below. +Evaluating the effectiveness of an automated revision tool poses a challenge due to the subjective nature of a revision review. +To address this, we utilized three manuscripts authored by us (Table @tbl:manuscripts): Clustermatch Correlation Coefficient (CCC) [@doi:10.1101/2022.06.15.496326], PhenoPLIER [@doi:10.1101/2021.07.05.450786], and Manubot-AI (this manuscript). +CCC is a correlation coefficient for transcriptomic data, while PhenoPLIER is a framework incorporating three different methods for genetic studies. +CCC is in computational biology, whereas PhenoPLIER is in genomic medicine. +CCC describes one computational method applied to one data type, while PhenoPLIER describes a framework utilizing various data sources. +CCC has a simpler structure, while PhenoPLIER is a more complex manuscript with more figures, tables, and equations. +The third manuscript, Manubot-AI, illustrates a simpler structure and was written and revised using our tool before submission, providing a practical AI-based revision use case. +We tested and refined our prompts using these manuscripts and report our findings below. -We enabled the Manubot AI revision workflow in the GitHub repositories of the three manuscripts (CCC: `https://github.com/greenelab/ccc-manuscript`, PhenoPLIER: `https://github.com/greenelab/phenoplier_manuscript`, Manubot-AI: `https://github.com/greenelab/manubot-gpt-manuscript`). -This added the "ai-revision" workflow to the "Actions" tab of each repository. -We triggered the workflow manually and used the three language models described above to produce one pull request (PR) per manuscript and model. -These PRs can be accessed from the "Pull requests" tab of each repository. -They are titled *"GPT (MODEL) used to revise manuscript"* with *MODEL* being the identifier of the model used. -The PRs show the differences between the original text and the AI-based revision suggestions. -We discuss below our findings based on these PRs across different sections of the manuscripts. +We incorporated the Manubot AI revision workflow into the GitHub repositories of three manuscripts: CCC (`https://github.com/greenelab/ccc-manuscript`), PhenoPLIER (`https://github.com/greenelab/phenoplier_manuscript`), and Manubot-AI (`https://github.com/greenelab/manubot-gpt-manuscript`). +This workflow added the "ai-revision" option to the "Actions" tab in each repository. +We initiated the workflow manually and utilized the three language models mentioned earlier to create one pull request (PR) per manuscript and model. +These PRs can be viewed in the "Pull requests" tab of each repository, and are named *"GPT (MODEL) used to revise manuscript"*, with *MODEL* representing the model used. 
+The PRs demonstrate the differences between the original text and the AI-based revision suggestions. +In the following sections, we present our observations based on these PRs. ### Performance of language models @@ -47,10 +44,11 @@ However, the PRs show that the model was not able to produce acceptable revision Most of its suggestions were not coherent with the original text in any of the sections. -We found that the quality of the revisions produced by the `text-davinci-edit-001` (edits endpoint) model was subjectively inferior to `text-davinci-003` (completion endpoint). -This model either did not produce a revision (such as for abstracts) or the suggested changes were minimal or did not improve the original text. -For example, in paragraphs from the introduction, it failed to keep references to other scientific articles in CCC, and in PhenoPLIER it didn't produce a meaningful revision. -This might be because the edits endpoint is still in beta. +The `text-davinci-edit-001` (edits endpoint) model produced lower quality revisions compared to `text-davinci-003` (completion endpoint). +The suggested changes were either minimal or did not enhance the original text. +In some cases, such as abstracts, no revisions were produced. +The model also failed to maintain references to other scientific articles in CCC and did not provide significant revisions for PhenoPLIER. +This could be due to the beta stage of the edits endpoint. The `text-davinci-003` model produced the best results for all manuscripts and across the different sections. @@ -59,16 +57,17 @@ Since both `text-davinci-003` and `text-davinci-edit-001` are based on the same ### Revision of different sections -We inspected the PRs generated by the AI-based workflow and found interesting changes suggested by the tool across different sections of the manuscripts. -These are our subjective assessments of the quality of the revisions, and we encourage the reader to inspect the PRs for each manuscript and model to see the full diffs and make their own conclusions. -These PRs are available in the manuscripts' GitHub repositories and also included as diff files in Supplementary File 1 (CCC), 2 (PhenoPLIER) and 3 (Manubot-AI). +We examined the changes proposed by the AI-based workflow in the manuscripts' PRs and observed interesting modifications in various sections of the manuscripts. +These are our subjective assessments, and we encourage readers to review the PRs for each manuscript and model to obtain a complete understanding of the changes made. +The PRs are accessible in the manuscripts' GitHub repositories and can also be found as diff files in Supplementary File 1 (CCC), 2 (PhenoPLIER), and 3 (Manubot-AI). -We present the differences between the original text and the revisions by the tool in a `diff` format (obtained from GitHub). -Line numbers are included to show the length differences. -When applicable, single words are underlined and highlighted in colors to more clearly see the differences within a single sentence. +We present the differences between the original text and the revisions made by the tool in a `diff` format obtained from GitHub, and include line numbers to show the length differences. +Single words are underlined and highlighted in colors to make the differences within a sentence more apparent.
Red indicates words removed by the tool, green indicates words added, and no underlining indicates words kept unchanged. -The full diffs can be seen by inspecting the PRs for each manuscript and model, and then clicking on the "Files changed" tab. +The complete diffs can be viewed by inspecting the PRs for each manuscript and model, and then clicking on the "Files changed" tab. #### Abstract @@ -78,12 +77,12 @@ The full diffs can be seen by inspecting the PRs for each manuscript and model, Original text is on the left and suggested revision on the right. ](images/diffs/abstract/ccc-abstract.svg "Diffs - CCC abstract"){#fig:abstract:ccc width="100%"} -We applied the AI-based revision workflow to the CCC abstract (Figure @fig:abstract:ccc). -The tool completely rewrote the text, leaving only the last sentence mostly unchanged. -The text was significantly shortened, with longer sentences than the original ones, which could make the abstract slightly harder to read. -The revision removed the first two sentences, which introduced correlation analyses and transcriptomics, and directly stated the purpose of the manuscript. -It also removed details about the method (line 5), and focused on the aims and results obtained, ending with the same last sentence, suggesting a broader application of the coefficient to other data domains (as originally intended by the authors of CCC). -The main concepts were still present in the revised text. +We utilized the AI-based revision workflow on the CCC abstract (Figure @fig:abstract:ccc). +The tool completely restructured the text, with only the last sentence remaining mostly unchanged. +The revised abstract was significantly shorter, with longer sentences that may make it slightly more challenging to read. +The revision eliminated the first two sentences that introduced correlation analyses and transcriptomics and instead directly stated the manuscript's purpose. +It also removed details about the method (line 5) and focused on the aims and results, concluding with the same last sentence that suggested the coefficient's broader application to other data domains (as originally intended by the authors of CCC). +Despite the changes, the revised text still conveyed the main concepts. The revised text for the abstract of PhenoPLIER was significantly shortened (from 10 sentences in the original, to only 3 in the revised version). @@ -97,19 +96,18 @@ However, in this case, important concepts (such as GWAS, TWAS, CRISPR) and a pro Original text is on the left and suggested revision on the right. ](images/diffs/introduction/ccc-paragraph-01.svg "Diffs - CCC introduction paragraph 01"){#fig:intro:ccc width="100%"} -The tool significantly revised the Introduction section of CCC (Figure @fig:intro:ccc), producing a more concise and clear introductory paragraph. -The revised first sentence concisely incorporated ideas from the original two sentences, introducing the concept of "large datasets" and the opportunities for scientific exploration. -The model generated a more concise second sentence introducing the "need for efficient tools" to find "multiple relationships" in these datasets. -The third sentence connected nicely with the previous one. -All references to scientific literature were kept in the correct Manubot format, although our prompts do not specify the format of the text. -The rest of the sentences in this section were also correctly revised, and could be incorporated into the manuscript with minor or no further changes. 
+The Introduction section of CCC was significantly improved by our tool (Figure @fig:intro:ccc), producing a clearer and more concise paragraph. +The first sentence was revised to include ideas from the original two sentences, introducing "large datasets" and scientific exploration opportunities. +The second sentence was also made more concise, introducing the "need for efficient tools" to find "multiple relationships" in these datasets. +The third sentence flowed well from the previous one. +All references to scientific literature were correctly formatted according to Manubot standards, although our prompts did not specify the format. +The remaining sentences in this section were also appropriately revised and can be incorporated into the manuscript with little or no further changes. -We also observed a high quality revision of the introdution of PhenoPLIER. -However, the model failed to keep the format of citations in one paragraph. -Additionally, the model did not converge to a revised text for the last paragraph, and our tool left an error message as an HTML comment at the top: `The AI model returned an empty string`. -Debugging the prompts revealed this issue, which could be related to the complexity of the paragraph. -However, rerunning the automated revision should solve this as the model is stochastic. +We also found that the introduction of PhenoPLIER was significantly improved after revision. +However, there were issues with the citation format in one paragraph, and the model failed to converge to a revised text for the last paragraph, resulting in an error message (`The AI model returned an empty string`). +Further investigation showed that this may be due to the complexity of the paragraph. +We suggest rerunning the automated revision, as the model is stochastic and may resolve the issue. #### Results @@ -119,22 +117,20 @@ However, rerunning the automated revision should solve this as the model is stoc ![ Original text is on the left and suggested revision on the right. ](images/diffs/results/ccc-paragraph-01.svg "Diffs - CCC results paragraph 01"){#fig:results:ccc width="100%"} -We tested the tool on a paragraph of the Results section of CCC (Figure @fig:results:ccc). -That paragraph describes Figure 1 of the CCC manuscript [@doi:10.1101/2022.06.15.496326], which shows four different datasets with two variables each, and different relationships or patterns named random/independent, non-coexistence, quadratic, and two-lines. -In addition to having fewer sentences that are slightly longer, the revised paragraph consistently uses only the past tense, whereas the original one has tense shifts. -The revised paragraph also kept all citations, which although is not explicitely mentioned in the prompts for this section (as it is for introductions), in this case is important. -Math was also kept in the original LaTeX format and the figure was correctly referenced using the Manubot syntax. -In the third sentence of the revised paragraph (line 3), the model generated a good summary of how all coefficients performed in the last two, nonlinear patterns, and why CCC was able to capture them. -We, as human authors, would make a single change by the end of this sentence to avoid repeating the word "complexity": *"..., while CCC increased the complexity of the model ~~by using different degrees of complexity~~ to capture the relationships"*.
-The revised paragraph is more concise and clearly describes what the figure shows and how CCC works. -We found it remarkable that the model rewrote some of the concepts in the original paragraph (lines 4 to 8) into three new sentences (lines 3 to 5) with the same meaning but more concisely and clearly. -The model also produced high-quality revisions for several other paragraphs that would only need minor changes. +We tested the tool on a paragraph from the Results section of CCC (Figure @fig:results:ccc). +This paragraph details the content of Figure 1 in the CCC manuscript [@doi:10.1101/2022.06.15.496326], which displays four distinct datasets, each with two variables, showcasing various patterns and relationships such as random/independent, non-coexistence, quadratic, and two-lines. +The revised paragraph is more concise, using past tense consistently and avoiding tense shifts. +It retains all citations, and math is preserved in the original LaTeX format. +The paragraph also accurately references the figure using Manubot syntax. +In the third sentence of the revised paragraph, the AI model provides a succinct summary of how all coefficients performed in the last two nonlinear patterns and explains how CCC was able to capture them. +As human authors, we would make a single adjustment at the end of this sentence to avoid repeating the word "complexity", changing "by using different degrees of complexity" to simply "to capture the relationships". The revised paragraph is clearer and more concise, effectively conveying the information about the figure and how CCC works. +It is noteworthy that the AI model was able to rephrase some of the concepts in the original paragraph (lines 4 to 8) into three new sentences (lines 3 to 5) with the same meaning but more concisely and clearly. +The model also produced high-quality revisions for several other paragraphs that required only minor changes. -Other paragraphs in CCC, however, needed more changes before being ready to be incorporated into the manuscript. -For instance, for some paragraphs, the model generated a revised text that is shorter, more direct and clear. -However, important details were removed and sometimes sentences changed the meaning. -To address this, we could accept the simplified sentence structure but add back the missing details. +Other paragraphs in CCC, however, required further modifications before being suitable for inclusion in the manuscript. +The AI model generated revised versions of some paragraphs that were more concise and straightforward, but at times, crucial details were omitted, and the meaning of sentences was altered. +To resolve this issue, we opted to retain the simplified sentence structure while reintroducing the missing information. ![ @@ -143,16 +139,12 @@ Original text is on the left and suggested revision on the right. ](images/diffs/results/phenoplier-paragraph-01.svg "Diffs - PhenoPLIER results paragraph 01"){#fig:results:phenoplier width="100%"} -When applied to the PhenoPLIER manuscript, the model produced high-quality revisions for most paragraphs, while preserving citations and references to figures, tables, and other sections of the manuscript in the Manubot/Markdown format.
-In some cases, important details were missing, but they could be easily added back while preserving the improved sentence structure of the revised version. -In other cases, the model's output demonstrated the limitations of revising one paragraph at a time without considering the rest of the text. -For instance, one paragraph described our CRISPR screening approach to assess whether top genes in a latent variable (LV) could represent good therapeutic targets. -The model generated a paragraph with a completely different meaning (Figure @fig:results:phenoplier). -It omitted the CRISPR screen and the gene symbols associated with the regulation of lipids, which were key elements in the original text. -Instead, the new text describes an experiment that does not exist with a reference to a nonexisting section. +The model applied to the PhenoPLIER manuscript produced high-quality revisions for most paragraphs, while preserving citations and references to figures, tables, and other sections in the Manubot/Markdown format. +Although some important details were missing, they were easily added back while maintaining the improved sentence structure of the revised version. +However, the model's output also revealed the limitations of revising one paragraph at a time without considering the rest of the text. +For example, one paragraph describing our CRISPR screening approach omitted key elements, such as the CRISPR screen and associated gene symbols, and instead described a non-existent experiment with a reference to a non-existent section (Figure @fig:results:phenoplier). This suggests that the model focused on the title and keywords of the manuscript (Table @tbl:manuscripts) that were part of every prompt (Figure @fig:ai_revision). -For example, it included the idea of "gene co-expression" analysis (a keyword) to identify "therapeutic targets" (another keyword) and replaced the mention of "sets of genes" in the original text with "clusters of genes" (closer to the keyword including "clustering"). -This was a poor model-based revision, indicating that the original paragraph may be too short or disconnected from the rest and could be merged with the next one (which describes follow-up and related experiments). +The model-based revision was poor and indicated that the original paragraph may be too short or disconnected from the rest and could benefit from merging with the next one, which describes follow-up and related experiments. #### Discussion @@ -166,10 +158,9 @@ Revisions for some paragraphs introduced minor mistakes that a human author coul Original text is on the left and suggested revision on the right. ](images/diffs/discussion/ccc-paragraph-01.svg "Diffs - CCC discussion paragraph 01"){#fig:discussion:ccc width="100%"} -One paragraph of CCC discusses how not-only-linear correlation coefficients could potentially impact genetic studies of complex traits (Figure @fig:discussion:ccc). -Although some minor changes could be added, we believe the revised text reads better than the original. -It is also interesting how the model understood the format of citations and built more complex structures from it. -For instance, the two articles referenced in lines 2 and 3 in the original text were correctly merged into a single citation block and separated with ";" in line 2 of the revised text. 
+One paragraph in the Discussion section of CCC discusses the potential impact of not-only-linear correlation coefficients on genetic studies of complex traits (see Figure @fig:discussion:ccc). +The revised text is more concise and clear than the original. +It is also noteworthy that the AI model was able to understand and correctly format citations, as demonstrated by the merging of the two articles referenced in lines 2 and 3 of the original text into a single citation block separated by ";" in line 2 of the revised text. #### Methods @@ -182,19 +173,21 @@ The prompt for Methods (Figure @fig:ai_revision) is more focused in keeping the ![ Original text is on the left and suggested revision on the right. ](images/diffs/methods/phenoplier-paragraph-01.svg "Diffs - PhenoPLIER methods paragraph 01"){#fig:methods:phenoplier width="100%"} -We revised a paragraph in PhenoPLIER that contained two numbered equations (Figure @fig:methods:phenoplier). -The model made very few changes, and all the equations, citations, and most of the original text were preserved. -However, we found it remarkable how the model identified a wrong reference to a mathematical symbol (line 8) and fixed it in the revision (line 7). -Indeed, the equation with the univariate model used by PrediXcan (lines 4-6 in the original) includes the *true* effect size $\gamma_l$ (`\gamma_l`) instead of the *estimated* one $\hat{\gamma}_l$ (`\hat{\gamma}_l`). +We made revisions to a paragraph in PhenoPLIER (Figure @fig:methods:phenoplier), which included two numbered equations. +The model made minimal changes and preserved all equations, citations, and most of the original text. +Remarkably, the model identified a wrong reference to a mathematical symbol in the original (line 8) and fixed it in the revision (line 7). +Indeed, the univariate model equation used by PrediXcan (lines 4-6 in the original) included the *true* effect size $\gamma_l$ (`\gamma_l`) instead of the *estimated* one $\hat{\gamma}_l$ (`\hat{\gamma}_l`). In PhenoPLIER, we found one large paragraph with several equations that the model failed to revise, although it performed relatively well in revising the rest of the section. In CCC, the revision of this section was good overall, with some minor and easy-to-fix issues as in the other sections. -We also observed issues from revising one paragraph at a time without context. -For instance, in PhenoPLIER, one of the first paragraphs mentions the linear models used by S-PrediXcan and S-MultiXcan, without providing any equations or details. -These were presented in the following paragraphs, but since the model had not encountered that yet, it opted to add those equations immediately (in the correct Manubot/Markdown format). +We also encountered problems when revising paragraphs out of context. +Specifically, in PhenoPLIER, one of the first paragraphs briefly mentioned the linear models utilized by S-PrediXcan and S-MultiXcan without providing any equations or details. +Although the following paragraphs did present these equations, the model had not encountered them yet and opted to add them right away, in the correct Manubot/Markdown format.

![
@@ -204,7 +197,7 @@ The revision (right) contains a repeated set of sentences at the top that we rem

](images/diffs/methods/manubotai-paragraph-01.svg "Diffs - ManubotAI methods paragraph 01"){#fig:methods:manubotai width="100%"}

-When revising the Methods sections of Manubot-AI (this manuscript), in some cases the model added novel sentences with wrong information.
-For instance, for one paragraph, it added a formula (using the correct Manubot format) to presumably predict the cost of a revision run.
-In another paragraph (Figure @fig:methods:manubotai), it added new sentences saying that the model was *"trained on a corpus of scientific papers from the same field as the manuscript"* and that its suggested revisions resulted in a *"modified version of the manuscript that is ready for submission"*.
-Although these are important future directions, neither accurately describes the present work.
+When revising the Methods section of Manubot-AI (this manuscript), the model occasionally added novel sentences with incorrect information.
+For example, in one paragraph, it inserted a formula (in the correct Manubot format) that purportedly predicts the cost of a revision run.
+In another paragraph (Figure @fig:methods:manubotai), it added sentences claiming that the model was *"trained on a corpus of scientific papers from the same field as the manuscript"* and that its suggested revisions produced a *"modified version of the manuscript that is ready for submission"*.
+While these are valuable future directions, neither accurately describes the present work.

diff --git a/content/05.conclusions.md b/content/05.conclusions.md
index 413fba5..87338e3 100644
--- a/content/05.conclusions.md
+++ b/content/05.conclusions.md
@@ -1,36 +1,35 @@
## Conclusions

-We implemented AI-based revision models into the Manubot publishing platform.
-Writing academic papers can be time-consuming and challenging to read, so we sought to use technology to help researchers communicate their findings to the community.
-Our AI-based revision workflow uses a prompt generator that creates manuscript- and section-specific instructions for the language model.
-Authors can easily trigger this workflow from the GitHub repository to suggest revisions that can be later reviewed.
-This workflow uses GPT-3 models through the OpenAI API, generating a pull request of revisions that authors can review.
-We set default parameters for GPT-3 models that work well for our use cases across different sections and manuscripts.
-Users can also customize the revision by selecting specific sections or adjusting the model's behavior to fit their needs and budget.
-Although the evaluation of the revision tool is subjective, we found that most paragraphs were improved.
-The AI model also highlighted certain paragraphs that were difficult to revise, which could be challenging for human readers too.
+In our study, we integrated AI-based revision models into the Manubot publishing platform.
+Writing academic papers is time-consuming, and the resulting text can be challenging to read, so we sought to use technology to help researchers communicate their findings to the community.
+Our AI-based revision workflow uses a prompt generator that creates manuscript- and section-specific instructions for the language model (sketched below).
+Authors can trigger this workflow from the GitHub repository; it queries GPT-3 models through the OpenAI API and generates a pull request of suggested revisions that authors can review.
+We set default model parameters that work well for our use cases across different sections and manuscripts.
+Users can also customize the revision by selecting specific sections or adjusting the model's behavior to fit their needs and budget.
+While the evaluation of the tool is subjective, we observed that most paragraphs were improved.
+The AI model also identified certain paragraphs that were challenging to revise, which could pose difficulties for human readers as well.
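+
+As a concrete illustration of this prompt-based design, the following minimal Python sketch assembles a section-specific prompt and requests a revision through the OpenAI completions API; the template text, function names, and parameter values are illustrative assumptions rather than the actual Manubot-AI implementation:
+
+```python
+import openai  # assumes OPENAI_API_KEY is set in the environment
+
+# Hypothetical section-specific prompt templates (illustrative, not Manubot-AI's).
+PROMPT_TEMPLATES = {
+    "abstract": (
+        "Revise the following abstract of a manuscript titled '{title}' "
+        "(keywords: {keywords}) so it is clear and concise:\n\n{paragraph}"
+    ),
+    "methods": (
+        "Revise the following Methods paragraph of '{title}', preserving "
+        "all equations, citations, and Markdown formatting:\n\n{paragraph}"
+    ),
+}
+
+def revise_paragraph(paragraph, section, title, keywords):
+    """Build a section-specific prompt and request a revision from a GPT-3 model."""
+    prompt = PROMPT_TEMPLATES[section].format(
+        title=title, keywords=", ".join(keywords), paragraph=paragraph
+    )
+    response = openai.Completion.create(
+        engine="text-davinci-003",  # GPT-3 model; adjustable to needs and budget
+        prompt=prompt,
+        max_tokens=1024,  # upper bound on the length of the revised paragraph
+        temperature=0.5,  # moderate sampling temperature for rewording
+    )
+    return response.choices[0].text.strip()
+```
+
+In a workflow along these lines, such a function would run over every paragraph of the selected sections, and the suggested revisions would be committed to a branch and opened as a pull request for review.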

-We designed section-specific prompts to guide the revision of text using GPT-3.
-Surprisingly, in one Methods section, the model detected an error when referencing a symbol in an equation that had been overlooked by humans.
-However, abstracts were more challenging for the model to revise, where revisions often removed background information about the research problem.
-There are opportunities to improve the AI-based revisions, such as further refining prompts using few-shot learning [@doi:10.1145/3386252] or fine-tuning the model using an additional corpus of academic writing focused on particularly challenging sections.
-Fine-tuning using preprint-publication pairs [@doi:10.1371/journal.pbio.3001470] may help to identify sections or phrases likely to be changed during peer review.
-Our approach used GPT-3 to process each paragraph of the text, but it lacked a contextual thread between queries, which mainly affected the Results and Methods sections.
-Using chatbots that retain context, such as [OpenAI's ChatGPT](https://openai.com/blog/chatgpt), could enable the revision of individual paragraphs while considering previously processed text.
-Since an official [ChatGPT API](https://openai.com/blog/introducing-chatgpt-and-whisper-apis) became available recently, we plan to update our workflow to support this strategy.
-Other open models, such as BLOOM [@arxiv:2211.05100], GLM [@arxiv:2210.02414], or OPT [@arxiv:2205.01068], provide similar capabilities but lack the user-friendly OpenAI API.
-Despite these limitations, we found that models captured the main ideas and generated a revision that often communicated the intended meaning more clearly and concisely.
-It is important to note, however, that our assessment of performance in case studies was necessarily subjective, as there could be writing styles that are not widely shared across researchers.
+We created section-specific prompts to guide revisions using GPT-3.
+Interestingly, in one Methods section, the model identified an error in a reference to a symbol in an equation that human authors had missed.
+However, abstracts were more difficult for the model to revise, with revisions sometimes removing important background information about the research problem.
+To improve the AI-based revisions, prompts could be further refined using few-shot learning [@doi:10.1145/3386252], or the model could be fine-tuned with an additional corpus of academic writing focused on particularly challenging sections.
+Fine-tuning with preprint-publication pairs [@doi:10.1371/journal.pbio.3001470] may also help identify sections or phrases likely to be changed during peer review.
+While our approach used GPT-3 to process each paragraph, it lacked a contextual thread between queries, which mainly affected the Results and Methods sections.
+Using chatbots that retain context, such as [OpenAI's ChatGPT](https://openai.com/blog/chatgpt), could enable the revision of individual paragraphs while considering previously processed text (see the sketch after this paragraph).
+We plan to update our workflow to support this strategy now that an official [ChatGPT API](https://openai.com/blog/introducing-chatgpt-and-whisper-apis) is available.
+Other open models, such as BLOOM [@arxiv:2211.05100], GLM [@arxiv:2210.02414], or OPT [@arxiv:2205.01068], provide similar capabilities but lack the user-friendly OpenAI API.
+Despite some limitations, we found that the models captured the main ideas and generated revisions that often communicated the intended meaning more clearly and concisely.
+However, it is important to note that our assessment of performance in the case studies was subjective, as there may be writing styles that are not widely shared among researchers.
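+
+A context-retaining strategy could keep previously processed paragraphs in the conversation history of the chat-based API; the sketch below assumes the chat endpoint and uses hypothetical function names and message text, as this is not yet part of our workflow:
+
+```python
+import openai  # assumes OPENAI_API_KEY is set in the environment
+
+def revise_with_context(paragraphs):
+    """Revise paragraphs sequentially, letting the chat history carry context."""
+    messages = [{
+        "role": "system",
+        "content": "You revise scholarly text for clarity, preserving "
+                   "citations and Markdown formatting.",
+    }]
+    revised = []
+    for paragraph in paragraphs:
+        messages.append({"role": "user", "content": f"Revise:\n\n{paragraph}"})
+        response = openai.ChatCompletion.create(
+            model="gpt-3.5-turbo",  # the model behind the ChatGPT API
+            messages=messages,
+        )
+        revision = response.choices[0].message.content
+        # Keep the assistant's output in the history so later paragraphs
+        # are revised with knowledge of earlier ones.
+        messages.append({"role": "assistant", "content": revision})
+        revised.append(revision)
+    return revised
+```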

-The use of AI-assisted tools for scientific authoring is controversial [@doi:10.1038/d41586-023-00056-7; @doi:10.1038/d41586-023-00107-z].
-Questions arise concerning the originality and ownership of texts generated by these models.
-For example, the *Nature* journal has established that any use of these models in scientific writing must be documented [@doi:10.1038/d41586-023-00191-1], and the International Conference on Machine Learning (ICML) has prohibited the submission of *"papers that include text generated from a large-scale language model (LLM)"* [@url:https://icml.cc/Conferences/2023/llm-policy], although editing tools for grammar and spelling correction are allowed.
-Our work focuses on revising *existing* text written by a human author, similar to other tools such as [Grammarly](https://www.grammarly.com).
-Additionally, all changes made by humans and AI are tracked in the version control system, which allows for full transparency.
-Despite the concerns, there are also significant opportunities.
-Our work lays the foundation for a future in which humans and machines construct academic manuscripts.
-Scientific articles need to adhere to a certain style, which can make the writing time-consuming and require a significant amount of effort to think about *how* to communicate a result or finding that has already been obtained.
-As machines become increasingly capable of improving scholarly text, humans can focus more on *what* to communicate to others, rather than on *how* to write it.
-This could lead to a more equitable and productive future for research, where scientists are only limited by their ideas and ability to conduct experiments to uncover the underlying organizing principles of ourselves and our environment.
+The use of AI tools to assist scientific writing is a topic of debate [@doi:10.1038/d41586-023-00056-7; @doi:10.1038/d41586-023-00107-z].
+Questions arise concerning the originality and ownership of texts generated by these models.
+For instance, *Nature* requires that any use of these models in scientific writing be documented [@doi:10.1038/d41586-023-00191-1].
+The International Conference on Machine Learning (ICML) has also prohibited the submission of *"papers that include text generated from a large-scale language model (LLM)"* [@url:https://icml.cc/Conferences/2023/llm-policy], although editing tools for grammar and spelling correction are allowed.
+Our work focuses on revising *existing* text written by a human author, similar to tools such as [Grammarly](https://www.grammarly.com).
+We track all changes made by humans and AI in the version control system, which allows for full transparency.
+Despite the concerns, our work provides a foundation for a future where humans and machines collaborate to create academic manuscripts.
+Scientific articles must adhere to a certain style, which makes writing time-consuming and requires significant effort to decide *how* to communicate a result that has already been obtained.
+As machines become increasingly capable of improving scholarly text, humans can focus more on *what* to communicate rather than on *how* to write it.
+This could lead to a more productive and equitable future for research, where scientists are limited only by their ideas and their ability to conduct experiments that uncover the underlying organizing principles of ourselves and our environment.