AUTOGEN: A Personalized Large Language Model for Academic Enhancement-Ethics and Proof of Principle

Publication: Contribution to journal › Journal article › Research › peer-reviewed

Standard

AUTOGEN: A Personalized Large Language Model for Academic Enhancement-Ethics and Proof of Principle. / Porsdam Mann, Sebastian; Earp, Brian D; Møller, Nikolaj; Vynn, Suren; Savulescu, Julian.

In: American Journal of Bioethics, Vol. 23, No. 10, 2023, pp. 28-41.


Harvard

Porsdam Mann, S, Earp, BD, Møller, N, Vynn, S & Savulescu, J 2023, 'AUTOGEN: A Personalized Large Language Model for Academic Enhancement-Ethics and Proof of Principle', American Journal of Bioethics, vol. 23, no. 10, pp. 28-41. https://doi.org/10.1080/15265161.2023.2233356

APA

Porsdam Mann, S., Earp, B. D., Møller, N., Vynn, S., & Savulescu, J. (2023). AUTOGEN: A Personalized Large Language Model for Academic Enhancement-Ethics and Proof of Principle. American Journal of Bioethics, 23(10), 28-41. https://doi.org/10.1080/15265161.2023.2233356

Vancouver

Porsdam Mann S, Earp BD, Møller N, Vynn S, Savulescu J. AUTOGEN: A Personalized Large Language Model for Academic Enhancement-Ethics and Proof of Principle. American Journal of Bioethics. 2023;23(10):28-41. https://doi.org/10.1080/15265161.2023.2233356

Author

Porsdam Mann, Sebastian; Earp, Brian D; Møller, Nikolaj; Vynn, Suren; Savulescu, Julian. / AUTOGEN: A Personalized Large Language Model for Academic Enhancement-Ethics and Proof of Principle. In: American Journal of Bioethics. 2023; Vol. 23, No. 10, pp. 28-41.

Bibtex

@article{20322ff541cd4116a53b715b55605978,
title = "AUTOGEN: A Personalized Large Language Model for Academic Enhancement-Ethics and Proof of Principle",
abstract = "In this article, we explore the potential of enhancing academic prose and idea generation by fine-tuning a large language model (here, GPT-3) on one's own previously published writings: AUTOGEN ({"}AI Unique Tailored Output GENerator{"}). We develop, test, and describe three distinct AUTOGEN models trained on the prior scholarly output of three of the current authors (SBM, BDE, JS), with a fourth model trained on the combined works of all three. Our AUTOGEN models demonstrate greater variance in quality than the base GPT-3 model, with many outputs outperforming the base model in format, style, overall quality, and novel idea generation. As proof of principle, we present and discuss examples of AUTOGEN-written sections of existing and hypothetical research papers. We further discuss ethical opportunities, concerns, and open questions associated with personalized academic prose and idea generators. Ethical opportunities for personalized LLMs such as AUTOGEN include increased productivity, preservation of writing styles and cultural traditions, and aiding consensus building. However, ethical concerns arise due to the potential for personalized LLMs to reduce output diversity, violate privacy and intellectual property rights, and facilitate plagiarism or fraud. The use of coauthored or multiple-source trained models further complicates issues surrounding ownership and attribution. Open questions concern a potential credit-blame asymmetry for LLM outputs, the legitimacy of licensing agreements in authorship ascription, and the ethical implications of coauthorship attribution for data contributors. Ensuring the output is sufficiently distinct from the source material is crucial to maintaining ethical standards in academic writing. These opportunities, risks, and open issues highlight the intricate ethical landscape surrounding the use of personalized LLMs in academia. We also discuss open technical questions concerning the integration of AUTOGEN-style personalized LLMs with other LLMs, such as GPT-4, for iterative refinement and improvement of generated text. In conclusion, we argue that AUTOGEN-style personalized LLMs offer significant potential benefits in terms of both prose generation and, to a lesser extent, idea generation. If associated ethical issues are appropriately addressed, AUTOGEN alone or in combination with other LLMs can be seen as a potent form of academic enhancement.",
author = "{Porsdam Mann}, Sebastian and Earp, {Brian D} and Nikolaj M{\o}ller and Suren Vynn and Julian Savulescu",
year = "2023",
doi = "10.1080/15265161.2023.2233356",
language = "English",
volume = "23",
pages = "28--41",
journal = "American Journal of Bioethics",
issn = "1526-5161",
publisher = "Routledge",
number = "10",

}
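
The abstract above describes fine-tuning a base GPT-3 model on an author's previously published writings to produce a personalized prose generator. This record does not include implementation details, so the following is only a minimal, illustrative sketch of such a personalization pipeline using the current OpenAI fine-tuning API. The file name, the build_training_file helper, the heading-to-prose framing, and the gpt-3.5-turbo base model are all assumptions for illustration; the original work used GPT-3-era fine-tuning, whose legacy API differs.

# Minimal sketch (not the authors' code): fine-tuning an OpenAI model on an
# author's prior publications, in the spirit of AUTOGEN as described in the
# abstract. File names, the prompt framing, and the base model are assumptions.
import json
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def build_training_file(sections, path="autogen_training.jsonl"):
    """Write chat-format training examples: section heading -> the author's own prose."""
    with open(path, "w", encoding="utf-8") as f:
        for heading, text in sections:
            example = {
                "messages": [
                    {"role": "system", "content": "Write in this author's academic style."},
                    {"role": "user", "content": f"Draft a section titled: {heading}"},
                    {"role": "assistant", "content": text},
                ]
            }
            f.write(json.dumps(example) + "\n")
    return path

# Hypothetical input: (section heading, section text) pairs extracted from prior papers.
pairs = [("Introduction", "In this article, we examine ...")]
training_path = build_training_file(pairs)

# Upload the training file and start a fine-tuning job on a currently fine-tunable model.
uploaded = client.files.create(file=open(training_path, "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(training_file=uploaded.id, model="gpt-3.5-turbo")
print("Fine-tune job:", job.id)

Once the job completes, the resulting fine-tuned model name can be passed to client.chat.completions.create to generate section drafts in the author's style, which is the use case the abstract describes.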

RIS

TY - JOUR

T1 - AUTOGEN

T2 - A Personalized Large Language Model for Academic Enhancement-Ethics and Proof of Principle

AU - Porsdam Mann, Sebastian

AU - Earp, Brian D

AU - Møller, Nikolaj

AU - Vynn, Suren

AU - Savulescu, Julian

PY - 2023

Y1 - 2023

N2 - In this article, we explore the potential of enhancing academic prose and idea generation by fine-tuning a large language model (here, GPT-3) on one's own previously published writings: AUTOGEN ("AI Unique Tailored Output GENerator"). We develop, test, and describe three distinct AUTOGEN models trained on the prior scholarly output of three of the current authors (SBM, BDE, JS), with a fourth model trained on the combined works of all three. Our AUTOGEN models demonstrate greater variance in quality than the base GPT-3 model, with many outputs outperforming the base model in format, style, overall quality, and novel idea generation. As proof of principle, we present and discuss examples of AUTOGEN-written sections of existing and hypothetical research papers. We further discuss ethical opportunities, concerns, and open questions associated with personalized academic prose and idea generators. Ethical opportunities for personalized LLMs such as AUTOGEN include increased productivity, preservation of writing styles and cultural traditions, and aiding consensus building. However, ethical concerns arise due to the potential for personalized LLMs to reduce output diversity, violate privacy and intellectual property rights, and facilitate plagiarism or fraud. The use of coauthored or multiple-source trained models further complicates issues surrounding ownership and attribution. Open questions concern a potential credit-blame asymmetry for LLM outputs, the legitimacy of licensing agreements in authorship ascription, and the ethical implications of coauthorship attribution for data contributors. Ensuring the output is sufficiently distinct from the source material is crucial to maintaining ethical standards in academic writing. These opportunities, risks, and open issues highlight the intricate ethical landscape surrounding the use of personalized LLMs in academia. We also discuss open technical questions concerning the integration of AUTOGEN-style personalized LLMs with other LLMs, such as GPT-4, for iterative refinement and improvement of generated text. In conclusion, we argue that AUTOGEN-style personalized LLMs offer significant potential benefits in terms of both prose generation and, to a lesser extent, idea generation. If associated ethical issues are appropriately addressed, AUTOGEN alone or in combination with other LLMs can be seen as a potent form of academic enhancement.

AB - In this article, we explore the potential of enhancing academic prose and idea generation by fine-tuning a large language model (here, GPT-3) on one's own previously published writings: AUTOGEN ("AI Unique Tailored Output GENerator"). We develop, test, and describe three distinct AUTOGEN models trained on the prior scholarly output of three of the current authors (SBM, BDE, JS), with a fourth model trained on the combined works of all three. Our AUTOGEN models demonstrate greater variance in quality than the base GPT-3 model, with many outputs outperforming the base model in format, style, overall quality, and novel idea generation. As proof of principle, we present and discuss examples of AUTOGEN-written sections of existing and hypothetical research papers. We further discuss ethical opportunities, concerns, and open questions associated with personalized academic prose and idea generators. Ethical opportunities for personalized LLMs such as AUTOGEN include increased productivity, preservation of writing styles and cultural traditions, and aiding consensus building. However, ethical concerns arise due to the potential for personalized LLMs to reduce output diversity, violate privacy and intellectual property rights, and facilitate plagiarism or fraud. The use of coauthored or multiple-source trained models further complicates issues surrounding ownership and attribution. Open questions concern a potential credit-blame asymmetry for LLM outputs, the legitimacy of licensing agreements in authorship ascription, and the ethical implications of coauthorship attribution for data contributors. Ensuring the output is sufficiently distinct from the source material is crucial to maintaining ethical standards in academic writing. These opportunities, risks, and open issues highlight the intricate ethical landscape surrounding the use of personalized LLMs in academia. We also discuss open technical questions concerning the integration of AUTOGEN-style personalized LLMs with other LLMs, such as GPT-4, for iterative refinement and improvement of generated text. In conclusion, we argue that AUTOGEN-style personalized LLMs offer significant potential benefits in terms of both prose generation and, to a lesser extent, idea generation. If associated ethical issues are appropriately addressed, AUTOGEN alone or in combination with other LLMs can be seen as a potent form of academic enhancement.

U2 - 10.1080/15265161.2023.2233356

DO - 10.1080/15265161.2023.2233356

M3 - Journal article

C2 - 37487183

VL - 23

SP - 28

EP - 41

JO - American Journal of Bioethics

JF - American Journal of Bioethics

SN - 1526-5161

IS - 10

ER -
