Evaluation of the Accuracy of ChatGPT-generated Information on Human Papillomavirus: A Physician-based Assessment Study
Original Article
VOLUME: 64 ISSUE: 2
P: 85 - 91
March 2026

Med Bull Haseki 2026;64(2):85-91
1. University of Health Sciences Türkiye, Gulhane Training and Research Hospital, Clinic of Urology, Ankara, Türkiye
2. MEDICLIN Kraichgau-Klinik, Department of Urology, Bad Rappenau, Germany
3. University of Health Sciences Türkiye, Dr. Abdurrahman Yurtaslan Ankara Oncology Training and Research Hospital, Clinic of Urology, Ankara, Türkiye
4. Ankara Bilkent City Hospital, Clinic of Dermatology, Ankara, Türkiye
5. Acıbadem Ankara Hospital, Clinic of Urology, Ankara, Türkiye
Received Date: 12.09.2025
Accepted Date: 08.03.2026
Online Date: 27.03.2026
Publish Date: 27.03.2026

Abstract

Aim

Artificial intelligence (AI) applications are widely used to identify solutions to patients' problems. This study aims to evaluate the scientific validity of the information that patients can access about human papillomavirus (HPV)-related topics using Chat-Generative Pre-Trained Transformer (ChatGPT).

Methods

This study was conducted between July 1 and August 1, 2025. A physician developed a structured set of HPV-related questions. The responses generated by ChatGPT were independently evaluated by three clinicians with clinical experience in HPV management. Each response was rated using a five-point Likert scale based on accuracy and clinical relevance. Inter-rater reliability among reviewers was assessed using Cohen's kappa statistic.

Results

The mean accuracy scores that the three reviewers assigned to ChatGPT's answers to HPV-related questions were 4.9±0.3, 4.75±0.44, and 4.75±0.55, respectively. The percentages of answers rated completely correct by the reviewers were 90%, 75%, and 80%, respectively. The percentages of answers rated approximately equally correct and incorrect were 0%, 0%, and 5%, respectively. The percentages of answers rated nearly correct were 10%, 25%, and 15%, respectively.

Conclusion

Chat-Generative Pre-Trained Transformer 4.0 demonstrated high efficacy in providing general information regarding HPV, with an 81.6% accuracy rate and a 90% near-accuracy rate. Incorporating AI tools into the facilitation of patient access to information could enhance learning processes. However, it is essential that these tools be continuously refined and utilized to complement rather than substitute for the critical judgment of medical professionals and patients.

Keywords:
Artificial Intelligence, humans, human papillomavirus viruses, patient education as a topic

Introduction

Human papillomavirus (HPV) infection is one of the most widespread sexually transmitted diseases in the world, affecting both men and women (1). Human papillomavirus is transmitted through skin-to-skin contact (2). Low-risk HPV can cause genital warts, which can negatively affect patients’ social, physical, and sexual lives. High-risk types of HPV can cause cancers of the anus, vagina, vulva, penis, mouth, lungs, and throat (3, 4). Infections with HPV can also lead to adverse reproductive outcomes, including reduced sperm quality and decreased sperm concentration and motility (5, 6).

Large language models (LLMs) are natural language processing (NLP) models that use deep learning algorithms to process and generate text in a manner similar to human language. Chat-Generative Pre-Trained Transformer (ChatGPT) is an NLP model developed by OpenAI that was introduced at the end of 2022 (7). Chat-Generative Pre-Trained Transformer can generate highly accurate and appropriate responses for patient education purposes. Its alignment with professional medical guidelines demonstrates its high potential for patient education (8, 9). Additionally, ChatGPT can be used by physicians to facilitate research innovation and comprehensive health management, as well as for diagnostic reasoning in some diseases, particularly in areas where rapid information retrieval and analysis are crucial for patient care (10, 11).

In the present study, we hypothesized that clinicians with HPV-related clinical experience could evaluate ChatGPT’s information about HPV and thereby determine the accuracy of the information that patients can access through ChatGPT. Patients increasingly use artificial intelligence (AI)-based chatbots to obtain health information, but the reliability of the information these tools provide on HPV-related topics remains unclear. Consequently, assessing the accuracy of AI-generated information is crucial for determining whether such tools can be used safely in patient education.

Materials and Methods

Compliance with Ethical Standards

The study did not require approval by the institutional review board or ethics committee because no patient data were used. Informed consent was not separately obtained from the respondents; it was indicated in their survey responses. We adhered to the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guidelines for reporting (12).

Study Design

This observational study was conducted from 1 July to 1 August 2025. A physician (M.Y.) generated a set of questions (Table 1). All questions were chosen subjectively to represent the physician’s area of expertise, with the aim of providing a robust and balanced overview of the relevant topics. The physician was asked to provide questions with clearly defined, evidence-based answers based on guidelines from the European Association of Urology, the American Urological Association, and the European Academy of Dermatology and Venereology. The physician developed binary (yes/no), descriptive, or multiple-correct-answer questions, all of which had similar difficulty ratings. A Likert scale can be useful for assessing the accuracy, completeness, and reliability of knowledge (13). Accordingly, a predefined Likert scale was used to assess the accuracy of the responses.

The accuracy scale was a 5-point Likert scale with the following options: 1, completely incorrect; 2, more incorrect than correct; 3, approximately equally correct and incorrect; 4, nearly correct; and 5, completely correct.

ChatGPT version 4.0 was used to answer all HPV-related questions. To ensure consistency, one investigator (H.C.A.) entered all questions into the chatbot, prompting it with the phrase “Please ensure that information is medically accurate and based on current best practices and guidelines for urology, dermatology, obstetrics, and gynecology” and using unconditional prompts for each new chat. Three specialists (two urologists and one dermatologist) who specialize in the medical or surgical treatment of HPV warts in outpatient clinics were invited to assess the answers (Table 1). To avoid potential bias, the reviewers were instructed not to use the chatbot to screen the questions themselves. The AI-generated answers were rated by the three clinicians using the five-point Likert scale. The study process is summarized in Figure 1.

Statistical Analysis

Statistical analyses were performed using SPSS version 22.0 (IBM Inc., Armonk, NY, USA). Descriptive statistics were used to summarize the data. Continuous variables were expressed as mean ± standard deviation. Categorical variables were described in terms of frequency and percentage. Cohen’s kappa was used to assess inter-rater reliability (IRR). Cohen’s kappa values were interpreted as follows: poor agreement, <0.00; slight agreement, 0.00-0.20; fair agreement, 0.21-0.40; moderate agreement, 0.41-0.60; substantial agreement, 0.61-0.80; and almost perfect agreement, 0.81-1.00 (14). A p-value <0.05 was considered statistically significant.
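For readers who want to see how the IRR statistic is obtained, the following is a minimal sketch of unweighted Cohen’s kappa for one pair of raters, together with the interpretation bands listed above. It is not the SPSS procedure used in the study. The function names and the three score lists are hypothetical illustrations, not the study data, and treating negative values as poor agreement is an assumption following the usual Landis and Koch convention.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty item set")
    n = len(rater_a)
    # Observed agreement: share of items given identical scores.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from each rater's marginal score frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[s] * freq_b[s] for s in freq_a) / (n * n)
    if expected == 1.0:
        # Kappa is undefined when both raters use one identical score
        # throughout; this sketch returns 1.0 by convention.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def interpret_kappa(kappa):
    """Map a kappa value to the agreement bands used in the text."""
    if kappa < 0.0:
        return "poor agreement"  # assumption: negative values are "poor"
    if kappa <= 0.20:
        return "slight agreement"
    if kappa <= 0.40:
        return "fair agreement"
    if kappa <= 0.60:
        return "moderate agreement"
    if kappa <= 0.80:
        return "substantial agreement"
    return "almost perfect agreement"


# Hypothetical Likert scores (1-5) for ten answers; NOT the study data.
scores_x = [5, 5, 5, 5, 4, 4, 5, 5, 5, 5]
scores_y = [5, 5, 5, 5, 4, 5, 5, 5, 4, 5]
scores_z = [5, 5, 5, 5, 5, 5, 4, 4, 5, 5]

kappa_xy = cohen_kappa(scores_x, scores_y)
kappa_xz = cohen_kappa(scores_x, scores_z)
print(f"x vs y: {kappa_xy:.3f} ({interpret_kappa(kappa_xy)})")
print(f"x vs z: {kappa_xz:.3f} ({interpret_kappa(kappa_xz)})")
```

In the second hypothetical pair the raters give identical scores to 6 of 10 answers, yet kappa is negative. Because almost every score is a 5, the agreement expected by chance (68%) exceeds the observed agreement (60%), which is how a negative kappa can arise even when raw agreement looks high.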

Results

All three reviewers rated the answers to 14 of the 20 questions (70%; questions 1, 2, 4-10, 13, 16-18, and 20) as entirely correct. For questions 3, 14, 15, and 19, one reviewer rated the answers as entirely correct, while the other two rated them as nearly correct. For question 12, two reviewers rated the answer as entirely correct and one rated it as nearly correct. The reviewers differed on question 11: one considered the answer entirely correct, one considered it nearly correct, and one considered it approximately equally correct and incorrect.

The mean accuracy scores that the reviewers assigned to ChatGPT’s answers to HPV-related questions were 4.9±0.3, 4.75±0.44, and 4.75±0.55, respectively. The reviewers’ evaluations are presented in Table 1.

The percentages of answers that the reviewers rated completely correct were 90%, 75%, and 80%, respectively. The percentages rated approximately equally correct and incorrect were 0%, 0%, and 5%, respectively. The percentages rated nearly correct were 10%, 25%, and 15%, respectively. IRR values for reviewer 1 vs. 2, reviewer 1 vs. 3, and reviewer 2 vs. 3 were 0.500 (p=0.01), -0.132 (p=0.473), and 0.448 (p=0.021), respectively. Although no statistically significant agreement was found between the first and third reviewers regarding the accuracy of ChatGPT’s responses, statistically significant agreement was observed for the other two reviewer pairs. The reviewers’ IRR values are shown in Table 2.
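The reported means and standard deviations can be checked against the reported score percentages. The sketch below rests on two assumptions: that 20 questions were asked (the Results refer to questions 1 to 20) and that the standard deviation is the sample (n-1) form. The variable and function names are illustrative only.

```python
from statistics import mean, stdev

N_QUESTIONS = 20  # assumption: the Results refer to questions 1-20

# Percentage of answers receiving each Likert score, as reported per reviewer.
reported_percentages = {
    "Reviewer 1": {5: 90, 4: 10, 3: 0},
    "Reviewer 2": {5: 75, 4: 25, 3: 0},
    "Reviewer 3": {5: 80, 4: 15, 3: 5},
}


def expand_scores(percentages, n_items):
    """Turn a percentage distribution into a list of individual scores."""
    scores = []
    for score, pct in percentages.items():
        scores.extend([score] * round(pct * n_items / 100))
    if len(scores) != n_items:
        raise ValueError("percentages do not add up to the item count")
    return scores


summary = {}
fully_correct = 0
for reviewer, percentages in reported_percentages.items():
    scores = expand_scores(percentages, N_QUESTIONS)
    # stdev() is the sample standard deviation (n - 1 in the denominator).
    summary[reviewer] = (round(mean(scores), 2), round(stdev(scores), 2))
    fully_correct += scores.count(5)

# Share of all ratings (3 reviewers x 20 answers) that were "completely correct".
pooled_correct_pct = 100 * fully_correct / (len(reported_percentages) * N_QUESTIONS)

for reviewer, (avg, sd) in summary.items():
    print(f"{reviewer}: {avg} ± {sd}")
print(f"Pooled completely correct: {pooled_correct_pct:.2f}%")
```

Under these assumptions the sketch yields 4.9 ± 0.31, 4.75 ± 0.44, and 4.75 ± 0.55, in line with the reported values. The pooled share of completely correct ratings is 49 of 60, about 81.7%, close to the 81.6% accuracy rate given in the Conclusion.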

Discussion

Large language models can be utilized by humans to identify health issues and direct the treatment process. Chat-Generative Pre-Trained Transformer, developed by OpenAI, is one of the LLMs used for consulting on health issues. This study was designed to determine how accurately ChatGPT could answer questions posed by patients with HPV attending the outpatient clinic and to inform patients about the HPV treatment process.

According to Surveillance, Epidemiology, and End Results (SEER) statistics, the incidence of anal carcinoma increased by an average of 2.2% per year between 2013 and 2022. During this period, it accounted for 0.5% of all new cancer diagnoses in the United States (15). Anal cancer incidence is rising, and high-grade squamous intraepithelial lesions are caused predominantly by HPV type 16 (16). High-grade squamous intraepithelial lesion is the precursor lesion of anal squamous cell carcinoma (SCC). It is caused by the uncontrolled growth of squamous epithelial cells in the perianal area or the anal canal transformation zone and is a direct result of an HPV infection. These premalignant lesions may develop into anal SCC if left untreated (17). Approximately 88% of anal cancer cases test positive for HPV DNA, indicating a strong association between HPV and anal cancer, second only to cervical cancer. Consequently, HPV has a significant impact on the development of anal cancer (18). ChatGPT’s report that HPV-16 is the type most frequently found in anal cytology was accurate. However, its statement that HPV-16 had a 70% likelihood of causing anal cancer, and thus the proportion of anal cancer cases it attributed to HPV-16, is not substantiated by empirical data.

A self-sampling strategy combining HPV detection in urine samples with accessible polymerase chain reaction (PCR) tools was developed as an alternative to cervical swab-based HPV screening to improve participation rates. The PCR kit can detect 14 types of HPV, including HPV-52, HPV-16, and HPV-18, in cervical and urine samples. Urine samples show promise in terms of their accuracy for HPV detection, which could increase cervical cancer screening (19). Our study revealed that ChatGPT correctly identified certain facts relating to cervical cancer screening tests, advanced urine sample options, and the PCR requirement.

The presence of HPV in sperm is associated with male infertility, as indicated by an elevated risk of oligozoospermia and asthenospermia (20). In women, HPV has been proposed as a potential cause of cervical or tubal factor infertility. However, a scoping review concluded that no studies investigating HPV infection in relation to female fertility had been conducted (21). Chat-Generative Pre-Trained Transformer provided a thorough summary of the existing literature on HPV-related infertility in women and men and offered a comprehensive interpretation of the subject. Furthermore, it gave more comprehensive answers to the question by including information on fertilization, implantation, the probability of pregnancy, and the risk of transmission to the partner or the fetus.

Head and neck SCCs (HNSCCs) emerge from the mucosal epithelium of the oral cavity, larynx, and pharynx. The primary risk factor for HNSCCs of the larynx and oral cavity is smoking. Oropharyngeal tumors are increasingly being associated with a history of infection with carcinogenic strains of HPV, particularly HPV-16. To a lesser extent, this association has also been observed with HPV-18 and other strains (22). The overall prevalence of oropharyngeal HPV in healthy adults in the United States of America and in Europe was reported to be between 3.6% and 6.8% in females and between 6.6% and 15.0% in males (23). The highest oral HPV prevalence was described in South America and the lowest in Asia, at 12.4% and 2.6%, respectively (24). Chat-Generative Pre-Trained Transformer stated that the prevalence of oral HPV is around 7%, that of oral HPV-16 around 1%, and that of HPV-related infections around 1% per year. This information does not align with the existing literature on the subject. Furthermore, ChatGPT stated that HPV-related infections may be temporary and may clear naturally within six to twelve months. This information conflicts with what is published in the literature (25).

Men who have sex with men, HIV‐positive status, sexual history (number of lifetime sex partners, number of recent oral or anal sex partners, age of sexual debut), and smoking are associated with adult HPV infection (26). However, ChatGPT noted that lack of male circumcision, other sexually transmitted infections (STIs), and long-term oral contraceptive use are risk factors. The use of oral contraceptives showed an independent association with HPV16-18 infection rates (27). Nevertheless, no clear association exists between HPV infection and circumcision. Additionally, a history of STIs can affect penile, cervical, or vaginal infections.

HPV is transmitted through sexual contact; from mother to fetus; through skin contact (e.g., via the hands or contact with underwear or other inanimate objects); and by high-temperature evaporation treatment. Visible warts can be treated with physical therapy or surgery. The recurrence rate of subclinical infections caused by warts up to 1 cm can be reduced by applying laser therapy, cryotherapy, topical imiquimod, and photodynamic therapy (25). Chat-Generative Pre-Trained Transformer stated that autoinoculation, which is considered a form of skin-to-skin transmission, is a distinct cause of transmission. The information provided by ChatGPT is accurate but insufficient. Furthermore, it noted that indirect contact was very rare and clinically insignificant. Additionally, ChatGPT noted that there is no treatment for asymptomatic HPV infections and that vaccination and surveillance are sufficient. These recommendations do not align with the existing literature on the subject.

Vaccination is the most practical and cost-effective method for avoiding HPV-related health issues. HPV vaccination has been shown to prevent more than 90% of HPV-related cancers (28). In a different study, HPV vaccination was found to be associated with a reduced incidence of several types of cancer among females aged between 9 and 26 years (29). Chat-Generative Pre-Trained Transformer recommended vaccination by age and risk group; this recommendation aligns with the literature on the subject. Furthermore, it emphasized the significance of HPV vaccination and the adoption of safe sex practices, including limiting the number of sexual partners, using condoms, and enhancing immune function through smoking cessation.

The clearance time for HPV has been reported as less than 4 weeks (±4 weeks) in the short term and 12 months (±6 months) in the long term (25). Albero et al. (30) concluded that the time to clearance of HPV ranged from 1.3 to 42.1 months. According to ChatGPT, the median clearance time for HPV is between 6 and 18 months. HPV infections usually clear spontaneously within one to two years in immunocompetent individuals. ChatGPT also stated that persistence may last 12-24 months or longer, particularly for high-risk types.

Study Limitations

This study was conducted with only the ChatGPT 4.0 model, and other AI-based models were not included in the evaluation. The accuracy of the responses was assessed by a limited number of physicians, thereby restricting the diversity of perspectives. Only three reviewers, physicians from overlapping specialties, were included. The inclusion of a larger group of experts could have increased the reliability of the evaluations. Future studies involving multiple AI models and a broader range of physicians may strengthen the generalizability of the findings, particularly by including diverse specialties and practice settings to ensure a more comprehensive evaluation of the AI’s effectiveness across different patient populations. Despite these limitations, the findings suggest that patients can obtain preliminary information from ChatGPT, which the reviewers rated as highly accurate, during the interval between deciding to seek medical attention and visiting a physician.

Conclusion

Chat-Generative Pre-Trained Transformer 4.0 demonstrated high efficacy in providing general information about HPV, with an 81.6% accuracy rate and a 90% near-accuracy rate. Despite the evident potential demonstrated by ChatGPT, these findings should be regarded as preliminary indications of its promise, particularly in research related to human health. Furthermore, these results should not be construed as validation of ChatGPT’s clinical adequacy. Incorporating AI tools into the facilitation of patient access to information could enhance learning processes. However, it is essential that these tools be continuously refined and utilized to complement, rather than substitute for, the critical judgment of medical professionals and patients.

Ethics

Ethics Committee Approval: The study did not require approval from the institutional review board or ethics committee because no patient data was used.
Informed Consent: Since this study is based on responses generated by artificial intelligence, patient consent was not required.

Authorship Contributions

Concept: H.C.A., Design: L.T., Data Collection or Processing: M.Y., Analysis or Interpretation: H.C.A., Literature Search: M.D., E.K.N., I.D., Writing: H.C.A., M.Y.
Conflict of Interest: No conflict of interest was declared by the authors.
Financial Disclosure: The authors declared that this study received no financial support.

References

1
Khan I, Harshithkumar R, More A, Mukherjee A. Human papilloma virus: an unraveled enigma of universal burden of malignancies. Pathogens. 2023;12:564.
2
Szymonowicz KA, Chen J. Biological and clinical aspects of HPV-related cancers. Cancer Biol Med. 2020;17:864-78.
3
Dominiak-Felden G, Cohet C, Atrux-Tallau S, Gilet H, Tristram A, Fiander A. Impact of human papillomavirus-related genital diseases on quality of life and psychosocial wellbeing: results of an observational, health-related quality of life study in the UK. BMC Public Health. 2013;13:1065.
4
Kohli M, Bunker CB, Kravvas G. Human papillomavirus: an update. Clin Dermatol. 2026;44:54-66.
5
Cao X, Wei R, Zhang X, Zhou J, Lou J, Cui Y. Impact of human papillomavirus infection in semen on sperm progressive motility in infertile men: a systematic review and meta-analysis. Reprod Biol Endocrinol. 2020;18:38.
6
Wang QH, Ye JJ, Chen ZY, et al. Current risk factors for male infertility and semen parameters: an umbrella review of systematic reviews and meta-analyses. Asian J Androl. 2026.
7
Gilson A, Safranek CW, Huang T, et al. How does ChatGPT perform on the United States Medical Licensing Examination (USMLE)? The implications of large language models for medical education and knowledge assessment. JMIR Med Educ. 2023;9:e45312. Erratum in: JMIR Med Educ. 2024;10:e57594.
8
Emile SH, Horesh N, Garoufalia Z, Gefen R, Boutros M, Wexner SD. Assessment of the utility of artificial intelligence-based chatbots in patient education: a systematic review and meta-analysis. Am Surg. 2026;92:258-69.
9
Pandya S, Alessandri Bonetti M, Liu HY, Jeong T, Ziembicki JA, Egro FM. Burn patient education in the modern age: a comparative analysis of ChatGPT and Google performance answering common questions on burn injury and management. J Burn Care Res. 2025;46:533-41.
10
Lanzafame LRM, Gulli C, Mazziotti S, et al. Chatbots in radiology: current applications, limitations and future directions of ChatGPT in medical imaging. Diagnostics (Basel). 2025;15:1635.
11
Zhu Y, Luo D, Shen X, et al. Application of ChatGPT-based artificial intelligence in the diagnosis and management of polycystic ovary syndrome. BMC Med Inform Decis Mak. 2025;25:271.
12
von Elm E, Altman DG, Egger M, Pocock SJ, Gøtzsche PC, Vandenbroucke JP; STROBE Initiative. The strengthening the reporting of observational studies in epidemiology (STROBE) statement: guidelines for reporting observational studies. J Clin Epidemiol. 2008;61:344-9.
13
Molena KF, Macedo AP, Ijaz A, et al. Assessing the accuracy, completeness, and reliability of artificial intelligence-generated responses in dentistry: a pilot study evaluating the ChatGPT model. Cureus. 2024;16:e65658.
14
Fageeh HI, Fageeh HN, Bhati AK, et al. Assessing the reliability of Miller’s classification and Cairo’s classification in classifying gingival recession defects: a comparison study. Medicina (Kaunas). 2024;60:205.
15
Surveillance, Epidemiology, and End Results Program. Cancer Stat Facts: Anal cancer. 2025. Available from: https://seer.cancer.gov/statfacts/html/anus.html. Accessed March 3, 2025.
16
Rozemeijer K, Dias Gonçalves Lima F, Kuyvenhoven EJ, et al. Swab-based anal cancer screening in men living with HIV: projected outcomes for different screening algorithms. Int J Cancer. 2025;157:2259-68.
17
Araujo ROC, Valadão M, Silva JADDCE, et al. Implementation of a screening program for high-grade anal dysplasia in high-risk patients in a tertiary cancer center. J Surg Oncol. 2025;131:151-9.
18
Ebrahimi F, Rasizadeh R, Jafari S, Baghi HB. Prevalence of HPV in anal cancer: exploring the role of infection and inflammation. Infect Agent Cancer. 2024;19:63.
19
Intan NS, Utama R, Wulandari D, et al. Urinary detection of high-risk HPV DNA to enhance cervical cancer screening in developing countries. Microbiol Spectr. 2025;13:e0193824.
20
Priam A, Bozec AL, Meireles VD, et al. Human papillomavirus carriage in the semen of men consulting for infertility: prevalence and correlations with sperm characteristics. Asian J Androl. 2025;27:196-203.
21
Kristensen TS, Foldager A, Laursen ASD, Mikkelsen EM. Sexually transmitted infections (Chlamydia trachomatis, genital HSV, and HPV) and female fertility: a scoping review. Sex Reprod Healthc. 2025;43:101067.
22
Johnson DE, Burtness B, Leemans CR, Lui VWY, Bauman JE, Grandis JR. Head and neck squamous cell carcinoma. Nat Rev Dis Primers. 2020;6:92. Erratum in: Nat Rev Dis Primers. 2023;9:4.
23
Alemany L, Felsher M, Giuliano AR, et al. Oral human papillomavirus (HPV) prevalence and genotyping among healthy adult populations in the United States and Europe: results from the PROGRESS (PRevalence of Oral hpv infection, a Global aSSessment) study. EClinicalMedicine. 2025;79:103018.
24
Yu S, Zhu Y, He H, et al. Prevalence and risk factors of oral human papillomavirus infection among 4212 healthy adults in Hebei, China. BMC Infect Dis. 2023;23:773.
25
Zhu P, Qi RQ, Yang Y, et al. Clinical guideline for the diagnosis and treatment of cutaneous warts (2022). J Evid Based Med. 2022;15:284-301.
26
Del Pino M, Vorsters A, Joura EA, et al. Risk factors for human papillomavirus infection and disease: a targeted literature summary. J Med Virol. 2024;96:e29420.
27
Li L, Wu H, Chen Y, et al. Influence of pre-vaccination HPV status on vaccine effectiveness among Chinese women: a multicenter cross-sectional study. Cancer Rep (Hoboken). 2025;8:e70294.
28
Stuart R, Theopold N, Miall N, et al. The role of HPV single-dose vaccination in expanding access in GAVI-supported countries during a period of supply constraints. Vaccine. 2026;75:128187.
29
Hung YM, Lin TT, Wang SI, Wu PJ, Chang R, Wei JC. HPV vaccination is associated with lower risk of cancers among females. Am J Med. 2026;139:311-20.e12.
30
Albero G, Castellsagué X, Giuliano AR, Bosch FX. Male circumcision and genital human papillomavirus: a systematic review and meta-analysis. Sex Transm Dis. 2012;39:104-13.