Main findings
Our findings indicate large variability in TZ assessment among VIA experts, with only fair inter-observer agreement in both rounds (kappa 0.313 and 0.288). TZ classification in clinical practice therefore appears to have low reliability and to be subject to wide variation in interpretation. This suggests that TZ assessment is challenging to interpret and reproduce, with TZ2 showing the highest heterogeneity.
Interpretation
In a colposcopy context with three reviewers, Vallikad et al. [17] reported higher inter-observer (kappa 0.53–0.66) and intra-observer (kappa 0.60–0.86) agreement for TZ type classification than observed in our study. As in our findings, however, the lowest agreement between observers was found for TZ2 [17, 18].
In real-life conditions, manipulation of the cervix to differentiate TZ2 from TZ3 might reduce TZ2 heterogeneity. An exploratory analysis was therefore performed combining images classified as [TZ2 or TZ3] versus TZ1 (Additional file 1). Our results showed that inter- and intra-observer agreement improved when TZ2 was combined with TZ3. In contrast, because TZ1 is fully ectocervical, its interpretation is not expected to depend on cervical manipulation. Nevertheless, combining TZ1 with TZ2 also improved intra- and inter-observer agreement, suggesting that the increase in agreement in both cases of combined TZ types may be due in part to the lower number of categories being compared. Furthermore, despite the improved kappa after combining TZ2 and TZ3, overall agreement remained relatively low (only 10% of inter-observer comparisons showed substantial agreement across both rounds), supporting the hypothesis that even in real-life conditions, the level of heterogeneity in the interpretation of TZ remains significant. Further studies should confirm this by assessing TZ agreement based on on-site interpretation with the possibility of manipulating the cervix.
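The effect of merging categories on kappa can be illustrated with a minimal sketch. It assumes Cohen's kappa for a single pair of observers, which may differ from the statistic used in this study, and the ratings below are hypothetical values invented for illustration, not study data. The function names `cohen_kappa` and `merge_categories` are likewise illustrative.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed proportion of items on which both raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


def merge_categories(ratings, mapping):
    """Relabel ratings according to mapping; unmapped labels are kept."""
    return [mapping.get(r, r) for r in ratings]


# Hypothetical ratings of ten images by two observers (not study data).
observer_1 = ["TZ1", "TZ1", "TZ1", "TZ2", "TZ2", "TZ2", "TZ3", "TZ3", "TZ3", "TZ3"]
observer_2 = ["TZ1", "TZ1", "TZ2", "TZ2", "TZ3", "TZ3", "TZ3", "TZ3", "TZ2", "TZ1"]

# Three-category comparison: TZ1 vs TZ2 vs TZ3.
kappa_three = cohen_kappa(observer_1, observer_2)

# Two-category comparison: TZ1 vs [TZ2 or TZ3].
merged = {"TZ2": "TZ2/3", "TZ3": "TZ2/3"}
kappa_two = cohen_kappa(
    merge_categories(observer_1, merged),
    merge_categories(observer_2, merged),
)

print(f"three categories: {kappa_three:.3f}")
print(f"two categories:   {kappa_two:.3f}")
```

With these made-up ratings, kappa rises from about 0.24 to about 0.52 after merging, because the TZ2 versus TZ3 disagreements no longer count as disagreements. This shows only that merging can raise kappa, not that it always does, which is why the remaining disagreement after merging is still informative.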
The IFCPC TZ classification was primarily developed to improve colposcopy reporting and to define the type of excisional therapy (generally LLETZ) indicated in cases of precancerous lesions [12]. In high-income countries, the current diagnostic procedures at colposcopy are cervical biopsy in cases of TZ1 or TZ2 and endocervical curettage (ECC) for TZ3 to obtain fragments of squamous epithelium from inside the cervical canal. However, in low-resource settings, these procedures (colposcopy, biopsy, LLETZ) are often not readily available and are not feasible in a “screen-and-treat” approach, because they require multiple visits with referral for further evaluation.
Reducing the number of clinical visits is a strategy recommended by the WHO in LMICs because it increases compliance and follow-up while reducing program costs [2]. In this context, the TZ classification endorsed by the WHO should help clinicians to determine which patients can be safely evaluated with VIA and treated by ablation, and which are inadequately evaluated by VIA and require referral for additional management [2, 19]. The TZ3 prevalence observed in our population was 26.6% (Table 4, first round), indicating that a significant number of women may require referral and additional investigation. In the literature from high-income countries, wide variation in TZ3 prevalence has been reported, ranging from 16.3% to 80% [17, 18, 19]. In low-resource contexts, the front-line provider’s decision to refer women with TZ3 has important consequences for both the women and the health care system, with notable impacts on logistics and service delivery, as referral requires additional time, equipment, financial resources, and transportation.
Strengths and limitations
The main limitation of this study is that the observers were aware that their interpretations were not used for clinical decision-making; therefore, the results may not fully reflect real-life practice.
There are also some strengths to highlight. In this study, the cases were not selected other than for image quality. Images were presented in a consecutive order, with cases corresponding to a real-life distribution of TZ types in a routine screening setting. Furthermore, the TZ interpretation was performed by international experts with extensive experience in VIA.
Practical and research recommendations
Considering the heterogeneity of TZ interpretation and its consequences on patient management, the importance of long-term follow-up of HPV-positive patients should be emphasized to make up for potentially missed diagnoses or inappropriate treatment. These considerations should be integrated in the initial and continuous training of health care providers practicing VIA and treatment of precancerous cervical lesions.
Further investigations are needed to optimise the management of TZ3 in low-resource contexts and to reduce the variability in TZ3 interpretation. Surrogate markers that may help to stratify the risk of HPV-positive women with TZ3 and determine who can be safely offered conservative management should also be investigated. In addition, the recent development of an artificial intelligence algorithm might assist front-line providers, not only in detecting precancerous lesions, but also in defining participants who are eligible for treatment [20].