A prospective protocol was developed to assess the impact of QUADAS on ten systematic reviews performed over the period 2004-2007. These systematic reviews were selected because they were all performed by the authors according to prospective protocols and recommended methodology, with prospective assessment of methodological quality using the QUADAS checklist, so that uniform assessment could be ensured. We included reviews of minimally invasive and non-invasive tests to determine lymph node status in gynaecological cancers [5–7] and reviews of Down's serum screening markers and uterine artery Doppler to predict small for gestational age fetuses in obstetrics [8, 9]. The checklist was also tailored to take into account the nature of each review, e.g. the nature of the index test (the tailored checklists are available as appendices to the published reviews). We addressed the following questions: What is the quality of studies in these fields? Is there a difference in quality between studies in Obstetrics and Gynaecology? Did the introduction of QUADAS improve quality? Does study size correlate with quality? Is there a geographical pattern to quality? Is there a relationship between compliance with STARD and QUADAS? Which quality items are associated with bias?
The QUADAS checklist was applied to each of the studies included in the reviews, with each quality item recorded as present, absent, unclear or not applicable (additional file 1). All studies were assessed in duplicate by TJS and RKM; disagreements were resolved by consensus with a third reviewer (KSK). All studies were also assessed for reporting quality using the STARD checklist. Results of individual studies were summarized in two by two tables, from which the DOR was calculated as a measure of diagnostic accuracy. The DOR is the odds of a positive result in a diseased person relative to the odds of a positive result in a non-diseased person. Where a two by two table contained a zero cell, 0.5 was added to the cells to enable calculation of the DOR. Where several tests had been applied to the same patients, the results including the largest number of patients were used in this study or, where there was no difference, one index test was selected at random. This ensured that patients were only included once.
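The DOR calculation described above can be sketched as follows. This is a minimal illustration only: the function name and argument names are hypothetical, and it assumes that 0.5 is added to all four cells whenever any cell is zero, which the text does not state explicitly.

```python
def diagnostic_odds_ratio(tp, fp, fn, tn):
    """Diagnostic odds ratio from a two by two table.

    tp, fn: diseased patients with positive / negative index test results
    fp, tn: non-diseased patients with positive / negative index test results
    """
    cells = [tp, fp, fn, tn]
    if any(c == 0 for c in cells):
        # Assumption: the 0.5 correction is applied to every cell of the
        # table, not only to the zero cell.
        tp, fp, fn, tn = (c + 0.5 for c in cells)
    # Odds of a positive result in the diseased (tp / fn) divided by the
    # odds of a positive result in the non-diseased (fp / tn).
    return (tp * tn) / (fp * fn)


# Made-up counts for illustration only.
print(diagnostic_odds_ratio(40, 10, 10, 40))
print(diagnostic_odds_ratio(20, 0, 5, 30))
```

The second call has a zero cell, so the correction is applied before the ratio is taken.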
The percentage compliance of studies with QUADAS items was compared between the two specialties, and between studies published before and after the introduction of QUADAS, using the unpaired t test; the latter comparison assessed the effect of QUADAS on the methodological quality of studies. As QUADAS was published in 2003, the assumption was made that all studies published before 2005 were published without the benefit of this guidance.
We examined the relationship between sample size and compliance with QUADAS using Spearman's rank correlation coefficient (Rho). The Kruskal-Wallis test was used to investigate any relationship between geographical distribution and reporting quality. The country of origin of a study was determined by the country of the corresponding author. Where a significant result was found, pairwise comparisons were made using the Conover-Inman procedure. Countries were grouped depending on the number of articles published and the mean journal impact factor, adjusted for gross domestic product and population, based on a previous publication. Where there was a large disparity in the number of studies per geographical area, some studies were regrouped to avoid large differences in group size and potentially spurious results. For obstetric reviews the geographical areas were Oceania, USA, Canada, Asia, Japan, Africa, Eastern Europe and Western Europe. For gynaecology reviews there were no studies from Oceania or Canada, but Latin America was added.
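The rank correlation between sample size and QUADAS compliance can be illustrated with a short standard-library sketch. The helper names and the example data are hypothetical, and the tie handling (average ranks) is an assumption about the software used rather than something stated in the text.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical data: study sample sizes and percentage QUADAS compliance.
sample_sizes = [35, 120, 58, 410, 77]
compliance_pct = [50.0, 71.4, 57.1, 78.6, 64.3]
print(spearman_rho(sample_sizes, compliance_pct))
```

Because the statistic depends only on ranks, it is insensitive to the skewed distribution that study sample sizes typically show.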
If the standard of reporting of a study is poor, this can limit the assessment of the quality of its design. To investigate the relationship between reporting and methodological quality, the studies' compliance with STARD and QUADAS was compared using Spearman's rank correlation coefficient. The difference in compliance with the two checklists between obstetrics and gynaecology was assessed using the unpaired t test.
The final analysis was a meta-regression to assess which quality items were associated with bias. Multiple logistic regression models were fitted to test the effect of individual QUADAS quality items on diagnostic accuracy, measured as the diagnostic odds ratio (DOR). This methodology has been used successfully to demonstrate empirically the effect of bias related to methodological flaws in clinical trials [26–28] and in diagnostic studies. The dependent variable in each logistic model was a binary variable representing disease status (diseased versus non-diseased) from each meta-analysis. The independent variables were a variable representing test threshold (i.e. the sum of the logits of sensitivity and 1-specificity); a binary variable for test result (positive versus negative); indicator variables to control for the effect of the primary studies; and the "QUADAS item (dichotomized as Yes versus all other) by test result" interaction terms, used to analyze the association of each item with estimates of diagnostic accuracy. The estimated effect of a quality characteristic on average diagnostic accuracy is given by the coefficient of this interaction term, whose exponentiation gives the diagnostic performance (DOR) of studies failing to satisfy the methodological criterion relative to the performance in studies with that feature. This is the Relative Diagnostic Odds Ratio (RDOR). If this ratio is greater than 1, then studies without that feature overestimate diagnostic performance compared with studies with that feature. Only meta-analyses that contained studies both with and without the characteristic could contribute to this estimate. We used the RDOR as the summary measure of accuracy and dependent variable in the analyses as it is useful as a single indicator of test performance.
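For a single quality item, the model described above can be written schematically as follows. The notation is introduced here for illustration and is not taken from the original analysis: $D_{ij}$ is the disease status of patient $j$ in study $i$, $T_{ij}$ the test result (1 = positive), $S_i$ the threshold variable, $\alpha_i$ the study indicator effects, and $Q_i$ the dichotomized quality item. $Q_i$ is assumed here to equal 1 when study $i$ does not satisfy the item, a coding chosen to match the stated direction of the RDOR.

```latex
\operatorname{logit}\,\Pr(D_{ij} = 1)
  = \alpha_i + \beta\,T_{ij} + \gamma\,S_i + \delta\,(Q_i \times T_{ij}),
\qquad
\mathrm{RDOR} = e^{\delta}
```

Under this sketch, $e^{\beta}$ is the DOR in studies that satisfy the item ($Q_i = 0$) and $e^{\beta + \delta}$ is the DOR in studies that do not, so their ratio is $e^{\delta}$.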
In the initial analysis, quality items coded as unclear or not applicable were excluded. For all of the above analyses, because it was uncertain whether items coded as unclear represented methodological failure, a sensitivity analysis was performed in which unclear was removed as a separate code and added to the not reported group for all comparisons. Similarly, a sensitivity analysis was performed to assess the effect of items assessed as not applicable: these were initially excluded from the analysis and then added as if they were reported, i.e. "yes", so as not to penalise studies that had a larger number of not applicable items and would therefore potentially have a seemingly lower compliance with QUADAS.
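The effect of these coding choices on a study's percentage compliance can be sketched as below. The function, its parameters and the code labels ("yes", "no", "unclear", "na") are hypothetical names used for illustration, and the example item codes are made up.

```python
def compliance(codes, unclear_as="exclude", na_as="exclude"):
    """Percentage of checklist items met under a given coding scheme.

    codes:      item codes for one study: "yes", "no", "unclear" or "na"
    unclear_as: "exclude" (initial analysis) or "no" (sensitivity analysis)
    na_as:      "exclude" (initial analysis) or "yes" (sensitivity analysis)
    """
    recode = {"unclear": unclear_as, "na": na_as}
    kept = []
    for code in codes:
        code = recode.get(code, code)  # "yes" and "no" pass through unchanged
        if code != "exclude":
            kept.append(code)
    if not kept:
        return None  # no assessable items under this scheme
    return 100.0 * kept.count("yes") / len(kept)


# Hypothetical item codes for one study.
study = ["yes", "yes", "no", "unclear", "na"]
print(compliance(study))                   # initial analysis
print(compliance(study, unclear_as="no"))  # unclear treated as not reported
print(compliance(study, na_as="yes"))      # not applicable treated as met
```

Comparing the three values for each study shows how far the conclusions depend on the handling of unclear and not applicable items.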