  • Research article
  • Open access

The temporal reliability of serum estrogens, progesterone, gonadotropins, SHBG and urinary estrogen and progesterone metabolites in premenopausal women



There is little existing research to guide researchers in estimating the minimum number of measurement occasions required to obtain reliable estimates of serum estrogens, progesterone, gonadotropins, sex hormone-binding globulin (SHBG), and urinary estrogen and progesterone metabolites in premenopausal women.


Using data from a longitudinal study of 34 women with a mean age of 42.3 years (SD = 2.6), we calculated the minimum number of measurement occasions required to obtain reliable estimates of 12 analytes (8 in blood, 4 in urine). Five samples were obtained over 1 year: at baseline, and after 1, 3, 6, and 12 months. We also calculated the percent of true variance accounted for by a single measurement and intraclass correlation coefficients (ICC) between measurement occasions.


Only 2 of the 12 analytes we examined, SHBG and estrone sulfate (E1S), could be adequately estimated by a single measurement using a minimum reliability standard of having the potential to account for 64% of true variance. Other analytes required from 2 to 12 occasions to account for 81% of the true variance, and 2 to 5 occasions to account for 64% of true variance. ICCs ranged from 0.33 for estradiol (E2) to 0.88 for SHBG. Percent of true variance accounted for by single measurements ranged from 29% for luteinizing hormone (LH) to 92% for SHBG.


Experimental designs that take the natural variability of these analytes into account by obtaining measurements on a sufficient number of occasions will be rewarded with increased power and accuracy.



Several active research programs are investigating the risk associated with serum estrogens, gonadotropins and urinary sex hormone metabolites for a variety of diseases, including breast cancer [1], endometrial cancer [2], and osteoporosis [3]. The few published studies suggest that the natural temporal variability (true variation over time, not variation due to storage or other factors) of some serum estrogens, gonadotropins and urinary sex hormone metabolites is great enough that a single measurement occasion may not yield a reliable estimate [4–6]. Published intraclass correlation coefficients (ICCs) range from 0.06 to 0.62 for estradiol (E2) and from 0.52 to 0.69 for estrone (E1) [4]. Only the percentages of free E2 and of SHBG-bound E2 have been found reliable enough to account for as much as 50% of the variance in the true mean (ICC > 0.7).

The term reliability can refer either to the consistency of a measuring procedure or to the temporal stability of the target of measurement [7]. The definition of temporal reliability used in this study includes both dimensions but emphasizes the latter. Researchers can reduce error due to insufficient repeated measures by increasing the number of measurement occasions, but obtaining measurements is expensive. It is therefore useful to have evidence-based guidelines for estimating the minimum number of occasions required to obtain a given degree of reliability for a particular analyte.

All types of measurement error distort, confound, or attenuate the tests of association that constitute one of the primary products of research [8, 9]. Figure 1, though not exhaustive, shows the sources of variance in a measurement and the interrelationships between error and tests of model fit or significance.

Figure 1 Total observed variance

The relation of a measurement to the object being measured can be represented as

$$\sigma_O^2 = \sigma_T^2 + \sigma_E^2$$

where σ_O² is the variance in the observed measurement of the target, σ_T² is the variance in the true value of the target, and σ_E² is random variance, or error. If the true value of the target is invariant across measurements (σ_T² = 0, so that σ_O² = σ_E²), the observed variance is purely a function of the unreliability of the measuring instrument. Conversely, if perfectly error-free measurement of the target could be assumed (σ_E² = 0), then σ_O² = σ_T² and the observed variance would be purely a function of the temporal stability of the target. If σ_E² ≠ 0 and σ_T² ≠ 0, the observed variance is a function of both the temporal stability of the target and the unreliability of the measuring instrument.
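
The additivity of variances in this decomposition can be checked with a small simulation. The sketch below is purely illustrative: the helper names, the normal distributions and the chosen standard deviations are assumptions, not values from this study. It draws independent true values and errors and compares the observed variance with the sum of the two component variances; setting the error SD to zero reproduces the error-free case, in which observed variance equals true variance.

```python
import math
import random


def pop_variance(values):
    """Population variance, computed with math.fsum for accuracy."""
    mean = math.fsum(values) / len(values)
    return math.fsum((v - mean) ** 2 for v in values) / len(values)


def simulate_variances(n, sd_true, sd_error, seed=1):
    """Simulate observed = true + error with independent normal components.

    Returns the population variances of the true values, the errors and the
    observed values. Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    true_vals = [rng.gauss(0.0, sd_true) for _ in range(n)]
    errors = [rng.gauss(0.0, sd_error) for _ in range(n)]
    observed = [t + e for t, e in zip(true_vals, errors)]
    return pop_variance(true_vals), pop_variance(errors), pop_variance(observed)


var_t, var_e, var_o = simulate_variances(100_000, sd_true=2.0, sd_error=1.0)
# var_o is close to var_t + var_e (about 4 + 1 = 5); any remaining gap equals
# twice the sample covariance between the simulated true values and errors.
print(round(var_o, 3), round(var_t + var_e, 3))
```

Because the true values and errors are drawn independently, their sample covariance is near zero, which is exactly the condition under which the variances add.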

Measurement error can arise from a variety of factors, including true variance not captured by a particular measurement strategy, and these factors may complicate the interpretation of temporal reliability estimates. They include variance due to fluctuations across phases of each woman's menstrual cycle [10]; duration of sample storage before analysis [11]; limitations of the assay; multiple analysis batches [10]; multiple types of assays [12]; and multiple laboratories [10]. Ideally, as many sources of error as possible should be estimated when considering the impact of temporal reliability on measurement strategy. The objective of this study was to determine, for various serum estrogens, gonadotropins, and urinary sex hormone metabolites: the minimum number of repeated measurements required for reliable estimates; the ICCs; and the amount of true variance accounted for by single measurements.


Experimental design

The data for this study come from a randomized, double-blind study investigating the effects of a 100 mg/day soy isoflavone regimen on estrogen levels in 34 premenopausal women. The study design and the results of the intervention have been described in detail by Maskarinec et al. [13]. The Committee on Human Studies at the University of Hawaii approved the study protocol, and written informed consent was obtained from each subject before participation. Each of the two study groups consisted of 17 premenopausal women. Four women left the study before the end of the year, and another was able to give only four blood draws for health reasons. Eligibility criteria included: age 35–46 years; an average intake of fewer than 7 servings of soy foods per week; no prior cancer diagnosis (except basal cell skin carcinoma); no use of oral contraceptives or hormone preparations within the past three months; no intention of becoming pregnant within the next year; an intact uterus and ovaries; self-defined regular menstrual periods; and no serious medical condition. Subjects had a mean age of 42.3 years (SD = 2.6) and a mean weight of 65.6 kg (SD = 12.8). Subjects were ethnically diverse: 18 were Caucasian, 6 Chinese, 5 Japanese, and 5 Hawaiian.

Sample collection

Subjects were asked to donate urine and blood samples on 5 occasions: at baseline and after 1, 3, 6, and 12 months of participation. All samples were collected approximately 5 days after ovulation (approximately day 19 of a 28-day cycle). Subjects used ovulation kits (Ovuquick test kits from Quidel, La Jolla, CA) to determine the time of ovulation. This kit detects the mid-cycle rise of LH in morning urine with a sensitivity of 35 mIU/mL of LH, and its predictive validity with respect to ovulation has been estimated as 93% [14]. Although a minimum progesterone value was used to exclude data from anovulatory cycles, which helped ensure that the analyzed samples came from the mid-luteal phase, only 52% of samples were obtained on exactly the 5th day after ovulation; 91% were obtained between the 4th and 6th days after ovulation. Blood samples were drawn at a commercial laboratory between 7 and 9 a.m. to control for circadian rhythms in hormone levels. Serum and urine samples were stored at -80°C after separation and aliquoting.

Serum analysis

Hormone assays were conducted in the Reproductive Endocrine Research Laboratory of the Department of Obstetrics and Gynecology, University of Southern California (Los Angeles, CA). The analyses for E2, free E2, E1, E1S, progesterone, SHBG, follicle-stimulating hormone (FSH), and LH were conducted in 2 batches: samples collected at baseline, month 1 and month 3 were analyzed in batch 1, and 6-month and 12-month samples were analyzed in batch 2 one year later. E2, E1, progesterone, FSH, LH, and SHBG were quantified in serum by specific and sensitive radioimmunoassays (RIAs). Before RIA, E1 and E2 were extracted with ethyl acetate:hexane (2:3) and then purified by Celite column partition chromatography, using ethylene glycol as the stationary phase [15]. E1 and E2 were eluted from the column with 15% and 40% toluene in isooctane, respectively. ³H-E1 and ³H-E2 were used as internal standards to follow procedural losses. FSH and LH levels were determined using an immunoradiometric assay (IRMA). E1S, progesterone and SHBG were measured by direct RIAs using kits obtained from Diagnostic Systems Laboratories, Webster, Texas. Free E2 (E2 bound to neither SHBG nor albumin) was determined by calculation using a computerized algorithm described previously [16]. For all analytes, most intra-assay CVs were below 10% (Table 1), indicating good quality control in the laboratory; they ranged from <0.5% for SHBG to 13.0% for E1 in the low concentration range of batch 1.

Table 1 Coefficients of variation for all analytes

Urine analysis

Urine samples were analyzed for estrone-3-glucuronide (E1-G), pregnanediol-3-glucuronide (PDG), 16α-hydroxyestrone (16α-OHE1) and 2-hydroxyestrone (2-OHE1). E1-G and PDG were measured directly in urine by enzyme immunoassay [17]. Commercially available enzyme-linked immunosorbent assay kits (Estramet: Immuna Care Corporation, Bethlehem, PA) were used to determine levels of 16α-OHE1 and 2-OHE1 in urine [18]. All urinary results are expressed relative to creatinine excretion.

Statistical analysis

The SAS statistical software package, version 8.2 (SAS Institute Inc., Cary, NC, 1999–2001), was used to perform the statistical analyses. When raw values were not normally distributed, all statistics were computed on logged values. To ensure that all measurements in the analysis came from the same phase of the menstrual cycle, observations were included only if the concurrent progesterone value was at least 5 ng/mL, the minimum value expected after ovulation has occurred. Because 8 of the 12 analytes were analyzed in two batches, we also considered error due to between-batch variance in our analysis of the temporal stability of these analytes. Estimates of temporal stability for these 8 analytes were therefore calculated for all samples together and for the first and second batches separately.
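
The inclusion rule and log transformation described above can be sketched as a simple filter. The record layout, the example values and the use of the natural logarithm are assumptions for illustration; the text states only the 5 ng/mL progesterone threshold and that logged values were used for non-normally distributed analytes.

```python
import math

# Threshold from the text: concurrent progesterone of at least 5 ng/mL.
MIN_PROGESTERONE_NG_ML = 5.0

# Hypothetical records: (subject_id, occasion_month, progesterone_ng_ml, analyte_value)
records = [
    ("S01", 0, 12.4, 148.0),
    ("S01", 1, 3.1, 95.0),    # excluded: progesterone below threshold
    ("S02", 0, 8.7, 210.0),
    ("S02", 1, 5.0, 180.0),   # included: the threshold is inclusive ("at least")
]


def luteal_log_values(rows, threshold=MIN_PROGESTERONE_NG_ML):
    """Keep rows meeting the progesterone threshold and log-transform the analyte.

    The natural log is an assumption; any log base gives the same ICCs and O_R,
    because rescaling all values by a constant cancels in variance ratios.
    """
    return [
        (subject, month, math.log(value))
        for subject, month, progesterone, value in rows
        if progesterone >= threshold
    ]


for row in luteal_log_values(records):
    print(row)
```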

Two types of estimates of the number of measurement occasions (O) necessary to obtain an adequately reliable estimate were computed. The first, the relative type (O_R), includes the between-subject variance. O_R was computed using the formula proposed by Nelson et al. [19]:

$$O_R = \frac{r^2}{1 - r^2} \cdot \frac{s_W^2}{s_B^2}$$

where r is the correlation between the observed and the true mean analyte values for an individual over a year, s_W² is the within-subject variance, and s_B² is the between-subject variance. Setting r to 0.9 gives the number of measurement occasions required to obtain an estimate that would account for 0.9², or 81%, of the true variance in the target. Ninety-five percent confidence intervals (95% CIs) for O_R were computed using a published method [20].
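
As a minimal sketch, the Nelson et al. formula translates directly into a function. The function name and the example variances are hypothetical, and the confidence-interval method of [20] is not reproduced here.

```python
def relative_occasions(r, s2_within, s2_between):
    """Occasions O_R = r^2 / (1 - r^2) * s_W^2 / s_B^2 (Nelson et al. form).

    r          -- target correlation between observed and true individual means
    s2_within  -- within-subject variance, s_W^2
    s2_between -- between-subject variance, s_B^2
    """
    if not 0.0 < r < 1.0:
        raise ValueError("r must lie strictly between 0 and 1")
    if s2_between <= 0.0:
        raise ValueError("between-subject variance must be positive")
    return (r ** 2 / (1.0 - r ** 2)) * (s2_within / s2_between)


# Hypothetical analyte whose within- and between-subject variances are equal:
# about 4.26 occasions are needed for r = 0.9 (81% of true variance)
# and about 1.78 for r = 0.8 (64% of true variance).
print(round(relative_occasions(0.9, 1.0, 1.0), 2))
print(round(relative_occasions(0.8, 1.0, 1.0), 2))
```

Only the ratio of the two variances matters, so an analyte whose within-subject variance is twice its between-subject variance needs twice as many occasions for the same r.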

The second estimate of the number of measurement occasions necessary to obtain an adequately reliable estimate, the absolute type (O_A), includes only within-subject variance. O_A was calculated from σ_w, the within-subject variance, scaled by a denominator that expresses the desired closeness to the true mean [21]. Adjusting this denominator allows the desired approximation to the true mean to be specified as a percentage: setting it to 0.2 gives the number of occasions required to obtain an estimate within 20% of the true mean. A SAS macro using Proc Varcomp and Proc Means to produce estimates of O_R, O_A, and related statistics is available from the authors.
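
The exact expression used for O_A is not given in this text. The sketch below assumes one common form of this kind of calculation, O_A = (z · CV_w / D_0)², where CV_w is the within-subject coefficient of variation, D_0 is the desired closeness to the true mean expressed as a proportion (0.2 for 20%), and z is a two-sided normal quantile. Whether the authors' version includes z, and whether σ_w entered as a variance, a standard deviation or a coefficient of variation, are assumptions here, so numbers from this sketch should not be compared directly with Table 3.

```python
from statistics import NormalDist


def absolute_occasions(cv_within, d0=0.20, confidence=0.95):
    """ASSUMED form O_A = (z * CV_w / D0)^2; not necessarily the authors' exact formula.

    cv_within  -- within-subject coefficient of variation (SD / mean)
    d0         -- desired closeness to the true mean, as a proportion
    confidence -- two-sided confidence level used for the normal quantile z
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # about 1.96 for 95%
    return (z * cv_within / d0) ** 2


# With CV_w equal to D0 the result is z^2, about 3.84 occasions.
print(round(absolute_occasions(0.20), 2))
# Halving D0 (within 10% instead of 20%) quadruples the required occasions.
print(round(absolute_occasions(0.20, d0=0.10) / absolute_occasions(0.20), 2))
```

Under this assumed form the between-subject variance never appears, which is why O_A and O_R can rank analytes very differently.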

ICCs measure the proportion of variance attributable to the targets of measurement, expressed as the ratio of between-subject variance to total variance [22], and are suitable for comparing variables of the same measurement class [23]. We computed two types of ICCs using the notation developed by Shrout and Fleiss [22]: ICC(2,1) was computed for each analyte using all 5 measurement occasions to estimate the temporal reliability of the analyte, and ICC(2,k) was computed between batches to estimate the contribution of between-batch variance to the temporal reliability estimate. ICC(2,1) was computed as

$$\mathrm{ICC}(2,1) = \frac{BMS - EMS}{BMS + (k - 1)\,EMS + k\,(OMS - EMS)/n}$$

where BMS is the between-subjects mean square, EMS is the error mean square, OMS is the observations mean square, k is the number of observations per subject, and n is the number of subjects [22]. ICC(2,k) was computed as

$$\mathrm{ICC}(2,k) = \frac{BMS - EMS}{BMS + (OMS - EMS)/n}$$

We applied the formulas of Shrout and Fleiss [22] to obtain 95% CIs.
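
A minimal sketch of these two formulas, assuming a complete subjects × occasions matrix with no missing values (the study itself had some missing draws, which this sketch does not handle). The mean squares come from a two-way ANOVA without replication, and the 6 × 4 matrix is illustrative, not study data. For the between-batch ICC(2,k), the columns would instead hold one value per batch for each subject (for example, each subject's mean within that batch, which is an assumption), so that k = 2.

```python
def icc2(data):
    """Shrout-Fleiss ICC(2,1) and ICC(2,k) for a complete n x k matrix.

    Rows are subjects; columns are occasions (or batches). Uses the mean
    squares of a two-way ANOVA without replication.
    """
    n, k = len(data), len(data[0])
    if n < 2 or k < 2 or any(len(row) != k for row in data):
        raise ValueError("need a complete matrix with at least 2 rows and 2 columns")

    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    bms = ss_rows / (n - 1)               # between-subjects mean square
    oms = ss_cols / (k - 1)               # observations (occasions) mean square
    ems = ss_error / ((n - 1) * (k - 1))  # error (residual) mean square

    icc_single = (bms - ems) / (bms + (k - 1) * ems + k * (oms - ems) / n)
    icc_average = (bms - ems) / (bms + (oms - ems) / n)
    return icc_single, icc_average


# Illustrative 6 subjects x 4 occasions (not study data)
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
single, average = icc2(example)
print(round(single, 2), round(average, 2))
```

For this matrix the sketch gives about 0.29 for ICC(2,1) and 0.62 for ICC(2,k), illustrating why an ICC based on means is typically higher than one based on single values.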

To estimate the percentage of true variance accounted for by a single measurement, we assumed that the best available estimate of the true variance was the total variance for all occasions.

After calculating the Pearson correlation of each occasion with all other occasions, we took the square of the average of these correlations as the estimate of the most likely percentage of true variance for which a single occasion could account, using the formula

$$\%\sigma_T = 100 \times \left(\frac{1}{o}\sum_{i=1}^{o} r_{T,i}\right)^2$$

where %σ_T is the percent of true variance, r_{T,i} is the Pearson correlation of occasion i with the total of all other occasions, and o is the number of occasions.
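
The calculation above can be sketched as follows, assuming complete data with one row per subject and one column per occasion. The helper names are hypothetical, and the correlation is computed with a hand-written Pearson function to keep the sketch free of dependencies.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def percent_true_variance(data):
    """Square of the mean correlation of each occasion with the total of the others, in percent.

    data: complete matrix, one row per subject, one column per occasion.
    """
    o = len(data[0])
    correlations = []
    for j in range(o):
        occasion = [row[j] for row in data]
        rest_total = [sum(row) - row[j] for row in data]
        correlations.append(pearson(occasion, rest_total))
    return 100.0 * (sum(correlations) / o) ** 2


# Illustrative matrix (subjects x occasions), not study data
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(percent_true_variance(example), 1))
```

If every subject's values were identical across occasions, each occasion would correlate perfectly with the total of the others and the sketch would return 100%.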


Overall means, numbers of samples, and means by measurement occasion for all analytes (Table 2) indicate the overall stability of the analytes over one year. Although estrogen and progesterone levels were on average 7% higher, and gonadotropins and urinary sex hormone metabolites 10% lower, in the intervention group than in the control group (data not shown), none of the differences came close to statistical significance (p values ranged from 0.16 for E1S to 0.90 for E1). Because of this homogeneity, results in this study were pooled across experimental groups. The decreases in E2 and E1 resulted from laboratory drift and were independent of intervention status [13].

Table 2 Basic descriptive data for all measurement occasions of all analytes

The number of measurement occasions required to obtain a reliable estimate differed considerably by analyte (Table 3). Using the relative method to account for 81% of the true variance, the number of occasions required ranged from O_R = 0.48 for SHBG to O_R = 11.43 for E1. To account for 64% of the true variance, it ranged from O_R = 0.20 (SHBG) to O_R = 4.77 (E1). Using the absolute method, the number of occasions required to obtain an estimate within 20% of the true mean ranged from O_A = 0.34 (E2) to O_A = 10.27 (PDG). Except for SHBG and E1S, therefore, a single measurement of any of the analytes in this analysis may be inadequate for typical epidemiological research, whose results center on comparing the mean value obtained from one group with that obtained from another, e.g., cases or an intervention group vs. controls.
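
Because the variance ratio s_W²/s_B² is the same at every target correlation, an O_R computed at r = 0.9 can be rescaled to r = 0.8, and because samples come in whole numbers it must be rounded up for planning. The sketch below uses the O_R values reported in the text for SHBG and E1; the helper names and the round-up-with-a-minimum-of-one rule are illustrative assumptions.

```python
import math


def nelson_factor(r):
    """The r^2 / (1 - r^2) term of the Nelson et al. formula."""
    return r ** 2 / (1.0 - r ** 2)


def rescale_occasions(o_r, r_from, r_to):
    """Convert an O_R computed at correlation r_from to correlation r_to.

    The variance ratio s_W^2 / s_B^2 is the same for both, so it cancels.
    """
    return o_r * nelson_factor(r_to) / nelson_factor(r_from)


def whole_occasions(o_r):
    """Samples come in whole numbers, so round up, with a minimum of one."""
    return max(1, math.ceil(o_r))


# Values reported in the text for r = 0.9 (81% of true variance)
reported = {"SHBG": 0.48, "E1": 11.43}
for analyte, o_r in reported.items():
    o_r_64 = rescale_occasions(o_r, 0.9, 0.8)
    print(analyte, whole_occasions(o_r), round(o_r_64, 2), whole_occasions(o_r_64))
```

Rescaling reproduces the reported values of 0.20 and 4.77 for 64% of true variance, and rounding E1 up gives the 12 and 5 occasions quoted as the upper limits in the abstract.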

Table 3 Minimum occasions required to obtain a reliable estimate, intraclass correlation coefficients, and percent of true variance accounted for by single measurements

Figures 2 to 4 illustrate different relationships between between-subject and within-subject variance and the corresponding differences between O_R and O_A.

Figure 2 Sex hormone-binding globulin values for all participants by measurement occasion

Figure 3 Pregnanediol-3-glucuronide values for all participants by measurement occasion

Figure 4 Logged estradiol values for all participants by measurement occasion

In the case of SHBG (Figure 2), within-subject variance is small relative to between-subject variance, resulting in small O_R and O_A estimates (0.48 and 1.78, respectively). The PDG values (Figure 3) illustrate the case in which within-subject variation is high and the values of different subjects overlap considerably, resulting in relatively large O_R and O_A estimates (5.17 and 10.27, respectively). Finally, Figure 4 depicts the case in which within-subject variance is small, but so is between-subject variance. Here the small within-subject variance yields a small O_A estimate (0.34), but because within-subject variance is not small relative to between-subject variance, O_R is relatively large (8.26).

Because ICCs include both within- and between-subject variance, they closely followed the O_R rather than the O_A estimates. ICC(2,1) ranged from 0.30 for LH to 0.88 for SHBG (Table 3). The ICCs for absolute agreement between the two analysis batches ranged from ICC(2,k) = 0.47 for E1 to ICC(2,k) = 0.96 for SHBG (Table 4). ICC estimates were generally consistent across batches, with similar estimates based on all 5 occasions and on each batch separately. The between-batch ICC for E1, however, was below 0.5, suggesting that for E1 the batch 1 ICC may be a better indicator than the ICC based on all samples. The percentage of true variance accounted for by a single measurement ranged from 29% for LH to 92% for SHBG.
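
Why ICCs track O_R can be seen by inverting the Nelson formula: an O_R value fixes the ratio s_W²/s_B², and a simple one-way ICC equals s_B²/(s_B² + s_W²). The sketch below back-calculates that ICC from O_R values quoted in the text (all computed at r = 0.9). This one-way ICC is a simplification that only approximates the two-way ICC(2,1) in Table 3; the helper name is hypothetical.

```python
def implied_icc(o_r, r=0.9):
    """Back-calculate a one-way ICC, s_B^2 / (s_B^2 + s_W^2), from O_R at correlation r."""
    variance_ratio = o_r * (1.0 - r ** 2) / r ** 2  # s_W^2 / s_B^2
    return 1.0 / (1.0 + variance_ratio)


# O_R values at r = 0.9 quoted in the text
for analyte, o_r in [("SHBG", 0.48), ("PDG", 5.17), ("E2 (logged)", 8.26), ("E1", 11.43)]:
    print(analyte, round(implied_icc(o_r), 2))
```

For SHBG this gives about 0.90, close to the reported ICC(2,1) of 0.88, and the implied ICCs fall steadily as O_R rises, whereas O_A carries no information about between-subject variance.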

Table 4 Intraclass correlation coefficients between batches for analytes analyzed in 2 batches


We have provided estimates of the minimum number of measurement occasions required to ensure adequate reliability for two types of experimental aims. Analyses in epidemiologic studies involve calculations in which between-subject as well as within-subject variance is important; O_R will therefore usually be the appropriate index of the minimum number of occasions needed to obtain a reliable estimate. Estimates of O_R based on our sample suggest that only SHBG and E1S had sufficient temporal stability to be adequately reliable with a single measurement when the target proportion of true variance was set at the lower standard of 64%. A single measurement of any of the other analytes would be unlikely to account for even 50% of the true variance. When within-subject variance is the only variance of interest, e.g., when the measured value of an analyte will be compared with a fixed standard, O_A will be the appropriate index. Because it omits between-subject variance, this statistic produces very different results from O_R: some analytes that were adequately reliable with one or a few measurements when between-subject variance was taken into account required more measurements when only within-subject variance was involved, and vice versa.

This study confirms previous findings that SHBG can be reliably measured in premenopausal women on a single occasion. It also indicates that E1S can be reliably measured using one sample. More importantly, our results suggest that none of the other analytes examined meets minimal reliability requirements that would permit confidence in single measurements. These results agree with the wide range of ICCs reported in previous studies [4–6]. Our conclusions are limited, however, to samples collected in the mid-luteal phase and may not generalize to other phases of the menstrual cycle.

The use of ICCs to estimate agreement between analysis batches differs from their use as an index of temporal reliability. The appropriate type of ICC for this purpose is based on the mean of several values rather than on single values and is typically higher than one calculated from single values. Although the ICCs between batches were higher than those estimating temporal reliability, they were relatively low, demonstrating the importance of analyzing all samples in one batch when possible. As previously noted [11], error due to time in storage will affect estimates of temporal reliability. Analyzing samples in multiple batches, which shortens their time in storage, is one means of reducing this source of error, but it risks introducing between-batch error. Until better estimates of the impact of storage time on each of these analytes are available, it will be difficult to determine whether error due to multiple analysis batches or error due to storage time is more detrimental to temporal reliability.

Several sources of error are effectively beyond researchers' capacity to control. For example, the validity and reliability of the best assay available for a given analyte cannot be increased by improving study design. Other sources of error, however, can be dramatically reduced through appropriate designs. These strategies include increasing the sample size to reduce the impact of random error, analyzing all samples in one batch, and using enough repeated measures to obtain an adequately reliable estimate. It is also possible, though not uncontroversial, to control error statistically by correcting for attenuation using validation data [24].
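
One standard way to see how repeated measures and attenuation interact is the Spearman-Brown formula, which gives the reliability of the mean of k occasions from a single-occasion ICC, together with the classical correction of a correlation for attenuation. This is a sketch of textbook formulas, not the validation-study methods of [24]; the helper names are hypothetical, and the ICC of 0.33 used below is the E2 value quoted in the abstract.

```python
import math


def reliability_of_mean(icc, k):
    """Spearman-Brown: reliability of the mean of k occasions given a single-occasion ICC."""
    return k * icc / (1.0 + (k - 1) * icc)


def occasions_for_reliability(icc, target):
    """Solve reliability_of_mean(icc, k) = target for k (not rounded)."""
    return (target / (1.0 - target)) * ((1.0 - icc) / icc)


def disattenuate(r_observed, rel_x, rel_y=1.0):
    """Classical correction of an observed correlation for unreliability in x and y."""
    return r_observed / math.sqrt(rel_x * rel_y)


icc_e2 = 0.33  # single-occasion ICC quoted for E2 in the abstract
for k in (1, 3, 5):
    print(k, round(reliability_of_mean(icc_e2, k), 2))

# Occasions needed for the mean to have reliability 0.81 (r = 0.9)
print(round(occasions_for_reliability(icc_e2, 0.81), 2))
```

Solving Spearman-Brown for k reproduces the Nelson formula, since (1 − ICC)/ICC equals s_W²/s_B² for a one-way ICC; with ICC = 0.33 it gives about 8.7 occasions, in the same range as the O_R of 8.26 reported for logged E2.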

Several improvements, in addition to a larger sample and more repeated measures, would have increased confidence in the results of our study. First, if the effects of storage time on the analytes were known, we could have taken into account the contributions of this source of variance to our temporal reliability estimates and distinguished its impact from that due to assay reliability. Second, obtaining blood and urine samples on day 5 following ovulation was most appropriate for the measurement of progesterone and near-optimal for SHBG, but may not have been the best day to obtain estimates of the other analytes [25]. Third, though our data were drawn from an intervention study in which no results approached significance, a more clearly homogeneous sample would have been preferable. Fourth, variation in menstrual cycle length and variance due to pulsatility of excretion were additional sources of error.

Finally, our estimates were based on targets that changed across measurements, and we could not assume error-free measurement. Consequently, we could not precisely distinguish the contribution of assay reliability from that of each analyte's natural variability to our estimates of temporal reliability. Despite these limitations, this study provides new insights into the variability of sex hormones, gonadotropins, and urinary hormone metabolites in premenopausal women over a one-year period. Our estimates of temporal reliability combine the consistency of a measure across repeated measurements with the temporal fluctuations in the target of measurement.


Given the relatively large sample size for this analysis and the strictly controlled protocol to collect samples on the same day of the menstrual cycle, our results will be useful for designing future research projects exploring the role of sex hormones in the etiology of cancer and other diseases.


  1. Muti P, Bradlow HL, Micheli A, Krogh V, Freudenheim JL, Schunemann HJ, et al: Estrogen metabolism and risk of breast cancer: a prospective study of the 2:16alpha-hydroxyestrone ratio in premenopausal and postmenopausal women. Epidemiology. 2000, 11: 635-640. 10.1097/00001648-200011000-00004.

  2. Parslov M, Lidegaard O, Klintorp S, Pedersen B, Jonsson L, Eriksen PS, et al: Risk factors among young women with endometrial cancer: a Danish case-control study. Am J Obstet Gynecol. 2000, 182: 23-29.

  3. Moreira Kulak CA, Schussheim DH, McMahon DJ, Kurland E, Silverberg SJ, Siris ES, et al: Osteoporosis and low bone mass in premenopausal and perimenopausal women. Endocr Pract. 2000, 6: 296-304.

  4. Michaud DS, Manson JE, Spiegelman D, Barbieri RL, Sepkovic DW, Bradlow HL, et al: Reproducibility of plasma and urinary sex hormone levels in premenopausal women over a one-year period. Cancer Epidemiol Biomarkers Prev. 1999, 8: 1059-1064.

  5. Muti P, Trevisan M, Micheli A, Krogh V, Bolelli G, Sciajno R, et al: Reliability of serum hormones in premenopausal and postmenopausal women over a one-year period. Cancer Epidemiol Biomarkers Prev. 1996, 5: 917-922.

  6. Toniolo P, Koenig KL, Pasternack BS, Banerjee S, Rosenberg C, Shore RE, et al: Reliability of measurements of total, protein-bound, and unbound estradiol in serum. Cancer Epidemiol Biomarkers Prev. 1994, 3: 47-50.

  7. Nunnally JC, Bernstein IH: Psychometric Theory. 3rd edition. 1994, New York: McGraw-Hill, Inc.

  8. Greenland S: Basic methods for sensitivity analysis of biases. Int J Epidemiol. 1996, 25: 1107-1116.

  9. Wong MY, Day NE, Wareham NJ: Measurement error in epidemiology: the design of validation studies II: bivariate situation. Stat Med. 1999, 18: 2831-2845. 10.1002/(SICI)1097-0258(19991115)18:21<2831::AID-SIM282>3.3.CO;2-V.

  10. Gail MH, Fears TR, Hoover RN, Chandler DW, Donaldson JL, Hyer MB, et al: Reproducibility studies and interlaboratory concordance for assays of serum hormone levels: estrone, estradiol, estrone sulfate, and progesterone. Cancer Epidemiol Biomarkers Prev. 1996, 5: 835-844.

  11. Bolelli G, Muti P, Micheli A, Sciajno R, Franceschetti F, Krogh V, et al: Validity for epidemiological studies of long-term cryoconservation of steroid and protein hormones in serum and plasma. Cancer Epidemiol Biomarkers Prev. 1995, 4: 509-513.

  12. Falk RT, Gail MH, Fears TR, Rossi SC, Stanczyk F, Adlercreutz H, et al: Reproducibility and validity of radioimmunoassays for urinary hormones and metabolites in pre- and postmenopausal women. Cancer Epidemiol Biomarkers Prev. 1999, 8: 567-577.

  13. Maskarinec G, Williams A, Inouye J, Stanczyk F, Franke A: A Randomized isoflavone intervention among premenopausal women. Cancer Epidemiol Biomarkers Prev. 2002, 11: 195-201.

  14. Rudy EB, Estok P: Professional and lay interrater reliability of urinary luteinizing hormone surges measured by OvuQuick test. J Obstet Gynecol Neonatal Nurs. 1992, 21: 407-411.

  15. Goebelsmann U, Bernstein GS, Gale JA, Kletzky OA, Nakamura RM, Coulson AH, et al: Serum gonadotropin, testosterone, estradiol and estrone levels prior to and following bilateral vasectomy. In Vasectomy: Immunologic and Pathophysiologic Effects in Animals and Man. Edited by: Lepow IH, Crozier R. 1979, New York: Academic Press, 165.

  16. Sodergard R, Backstrom T, Shanbag V, Carstensen H: Calculation of free and bound fractions of testosterone and estradiol-17β to human plasma proteins at body temperature. J Steroid Biochem. 1982, 16: 801-810.

  17. Munro CJ, Stabenfeldt GH, Cragun JR, Addlego LA, Overstreet JW, Lasley BL: Relationship of serum estradiol and progesterone concentrations to the excretion profiles of their major urinary metabolites as measured by enzyme immunoassay and radioimmunoassay. Clin Chem. 1991, 37: 638-644.

  18. Falk RT, Rossi SC, Fears TR, Sepkovic DW, Migella A, Adlercreutz H, et al: A new ELISA kit for measuring urinary 2-hydroxyestrone, 16alpha-hydroxyestrone, and their ratio: reproducibility, validity, and assay performance after freeze-thaw cycling and preservation by boric acid. Cancer Epidemiol Biomarkers Prev. 2000, 9: 81-87.

  19. Nelson M, Black AE, Morris JA, Cole TJ: Between- and within-subject variation in nutrient intake from infancy to old age: estimating the number of days required to rank dietary intakes with desired precision. American Journal of Clinical Nutrition. 1989, 50: 155-167.

  20. Wilkens LR, Le Marchand L, Harwood P, Cooney RV: Use of Breath Hydrogen and Methane as Markers of Colonic Fermentation in Epidemiological Studies: Variability in Excretion. Cancer Epidemiol Biomarkers Prev. 1994, 3: 149-153.

  21. Beaton GH, Milner J, Corey P, McGuire V, Cousins M, Stewart E, et al: Sources of variance in 24-hour dietary recall data: Implications for nutrition study design and interpretation. American Journal of Clinical Nutrition. 1979, 32: 2546-2549.

  22. Shrout PE, Fleiss JL: Intraclass Correlations: Uses in Assessing Rater Reliability. Psychological Bulletin. 1979, 86: 420-428. 10.1037//0033-2909.86.2.420.

  23. McGraw KO, Wong SP: Forming Inferences About Some Intraclass Correlation Coefficients. Psychological Methods. 1996, 1: 30-46. 10.1037//1082-989X.1.1.30.

  24. Wong MY, Day NE, Bashir SA, Duffy SW: Measurement error in epidemiology: the design of validation studies I: univariate situation. Stat Med. 1999, 18: 2815-2829. 10.1002/(SICI)1097-0258(19991115)18:21<2815::AID-SIM280>3.3.CO;2-R.

  25. Ahmad N, Pollard M, Unwin N: The optimal timing of blood collection during the menstrual cycle for the assessment of endogenous sex hormones: can interindividual differences in levels over the whole cycle be assessed on a single day?. Cancer Epidemiol Biomarkers Prev. 2002, 11: 147-151.


The authors gratefully acknowledge the valuable assistance, advice, and guidance provided by Lynne R. Wilkens, Dr PH, and Ian Pagano, MA, both of the Cancer Research Center of Hawaii. We are grateful to the women who donated their time and effort to participate in this study. The project was funded by a contract from the Pharmavite Corporation in San Fernando, California and by a Developmental Funds award from the Cancer Center Support grant to the Cancer Research Center of Hawaii (P30CA071789).

Author information

Authors and Affiliations


Corresponding author

Correspondence to Gertraud Maskarinec.

Additional information

Competing interests

This project was supported by the Pharmavite Corporation in San Fernando, California and a Developmental Funds award from the Cancer Center Support grant to the Cancer Research Center of Hawaii (P30CA071789).

Authors' contributions

AW conceived of the study and performed the statistical analyses. GM was the primary investigator on the original study from which the data for this study were drawn and contributed to the design of this study. FS carried out the immunoassays and contributed to the writing of this study. AF participated in the study design and consulted with the authors.

Andrew E Williams, Gertraud Maskarinec, Adrian A Franke and Frank Z Stanczyk contributed equally to this work.

About this article

Cite this article

Williams, A.E., Maskarinec, G., Franke, A.A. et al. The temporal reliability of serum estrogens, progesterone, gonadotropins, SHBG and urinary estrogen and progesterone metabolites in premenopausal women. BMC Women's Health 2, 13 (2002).
