4. Measurement error and bias | BMJ
Epidemiological studies measure characteristics of populations. The parameter of interest may be a disease rate, the prevalence of an
exposure, or more often some measure of the association between an exposure and disease. Because studies are carried out on
people and have all the attendant practical and ethical constraints, they are almost invariably subject to bias.
Selection bias occurs when the subjects studied are not representative of the target population about which conclusions are to be
drawn. Suppose that an investigator wishes to estimate the prevalence of heavy alcohol consumption (more than 21 units a week) in
adult residents of a city. He might try to do this by selecting a random sample from all the adults registered with local general
practitioners, and sending them a postal questionnaire about their drinking habits. With this design, one source of error would be the
exclusion from the study sample of those residents not registered with a doctor. These excluded subjects might have different patterns
of drinking from those included in the study. Also, not all of the subjects selected for study will necessarily complete and return
questionnaires, and non-responders may have different drinking habits from those who take the trouble to reply. Both of these
deficiencies are potential sources of selection bias. The possibility of selection bias should always be considered when defining a study
sample. Furthermore, when responses are incomplete, the scope for bias must be assessed. The problems of incomplete response to
surveys are considered further in a later chapter.
The other major class of bias arises from errors in measuring exposure or disease. In a study to estimate the relative risk of congenital
malformations associated with maternal exposure to organic solvents such as white spirit, mothers of malformed babies were
questioned about their contact with such substances during pregnancy, and their answers were compared with those from control
mothers with normal babies. With this design there was a danger that "case" mothers, who were highly motivated to find out why their
babies had been born with an abnormality, might recall past exposure more completely than controls. If so, a bias would result with a
tendency to exaggerate risk estimates.
Another study looked at risk of hip osteoarthritis according to physical activity at work, cases being identified from records of admission
to hospital for hip replacement. Here there was a possibility of bias because subjects with physically demanding jobs might be more
handicapped by a given level of arthritis and therefore seek treatment more readily.
Bias cannot usually be totally eliminated from epidemiological studies. The aim, therefore, must be to keep it to a minimum, to identify
those biases that cannot be avoided, to assess their potential impact, and to take this into account when interpreting results. The motto
of the epidemiologist could well be "dirty hands but a clean mind" (manus sordidae, mens pura).
As indicated above, errors in measuring exposure or disease can be an important source of bias in epidemiological studies. In
conducting studies, therefore, it is important to assess the quality of measurements. An ideal survey technique is valid (that is, it
measures accurately what it purports to measure). Sometimes a reliable standard is available against which the validity of a survey
method can be assessed. For example, a sphygmomanometer's validity can be measured by comparing its readings with intraarterial
pressures, and the validity of a mammographic diagnosis of breast cancer can be tested (if the woman agrees) by biopsy. More often,
however, there is no sure reference standard. The validity of a questionnaire for diagnosing angina cannot be fully known: clinical
opinion varies among experts, and even coronary arteriograms may be normal in true cases or abnormal in symptomless people. The
pathologist can describe changes at necropsy, but these may say little about the patient's symptoms or functional state. Measurements
of disease in life are often incapable of full validation.
In practice, therefore, validity may have to be assessed indirectly. Two approaches are used commonly. A technique that has been
simplified and standardised to make it suitable for use in surveys may be compared with the best conventional clinical assessment. A
self administered psychiatric questionnaire, for instance, may be compared with the majority opinion of a psychiatric panel.
Alternatively, a measurement may be validated by its ability to predict future illness. Validation by predictive ability may, however,
require the study of many subjects.
When a survey technique or test is used to dichotomise subjects (for example, as cases or non-cases, exposed or not exposed) its
validity is analysed by classifying subjects as positive or negative, firstly by the survey method and secondly according to the standard
reference test. The findings can then be expressed in a contingency table as shown below.
Table 4.1 Comparison of a survey test with a reference test

Survey test result      Reference test positive      Reference test negative      Totals
Positive                True positives (a)           False positives (b)          a + b
Negative                False negatives (c)          True negatives (d)           c + d
Totals                  a + c                        b + d                        a + b + c + d
From this table four important statistics can be derived:
Sensitivity - A sensitive test detects a high proportion of the true cases, and this quality is measured here by a/(a + c).
Specificity - A specific test has few false positives, and this quality is measured by d/(b + d).
Systematic error - For epidemiological rates it is particularly important for the test to give the right total count of cases. This is measured by the ratio of the total numbers positive to the survey and the reference tests, or (a + b)/(a + c).
Predictive value - This is the proportion of positive test results that are truly positive. It is important in screening, and will be discussed further in Chapter 10.
It should be noted that both systematic error and predictive value depend on the relative frequency of true positives and true negatives
in the study sample (that is, on the prevalence of the disease or exposure that is being measured).
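As a concrete illustration, the four statistics can be computed directly from the cell counts of Table 4.1. The counts below are made up purely for illustration; a real survey would supply its own a, b, c, and d.

```python
# Hypothetical counts from a 2x2 comparison of a survey test with a
# reference test (Table 4.1); values are illustrative, not real data.
a = 45   # true positives
b = 15   # false positives
c = 5    # false negatives
d = 135  # true negatives

sensitivity = a / (a + c)             # proportion of true cases detected
specificity = d / (b + d)             # proportion of non-cases correctly negative
systematic_error = (a + b) / (a + c)  # ratio of test positives to true positives
predictive_value = a / (a + b)        # proportion of test positives that are true

print(f"sensitivity      = {sensitivity:.2f}")       # 0.90
print(f"specificity      = {specificity:.2f}")       # 0.90
print(f"systematic error = {systematic_error:.2f}")  # 1.20
print(f"predictive value = {predictive_value:.2f}")  # 0.75
```

Note that if the disease were rarer in the sample (smaller a + c relative to b + d), the predictive value would fall even with sensitivity and specificity unchanged, which is the dependence on prevalence described above.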
If the criteria for a positive test result are stringent then there will be few false positives but the test will be insensitive. Conversely, if
criteria are relaxed then there will be fewer false negatives but the test will be less specific. In a survey of breast cancer alternative
diagnostic criteria were compared with the results of a reference test (biopsy). Clinical palpation by a doctor yielded fewest false
positives (93% specificity), but missed half the cases (50% sensitivity). Criteria for diagnosing "a case" were then relaxed to include all
the positive results identified by doctor's palpation, nurse's palpation, or x ray mammography: few cases were then missed (94%
sensitivity), but specificity fell to 86%.
By choosing the right test and cut off points it may be possible to get the balance of sensitivity and specificity that is best for a particular
study. In a survey to establish prevalence this might be when false positives balance false negatives. In a study to compare rates in
different populations the absolute rates are less important, the primary concern being to avoid systematic bias in the comparisons: a
specific test may well be preferred, even at the price of some loss of sensitivity.
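The effect of moving a cut off point can be sketched with a toy example (the test values below are invented): a stringent cutoff admits few false positives but misses cases, while a relaxed one does the reverse.

```python
# Illustrative sketch of the sensitivity/specificity trade-off as the
# cut off point of a continuous test is moved. All values are made up.
cases     = [6.1, 6.8, 7.2, 7.9, 8.4, 9.0]   # test values in true cases
non_cases = [4.0, 4.6, 5.1, 5.5, 6.3, 6.9]   # test values in non-cases

def sens_spec(cutoff):
    """Classify value >= cutoff as positive; return (sensitivity, specificity)."""
    sens = sum(v >= cutoff for v in cases) / len(cases)
    spec = sum(v < cutoff for v in non_cases) / len(non_cases)
    return sens, spec

for cutoff in (5.0, 6.0, 7.0):
    sens, spec = sens_spec(cutoff)
    print(f"cutoff {cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

Raising the cutoff from 5.0 to 7.0 pushes specificity up and sensitivity down, which is the balance a study must choose deliberately.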
When there is no satisfactory standard against which to assess the validity of a measurement technique, then examining its
repeatability is often helpful. Consistent findings do not necessarily imply that the technique is valid: a laboratory test may yield
persistently false positive results, or a very repeatable psychiatric questionnaire may be an insensitive measure of, for example,
"stress". However, poor repeatability indicates either poor validity or that the characteristic that is being measured varies over time. In
either of these circumstances results must be interpreted with caution.
Repeatability can be tested within observers (that is, the same observer performing the measurement on two separate occasions) and
also between observers (comparing measurements made by different observers on the same subject or specimen). Assessment of
repeatability may be built into a study - a sample of people undergoing a second examination or a sample of radiographs, blood
samples, and so on being tested in duplicate. Even a small sample is valuable, provided that (1) it is representative and (2) the
duplicate tests are genuinely independent. If testing is done "off line" (perhaps as part of a pilot study) then particular care is needed to
ensure that subjects, observers, and operating conditions are all adequately representative of the main study. It is much easier to test
repeatability when material can be transported and stored - for example, deep frozen plasma samples, histological sections, and all
kinds of tracings and photographs. However, such tests may exclude an important source of observer variation - namely the techniques
of obtaining samples and records.
Independent replicate measurements in the same subjects are usually found to vary more than one's gloomiest expectations. To
interpret the results, and to seek remedies, it is helpful to dissect the total variability into its four components:
Within observer variation - Discovering one's own inconsistency can be traumatic; it highlights a lack of clear criteria of measurement
and interpretation, particularly in dealing with the grey area between "normal" and "abnormal". It is largely random - that is, unpredictable
in direction.
Between observer variation - This includes the first component (the instability of individual observers), but adds to it an extra and
systematic component due to individual differences in techniques and criteria. Unfortunately, this may be large in relation to the real
difference between groups that it is hoped to identify. It may be possible to avoid this problem, either by using a single observer or, if
material is transportable, by forwarding it all for central examination. Alternatively, the bias within a survey may be neutralised by
random allocation of subjects to observers. Each observer should be identified by a code number on the survey record; analysis of
results by observer will then indicate any major problems, and perhaps permit some statistical correction for the bias.
Random subject variation - When measured repeatedly in the same person, physiological variables like blood pressure tend to show a
roughly normal distribution around the subject's mean. Nevertheless, surveys usually have to make do with a single measurement, and
the imprecision will not be noticed unless the extent of subject variation has been studied. Random subject variation has some
important implications for screening and also in clinical practice, when people with extreme initial values are recalled. Thanks to a
statistical quirk this group then seems to improve because its members include some whose mean value is normal but who by chance
had higher values at first examination: on average, their follow up values necessarily tend to fall (regression to the mean). The size of
this effect depends on the amount of random subject variation. Misinterpretation can be avoided by repeat examinations to establish an
adequate baseline, or (in an intervention study) by including a control group.
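A small simulation makes the regression-to-the-mean effect visible. The distributions below (true means around 130 mm Hg with a within-subject standard deviation of 10) are assumed purely for illustration.

```python
import random

# Simulation of regression to the mean: each subject has a stable true
# blood pressure, but any single reading adds random subject variation.
# All distribution parameters are assumed for illustration.
random.seed(1)

n = 10_000
true_means = [random.gauss(130, 10) for _ in range(n)]
first  = [m + random.gauss(0, 10) for m in true_means]  # screening reading
second = [m + random.gauss(0, 10) for m in true_means]  # follow up reading

# Recall subjects whose first reading was extreme (>= 160 mm Hg).
recalled = [(f, s) for f, s in zip(first, second) if f >= 160]
mean_first  = sum(f for f, _ in recalled) / len(recalled)
mean_second = sum(s for _, s in recalled) / len(recalled)

print(f"recalled subjects  : {len(recalled)}")
print(f"mean first reading : {mean_first:.1f}")
print(f"mean second reading: {mean_second:.1f}")  # lower, despite no intervention
```

The recalled group's mean falls at the second reading with no treatment at all, purely because its first readings included subjects who were high by chance.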
Biased (systematic) subject variation - Blood pressure is much influenced by the temperature of the examination room, as well as by
less readily standardised emotional factors. Surveys to detect diabetes find a much higher prevalence in the afternoon than in the
morning; and the standard bronchitis questionnaire possibly elicits more positive responses in winter than in summer. Thus conditions
and timing of an investigation may have a major effect on an individual's true state and on his or her responses. As far as possible,
studies should be designed to control for this - for example, by testing for diabetes at one time of day. Alternatively, a variable such as
room temperature can be measured and allowed for in the analysis.
The repeatability of measurements of continuous numerical variables such as blood pressure can be summarised by the standard
deviation of replicate measurements or by their coefficient of variation (standard deviation ÷ mean). When pairs of measurements have
been made, either by the same observer on two different occasions or by two different observers, a scatter plot will conveniently show
the extent and pattern of observer variation.
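For duplicate measurements these summaries can be computed directly: the within-subject standard deviation follows from the paired differences (using the standard formula for duplicates, s_w = sqrt(Σd²/2n)), and the coefficient of variation from dividing by the overall mean. The readings below are invented for illustration.

```python
import math

# Illustrative paired readings (assumed data): the same observer measures
# each subject's blood pressure twice.
pairs = [(118, 122), (130, 126), (142, 145), (125, 121),
         (138, 140), (150, 147), (115, 119), (133, 129)]

# Within-subject standard deviation from duplicates:
# s_w = sqrt( sum(d_i^2) / (2 * n) ), d_i being the difference in each pair.
n = len(pairs)
sw = math.sqrt(sum((x - y) ** 2 for x, y in pairs) / (2 * n))

# Coefficient of variation = standard deviation / mean, often quoted as %.
grand_mean = sum(x + y for x, y in pairs) / (2 * n)
cv = sw / grand_mean

print(f"within-subject SD       : {sw:.2f}")
print(f"coefficient of variation: {100 * cv:.1f}%")
```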
For qualitative attributes, such as clinical symptoms and signs, the results are first set out as a contingency table:
Table 4.2 Comparison of results obtained by two observers

                        Observer 1 positive      Observer 1 negative
Observer 2 positive     a                        b
Observer 2 negative     c                        d
The overall level of agreement could be represented by the proportion of the total in cells a and d. This measure unfortunately turns out
to depend more on the prevalence of the condition than on the repeatability of the method. This is because in practice it is easy to
agree on a straightforward negative; disagreements depend on the prevalence of the difficult borderline cases. Instead, therefore,
repeatability is usually summarised by the κ (kappa) statistic, which measures the level of agreement over and above what would be
expected from the prevalence of the attribute.
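For a 2 x 2 table like Table 4.2, κ can be computed by comparing the observed agreement with the agreement expected from the observers' marginal frequencies alone. The counts below are illustrative.

```python
# Sketch of the kappa statistic for a 2x2 observer-agreement table
# (Table 4.2); counts are illustrative, not real data.
a, b, c, d = 20, 10, 5, 65   # a, d = agreements; b, c = disagreements
n = a + b + c + d

observed = (a + d) / n  # crude proportion agreeing

# Agreement expected if each observer rated independently with the
# same marginal frequency of positives.
p_obs1_pos = (a + c) / n
p_obs2_pos = (a + b) / n
expected = p_obs1_pos * p_obs2_pos + (1 - p_obs1_pos) * (1 - p_obs2_pos)

kappa = (observed - expected) / (1 - expected)
print(f"observed agreement = {observed:.2f}")
print(f"kappa              = {kappa:.2f}")
```

With these counts the crude agreement is 85%, yet κ is only about 0.63, because much of the raw agreement is what chance alone would produce when most subjects are negative.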
© 2013 BMJ Publishing Group Ltd