How does sensitivity change with prevalence
If we test in a high-prevalence setting, it is more likely that persons who test positive truly have the disease than if the test is performed in a population with low prevalence. Sensitivity is two-thirds, so the test is able to detect two-thirds of the people with disease; it misses the other one-third. Specificity is 45 of 85 (about 53%): of the 85 persons without disease, 45 correctly test negative and 40 test positive for a disease which they do not have.
Sensitivity and specificity are characteristics of the test itself. Using the same test in a population with higher prevalence increases positive predictive value. Conversely, increased prevalence results in decreased negative predictive value. When considering predictive values of diagnostic or screening tests, recognize the influence of the prevalence of disease. Minimizing false positives is important when the costs or risks of follow-up therapy are high and the disease itself is not life-threatening.
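The dependence of predictive values on prevalence can be sketched numerically. The snippet below is only an illustration: it takes sensitivity as two-thirds and reads the counts quoted above as a specificity of 45/85, and the three prevalence values and the function name predictive_values are assumptions chosen for the example, not figures from the source.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    # Expected proportion of the screened population in each outcome cell.
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


SENSITIVITY = 2 / 3    # from the text
SPECIFICITY = 45 / 85  # our reading of the counts quoted in the text

# Hypothetical prevalence values, chosen only to show the trend.
for prevalence in (0.01, 0.15, 0.50):
    ppv, npv = predictive_values(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

Sensitivity and specificity are held fixed throughout; only the prevalence changes, and with it the predictive values.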
Of particular importance, although it is desirable to have tests with high sensitivity and specificity, the values for those two metrics should not be relied on when making decisions about individual people in screening situations. The lack of correspondence between sensitivity, specificity, and predictive values is illustrated by the inconsistent pattern of entries in Table 1 and should become more obvious in the next section. Because the pairs of categories into which people are placed when sensitivity and specificity values are calculated are not the same as the pairs of categories that pertain in a screening context, there are not only important distinctions between sensitivity and PPV, and between specificity and NPV, but there are also distinct limitations on sensitivity and specificity for screening purposes.
Akobeng [ 9 , p. Sensitivity does not provide the basis for informed decisions following positive screening test results because those positive test results could contain many false positive outcomes that appear in the cell labeled b in Figure 1.
Those outcomes are ignored in determining sensitivity (only cells a and c are used for determining sensitivity). Therefore, of itself, a positive result on a screening test, even if that test has high sensitivity, is not at all useful for definitely regarding a condition as being present in a particular person.
Conversely, specificity does not provide an accurate indication about a negative screening test result because negative outcomes from a screening test could contain many false negative results that appear in the cell labeled c, which are ignored in determining specificity (only cells b and d are used for determining specificity). Therefore, of itself, a negative result on a screening test with high specificity is not at all useful for definitely ruling out disease in a particular person. Failing to appreciate the above major constraints on sensitivity and specificity arises from what is known in formal logic as confusion of the inverse. An example of this with regard to sensitivity, consciously chosen in a form that makes the problem clear, would be converting the logical proposition "This animal is a dog; therefore it is likely to have four legs" into the illogical proposition "This animal has four legs; therefore it is likely to be a dog."
A parallel confusion of the inverse can occur with specificity. An example of this would be converting the logical proposition "This person is not a young adult; therefore this person is not likely to be a university undergraduate" into the illogical proposition "This person is not a university undergraduate; therefore this person is not likely to be a young adult." These examples demonstrate the flaws in believing that a positive result on a highly sensitive test indicates the presence of a condition and that a negative result on a highly specific test indicates the absence of a condition.
Instead, it should be emphasized that a highly sensitive test, when yielding a positive result, by no means indicates that a condition is present (many animals with four legs are not dogs), and a highly specific test, when yielding a negative result, by no means indicates that a condition is absent (many young people are not university undergraduates).
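The point about which cells feed which metric can be made concrete with a small sketch. The cell labels follow the text (b holds false positives, c holds false negatives, so a and d are taken to hold true positives and true negatives); the class name TwoByTwo and all counts are hypothetical, invented only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    a: int  # true positives: condition present, test positive
    b: int  # false positives: condition absent, test positive
    c: int  # false negatives: condition present, test negative
    d: int  # true negatives: condition absent, test negative

    def sensitivity(self):
        # Uses only cells a and c; false positives (cell b) are ignored.
        return self.a / (self.a + self.c)

    def specificity(self):
        # Uses only cells b and d; false negatives (cell c) are ignored.
        return self.d / (self.b + self.d)

    def ppv(self):
        # Conditions on a positive test result: cells a and b.
        return self.a / (self.a + self.b)

    def npv(self):
        # Conditions on a negative test result: cells c and d.
        return self.d / (self.c + self.d)


# Hypothetical counts: a sensitive test used where the condition is uncommon.
table = TwoByTwo(a=95, b=180, c=5, d=720)
print(f"sensitivity={table.sensitivity():.2f}  PPV={table.ppv():.2f}")
print(f"specificity={table.specificity():.2f}  NPV={table.npv():.2f}")
```

With these made-up counts the test detects most people who have the condition, yet most positive results are false positives, which is the confusion of the inverse expressed in numbers.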
Despite the above reservations concerning sensitivity and specificity in a screening situation, sensitivity and specificity can be useful in two circumstances, but only if they are extremely high. First, because a highly sensitive screening test is unlikely to produce false negative outcomes (there will be few entries in cell c of Figure 1), people who test negative on that kind of screening test are very unlikely to have the condition.
Expressed differently, high sensitivity permits people to be confidently regarded as not having a condition if their screening test yields a negative result. Second, because a highly specific screening test is unlikely to produce false positive results (there will be few entries in cell b in Figure 1), people are very unlikely to be categorized as having a condition if they indeed do not have it.
Expressed differently, high specificity permits people to be confidently regarded as having a condition if their diagnostic test yields a positive result. The mnemonics "snout" (high sensitivity, negative result, rule out) and "spin" (high specificity, positive result, rule in), it must be emphasized, pertain only when sensitivity and specificity are high. Their applicability, therefore, has some strong limitations. Furthermore, these mnemonics are applied in a way that might seem counterintuitive.
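A rough numerical sketch of why the two mnemonics hold only at very high values: the prevalence of 20% and the sensitivity and specificity pairs below are hypothetical, picked solely to contrast a moderate value with an extremely high one.

```python
def npv(sensitivity, specificity, prevalence):
    """Probability that the condition is absent given a negative result."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


def ppv(sensitivity, specificity, prevalence):
    """Probability that the condition is present given a positive result."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


PREVALENCE = 0.20  # hypothetical

# "snout": raise sensitivity and look at what a NEGATIVE result means.
for sensitivity in (0.70, 0.99):
    value = npv(sensitivity, 0.60, PREVALENCE)
    print(f"sensitivity={sensitivity:.2f}  NPV={value:.3f}")

# "spin": raise specificity and look at what a POSITIVE result means.
for specificity in (0.70, 0.99):
    value = ppv(0.60, specificity, PREVALENCE)
    print(f"specificity={specificity:.2f}  PPV={value:.3f}")
```

The seemingly counterintuitive pairing is visible in the code: sensitivity is varied in the NPV loop, and specificity in the PPV loop.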
In addition, Pewsner et al. As a consequence, both sensitivity and specificity remain unhelpful for making decisions about individual people in most screening contexts, and PPV and NPV should be retained as the metrics of choice in those contexts. Considerations might also include over- versus under-application of diagnostic procedures as well as the possibility of premature versus inappropriately delayed application of diagnostic procedures.
Input from clinicians and policymakers is likely to be particularly informative in any deliberations. Decisions about desirable PPVs and NPVs can be approached from two related and complementary, but different, directions.
One approach involves the extent to which true positive and true negative results are desirable on a screening test. The other approach involves the extent to which false positive and false negative results are tolerable or even acceptable.
A high PPV is desirable, implying that false positive outcomes are minimized, under a variety of circumstances. Some of these are when, relative to potential benefits, the costs (including costs associated with finances, time, and personnel for health services, as well as inconvenience, discomfort, and anxiety for clients) are high.
A high PPV, with its concomitant few false positive screening test results, is also desirable when the risk of harm from follow-up diagnosis or therapy (including hemorrhaging and infection) is high despite the benefits from treatment also being high, or when the target condition is not life-threatening or progresses slowly. Under these circumstances, false positive outcomes can be associated with overtreatment, unnecessary costs, and the prospect of iatrogenic complications.
False positive outcomes may also be annoying and distressing for both the providers and the recipients of health care. A moderate PPV with its greater proportion of false positive screening test outcomes might be acceptable under a number of circumstances, most of which are the opposite of the situations in which a high PPV is desirable.
For example, a certain percentage of false positive outcomes might not be objectionable if follow-up tests are inexpensive, easily and quickly performed, and not stressful for clients. In addition, false positive screening outcomes might be quite permissible if no harm is likely to be done to clients in protecting them against a target condition even if that condition is not present. For example, people who are mistakenly told that they have peripheral artery disease, despite not actually having it, are likely to benefit from adopting advice to exercise appropriately, improve their diet, and discontinue smoking.
A high NPV is desirable, implying that false negatives are minimized, under a different set of circumstances. Some of these are a condition being serious, largely asymptomatic, or contagious, or if treatment for a condition is advisable early in its course, particularly if the condition can be treated effectively and is likely to progress quickly.
Sackett and colleagues demonstrated early on that for any given sensitivity and specificity the false-to-true positive ratio will decrease and the positive predictive value will increase with increasing prevalence[ 14 ].
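The relationship between prevalence, the false-to-true positive ratio, and positive predictive value can be written out explicitly. The symbols Se (sensitivity), Sp (specificity), and p (prevalence) are notation introduced here for convenience, with 0 < p < 1, Se > 0, and Sp < 1 assumed.

```latex
\[
\frac{\mathrm{FP}}{\mathrm{TP}}
  = \frac{(1 - Sp)\,(1 - p)}{Se \cdot p},
\qquad
\mathrm{PPV}
  = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}
  = \frac{1}{1 + \dfrac{(1 - Sp)\,(1 - p)}{Se \cdot p}}
\]
```

For fixed Se and Sp, the factor (1 - p)/p shrinks as p grows, so the false-to-true positive ratio falls and PPV rises with increasing prevalence.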
It is also well acknowledged that sensitivity and specificity would likely change with varying prevalence, although this may be a manifestation of changing patient spectrum, with prevalence playing a secondary role[ 11 ]. The incorporation of prevalence along with sensitivity and specificity has further been described in conducting meta-analyses of diagnostic tests[ 15 ].
In addition, the issue of pre-test probability has been considered particularly relevant when using tests with an implicit or subjective threshold, where clinicians may move their subjective threshold in response to the perception of increased prevalence[ 16 , 17 ]. In the current analysis, we used screening for latent tuberculosis infection (LTBI) to demonstrate the importance of considering disease prevalence when evaluating such trade-offs in testing strategy decisions.
We chose tuberculosis (TB) as an example because of its growing worldwide importance, its variations in prevalence (Figure 1), its diagnostic issues such as comorbidities and latent-versus-active disease, and the critical role of health systems and resources in determining optimal screening and treatment programs[ 3 , 4 ]. This slowing is in part due to the lower sensitivity and specificity of the standard TB skin test among certain populations[ 6-8 ] as well as the challenges in determining appropriate testing strategies in settings of highly varied levels of disease prevalence and resource constraints.
For example, in lower-burdened and higher-resourced countries such as the US, TB-control strategies target high-specificity LTBI screening and treatment to prevent later conversion to active TB. So although the introduction of newer tests such as QFT-IT and T-Spot can offer improved operating characteristics, improvements in outcomes depend on the establishment of testing strategies that are specific to each setting. Thus, with the recent introduction of new tests for TB and the publication of WHO and FDA guidelines regarding their implementation, this analysis is timely. It demonstrates that highly specific interferon-gamma release assay (IGRA) tests cause more harm and generate fewer benefits when used in high-prevalence countries, where there would be too many false negatives, too little treatment of diseased individuals, and more future illness and disease spread.
The lower the prevalence, the more specific the test should be. Our analysis is not only useful for making decisions between tests but also in determining setting-specific positive test thresholds. This decision recognized that changing the threshold would decrease test sensitivity, yet it would increase specificity and result in improved outcomes for this setting.
However, because of the inherent nature of the test, it is possible that this changed threshold may not increase specificity to the levels of the QFT-IT[ 10 ]. Therefore, if the revised T-Spot sensitivity and specificity values were known and included in the current analysis, the magnitude of differences in outcomes between the two tests would clearly diminish; however, the degree of decline is uncertain and is unlikely to be absolute. More research is needed to clearly determine the specificity of IGRAs in settings of varying prevalence, and in particular of the T-Spot assay with the revised US threshold.
Our TB example is a good demonstration of the issue of determining appropriate diagnostic testing strategies when the optimal sensitivity-specificity balance varies throughout the world. Consider, for example, two countries: one developed, the other not. Healthcare in the developed country is generally good, multi-drug resistant TB is relatively rare, and TB prevalence is relatively low.
In the developing country, healthcare access is more limited, resistant TB is more common, and TB prevalence is higher. The developing country may find that with higher disease prevalence, the greater increase in early detection is worth the increased treatment of false-positive cases, especially given the poorer access to medical services.
This is not to say that the trade-off is not worthwhile in the developed country or that it is worthwhile in the developing country. Resources and local priorities and values should determine that. Rather, one should not expect the trade-off to be similar in different areas; indeed, it may differ by orders of magnitude as prevalence varies.
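How strongly the trade-off shifts with prevalence can be sketched as the number of additional false positives incurred for each additional true case detected when a more sensitive test replaces a more specific one. Both tests' operating characteristics and both prevalence values below are hypothetical and do not describe any real assay or country.

```python
def positives_per_person(sensitivity, specificity, prevalence):
    """Expected true and false positives per person screened."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos, false_pos


def false_pos_per_extra_case(sensitive_test, specific_test, prevalence):
    """Extra false positives per extra true positive gained by
    using the more sensitive test instead of the more specific one."""
    tp_a, fp_a = positives_per_person(*sensitive_test, prevalence)
    tp_b, fp_b = positives_per_person(*specific_test, prevalence)
    return (fp_a - fp_b) / (tp_a - tp_b)


# (sensitivity, specificity) pairs; hypothetical values.
SENSITIVE_TEST = (0.90, 0.90)
SPECIFIC_TEST = (0.80, 0.98)

for label, prevalence in (("low-prevalence setting", 0.005),
                          ("high-prevalence setting", 0.30)):
    ratio = false_pos_per_extra_case(SENSITIVE_TEST, SPECIFIC_TEST, prevalence)
    print(f"{label}: {ratio:.1f} extra false positives per extra case found")
```

Whether either ratio is acceptable remains a matter of local resources and values; the sketch only shows that the same pair of tests presents very different trade-offs in the two settings.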
Despite this differential impact between settings, testing decisions do not always consider specific populations and disease characteristics, and like those for QFT-IT and T-Spot, positive-result thresholds are usually set at a global level by manufacturers and applied consistently across countries[ 12 , 13 ].
Given that the prevalence of many diseases varies worldwide, encouraging policymakers to explicitly incorporate disease prevalence in their testing decisions and allowing them to choose setting-specific thresholds (or to choose from a menu of possible choices) could increase the value of a given test by optimizing test performance and improving health and economic outcomes.
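One way a menu of thresholds might be used is to pick, for each setting, the cutoff with the lowest expected cost of misclassification. This is a minimal sketch under stated assumptions: the menu entries, their sensitivity and specificity values, and the relative costs of a missed case and an over-treated non-case are all hypothetical, and a real decision would weigh many more factors.

```python
# Hypothetical menu: cutoff label -> (sensitivity, specificity).
MENU = {
    "low cutoff": (0.95, 0.85),
    "standard cutoff": (0.90, 0.93),
    "high cutoff": (0.80, 0.99),
}


def expected_cost(sensitivity, specificity, prevalence, cost_fn, cost_fp):
    """Expected misclassification cost per person screened."""
    missed = (1 - sensitivity) * prevalence
    overtreated = (1 - specificity) * (1 - prevalence)
    return cost_fn * missed + cost_fp * overtreated


def best_threshold(prevalence, cost_fn=10.0, cost_fp=1.0):
    """Return the menu entry with the lowest expected cost.

    cost_fn and cost_fp are assumed relative costs of a false negative
    and a false positive; they are illustrative, not empirical.
    """
    return min(
        MENU,
        key=lambda name: expected_cost(*MENU[name], prevalence, cost_fn, cost_fp),
    )


for prevalence in (0.01, 0.40):
    print(f"prevalence={prevalence:.2f}: {best_threshold(prevalence)}")
```

Under these assumed costs, the more specific cutoff is preferred when prevalence is low and the more sensitive cutoff when prevalence is high.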
Tuberculosis is a good example for demonstrating the impact of prevalence in decisions regarding positive thresholds and test strategies because of issues such as the challenges of estimating accurate test operating characteristics, the varying disease prevalence, and the differences between active and latent infection. Although such issues apply when testing for any disease, they must be taken into account when interpreting the implications of our analysis.
For example, the impact of incorrect LTBI diagnoses can be particularly difficult to estimate because of low treatment compliance and the challenge of estimating the impact of delayed diagnoses.
This analysis also ignores other issues involved in testing for the less-prevalent active TB[ 6-8 ]. These include, among others, variation in estimates of test sensitivity and specificity. Testing programs may maximize benefit, minimize risk, and successfully prevent and treat disease only when all such factors are considered. Although the examples discussed herein come from only one disease (TB), this should not be considered a limitation of the study.
Rather, this analysis demonstrates an epidemiologic principle that holds true for any disease, even though the magnitude of effect will vary from one disease to another. No matter what the sensitivity and specificity of a test are, the prevalence determines the absolute numbers of missed cases and over-treated non-cases.
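The principle can be illustrated with expected counts. The sketch below assumes a hypothetical test with 90% sensitivity and 90% specificity applied to 10,000 people; none of these numbers come from the analysis itself.

```python
def expected_errors(sensitivity, specificity, prevalence, screened):
    """Expected numbers of missed cases and over-treated non-cases."""
    missed_cases = (1 - sensitivity) * prevalence * screened
    overtreated = (1 - specificity) * (1 - prevalence) * screened
    return missed_cases, overtreated


SCREENED = 10_000  # hypothetical cohort size

# Same hypothetical test (90% sensitive, 90% specific) in three settings.
for prevalence in (0.01, 0.10, 0.40):
    missed, overtreated = expected_errors(0.90, 0.90, prevalence, SCREENED)
    print(f"prevalence={prevalence:.2f}  "
          f"missed cases={missed:.0f}  over-treated non-cases={overtreated:.0f}")
```

The test's sensitivity and specificity never change in this loop, yet the balance between the two kinds of error shifts substantially as prevalence rises.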
Authors of primary studies and systematic reviews of diagnostic accuracy could be more aware of this issue. Future research should evaluate the benefit-risk trade-offs involved in incorporating new and standard tests, at varying positive test thresholds, and in high- and low-prevalence settings.