Interpreting COVID-19 test accuracy

Sarah Bunn

DOI: https://doi.org/10.58248/RR45

High-quality diagnostic tests to detect a current infection, and antibody tests to detect a previous infection, are available.
No diagnostic or antibody test is 100% accurate. This results in both false positive and false negative results. The extent to which either false result is likely depends on several factors.
There is no gold standard reference against which to judge tests’ performance. Different tests may be evaluated against different reference standards. This makes evaluating and comparing tests challenging.
The UK Government has set out detailed guidance for test manufacturers on the key features that tests should have, how they should be evaluated and the standards of accuracy that they should meet.
No single test is perfectly suited to all the purposes for which tests can be used (such as diagnosing a patient, testing if someone is infectious, or screening a large population).
The accuracy of tests quoted by manufacturers does not necessarily translate into their reliability when they are used outside the laboratory. This depends on several factors, including the purpose for which they are being used and how they are used.
The British Medical Journal has an interactive COVID-19 test calculator where you can explore how the features of a test influence the accuracy of results.
This is part of our rapid response content on COVID-19. You can view all our reporting on this topic under COVID-19.

Glossary of terms used in discussing SARS-CoV-2 tests

Antibody test: detects antibodies to SARS-CoV-2 produced during a current or previous infection.

Antigen test: detects viral material indicating a current infection.

Diagnostic test: a test that can confirm if someone is currently infected with SARS-CoV-2.

False negative: an incorrect result when someone with a SARS-CoV-2 infection tests negative.

False positive: an incorrect result when someone who does not have a SARS-CoV-2 infection tests positive.

Mass screening/testing: using tests in a large sample of healthy people to detect those who are currently infected.

Molecular test: a test that detects viral genetic material through PCR or newer laboratory techniques.

PCR test: Polymerase Chain Reaction, a type of molecular test.

Sensitivity: how well a test reports a positive result for people who have COVID-19 or SARS-CoV-2 antibodies.

Specificity: how well a test reports a negative result for people who do not have COVID-19 or SARS-CoV-2 antibodies.

Types of COVID-19 tests

There are two main types of test used to identify COVID-19 caused by infections with the SARS-CoV-2 virus. They either detect the presence of the virus or an immune response to it.

Detecting an active infection using a molecular or antigen test. This detects if someone has a current infection. There are several laboratory methods that can be used. These tests are used in the test, trace and isolate programmes in operation across the UK. They can diagnose or screen for infections so that decisions can be made about clinical treatment, and public health decisions, such as whether someone must self-isolate. They are also used as a research tool so that scientists and public health bodies can monitor the rate and spread of infections in the population, in defined regions, communities or specific groups.
Detecting a previous infection using an antibody test. This detects antibodies produced by the body against SARS-CoV-2. These tests indicate if someone has had the infection long enough to produce antibodies against it or has previously had an infection. They are sometimes called serological tests. You can read more about how they work and what they are used for in our article Antibody tests for COVID-19. At present their principal use is in surveillance and research programmes that seek to determine what proportion of a given population have been infected and to study the immune response to infection. This is an important part of vaccine development.

How reliable are tests?

As with any medical diagnostic test, data on the accuracy and reliability of COVID-19 tests are crucial. Assessing this is complex because it depends on several factors. These include how a test is evaluated and how the performance of a test may change when it is used in the real world rather than in a highly controlled laboratory environment. Depending on the context and purpose for which tests are used, some test characteristics may be more important than others. The accuracy of testing also depends on what proportion of the population have an infection (or antibodies) at any given time. This is explained later.

Comparing test performance

The accuracy of diagnostic tests is usually benchmarked against a highly reliable reference standard, sometimes called a ‘gold standard’. There is no gold standard reference test for COVID-19 and no generally accepted reference standard against which to measure a diagnostic test’s performance. Therefore, no test can claim 100% accuracy. This is also the case for antibody tests.

Most tests that detect SARS-CoV-2 infections are currently benchmarked against the type of test regarded as the most accurate available. This is the RT-PCR (reverse transcription polymerase chain reaction) test, which is carried out in a laboratory. It uses specialised equipment to amplify viral genetic material from the sample so that it can be detected. This test is the mainstay of COVID-19 testing in the UK. Test samples are sent to and processed in NHS Trust laboratories, national public health agency laboratories and the UK Lighthouse Labs Network (a network of diagnostic centres focused on COVID-19 testing).

Similarly, there is no agreed reference standard for antibody tests. The pragmatic solution is to compare a test with a composite standard based on samples containing antibodies taken from patients with confirmed disease at an appropriate stage of infection. The stage of infection is particularly important for antibody tests, because it takes time to build up levels of antibodies after becoming infected, typically about 2 weeks, although this can be longer in people with mild or no symptoms. The timing of the test is also important as levels of SARS-CoV-2 antibodies decrease over time. This means that someone who was infected 6 months ago could now test negative, either because they no longer have antibodies or because the antibodies are present at a level too low for the test to detect.

Guidance for manufacturers on test standards

The National Institute for Health and Care Excellence and the Medicines and Healthcare products Regulatory Agency have published detailed guidance for test manufacturers about the essential test features and the standards that diagnostic and antibody tests should meet. This guidance sets out both the best approaches to evaluating test performance and the minimum reference standards to use, as well as more detailed information about the minimum levels for sensitivity and specificity of tests according to the context in which they are to be used. It also gives manufacturers a clear idea of the UK Government’s requirements on usability, safety and how quickly results need to be produced. This will help a manufacturer determine their capacity to supply tests at the volume required in order for them to be used nationally, in the NHS or screening programmes.

Many commercial tests to detect a current infection or antibodies are available. Comparing the relative performance of different commercial tests is difficult since manufacturers may measure their test’s performance against different reference standards. For this reason, public health agencies in the UK have designed their own reference standard for diagnostic tests and evaluate the performance of several commercial tests against it in order to work out which ones are best suited for use by government, such as in the NHS or in infection surveillance projects. Public Health England has also carried out head-to-head evaluations of several commercial antibody tests. Data on commercial tests help public health agencies develop communication materials explaining the limits of testing to professionals using the tests and to the public (this is particularly relevant for people who may access testing privately).

Sensitivity and specificity of tests

Understanding the extent to which a test can detect even very small amounts of virus or antibodies is paramount. This is called the limit of detection and refers to the minimum amount of material that a test can detect. This is important because some samples may contain less viral material or antibodies than others and the amount of virus and antibodies in the body changes as an infection progresses. There may also be differences in the amount of viral material in different parts of the body, so where and how a sample is taken is very important. Diagnostic tests also need to be able to distinguish between SARS-CoV-2 and other viruses that may be present in a sample, especially other coronaviruses that can cause respiratory infections.

When the accuracy of tests is discussed, two important terms are used. The definitions below use the example of a diagnostic test to see if someone has an infection:

Sensitivity: the proportion of people with SARS-CoV-2 infection who test positive. A test with sensitivity of 95% would mean that 5 in 100 people who have COVID-19 would test negative (false negative). They have an infection, but the test says that they don’t.
Specificity: the proportion of people without SARS-CoV-2 infection who test negative. A specificity of 90% means that 10 in 100 people who are not infected still test positive (false positive). They do not have an infection, but the test says that they do.
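
The two definitions above reduce to simple ratios of counts. The following Python sketch is illustrative only: the function names and the sample counts are assumptions chosen to match the 95% and 90% figures in the text, not data from any real evaluation.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Proportion of infected people who test positive."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives: int, false_positives: int) -> float:
    """Proportion of uninfected people who test negative."""
    return true_negatives / (true_negatives + false_positives)


# Hypothetical counts: 100 infected people, 5 of whom test negative.
sens = sensitivity(true_positives=95, false_negatives=5)
# Hypothetical counts: 100 uninfected people, 10 of whom test positive.
spec = specificity(true_negatives=90, false_positives=10)

print(f"Sensitivity: {sens:.0%}")
print(f"Specificity: {spec:.0%}")
```

Both measures are calculated within a group whose true status is already known, which is why manufacturers need reference samples to report them.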

Manufacturers report test sensitivity and specificity by checking how well their test confirms the results for a group of reference samples that are known to be either positive or negative:

Positive COVID-19 samples: taken from a group of people diagnosed with COVID-19 based on their clinical history, symptoms, other evidence indicating characteristic features of the disease (such as chest X-rays or CT scans) and genetic analysis of any viral material.
Negative COVID-19 samples: historic samples taken from people before the virus was circulating in humans. These may contain other viruses to check that a test is specific for SARS-CoV-2 only.

However, these numbers do not give a complete picture of a test’s reliability. The value of a test in real-world use can be quite different, depending on how common the infection is in the population, and the proportion of the population that is infected or has antibodies is usually not known exactly. As a result, the share of results that are false negatives or false positives varies from one setting to another.

For example, if a group of 10,000 patients were hospitalised with suspected COVID-19 symptoms in an outbreak area, it is likely that 90% of them are actually infected with SARS-CoV-2. For a diagnostic test with 95% sensitivity and 95% specificity the predictive accuracy of the test would be as follows:


	Infected with SARS-CoV-2	Not infected with SARS-CoV-2	Total	Predictive accuracy of test
Test positive	8,550 (true positive)	50 (false positive)	8,600	(8,550/8,600 x 100) = 99.4%
Test negative	450 (false negative)	950 (true negative)	1,400	(950/1,400 x 100) = 67.9%
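
The figures in this table can be reproduced with a few lines of arithmetic. The Python sketch below is a minimal illustration, assuming a hypothetical helper named `expected_counts`. The “predictive accuracy” column corresponds to what are usually called the positive and negative predictive values.

```python
def expected_counts(tested, prevalence, sensitivity, specificity):
    """Return expected (TP, FP, FN, TN) counts for a tested group."""
    infected = round(tested * prevalence)
    not_infected = tested - infected
    true_pos = round(infected * sensitivity)
    false_neg = infected - true_pos
    true_neg = round(not_infected * specificity)
    false_pos = not_infected - true_neg
    return true_pos, false_pos, false_neg, true_neg


# Scenario from the text: 10,000 patients, 90% infected, 95%/95% test.
tp, fp, fn, tn = expected_counts(
    10_000, prevalence=0.90, sensitivity=0.95, specificity=0.95
)

# Share of positive results that are correct (positive predictive value).
ppv = tp / (tp + fp)
# Share of negative results that are correct (negative predictive value).
npv = tn / (tn + fn)

print(f"Test positive: {tp} true, {fp} false -> {ppv:.1%} correct")
print(f"Test negative: {tn} true, {fn} false -> {npv:.1%} correct")
```

Changing the `prevalence` argument while keeping sensitivity and specificity fixed shows how strongly the predictive values depend on how common the infection is.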

Overall, the test is very good at identifying people with the infection. However, 50 people who are not infected will still test positive and 450 infected people test negative.

If there is a chance that only 5% of people are infected and 10,000 people are tested, the predictive accuracy of results for a test with 95% sensitivity and 95% specificity looks quite different:


	Infected with SARS-CoV-2	Not infected with SARS-CoV-2	Total	Predictive accuracy of test
Test positive	475 (true positive)	475 (false positive)	950	(475/950 x 100) = 50.0%
Test negative	25 (false negative)	9,025 (true negative)	9,050	(9,025/9,050 x 100) = 99.7%

This example shows that when disease (or antibody) prevalence is low in the population, a larger share of positive results are false positives, even using tests with reasonably high levels of sensitivity and specificity. Decisions about balancing test sensitivity and specificity depend on the purpose for which testing is being used.
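
The relationship between prevalence and the share of correct positive results can be explored directly. The sketch below is a minimal illustration using the standard formula for positive predictive value. The function name and the list of prevalence values are assumptions chosen for illustration; only the 5% and 90% values come from the worked examples above.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Probability that a positive result is a true positive."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Prevalence values are illustrative; 5% and 90% match the worked examples.
for prevalence in (0.01, 0.05, 0.20, 0.50, 0.90):
    ppv = positive_predictive_value(prevalence, sensitivity=0.95, specificity=0.95)
    print(f"Prevalence {prevalence:>4.0%}: {ppv:.1%} of positive results are correct")
```

The test itself is identical in every row; only the proportion of people who are infected changes.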

Test error is also amplified when tests are used in the “real world”. The results from tests performed under strict research laboratory conditions are not necessarily replicated when tests are used operationally, such as in large scale testing programmes. Errors can arise for several reasons. For example, a sample may be taken incorrectly or contaminated, and other sources of error can lead to more false results. There are no data yet on the extent to which operational use of tests in national COVID diagnostic testing programmes exacerbates the problem of false positive results, but one estimate based on tests using similar technologies for other viruses is 2.3%. The rate of false negatives in large programmes is also unknown but will be influenced by the timing of the test: samples taken in the early and late stages of infection are more likely to be falsely negative.

The British Medical Journal has an interactive COVID-19 test calculator where you can explore how the features of a test influence the accuracy of results.

Implications of test performance for how tests are used

The characteristics of each test are important, but the way in which they will be used also has implications for the interpretation of the results. For example, the most important features of a test to be used for confirming a clinical diagnosis of a patient in hospital with COVID-19 are very different to those for a test intended for a mass screening programme. As the examples above show, tests with high sensitivity work well when there is a high chance that the person is infected. Specificity is much more important if tests are used to screen a very large population of people where most don’t have the infection.

Using tests to screen large populations

The rate of false positives and negatives has significant implications when a test is to be performed at scale as a screening tool, such as in a large workforce or at a national population level. This is because even if a test has 99% sensitivity and specificity, large numbers of people would get an incorrect positive result and subsequently be required to self-isolate, or get a false negative result and could go on to spread the virus. False positive results would also lead to additional unnecessary effort to identify contacts who would then be required to self-isolate. When the overall prevalence of infection in the population is low, the mass screening of asymptomatic people also increases the proportion of positive results that are false. This then necessitates confirmatory testing using a second, different test so that people know whether to self-isolate or return to daily life.
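
The scale of the problem, and the effect of confirmatory testing, can be sketched with some assumed numbers. In the Python example below, the population of 1,000,000 and the prevalence of 0.2% are hypothetical figures, not taken from any survey. The sketch also assumes that the second test has the same 99% sensitivity and specificity and that its errors are independent of the first test’s errors, which may not hold in practice.

```python
def screen(infected, not_infected, sensitivity, specificity):
    """Return expected (true positives, false positives) after one test."""
    return infected * sensitivity, not_infected * (1 - specificity)


# Hypothetical screening programme (assumed figures, not from a real survey).
population = 1_000_000
prevalence = 0.002          # assumed: 0.2% of people currently infected
sensitivity = specificity = 0.99

infected = population * prevalence
not_infected = population - infected

# First-round screening of everyone.
tp1, fp1 = screen(infected, not_infected, sensitivity, specificity)
false_negatives = infected - tp1

# Confirmatory test applied only to first-round positives. Assumes the second
# test's errors are independent of the first, which may not hold in practice.
tp2, fp2 = screen(tp1, fp1, sensitivity, specificity)

print(f"First test: {tp1:,.0f} true positives, {fp1:,.0f} false positives, "
      f"{false_negatives:,.0f} false negatives")
print(f"Share of first-test positives that are correct: {tp1 / (tp1 + fp1):.1%}")
print(f"Share still correct after confirmatory test: {tp2 / (tp2 + fp2):.1%}")
```

Under these assumptions, false positives outnumber true positives after the first test, and retesting only the positives removes most of the false positives at the cost of missing a few more true infections.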

The latest estimate from a national COVID-19 infection survey is that nearly 105,000 people had an infection between 13 and 19 September. In such a screening programme, the problems detailed above are present. However, such a survey still provides very useful information because statistical techniques can be used to control for false positives and false negatives. It also provides information over time and allows comparisons between different geographic areas – so the direction of the outbreak can be better understood. These findings can then be supported by other evidence, such as hospital admissions.
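
One widely used correction of this kind is the Rogan-Gladen estimator, which adjusts the observed proportion of positive results for a test’s known sensitivity and specificity. The source does not say which techniques the infection survey uses, so the Python sketch below is only an illustration of the general idea. The function name and the input figures are assumptions.

```python
def corrected_prevalence(apparent_prevalence, sensitivity, specificity):
    """Rogan-Gladen estimate of true prevalence from the observed positive rate."""
    denominator = sensitivity + specificity - 1
    if denominator <= 0:
        raise ValueError("test must perform better than chance")
    estimate = (apparent_prevalence + specificity - 1) / denominator
    # Clamp to [0, 1]: sampling noise can push the raw estimate out of range.
    return min(1.0, max(0.0, estimate))


# Illustrative input: 9.5% of results positive with a 95%/95% test,
# as in the 5%-prevalence worked example (950 positives in 10,000 tests).
print(f"{corrected_prevalence(0.095, sensitivity=0.95, specificity=0.95):.1%}")
```

The correction works at the level of the whole population: it estimates how many people are infected, but it cannot say which individual results are wrong.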

The scientific committee advising the Welsh Government on COVID-19 has published a framework for assessing the utility of tests in different scenarios as either a diagnostic or screening tool. A subgroup of the Scientific Advisory Group for Emergencies has also published recommendations on using tests in mass screening programmes.
