Interpreting research evidence
What are the validity, reliability, generalisability and applicability of research evidence? This section focuses on how quality is assessed in quantitative and qualitative research.
This research glossary accompanies POST's research evidence content and provides definitions for terms used there.
This glossary is not intended to provide a complete list of definitions for all research concepts. The purpose of the glossary is to provide a list of agreed definitions that briefings in UK Parliament can refer to. Definitions used in this glossary have been reviewed by external experts. If there is a term that is not defined below, there are many other research glossaries available. For example, there are extensive, publicly available glossaries provided by the National Institute for Health and Care Excellence, Cochrane, Public Health England, and many others.
A statistic that predicts the probability of an individual experiencing a particular event (such as developing a certain medical condition). For example, the absolute risk of a woman developing breast cancer across a lifetime is 12.5%. This means that out of every 1000 women, 125 will develop breast cancer in their lives (or, put another way, 1 in every 8 women).
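The arithmetic in the breast cancer example can be checked with a short Python sketch. The function name and the choice of a population of 1000 are illustrative assumptions, not part of the glossary.

```python
from fractions import Fraction

def cases_per_population(absolute_risk_percent, population):
    """Expected number of cases in a population for an absolute risk given in %."""
    # Fraction(str(...)) keeps the arithmetic exact (12.5% becomes exactly 1/8).
    return Fraction(str(absolute_risk_percent)) / 100 * population

# Figures from the example above: a 12.5% lifetime risk.
cases = cases_per_population(12.5, 1000)   # expected cases per 1000 women
one_in_n = 1000 / cases                    # the same risk expressed as "1 in N"
print(cases, one_in_n)
```

With these inputs the sketch gives 125 cases per 1000 women, which is the same as 1 in 8.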
A type of randomised controlled trial where two versions (A and B) of a particular intervention are compared (such as two different website designs) by users of a service being randomly assigned to receive A or B. They are considered a particularly robust study type as the randomisation reduces the likelihood of bias from extraneous variables.
An educated guess by a researcher that there is a relationship between the independent variable and the dependent variable (or in the case of an observational study, that there is a relationship between two or more factors being studied). An alternative hypothesis can predict a general outcome that differs from the null hypothesis (such as ‘there will be a difference between scores from the intervention group and the control group’) or a set outcome (such as ‘scores for the intervention group will be higher than for the control group’). Studies that involve hypothesis-testing are usually seeking to disprove a null hypothesis in favour of an alternative hypothesis.
A form of qualitative analysis that involves a researcher reviewing a dataset and developing hypotheses. During the process, the researcher collects more data and looks for any cases that do not fit the hypotheses. If there are cases that do not fit, the hypotheses are reformulated. The researcher then collects more data and looks for any cases that do not fit the reformulated hypotheses. This process continues until there are no cases that do not fit the hypotheses.
Attrition bias is the influence that participants leaving a study might have on the results. For example, if more participants are lost from a control group than an intervention group in a study, this could influence the results because those leaving the study may share certain characteristics. Therefore, when the groups are compared at the end of the study, there may appear to be a difference due to the intervention that is actually attributable to drop-out rates.
Any influence on a research study that could affect or distort the results or interpretation of results.
A longitudinal study that collects data at different points in time from a sample of participants born in a similar time period. The study tracks individuals across their life course.
A way to prevent biases (such as observer-expectancy effects, response bias and demand characteristics) caused by either participants or researchers knowing what the expected effects of a study are. A study can be non-blind, single-blind, double-blind or triple-blind.
The effect of one intervention on the outcomes of future interventions. Carryover effects can include practice effects and fatigue effects. Carryover effects are a particular issue for within-subject designs but their effect can be lessened with counterbalancing.
Also known as case-referent study. A type of observational study that compares the medical and lifestyle history of participants with a particular medical condition (cases) alongside participants without that medical condition (controls) who are matched for other variables. By comparing participants’ histories, some potential causative factors for the medical condition may be revealed.
Case designs are observational studies where participants have been exposed to a factor without a researcher intervening. Case designs involve looking at the outcomes related to a particular factor, such as the relationship between sustaining particular brain injuries and changes in language, memory or personality.
See case-control study.
A type of observational study that presents a detailed description of one particular individual or incident. For example, a case report might explain the symptoms and prognosis of a person with a rare illness or the events following a one-off natural disaster.
Also known as clinical series. A type of observational study that presents descriptions of a number of participants who have been exposed to a particular factor (such as having had a medical treatment or medical condition).
The proven effect of one factor on another. Causation is not the same as correlation, which is a potential relationship between two factors. Only experiments and non-equivalent groups designs can attempt to prove causation.
The tendency for people to cite work with positive results more than studies without positive results.
Any measure that has been approved for use in a medical setting to assess the health of an individual. Clinical measures include body mass index (BMI), blood pressure, lung capacity, cholesterol level and depression scales.
See case series.
Multi-stage studies designed to test whether medical interventions are safe and effective for patients. All medications in the UK are assessed by clinical trials before they are licensed for use. Clinical trials in the UK have four phases. Phase I tests the safety of an intervention on, typically, very few (often fewer than ten) patients, often healthy volunteers. Phase II tests whether the intervention is safe and works. It involves more participants (up to a few hundred). Phase III is usually a randomised controlled trial testing the intervention’s effectiveness, usually against the current standard treatment, on even more participants (up to several thousand). Phase IV collects data on the safety and effectiveness in patients after an intervention has been licensed.
Also known as group-randomised trial or place-randomised trial. A type of randomised controlled trial where randomisation happens at a group level. In a cluster randomised controlled trial, individuals are already in groups (such as schools, local councils or hospitals) and these groups are randomly assigned to be control or intervention groups.
A way of sampling participants for a study based on their geographical location. For example, a cluster sample might involve recruiting an entire local school or a whole local hospital to take part in a study. Cluster sampling is a simple and cost-effective way to recruit lots of participants. However, because the sample is only drawn from one location, it may not be representative of the larger population.
A longitudinal study that collects data at set intervals from a sample of participants who share a specific characteristic, such as year of birth (birth cohort study) or having a particular health condition.
A cognitive bias where individuals tend to seek out, interpret and remember evidence that supports their pre-existing viewpoints.
In experiments and some quasi-experiments, a control group is used for comparison with the intervention group. The control group receives no treatment or a standard treatment that is already being used (such as the standard medication used for a certain condition).
Qualitative data are collected by observing the behaviour of individuals as covertly as possible in a managed environment, such as a laboratory (for example, watching the behaviour of children interacting with their parents in an unfamiliar environment). The qualitative data collected can be quantified (for example, by counting the number of instances of a certain behaviour).
A potential relationship between two factors where a change in one (such as amount of weekly cardiovascular exercise) appears to relate to change in another factor (such as lung capacity or percentage body fat). Correlations can be shown in studies using quasi-experimental designs and observational designs. Correlation is not the same as causation, where there is a proven effect of one factor on another. Correlations cannot prove causation. Correlations may occur between completely unrelated factors, just by chance. These are called spurious correlations.
A way to reduce biases caused by order effects or sequence effects by presenting interventions/controls (or test items, such as questions in a questionnaire) in all possible orders.
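As a minimal sketch of presenting conditions in all possible orders, the Python below lists every order of three conditions and rotates participants through them. The condition labels, participant IDs and the rotation scheme are hypothetical choices made for illustration.

```python
from itertools import permutations

def counterbalanced_orders(conditions):
    """Return every possible presentation order of the given conditions."""
    return list(permutations(conditions))

# Three hypothetical conditions give 3! = 6 possible orders.
orders = counterbalanced_orders(["A", "B", "C"])

# Rotate hypothetical participants through the orders so each order is used equally often.
participants = [f"P{i}" for i in range(1, 7)]
schedule = {p: orders[i % len(orders)] for i, p in enumerate(participants)}

for participant, order in schedule.items():
    print(participant, order)
```

The number of possible orders grows quickly as conditions are added (four conditions already give 24 orders), so this full approach is most practical for a small number of conditions.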
What could have happened if something had been different. For example, what would have been the chances of recovery for an individual if they had not received a medical intervention.
A measure of quality used in qualitative research, involving establishing that those who participated in the research find the results believable.
A type of observational study that collects data from a group of participants in order to describe a population, such as recording the alcohol consumption of university students. They provide a snapshot of a certain factor in a population at a particular point in time. These studies can be repeated at a future point with a new set of participants (such as looking at alcohol consumption of university students every five years with a new group of students each time). These are called repeat cross-sectional studies and differ from other longitudinal studies because new participants are used each time, rather than using the same group throughout.
Indications (either explicit or implied) in studies for participants to behave in a certain way, which can bias results. Studies that measure behaviour are particularly prone to demand characteristics.
In an experiment or quasi-experiment, the outcome that is measured to see if the independent variable has an effect (such as the test scores of pupils receiving different teaching styles).
Statistics that are numerical summaries of quantitative data. They can include measures of central tendency (the average of the dataset) and measures of variability (how much most of the data differs from the average of the dataset). From these statistics alone, researchers cannot be sure whether the data match their educated guesses (hypotheses) as hypothesis-testing requires inferential statistics. They also cannot be used to make predictions from data about a larger population as this also requires inferential statistics.
A data collection method that uses accounts that are written or recorded by participants about their activities and habits. Diaries usually produce qualitative data. However, diaries may also include quantitative data, such as logs where participants keep track of the number of occasions they take part in an activity and the timings (for example, participants in a study on the link between exercise and well-being might log whenever they exercise over a set period).
A study designed to reduce potential bias from observer-expectancy effects, response bias and demand characteristics by ensuring that neither the participants nor the researchers know which participants are in intervention or control groups.
The number of participants who start a study but do not complete it for various reasons. Drop-out rates can lead to attrition bias.
A type of natural experiment in which a sub-population has been exposed to different factors that may affect health without a researcher intervening. Researchers can compare groups on a dependent variable. Ecological studies have included looking at the effect of radiation, air pollution and famine on various outcomes.
The magnitude, or size, of an effect from an intervention or factor in a research study. Effect size is different from statistical significance (the likelihood that results of a study occurred by chance). Effect sizes can be small or large. For example, in a study looking at an intervention to lose weight, participants in the intervention group might lose an average of 0.1kg more than those in the control group (a small effect size) or might lose an average of 20kg more (a large effect size).
A measure of quality for qualitative research that considers the extent to which the study increased empathy among participants.
Related to the concept of external validity, environmental generalisability is the extent to which the findings of a study could be applied to a different area. For example, a person may consider how generalisable the findings of an individual study are to different local areas, different regions or different countries.
A measure of quality for qualitative research that considers the extent to which the research outcomes and resulting changes are appropriate and fair.
The collection of qualitative data by observing and experiencing the behaviour of individuals from within the group being researched (for example, by documenting the behaviour of teachers in a staff room by joining the school staff).
A study that has two key features that are designed to reduce the influence of an extraneous variable. First, the independent variable is manipulated by the researcher. Second, participants are randomly assigned to the groups, so they have an equal chance of being in the control or intervention group.
The extent to which the conclusions of a study can be applied to different circumstances such as other populations (population generalisability), locations (environmental generalisability) or time periods (temporal generalisability).
Anything that may affect a measure besides the factor a researcher is interested in. For example, age could be an extraneous variable on a study looking at the effect of a medication on memory.
A decline in performance caused by participants becoming tired after carrying out a task repeatedly.
Discussions with several people at the same time on a specific topic or issue (for example, talking with a small group of people about their experience of using a particular social care service). The interaction between participants in the focus group is used to reveal areas of consensus and disagreement. Focus groups produce qualitative data (that may later be quantified, depending on the purpose and approach of the study).
Measures taken at intervals after the end of a research study, or after an initial implementation phase, to measure whether any effects found persist, grow or diminish over time.
The extent to which the findings of a study can be applied to other situations. Generalisability can be divided into population generalisability, environmental generalisability and temporal generalisability.
Grey literature comprises any research that is not published in an academic journal or academic book. It includes reports or other documents produced by industry, government departments, regulators and charities.
See cluster randomised controlled trial.
A type of standard of evidence where research is ranked on the quality and strength of its evidence from high to low, usually based on study type.
Educated guesses made by a researcher around what the data will reveal about a certain phenomenon. Researchers can use inferential statistics to test their hypotheses. There are two main types of hypothesis: the null hypothesis and the alternative hypothesis.
In an experiment or quasi-experiment, the factor that is changed between individuals or groups (such as the medication taken in a clinical trial) in order to see an effect.
Inferential statistics are mathematical ways of forming conclusions from quantitative data. In research studies, they are used for three main purposes: seeing which educated guesses (hypotheses) made by a researcher at the start of the study were accurate, estimating from a small dataset how much a certain phenomenon would be present in a larger population and exploring the relationship between factors.
How appropriate a study design was for answering a research question and how well the study was carried out. Aspects of internal validity include the way in which participants were selected, the reliability and validity of any measurements taken, and how well extraneous variables were controlled for.
Internationally agreed ways to measure the quantity of various properties, including time (second), length (metre), mass (kilogram) and amount of a substance (mole). It is the most widely used system of measurement. There are seven base units, all of which are metric and defined in terms of invariant constants of nature, such as the speed of light in vacuum and the charge of the electron. Other measures can be derived from SI units (for example, voltage, pressure or radioactivity).
A design where a dependent variable is measured over time, then an intervention is introduced and the dependent variable continues to be measured. The dependent variable is compared between pre-intervention and post-intervention to explore if there is a difference.
In an experiment or quasi-experiment, any form of treatment (such as medication or training) given to a group of participants to measure its effect on an outcome.
A data collection method where an interviewer asks a participant questions (for example, interviewing individuals who have experienced homelessness). Structured interviews have set questions that are asked of each participant in the same order. Semi-structured interviews are more flexible: the questions, and the order in which they are asked, may vary and/or be led by the discussion with each participant. Unstructured interviews tend to have no set questions and the conversation is led by the interviewee, although the interviewer usually has an idea of the sort of topics they want to cover. Interviews produce qualitative data (that can later be quantified, if needed).
Many fields are dominated by research published in English. This has led to bias in the way research in other languages is published and used. Researchers who are fluent in English and another language are more likely to publish positive results in English. Researchers are also more likely to cite work published in English than in other languages, which may lead to bias in secondary research if studies not written in English are excluded.
A data collection method where participants are asked to track the frequency and/or timing of certain events (such as exercise or food intake). Logs produce quantitative data and may be used alongside diaries.
A study that measures factors in a group over a period of time, revisiting the same participants at different points.
A measure commonly used in questionnaires where participants are asked to rate their experiences or opinions with a numerical value on a set scale. For example, a frequently-used Likert scale asks participants how much they agree with a statement on a five-point scale from 1 ‘strongly disagree’ to 5 ‘strongly agree’.
A type of secondary research where primary research is synthesised in a non-systematic way. Literature reviews do not have set methods for how they are conducted and can be at risk of unconscious or conscious bias from the researcher carrying out the review. This is because the researcher can choose what to include/exclude based on their pre-existing opinions or may only search for literature that confirms their pre-existing beliefs (confirmation bias).
Where data are input into a system and artificial intelligence is used to learn from the data how to carry out a certain task (such as carrying out data modelling).
A measure of central tendency that reveals the average of a dataset. It is calculated by adding up all the values in a dataset and then dividing by the number of values.
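The calculation described above can be sketched in a few lines of Python. The scores are made-up values used only for illustration.

```python
def mean(values):
    """Add up all the values, then divide by the number of values."""
    return sum(values) / len(values)

scores = [2, 4, 4, 5, 10]   # hypothetical dataset
print(mean(scores))         # 25 / 5 = 5.0
```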
Information collected by monitoring or observing occurrences or properties. This can include measuring the physical or temporal properties of something that has been collected using an agreed system of measurement (for example, the international system of units provides agreed ways to measure the quantity of various properties). This can also include making observations of an individual or situation (such as monitoring the behaviour of birds on a bird table).
A measure of central tendency that reveals the average of a dataset. It is calculated by putting all the values in order and citing the middle value (or the mean of the two middle values if there are an even number of values).
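A minimal Python sketch of this calculation follows, covering both the odd and the even case. The example values are invented for illustration.

```python
def median(values):
    """Put the values in order and take the middle one.

    With an even number of values, return the mean of the two middle values.
    """
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([5, 1, 3]))       # odd number of values: middle value is 3
print(median([1, 2, 3, 10]))   # even number of values: (2 + 3) / 2 = 2.5
```

In the second example the large value 10 does not pull the median upwards in the way it would pull the mean.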
A type of secondary research. Meta-analyses use data collated from primary research and are often performed alongside a systematic review. Meta-analyses extract data from the studies found during a systematic review and reanalyse the results as part of a larger dataset.
A type of quasi-experiment where researchers combine a non-equivalent groups design with a within-subject design to examine multiple interventions across multiple groups (such as looking at whether there are differences in which medical intervention is effective for men compared with women).
A research study that combines quantitative research methods and qualitative research methods. For example, a study could collect qualitative data in an ethnographic study and quantitative data through a survey.
A measure of central tendency that reveals the average of a dataset. It is the most common value in the dataset. There can be more than one mode in a dataset.
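Because a dataset can have more than one mode, a sketch of the calculation is easiest to write so that it returns a list. The Python below is one illustrative way to do this; the function name and the example values are assumptions.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs most often, in ascending order."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)

print(modes([4, 4, 1]))          # one mode: [4]
print(modes([1, 2, 2, 3, 3]))    # two modes: [2, 3]
```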
A way of using data to predict the effect of a particular factor on a system by creating a simplified computer simulation of the system being investigated. Some instances when models can be used include: if a researcher is interested in the effect of a small factor on a very large system, if a researcher is interested in the effect of a particular event but creating a situation where the event occurs would be unethical, or if a researcher is interested in the effect of something that has not yet happened (and may not happen). Models can be used in many different research areas, including ecology, engineering, astrophysics and economics.
Data collected and analysed by certain public bodies (ones confirmed as producers of official statistics by UK legislation) that have been assessed by the Office for Statistics Regulation for trustworthiness, quality and value and found to be fully compliant with the Code of Practice for Statistics. The Office for Statistics Regulation holds a list of all national statistics. Statistics that have been assessed by the Office for Statistics Regulation for trustworthiness, quality and value but are found not to be fully compliant with the Code of Practice for Statistics are designated official statistics.
A type of observational study where two sub-populations have been exposed to different factors without a researcher intervening. Researchers can compare these two groups on a dependent variable. For example, researchers have explored the random drafting of individuals into the armed forces to look at the effect of military service on lifetime earnings. Natural experiments where sub-populations are exposed to factors that may affect health are called ecological studies.
The collection of qualitative data by observing the behaviour of individuals as covertly as possible in a natural environment (for example, watching the play behaviour of children in a playground). The qualitative data collected can be quantified (for example, by counting the number of instances of a certain behaviour).
A study where the participants know which intervention they are receiving, as do the researchers carrying out the intervention and analysing the data. Non-blind studies are at the highest risk of influence from observer-expectancy effects, response bias and demand characteristics.
A type of quasi-experimental study with an intervention group and a control group being compared on a particular outcome (dependent variable) related to differences in an independent variable. The independent variable is not randomly assigned, often because it is impossible to assign it randomly (such as sex or age) or unethical to assign it randomly (such as malnutrition or smoking status). As there is not random assignment into groups, non-equivalent groups designs are at risk from extraneous variables influencing the results.
An educated guess made by a researcher that there is no relationship between the independent variable and the dependent variable (or in the case of an observational study, that there is no relationship between two or more factors being studied). Studies that involve hypothesis-testing are often seeking to disprove the null hypothesis in favour of an alternative hypothesis.
A study that does not meet either of the criteria of an experiment, meaning that it does not randomly assign an independent variable and does not have a traditional control group. Observational studies are so-called because researchers are not intervening but are instead observing phenomena. Observational studies can indicate correlation but cannot infer causation.
The bias caused by an experimenter intentionally or unintentionally influencing participants to produce the results that they expect.
Data collected and analysed by particular public bodies (those confirmed as producers of official statistics by UK legislation) that have been assessed by the Office for Statistics Regulation for trustworthiness, quality and value. If the statistics are assessed as fully compliant with the Code of Practice for Statistics by the Office for Statistics Regulation then they are designated national statistics.
A type of quasi-experiment where a study has an intervention group but no control group. Instead, measures are taken before and after an intervention and compared. Without a control group, there is no comparison to what might have happened without the intervention. This study type is likely to have low internal validity compared to other quasi-experiments and experiments.
The bias where more easily available research (open access) is cited more frequently than research that is less easily available (subscription only).
Research available in full for free online.
Recruiting participants at a particular location and time, such as conducting a survey with shoppers at a supermarket on a Saturday morning.
The influence on a dependent variable caused by what number in an order an intervention is (for example, first, second, third and so on). Order effects can be controlled for by counterbalancing.
Any value that is far outside of an expected range, making it very different from the others (for example, an individual who scores 20% higher on a test than any other person).
The phenomenon where researchers collect, select or analyse data in a way that favours a ‘statistically significant’ result (either deliberately or accidentally).
A type of longitudinal study that collects data from the same participants at intervals over time. Participants are selected based on certain characteristics, such as location or year of birth. For example, household panel studies look at how chosen households change over time and birth cohort studies (a type of panel study) track people born in the same year across their life course.
The influence of participants who have been invited to be part of a study choosing not to be involved. Participants who choose not to be involved may share common characteristics and missing them from a sample of the population could mean a study’s results are biased.
Related to demand characteristics, the phenomenon where a participant’s expectation of showing an improvement in a clinical measure (such as blood pressure or symptoms of depression) results in an improvement in these measures regardless of the intervention.
A substance or treatment that should have no clinical effect given to control groups so that the intervention can be compared to the improvements that occur just from the placebo effect.
See cluster randomised controlled trial.
Related to the concept of external validity, the extent to which the findings of a study could be applied to a wider population than just those individuals who took part in the research.
An improvement in performance caused by participants becoming better at a task after carrying it out repeatedly.
Researchers registering details of their study (such as the research design) before starting their research in order to increase transparency and reduce the effects of publication bias.
Individual studies that generate their own data.
A type of longitudinal study that selects participants at the start of the research process and collects data from these individuals over time.
The tendency for researchers and academic journal editors to favour publishing research where the findings have shown a positive result (meaning that they have shown an intervention works or have confirmed what the researchers initially predicted).
Information that is in a non-numerical form. For example, qualitative data could be the written account of a person who has experienced homelessness or photographs taken by an individual to describe their local community.
Qualitative data analysis requires a qualitative dataset (such as diaries, collections of pictures or transcripts of interviews). It can be used to test or develop hypotheses. Some of the most common forms of qualitative analysis include analytic induction, thematic analysis, discourse analysis and narrative analysis.
Research that collects qualitative data and uses qualitative data analysis to explore concepts. Qualitative research is usually less concerned than quantitative research with the generalisability and replicability of research findings to other contexts and is more concerned with the data being an authentic and trustworthy reflection of the circumstances in which they were collected.
Information that is in a measurable, numerical/countable form. For example, measures of temperature, time and weight are quantitative data. Other quantitative data can be generated by counting or categorising occurrences. For example, quantitative data could be assigning individuals positions on a scale representing their level of anxiety.
The process of counting or measuring phenomena to create numerical quantitative data. Qualitative data can be transformed into quantitative data through quantification. For example, a researcher could count the number of times certain words appear in a qualitative dataset.
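The word-counting example can be sketched in Python as follows. The transcript text, the list of terms and the simple rule for splitting text into words are all invented for illustration; a real study would need to decide how to handle word variants such as "safe" and "safety".

```python
import re
from collections import Counter

def count_terms(text, terms):
    """Count how often each term appears as a whole word, ignoring case."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return {term: counts[term.lower()] for term in terms}

# A made-up interview excerpt standing in for a qualitative dataset.
transcript = (
    "I felt safe at the hostel. Feeling safe mattered more than comfort; "
    "safe, warm, fed."
)

term_counts = count_terms(transcript, ["safe", "warm", "lonely"])
print(term_counts)
```

Here "safe" is counted three times, "warm" once and "lonely" not at all, turning the written account into numerical data.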
Research that usually collects quantitative data and uses statistical analysis or modelling to answer research questions. Quantitative research is generally concerned with the generalisability and replicability of research findings and strives for objectivity.
A study that does not meet one or both of the two key features of an experiment, meaning that an independent variable may not be randomly assigned between an intervention group and the control group, and there may not be a separate control group. Quasi-experiments are not considered as robust as experiments because the lack of a randomised control group makes their results more liable to be influenced by extraneous variables.
A data collection method where a set of short questions are asked/answered in person, over the phone, online or on paper. Questionnaires usually produce quantitative data through asking closed questions with a set of possible responses (for example, asking a participant how much they agree with a statement on a five-point Likert scale from ‘strongly disagree’ to ‘strongly agree’). However, questionnaires may also produce qualitative data (for example, asking a participant to give a short explanation of why they agree/disagree with a statement). Questionnaires can be used as part of a survey, which is the process of gathering, aggregating and analysing data from a group.
A method of sampling where all members of a particular population are equally likely to be selected to be part of a study. Because the sample is selected randomly, it can be considered likely to reflect the variety of characteristics in the population it was drawn from, making a representative sample.
A procedure (like a lottery) where all participants have the same likelihood of being assigned into an intervention or control group and that assignment is entirely down to chance. Randomisation helps researchers control for extraneous variables (for example, the age, weight, height or sex of individuals).
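The lottery-like procedure can be sketched in a few lines of Python. This is only a simple illustration of chance-based assignment into two groups. The function name `randomise` and the participant labels are assumptions made for the example, and real trials often use more elaborate allocation schemes.

```python
import random


def randomise(participants, seed=None):
    """Assign participants to an intervention or control group purely by chance."""
    rng = random.Random(seed)  # a seed is used only to make the example repeatable
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    # The first half of the shuffled list becomes the intervention group.
    return {"intervention": shuffled[:half], "control": shuffled[half:]}


# Hypothetical participant identifiers (invented for illustration only).
participants = [f"P{i:02d}" for i in range(1, 11)]
groups = randomise(participants, seed=42)
print(groups["intervention"])
print(groups["control"])
```

Because the shuffle does not depend on any characteristic of the participants, factors such as age or sex should, on average, be balanced between the two groups.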
An experiment where participants are randomly placed into a control group or an intervention group. Cluster randomised controlled trials involve random assignment to the control or intervention at a group level (such as assigning whole schools, hospitals or local councils). They are considered a particularly robust study type as the randomisation reduces the likelihood of bias from extraneous variables. As experiments, they are able to prove causation.
A measure of variability in a dataset. It is the difference between the smallest value and the largest value in a dataset.
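As a small worked example, the calculation can be written in Python. The ages below are invented for illustration, and `value_range` is a hypothetical helper name.

```python
def value_range(values):
    """Return the difference between the largest and smallest value."""
    return max(values) - min(values)


# Hypothetical ages of five participants (invented for illustration only).
ages = [23, 35, 41, 29, 62]
print(value_range(ages))  # 62 - 23 = 39
```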
A type of secondary research similar to systematic reviews, in that they have a structured methodology in how they find and present primary research. However, they are run over a shorter timescale (usually 3 to 9 months) and are, therefore, not as exhaustive as a systematic review. The process for a rapid evidence assessment is less well-defined than that for a systematic review as there are different ways that the review process is shortened.
The effect on study outcomes of an individual’s inability to remember accurately events that happened in the past. Recall bias is a particular threat to the internal validity of case-control studies and some retrospective longitudinal studies, whose designs are based around accurately recalling previous life events. Recall bias is also an issue for any other study type where remembering past events is important.
A type of longitudinal study where data from different types of official record are used to track outcomes for participants.
The phenomenon of an extreme measure (one that is particularly high or particularly low) most likely being followed by a measure that is closer to average.
A descriptive statistic based on absolute risk that provides an indication of how much a selected factor raises the risk of experiencing a particular event. For example, a study could compare the absolute risk of developing breast cancer for women who do not drink alcohol and women who have one alcoholic drink a day. The absolute risk for teetotal women is 11.1% and the absolute risk for women who have one alcoholic drink a day is 11.7%. That is to say that 111 out of 1000 teetotal women will develop breast cancer and 117 out of 1000 women who have one alcoholic drink a day will develop breast cancer. Relative risk compares those two absolute risks to give an indication of how much a selected factor increases/decreases risk. As 117 is about 5% higher than 111, this indicates that having one drink a day increases the risk of developing breast cancer by about 5%. This 5% is the relative risk of having one drink a day.
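The arithmetic behind this example can be checked with a short Python sketch. The function names are hypothetical. Note that relative risk is also often reported as a ratio (here roughly 1.05) rather than as a percentage increase, and that the difference in absolute risk is shown alongside for comparison.

```python
def relative_risk(risk_exposed, risk_unexposed):
    """Ratio of the absolute risk in the exposed group to that in the unexposed group."""
    return risk_exposed / risk_unexposed


def relative_increase_percent(risk_exposed, risk_unexposed):
    """Relative risk expressed as a percentage increase over the unexposed group."""
    return (relative_risk(risk_exposed, risk_unexposed) - 1) * 100


def absolute_difference_points(risk_exposed, risk_unexposed):
    """Difference between the two absolute risks, in percentage points."""
    return (risk_exposed - risk_unexposed) * 100


# Absolute risks taken from the example above.
one_drink_a_day = 0.117
teetotal = 0.111

print(round(relative_risk(one_drink_a_day, teetotal), 3))
print(round(relative_increase_percent(one_drink_a_day, teetotal), 1))
print(round(absolute_difference_points(one_drink_a_day, teetotal), 1))
```

Comparing the last two figures shows why both measures matter: a relative increase of around 5% here corresponds to an absolute difference of well under one percentage point.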
How consistent a measure is. For example, if the same object is placed on weighing scales, they would be considered reliable if they showed the same weight each time.
A type of cross-sectional and longitudinal study where data are collected to provide a snapshot of a certain factor in a population at a particular point in time and this data collection is repeated at set time intervals with a new set of participants. These differ from other cross-sectional studies because data are collected at multiple time points. These differ from other longitudinal studies because new participants are used each time, rather than using the same group throughout.
The extent to which the results and conclusions of a study are corroborated by that study being run again. The more similar the results of a replication study are to the original study, the more likely the results are to be externally valid and, therefore, generalisable to a wider context.
The phenomenon where studies with significant findings are repeated and the repeat studies do not find significant results.
Information collected from an individual about their experiences and opinions. Reported data can be qualitative or quantitative. Some examples of reported data collection methods include interviews, focus groups and questionnaires.
A sample that reflects the variety in characteristics (such as age or sex) of the population it is drawn from.
The extent to which a study’s data could be analysed again or its methodology could be used to rerun the study with a different sample. In order for research to be reproducible, researchers must either share the data collected from the study or share the details of how they ran their study, including how the sample was selected, how the methodology was carried out, how measurements were taken, and how results were analysed.
The principles that researchers follow, including protecting participants from harm and giving them the information they need to decide whether to take part in a study.
The effect of participants behaving in a way that they believe a researcher wants them to, possibly influencing the results of a study.
A type of longitudinal study that uses data that have already been collected, rather than collecting data specifically for the study.
A study is considered more robust when it reduces the likelihood of extraneous variables influencing measured outcomes and bias influencing conclusions. A study is considered less robust if there is a greater chance that extraneous variables have influenced the data or there has been bias in the collection or reporting of data. Robustness influences how much one should rely on the findings of a study.
The number of participants involved in a study.
The group of participants that have been selected from a larger population to be involved in a study.
The process of choosing which participants will be selected from a larger population to be involved in a study.
The analysis and synthesis of primary research.
The potential effect of participants not being chosen randomly from the population being studied. Without randomisation in selection, it is possible that a biased sample of participants has been chosen and that the results of the study are affected by extraneous variables.
The potential effect of individuals volunteering to be involved in research. When studies solicit participation from people (such as by asking people to volunteer to take part in a study or fill out a questionnaire), those who choose to take part are likely to differ fundamentally from those who do not choose to take part and they may not be representative of the population. Indeed, they may over-represent people with strong opinions or interests, resulting in the study’s results being biased.
An interview data collection method where the questions asked, and the order in which they are asked, may vary and/or be led by the discussion with each participant (unlike structured interviews).
The potential bias that comes from the sequence of interventions. For example, any influence that a participant might experience from preceding interventions.
A study designed to reduce potential bias from response bias and demand characteristics. In a single-blind study, participants do not know whether they are in an intervention or control group and/or are not aware of exactly what the study is investigating. As the researchers conducting the study are aware of whether participants are in an intervention or control group, there is still potential bias from the observer-expectancy effect.
A method of recruiting individuals to take part in a study by getting participants to recruit people that they know. This sampling method may mean the sample is not representative as the characteristics of a group who all know each other may differ from the wider population.
The potential effect of participants underreporting behaviours or opinions that they believe are socially unacceptable or undesirable. The social desirability bias can be lessened when responses are anonymised and participants are assured that they will not be identifiable from their responses.
A coincidental similar pattern between two unrelated factors.
Guidelines for individuals to use when making decisions on the quality, strength or applicability of particular research studies. There are over 20 different standards of evidence in use in the UK.
A group of methods used for analysing quantitative data. There are two main types of statistics: descriptive statistics and inferential statistics. Descriptive statistics are numerical summaries of a dataset. Inferential statistics can be used to see which educated guesses (hypotheses) made by a researcher at the start of the study were accurate or can estimate from a small dataset how much a certain phenomenon would be present in a larger population.
In inferential statistics, the likelihood that a study will be able to detect an effect. The more statistical power a study has, the less likely there is to be a type II error. Power is based on the level of statistical significance, the potential effect size and the number of participants/measurements in the study (the sample size). An underpowered study is one where the combination of these three factors means it is likely that the null hypothesis will not be rejected even when the alternative hypothesis is correct. Increasing the power of a study generally means one or more of: setting a less stringent statistical significance level, having a greater effect size or increasing the sample size. As the first two factors are usually not adjustable by the researcher, the most common way to increase the power of a study (and reduce the likelihood of a type II error) is to increase the sample size.
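To illustrate how the three factors interact, the following Python sketch approximates the power of a two-sided comparison of two group means using a normal approximation. This is a simplified textbook approximation chosen for illustration rather than the method any particular study would use. The function name `approximate_power` is hypothetical, and the effect size is assumed to be a standardised difference between group means.

```python
from statistics import NormalDist


def approximate_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power for comparing two group means (normal approximation).

    effect_size: assumed standardised difference between the group means.
    n_per_group: number of participants in each group.
    alpha: statistical significance level (two-sided).
    """
    z = NormalDist()  # standard normal distribution
    critical = z.inv_cdf(1 - alpha / 2)
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Probability that the test statistic falls beyond either critical value.
    return (1 - z.cdf(critical - shift)) + z.cdf(-critical - shift)


# Power rises as the sample size increases (effect size and alpha held fixed).
for n in (20, 64, 150):
    print(n, round(approximate_power(0.5, n), 2))
```

Under these assumptions, holding the effect size and significance level fixed, the estimated power grows with the number of participants per group, which is why increasing the sample size is the usual remedy for an underpowered study.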
An indicator of how likely the results of a study are to have happened by chance. The lower the likelihood that they happened by chance, the higher the likelihood that the null hypothesis can be disproved and an alternative hypothesis accepted. Results are statistically significant when they disprove the null hypothesis and are unlikely to have been caused by chance. Researchers set a level of probability (p-value) at which they consider a result statistically significant. For many areas of research, results are considered significant if the likelihood that they occurred by chance is less than 5% (which is written as p<0.05). Some research (such as clinical research) may set more stringent levels, with results only being considered significant if the likelihood that they occurred by chance is less than 1% (written as p<0.01).
A type of randomised controlled trial where participants in the control group eventually also receive the intervention. Can be used when there is concern over the ethics of only providing an intervention to one group in an experiment. They are considered a particularly robust study type as the randomisation reduces the likelihood of bias from extraneous variables. As experiments, they are able to prove causation.
A type of sampling that divides a population of interest into different sub-populations based on certain characteristics (such as age group, sex or income level). Participants in each of the sub-populations are then randomly chosen and invited to be part of the study until there are enough people recruited from each of the sub-populations. Because there is randomisation in this sampling method, it increases the likelihood that various population characteristics are reflected in the overall sample.
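A minimal Python sketch of this procedure is shown below. It assumes a fixed number of participants is wanted from each sub-population. The function name `stratified_sample`, the `age_group` field and the population itself are invented for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(population, stratum_of, n_per_stratum, seed=None):
    """Randomly choose up to n_per_stratum people from each sub-population."""
    rng = random.Random(seed)  # a seed is used only to make the example repeatable
    strata = defaultdict(list)
    for person in population:
        # Group people by the characteristic that defines the sub-populations.
        strata[stratum_of(person)].append(person)
    sample = []
    for name in sorted(strata):
        members = strata[name]
        sample.extend(rng.sample(members, min(n_per_stratum, len(members))))
    return sample


# Hypothetical population of 12 people in three age groups (invented for illustration only).
age_groups = ["18-34"] * 5 + ["35-64"] * 4 + ["65+"] * 3
population = [{"id": i, "age_group": g} for i, g in enumerate(age_groups)]

sample = stratified_sample(population, lambda p: p["age_group"], 2, seed=0)
for person in sample:
    print(person)
```

Each age group contributes the same number of randomly chosen people, so smaller sub-populations are not left out of the sample by chance.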
An interview data collection method where an interviewer asks a participant set questions. Questions are asked of each participant in the same order.
A part of a larger population, such as women in the UK or Spanish-speakers in the United States.
Publication of an article that is only available to individuals who have access to paid-for subscriptions. This affects who can read it, as subscriptions to journals are predominantly held by academic institutions (as opposed to members of the public, third sector workers or civil servants).
A measurement used to tell if an intervention worked that acts as a proxy for the actual desired impact. For example, measuring cholesterol might be a surrogate endpoint for a study interested in whether a new medicine can reduce death from heart and circulatory diseases. Although cholesterol may be related to death from heart and circulatory diseases, it is not the same as looking directly at the mortality rates. It is a surrogate endpoint. Indeed, there is a possibility that the medicine might be highly effective at reducing cholesterol but have no effect on future mortality, or even increase future mortality.
A research method that uses questionnaires to gather, aggregate and analyse data from a group. Examples of surveys include opinion polls and national censuses.
Systematic reviews are a way of finding and reporting as much existing evidence as possible in an area in a structured, replicable way. They provide readers with an overview of a topic area and a general understanding of what the quality and quantity of primary research is in that area. The areas they review tend to be relatively narrow.
Related to the concept of external validity, temporal generalisability is the extent to which findings of a study could be applied to a different time period.
A form of qualitative analysis that identifies, analyses and interprets patterns within data. One of the ways that thematic analysis does this is through coding the data. Coding is a multi-step process where data (such as written autobiographical statements) are reviewed by a researcher and concepts and phrases that appear key to the researcher are highlighted and moved into sub-categories and categories. After initial coding and categorising, researchers identify the themes and relationships between categories and begin to develop hypotheses relating to them. More data may be collected at this stage to test or give weight to hypotheses. Data collection and coding will usually stop when the researcher believes they have reached theoretical saturation (the point at which more coding/data collection is unlikely to further develop the hypotheses).
A study designed to reduce potential bias from observer-expectancy effects, response bias and demand characteristics by ensuring that which groups (intervention or control) participants are in is not known to the participants, the experimenters or the researchers carrying out the final analysis of data.
An error that can occur when using inferential statistics. A type I error is where the null hypothesis is incorrectly rejected and an alternative hypothesis accepted because the results are shown to be statistically significant even though the results were actually down to chance.
An error that can occur when using inferential statistics. A type II error is where the null hypothesis is incorrectly accepted and an alternative hypothesis rejected because the results are shown not to be statistically significant, even though the results were not due to chance.
An interview data collection method where there are no set questions and the conversation is led by the interviewee, although the interviewer usually has an idea of the sort of topics they want to cover.
A method of recruiting participants to take part in a study by advertising the study. The sample is then made up of people who volunteer to take part. Voluntary sampling is vulnerable to self-selection bias and participation bias as those who do (or do not) put themselves forward for a study may differ from the overall population, meaning that the sample is not representative of the wider population.
A quasi-experimental study where there is no separate control group and every participant is exposed to every intervention and control. Without counterbalancing, within-subject designs are at risk of bias from carryover effects, order effects and sequence effects.
POST would like to thank the following peer reviewers for kindly giving up their time during the preparation of this briefing:
Alliance for Useful Evidence (Jonathan Breckon and Kuranda Morgan)
Dr Deborah Bailey-Rodriguez, Middlesex University London
Dr Adam Cooper, UCL
Rob Davies, CLOSER
Neil Drake, National Institute for Health and Care Excellence
Dr Andi Fugard, Birkbeck
Professor David Gough, EPPI-Centre
Professor John MacInnes, University of Edinburgh
National Audit Office (Margaret Anderson, Meg Callanan and Ruth Kelly)
Professor Melanie Nind, National Centre for Research Methods
Dr Kathryn Oliver, London School of Hygiene and Tropical Medicine
Dr Martin Ralphs, Office for National Statistics
Dr J Simon Rofe, SOAS
Professor Jennifer Rogers, PHASTAR
Dr Alex Sutherland, Behavioural Insights Team
Dr Ben Taylor, Open Innovation Team