How do we assess whether someone is overweight or underweight? What
factors do we need to take into account, and how can we produce a
helpful single-number index? In this investigation, the body mass
index (BMI) is derived as a measure of obesity. Conditional
distributions and the concept of adjusting one variable for another are
explored, and correlation is introduced as a measure of the strength of
a linear relationship.
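A minimal sketch, assuming the conventional definition BMI = weight (kg) / height (m)², computes BMI and the Pearson correlation coefficient. Comparing the correlation of weight with height against that of BMI with height is one way to see how far dividing by height squared adjusts weight for height. The heights and weights are invented for illustration and are not the case-study data.

```python
import math

def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by the square of height (m)."""
    return weight_kg / height_m ** 2

def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up heights (m) and weights (kg), for illustration only.
heights = [1.60, 1.65, 1.70, 1.75, 1.80, 1.85]
weights = [55.0, 62.0, 66.0, 74.0, 79.0, 88.0]
bmis = [bmi(w, h) for w, h in zip(weights, heights)]
print("r(height, weight) =", round(pearson_r(heights, weights), 3))
print("r(height, BMI)    =", round(pearson_r(heights, bmis), 3))
```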
Skinfold thickness is widely used as a physiological measurement and has
a particular role in the construction of growth standards for
children. Sampling variability is explored in this context. The
interpretation and use of confidence intervals are investigated, along
with the relationship between sample size and precision. The idea of a
reference range is introduced.
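The contrast between a confidence interval for the mean and a reference range for individuals can be sketched as follows. The sketch assumes approximately Normal data (real skinfold data may be skewed) drawn from an invented Normal(10, 3) mm population, and uses the multiplier 1.96 throughout; a t multiplier would be more accurate for small samples. As n grows, the confidence interval narrows roughly in proportion to 1/√n, while the reference range stays roughly the same width.

```python
import math
import random
import statistics

def mean_ci(sample, z=1.96):
    """Approximate 95% confidence interval for the population mean (Normal approximation)."""
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return m - z * se, m + z * se

def reference_range(sample, z=1.96):
    """Approximate 95% reference range for individual values, assuming Normality."""
    m, s = statistics.mean(sample), statistics.stdev(sample)
    return m - z * s, m + z * s

# Simulated skinfold thicknesses (mm) from an assumed Normal(10, 3) population.
rng = random.Random(1)
for n in (10, 40, 160):
    sample = [rng.gauss(10, 3) for _ in range(n)]
    lo, hi = mean_ci(sample)
    rlo, rhi = reference_range(sample)
    print(f"n={n:4d}  CI width={hi - lo:5.2f}  reference-range width={rhi - rlo:5.2f}")
```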
How should we compare two drugs which aim to reduce blood pressure?
Students are invited to design a clinical trial to compare the
treatments; this part is a discussion exercise rather than a
computer-based one. On the computer, the relationship between sample
size and power is explored. Execution of the trial is then simulated
and the resulting data are made available to students for analysis.
The analysis compares two independent samples, and both continuous and
categorical responses are considered. Minitab macros provide the
analysis, allowing students to analyse and interpret their own data.
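The relationship between sample size and power for a continuous response can be sketched with the usual Normal approximation for comparing two independent means. The planning values (a 10 mmHg difference, a standard deviation of 20 mmHg) are hypothetical, and the approximation ignores the small chance of rejecting in the wrong direction. The case study itself uses Minitab, not this code.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard Normal distribution

def power_two_means(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power of a two-sided test comparing two independent group means."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    se = sigma * math.sqrt(2 / n_per_group)
    return Z.cdf(delta / se - z_crit)

def n_per_group(delta, sigma, power=0.8, alpha=0.05):
    """Smallest per-group sample size giving at least the requested approximate power."""
    z_a, z_b = Z.inv_cdf(1 - alpha / 2), Z.inv_cdf(power)
    return math.ceil(2 * (z_a + z_b) ** 2 * sigma ** 2 / delta ** 2)

# Hypothetical planning values: detect a 10 mmHg difference when the SD is 20 mmHg.
for n in (20, 40, 63, 100):
    print(n, round(power_two_means(10, 20, n), 3))
print("n per group for 80% power:", n_per_group(10, 20))
```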
When studying the behaviour of herring gulls it is useful to be able to
identify males and females. Gulls are among the species in which this
cannot be done by simple visual inspection. The problem is to identify
suitable measurements of size or weight which can be taken easily and
which can be used to discriminate between the sexes. The emphasis is on
graphical techniques such as histograms and scatterplots, with graphical
interaction to explore the effectiveness of different cut-points to
classify birds as male or female. Normal distributions are introduced
as models to provide firmer classification rules. Students are asked to
experiment with different measurements to produce a simple but
effective rule.
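A Normal-model classification rule can be sketched as follows. The sketch assumes, purely for illustration, Normal models with invented means and a common standard deviation for one body measurement in each sex. It searches for the cut-point that minimises the sum of the two misclassification rates, which minimises overall misclassification when males and females are equally common. With equal standard deviations this cut-point is the midpoint of the two means.

```python
from statistics import NormalDist

# Assumed Normal models for one body measurement (mm) in each sex; the means and
# SDs are invented for illustration, not estimates from the gull data.
MALES = NormalDist(mu=125.0, sigma=4.0)
FEMALES = NormalDist(mu=115.0, sigma=4.0)

def error_rates(cut):
    """Rule: classify a bird as male if its measurement exceeds cut.
    Returns (P(female classified as male), P(male classified as female))."""
    return 1 - FEMALES.cdf(cut), MALES.cdf(cut)

def best_cut(lo, hi, step=0.1):
    """Grid search for the cut-point minimising the sum of the two error rates."""
    n_steps = int(round((hi - lo) / step))
    cuts = [lo + i * step for i in range(n_steps + 1)]
    return min(cuts, key=lambda c: sum(error_rates(c)))

cut = best_cut(110.0, 130.0)
f_as_m, m_as_f = error_rates(cut)
print(f"cut-point {cut:.1f} mm: {f_as_m:.3f} of females, {m_as_f:.3f} of males misclassified")
```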
Immunoglobulin E (IgE) is one of the classes of antibody. High levels
are associated with diseases such as hay fever, asthma and eczema, and
a patient's IgE level might be one of the factors a doctor takes into
account in diagnosing some of these diseases. The problem is to
establish a 'normal range' of IgE values in the general population and
to use it to judge how extreme the levels are in 3 patients suspected
of having eczema. Statistical issues raised include: estimating
percentiles; transformations to Normality; the sampling variability of
estimated percentiles; and the validity of combining data from several
groups of 'normal' subjects in constructing a 'normal range'.
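The parametric route to a 'normal range' can be sketched as follows, assuming that a log transformation makes the values approximately Normal. The log is one common choice of transformation, not necessarily the one used in the case study. The sketch also estimates how extreme a single patient's value is under the same model. All values are invented, and the non-parametric (empirical percentile) approach is not shown.

```python
import math
import statistics
from statistics import NormalDist

def log_normal_range(values, coverage=0.95):
    """'Normal range' assuming log(value) is Normally distributed:
    mean +/- z * SD on the log scale, transformed back to the original scale."""
    logs = [math.log(v) for v in values]
    m, s = statistics.mean(logs), statistics.stdev(logs)
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    return math.exp(m - z * s), math.exp(m + z * s)

def percent_above(values, x):
    """Estimated percentage of the reference population above x, under the same log-Normal model."""
    logs = [math.log(v) for v in values]
    model = NormalDist(statistics.mean(logs), statistics.stdev(logs))
    return 100 * (1 - model.cdf(math.log(x)))

# Invented IgE values for 'normal' subjects and one hypothetical patient value.
normals = [12, 25, 40, 55, 70, 95, 130, 180, 260, 410]
print("95% normal range:", [round(v, 1) for v in log_normal_range(normals)])
print("% of normals above 600:", round(percent_above(normals, 600), 1))
```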
The topic is a recently developed test for the prenatal diagnosis of
Down's Syndrome. Students analyse data from mothers of unaffected
(control) children and mothers of children with Down's Syndrome, in an
attempt to identify a suitable critical value for the test. Sensitivity
and specificity are defined and calculated for a number of possible
critical values, and a ROC curve is then plotted. These calculations
can be carried out using both a parametric (Normal) and a
non-parametric approach. Students are also encouraged to consider how
the choice of critical value is affected by the incidence rate of
Down's Syndrome and by the different costs that might be associated
with Type I and Type II errors. "Grey areas", where a decision is
deferred pending further tests, may also be explored.
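The non-parametric calculations can be sketched as follows: sensitivity and specificity at a chosen critical value, an empirical ROC curve, and the trapezoidal area under it. The sketch assumes, for illustration only, that higher marker values indicate Down's Syndrome. The marker values are invented and in arbitrary units.

```python
def sens_spec(affected, controls, cut):
    """Sensitivity and specificity when the test is called positive if marker >= cut.
    Assumes (for illustration) that higher marker values indicate Down's Syndrome."""
    sensitivity = sum(x >= cut for x in affected) / len(affected)
    specificity = sum(x < cut for x in controls) / len(controls)
    return sensitivity, specificity

def empirical_roc(affected, controls):
    """Non-parametric ROC curve: (1 - specificity, sensitivity) with every observed
    value used as a critical value, from the strictest cut to the most lenient."""
    points = [(0.0, 0.0)]
    for cut in sorted(set(affected) | set(controls), reverse=True):
        se, sp = sens_spec(affected, controls, cut)
        points.append((1 - sp, se))
    return points

def area_under(points):
    """Trapezoidal area under the ROC curve."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))

# Invented marker values (arbitrary units) for the two groups.
downs = [1.6, 2.1, 2.4, 1.9, 3.0, 1.2]
unaffected = [0.8, 1.0, 1.1, 0.9, 1.3, 1.7, 1.0, 0.7]
for cut in (1.2, 1.5, 1.8):
    print(cut, sens_spec(downs, unaffected, cut))
print("AUC:", round(area_under(empirical_roc(downs, unaffected)), 3))
```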
Body fatness is increasingly recognised as important in physiological
research and clinical practice. The reference method for determining
body fatness (densitometry) involves submerging a subject in a tank of
water, measuring the volume of water displaced and then calculating the
subject's body density using Archimedes' Principle.
Simpler methods, developed for routine use and for use with elderly and
sick subjects, determine body fatness from measurements of skinfold
thickness or electrical impedance. The empirical models underpinning
these methods do not always give accurate results when used outwith the
population in which the model was originally developed. The accuracy of
six of these models is compared with the reference method in a group of
Glasgow school children. Paired t-tests and ANCOVA are used to
investigate bias. Limits of agreement between the reference method and
each alternative method are determined from prediction intervals.
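Comparing one alternative method with the reference method can be sketched as follows. The sketch computes the mean difference, a paired t statistic for bias, and limits of agreement taken as a 95% prediction interval for a new subject's difference, d̄ ± t·s_d·√(1 + 1/n). The t quantile is passed in (for example from tables) because the Python standard library has no t distribution. The body-fat percentages are invented, and the ANCOVA analysis is not shown.

```python
import math
import statistics

def agreement(reference, other, t_quantile):
    """Mean difference (other - reference), paired t statistic for bias, and limits of
    agreement as a prediction interval for one new subject's difference:
    d_bar +/- t * s_d * sqrt(1 + 1/n). t_quantile is the upper 2.5% point of t on n - 1 df."""
    d = [o - r for o, r in zip(other, reference)]
    n = len(d)
    d_bar, s_d = statistics.mean(d), statistics.stdev(d)
    t_bias = d_bar / (s_d / math.sqrt(n))
    half = t_quantile * s_d * math.sqrt(1 + 1 / n)
    return d_bar, t_bias, (d_bar - half, d_bar + half)

# Invented % body fat for 5 children: reference (densitometry) vs a skinfold equation.
ref = [18.2, 22.5, 15.9, 27.1, 20.4]
skin = [20.1, 23.0, 18.2, 27.9, 22.6]
print(agreement(ref, skin, t_quantile=2.776))  # 2.776 = upper 2.5% point of t on 4 df
```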
Bacterial vaccines have to pass a test for microbiological sterility.
These tests are carried out on a bulk tank of the vaccine and are then
repeated on the individual ampoules or vials, which are filled
aseptically before being released as a finished product. Government
regulations require that 20 filled ampoules are taken at random from each
batch. The contents of each ampoule are tested for bacterial
contamination. The batch of ampoules will pass the test if all 20
sampled ampoules are found to be free of living bacteria.
A fault has developed in the sterile air supply in the filling machine
and, as a result, each ampoule in a batch of 9000 has a 1% chance of
being contaminated with microbes independently of all other ampoules.
Is the standard 20-ampoule test likely to detect this contamination?
Statistical issues introduced include: simple and stratified random
sampling; the binomial distribution; and graphical exploration of the
relationship between the contamination rate, the sample size and the
probability of detecting contamination in a sample of ampoules.
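The central calculation can be sketched directly from the binomial model. Because each ampoule is contaminated independently with probability 0.01, the number of contaminated ampoules in a sample of n is Binomial(n, 0.01), and the batch size of 9000 does not enter. The test detects contamination if at least one sampled ampoule is contaminated, which has probability 1 − 0.99ⁿ. The 95% detection target below is an illustrative choice, not a regulatory figure.

```python
import math

def p_detect(rate, n):
    """P(at least one contaminated ampoule among n sampled) when the count is Binomial(n, rate)."""
    return 1 - (1 - rate) ** n

def n_for_detection(rate, target):
    """Smallest sample size n with p_detect(rate, n) >= target."""
    return math.ceil(math.log(1 - target) / math.log(1 - rate))

print(round(p_detect(0.01, 20), 3))    # 0.182: the standard 20-ampoule test
print(n_for_detection(0.01, 0.95))     # 299 ampoules for a 95% chance of detection
```

So the standard test would detect this fault in fewer than one batch in five.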
How effective are drugs in reducing the increased eye pressure which
occurs in some diseases? The build-up of pressure in the eye can be
monitored by dye techniques. Changes in pressure in different groups
of animals can then be assessed by examining profiles of dye
concentration over time. Analysis of variance, applied to models of
these profiles, can then be used to identify differences among the
drugs used.
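One way to apply analysis of variance to profile models can be sketched as follows. The sketch assumes, purely for illustration, that each animal's dye-concentration profile is summarised by the least-squares slope of log concentration against time (an exponential-decay model). A one-way ANOVA then compares these per-animal slopes across drug groups. The case study's own profile models may differ, and the slopes below are invented.

```python
import math

def log_slope(times, conc):
    """Least-squares slope of log(concentration) on time for one animal; under an
    assumed exponential-decay profile this is minus the decay rate."""
    ys = [math.log(c) for c in conc]
    tbar, ybar = sum(times) / len(times), sum(ys) / len(ys)
    sxy = sum((t - tbar) * (y - ybar) for t, y in zip(times, ys))
    sxx = sum((t - tbar) ** 2 for t in times)
    return sxy / sxx

def one_way_anova_f(groups):
    """One-way ANOVA: returns (F, between-groups df, within-groups df)."""
    values = [x for g in groups for x in g]
    grand = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    ssb = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_b, df_w = len(groups) - 1, len(values) - len(groups)
    return (ssb / df_b) / (ssw / df_w), df_b, df_w

# Invented per-animal slopes for three drug groups; compare F with F tables on (df_b, df_w).
slopes = [[-0.42, -0.38, -0.45, -0.40], [-0.55, -0.60, -0.52, -0.58], [-0.41, -0.47, -0.44, -0.39]]
print(one_way_anova_f(slopes))
```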
How can we trace the source of salmonella food poisoning in a university
hall of residence? Students are guided through aspects of variable
selection, hypothesis formulation and interpretation, and the
calculation of appropriate test statistics. Contingency tables and
chi-square tests are the principal analysis tools. The effects of
different choices of response variable are explored.
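The chi-square test of association can be sketched for any r × c contingency table, for example illness against whether a particular food was eaten. The counts below are invented. The usual caveat applies: the chi-square approximation is doubtful when expected counts are small.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of
    observed counts, given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Invented counts: rows = ate a suspect food (yes, no); columns = ill, not ill.
stat, df = chi_square([[28, 12], [9, 51]])
print(f"X^2 = {stat:.2f} on {df} df (5% critical value for 1 df is 3.84)")
```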
How can we tell whether a plant distribution pattern is random or shows
clustering or other features? Ecological sampling methods for spatial
patterns are considered. Sampling techniques that involve random
placement of a quadrat on the soil are explored, and the Poisson
probability model is used for the analysis.
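A standard Poisson check on quadrat counts can be sketched as follows. The sketch computes the variance-to-mean ratio, the index-of-dispersion statistic and the Poisson expected frequencies for comparison with the observed frequencies. A ratio near 1 is consistent with a random pattern, a ratio well above 1 suggests clustering, and a ratio well below 1 suggests regularity. The counts are invented.

```python
import math
import statistics

def dispersion(counts):
    """Variance-to-mean ratio of quadrat counts and the index-of-dispersion statistic
    (n - 1) * s^2 / mean, approximately chi-square on n - 1 df for a random (Poisson) pattern."""
    n = len(counts)
    m, v = statistics.mean(counts), statistics.variance(counts)
    return v / m, (n - 1) * v / m, n - 1

def poisson_expected(counts, k_max):
    """Expected numbers of quadrats containing 0..k_max plants under a Poisson model
    whose mean is the sample mean."""
    n, m = len(counts), statistics.mean(counts)
    return [n * math.exp(-m) * m ** k / math.factorial(k) for k in range(k_max + 1)]

# Invented counts of plants in 12 randomly placed quadrats.
counts = [0, 3, 1, 0, 5, 0, 2, 0, 6, 1, 0, 4]
print(dispersion(counts))
print([round(e, 2) for e in poisson_expected(counts, 6)])
```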
Assessment of pollution levels can be made using measurements taken from
the feathers of birds. Heavy metals enter the environment from a
variety of sources. Many of these elements accumulate throughout the
food chain and reach higher concentrations in some organisms than are
found in the environment. Birds can eliminate pollutants either through
excretion or by sequestering them in feathers or eggs. Because birds
replace feathers at least annually, there should be no age-related
difference in the concentrations of pollutants. A range of data
summary techniques is introduced to explore data based on this theme:
frequency tables; histograms, bar charts, stem-and-leaf diagrams and
box plots; measures of centre; and measures of spread.
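The numerical summaries behind a box plot can be sketched with Python's standard `statistics` module. Quartile conventions differ between textbooks and packages; this sketch simply uses the default method of `statistics.quantiles`, together with the common 1.5 × IQR rule for flagging possible outliers. The concentrations are invented and in arbitrary units.

```python
import statistics

def five_number_summary(data):
    """(minimum, lower quartile, median, upper quartile, maximum), with quartiles from
    statistics.quantiles using its default method."""
    q1, q2, q3 = statistics.quantiles(data, n=4)
    return min(data), q1, q2, q3, max(data)

def boxplot_outliers(data):
    """Values more than 1.5 x IQR beyond the quartiles (a common box-plot convention)."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    return [x for x in data if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

# Invented metal concentrations in feathers (arbitrary units).
conc = [1.2, 1.8, 2.1, 2.4, 2.6, 3.0, 3.3, 3.9, 4.4, 9.7]
print("five-number summary:", five_number_summary(conc))
print("mean:", round(statistics.mean(conc), 2), " SD:", round(statistics.stdev(conc), 2))
print("possible outliers:", boxplot_outliers(conc))
```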
The pollution theme is developed further by giving students the
opportunity to choose an appropriate test and to develop the ability
to interpret the result. This sequel provides the basis for introducing
ideas of hypothesis testing and decision making: deciding whether a
result is significant; interpreting the results of significance tests;
p-values; degrees of freedom; and significance levels.
In many villages in Kenya, wheat and maize are grown as subsistence
crops. The crops are attacked by the Nile rat, which eats seedlings
and wheat. Controlling these pests is expensive and involves
the use of poisons. Therefore, it is better to avoid control unless it
is really necessary. The best way to determine whether control is
necessary in a particular year is to estimate the number of animals in
the population early in the season. By the time the crops are being
damaged, it is too late. Two mark-release-recapture methods for
estimating the number of individuals in an animal population are
covered: the simple Lincoln Index and the Jolly-Seber method.
Statistical issues addressed include the use of simple estimation
techniques, sampling variability, and how to calculate confidence
intervals, improve accuracy and compare different estimates.
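The Lincoln Index estimate can be sketched as follows. If M animals are marked and released, and R of the C animals caught later carry marks, the estimate is N = M·C/R. The approximate interval uses one common large-sample variance formula, Var(N) ≈ M²·C·(C − R)/R³, with a Normal multiplier. Other variants (for example Chapman's small-sample correction) exist, and the Jolly-Seber method is not shown. The field numbers are invented.

```python
import math

def lincoln_index(marked, caught, recaptured, z=1.96):
    """Lincoln Index estimate N = M * C / R with an approximate Normal-theory interval
    based on Var(N) ~ M^2 * C * (C - R) / R^3."""
    n_hat = marked * caught / recaptured
    var = marked ** 2 * caught * (caught - recaptured) / recaptured ** 3
    half = z * math.sqrt(var)
    return n_hat, (n_hat - half, n_hat + half)

# Invented field numbers: 100 rats marked, 80 caught later, 20 of them already marked.
print(lincoln_index(100, 80, 20))
```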
The passage of genes between succeeding generations is explored, with
particular reference to equilibrium and genetic drift, as a theme for
introducing the basic ideas of probability and modelling: probability;
independence; simulation; the binomial distribution; and simple Markov
chains.
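Genetic drift can be simulated using the Wright-Fisher model. This is one standard textbook choice; the case study's own model may differ. In a diploid population of N individuals, each of the 2N gene copies in the next generation independently carries the focal allele with probability equal to its current frequency, so the allele count is binomial. The sequence of frequencies is a simple Markov chain that is eventually absorbed at loss (0) or fixation (1).

```python
import random

def wright_fisher(p0, pop_size, generations, seed=None):
    """Simulate allele frequency under drift: each generation the count of the focal
    allele among 2N gene copies is Binomial(2N, p), drawn copy by copy."""
    rng = random.Random(seed)
    copies = 2 * pop_size
    p = p0
    history = [p]
    for _ in range(generations):
        count = sum(rng.random() < p for _ in range(copies))  # independent draws
        p = count / copies
        history.append(p)
    return history

# Five replicate populations of 10 individuals, each starting at frequency 0.5.
for rep in range(5):
    print(f"replicate {rep}: final frequency {wright_fisher(0.5, 10, 100, seed=rep)[-1]:.2f}")
```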
Lichens are sensitive to pollutants in the environment. Pollutants are
absorbed across the surface of the lichen and are either bound in the
hyphal wall, where they are stored harmlessly, or they are taken up by
the metabolically active algae, which may die off or be damaged as a
result. The effects of zinc and potassium on the respiration rate of
the lichen Parmelia saxatilis are investigated using linear
regression, multiple regression and a model with an interaction term.
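A regression model with an interaction term can be sketched as respiration = b0 + b1·zinc + b2·potassium + b3·zinc·potassium. The coefficient b3 measures how the effect of zinc changes with the potassium level. To keep the sketch self-contained, it solves the least-squares normal equations in plain Python. In practice a statistics package (the source mentions Minitab) would be used. The data are invented and in arbitrary units.

```python
def fit_interaction_model(zn, k, resp):
    """Least-squares fit of resp = b0 + b1*zn + b2*k + b3*zn*k, solving the normal
    equations (X'X) b = X'y by Gaussian elimination with partial pivoting.
    Returns [b0, b1, b2, b3]."""
    X = [[1.0, a, b, a * b] for a, b in zip(zn, k)]
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(X, resp)) for i in range(p)]
    A = [xtx[i] + [xty[i]] for i in range(p)]  # augmented matrix [X'X | X'y]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= factor * A[col][c]
    coef = [0.0] * p
    for i in reversed(range(p)):  # back-substitution
        coef[i] = (A[i][p] - sum(A[i][j] * coef[j] for j in range(i + 1, p))) / A[i][i]
    return coef

# Invented zinc and potassium levels and respiration rates (arbitrary units).
zn = [0, 0, 0, 1, 1, 1, 2, 2, 2]
k = [0, 1, 2, 0, 1, 2, 0, 1, 2]
resp = [10.1, 9.8, 9.9, 8.7, 8.9, 9.2, 7.2, 7.9, 8.6]
print([round(b, 3) for b in fit_interaction_model(zn, k, resp)])
```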
Studies have suggested that exercise tolerance in patients with angina
is reduced after a meal. Data are presented from a study in which
angina patients performed two exercise tests, one in a fasted state and
one after eating a standardised meal. The results of such tests have
clinical implications because the efficacy of antianginal drugs is
assessed using exercise tests. If eating a meal before such a test
affects the anginal threshold, this must be taken into account. The
use of paired and independent-samples t-tests for comparing treatments
is explored. Issues such as the choice of test, the use of
randomisation and the effect of sample size, as well as the
formulation of hypotheses and the interpretation of results, are
addressed.
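The difference between the two choices of test can be sketched by applying both to the same invented data from five patients. The paired test works on within-patient differences. The pooled independent-samples test wrongly treats the two sets of results as coming from different patients. With these numbers, pairing removes the large between-patient variation, so the paired statistic is far larger in magnitude. Design issues such as randomising the order of the two tests are not captured here.

```python
import math
import statistics

def paired_t(fasted, fed):
    """Paired t statistic on within-patient differences (fed - fasted), df = n - 1."""
    d = [b - a for a, b in zip(fasted, fed)]
    n = len(d)
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(n)), n - 1

def pooled_t(fasted, fed):
    """Independent-samples t statistic with pooled variance, df = n1 + n2 - 2."""
    n1, n2 = len(fasted), len(fed)
    sp2 = ((n1 - 1) * statistics.variance(fasted)
           + (n2 - 1) * statistics.variance(fed)) / (n1 + n2 - 2)
    t = (statistics.mean(fed) - statistics.mean(fasted)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

# Invented exercise times (s) for five patients, fasted and after a standardised meal.
fasted = [300, 420, 510, 360, 450]
fed = [280, 395, 490, 330, 430]
print("paired:     ", paired_t(fasted, fed))
print("independent:", pooled_t(fasted, fed))
```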