# Level 7 statistical investigation notes

At level 7 the new concepts are formal sampling methods for taking a random sample, the beginnings of formal inference, and the reliability of an inference.

The standard deviation is a new measure of spread at this level. As standard deviation is used in solving problems modelled by the normal distribution, it is important that students develop an understanding of its interpretation as well as its calculation using technology.
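
To illustrate what the technology is calculating, the following minimal Python sketch finds the sample standard deviation of a small data set, first step by step and then with the standard library. The heights are invented for illustration only and are not taken from any real data set.

```python
import math
import statistics

# Hypothetical sample: heights (cm) of eight students, invented for illustration.
heights = [158, 162, 165, 165, 170, 171, 174, 179]

mean = sum(heights) / len(heights)

# Sample standard deviation: square root of the sum of squared deviations
# from the mean, divided by (n - 1).
squared_deviations = [(x - mean) ** 2 for x in heights]
sd_by_hand = math.sqrt(sum(squared_deviations) / (len(heights) - 1))

# The standard library's sample standard deviation gives the same value.
sd_library = statistics.stdev(heights)

print(f"mean = {mean:.1f} cm, standard deviation = {sd_library:.1f} cm")
```

For interpretation, the standard deviation can be read as a typical distance of the data values from the mean: the larger it is, the more spread out the values are.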

## Sampling notes

- Reasons for sampling include time and cost considerations, lack of access to the entire population, and the nature of the data collection or test. For example, a blood test does not require all the blood to be taken, and testing the breaking strain of fishing line destroys the line.
- Features of a good sampling technique include that the sample is sufficiently large, randomly chosen and representative of the population.
- Sample size affects the variability of an inference. If a sample is too small, it is more likely to be unusual and less likely to be representative. As a rule of thumb, the Central Limit Theorem for sample means (a level 8 objective) is taken to apply to samples of at least 30 items, so random samples of this size are acceptable. There is no statistical requirement that a sample be a fixed proportion of the population. For an inference of a population proportion, however, a much larger sample size is needed, at least 250. This size comes from margin of error considerations (a level 8 objective) but at level 7 an intuitive understanding is sufficient.
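- Sampling techniques include simple random, systematic, stratified, cluster and quota sampling. The first four involve random selection; quota sampling does not, so it is more open to selection bias.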
- It is important to identify the positive features of each method and to be able to carry out each method correctly so that the sample is as representative as possible. Students must be able to provide evidence that they have carried out their chosen sampling methods correctly. A randomly chosen sample of sufficient size is likely to be representative of the population, although any individual random sample may still be unusual.
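
Three of the sampling methods listed above can be sketched in a few lines of Python. This is a minimal sketch only: the sampling frame of 600 students, the `year` field used as the stratifying variable and the sample size of 30 are all assumptions made for illustration.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical sampling frame: 600 students, each with an ID and a year level (9 to 13).
population = [{"id": i, "year": 9 + i % 5} for i in range(600)]
sample_size = 30

# Simple random sample: every student has the same chance of selection.
simple = random.sample(population, sample_size)

# Systematic sample: random starting point, then every k-th student on the list.
k = len(population) // sample_size
start = random.randrange(k)
systematic = population[start::k][:sample_size]

# Stratified sample: a random sample from each year level, in proportion
# to that year level's share of the population.
stratified = []
for year in sorted({s["year"] for s in population}):
    stratum = [s for s in population if s["year"] == year]
    n_from_stratum = round(sample_size * len(stratum) / len(population))
    stratified.extend(random.sample(stratum, n_from_stratum))

print(len(simple), len(systematic), len(stratified))
```

Recording the seed, the starting point and the interval `k` is one way students could provide evidence that the method was carried out correctly. With strata of unequal size, the rounded stratum counts may not add exactly to the intended sample size and would need adjusting by hand.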

## Exploratory data analysis notes

- Exploratory data analysis starts with multivariate data. Investigative questions that can be asked of the data should be posed, such as:
  - wondering whether there is a connection between two variables,
  - wondering whether other variables should be taken into account when possible patterns are observed,
  - exploring multiple representations of the data in order to unlock the stories in the sample data.

- Technology such as a graphics calculator can draw a modified box plot, which shows whether extreme data values are outliers. Outliers are not simply the greatest or least data values. Outliers are more than 1.5 times the interquartile range above the upper quartile or below the lower quartile.
- If the sample box plot is approximately symmetrical and has no outliers it can be assumed the population has a similar distribution.
- If the sample data is skewed, then the median will be more reliable than the mean as an estimate of the population central value. However, if the distribution of the sample data is skewed this does not imply that the population is skewed. The skewness may be an artefact of sampling variability.
- A statistical estimate is not a guess but an inference or prediction of the true population parameter based on sample statistics. The sample median is used to infer (used as a point estimate of) the population median. Similarly, the sample mean, quartiles and standard deviation can be used as estimates of the corresponding population parameters. A sample proportion can be used to estimate a population proportion, for example, the fraction or percentage of students who travel more than 30 minutes to and from school each day.
- Evaluation of sampling and data collection methods must be based on identifying features of good sample design or good experimental design. Appropriate considerations are those that would make the inference more reliable or less variable, such as:
  - further (described) strata,
  - repeated sampling and averaging statistics,
  - context factors,
  - relative size of the mean and standard deviation, that is, if the standard deviation is small in relation to the mean, then the population is likely to be closely spread about the population mean.
- If the sample contains at least 30 items, it may be trivial at Level 7 to suggest a larger sample would improve the inference of a measurement.
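
The outlier check made by a modified box plot can be sketched as follows, using the usual rule of 1.5 times the interquartile range beyond the quartiles. The travel times are invented for illustration, and quartile conventions differ slightly between tools, so a graphics calculator may report slightly different quartiles from `statistics.quantiles`.

```python
import statistics

# Hypothetical sample: travel times to school in minutes, invented for illustration.
times = [5, 8, 10, 12, 12, 15, 15, 18, 20, 22, 25, 60]

# Quartiles of the sample (Python's default "exclusive" method).
lower_quartile, median, upper_quartile = statistics.quantiles(times, n=4)
iqr = upper_quartile - lower_quartile

# Fences: values beyond these are marked as outliers on a modified box plot.
lower_fence = lower_quartile - 1.5 * iqr
upper_fence = upper_quartile + 1.5 * iqr

outliers = [t for t in times if t < lower_fence or t > upper_fence]
print(f"LQ = {lower_quartile}, UQ = {upper_quartile}, IQR = {iqr}, outliers = {outliers}")
```

In this made-up sample only the value 60 lies beyond a fence. The least value, 5, is not an outlier, which illustrates that outliers are not simply the greatest or least data values.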

## Bivariate data notes

- Informal predictions (interpolations and extrapolations) can be made from bivariate data. If a scatter plot of the data shows an approximately linear relationship, then a line of best fit can be found using technology and its equation used to make predictions within a range that depends on the scatter. Similarly, a curve can be fitted and used if appropriate. Sensible context considerations are required for extrapolations.
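
As a minimal sketch of what the technology does, the following Python code fits a least-squares line to a small bivariate sample and labels each prediction as an interpolation or an extrapolation. The data (hours of study and test scores) and the helper names `predict` and `is_interpolation` are assumptions made for illustration; least squares is one common way of finding a line of best fit.

```python
# Hypothetical bivariate sample, invented for illustration:
# hours of study per week (x) and test score out of 100 (y).
hours = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [52, 55, 61, 60, 68, 70, 75, 79]

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(scores) / n

# Least-squares slope and intercept of the line y = intercept + slope * x.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, scores))
sxx = sum((x - mean_x) ** 2 for x in hours)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

def predict(x):
    """Predicted score from the fitted line."""
    return intercept + slope * x

def is_interpolation(x):
    """True when x lies inside the range of the observed x-values."""
    return min(hours) <= x <= max(hours)

for x in (4.5, 12):
    kind = "interpolation" if is_interpolation(x) else "extrapolation"
    print(f"x = {x}: predicted y = {predict(x):.1f} ({kind})")
```

The prediction for 12 hours lies well outside the observed range, so context matters: a score cannot exceed 100, and there is no evidence that the linear pattern continues that far.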

## Data notes

- Data collection considerations include:
  - use of equipment,
  - recording of data,
  - consistency between measurers,
  - questionnaire design,
  - independence and training of interviewers,
  - opportunity for behavioural bias,
  - refusal to participate.

- Using a large existing data set, such as Census at School, to teach all the objectives will be successful if context, purpose, sampling, data collection and reasons for choice of measures are available. This information will allow:
  - sampling to be evaluated, as the population can be identified;
  - data collection methods to be evaluated, for example, what is the effect of self-measuring by students?
  - measures selected to be evaluated, for example, what is the evidence that the reaction time question tested what it aimed for and not familiarity with clicking a mouse?
  - relevant contextual knowledge, exploratory data analysis and statistical inference to be applied.

Important teaching ideas (working towards):

Students need to

- Use correct vocabulary of estimate and parameter.
- Develop an understanding that confidence in the estimate will vary depending on factors such as sample size, sampling method, the nature of the underlying population and sources of bias.
- Experience evidence for the central limit theorem by simulating samples and comparing the distribution of sample medians or means for samples of different sizes.
- Investigate whether an effect has occurred and develop informal criteria to judge the strength of the effect. For example, test whether the ability to recall objects on a tray after viewing them for 10 seconds is improved after attending a memory-enhancing course.
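
The simulation idea above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the skewed population is generated artificially (exponentially distributed values with a mean of about 20), and the sample sizes 5, 30 and 100 and the 1000 repeats are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(7)  # fixed seed so the simulation is repeatable

# Hypothetical skewed population of 10,000 values (for example, travel times in minutes).
population = [random.expovariate(1 / 20) for _ in range(10_000)]

def sample_means(sample_size, repeats=1000):
    """Means of `repeats` simple random samples of the given size."""
    return [
        statistics.mean(random.sample(population, sample_size))
        for _ in range(repeats)
    ]

for size in (5, 30, 100):
    means = sample_means(size)
    print(
        f"n = {size:3d}: centre of sample means = {statistics.mean(means):5.1f}, "
        f"spread (standard deviation) = {statistics.stdev(means):4.1f}"
    )
```

Students could plot the three sets of sample means and compare them: the centres should all be close to the population mean, while the spread should shrink and the shape should become more symmetrical as the sample size increases. Replacing `statistics.mean` with `statistics.median` inside `sample_means` gives the corresponding comparison for sample medians.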

Last updated March 1, 2012
