Chapter 1 summary - Mathematical modelling in probability and statistics
This chapter does not have a summary.
Chapter 2 summary - Representation of sample data
For a stem and leaf diagram, each row represents a stem, indicated by the number to the left of the vertical line. The digits to the right of the vertical line are the leaves associated with that stem.
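As a minimal sketch (not from the text), the grouping into stems and leaves can be computed for two-digit observations, taking the tens digit as the stem and the units digit as the leaf:

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Group two-digit observations into stems (tens digit) and leaves (units digit)."""
    rows = defaultdict(list)
    for x in sorted(data):
        rows[x // 10].append(x % 10)
    return dict(sorted(rows.items()))

# Each printed row reads: stem | leaves
for stem, leaves in stem_and_leaf([12, 15, 21, 23, 23, 37]).items():
    print(stem, "|", " ".join(map(str, leaves)))
```

Sorting the data first ensures the leaves appear in increasing order within each row, as is conventional for an ordered stem and leaf diagram.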
A grouped frequency distribution consists of several classes and their associated class frequencies.
For the class 5-9, for example, the
lower class boundary is 4.5
lower class limit is 5
upper class limit is 9
upper class boundary is 9.5
class width is 9.5 - 4.5 = 5
class mid-point is {1\over 2}(4.5+9.5) = 7
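These quantities can be computed directly from the class limits. A short sketch (the function name and `gap` parameter are illustrative, not from the text), assuming integer-valued data so that boundaries lie half a unit beyond the limits:

```python
def class_summary(lower_limit, upper_limit, gap=1):
    """Boundaries, width and mid-point for a class of data recorded
    to the nearest `gap` (gap=1 for whole numbers)."""
    lcb = lower_limit - gap / 2   # lower class boundary
    ucb = upper_limit + gap / 2   # upper class boundary
    return {
        "lower boundary": lcb,
        "upper boundary": ucb,
        "width": ucb - lcb,
        "mid-point": (lcb + ucb) / 2,
    }

print(class_summary(5, 9))
# {'lower boundary': 4.5, 'upper boundary': 9.5, 'width': 5.0, 'mid-point': 7.0}
```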
When drawing a histogram, for each histogram bar the area is directly proportional to the frequency that it is representing:
Area \propto Frequency
and since the histogram consists of a series of bars, then for a histogram:
Total Area \propto Total Frequency
The height of a histogram bar is found by dividing the class frequency by the class width.
Histograms are plotted using class boundaries.
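To illustrate the area-frequency relationship (a sketch with made-up class data, not from the text): dividing each frequency by its class width gives the bar height (frequency density), and multiplying height by width then recovers the frequency exactly.

```python
# Hypothetical grouped data: (lower class boundary, upper class boundary, frequency)
classes = [(0.5, 4.5, 8), (4.5, 9.5, 15), (9.5, 19.5, 10)]

# Bar height = frequency density = class frequency / class width
densities = [f / (ucb - lcb) for lcb, ucb, f in classes]

# Bar area = height * width, which equals the class frequency
areas = [d * (ucb - lcb) for d, (lcb, ucb, f) in zip(densities, classes)]
print(densities)  # [2.0, 3.0, 1.0]
print(areas)      # [8.0, 15.0, 10.0]
```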
Chapter 3 summary - Methods for summarising sample data (location)
The mode is that value of a variate which occurs most frequently.
The median is the middle value of an ordered set of data.
The quartiles of an ordered set of data are such that 25% of the observations are less than or equal to the first quartile (Q_1), 50% are less than or equal to the second quartile (Q_2) and 75% are less than or equal to the third quartile (Q_3).
The mean of a set of observations is the sum of all the observations divided by the total number of observations, i.e.
\displaystyle \bar x = {\sum x\over n} \qquad \text{or} \qquad \bar x = {\sum fx\over \sum f}
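Both forms of the mean can be sketched in a few lines (the function names are illustrative); the frequency-table form agrees with the raw-data form when the table is expanded back into individual observations:

```python
def mean(xs):
    """Mean of raw observations: sum of all observations / number of observations."""
    return sum(xs) / len(xs)

def mean_from_freq(values, freqs):
    """Mean of a frequency distribution: sum(f*x) / sum(f)."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)

# The raw data [2, 2, 3, 5] tabulates as value 2 with frequency 2, etc.
print(mean([2, 2, 3, 5]))                    # 3.0
print(mean_from_freq([2, 3, 5], [2, 1, 1]))  # 3.0
```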
Chapter 4 summary - Methods for summarising data (dispersion)
The unbiased estimator of the population variance is defined as:
\displaystyle s^2 = {\sum(x-\bar x)^2\over n-1} \qquad \text{or} \qquad s^2 = {\sum f(x-\bar x)^2\over \sum f - 1}
The standard deviation is the positive square root of the variance.
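A minimal sketch of the unbiased estimator (note the divisor n - 1, not n) and of the standard deviation as its positive square root:

```python
import math

def sample_variance(xs):
    """Unbiased estimator of the population variance, with divisor n - 1."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

data = [2, 4, 4, 4, 6]
s2 = sample_variance(data)
s = math.sqrt(s2)  # standard deviation: the positive square root of the variance
print(s2)  # 2.0
```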
For
positive skew: \quad Q_2 - Q_1 < Q_3 - Q_2
negative skew: \quad Q_2 - Q_1 > Q_3 - Q_2
symmetry: \quad Q_2 - Q_1 = Q_3 - Q_2
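The quartile comparison can be sketched using one common quartile convention (Python's `statistics.quantiles` with its default exclusive method; other quartile conventions may classify borderline data sets differently):

```python
from statistics import quantiles

def skew_direction(data):
    """Classify skew by comparing the quartile gaps Q2-Q1 and Q3-Q2."""
    q1, q2, q3 = quantiles(data, n=4)
    if q2 - q1 < q3 - q2:
        return "positive skew"
    if q2 - q1 > q3 - q2:
        return "negative skew"
    return "symmetry"

print(skew_direction([1, 2, 3, 4, 5]))   # symmetry
print(skew_direction([1, 2, 2, 3, 10]))  # positive skew: long tail of high values
```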
Chapter 5 summary - Probability
{\rm P}(\text {event } A \text { or event } B) = {\rm P}(A \cup B)
{\rm P}(\text {both events } A \text { and } B) = {\rm P}(A \cap B)
{\rm P}(\text {not event } A) = {\rm P}(A')
Complementary probability
{\rm P}(A') = 1 - {\rm P}(A)
Addition rule
{\rm P}(A \cup B) = {\rm P}(A) + {\rm P}(B) - {\rm P}(A \cap B)
Conditional probability
\displaystyle {\rm P}(A \text { given } B) = {\rm P}(A|B) = {{\rm P}(A \cap B)\over {\rm P}(B)}
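The three rules can be verified on a small equally-likely sample space (a die roll, as a sketch not taken from the text); exact `Fraction` arithmetic avoids floating-point comparison issues:

```python
from fractions import Fraction

omega = set(range(1, 7))  # sample space of a fair die
A = {2, 4, 6}             # event: even score
B = {4, 5, 6}             # event: score greater than 3

def p(event):
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(event), len(omega))

# Complementary probability: P(A') = 1 - P(A)
assert p(omega - A) == 1 - p(A)
# Addition rule: P(A u B) = P(A) + P(B) - P(A n B)
assert p(A | B) == p(A) + p(B) - p(A & B)
# Conditional probability: P(A|B) = P(A n B) / P(B)
print(p(A & B) / p(B))  # 2/3
```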
Chapter 6 summary - Correlation
r = 1 \Rightarrow \text{perfect positive linear correlation}
r = -1 \Rightarrow \text{perfect negative linear correlation}
r = 0 \Rightarrow \text{no linear correlation}
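The extreme values of r can be checked with a direct computation of the product-moment correlation coefficient (a sketch using the summary-statistic form r = Sxy / sqrt(Sxx Syy)):

```python
import math

def pearson_r(xs, ys):
    """Product-moment correlation coefficient: r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

print(pearson_r([1, 2, 3], [2, 4, 6]))  # 1.0: perfect positive linear correlation
print(pearson_r([1, 2, 3], [6, 4, 2]))  # -1.0: perfect negative linear correlation
```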
Chapter 7 summary - Regression
Explanatory or independent variable:
a variable that is set independently of the other variable
Response or dependent variable:
the variable whose values are determined by the values of the explanatory or independent variable.
Linear regression model:
y_i = \alpha + \beta x_i + \varepsilon_i
The regression line of y on x is:
y = a + bx,
where \displaystyle \qquad b = {S_{xy}\over S_{xx}} \qquad \text{and} \qquad a = \bar y - b\bar x
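The two formulas translate directly into code. A sketch (the function name is illustrative) that computes Sxy and Sxx from their deviation forms and returns the coefficients of y = a + bx:

```python
def regression_line(xs, ys):
    """Least-squares regression line of y on x: returns (a, b) with y = a + b*x."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))  # S_xy
    sxx = sum((x - xbar) ** 2 for x in xs)                      # S_xx
    b = sxy / sxx          # gradient
    a = ybar - b * xbar    # intercept: line passes through (xbar, ybar)
    return a, b

a, b = regression_line([1, 2, 3, 4], [3, 5, 7, 9])
print(f"y = {a} + {b}x")  # y = 1.0 + 2.0x
```

Note that the line always passes through the mean point (x̄, ȳ), which is exactly what the formula a = ȳ - b x̄ encodes.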