Chapter 1 summary - Mathematical modelling in probability and statistics
This chapter does not have a summary.
Chapter 2 summary - Representation of sample data
For a stem and leaf diagram, each row represents a stem, indicated by the number to the left of the vertical line. The digits to the right of the vertical line are the leaves associated with that stem.
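As a minimal sketch (not from the text), the grouping into stems and leaves can be computed for two-digit observations, taking the tens digit as the stem and the units digit as the leaf:

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Group two-digit observations into stems (tens digit) and leaves (units digit)."""
    rows = defaultdict(list)
    for x in sorted(data):
        rows[x // 10].append(x % 10)
    return dict(sorted(rows.items()))

# Each printed row reads: stem | leaves
for stem, leaves in stem_and_leaf([12, 15, 21, 23, 23, 37]).items():
    print(stem, "|", " ".join(map(str, leaves)))
```

Sorting the data first ensures the leaves appear in increasing order within each row, as is conventional for an ordered stem and leaf diagram.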
A grouped frequency distribution consists of several classes and their associated class frequencies.
For the class 5-9, for example, the
lower class boundary is 4.5
lower class limit is 5
upper class limit is 9
upper class boundary is 9.5
class width is 9.5 - 4.5 = 5
class mid-point is {1\over 2}(4.5+9.5) = 7
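These quantities can be computed directly from the class limits. A short sketch (the function name and `gap` parameter are illustrative, not from the text), assuming integer-valued data so that boundaries lie half a unit beyond the limits:

```python
def class_summary(lower_limit, upper_limit, gap=1):
    """Boundaries, width and mid-point for a class of data recorded
    to the nearest `gap` (gap=1 for whole numbers)."""
    lcb = lower_limit - gap / 2   # lower class boundary
    ucb = upper_limit + gap / 2   # upper class boundary
    return {
        "lower boundary": lcb,
        "upper boundary": ucb,
        "width": ucb - lcb,
        "mid-point": (lcb + ucb) / 2,
    }

print(class_summary(5, 9))
# {'lower boundary': 4.5, 'upper boundary': 9.5, 'width': 5.0, 'mid-point': 7.0}
```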
When drawing a histogram, for each histogram bar the area is directly proportional to the frequency that it is representing:
Area \propto Frequency
and since the histogram consists of a series of bars, then for a histogram:
Total Area \propto Total Frequency
The height of a histogram bar is found by dividing the class frequency by the class width.
Histograms are plotted using class boundaries.
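To illustrate the area-frequency relationship (a sketch with made-up class data, not from the text): dividing each frequency by its class width gives the bar height (frequency density), and multiplying height by width then recovers the frequency exactly.

```python
# Hypothetical grouped data: (lower class boundary, upper class boundary, frequency)
classes = [(0.5, 4.5, 8), (4.5, 9.5, 15), (9.5, 19.5, 10)]

# Bar height = frequency density = class frequency / class width
densities = [f / (ucb - lcb) for lcb, ucb, f in classes]

# Bar area = height * width, which equals the class frequency
areas = [d * (ucb - lcb) for d, (lcb, ucb, f) in zip(densities, classes)]
print(densities)  # [2.0, 3.0, 1.0]
print(areas)      # [8.0, 15.0, 10.0]
```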
Chapter 3 summary - Methods for summarising sample data (location)
The mode is that value of a variate which occurs most frequently.
The median is the middle value of an ordered set of data.
The quartiles of an ordered set of data are such that 25% of the observations are less than or equal to the first quartile (Q_1), 50% are less than or equal to the second quartile (Q_2) and 75% are less than or equal to the third quartile (Q_3).
The mean of a set of observations is the sum of all the observations divided by the total number of observations, i.e.
\displaystyle \bar x = {\sum x\over n} \qquad \text{or} \qquad \bar x = {\sum fx\over \sum f}
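Both forms of the mean can be sketched in a few lines (the function names are illustrative); the frequency-table form agrees with the raw-data form when the table is expanded back into individual observations:

```python
def mean(xs):
    """Mean of raw observations: sum of all observations / number of observations."""
    return sum(xs) / len(xs)

def mean_from_freq(values, freqs):
    """Mean of a frequency distribution: sum(f*x) / sum(f)."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)

# The raw data [2, 2, 3, 5] tabulates as value 2 with frequency 2, etc.
print(mean([2, 2, 3, 5]))                    # 3.0
print(mean_from_freq([2, 3, 5], [2, 1, 1]))  # 3.0
```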
Chapter 4 summary - Methods for summarising data (dispersion)
The unbiased estimator of the population variance is defined as:
\displaystyle s^2 = {\sum(x-\bar x)^2\over n-1} \qquad \text{or} \qquad s^2 = {\sum f(x-\bar x)^2\over \sum f - 1}
The standard deviation is the positive square root of the variance.
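A minimal sketch of the unbiased estimator (note the divisor n - 1, not n) and of the standard deviation as its positive square root:

```python
import math

def sample_variance(xs):
    """Unbiased estimator of the population variance, with divisor n - 1."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

data = [2, 4, 4, 4, 6]
s2 = sample_variance(data)
s = math.sqrt(s2)  # standard deviation: the positive square root of the variance
print(s2)  # 2.0
```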
For
positive skew: \quad Q_2 - Q_1 < Q_3 - Q_2
negative skew: \quad Q_2 - Q_1 > Q_3 - Q_2
symmetry: \quad Q_2 - Q_1 = Q_3 - Q_2
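The quartile comparison can be sketched using one common quartile convention (Python's `statistics.quantiles` with its default exclusive method; other quartile conventions may classify borderline data sets differently):

```python
from statistics import quantiles

def skew_direction(data):
    """Classify skew by comparing the quartile gaps Q2-Q1 and Q3-Q2."""
    q1, q2, q3 = quantiles(data, n=4)
    if q2 - q1 < q3 - q2:
        return "positive skew"
    if q2 - q1 > q3 - q2:
        return "negative skew"
    return "symmetry"

print(skew_direction([1, 2, 3, 4, 5]))   # symmetry
print(skew_direction([1, 2, 2, 3, 10]))  # positive skew: long tail of high values
```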
Chapter 5 summary - Probability
{\rm P}(\text {event } A \text { or event } B) = {\rm P}(A \cup B)
{\rm P}(\text {both events } A \text { and } B) = {\rm P}(A \cap B)
{\rm P}(\text {not event } A) = {\rm P}(A')
Complementary probability
{\rm P}(A') = 1 - {\rm P}(A)
Addition rule
{\rm P}(A \cup B) = {\rm P}(A) + {\rm P}(B) - {\rm P}(A \cap B)
Conditional probability
\displaystyle {\rm P}(A \text { given } B) = {\rm P}(A|B) = {{\rm P}(A \cap B)\over {\rm P}(B)}
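The three rules can be verified on a small equally-likely sample space (a die roll, as a sketch not taken from the text); exact `Fraction` arithmetic avoids floating-point comparison issues:

```python
from fractions import Fraction

omega = set(range(1, 7))  # sample space of a fair die
A = {2, 4, 6}             # event: even score
B = {4, 5, 6}             # event: score greater than 3

def p(event):
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(event), len(omega))

# Complementary probability: P(A') = 1 - P(A)
assert p(omega - A) == 1 - p(A)
# Addition rule: P(A u B) = P(A) + P(B) - P(A n B)
assert p(A | B) == p(A) + p(B) - p(A & B)
# Conditional probability: P(A|B) = P(A n B) / P(B)
print(p(A & B) / p(B))  # 2/3
```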
Chapter 6 summary - Correlation
r = 1 \Rightarrow \text{perfect positive linear correlation}
r = -1 \Rightarrow \text{perfect negative linear correlation}
r = 0 \Rightarrow \text{no linear correlation}
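The extreme values of r can be checked with a direct computation of the product-moment correlation coefficient (a sketch using the summary-statistic form r = Sxy / sqrt(Sxx Syy)):

```python
import math

def pearson_r(xs, ys):
    """Product-moment correlation coefficient: r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

print(pearson_r([1, 2, 3], [2, 4, 6]))  # 1.0: perfect positive linear correlation
print(pearson_r([1, 2, 3], [6, 4, 2]))  # -1.0: perfect negative linear correlation
```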
Chapter 7 summary - Regression
Explanatory or independent variable:
a variable that is set independently of the other variable
Response or dependent variable:
the variable whose values are determined by the values of the explanatory or independent variable.
Linear regression model:
y_i = \alpha + \beta x_i + \varepsilon_i
The regression line of y on x is:
y = a + bx,
where \displaystyle \qquad b = {S_{xy}\over S_{xx}} \qquad \text{and} \qquad a = \bar y - b\bar x
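The two formulas translate directly into code. A sketch (the function name is illustrative) that computes Sxy and Sxx from their deviation forms and returns the coefficients of y = a + bx:

```python
def regression_line(xs, ys):
    """Least-squares regression line of y on x: returns (a, b) with y = a + b*x."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))  # S_xy
    sxx = sum((x - xbar) ** 2 for x in xs)                      # S_xx
    b = sxy / sxx          # gradient
    a = ybar - b * xbar    # intercept: line passes through (xbar, ybar)
    return a, b

a, b = regression_line([1, 2, 3, 4], [3, 5, 7, 9])
print(f"y = {a} + {b}x")  # y = 1.0 + 2.0x
```

Note that the line always passes through the mean point (x̄, ȳ), which is exactly what the formula a = ȳ - b x̄ encodes.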