Statistics

Introduction to Statistics

Statistics, or statistical procedures, refer to a set of mathematical procedures to organise, summarise and interpret data[1]. Being able to analyse and interpret statistical data is a key skill for researchers and professionals from many different disciplines. You will need to make decisions based on statistical data, to interpret statistical data in research articles, and to conduct your own research and interpret its data

Explore basic statistical concepts that underlie statistical analyses, such as types of measurement of data, and useful definitions related to samples and population.

Two families of statistics will also be discussed: Descriptive and Inferential

You will learn about the graphic representations of data: histograms, pie charts and line charts

The study of statistics involves the following skills and knowledge:

statistical concepts and necessary maths formulae
different types of statistical analysis and their purpose
ability to interpret and read results from statistical analyses
making decisions about statistical results

[1] The word ‘data’ is from Latin and is plural, ‘datum’ is the singular. However, ‘data’ as a singular noun is widely accepted in some disciplines these days.

Activity 1: Interpreting and analysing statistical data

Importance of interpreting and analysing statistical data

Basic concepts in statistics

Descriptive and inferential statistics

There are two types of families of statistics:

Descriptive statistics

Descriptive statistics are used to summarise and describe data (information that has been collected).

Data are usually organised and presented in tables or graphs that summarise information, such as histograms, pie charts, bars or scatterplots.
Descriptive statistics are only descriptive and, thus, do not involve generalising beyond the data that has been collected.

Examples of descriptive statistics are the average age of university students, or the number of female and male students undertaking a Health Sciences degree.

Inferential statistics:

With Inferential statistics, data are usually collected from a sample; that is, a smaller representative subset of the larger population we wish to investigate.

Inferential statistics use the theory of probability to investigate whether patterns found in the sample of study can be generalised to the wider population where the sample comes from.
Inferential statistics aim to test hypotheses and explore relationships between variables, and can be used to make predictions about the population.
Inferential statistics are used to draw conclusions and inferences; that is, to make valid generalisations from samples.

Examples of inferential statistics are statistical techniques to explore the relationship between variables (e.g. correlation coefficients). These techniques show us whether two variables are related: for instance, whether there is a relationship between stress levels and academic results.

purposes of descriptive and inferential statistics

Activity 2. Descriptive and inferential statistics

What type of statistic is needed?

Useful statistical concepts

Population and sample: These are sets of individuals in a study, and are representative of the entire group that you wish to study (e.g. all Australian university students). Because accessing the whole population is rarely possible, data is usually collected from a sample or set of the relevant population.
Sampling error: Even though samples should be representative of the population, in some cases sampling errors may have a negative impact. For example, a sample may over represent certain individuals by having a particular characteristic (e.g. more students who are highly stressed might choose to volunteer for a study on stress at university, than students who are not stressed). Sample size, or the number of participants/observations, will also affect the results of inferential statistics.
Data: Measurements or observations collected from a sample or population.
Statistic: Characteristics of the sample under study (e.g. mean or average study time of La Trobe university students).
Parameter: Characteristics of a population (e.g. mean or average study time of university students).
Variable: A property or characteristic of a person, event or object that can take on different values or amounts (e.g. study time). Variables can be grouped into two categories: independent and dependent variables.
Independent variable: A variable that is manipulated in order to investigate its effect on another variable.
Dependent variable: A variable that is affected by the independent variable.
Controlled variables: These are the variables that you keep constant (controlled) during the experiment or research study.

For example, if we want to determine what type of antidepressant (drug A, drug B or drug C) is most effective in the treatment of depression:

The independent variable is the type of antidepressant, as the researcher is interested in analysing what treatment is most effective.
The dependent variable is the level of depression; the researcher will measure whether levels of depression change as a result of administering drug A, B or C ; that is, the research study will determine which drug is most effective.

Another example: If we want to investigate whether students who listen to music when they study obtain higher grades than those who do not:

Independent variable: listening to music (yes/no)
Dependent variable: academic grades which may be influenced by the independent variable (listening to music or not ).

Activity 3. Dependent and independent variables

Dependent and independent variables

Levels of measurement

Understanding your data is important as it determines the level of measurement. There are four levels of measurement of data: nominal, ordinal, interval, and ratio.

Nominal variables express a qualitative (either/or) attribute and do not imply a numerical ordering. These variables are known as categorical or nominal. Examples of this type of variable are gender (male/female), or marital status (married, single, divorced,…).
Ordinal variables express categories with a natural order; that is, values that can be ranked. However, the precise difference or distance between categories is unknown. Examples of ordinal variables are:

- Educational level, which can be ranked (e.g. from primary education to postgraduate research,

- The rate of agreement or disagreement with a statement or question.

Thus, individuals can be ranked according to the importance they give to religion, but the precise difference between two responses (e.g. very important and important) cannot be defined.

Interval variables express numerical data that can be measured along a continuum, and the distance between numerical values is equal. However, there is no absolute zero. For example:
- IQ (a measurement of intelligence) can be measured numerically along a scale, and the distance between an individual who scores 100 and an individual who scores 105 is the same as the distance between 105 and 110. There is no absolute zero as an individual cannot obtain a score of 0.

Ratio variables are similar to interval variables, but with an added characteristic: a zero measurement is an absolute zero. For instance:
- Income can be classified as a ratio variable because income can be distributed along a continuum fromlow to high. A zero score exists as a person’s income can be $0.00.

The diagram below shows the differences between each type of measurement.

Retrieved from http://www.slideshare.net/KarenHarker1/inferential-statistics-34291836

Activity 4. Identify level of measurement

Levels of measurement

Graphic representations

Statistical information is usually presented and summarised using graphics and tables. The type of graph used to summarise and present data will depend on the level of measurement of the data:

Nominal data (i.e., qualitative attributes) are often represented in a pie chart or bar chart. With this type of data we are interested in presenting the frequency of responses for each category.

Pie charts

In a pie chart, each category is represented by a slice of the pie. The area of the slice represents the percentage of responses in the category.

pie chart of mac and windows users

Fig 1. Frequencies of previous computer ownership by current Mac users (Retrieved from http://www.onlinestatbook.com/Online_Statistics_Education.pdf)

Pie charts are particularly helpful when displaying frequencies of a small number of categories. But they can be confusing if there are a large number of categories, or data from two different studies or experiments are presented.

Bar charts

Bar charts display the frequencies of different categories. Bar charts can also be used to compare the responses of two or more groups. In the example below, the bar chart compares the perceived stress of males and females by marital status (e.g. single, divorced, separated,…).

Fig 2. Example of bar chart. Adapted from Pallant, J. (2013). SPSS survival manual: A step by step guide to data analysis using IBM SPSS (5th ed.). Sydney, Melbourne, Auckland, London: Allen & Unwin

This bar chart allows us to compare perceived stress by gender and perceived stress. The X axis shows two groups: males and females; whilst the Y axis represents their scores on perceived stress. The higher the bar, the higher their perceived stress.

Moreover, each bar represents a group of males or females by marital status (the legend of the bar on the right shows what colour is used for each group). A quick look tells us that the separated women are the most stressed group, and widowed women perceive less stress than any other group.

However, it is important to note that we can’t conclude that widowed women in general are less stressed, or that separated women in general will be more stressed than any other group. This is an example of descriptive statistics: the graph summarises perceived stress by gender and marital status. To investigate whether there is a significant difference between these groups- and whether we can generalise these results to the wider population- we will need to employ inferential statistics.

Numerical data (ratio and interval) is usually represented by means of a histogram. A histogram is a frequency bar chart that shows the distribution of scores, and gives information about the shape and distribution of scores. Histograms are often used to assess the normality of distribution of scores, which will be discussed later.

Example of histogram of perceived stress by gender

Fig 3. Example of histograms. Adapted from Pallant, J. (2013). SPSS survival manual: A step by step guide to data analysis using IBM SPSS (5th ed.). Sydney, Melbourne, Auckland, London: Allen & Unwin

The x axis represents scores on perceived stress (e.g. a person can score 30, 35, 40,…). The y axis represents the frequency of each score; that is, how many times each score occurs. For instance, a score of 20 has a frequency of 10 for females; that is, 10 females scored 20 on perceived stress. Each histogram represents the frequency of stress scores for males and females and allows you to compare which scores are most frequent for each group.

Again, this only gives you descriptive information about your groups, but does not allow you to conclude that females are significantly more stressed than males, for instance.

Further resources

The Khan Academy has produced a series of videos that explain diverse statistical concepts.

Maths and Probability (Khan Academy)

The RMIT Learning Lab contains fact sheets on statistical concepts, including useful statistical definitions.

Statistics (RMIT Learning Lab)

DESCRIPTIVE STATISTICS	Question	INFERENTIAL STATISTICS
	We would like to know how many university students experience high stress levels.
	We would like to know whether female students experience higher stress levels than male students.
	We would like to know what study strategies are used by first year university students.
	We would like to know whether there is are a relationship between university students' study strategies and their academic results.

DEPENDENT VARIABLE	Question	INDEPENDENT VARIABLE
	Workshop attendance (Yes/No)
	Grades

DEPENDENT VARIABLE	Question	INDEPENDENT VARIABLE
	Hours of television viewing
	Violent behaviour

Variables	NOMINAL	ORDINAL	INTERVAL	RATIO
Gender
Exam grades

Variables	NOMINAL	ORDINAL	INTERVAL	RATIO
Year level (first, second, third)
Stress levels (measured from 0 to 10)

Search La Trobe

Statistics

Introduction to Statistics

Activity 1: Interpreting and analysing statistical data

Importance of interpreting and analysing statistical data

'More than 80% of dentists recommend Colgate'.

'More than 80% of dentists recommend Colgate'.

Basic concepts in statistics

Descriptive and inferential statistics

Activity 2. Descriptive and inferential statistics

What type of statistic is needed?

Useful statistical concepts

Activity 3. Dependent and independent variables

Dependent and independent variables

Levels of measurement

Activity 4. Identify level of measurement

Levels of measurement

Take a look at this completed example first:

Graphic representations

Further resources