By the end of this topic, you should be able to:
Subtopics
Objectives
Introduction to statistics
Statistical terms
Methods of data collection
Sampling Techniques
Statistics is the method of collecting, organizing, summarizing, presenting, analyzing and interpreting data (information) in a convenient and informative way to assist in making more effective decisions.
Statistics can be categorized as descriptive statistics and inferential/inductive statistics.
Descriptive statistics is designed to describe the data without going any further; that is, without attempting to infer or conclude anything that goes beyond the data themselves.
Inferential statistics is a method used to determine something about a population based on a sample.
Qualitative variable – a variable measured according to specific categories or characteristics. Example: gender (male, female), marital status (single, married), race (Malay, Indian, Chinese), grade (A, B, C).
Quantitative variable – a variable whose values are numbers (numerical values). Example: number of students, total income, distance traveled, test mark, etc.
A quantitative variable can be further classified as:
Example 1:
Consider a population of 120,000 students in Terengganu. It was found that the mean height of the students is 148 cm and the variance is 1.5 \(\text{cm}^2\). It was also found that the mean height of the 1,500 students in Dungun High School is 152 cm and the variance is 2 \(\text{cm}^2\).
Population – 120,000 students in Terengganu
Sample – 1,500 students in Dungun High School
Element – student
Variable – height of students
Parameter is a numerical measure used to describe a population.
Statistic is a numerical measure used to describe a sample.
Below are some examples of parameters and statistics based on the information in the previous example.
Parameter | Statistic |
---|---|
i. Population size, N = 120,000 | i. Sample size, n = 1,500 |
ii. Mean, \(\mu\) = 148 cm | ii. Mean, \(\bar{x}\) = 152 cm |
iii. Variance, \(\sigma^2\) = 1.5 \(\text{cm}^2\) | iii. Variance, \(s^2\) = 2 \(\text{cm}^2\) |
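The difference between a parameter and a statistic can be illustrated with a short Python sketch. The heights used here are hypothetical values chosen for illustration only; they are not the Terengganu data. The sketch assumes the usual convention that the population variance divides by N while the sample variance divides by n - 1.

```python
import statistics

# Hypothetical heights in cm (illustration only, not data from the notes).
population = [146, 148, 150, 152, 154]  # every element: treated as the population
sample = [148, 150, 152]                # a subset: treated as the sample

# Parameters describe the population.
mu = statistics.mean(population)             # population mean
sigma_sq = statistics.pvariance(population)  # population variance (divides by N)

# Statistics describe the sample.
x_bar = statistics.mean(sample)              # sample mean
s_sq = statistics.variance(sample)           # sample variance (divides by n - 1)

print(f"Parameters: N={len(population)}, mu={mu} cm, sigma^2={sigma_sq} cm^2")
print(f"Statistics: n={len(sample)}, x_bar={x_bar} cm, s^2={s_sq} cm^2")
```

Note that the variance is reported in squared units (\(\text{cm}^2\)), while the mean keeps the original unit (cm).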
Data is a collection of observations, measurements or information obtained from a study that is carried out.
- Primary data – data that is gathered and published for the first time by the researcher.
Advantages | Disadvantages |
---|---|
i. satisfy the research objectives | i. very costly |
ii. more up to date | ii. time consuming |
| | iii. sensitive data is difficult to collect directly from the respondent |
- Secondary data – data that is obtained from other sources (not the researcher), such as annual reports, journals, newspapers, the internet, etc.
Advantages | Disadvantages |
---|---|
i. easy to obtain | i. data might not satisfy the research objective |
ii. less costly | ii. there might be errors committed by the original researchers. |
iii. can be obtained in large quantities |
Measurement is simply the act of determining the quantity or value of a variable, or of assigning a number to a variable.
Level of measurement:
Exercise 1:
- Census - To study the whole population
Advantages | Disadvantages |
---|---|
i. Data collected from all elements. | i. Very costly and time consuming. |
ii. Data are more complete. | ii. Result would be out of date. |
- Sampling - To study the sample. Sampling is the process of selecting a sample from a population.
Advantages | Disadvantages |
---|---|
i. Less costly and requires less time. | i. Data not collected from all elements. |
ii. Result is more up to date. | ii. Data are less complete. |
Every element in the population has an equal chance of being selected for the sample.
A sampling frame must be available.
Two methods can be used to randomly select n elements, where n is the sample size:
Example 2:
A group of researchers planned to survey the family backgrounds of all students studying in UiTM. Due to time constraints, they decided to survey only 300 students. By using simple random sampling, discuss how they would select the sample.
Make a list of all the students studying in UiTM. Assign each student a unique number, from 1 up to the total number of students.
Using lucky draw:
Write each number on a small slip of paper and deposit all the slips in a box. The first selection is made by drawing a slip out of the box without looking at it. This process is repeated until a sample of 300 students has been chosen.
Using random numbers:
Refer to a table of random numbers. Starting at any point in the table, read across or down and note every number that falls between 1 and the number assigned to the last student. Use the numbers you have found to pull the corresponding names from the list until 300 students have been selected. These 300 students are your sample.
Alternatively, use random numbers generated by computer software to select the sample. The students corresponding to the numbers produced by the computer form the sample.
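The computer-based approach can be sketched in Python as follows. This is only an illustration: the population size of 5,000 is a hypothetical figure (the notes do not state how many students study in UiTM), and the function name `simple_random_sample` is made up for this sketch. `random.sample` draws without replacement, so no student number is picked twice.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Pick `sample_size` distinct student numbers from 1..population_size."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    # range(1, N + 1) plays the role of the numbered sampling frame.
    numbers = rng.sample(range(1, population_size + 1), sample_size)
    return sorted(numbers)

# Hypothetical frame of 5,000 numbered students; 300 are selected.
chosen = simple_random_sample(population_size=5000, sample_size=300, seed=42)
print(chosen[:10])  # the first few selected student numbers
```

The students whose numbers appear in `chosen` would form the sample.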
Advantages | Disadvantages |
---|---|
i. Every element has an equal chance of being selected. | i. Not suitable for a heterogeneous population. |
A sampling frame must be available. How is the sample collected?
Step
Example 3:
There are 200 elements in the population and a sample of 10 is desired. Discuss how the sample can be selected by using Systematic Random Sampling.
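One common way to carry out systematic random sampling for this example is to compute the sampling interval k = N / n = 200 / 10 = 20, pick a random starting element between 1 and k, and then take every k-th element after it. The Python sketch below follows that procedure; the function name `systematic_sample` is hypothetical, and the sketch assumes that N is an exact multiple of n.

```python
import random

def systematic_sample(population_size, sample_size, seed=None):
    """Return the positions selected by systematic random sampling."""
    rng = random.Random(seed)
    k = population_size // sample_size  # sampling interval, here 200 // 10 = 20
    start = rng.randint(1, k)           # random start between 1 and k inclusive
    # Take the start element, then every k-th element after it.
    return [start + i * k for i in range(sample_size)]

sample = systematic_sample(population_size=200, sample_size=10, seed=0)
print(sample)
```

For instance, if the random start were 7, the selected elements would be 7, 27, 47, and so on up to 187.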
Advantages | Disadvantages |
---|---|
i. Every element has an equal chance of being selected. | i. More difficult to use. |
| | ii. In order to get a good sample, the population must be properly arranged. |
Applicable for a population that is categorized, for example according to sex, race, etc.
Characteristics of the population:
Example 4:
A group of researchers planned to survey all workers working in an industrial area. The workers are divided by race as shown below. In order to save cost, the researchers decided to survey only 600 of the workers. Discuss how the sample can be selected by using stratified random sampling.
Race | Sub-population Size | Sample Size |
---|---|---|
Malay | 2800 | \(n_1 = \frac{2800}{4500} \times 600 \approx 373\) |
Chinese | 1250 | \(n_2 = \frac{1250}{4500} \times 600 \approx 167\) |
Indian | 450 | \(n_3 = \frac{450}{4500} \times 600 = 60\) |
Total | 4500 | 600 |
To sample each of the strata, use either simple random sampling or systematic random sampling.
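The proportional allocation in the table can be reproduced with a short Python sketch. The function name `proportional_allocation` is hypothetical. Each stratum receives a share of the sample equal to its share of the population, rounded to the nearest whole worker; in general, rounding can make the allocations add up to slightly more or less than the target, although here they total exactly 600.

```python
def proportional_allocation(strata_sizes, total_sample):
    """Allocate the sample to each stratum in proportion to its size."""
    population_size = sum(strata_sizes.values())
    return {
        name: round(size * total_sample / population_size)
        for name, size in strata_sizes.items()
    }

# Sub-population sizes taken from the table in Example 4.
strata = {"Malay": 2800, "Chinese": 1250, "Indian": 450}
allocation = proportional_allocation(strata, total_sample=600)
print(allocation)
print("Total sample:", sum(allocation.values()))
```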
Advantages | Disadvantages |
---|---|
i. Every element has an equal chance of being selected. | i. More difficult to use. |
ii. Suitable for a categorized population. |
Applicable for a population that is divided into homogeneous or similar clusters. Elements within each cluster are heterogeneous.
How to use cluster sampling?
Example 5:
A group of researchers planned to survey all families in Kuala Besut, living in 50 villages. In order to save cost, they decided to survey only 10 villages. Discuss how the sample can be selected by using cluster sampling.
Suppose the district of Kuala Besut is divided into 50 villages. Then, by using simple random sampling or systematic random sampling, select 10 villages from the 50 villages. Survey all of the elements (families) in the 10 selected villages.
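The two steps of this solution (randomly select clusters, then survey every element in them) can be sketched in Python. The village and family names below are hypothetical placeholders, each village is given three families purely for illustration, and the function name `cluster_sample` is made up for this sketch.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters, then keep every element inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # step 1: select clusters
    # Step 2: survey all elements of each selected cluster.
    surveyed = [element for name in chosen for element in clusters[name]]
    return chosen, surveyed

# Hypothetical sampling frame: 50 villages with 3 families each.
villages = {
    f"Village {i:02d}": [f"Family {i:02d}-{j}" for j in range(1, 4)]
    for i in range(1, 51)
}

chosen_villages, surveyed_families = cluster_sample(villages, n_clusters=10, seed=7)
print(len(chosen_villages), "villages,", len(surveyed_families), "families surveyed")
```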
Advantages | Disadvantages |
---|---|
i. Suitable for a population that is quite large. | i. Difficult to ensure that clusters are similar/homogeneous. |
ii. Suitable for clustered population. |
Suitable for a large population. Selection is done in stages.
Example 6:
A group of researchers planned to survey the background of all form 5 students in Terengganu. They decided to use sampling. Discuss.
Let us say:
They randomly selected
Not every element in the population has an equal chance of being selected for the sample.
Suitable when a sampling frame is not available, and commonly used in market research.
Example 7:
A group of researchers planned to survey 120 house-owners in Dungun who have been using Sharp washing machines for more than 2 years. Discuss.
Quota sampling – the number of respondents allocated to each group is based on the population statistics. The researcher has the flexibility to choose whomever he wants as long as the specifications set are met.
Convenience sampling – the researcher has the flexibility to select anybody he wants or meets until the required sample is obtained.
Judgmental sampling – the researcher selects a respondent whom he thinks has a certain characteristic that he wants to study.
Snowball sampling – an initial group of respondents is selected, usually at random. After being interviewed, these respondents are asked to identify others who belong to the target population of interest.
Generally, there are six methods of data collection that can be used to collect primary data. They are:
Face-to-face interview – the researcher talks to the respondent face to face.
Advantages | Disadvantages |
---|---|
i. Produce the highest response rate. | i. Very costly and time consuming. |
ii. Can explain any unclear questions | ii. Interviewers must be properly trained. |
Telephone interview – the interviewer asks questions from a prepared questionnaire over the telephone.
Advantages | Disadvantages |
---|---|
i. Less costly and requires less time. | i. Appropriate only for a population with telephones. |
ii. Can contact respondents several times. | ii. Respondents might refuse to cooperate. |
Postal questionnaire – a questionnaire is sent to each respondent with a stamped addressed envelope attached.
Advantages | Disadvantages |
---|---|
i. Less costly. | i. Response rate is very low. |
ii. Can be used for any population size. | ii. Uncertain when the questionnaires will be returned. |
Direct observation – respondents are observed without their knowledge.
Advantages | Disadvantages |
---|---|
i. Data obtained are very accurate. | i. Very costly and time consuming. |
ii. The information obtained is not affected by the respondents. | ii. The observer needs to be highly skilled and unbiased. |
Direct questionnaire – the researcher gives the questionnaire directly to the respondents and waits for them to complete it.
Electronic methods – e-mail, internet surveys and short messaging services (SMS).
Before you begin drafting your questionnaire, it is important to consider:
Some guidelines in designing a questionnaire
Exercise 2:
A researcher wishes to study the career aspirations of students from the Faculty of Accountancy, which consists of 50 classes. The researcher intends to choose only 10 classes and all the students from these 10 classes will be chosen for the study.
A group of researchers from Yayasan ABC conducted a survey on their sponsored students who are currently pursuing their studies at local universities. The purpose of the study is to determine the average monthly amount spent on academic books by these students. A list of the names and addresses of 350 students, arranged alphabetically by name, was obtained. A random sample of 70 students was selected from the list.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".