Variable and Constant
A characteristic that varies from one individual or object to another is called a variable. For example, age is a variable because it changes from person to person. A variable can assume a number of values.
A characteristic that does not vary from one individual or object to another is called a constant.
Primary and Secondary Data
Data published or used by the organization that originally collected them are called primary data. These are first-hand data. Population census reports are primary data.
Data published or used by an organization other than the one that originally collected them are called secondary data. The data in the Economic Survey of Pakistan are secondary data because they were originally collected by the Federal Bureau of Statistics.
Methods of collecting primary data
Following are some methods of collecting primary data:
i. Direct personal investigation
ii. Indirect investigation
iii. Through questionnaires
iv. Through enumerators
v. Through local sources
Major sources of secondary data
Following are the sources of secondary data:
i. Official, e.g. the publications of the Statistical Division, Ministry of Finance, Federal Bureau of Statistics, etc.
ii. Semi-official, e.g. State Bank of Pakistan, Chambers of Commerce & Industry.
iii. Publications of trade associations.
iv. Research organizations like universities and other institutions.
The process of arranging data into classes or categories according to some common characteristics present in the data is called classification.
The process of arranging data into rows and columns is called tabulation. A table is a systematic arrangement of data into vertical columns and horizontal rows.
How to prepare a good table
i. A table should be simple.
ii. Units of measurement and nature of the data should be specified.
iii. Zeros need not be entered.
iv. Percentages should be clearly indicated.
v. Important items should be placed in the most prominent positions of the table.
Measures of central tendency
A single value that is used to represent a distribution is called an average. Since such a value tends to lie in the centre of the distribution, it is called a measure of central tendency or a measure of location.
Qualities of a good average
i. It should be clearly defined by mathematical formula.
ii. It should be simple to understand.
iii. It should be easy to calculate.
iv. It should not be affected by extreme values.
A histogram is a bar graph of a frequency distribution in which the widths of the bars are proportional to the widths of the classes into which the variable has been divided and the heights of the bars are proportional to the class frequencies. In a histogram, class boundaries are taken on the X-axis and frequencies on the Y-axis.
A cumulative frequency polygon, popularly known as an ogive, is a graph obtained by plotting the cumulative frequencies of a distribution against the upper or lower class boundaries.
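As a rough illustration of how the points of an ogive are obtained, the following Python sketch accumulates class frequencies and pairs each running total with its upper class boundary. The function name `ogive_points` and the boundaries and frequencies are hypothetical, chosen only for this example.

```python
from itertools import accumulate


def ogive_points(upper_boundaries, frequencies):
    """Pair each upper class boundary with the cumulative frequency up to it."""
    if len(upper_boundaries) != len(frequencies):
        raise ValueError("one frequency is required per class boundary")
    cumulative = accumulate(frequencies)  # running totals: f1, f1+f2, ...
    return list(zip(upper_boundaries, cumulative))


# Hypothetical classes 20-30, 30-40, 40-50 with frequencies 4, 4, 2.
points = ogive_points([30, 40, 50], [4, 4, 2])
print(points)  # [(30, 4), (40, 8), (50, 10)]
```

Plotting these points and joining them with straight lines gives a "less than" ogive; the last cumulative frequency equals the total number of observations.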
Important types of averages
Following are some important averages:
i. Arithmetic mean
ii. Geometric mean
iii. Harmonic mean
The arithmetic mean of a set of n observations is defined as the sum of all the observations divided by the number of observations. For a sample x_1, x_2, ..., x_n, the arithmetic mean is (x_1 + x_2 + ... + x_n) / n.
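The definition above can be sketched in a few lines of Python. The function name `arithmetic_mean` and the data values are assumptions made for illustration only.

```python
def arithmetic_mean(observations):
    """Return the sum of the observations divided by their number."""
    if not observations:
        raise ValueError("at least one observation is required")
    return sum(observations) / len(observations)


# Illustrative values: the sum is 32 and n is 6, so the mean is 32 / 6.
values = [4, 2, 6, 6, 5, 9]
print(arithmetic_mean(values))
```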
The value which appears the maximum number of times in the data is called the mode. In some data sets the mode may not exist, and in others there may be more than one mode. A data set having two modes is called bimodal. In the data 4, 2, 6, 6, 5, 9, the mode is 6.
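A minimal Python sketch of finding the mode follows. The function name `modes` is hypothetical, and returning an empty list when every value occurs only once is a convention adopted here for illustration, not a rule stated in these notes.

```python
from collections import Counter


def modes(data):
    """Return every value that occurs the maximum number of times.

    An empty list is returned when the data are empty or when no value
    repeats (the "mode does not exist" case, by the convention used here).
    """
    if not data:
        return []
    counts = Counter(data)
    highest = max(counts.values())
    if highest == 1:
        return []
    return sorted(value for value, count in counts.items() if count == highest)


print(modes([4, 2, 6, 6, 5, 9]))  # [6]
print(modes([1, 1, 2, 2, 3]))     # [1, 2], a bimodal data set
print(modes([1, 2, 3]))           # [], no value repeats
```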
Disadvantages of mode
Following are the disadvantages of mode:
i. It is not rigorously defined.
ii. It is often indeterminate and indefinite.
iii. It is not based on all the observations.
iv. When a distribution consists of a small number of values, the mode may not exist.
The range, R, is defined as the difference between the largest and smallest observations in a set of data. Symbolically, the range is given by the relation R = x_m - x_0, where x_m stands for the largest observation and x_0 denotes the smallest one.
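The relation for the range translates directly into code. In this sketch the function name `data_range` and the sample values are assumptions for illustration.

```python
def data_range(observations):
    """Return the largest observation minus the smallest observation."""
    if not observations:
        raise ValueError("at least one observation is required")
    largest = max(observations)    # x_m in the notation of the notes
    smallest = min(observations)   # x_0 in the notation of the notes
    return largest - smallest


print(data_range([4, 2, 6, 6, 5, 9]))  # 9 - 2 = 7
```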
The central value when all the values are written in an array (that is, arranged in order of magnitude) is called the median. If there is an even number of values, the mean of the two central values is the median.
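The two cases in the definition (odd and even number of values) can be sketched as follows. The function name `median` and the example data are chosen for illustration only.

```python
def median(values):
    """Return the central value of the ordered data.

    For an even number of values, the mean of the two central values
    is returned.
    """
    if not values:
        raise ValueError("at least one value is required")
    ordered = sorted(values)  # write the values in an array
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([7, 1, 3]))           # ordered: 1, 3, 7 -> 3
print(median([4, 2, 6, 6, 5, 9]))  # ordered: 2, 4, 5, 6, 6, 9 -> 5.5
```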
Merits of median
Following are some merits of median:
i. It is easily calculated and understood.
ii. It is not affected by extreme values.
iii. In highly skewed distributions, the median is an appropriate average to use.
Objectives of classification
Following are a few objectives of classification:
i. To reduce the large sets of data to an easily understood summary.
ii. To display the points of similarity and dissimilarity.
iii. To save mental strain by eliminating unnecessary details.
iv. To reflect important aspects of the data.
v. To prepare the ground for comparison and inference.
Advantages of harmonic mean
Following are the advantages of harmonic mean:
i. It is rigorously defined by mathematical formula.
ii. It is based on all the observations of data.
iii. It is amenable to mathematical treatment.
iv. It is an appropriate type of average for rates and ratios.
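To illustrate why the harmonic mean suits rates, here is a small Python sketch using the standard formula (the number of observations divided by the sum of their reciprocals), which is not stated in these notes. The function name `harmonic_mean` and the speeds are hypothetical.

```python
def harmonic_mean(values):
    """Return n divided by the sum of the reciprocals of the values."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be non-empty and positive")
    return len(values) / sum(1 / v for v in values)


# Hypothetical trip: the same distance covered at 40 km/h and then at 60 km/h.
speeds = [40, 60]
print(harmonic_mean(speeds))           # about 48 km/h, the true average speed
print(sum(speeds) / len(speeds))       # 50.0, the arithmetic mean overstates it
```

For equal distances, total distance divided by total time works out to the harmonic mean of the speeds, which is why the arithmetic mean of 50 km/h is not the right summary here.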
The organization of a set of data in a table showing the distribution of the data into classes or groups, together with the number of observations in each class or group, is called a frequency distribution.
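A frequency distribution with equal-width classes might be built as in the sketch below. The function name `frequency_distribution`, its parameters, and the ages are assumptions for illustration; each class is taken to include its lower limit and exclude its upper limit, which is one common convention among several.

```python
def frequency_distribution(data, lower, width, classes):
    """Count the observations falling in each of `classes` equal-width classes.

    Classes start at `lower`; each includes its lower limit and excludes
    its upper limit. Observations outside all classes are ignored.
    """
    counts = [0] * classes
    for x in data:
        index = int((x - lower) // width)
        if 0 <= index < classes:
            counts[index] += 1
    return [
        (lower + i * width, lower + (i + 1) * width, counts[i])
        for i in range(classes)
    ]


# Hypothetical ages of ten people.
ages = [21, 25, 32, 38, 41, 22, 35, 29, 44, 31]
table = frequency_distribution(ages, lower=20, width=10, classes=3)
for low, high, frequency in table:
    print(f"{low}-{high}: {frequency}")
```

For these ages the classes 20-30, 30-40 and 40-50 receive 4, 4 and 2 observations, which add up to the ten values supplied.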
Population vs. Sample
Population: Statisticians define a population as the entire collection of items that is the focus of concern. A population can be of any size and while the items need not be uniform, the items must share at least one measurable feature. For example, all students studying stats form a population as they have at least one common characteristic, that is, they all study stats.
Sample: A sample is a part of a population. The critical difference between a population and a sample is that with a population, our interest is to identify its characteristics, whereas with a sample, our interest is to make inferences about the characteristics of the population from which the sample was drawn.
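The distinction can be illustrated with a short Python sketch that draws a simple random sample and uses its mean as an estimate of the population mean. The population of ages is invented for this example, and the fixed seed is only there to make the run repeatable.

```python
import random

# Hypothetical population: ages of 100 students (each age from 18 to 27, ten times).
population = list(range(18, 28)) * 10

rng = random.Random(0)               # fixed seed so the sketch is repeatable
sample = rng.sample(population, 10)  # simple random sample without replacement

population_mean = sum(population) / len(population)  # a characteristic of the population
sample_mean = sum(sample) / len(sample)              # used to infer the population mean

print(population_mean)  # 22.5
print(sample_mean)
```

The sample mean will usually differ somewhat from the population mean; inference is about judging how far such an estimate can be trusted.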
A pie chart is a circular chart (pie-shaped); it is split into segments to show percentages or the relative contributions of categories of data.
A pie chart gives an immediate visual idea of the relative sizes of the shares of a whole. It is a good method of representation when a part of a group is to be compared with the whole group.
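The size of each segment follows from the share of its category in the total. The sketch below computes the percentage and the angle (out of 360 degrees) for each category; the function name `pie_segments` and the budget figures are hypothetical.

```python
def pie_segments(categories):
    """Map each category to its (percentage of total, angle in degrees)."""
    total = sum(categories.values())
    if total <= 0:
        raise ValueError("the total of all categories must be positive")
    return {
        name: (100 * value / total, 360 * value / total)
        for name, value in categories.items()
    }


# Hypothetical monthly budget.
budget = {"food": 50, "rent": 30, "other": 20}
segments = pie_segments(budget)
for name, (percent, angle) in segments.items():
    print(f"{name}: {percent:.1f}% of the whole, {angle:.1f} degrees")
```

The percentages add up to 100 and the angles to 360, so the segments together fill the whole circle.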