
3.1 Basic Statistics

Statistics is the science, pure and applied, of creating, developing, and applying techniques to evaluate the uncertainty of inductive inferences. It helps to answer questions about different hypotheses, it models the role of chance in experiments in a quantitative way, and it gives estimates together with their errors. The propagation of error in input values can also be determined with statistics. The history of statistics goes back to the experience of gambling (seventeenth century), which led to the concept of probability. Afterwards the concepts of the normal curve and the normal curve of error were introduced. Charles Darwin's (1809–1882) work was largely biostatistical in nature. Karl Pearson (1857–1936) founded the journal Biometrika and a school of statistics. Pearson was mainly concerned with large data sets, and his student W. S. Gosset (1876–1937), writing under the pseudonym Student, presented Student's t-test, a basic tool of statisticians and experimenters throughout the globe. Genichi Taguchi (1924–2012) promoted the use of experimental designs.

Observations in the form of numbers are essential for performing statistical analyses. In crop production, observations can include phenology, leaf area, crop biomass, and yield. These numbers constitute data, whose common characteristic is variability, or variation. Variables may be quantitative or qualitative, and observations on quantitative variables may be further classified as discrete or continuous. The probability of occurrence of a value of a trait such as blondeness is measured by a probability function or probability density function (PDF); the terms chance variable and random variable are used for variables possessing a PDF. A population comprises all possible values of a variable, while a part of a population is called a sample. The concept of randomness is used to obtain a sample that truly represents the population. Collected data can be characterized using tables, charts (pie charts, bars, etc.), and pictures (histograms). Data are then presented in frequency tables, and a measure of central tendency is used to locate the center, which in turn helps in measuring the spread of the observations. The mean or average (μ) is the most common measure of central tendency. For a die, μ can be calculated using the following equation

$$ \mu =\frac{1+2+3+4+5+6}{6}=3\frac{1}{2} $$
(3.1)

If a sample with four observations (3, 5, 7, 9) is taken from the population, then the sample mean \( \overline{Y} \) for these observations is

$$ \overline{Y}=\frac{3+5+7+9}{4}=6 $$
(3.2)

This can be further symbolized by

$$ \overline{Y}=\frac{Y_1+{Y}_2+{Y}_3+{Y}_4}{4} $$
(3.3)

where Y1 = value of the first observation, Y2 = value of the second observation, Y3 = value of the third observation, and Y4 = value of the fourth observation. For a sample of n observations, Yi is used to represent the ith observation, and \( \bar{Y} \) is given by

$$ \overline{Y}=\frac{Y_1+{Y}_2+{Y}_3+{Y}_4+\dots +{Y}_i+\dots +{Y}_n}{n} $$
(3.4)

This equation can be further shortened to

$$ \overline{Y}=\frac{\sum_{i=1}^n{Y}_i}{n} $$
(3.5)

The difference between an observation (Yi) and the sample mean (\( \overline{Y} \)) is called a sample deviation, \( \left({Y}_i-\overline{Y}\right) \), and the sum of the sample deviations is zero: \( \sum \left({Y}_i-\overline{Y}\right)=0 \).

When means are based on different numbers of observations, it is better to use weights that depend on the number of observations in each mean; the result is called the weighted mean, defined as follows:

$$ {\overline{Y}}_w=\frac{\sum {w}_i{Y}_i}{\sum {w}_i} $$
(3.6)
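A minimal numeric sketch of these formulas (the sample 3, 5, 7, 9 is the one used above; the weighted-mean inputs are hypothetical):

```python
# Sketch: sample mean, sample deviations, and weighted mean.
def mean(values):
    return sum(values) / len(values)

y = [3, 5, 7, 9]
y_bar = mean(y)                          # (3 + 5 + 7 + 9) / 4 = 6.0
deviations = [yi - y_bar for yi in y]
print(y_bar, sum(deviations))            # 6.0 0.0 -- deviations sum to zero

# Weighted mean of group means with unequal group sizes (hypothetical values).
group_means = [6.0, 10.0]
weights = [4, 8]                         # observations behind each mean
y_w = sum(w * m for w, m in zip(weights, group_means)) / sum(weights)
print(y_w)                               # (4*6 + 8*10) / 12 = 8.666...
```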

Another measure that supplements the mean is the median, the value for which 50% of the observations lie on each side. If the number of values is even, the median is the average of the two middle values; e.g., for 3, 6, 8, and 11 the median is (6 + 8)/2 = 7. If the data are not symmetrical, the mean and median can differ and the data may be skewed in one direction; in that case the arithmetic mean may not be a good criterion of the central value. The mode (the most frequent value) is another measure of central tendency. Central tendency summarizes the data but provides no information about variation. The variance, the mean of the squared deviations (Yi − μ)2, and its square root, the standard deviation, are used to measure variation or dispersion about the mean. Variance is represented by two symbols: (i) σ2 (sigma squared, for the population) and (ii) s2 (for the sample). The population variance is the sum of squared deviations divided by the total number of values, and it is given by the following equation if we intend to sample this population with replacement:

$$ {\sigma}^2=\frac{{\left({Y}_1-\mu \right)}^2+{\left({Y}_2-\mu \right)}^2+{\left({Y}_3-\mu \right)}^2+\dots +{\left({Y}_N-\mu \right)}^2}{N} $$
(3.7)
$$ =\frac{\sum_i{\left({Y}_i-\mu \right)}^2}{N} $$
(3.8)

However, when sampling is without replacement, the divisor is N − 1, and the variance is represented by the following equation:

$$ {S}^2=\frac{{\left({Y}_1-\mu \right)}^2+{\left({Y}_2-\mu \right)}^2+{\left({Y}_3-\mu \right)}^2+\dots +{\left({Y}_N-\mu \right)}^2}{N-1} $$
(3.9)
$$ =\frac{\sum_i{\left({Y}_i-\mu \right)}^2}{N-1} $$
(3.10)

The sample variance (mean square) can be computed using the following formulas:

$$ {s}^2=\frac{{\left({Y}_1-\overline{Y}\right)}^2+{\left({Y}_2-\overline{Y}\right)}^2+{\left({Y}_3-\overline{Y}\right)}^2+\dots +{\left({Y}_n-\overline{Y}\right)}^2}{n-1} $$
(3.11)
$$ {s}^2=\frac{\sum_i{\left({Y}_i-\overline{Y}\right)}^2}{n-1} $$
(3.12)
$$ \left(n-1\right){s}^2={\sum}_i{\left({Y}_i-\overline{Y}\right)}^2 $$
(3.13)

The quantity (n − 1)s2, i.e., \( {\sum}_i{\left({Y}_i-\overline{Y}\right)}^2 \), is called the sum of squares (SS). For example, for the numbers 3, 5, 7, and 9, the SS is

$$ {\left(3-6\right)}^2+{\left(5-6\right)}^2+{\left(7-6\right)}^2+{\left(9-6\right)}^2={\left(-3\right)}^2+{\left(-1\right)}^2+{(1)}^2+{(3)}^2=9+1+1+9=20 $$

The variance for this data set is 20/3 = 6.67, and the square root of the sample variance is called the standard deviation (s). For the above example, it is calculated as follows:

$$ s=\sqrt{\frac{20}{3}}=2.58 $$

Thus the numerator of Eq. (3.12) can be represented as follows:

$$ SS={\sum}_i{\left({Y}_i-\overline{Y}\right)}^2 $$
(3.14)

Eq. (3.14) can be further modified into a computing formula as follows:

$$ {\sum}_i{\left({Y}_i-\overline{Y}\right)}^2=\sum \limits_i{Y_i}^2-\frac{{\left({\sum}_i{Y}_i\right)}^2}{n} $$
(3.15)

The term \( {\left({\sum}_i{Y}_i\right)}^2/n \) is called the correction factor (CF), correction term, or adjustment for the mean. Eq. (3.15) can easily be validated using the data set in Table 3.1.

Table 3.1 Data set for the validation of sum of squares equation
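Eq. (3.15), and the variance and standard deviation derived from it, can also be checked numerically (a minimal sketch using the sample 3, 5, 7, and 9):

```python
# Sketch: sum of squares by definition and by the computing formula (Eq. 3.15),
# then the sample variance and standard deviation.
import math

y = [3, 5, 7, 9]
n = len(y)
y_bar = sum(y) / n

ss_def = sum((yi - y_bar) ** 2 for yi in y)           # definition: 20.0
ss_comp = sum(yi ** 2 for yi in y) - sum(y) ** 2 / n  # 164 - 24**2/4 = 20.0
s2 = ss_def / (n - 1)                                 # sample variance: 6.67
s = math.sqrt(s2)                                     # standard deviation: 2.58
print(ss_def, ss_comp, round(s2, 2), round(s, 2))
```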

Thus, \( SS={\sum}_i{\left({Y}_i-\overline{Y}\right)}^2=20 \), and by the computing formula, \( \sum \limits_i{Y_i}^2-{\left({\sum}_i{Y}_i\right)}^2/n=164-\frac{(24)^2}{4}=20 \) (Table 3.1). Another commonly used term is degrees of freedom (df), the number of values in the calculation that are free to vary, which here equals n − 1. The absolute mean deviation or average deviation is calculated as:

$$ \mathrm{Average}\ \mathrm{deviation}\ \mathrm{or}\ \mathrm{Absolute}\ \mathrm{mean}\ \mathrm{deviation}=\frac{\sum_i\left|{Y}_i-\overline{Y}\right|}{n} $$
(3.16)

The absolute mean deviation or average deviation for the values 3, 5, 7, and 9 is 2; the vertical bars indicate that all deviations are taken as positive. The variance \( \left({\sigma^2}_{\overline{Y}}\right) \) of the sample mean \( \overline{Y} \) can be calculated by the following equation:

$$ {\sigma^2}_{\overline{Y}}=\frac{\sigma^2}{n} $$
(3.17)

The corresponding standard deviation, \( {\sigma}_{\overline{Y}} \), for the population can be computed from the following expressions:

$$ {\sigma}_{\overline{Y}}=\sqrt{\frac{\sigma^2}{n}} $$
(3.18)
$$ {\sigma}_{\overline{Y}}=\frac{\sigma }{\sqrt{n}} $$
(3.19)

The standard deviation of the sample mean is called the standard error (SE). The variance of the sample mean \( \left({s^2}_{\overline{Y}}\right) \) and the SE can be calculated with the following equations:

$$ {s^2}_{\overline{Y}}=\frac{s^2}{n} $$
(3.20)
$$ {\mathrm{SE}}_{\overline{Y}}=\sqrt{\frac{s^2}{n}} $$
(3.21)
$$ {\mathrm{SE}}_{\overline{Y}}=\frac{s}{\sqrt{n}} $$
(3.22)

For the numbers 3, 5, 7, and 9 used above to calculate the standard deviation, the SE is:

$$ \mathrm{SE}=\sqrt{\frac{s^2}{n}}=\sqrt{\frac{6.67}{4}}=\sqrt{1.67}=1.29 $$

Variation can also be measured with the coefficient of variability (CV), also known as the relative standard deviation (RSD), a widely used indicator (see the example data in Table 3.2). It is a measure of relative variability: the ratio of the standard deviation (σ) to the mean (μ), usually multiplied by 100 and expressed as a percentage:

$$ \mathrm{coefficient}\ \mathrm{of}\ \mathrm{variation}\ \left(\mathrm{CV}\right)=\frac{\sigma }{\mu } $$
(3.23)
For the grain yield data of Table 3.2 (n = 5), the statistics are calculated as follows:

$$ \overline{Y}=\frac{\sum {Y}_i}{5}=\frac{7680}{5}=1536\;\mathrm{kg}\ {\mathrm{ha}}^{-1} $$
$$ {s}^2=\frac{\sum {Y_i}^2-{\left(\sum {Y}_i\right)}^2/5}{4}=\frac{\mathrm{12,045,400}-{(7680)}^2/5}{4}=\mathrm{62,230} $$
$$ s=\sqrt{62,230}=249.45\;\mathrm{kg}\ {\mathrm{ha}}^{-1} $$
$$ {s^2}_{\overline{Y}}=\frac{s^2}{5}=\frac{\mathrm{62,230}}{5}=\mathrm{12,446}\kern0.75em $$
$$ {\mathrm{SE}}_{\overline{Y}}=\sqrt{\frac{s^2}{5}}=\sqrt{\frac{\mathrm{62,230}}{5}}=\sqrt{\mathrm{12,446}}=111.56\;\mathrm{kg}\ {\mathrm{ha}}^{-1} $$
$$ \mathrm{CV}=\frac{249.45}{1536}\times 100=16\% $$
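A minimal sketch of the SE and CV calculations; the five yield values below are hypothetical stand-ins (they are not the Table 3.2 data), so the printed numbers differ from the example above:

```python
# Sketch: mean, standard deviation, standard error, and CV of a sample.
import math
import statistics

yields = [1250, 1400, 1520, 1680, 1800]    # kg/ha, hypothetical sample
n = len(yields)
y_bar = statistics.mean(yields)
s = statistics.stdev(yields)               # sample SD (divisor n - 1)
se = s / math.sqrt(n)                      # standard error of the mean
cv = s / y_bar * 100                       # CV as a percentage
print(f"mean={y_bar:.1f}, s={s:.2f}, SE={se:.2f}, CV={cv:.1f}%")
```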
Table 3.2 Example data set for the calculation of the above concepts

3.2 Statistical Models

A model is an abstract, quantitative representation of a system: a way of describing a real system with mathematical functions or diagrams. It can also represent simplifications of the processes of a biological system and can summarize the factors affecting the different processes in a system. Mathematical models use notation and expressions from mathematics to describe a process, while a statistical model is a mathematical model that allows for variability in the process. This variability may arise for a number of reasons, such as sampling, biological variation, inaccuracies in measurement, or influential variables omitted from the model. Thus, statistical models can quantify the uncertainty associated with a process. Statistical models fall in the category of empirical models, where the principle of correlation is used to build a simple equation describing the relationship with different explanatory variables. If the explanatory variables are numeric (quantitative), they are referred to as variates; if they are qualitative, they are considered factors, with the distinct groups as factor levels. For example, the qualitative trait height can be classified as short, medium, or tall. Linear models are the most widely used statistical models.

3.3 The Linear Additive Model

Natural phenomena in science, such as the earth's rotation, can be explained by models. The linear additive model (LAM) is a commonly used model that describes an observation as a mean plus an error. Its assumptions are that the population of Y is sampled at random and that the errors are random. The model can be used to make inferences about population means and variances. The simple LAM is represented by the following equation:

$$ {Y}_i=\mu +{\varepsilon}_i $$

where μ = mean and εi= sampling error.

The sampling errors have mean zero. Their effect on the sample mean can be examined by drawing a sample from the population at random and averaging:

$$ \overline{Y}=\frac{\sum_i{Y}_i}{n}=\frac{\sum_i\left(\mu +{\varepsilon}_i\right)}{n}=\mu +\frac{\sum_i{\varepsilon}_i}{n} $$

Under random sampling, the error of the sample mean is \( \overline{Y}-\mu =\frac{\left({\sum}_i{\varepsilon}_i\right)}{n} \), and it is expected to become smaller as the sample size increases, because positive and negative errors cancel. In general, the variance of the mean of a large sample is small. An individual error can be estimated as \( \left({Y}_i-\overline{Y}\right) \).
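A small simulation of the LAM illustrates this (μ and the error standard deviation are arbitrary choices for illustration):

```python
# Sketch: Y_i = mu + eps_i with random errors of mean zero; the error of the
# sample mean, sum(eps)/n, gets smaller as the sample size increases.
import random

random.seed(1)
mu, sigma = 50.0, 10.0
for n in (10, 100, 10_000):
    errors = [random.gauss(0, sigma) for _ in range(n)]
    y_bar = mu + sum(errors) / n          # sample mean = mu + average error
    print(n, round(y_bar - mu, 3))        # average error shrinks with n
```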

3.4 Probability

Probability is a numerical description of how likely an event is to occur or how likely it is that a proposition is true. Probability is a number between 0 and 1, where 0 indicates impossibility and 1 indicates certainty. The best example for understanding probability is flipping a coin: There are two possible outcomes—heads (H) or tails (T). What’s the probability of the coin landing on heads? We can find out using the equation

$$ \mathrm{probability}\ \mathrm{of}\ \mathrm{head}\ {P}_H=\frac{1}{2} $$

or

$$ \mathrm{Probability}\ \mathrm{of}\ \mathrm{an}\ \mathrm{event}=\frac{\mathrm{number}\ \mathrm{of}\ \mathrm{ways}\ \mathrm{it}\ \mathrm{can}\ \mathrm{happen}}{\mathrm{total}\ \mathrm{number}\ \mathrm{of}\ \mathrm{outcomes}} $$

Similarly, in the case of rolling a die, there are six possible outcomes (1, 2, 3, 4, 5, and 6), and the probability of rolling a one is:

$$ {P}_1=\frac{1}{6} $$

The probability of getting a 1 or a 6 is calculated as follows:

$$ {P}_{1\ \mathrm{or}\ 6}=\frac{2}{6}=\frac{1}{3} $$

The probability of rolling an even number (2, 4, or 6) is:

$$ {P}_{2,4\ \mathrm{or}\ 6}=\frac{3}{6}=\frac{1}{2} $$
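These event probabilities follow directly from counting outcomes, as in this minimal sketch:

```python
# Sketch: probability of an event = favourable outcomes / total outcomes.
outcomes = [1, 2, 3, 4, 5, 6]             # one roll of a fair die

def prob(event):
    return sum(1 for o in outcomes if event(o)) / len(outcomes)

print(prob(lambda o: o == 1))             # 1/6
print(prob(lambda o: o in (1, 6)))        # 2/6 = 1/3
print(prob(lambda o: o % 2 == 0))         # 3/6 = 1/2
```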

For many experiments there are only two possible outcomes: a tossed coin falls heads or tails, a student fails or passes, a plant is tall or short. Such outcomes are referred to as binomial, and the sample space consists of two points only. The sample space is made up of sample points (represented by E and, if the event does not occur, by −E, Ē, or Ɇ), as shown in Fig. 3.1. The probability associated with each value of the random variable is called the binomial probability function or binomial distribution. The formula giving the probability associated with each chance event, e.g., for a fair coin with Y = 0 for tails and Y = 1 for heads, is:

$$ {P}_{Y={Y}_i}=\frac{1}{2}\ {Y}_i=0\ \mathrm{and}\ 1 $$
Fig. 3.1

Illustration of sample space and sample point

For rolling a fair die, the probability distribution is:

$$ {P}_{Y={Y}_i}=\frac{1}{6}\ {Y}_i=1,2,3,4,5\ \mathrm{and}\ 6 $$

A table of ten thousand random digits is a very large sample from its population, and the probability distribution for the table is

$$ {P}_{Y={Y}_i}=\frac{1}{10}\ {Y}_i=0,1,2,3,4,5\dots 9 $$

If we consider only odd and even numbers, the ten thousand random digit table can be related to \( {P}_{Y={Y}_i}=\frac{1}{2} \), Yi = 0 and 1; it likewise relates to \( {P}_{Y={Y}_i}=\frac{1}{6} \), Yi = 1, 2, 3, 4, 5 and 6, and \( {P}_{Y={Y}_i}=\frac{1}{10} \), Yi = 0, 1, 2, 3, 4, 5…9, but a distribution with more than two outcomes is multinomial rather than binomial. The probabilities of the binomial distribution can be elaborated in a single statement by generating one equation. Consider an experiment that contains n independent trials. Let PE = P1 = p; then \( {P}_{\bar{E}}={P}_0=1-p \), since \( p=\frac{\mathrm{number}\ \mathrm{of}\ \mathrm{successes}}{\mathrm{total}\ \mathrm{number}\ \mathrm{of}\ \mathrm{events}\ \left(\mathrm{successes}+\mathrm{failures}\right)} \), the probability of an event Ei lies between 0 and 1 \( \left(0\le {P}_{E_i}\le 1\right) \), and the sum of the probabilities of events in a mutually exclusive set is 1 \( \left(\sum \limits_i{P}_{E_i}=1\right) \). Five tosses of a coin could result in (0, 0, 1, 1, 0), that is, two tails followed by two heads and a final tail. Since the trials are independent, the probability of this outcome is found by multiplying the probabilities at each stage, i.e., \( (1-p)(1-p)pp(1-p)={p}^2{\left(1-p\right)}^3 \). If p = 0.5, then \( {(0.5)}^5=0.03125 \), or about 3%. The random variable Y associates a unique value with each sample point; e.g., for the sample vector (0, 0, 1, 1, 0) we have Y = 2, and there are 10 possible sequences with Y = 2. Thus the probability that Y = 2 is \( 10{p}^2{\left(1-p\right)}^3 \). The number of such sequences can be calculated directly with the equation:

$$ \left(\begin{array}{c}n\\ {}Y\end{array}\right)=\frac{n!}{Y!\left(n-Y\right)!} $$

where n! = n factorial = n(n − 1)(n − 2)⋯(2)(1). Thus, for Y = 2, i.e., two 1s in n = 5 trials, the calculation is:

$$ \left(\begin{array}{c}5\\ {}2\end{array}\right)=\frac{5\cdot 4\cdot 3\cdot 2\cdot 1}{\left(2\cdot 1\right)\left(3\cdot 2\cdot 1\right)}=10 $$

Combining the formula that counts the sample points with the same Y and the formula that assigns a probability to each sample point gives the binomial probability distribution:

$$ P\left(Y={Y}_i|n\right)=\left(\begin{array}{c}n\\ {}{Y}_i\end{array}\right){p}^{Y_i}{\left(1-p\right)}^{n-{Y}_i} $$

where \( P\left(Y={Y}_i|n\right) \) is the probability that the random variable Y takes the particular value Yi in a random experiment with n trials. For the coin illustration above, this equation gives:

$$ P\left(Y=2|5\right)=\left(\begin{array}{c}5\\ {}2\end{array}\right){\left(\frac{1}{2}\right)}^2{\left(\frac{1}{2}\right)}^3 $$

The mean and variance of a random variable with a binomial distribution can be calculated using the following equations:

$$ \mathrm{Mean}:\mu = np $$
$$ \mathrm{Variance}:{\sigma}^2= np\left(1-p\right) $$
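A quick check of these formulas (a minimal sketch; math.comb is available in the Python standard library from version 3.8):

```python
# Sketch: binomial probability function, mean, and variance, checked against
# the coin example in the text (n = 5, p = 1/2, Y = 2).
from math import comb

def binom_pmf(y, n, p):
    return comb(n, y) * p ** y * (1 - p) ** (n - y)

n, p = 5, 0.5
print(binom_pmf(2, n, p))                 # 10 * (1/2)**5 = 0.3125
print(n * p, n * p * (1 - p))             # mean = 2.5, variance = 1.25
```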

3.5 Normal Distribution

The normal distribution is the most important and widely used probability distribution, as it fits many natural quantities such as height, blood pressure, IQ score, and measurement error. It is also called the bell curve or Gaussian distribution and is a standard reference for probability-related problems. The normal distribution has two parameters, the mean (μ) and the standard deviation (σ) (Fig. 3.2). Its characteristics are as follows: (i) X lies between −∞ and ∞ (−∞ ≤ X ≤ ∞); (ii) it is symmetric; (iii) its density function is \( f\left(x;\mu, {\sigma}^2\right)=\frac{1}{\sqrt{2\pi {\sigma}^2}}{e}^{-{\left(x-\mu \right)}^2/2{\sigma}^2} \); (iv) about 2/3 of cases lie within one σ of μ, i.e., P(μ − σ ≤ X ≤ μ + σ) = 0.6826; and (v) about 95% of cases lie within two σ of μ, i.e., P(μ − 2σ ≤ X ≤ μ + 2σ) = 0.9544.
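A short sketch of these properties using only the standard library; the 1σ and 2σ probabilities follow from the error function:

```python
# Sketch: normal density function and the 1-sigma / 2-sigma probabilities.
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def normal_cdf(x, mu=0.0, sigma=1.0):
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

print(round(normal_pdf(0.0), 4))          # peak of the standard normal: 0.3989
for k in (1, 2):
    p = normal_cdf(k) - normal_cdf(-k)
    print(k, round(p, 4))                 # 0.6827 and 0.9545, as quoted above
```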

Fig. 3.2

Normal distribution curve

3.6 Comparison of Means

Statistical concepts are used everywhere in daily life. For example, a honey bottle purchased from the market may be labelled 500 g; to check this claim, we take a random sample from the population and report the probability of obtaining a sample at least this unusual if the true mean is 500 g. This is a problem of hypothesis testing. In such cases testing is done with Student's t-test or the F-test; if more than two means are compared, the analysis of variance (ANOVA) F-test is used. Sample size should also be considered when selecting a test. Hypothesis tests and confidence intervals (CI) are interlinked. The formulas for Student's t-test are

$$ t=\frac{\overline{Y}-\mu }{S_{\overline{Y}}} $$
$$ t=\frac{\overline{Y}-\mu }{\sqrt{\frac{s^2}{n}}} $$
$$ t=\frac{\overline{Y}-\mu }{\frac{s}{\sqrt{n}}} $$

For data with two means, the t-test equation is:

$$ t=\frac{\overline{Y_1}-\overline{Y_2}}{S_{\overline{Y_1}-\overline{Y_2}}} $$

where \( \overline{Y} \) is a sample mean, s is the sample standard deviation, n is the sample size, and \( {S}_{\overline{Y_1}-\overline{Y_2}} \) is the standard error of the difference between the two means.

Consider a null hypothesis Ho : μ = μo and an alternative hypothesis H1 : μ ≠ μo. If |t| exceeds the critical value t0.025, Ho is rejected. Rejecting Ho when it is in fact true is called a type I error; accepting Ho when H1 is true is called a type II error.
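A minimal sketch of the one-sample t-test for the honey example; the nine bottle weights are hypothetical, and the critical value for 8 df is taken from a t table:

```python
# Sketch: one-sample Student's t-test of H0: mu = 500 g.
import math
import statistics

weights = [498, 502, 495, 497, 501, 493, 499, 496, 494]  # g, hypothetical
mu0 = 500.0
n = len(weights)
y_bar = statistics.mean(weights)
s = statistics.stdev(weights)
t = (y_bar - mu0) / (s / math.sqrt(n))    # Student's t statistic
t_crit = 2.306                            # t_{0.025} for df = n - 1 = 8
print(round(t, 3), abs(t) > t_crit)       # reject H0 if |t| > t_crit
```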

3.7 Analysis of Variance (ANOVA)

It is an undeniable fact that agronomic research has improved quality of life and the sustainability of the planet. The principles and procedures of analysis of variance (ANOVA) are fundamental tools in agronomic research. ANOVA is an established statistical procedure for testing hypotheses by partitioning the sources of variation (SOV), estimating variance components, explaining and reducing residual variation, and determining the significance of effects. Its application in agronomic field research and plant breeding trials goes back to the early twentieth century, when the main goal of research was a better understanding of the effects of treatments, e.g., fertilizers, cultivars, planting dates, soil amendments, and their interactions. Early trials focused mainly on yield, and ANOVA was widely used to gain better scientific understanding of treatment effects and to guide farmers; it gave field agronomic trials considerable credibility in the early twentieth century. Significant differences between treatment and check plots could be evaluated by ANOVA; however, there were issues across years, as the random effects of years could not be replicated (Loughin 2006). Fisher pioneered the introduction of ANOVA and applied it in the 1920s to long-term (more than half a century) wheat yield experiments examining responses to soil amendments (Fisher 1921). Fisher used ANOVA to disentangle the large variability in average yield from other changes and to evaluate significant differences between treatments. The basis of ANOVA is that the variance (the mean squared deviation of a variate from its mean, i.e., the square of its standard deviation) produced by all causes operating at once is the sum of the variances produced by each cause individually. Thus, with ANOVA the total variation can be partitioned into separate and independent SOV. To implement ANOVA correctly, treatment plots (experimental units) must be replicated and randomized. The basic assumptions of ANOVA are that (i) treatment and environment effects are additive and (ii) experimental errors are random, independently and normally distributed about a zero mean with a common variance. In his work on experimental design, Fisher documented that systematic arrangements of treatments resulted in biased estimates of treatment averages, over- or underestimation of error variation, and correlated errors. Replication is therefore needed to estimate experimental error, and randomization to obtain correct probabilities or levels of significance. Generally, ANOVA divides the total variation into two independent sources: (i) variation among treatments and (ii) variation within treatments (experimental error/residual error/error mean square/error variance). Given that the data are normally and independently distributed, the F-ratio \( \left(F=\frac{\mathrm{variation}\ \mathrm{between}\ \mathrm{sample}\ \mathrm{means}}{\mathrm{variation}\ \mathrm{within}\ \mathrm{the}\ \mathrm{samples}}\right) \) is used to test the null hypothesis that the treatment means are equal. A one-way ANOVA example is the best way to understand this ratio. ANOVA was first used for fixed effect models (Model I, where the specific treatments or treatment levels are of interest) and later also for random effect models (Model II).
It was later proposed that ANOVA should also be used for mixed effect models (with both fixed and random treatment factors) (Gbur et al. 2012; West and Galecki 2012). The importance of mixed effect models was shown in experiments where the use of a fixed model instead of a mixed model produced misleading results (Acutis et al. 2012; Bolker et al. 2009; Moore and Dixon 2015; Yang 2010). Fisher's ANOVA remains the most frequently used method for determining whether differences among means are significant. His preference was to declare significance when P ≤ 0.05 (the P value), with reference to the F table. The components of ANOVA include the sources of variation (SOV), degrees of freedom, sums of squares, mean squares, F values, and P values (Tables 3.3, 3.4 and 3.5). The importance and applications of ANOVA in earlier work are presented in Table 3.6. While Fisher was developing his ANOVA framework, Neyman and Pearson presented the concept of types of error: type I (rejecting a true null hypothesis) and type II (failing to reject a false null hypothesis) (McIntosh 2015).

Table 3.3 One-way analysis of variance with equal replication
Table 3.4 Analysis of variance in randomized complete block
Table 3.5 Analysis of variance for Latin square
Table 3.6 ANOVA importance and applications in different earlier work

3.7.1 Calculation of the F-Test

The F-ratio for a one-way ANOVA can be calculated with the following equations; a representative layout is given in Table 3.7.

$$ {\sigma}^2=\frac{\sum {\left({x}_i-\overline{x}\right)}^2}{n-1} $$

where σ2 = variance, xi = an observation, \( \overline{\mathrm{x}} \) = the sample mean, and n = the number of observations.

Table 3.7 Representative table for F-test calculation

The sum of squares (SS) in ANOVA is the sum of the squared deviations of observations from the mean. The total sum of squares (SST) can be calculated with the following equation:

$$ {\mathrm{SS}}_{\mathrm{T}}=\sum {\left({x}_{ij}-\overline{x}\right)}^2 $$

where xij is the ith observation in the jth group. The formula can be rewritten as:

$$ {\mathrm{SS}}_{\mathrm{T}}=\sum {\left({x}_{ij}-\overline{x}\right)}^2=\sum {x}_{ij}^2-\frac{{\left(\sum {x}_{ij}\right)}^2}{n} $$

The between-group SS (SSB) and within-group SS (SSW) can be calculated with the following equations:

$$ {\mathrm{SS}}_{\mathrm{B}}=\sum \limits_j{n}_j{\left({\overline{x}}_j-\overline{x}\right)}^2=\sum \limits_j\frac{{T_j}^2}{n_j}-\frac{{\left(\sum {x}_{ij}\right)}^2}{n} $$

where Tj is the total and nj the number of observations of group j.
$$ {\mathrm{SS}}_{\mathrm{W}}=\sum \limits_j\sum \limits_i{\left({x}_{ij}-{\overline{x}}_j\right)}^2 $$

The total SS obeys the following identity, which can be rearranged to obtain SSW:

$$ {\mathrm{SS}}_{\mathrm{T}}={\mathrm{SS}}_{\mathrm{B}}+{\mathrm{SS}}_{\mathrm{W}} $$
$$ {\mathrm{SS}}_{\mathrm{W}}={\mathrm{SS}}_{\mathrm{T}}-{\mathrm{SS}}_{\mathrm{B}} $$

The mean square (MS), the average squared deviation, is calculated next: a sum of squares divided by its degrees of freedom (df). For the total SS (SST), the df is n − 1. The mean square between groups (MSB) is calculated with the following equation, and the within-group mean square (MSW) is obtained analogously as SSW divided by its df:

$$ {\mathrm{MS}}_{\mathrm{B}}=\frac{{\mathrm{SS}}_{\mathrm{B}}}{{\mathrm{df}}_{\mathrm{B}}} $$

Finally, the F ratio is calculated with the following equation:

$$ F=\frac{{\mathrm{MS}}_{\mathrm{B}}}{{\mathrm{MS}}_{\mathrm{W}}} $$
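A minimal sketch of these one-way ANOVA computations; the three small groups are hypothetical illustration data:

```python
# Sketch: between-group SS, within-group SS, mean squares, and the F ratio.
groups = [[3, 5, 7, 9], [6, 8, 10, 12], [2, 3, 5, 6]]    # hypothetical

all_obs = [x for g in groups for x in g]
n, k = len(all_obs), len(groups)
grand_mean = sum(all_obs) / n

ss_b = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
ss_w = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
ms_b = ss_b / (k - 1)                     # mean square between groups
ms_w = ss_w / (n - k)                     # mean square within groups (error)
F = ms_b / ms_w
print(round(ss_b, 2), round(ss_w, 2), round(F, 2))
```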

3.8 Experimental Design and Its Principles

New knowledge is obtained through careful planning, analysis, and interpretation of data. Designing an efficient experiment benefits from consultation with a statistician, who can help choose an appropriate design that gives unbiased estimates of treatment means and experimental error. An experiment is a planned inquiry to obtain new facts or to confirm earlier findings. Experiments are generally designed to answer questions or to test hypotheses, and the objectives of an experiment should be clear before it is designed. The unit of material or place to which one application of a treatment is applied is called the experimental unit or experimental plot. Variation is characteristic of all experimental material, and experimental error measures the variation among experimental units. Variation can arise for a number of reasons, such as inherent variability or lack of uniformity in the physical conduct of the experiment. Replication is another important component of experimental design. Its main functions are to (i) estimate experimental error, (ii) improve the precision of the experiment by reducing the standard deviation of treatment means, (iii) control the error variance, and (iv) increase the scope of inference of the experiment. Error in an experiment can be controlled by selecting an appropriate experimental design, using parallel observations, and choosing suitable size and shape of the experimental units. Furthermore, an unbiased estimate of experimental error is obtained through randomization.

3.8.1 Completely Randomized Design (CRD)

A completely randomized design is used when the experimental units are homogeneous and little is to be gained by grouping them into blocks, since their responses are similar. For example, a variety trial in a greenhouse can use a CRD because of the uniformity of the soil; similarly, CRD is used in laboratory experiments, where variability is easy to control and the experimental units are homogeneous. The advantages of CRD are that the number of replicates can vary from treatment to treatment and that the loss of information due to missing data is small. The precision of the experiment is high because the degrees of freedom (df) for estimating experimental error are at a maximum. In this design, treatments are assigned at random so that each experimental unit has the same chance of receiving any treatment. The randomization procedure and layout for a pot experiment with four treatments (A, B, C, and D) replicated four times involve the following steps:

  1. Determination of the total number of plots or experimental units (n): multiply the number of treatments (t) by the number of replications (R); n = Rt = 4 × 4 = 16. If the replications are not equal, n is obtained as the sum of the replications of each treatment.

  2. Assignment of plot numbers.

  3. Assignment of treatments to plots using the random number method and subsequent ranking, as shown in Table 3.8. Group numbers are then assigned based on the random number ranking (Table 3.9), and the treatments are placed in the experimental units as shown in the layout (Fig. 3.3); a code sketch of this randomization follows Fig. 3.3.

Table 3.8 Random ranking of experimental unit
Table 3.9 Group numbers based on random numbers ranking
Fig. 3.3

A layout of completely randomized design with four treatments (A, B, C, and D) replicated four times
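The randomization itself can be sketched in code (one possible implementation of the random number method; the seed is an arbitrary choice that makes the layout repeatable):

```python
# Sketch: CRD randomization -- four treatments (A-D), four replicates,
# assigned at random to 16 pots.
import random

random.seed(42)
treatments = [t for t in "ABCD" for _ in range(4)]   # 4 treatments x 4 reps
random.shuffle(treatments)                           # random assignment
for plot, trt in enumerate(treatments, start=1):
    print(f"plot {plot:2d}: treatment {trt}")
```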

To carry out the ANOVA for the treatments in Table 3.10, we first obtain the treatment totals \( {X}_{i.} \) and \( \sum \limits_j{X}_{ij}^2 \) (Table 3.10, points 1 and 2). Each treatment total is then squared and divided by r = 5 to give \( {\left({X}_{i.}\right)}^2/r \), used for the treatment sum of squares. The correction factor (CF) is calculated by squaring the grand total and dividing by the total number of observations (rt):

$$ \mathrm{CF}=\frac{X^2..}{rt}=\frac{{\left(\sum \limits_{i,j}{X}_{ij}\right)}^2}{rt}=\frac{(670.6)^2}{(5)(6)}=\mathrm{14,990.15} $$
$$ \mathrm{SS}\ \left(\mathrm{total}\right)={\sum}_{i,j}{X}_{ij}^2-\mathrm{CF}=\mathrm{16,093.56}-\mathrm{14,990.15}=1103.41 $$
$$ \mathrm{SS}\ \left(\mathrm{treatment}\right)\ \left(\mathrm{between}\ \mathrm{or}\ \mathrm{among}\ \mathrm{groups}\right)=\frac{X_{1.}^2+\cdots +{X}_{t.}^2}{r}-\mathrm{CF} $$
$$ =\frac{(148.1)^2+{(132.8)}^2+\cdots +{(100.9)}^2}{5}-\mathrm{14,990.15}=\frac{\mathrm{77,880.08}}{5}-\mathrm{14,990.15}=\mathrm{15,576.02}-\mathrm{14,990.15}=585.87 $$
Table 3.10 Nitrogen contents of Lucerne plants inoculated with Rhizobium trifolii strains (RTS) (mg)

The sum of squares among individuals within treatments is called the within-group SS, residual SS, error SS, or discrepancy SS, and it can be obtained from the following equation:

$$ {\displaystyle \begin{array}{c}{\mathrm{SS}}_{\mathrm{error}}={\mathrm{SS}}_{\mathrm{Total}}-{\mathrm{SS}}_{\mathrm{Treatment}}\\ {}=1103.41-585.87\\ {}=517.54\end{array}} $$

The error SS (SSerror) can also be calculated by pooling the within-treatment SS, as shown below:

$$ {\displaystyle \begin{array}{c}{\mathrm{SS}}_{\mathrm{error}}=\sum \limits_i\left(\sum \limits_j{X}^2 ij-\frac{X^2i.}{r}\right)\\ {}=\left(4593.45-\frac{148.1^2}{5}\right)+\left(3623.34-\frac{132.8^2}{5}\right)+\left(1980.28-\frac{95.8^2}{5}\right)\\ {}+\left(2406.37-\frac{109.3^2}{5}\right)+\left(1435.61-\frac{83.7^2}{5}\right)+\left(2054.51-\frac{100.9^2}{5}\right)\\ {}=517.54\end{array}} $$
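These sums of squares can be reproduced from the treatment totals alone (a minimal sketch; the totals and the overall sum of squared observations are the values quoted above from Table 3.10):

```python
# Sketch: CRD ANOVA sums of squares from treatment totals (r = 5, t = 6).
totals = [148.1, 132.8, 95.8, 109.3, 83.7, 100.9]    # treatment totals X_i.
r, t = 5, 6
sum_sq_obs = 16_093.56                               # sum of X_ij**2 (from text)

grand = sum(totals)                                  # 670.6
cf = grand ** 2 / (r * t)                            # 14,990.15
ss_total = sum_sq_obs - cf                           # 1,103.41
ss_trt = sum(x ** 2 for x in totals) / r - cf        # 585.87
ss_err = ss_total - ss_trt                           # 517.54
print(round(cf, 2), round(ss_total, 2), round(ss_trt, 2), round(ss_err, 2))
```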

These numerical results are presented in an ANOVA table (Table 3.11), which shows a significant difference among treatments. The standard error of a treatment mean \( \left({\mathrm{SE}}_{\overline{X}}\right) \), the standard error of the difference between treatment means, the CV, and the least significant difference (LSD) are calculated with the following equations:

$$ {\mathrm{SE}}_{\overline{X}}=\sqrt{\frac{s^2}{r}}=\sqrt{\frac{21.56}{5}}\mathrm{mg}=\sqrt{4.312}=2.07\ \mathrm{mg} $$
$$ {\mathrm{SE}}_{{\overline{X}}_{i.}-{\overline{X}}_{i^{\prime}.}}=\sqrt{\frac{2{s}^2}{r}}=\sqrt{\frac{2(21.56)}{5}}=\sqrt{\frac{43.12}{5}}=\sqrt{8.62}=2.93\ \mathrm{mg} $$
$$ \mathrm{CV}\ \left(\mathrm{Coefficient}\ \mathrm{of}\ \mathrm{variability}\right)=\frac{\sqrt{S^2}}{\overline{X}}\times 100=\frac{\sqrt{21.56}}{22.4}\times 100=\frac{4.64}{22.4}\times 100=20.7\% $$
$$ \mathrm{LSD}={t}_{\alpha /2}{S}_{{\overline{X}}_{i.}-{\overline{X}}_{i^{\prime}.}}={t}_{\alpha /2}\,S\sqrt{\frac{2}{r}}\ \left(\mathrm{for}\ \mathrm{equal}\ r\right) $$
$$ {\mathrm{LSD}}_{0.05}={t}_{0.025}{S}_{{\overline{X}}_{i.}-{\overline{X}}_{i^{\prime}.}}=2.064\sqrt{\frac{2(21.56)}{5}}=2.064\sqrt{8.62}=2.064\times 2.93=6.06\ \mathrm{mg} $$
$$ {\mathrm{LSD}}_{0.01}={t}_{0.005}{S}_{{\overline{X}}_{i.}-{\overline{X}}_{i^{\prime}.}}=2.797\sqrt{\frac{2(21.56)}{5}}=8.21\ \mathrm{mg} $$
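A quick numeric check of these quantities; the error mean square, grand mean, and tabulated t values (df = 24) are taken from the text:

```python
# Sketch: SE of a mean, SE of a difference, CV, and LSD values.
import math

s2, r, grand_mean = 21.56, 5, 22.4
se_mean = math.sqrt(s2 / r)                  # about 2.07 mg
se_diff = math.sqrt(2 * s2 / r)              # about 2.93 mg
cv = math.sqrt(s2) / grand_mean * 100        # about 20.7 %
lsd_05 = 2.064 * se_diff                     # 6.06 mg
lsd_01 = 2.797 * se_diff                     # 8.21 mg
print(round(se_mean, 2), round(se_diff, 2), round(cv, 1),
      round(lsd_05, 2), round(lsd_01, 2))
```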
Table 3.11 Analysis of variance for data of Table 3.10

The observed differences are \( {\overline{X}}_{1.}-{\overline{X}}_{2.}=29.62-26.56=3.06 \); \( {\overline{X}}_{3.}-{\overline{X}}_{4.}=19.16-21.86=-2.7 \); and \( {\overline{X}}_{5.}-{\overline{X}}_{6.}=16.74-20.18=-3.44 \). Now rank the means from smallest to largest, as shown below (rank in parentheses):

RTS1: 29.62 (6)
RTS2: 26.56 (5)
RTS3: 19.16 (2)
RTS4: 21.86 (4)
RTS5: 16.74 (1)
Composite: 20.18 (3)

Next, calculate the differences between the ranked means and test their significance with the LSD test at the 5% level.

  • 6 − 1 = 29.62 − 16.74 = 12.88 > 6.06: significant

  • 6 − 2 = 29.62 − 19.16 = 10.46 > 6.06: significant

  • 6 − 3 = 29.62 − 20.18 = 9.44 > 6.06: significant

  • 6 − 4 = 29.62 − 21.86 = 7.76 > 6.06: significant

  • 6 − 5 = 29.62 − 26.56 = 3.06 < 6.06: nonsignificant

  • 5 − 1 = 26.56 − 16.74 = 9.82 > 6.06: significant

  • 5 − 2 = 26.56 − 19.16 = 7.40 > 6.06: significant

  • 5 − 3 = 26.56 − 20.18 = 6.38 > 6.06: significant

  • 5 − 4 = 26.56 − 21.86 = 4.70 < 6.06: nonsignificant

  • 4 − 1 = 21.86 − 16.74 = 5.12 < 6.06: nonsignificant

  • 4 − 2 = 21.86 − 19.16 = 2.70 < 6.06: nonsignificant

  • 4 − 3 = 21.86 − 20.18 = 1.68 < 6.06: nonsignificant

  • 3 − 1 = 20.18 − 16.74 = 3.44 < 6.06: nonsignificant

  • 3 − 2 = 20.18 − 19.16 = 1.02 < 6.06: nonsignificant

  • 2 − 1 = 19.16 − 16.74 = 2.42 < 6.06: nonsignificant

3.8.2 Randomized Complete Block Design (RCBD)

The randomized complete block design (RCBD) is one of the most widely used designs in agronomic field research. In this design the experimental units can be meaningfully grouped, the number of units in a group being equal to the number of treatments; such a group is called a block or replication. The objective of grouping into blocks is to minimize error and to ensure that observed differences are due to treatments only. The RCBD has advantages over the CRD because blocking, followed by randomization within blocks, gives more precision. The main purpose of blocking is to gain accuracy by removing known sources of variation (SOV) among the experimental units from the experimental error. Grouping is done so that variability within each block is minimized while variability among blocks is maximized; variation within a block becomes part of the experimental error, so blocking is most effective when the experimental area has a predictable pattern of variability. Typical known SOV that can serve as a basis for blocking include soil heterogeneity in nitrogen fertilizer experiments, varietal trials at multiple sites, or sowing date experiments.

Thus, the basis of blocking is the main SOV. The size and shape of the blocks are selected so that variability among blocks is at a maximum. To block, first identify the gradient and orient the blocks perpendicular to it; if gradients occur in two directions (one strong and one weak), block against the stronger one, e.g., a fertility gradient. If the fertility gradient is strong in both directions and the two gradients are perpendicular to each other, use square blocks and choose a Latin square design, as elaborated by Gomez and Gomez (1980). Whenever blocking is done, the identity and purpose of the blocks should be clear. Similarly, if a SOV is beyond control, ensure that the variation occurs among blocks rather than within blocks. For example, the application of herbicides or the collection of data might not be possible in one day; in such a scenario it is recommended that the work first be completed for all plots of the same block. In this way, variation due to data collection by multiple observers, or to treatment application over more than one day, becomes part of the block variation and is excluded from the experimental error. The following steps are used to design a layout for the RCBD.

  1. Division of the experimental area into R equal blocks (R = replications). Here the experimental area is divided into four blocks, as shown in Fig. 3.4.

  2. Subdivision of the blocks into experimental plots based on the number of treatments. For example, if there are six treatments, A, B, C, D, E, and F, divide each block into six subplots and assign each treatment to a subplot using random numbers (Fig. 3.5).

  3. Repetition of step 2 for the remaining blocks (Fig. 3.6).

Fig. 3.4

Layout for the RCBD (division of experimental area into four blocks)

Fig. 3.5

Subdivision of blocks into experimental plots based on number of treatments and randomization of treatments (A, B, C, D, E, and F)

Fig. 3.6

A randomized layout for the RCBD (six treatments and four replications)

Let us apply the RCBD concepts to the data provided in Table 3.12 to generate the ANOVA table and test for significant differences in oil content among the canola cultivars. Step 1 is to arrange the raw data as in Table 3.12 and to calculate ∑X2 together with the treatment totals \( {X}_{i.} \) (i = 1, 2, …, t) and block totals \( {X}_{.j} \) (j = 1, 2, …, r). Step 2 is to calculate the sums of squares using the following formulas:

$$ \mathrm{Correction}\ \mathrm{factor}=\mathrm{CF}=\frac{{Y^2}_{..}}{rt}=\frac{(1085.5)^2}{24}=\frac{\mathrm{1,178,310.25}}{24}=\mathrm{49,096.26} $$
$$ {\mathrm{SS}}_{\mathrm{total}}=\sum \limits_{i,j}{X}^2 ij-\mathrm{CF} $$
$$ {\mathrm{SS}}_{\mathrm{total}}=\mathrm{49,150.77}-\mathrm{49,096.26}=54.51 $$
$$ {\mathrm{SS}}_{\mathrm{block}}=\frac{\sum \limits_j{Y^2}_{.j}}{t}-\mathrm{CF} $$
$$ {\mathrm{SS}}_{\mathrm{block}}=\frac{(269.8)^2+{(268.8)}^2+{(274.2)}^2+{(272.7)}^2}{6}-\mathrm{49,096.26} $$
$$ {\mathrm{SS}}_{\mathrm{block}}=\mathrm{49,099.4}-\mathrm{49,096.26}=3.14 $$
$$ {\mathrm{SS}}_{\mathrm{treatment}}=\frac{\sum \limits_i{Y^2}_{i.}}{r}-\mathrm{CF} $$
$$ {\mathrm{SS}}_{\mathrm{treatment}}=\frac{(179.2)^2+{(176.0)}^2+{(185.6)}^2+{(174.8)}^2+{(183.0)}^2+{(186.9)}^2}{4}-\mathrm{49,096.26} $$
$$ {\mathrm{SS}}_{\mathrm{treatment}}=\frac{\mathrm{196,511.70}}{4}-\mathrm{49,096.26} $$
$$ {\mathrm{SS}}_{\mathrm{treatment}}=\mathrm{49,127.91}-\mathrm{49,096.26}=31.65 $$
$$ {\mathrm{SS}}_{\mathrm{error}}={\mathrm{SS}}_{\mathrm{total}}-{\mathrm{SS}}_{\mathrm{block}}-{\mathrm{SS}}_{\mathrm{treatment}} $$
$$ {\mathrm{SS}}_{\mathrm{error}}=54.51-3.14-31.65=19.72 $$
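The RCBD sums of squares can be verified from the block and treatment totals quoted above (a minimal sketch; the overall sum of squared observations is the value from the text):

```python
# Sketch: RCBD ANOVA sums of squares from totals (r = 4 blocks, t = 6 cultivars).
block_totals = [269.8, 268.8, 274.2, 272.7]
trt_totals = [179.2, 176.0, 185.6, 174.8, 183.0, 186.9]
r, t = 4, 6
sum_sq_obs = 49_150.77                                 # sum of X_ij**2 (from text)

cf = sum(trt_totals) ** 2 / (r * t)                    # 49,096.26
ss_total = sum_sq_obs - cf                             # 54.51
ss_block = sum(b ** 2 for b in block_totals) / t - cf  # 3.14
ss_trt = sum(x ** 2 for x in trt_totals) / r - cf      # 31.65
ss_err = ss_total - ss_block - ss_trt                  # 19.72
print(round(ss_total, 2), round(ss_block, 2),
      round(ss_trt, 2), round(ss_err, 2))
```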
Table 3.12 Oil content (%) data of different canola cultivars with analysis of variance table

3.8.3 Missing Values Estimation

Sometimes, due to poor germination, climatic conditions, etc., data may be missing for an experimental unit. A missing value can be estimated using the following equation:

$$ y=\frac{r{B}_o+{tT}_o-{G}_o}{\left(r-1\right)\left(t-1\right)} $$

where y = estimate of the missing value; t = number of treatments; r = number of replications; Bo = total of the block containing the missing value; To = total of the treatment containing the missing value; and Go = grand total of all observed values.
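A minimal sketch of this estimator as a function; the totals in the example call are hypothetical:

```python
# Sketch: missing-value estimate for an RCBD, using the formula given above.
def missing_value(r, t, B0, T0, G0):
    """r: replications, t: treatments, B0: total of the block with the missing
    value, T0: total of the treatment with the missing value, G0: grand total
    of all observed values."""
    return (r * B0 + t * T0 - G0) / ((r - 1) * (t - 1))

# Hypothetical example: 4 blocks, 6 treatments.
print(round(missing_value(r=4, t=6, B0=200.5, T0=140.2, G0=1050.7), 2))  # 39.5
```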

3.8.4 Latin Square Design

In a Latin square design, treatments are arranged in rows and columns. Each of the t treatments appears exactly once in each row and each column, and the treatments are traditionally denoted by Latin letters, hence the name. The main purpose of this design is to remove systematic error associated with both rows and columns (n × n). Its advantage appears in field experiments where two major SOV exist; e.g., in the case of soil differences in two directions, this design helps to remove that variation. Its disadvantage is that the numbers of rows, columns, and treatments must be equal. A Latin square design for six treatments, A, B, C, D, E, and F, is shown in Fig. 3.7. The analysis of variance for an r × r (6 × 6) Latin square data set of oil yield (kg ha−1) of canola cultivars is given in Table 3.13. The calculation involves the following steps:

  1. Calculation of the row totals (Xi.), column totals (X.j), treatment totals (Xt), and grand total (X..). Similarly, calculate \( \sum \limits_j{X^2}_{ij} \) and \( \sum \limits_i{X^2}_{ij} \) for each row and column (Table 3.13).

  2. Calculation of the correction factor and sums of squares (SS):

    $$ \mathrm{CF}=\frac{X^2..}{r^2}=\frac{{\left(\mathrm{40,380}\right)}^2}{6^2}=\mathrm{45,292,900} $$
Fig. 3.7

Layout for Latin square design

Table 3.13 Oil yield (kg ha−1) of different canola cultivars with analysis of variance table under Latin square design
$$ {\mathrm{SS}}_{\mathrm{total}}=\sum \limits_{i,j}{X^2}_{ij}-\mathrm{CF}=\mathrm{45,982,806}-\mathrm{45,292,900}=\mathrm{689,906} $$
$$ {\mathrm{SS}}_{\mathrm{row}}=\frac{\sum \limits_i{X^2}_{i.}}{r}-\mathrm{CF}=\frac{(6669)^2+{(6732)}^2+{(6781)}^2+{(6757)}^2+{(6718)}^2+{(6723)}^2}{6}-\mathrm{45,292,900}=\mathrm{45,294,108}-\mathrm{45,292,900}=1208 $$
$$ {\mathrm{SS}}_{\mathrm{column}}=\frac{\sum \limits_j{X^2}_{.j}}{r}-\mathrm{CF}=\frac{(6592)^2+{(6839)}^2+{(6750)}^2+{(6749)}^2+{(6680)}^2+{(6770)}^2}{6}-\mathrm{45,292,900}=\mathrm{45,298,864}-\mathrm{45,292,900}=5964 $$
$$ {\mathrm{SS}}_{\mathrm{treatment}}=\frac{\sum \limits_t{X^2}_t}{r}-\mathrm{CF}=\frac{(8049)^2+{(5772)}^2+{(5905)}^2+{(6876)}^2+{(6322)}^2+{(7456)}^2}{6}-\mathrm{45,292,900}=\mathrm{45,968,401}-\mathrm{45,292,900}=\mathrm{675,501} $$
$$ {\mathrm{SS}}_{\mathrm{error}}={\mathrm{SS}}_{\mathrm{total}}-{\mathrm{SS}}_{\mathrm{row}}-{\mathrm{SS}}_{\mathrm{column}}-{\mathrm{SS}}_{\mathrm{treatment}}=\mathrm{689,906}-1208-5964-\mathrm{675,501}=7233 $$
$$ \mathrm{Standard}\ \mathrm{error}\ \mathrm{of}\ \mathrm{treatment}\ \mathrm{means}={S}_{\overline{X}}=\sqrt{\frac{S^2}{r}}=\sqrt{\frac{361.6}{6}}=7.76\ \mathrm{kg} $$
$$ \mathrm{Sample}\ \mathrm{standard}\ \mathrm{error}\ \mathrm{of}\ \mathrm{the}\ \mathrm{difference}\ \mathrm{between}\ \mathrm{two}\ \mathrm{treatment}\ \mathrm{means}={S}_{{\overline{X}}_i-{\overline{X}}_{i^{\prime}}}=\sqrt{\frac{2{S}^2}{r}}=\sqrt{\frac{2(361.6)}{6}}=10.97\ \mathrm{kg} $$
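The Latin square computations can be verified from the row, column, and treatment totals quoted above (a minimal sketch):

```python
# Sketch: Latin square ANOVA sums of squares from totals (r = 6).
rows = [6669, 6732, 6781, 6757, 6718, 6723]
cols = [6592, 6839, 6750, 6749, 6680, 6770]
trts = [8049, 5772, 5905, 6876, 6322, 7456]
r = 6
sum_sq_obs = 45_982_806                          # sum of X_ij**2 (from text)

cf = sum(rows) ** 2 / r ** 2                     # 45,292,900
ss_total = sum_sq_obs - cf                       # 689,906
ss_row = sum(x ** 2 for x in rows) / r - cf      # 1,208
ss_col = sum(x ** 2 for x in cols) / r - cf      # about 5,964
ss_trt = sum(x ** 2 for x in trts) / r - cf      # 675,501
ss_err = ss_total - ss_row - ss_col - ss_trt     # about 7,233
print(round(ss_total), round(ss_row), round(ss_col),
      round(ss_trt), round(ss_err))
```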

3.8.5 Factorial Experiments

Factorial experiments include several factors as treatments, with all possible combinations of the factor levels, each of equal importance. For example, an experiment involving temperature as a factor will have different levels of temperature. Similarly, if silicon (Si) fertilization is used as a factor in a pot experiment, several levels are used to evaluate it. If we use two sources of Si (potassium silicate and sodium silicate), each at two concentrations, the experiment is referred to as a 2 × 2 or 2² factorial experiment; the possible combinations of two levels of each of two factors number four, as shown in Table 3.14. Similarly, if the Si fertilization experiment uses only potassium silicate at two levels (no application, Si0, and 200 mg L−1 of potassium silicate, Si200) under two water regimes, i.e., with water (W+) and without water (W−), the design is again a 2 × 2 or 2² factorial (Table 3.14). In a factorial experiment, the term level refers to the different treatments within a factor. Capital letters denote factors, while small letters with numerical subscripts denote levels (treatment combinations and their means); e.g., a1b2 refers to the treatment combination of the first level of factor A with the second level of factor B, or to the mean of that treatment. The df and SS for the variance among the four treatment means of a 2² factorial can be divided into single-df components. A symbolic representation of the 3 × 3 or 3² factorial treatment combinations is shown in Table 3.15, and the principles involved in the partitioning can be elaborated with Table 3.16. The four differences, a2 − a1 at each level of B and b2 − b1 at each level of A, are called simple effects. The average of the simple effects of a factor is called its main effect, denoted by a capital letter, e.g., A or B. For a 2² factorial experiment, A and B can be calculated with the following equations:

$$ A=\frac{1}{2}\ \left[\left({a}_2{b}_2-{a}_1{b}_2\right)+\left({a}_2{b}_1-{a}_1{b}_1\right)\right]=\frac{1}{2}\ \left[\left({a}_2{b}_2+{a}_2{b}_1\right)-\left({a}_1{b}_2+{a}_1{b}_1\right)\right] $$
$$ B=\frac{1}{2}\ \left[\left({a}_2{b}_2-{a}_2{b}_1\right)+\left({a}_1{b}_2-{a}_1{b}_1\right)\right]=\frac{1}{2}\ \left[\left({a}_2{b}_2+{a}_1{b}_2\right)-\left({a}_2{b}_1+{a}_1{b}_1\right)\right] $$
Table 3.14 2 × 2 or 2² factorial treatment combinations
Table 3.15 Symbolic representation of 3 × 3 or 3² factorial treatment combinations
Table 3.16 Shoot dry weight (g) of sorghum plant under different silicon source as factor A and silicon concentration as factor B to illustrate simple effects, main effects, and interactions

Main effects in a factorial experiment are averaged over the levels of the other factor, in the same number of ways as any other treatment. Conditions may differ within and among blocks in factorial experiments in RCBD and Latin square designs; thus, in Table 3.16, factor A is replicated within every block, as it is present at both levels for each level of factor B. With factorially arranged treatments, the hypothesis usually tested is that there is no interaction among the factors. The data in Table 3.16 show that the simple effects under cases I and II for Si sources (A) and concentrations (B) differ, while under case III the simple effects of A and B equal the corresponding main effects. The differential response between the simple effects of a factor is called an interaction, as seen in cases I and II of Table 3.16; no interaction is present in case III. The major advantage of a factorial experiment is that it provides information about the interaction between factors. The interaction of A and B can be defined with the following equation:

$$ AB=\frac{1}{2}\ \left[\left({a}_2{b}_2-{a}_1{b}_2\right)-\left({a}_2{b}_1-{a}_1{b}_1\right)\right]=\frac{1}{2}\ \left[\left({a}_2{b}_2+{a}_1{b}_1\right)-\left({a}_1{b}_2+{a}_2{b}_1\right)\right] $$

The interaction for case I in Table 3.16, computed from the simple effects of A and of B, respectively:

$$ AB=\frac{1}{2}\ \left(6-2\right)=2\ \left(\mathrm{from}\ \mathrm{the}\ \mathrm{simple}\ \mathrm{effects}\ \mathrm{of}\ A\right) $$
$$ AB=\frac{1}{2}\ \left(10-6\right)=2\ \left(\mathrm{from}\ \mathrm{the}\ \mathrm{simple}\ \mathrm{effects}\ \mathrm{of}\ B\right) $$

The interaction for case II in Table 3.16:

$$ AB=\frac{1}{2}\ \left[\left(33.13-43.13\right)-\left(37.13-34.13\right)\right] $$
$$ AB=\frac{1}{2}\ \left[33.13-43.13-37.13+34.13\right] $$
$$ AB=\frac{1}{2}\ \left[-13\right] $$
$$ AB=-6.5 $$

The interaction for case III in Table 3.16:

$$ AB=\frac{1}{2}\ \left[\left(40.13-38.13\right)-\left(32.13-30.13\right)\right] $$
$$ AB=\frac{1}{2}\ \left[40.13-38.13-32.13+30.13\right] $$
$$ AB=\frac{1}{2}\ \left[0\right]=0\ \left(\mathrm{no}\ \mathrm{interaction}\right) $$
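A small sketch computing the main effects and the AB interaction from a 2 × 2 table of treatment means, checked against case III of Table 3.16:

```python
# Sketch: main effects and interaction for a 2 x 2 factorial from cell means.
def effects(a1b1, a2b1, a1b2, a2b2):
    A = 0.5 * ((a2b2 + a2b1) - (a1b2 + a1b1))    # main effect of A
    B = 0.5 * ((a2b2 + a1b2) - (a2b1 + a1b1))    # main effect of B
    AB = 0.5 * ((a2b2 - a1b2) - (a2b1 - a1b1))   # interaction
    return A, B, AB

# Case III means from Table 3.16: the interaction should vanish.
print(effects(a1b1=30.13, a2b1=32.13, a1b2=38.13, a2b2=40.13))  # (2.0, 8.0, 0.0)
```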

The interaction concept is further illustrated graphically in Fig. 3.8. Note that the presence or absence of main effects says nothing about the presence or absence of an interaction, and vice versa. If the interaction is nonsignificant, we can conclude that the factors act independently; however, if the interaction is large and significant, the main effects have little meaning. For large factorial experiments, the use of confounded designs has been suggested, as described by Das and Giri (1979).

Fig. 3.8

Graphical illustration of interaction

Other factorial cases are possible: e.g., with factor A as three locations, factor B as Si fertilizer with two levels, and factor C as three sorghum cultivars, the experiment is referred to as a 3 × 2 × 3 or 3² × 2 factorial (Table 3.17).

Table 3.17 Three-factor (3 × 2 × 3 or 3² × 2) factorial experiment

The ANOVA calculation for the 3 × 3 × 2 or 3² × 2 factorial experiment involves the following steps, with the results presented in the ANOVA table (Table 3.18):

  1. Calculation of the correction factor, total SS, replication SS, treatment SS, and error SS

    $$ \mathrm{Correction}\ \mathrm{factor}=\mathrm{CF}=\frac{{X^2}_{..}}{rabc}=\frac{(2903)^2}{54}=\mathrm{156,038.77} $$
Table 3.18 Analysis of variance table for the 3² × 2 factorial experiment in RCBD
$$ {\mathrm{SS}}_{\mathrm{total}}=\sum \limits_{i,j,k,r}{X^2}_{ijkr}-\mathrm{CF}=\mathrm{158,503.56}-\mathrm{156,038.77}=2464.78 $$
$$ {\mathrm{SS}}_{\mathrm{replication}}=\frac{\sum_{k=1}^r{R^2}_k}{abc}-\mathrm{CF} $$
$$ {\mathrm{SS}}_{\mathrm{replication}}=\frac{(968)^2+{(983)}^2+{(953)}^2}{18}-\mathrm{156,038.77}=\mathrm{156,100.43}-\mathrm{156,038.77}=61.65 $$
$$ {\mathrm{SS}}_{\mathrm{treatment}}=\frac{\sum_{j=1}^a{\sum}_{k=1}^b{\sum}_{i=1}^c{Tr^2}_{ijk}}{R}-\mathrm{CF} $$
$$ {\mathrm{SS}}_{\mathrm{treatment}}=\frac{(187)^2+\dots +{(134)}^2}{3}-\mathrm{156,038.77}=\mathrm{158,324.90}-\mathrm{156,038.77}=2286.15 $$
$$ {\mathrm{SS}}_{\mathrm{error}}={\mathrm{SS}}_{\mathrm{total}}-{\mathrm{SS}}_{\mathrm{replication}}-{\mathrm{SS}}_{\mathrm{treatment}}=2464.78-61.65-2286.15=116.98 $$
  2. Partitioning of the treatment sum of squares into main effects and interactions

    $$ {\mathrm{SS}}_A=\frac{\sum \limits_j\ {\left({a}_j\right)}^2}{rbc}-\mathrm{CF} $$
$$ {\mathrm{SS}}_A=\frac{(1053)^2+{(952)}^2+{(898)}^2}{18}-\mathrm{156,038.77}=\mathrm{156,726.5}\hbox{--} \mathrm{156,038.77}=687.75 $$
$$ {\mathrm{SS}}_B=\frac{\sum \limits_k\ {\left({b}_k\right)}^2}{rac}-\mathrm{CF} $$
$$ {\mathrm{SS}}_B=\frac{(1406)^2+{(1496)}^2}{27}-\mathrm{156,038.77}=\mathrm{156,188}\hbox{--} \mathrm{156,038.77}=149.25 $$
$$ {\mathrm{SS}}_C=\frac{\sum \limits_i\ {\left({c}_i\right)}^2}{rab}-\mathrm{CF} $$
$$ {\mathrm{SS}}_C=\frac{(1067)^2+{(992)}^2+{(843)}^2}{18}-\mathrm{156,038.77}=\mathrm{157,477.7}\hbox{--} \mathrm{156,038.77}=1438.93 $$
$$ {\mathrm{SS}}_{AB}=\frac{\sum \limits_{j,k}\ {\left({a}_j{b}_k\right)}^2}{rc}-\mathrm{CF}-\left({\mathrm{SS}}_A+{\mathrm{SS}}_B\right) $$
$$ {\mathrm{SS}}_{AB}=\frac{(509)^2+{(544)}^2+{(462)}^2+{(490)}^2+{(435)}^2+{(462)}^2}{9}-\mathrm{156,038.77}-687.75-149.25=2.47 $$
$$ {\mathrm{SS}}_{AC}=\frac{\sum \limits_{j,i}\ {\left({a}_j{c}_i\right)}^2}{rb}-\mathrm{CF}-\left({\mathrm{SS}}_A+{\mathrm{SS}}_C\right) $$
$$ {\mathrm{SS}}_{AC}=\frac{(387)^2+{(360)}^2+{(306)}^2+{(350)}^2+{(326)}^2+{(277)}^2+{(330)}^2+{(307)}^2+{(261)}^2}{6}-\mathrm{156,038.77}-\left(687.75+1438.93\right)=6.35 $$
$$ {\mathrm{SS}}_{BC}=\frac{\sum \limits_{k,i}\ {\left({b}_k{c}_i\right)}^2}{ra}- CF-\left({\mathrm{SS}}_B+{\mathrm{SS}}_C\right) $$
$$ {\mathrm{SS}}_{BC}=\frac{(517)^2+{(550)}^2+{(481)}^2+{(512)}^2+{(409)}^2+{(435)}^2}{9}-\mathrm{156,038.77}-\left(149.25+1438.93\right)=1.38 $$
$$ {\mathrm{SS}}_{AB C}=\frac{\sum \limits_{i,j,k}\ {\left({a}_j{b}_k{c}_i\right)}^2}{r}-\mathrm{CF}-{\mathrm{SS}}_A-{\mathrm{SS}}_B-{\mathrm{SS}}_C-{\mathrm{SS}}_{AB}-{\mathrm{SS}}_{AC}-{\mathrm{SS}}_{BC} $$
$$ {\mathrm{SS}}_{AB C}=\frac{(187)^2+\dots {(134)}^2}{3}-\mathrm{156,038.77}-\left(\ {\mathrm{SS}}_A+{\mathrm{SS}}_B+{\mathrm{SS}}_C+{\mathrm{SS}}_{AB}+{\mathrm{SS}}_{AC}+{\mathrm{SS}}_{BC}\right) $$
$$ {\mathrm{SS}}_{ABC}=\frac{(187)^2+\dots {(134)}^2}{3}-\mathrm{156,038.77}-2286.3=0.024 $$

3.8.6 Fractional Factorial Design

A fractional factorial design is used when a large number of factors must be tested. In this case, only a fraction of the total number of treatment combinations is tested, selected systematically.

3.8.7 Nested and Split Plot Design

Nested and split plot experiments are multifactor experiments. A split plot design is used for factorial experiments on the principle that whole plots are divided into subplots or subunits. Factors that are of greater importance, need greater precision, require smaller amounts of experimental material, or are expected to exhibit smaller differences are placed in the subunits. Consider an experiment testing factor A (nitrogen fertilizer) at four levels in an RCBD, with a second factor B (sorghum cultivars) at three levels placed by dividing each A unit into subunits. The layout for the split plot thus has factor A in the main plots and factor B in the subplots, as shown in Fig. 3.9.

Fig. 3.9

Layout for the split plot design

The layout design steps for the split plot are (i) division of the experimental area into three blocks (replications), each further divided into four main plots for the nitrogen fertilizer applications, and (ii) two separate randomizations, first for the main plots (N treatments) and then for the subplots (cultivars). In the split plot design of Fig. 3.9, the main plot is "c" times larger than the subplot; since c = 3 here (three cultivars in the subplots), the main plot is three times the size of the subplot. Each main plot treatment is tested, e.g., 3 times, while each subplot treatment is tested 12 times, which gives more precision for subplot treatments than for main plot treatments. The partitioning of degrees of freedom for the split plot design under different arrangements is presented in Table 3.19.

Table 3.19 Degree of freedom for split plot design under different arrangements

3.8.8 Strip Plot/Split-Block Design

A strip plot (split-block) design is used in experiments where both factors (e.g., A and B with a and b levels) require large plot areas. The whole area is divided into "a" horizontal strips and "b" vertical strips; one level of factor A is applied to each horizontal strip, while one level of B is applied to each vertical strip. The main difference from the split plot design is that the second factor is also applied in strips.

3.8.9 Split-Split Plot Design

Split-split plot designs are applicable to three-factor factorial experiments, with factor A assigned to whole plots, factor B to subplots, and factor C to sub-subplots. The ANOVA for a split-split plot design with r blocks, a levels of factor A, b levels of factor B, and c levels of factor C is shown in Table 3.20.

Table 3.20 Analysis of variance for split-split plot design

3.8.10 MANOVA (Multivariate Analysis of Variance)

Multivariate analysis of variance (MANOVA) is ANOVA with several dependent variables. It tests the difference between two or more vectors of means, e.g., evaluating students' improvement in Physics and Chemistry under different syllabuses. In this case, the response variable (students' improvement) is affected by the experimenter's manipulation of the independent variables. The assumptions of MANOVA are:

  1. The dependent variables should be normally distributed.

  2. There should be linear relationships among all pairs of dependent variables.

  3. Variances should be homogeneous.

3.9 ANCOVA (Analysis of Covariance)

Analysis of covariance (ANCOVA) combines concepts of analysis of variance and regression; it is used when one independent variable is not at predetermined levels. The uses of ANCOVA include (i) increasing precision and controlling error, (ii) estimating missing data, (iii) adjusting treatment means of the dependent variable for corresponding independent variables, (iv) assisting in data interpretation, and (v) partitioning the total covariance into parts.

3.10 Principal Component Analysis (PCA)

Principal component analysis is a multivariate statistical method used to examine variation and patterns in a data set; it is an easy way to visualize and explore data (Ahmed et al. 2020). Consider data in two dimensions first (e.g., height and weight). The data can be plotted in a scatter plot, but to see the structure of the variation, PCA constructs a new coordinate system whose axes need not have any physical meaning. PCA is thus a statistical procedure that uses an orthogonal transformation to convert a set of observations of correlated variables into values of linearly uncorrelated variables. It is the most common form of factor analysis, applied to analyze interrelationships among variables (Fig. 3.10). The main objective of PCA is to cluster variables into manageable groups, known as components (factors). The steps involved in PCA are listed below, followed by a code sketch:

  1. Standardization of the data: \( z=\frac{\mathrm{Variable}\ \mathrm{value}-\mathrm{Mean}}{\mathrm{Standard}\ \mathrm{deviation}} \)

  2. Computing the covariance matrix (identification of correlation and dependence among features in the data set)

  3. Calculation of eigenvectors and eigenvalues

  4. Computing the principal components

  5. Reducing the dimension of the data set
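The five steps can be traced directly in NumPy. The sketch below uses made-up height/weight values; in practice a library routine would typically be used instead:

```python
import numpy as np

# Illustrative two-dimensional data (e.g., height in cm, weight in kg).
X = np.array([[170, 65], [165, 59], [180, 81],
              [175, 74], [160, 55], [185, 85]], dtype=float)

# Step 1: standardize each variable, z = (value - mean) / standard deviation.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Step 2: covariance matrix of the standardized data.
C = np.cov(Z, rowvar=False)

# Step 3: eigenvalues and eigenvectors (eigh suits symmetric matrices),
# sorted by decreasing explained variance.
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Step 4: compute the principal component scores.
scores = Z @ eigvecs

# Step 5: reduce the dimension by keeping only the first component.
reduced = scores[:, :1]
print("proportion of variance explained:", eigvals / eigvals.sum())
```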

Fig. 3.10 PCA flow diagram

3.11 Regression

Consider a random sample of n observations in which Y values are determined from the corresponding X values, i.e., (X1, Y1), (X2, Y2), (X3, Y3), …, (Xn, Yn). Here Y is the dependent variable and X the independent variable. The first descriptive technique that can be used to examine the relationship between X and Y is the scatter diagram, drawn by plotting X and Y in Cartesian coordinates. The pattern of the plotted points indicates whether the relationship is linear or nonlinear (Fig. 3.11). If the relationship is linear, we need to fit a model to the given data. Mathematically, the relation between X and Y can be written as:

$$ Y\propto X $$
Fig. 3.11 Scatter plot to show relationship between two variables X and Y

This shows that a relationship exists between the two variables, and a straight line drawn through the points can serve as a moving average of the Y values. The equation of the straight line is:

$$ Y=a+ bX $$

Any point (X, Y) on this line has an X coordinate (abscissa) and a Y coordinate (ordinate) whose values satisfy this equation. When X = 0, Y = a (the intercept, i.e., the value of Y when X is zero); when the intercept a is zero, the line passes through the origin. The change in Y per unit change in X is called the slope of the line and is represented by b. Thus \( b=\frac{\Delta Y}{\Delta X}=\frac{\mathrm{Unit}\ \mathrm{change}\ \mathrm{in}\ Y}{\mathrm{Unit}\ \mathrm{change}\ \mathrm{in}\ X} \). If b is positive, the two variables increase or decrease together; if b is negative, one increases while the other decreases. This is an example of a simple linear regression equation (Ahmed et al. 2011). If the number of predictor variables X is increased (X1 to Xn) against Y, the model is called multiple linear regression, with the form:

$$ Y=a+{\beta}_1{X}_1+{\beta}_2{X}_2+{\beta}_3{X}_3+\dots +{\beta}_n{X}_n+\varepsilon $$

where X1…Xn = independent non-random variables; β1, β2, …, βn = slopes (regression coefficients); and ε = a random error term with expected value zero.
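A short sketch of fitting such a multiple linear regression by ordinary least squares in NumPy; the two predictors and responses are illustrative only:

```python
import numpy as np

# Two illustrative predictors X1, X2 and a response Y; values are made up.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0],
              [4.0, 3.0], [5.0, 6.0], [6.0, 5.0]])
y = np.array([4.1, 4.9, 9.2, 10.1, 14.8, 15.9])

# Prepend a column of ones so the first coefficient is the intercept a.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)  # ordinary least squares
a, b1, b2 = coef
print(f"Y = {a:.2f} + {b1:.2f} X1 + {b2:.2f} X2")
```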

Let’s consider the data set presented in Table 3.21 to describe the method of least square in order to fit a straight line and calculate simple regression equation and coefficient of determination (R2). The calculation involves determination of SSxx, SSxy, \( \overline{X},\overline{Y} \), and β1 as shown in the following equations:

$$ {\mathrm{SS}}_{xx}={\sum}_{i=1}^n{X^2}_i-\frac{{\left({\sum}_{i=1}^n{X}_i\right)}^2}{n}=639-\frac{(45)^2}{10}=436.5 $$
$$ {\mathrm{SS}}_{xy}={\sum}_{i=1}^n{X}_i{Y}_i-\frac{\left({\sum}_{i=1}^n{X}_i\right)\left({\sum}_{i=1}^n{Y}_i\right)}{n}=1060-\frac{(45)(55)}{10}=812.5 $$
$$ \overline{X}=4.5\ \mathrm{and}\ \overline{Y}=5.5. $$
$$ {\beta}_1=\frac{{\mathrm{SS}}_{xy}}{{\mathrm{SS}}_{xx}}=\frac{812.5}{436.5}=1.86 $$

and

$$ \overline{Y}=a+{\beta}_1\overline{X} $$
$$ a=\overline{Y}-{\beta}_1\overline{X}=5.5-(1.86)(4.5)=5.5-8.37=-2.87 $$
Table 3.21 Data set to illustrate method of least squares to fit a straight line

Hence simple regression equation for this data is:

$$ \hat{Y}=a+{\beta}_1X=-2.87+(1.86)X $$

The plot for this least square line is shown in Fig. 3.12. The quality of this fit can be measured quantitatively by using coefficient of determination (R2). The equation for R2 calculation is:

$$ {R}^2=\frac{{\mathrm{SS}}_{yy}-{\mathrm{SS}}_{\mathrm{error}}}{{\mathrm{SS}}_{yy}}=1-\frac{\sum \limits_{i=1}^n{\left({y}_i-{\hat{y}}_i\right)}^2}{\sum_{i=1}^n{\left({y}_i-\overline{y}\right)}^2} $$
$$ {\mathrm{SS}}_{\mathrm{error}}=\sum \limits_{i=1}^n{\left({y}_i-{\hat{y}}_i\right)}^2=\sum \limits_{i=1}^n{\left({y}_i-\left(a+{\beta}_1{x}_i\right)\right)}^2=\sum \limits_{i=1}^n{\left({y}_i-a-{\beta}_1{x}_i\right)}^2 $$
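Since the raw observations of Table 3.21 are not reproduced here, the following Python sketch implements the same SSxx, SSxy, and R2 formulas for an arbitrary paired sample; the x, y values shown are illustrative only:

```python
import numpy as np

# Least squares fit and coefficient of determination from the formulas above.
def least_squares_fit(x, y):
    n = len(x)
    ss_xx = np.sum(x * x) - np.sum(x) ** 2 / n
    ss_xy = np.sum(x * y) - np.sum(x) * np.sum(y) / n
    b1 = ss_xy / ss_xx                 # slope
    a = y.mean() - b1 * x.mean()       # intercept
    y_hat = a + b1 * x                 # fitted values
    ss_yy = np.sum((y - y.mean()) ** 2)
    ss_error = np.sum((y - y_hat) ** 2)
    r2 = 1 - ss_error / ss_yy          # coefficient of determination
    return a, b1, r2

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # illustrative values only
y = np.array([0.9, 2.1, 2.9, 4.2, 4.8])
a, b1, r2 = least_squares_fit(x, y)
print(f"Y_hat = {a:.2f} + {b1:.2f} X,  R2 = {r2:.3f}")
```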
Fig. 3.12 Simple linear regression line with regression equation and coefficient of determination (R2)

Another approach that can be used to test hypotheses is the ANOVA table presented in the earlier section. The ANOVA table for regression analysis is presented in Table 3.22. Furthermore, the application of multiple linear stepwise regression models is elaborated using spring wheat grain yield data with respective R2 values (Table 3.23).

Table 3.22 ANOVA table for simple regression
Table 3.23 Multiple linear stepwise regression models for spring wheat grain yield with environmental variables (E = environments (2008–09 and 2009–10), PW = planting windows, SR1 = solar radiation at anthesis, SR2 = solar radiation at maturity, T1 = mean average temperature at anthesis, T2 = mean average temperature at maturity, PTQ1 = photothermal quotient at anthesis, PTQ2 = photothermal quotient at maturity) using the stepwise method developed to predict wheat grain yield under a changing climate

3.12 Correlation

Correlation is used to measure the intensity or degree of association between variables. It is closely related to covariance: the correlation coefficient is the covariance scaled by the standard deviations of the two variables. It is a bivariate statistical technique. The simple linear correlation coefficient or simple correlation (also called total correlation or product-moment correlation) is used for descriptive purposes and can be calculated using the following equivalent equations:

$$ r=\frac{\sum \left(X-\overline{X}\right)\left(Y-\overline{Y}\right)/\left(n-1\right)}{\sqrt{\sum {\left(X-\overline{X}\right)}^2/\left(n-1\right)}\sqrt{\sum {\left(Y-\overline{Y}\right)}^2/\left(n-1\right)}} $$
$$ r=\frac{\sum \left(X-\overline{X}\right)\left(Y-\overline{Y}\right)}{\sqrt{\sum {\left(X-\overline{X}\right)}^2\sum {\left(Y-\overline{Y}\right)}^2}} $$

The correlation coefficient ranges from +1 to −1. If r = +1, there is a perfect positive correlation; if r = −1, a perfect negative correlation; and if r = 0, no correlation at all. Correlation measures co-relation, a joint property of two variables, while regression describes the change of one variable in relation to change of another. In correlation, random pairs of observations are obtained, while in regression only the dependent variable needs to be random and normally distributed. The application of correlation is illustrated in Fig. 3.13 (Ahmed 2011).
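As a sketch, the product-moment formula above can be computed directly or with scipy.stats.pearsonr; the paired values below are illustrative only:

```python
import numpy as np
from scipy.stats import pearsonr

# Illustrative paired sample (e.g., grain yield and one yield component).
x = np.array([2.1, 2.5, 3.0, 3.4, 3.9, 4.2])
y = np.array([30.0, 34.0, 41.0, 44.0, 50.0, 52.0])

# Direct implementation of the product-moment formula above.
r_manual = np.sum((x - x.mean()) * (y - y.mean())) / np.sqrt(
    np.sum((x - x.mean()) ** 2) * np.sum((y - y.mean()) ** 2))

r, p = pearsonr(x, y)  # same coefficient, with a p-value for H0: r = 0
print(f"r (manual) = {r_manual:.3f}, r (scipy) = {r:.3f}, p = {p:.4f}")
```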

Fig. 3.13 Correlation analysis between spring wheat yield and yield components

3.13 Analytical Tools/Software

Analytical tools that can be used for statistical analysis are listed below:

  1. R

  2. SAS

  3. SigmaPlot

  4. Statgraphics

  5. Minitab

  6. SPSS

  7. MS Excel

  8. MATLAB

  9. GraphPad Prism

  10. GenStat

  11. SigmaStat

  12. Stata

  13. Statistica