Using Excel® to Do Statistical Analysis

Shardt, Yuri A. W.

doi:10.1007/978-3-319-21509-9_8

Yuri A. W. Shardt²

2763 Accesses
1 Citations

Abstract

This chapter introduces the reader to the application of Excel^® to solving statistical problems with an emphasis on regression. Required Excel functions are described, including array functions that allow for the manipulation of matrices in Excel. Writing your own Excel code to extend its functionality and the resulting issues from Excel security are also considered. Finally, the Data Analysis and Solver add-ins are described in detail. Ready-to-use Excel templates are provided for constructing normal probability plots, box-and-whisker plots, and periodograms, as well as linear regression, nonlinear regression, and factorial design analysis. This chapter concludes with three examples: one focusing on linear regression and one on nonlinear regression, and the last one is a collection of factorial design experiments solved using the factorial design template. By the end of this chapter, the reader should be comfortable in using Excel to perform linear regression and factorial design for any level of problem complexity. The reader should also be able to use the appropriate templates and functions to speed up the analysis of a given data set.

Access provided by Autonomous University of Puebla. Download chapter PDF

Sample Size, Mean, Standard Deviation, and Standard Error of the Mean

Design of Experiments Using R

Keywords

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Microsoft Excel^® is a spreadsheet programme developed by Microsoft^®, which comes bundled with Microsoft Office^®. The most recent version of Microsoft Office is Office 2013. Not only can Excel perform most basic spreadsheet commands, it contains a programming language called Visual Basic that can be used to create powerful and useful macros. Most, if not all, of the analysis presented in the previous chapters can be easily implemented in Excel. It will be assumed that the reader has a basic understanding of Excel, can write simple formulae (equations), understands what a column and row are, and can create basic graphs. Basic background information about Excel can be found from such sources as the Excel for Dummies Series (Harvey 2013). This chapter will examine in detail additional features, such as array functions, using Solver, and writing basic Excel macros. In order to clearly distinguish between the code required for Excel functions and text, all Excel commands and variables are shown in bold Courier New. Locations on the ribbon, menu paths, and keystrokes are shown in plain Courier New.

8.1 Ranges and Arrays in Excel

Ranges and arrays are how Excel refers to groups of columns and rows. The difference lies in how they are used. A range is a group of rows and columns in an Excel spreadsheet, while an array is a group of rows and columns used in an Excel function or code. In Excel, an array most closely approximates a matrix.

One useful property of Excel is the ability to write an equation in one cell and then drag it to other cells. When dragging such an equation, any references to a range will be changed as the cells are dragged, for example, the cell A4 will become A5 if the formula is dragged down one row and B4 if it is dragged right by one column. This is called relative referencing, since it depends on the location. Although relative referencing is useful, it is not always desirable. Excel allows a cell reference to be made absolute, that is, it will not change its value as the cell is dragged. This is done by placing a dollar sign $ before the element that one wishes to freeze. There are three options (illustrated using cell A2):

1.
Absolute Rows and Columns ($A$2): in this case, the reference will always be to this cell.
2.
Absolute Row but Relative Column (A$2): in this case, the row will stay the same, but the column can change.
3.
Absolute Column but Relative Row ($A2): in this case, the column will stay the same, but the row can change.

Another useful feature in Excel is the ability to name a range. This means that rather than having to drag and select a very large range of cells, it can be conveniently referred to using the corresponding range name, for example, instead of average(B4:B1000) we could write average(y), where y has been defined to equal B4:B1000. Naming a range is shown in Fig. 8.1 and consists of two steps: select the range and enter a name in the location shown. In order to make your life easier, make sure that the name selected is unique to the workbook and not just the worksheet.

8.2 Useful Excel Functions

This section will examine various Excel functions that can be used for solving statistical problems.

8.2.1 Array Functions in Excel

Array functions are Excel’s equivalent to matrices. Arrays are defined as a range of cells that are treated together. When using array functions, the following steps must be followed:

1.
Select the output range.
2.
Enter the array formula into one of the cells in the selected array.
3.
Once the formula has been entered, press Ctrl + Shift + Enter to register the formula as an array formula. Normally, one would simply press Enter.

A summary of the most common array functions is given in Table 8.1.

Table 8.1 Excel array functions

Full size table

8.2.2 Statistical Functions in Excel

Table 8.2 lists some common statistical functions in Excel. Most of these functions as written only work on newer versions of Excel (2010 or newer). A detailed explanation of the functions and differences can be found in Sect. 2.4: Common Statistical Distributions.

Table 8.2 Excel statistical functions

Full size table

8.3 Excel Macros and Security

Macros are Excel’s version of functions, or user-written code, that Excel can execute. The programming language used by Excel is called Visual Basic (VB).

In Excel 2003, code can be inserted by going to Tools → Macro → Visual Basic Editor (Alt + F11). In Excel 2007 or newer, code can be inserted by going to the View Ribbon and selecting the Macro icon and then View Macro. For both versions of Excel, in the window that appears, enter the name of the function that you desire to create (or edit) and press Create (Edit). If a new function is being created then, in the new window that opens, replace Sub with Public Function. This will allow the new code to be directly accessed from the spreadsheet by typing = FunctionName(Required Parameters). Below, some sample code has been provided that implements the Michaelis–Menten equation.

Public Function MichaelisMenten(Concentration, vmax, KM) As Double

'This function will contain a single line of code that 'implements the Michaelis-Menten equation

MichaelisMenten = vmax * Concentration / (KM + Concentration)

End Function

8.3.1 Security in Excel

Unfortunately, when a macro is designed, Excel has the tendency to be paranoid and think that it is always a nasty virus. Thus, the appropriate parameters should be set for security. The procedure in dealing with security in Excel depends on the version of Excel installed. The following sections explain the procedures for Excel 2003 (and older) and Excel 2007 (and newer).

8.3.1.1 Dealing with Security in Excel 2003 or Older

In Excel 2003, go to Tools → Macros → Security. A new window will appear. Select Medium. With this option, Excel will always ask you on opening the file whether you wish to activate the macros or not. When a file with macros is loaded, then the warning shown in Fig. 8.2 will appear when the file is initially loaded.

8.3.1.2 Dealing with Security in Excel 2007 or Newer

In Excel 2010 or newer, to set the security, go to the File menu, and select Options. In Excel 2007, go the Start button instead of the File menu and select Excel Options. Select Trust Center in the window that appears. After this, select Trust Center Settings…. In the new window, go to Macro Settings and select the appropriate level of security you desire. A good choice is to select the option Disable All Macros with Notification because the macros will be disabled, but you will be notified of their existence. Press OK on all the open windows to save the changes. A file with a macro must be saved as an .xlsm file. In Excel 2007 or newer, when opening the file with a macro and the suggested above settings, a warning will appear. Clicking on the warning will in Excel 2010 or newer activate the macros (shown in Fig. 8.3), while in Excel 2007 another window will appear which allows the macros to be activated (shown in Fig. 8.4).

8.4 The Excel Solver Add-In

Solver is an Excel add-in that allows the user to iteratively solve systems of equations. Unfortunately, it is not installed by default on most computers.

8.4.1 Installing the Solver Add-In

In Excel 2003, in order to install the Solver add-in, go to Tools → Add-in. In the window that appears, which is shown in Fig. 8.6, select Solver Add-in and press OK. Excel will then try to install the add-in. This may require the original Excel CD or DVD.

In Excel 2007, in order to install the Solver add-in, go to the Start button and select Excel Options at the bottom of the menu that appears. In the new window that appears, select Add-ins. Finally, go to the Go… button and a window similar to Fig. 8.6 should appear.

In Excel 2010 or 2013, in order to install the Solver add-in, go to the File menu and select Options at the bottom of the Menu that appears. In the new window that appears, select Add-ins. Finally, click the Go… button. The last two steps are shown in Fig. 8.5. A window similar to Fig. 8.6 should appear.

8.4.2 Using the Solver Add-In

In order to start Solver, in Excel 2007 or newer, locate the Data ribbon and go to the extreme right-hand side in the area marked Analysis. Solver should be there as shown in Fig. 8.7. In Excel 2003 or older, go to Tools → Solver.

Figure 8.8 shows the main Solver window that appears in Excel 2010 or newer. It is a must that the option Make Unconstrained Variables Non-Negative be unchecked, as it can lead to wrong results otherwise. The following sections are important for use in regression analysis:

1.
Objective Function Value: this is the value of the objective function that is to be optimised.
2.
Type of Optimisation: what type of optimisation is desired: maximisation (Max), minimisation (Min), or force the solver to obtain a particular value (Value of). For regression, the minimisation option should be used.
3.
Variables: this is the range of the cells (variables) that the computer can vary to determine the solution. For regression, this would represent the cells where the parameter values have been entered.
4.
Constraints: this box lists the constraints for the problem. In order to add a constraint, click on the “Add” button. The window shown in Fig. 8.9 should appear. Once the desired form of the constraint has been selected, click Add to add the constraint to the list of constraints. Selecting a constraint from the box and clicking Change will cause the same window to appear and the properties of the constraint can be changed. Finally, selecting a constraint and clicking Delete will remove the constraint.
Fig. 8.9
Add constraint window
Full size image
5.
Solve: clicking this button will start the solver. The solution of the problem may take some time. Solver will either state that a solution was found (Fig. 8.10 (left)) or that no solution was found (Fig. 8.10 (right)). In general, if a solution is found, select Keep Solver Solution and press OK; otherwise, select Restore Original Values and press OK. If the Solver fails to find a solution, an error message will be included. It can give a suggestion as to how to fix the problem. Three common things to check (in order of precedence) are that:
Fig. 8.10
(Left) Solver found a solution and (right) Solver failed to find a solution (one possible result)
Full size image
1. (a)
  The number of iterations was not exceeded.
2. (b)
  The Excel spreadsheet and Solver were properly configured, especially that Box 7 in Fig. 8.8 is unchecked; and
3. (c)
  To make sure that the appropriate solver method was selected. Changing the solver method from GRG nonlinear to evolutionary can be useful.
6.
Options: clicking this button will bring up the window shown in Fig. 8.11. Each of the choices in this window can speed up or slow down the amount of time required to obtain a solution or even if a solution can be found. Each option will be discussed separately:
Fig. 8.11
Solver option window (Excel 2010 or newer)
Full size image
1. (a)
  Max Time: this represents the maximum amount of time that Solver will run in order to determine a solution. If the problem is large, then increasing this option can potentially allow Solver to find a solution.
2. (b)
  Iterations: this represents the maximum number of iterations that Solver will perform before it stops. If the initial guess is far from the solution, it may take many iterations before a solution is obtained. Thus, increasing the number of iterations can be a good idea.
3. (c)
  Precision: this represents the largest possible difference between the calculated value of the constraints and the specified value of the constraints. The smaller the number the longer it will take to find a solution.
4. (d)
  Tolerance: this is similar to precision but is used for integer constraints. It represents the percentage by which the calculated values differ from the specified values.
5. (e)
  Convergence: this is similar to precision but is used to compute the maximum allowable difference between 2 iterations of the parameters (or cells that can change). Since for most purposes, a relative value would be better, this entry should be changed whenever the parameters are expected to either be all very large numbers or very small numbers.
6. (f)
  Use Automatic Scaling: this should always be selected as it minimises the effect the magnitude of the different variables can have on the solution. It is especially important if one of the variables ranges from 100 to 1,000, but the other variable ranges from 0.01 to 1.

The options in the other tabs are mostly irrelevant and should be left at their default values unless the problem at hand requires special treatment. However, the correct approach to take requires consulting an appropriate source on numerical methods.

In the older version of Excel (2007 or older), the solver window has the form shown in Fig. 8.12. The available information is the same except that it is arranged slightly differently. Also, the method that Excel uses to solve the nonlinear problem is based on a simple Newtonian search, so that the results can be more difficult to obtain or different from those obtained using the newer methods. The options are shown in Fig. 8.13.

8.5 The Excel Data Analysis Add-In

The Data Analysis add-in in Excel is another very useful Excel add-in that can improve the ability to perform certain statistical tasks. It is installed using the same procedure as installing the Solver add-in (see Sect. 8.4.1: Installing the Solver Add-In). In order to start Solver, in Excel 2007 or newer, locate the Data ribbon and go to the extreme right-hand side in the area marked Analysis. Solver should be there as shown in Fig. 8.7. In Excel 2003 or older, go to Tools → Data Analysis.

The Data Analysis window is shown in Fig. 8.14. Although there are many different options, the main problem with the data analysis add-in is that the results are static and that any changes made in the original data set require the given programme to be rerun. Also, the display of information is not always the best. Nevertheless, for the purposes of this book, the only useful option is the Fourier analysis option, which will compute, given a data set, the appropriate Fourier coefficients, which can then be used to create a periodogram for the data set. An Excel template file has been created to simplify the process (see Sect. 8.6.3: Periodogram Template).

Selecting the Fourier analysis option will give the window shown in Fig. 8.15. There are only two key areas to consider. First, the input range must have a length of 2ⁿ, where n ∈ ℕ, that is, the length must be an integer power of 2. If the particular list is less than the desired value, then add extra zeros to the end of the list to make it an integer power of 2. The output range should have the same size and orientation as the input range, that is, if the input range is a column, then the output range should also be one, similarly for a row. Clicking OK will give the required Fourier coefficients.

8.6 Excel Templates

This section describes the Excel templates available from the book website. All templates have been tested on Excel 2013 and 2010. All should work on Excel 2007 as well. For the few that require special formulae, a separate Excel 2007 version has been included.

The following are some useful remainders when using the templates:

1.
Adding new rows: new rows should be entered inside the thick-bordered area. This will automatically update all formulae to include the new row. This can be accomplished by right-clicking on the appropriate row name and selecting Insert. A row will be inserted above the selected row. See Fig. 8.16 for an example.
Fig. 8.16
(Left) Inserting a row and (right) column (Excel 2013)
Full size image
2.
Adding new columns: new columns should be entered inside the double-bordered area. This will automatically update all formulae to include the new column. This can be accomplished by right-clicking on the appropriate column name and selecting Insert. A column will be inserted to the left of the selected column. See Fig. 8.16 for an example.
3.
Formulae: most formulae can be dragged down or across to fill the new data. It is suggested that you drag from the first row (or column) down to the last to make sure that everything is properly aligned.
4.
Macros: a few of the templates contain macros that allow for easier and better implementation of the given procedures. When macros are present, make sure that the security is appropriately set so that they can be used (see Sect. 8.3.1: Security).

8.6.1 Normal Probability Plot Template

Requirements: basic Excel installation.
Goal: create a normal probability plot in Excel that can be modified to deal with other distributions.
Filename: normplot.xltx
Description: a screen shot of the template with an explanation of the formulae used is shown in Fig. 8.17. The resulting normal probability plot is shown in Fig. 8.18. The steps for creating a normal probability plot can be summarised as follows:
Fig. 8.17
Normal probability plot data (the formulae given are those placed in the first row, and they would then be dragged down into each of the remaining rows)
Full size image

Fig. 8.18
Resulting normal probability plot
Full size image
1. 1.
  Place the original data in Column A.
2. 2.
  Obtain the order of the data in Column A in Column B. You can use the rank function.
3. 3.
  In Column C, enter = normsinv((ColumnB1-0.5)/count(Column$A)). 0.5 is subtracted from the original ranked value in order to avoid asking the computer for the location for which the probability is 100% (it is + ∞!).
4. 4.
  In Column D, compute the Z-score for each of the data points, that is, subtract the mean and divide the resulting value by the standard deviation of the values in Column A.
5. 5.
  Plot a scatter plot of the data in Columns C and D.
6. 6.
  The straight line can be added by plotting the data in Column C against itself.
Warnings: the axes of the plot are fixed to the range [−3.0, 3.0]. Should there be data outside this region, then it will be necessary to manually change the axis limits.

8.6.2 Box-and-Whisker Plot Template

Requirements: basic Excel installation.
Goal: create a box-and-whisker plot in Excel.
Filename: boxplot.xltx
Filename, Excel 2007 or Older: boxplot2007.xlt
Description: a screen shot of the box-and-whisker plot is shown in Fig. 8.19. The following steps can be used to create a box-and-whiskers plot in Excel from scratch:
Fig. 8.19
Box-and-whisker plot in Excel
Full size image
1. 1.
  Place the data in a column (or row) and call that range data.
2. 2.
  Compute the minimum, first quartile, median, third quartile, and maximum values. This can be accomplished by using the following formulae: min(data), quartile.inc(data,1), median(data), quartile.inc(data,3), and max(data). In Excel 2007 or older, one should replace quartile.inc by quartile.
3. 3.
  Create a column containing the following values in the specified order: Q1 – minimum, Q1, median – Q1, Q3, and maximum – Q3. This will allow the box-and-whisker plot to be properly created in Excel.
4. 4.
  Select the middle three items (Q1, median – Q1, and Q3) and create a stacked column graph. The steps required are shown in Fig. 8.20 for Excel 2013. The arrows provide the sequence of steps that should be followed to create the graph. The initial graph that is obtained now needs to be formatted to look like a box-and-whisker plot.
  Fig. 8.20
  Creating the initial graph for a box-and-whisker plot (Excel 2013). The arrows provide the sequence of events to follow
  Full size image
5. 5.
  Select the bottom blue box and add a negative error bar, which is equal to Q1 – minimum. Set the positive error bar equal to zero. The steps required are shown in Fig. 8.21 for Excel 2013. Select the top grey box and add a positive error bar, equal to maximum – Q3. The negative error bar should be set to zero. The same procedure would be followed, mutatis mutandis.
  Fig. 8.21
  Adding error bars (Excel 2013). The arrows provide the sequence of events to follow
  Full size image
6. 6.
  Once again, select the blue box and set the fill option to no fill and the border option to no line. The steps required are shown in Fig. 8.22 for Excel 2013. Select the orange box and set its fill to no fill. Repeat for the grey box. This should now look like a box-and-whisker plot. Additional formatting can be performed to obtain the final version.
  Fig. 8.22
  Changing the fill and border options (Excel 2013). The arrows provide the sequence of events to follow
  Full size image

8.6.3 Periodogram Template

Requirements: basic Excel installation plus installing the Data Analysis add-in (see Sect. 8.5: the Excel Data Analysis for how to install it.)
Goal: create both the full and half periodograms in Excel.
Filename: periodogram.xltx
Description: a screenshot of the template is shown in Fig. 8.23 with the resulting periodograms shown in Fig. 8.24. Note that every time new data are entered, it is necessary to rerun the Fourier analysis function in the Data Analysis add-in. The set-up of the Fourier transform window is shown as an inset in Fig. 8.23. Also, the number of data points must be a multiple of 2ⁿ where n is an integer, that is, 2, 4, 8, 16, 64, 128, 456,…. If the data set of interest is not a multiple, then it is necessary to add extra zeros to the end of the list to make it so.
Fig. 8.23
Periodogram template layout (Excel 2013). The inset shows how to initialise the Fourier analysis function
Full size image

Fig. 8.24
Sample full and half periodograms
Full size image

An explanation of the columns is as follows:
1. 1.
  Column A contains a simple count of the sample number starting from 1.
2. 2.
  Column B contains the values corresponding to each sample number. This column is called data and must be a multiple of 2ⁿ, where n is an integer.
3. 3.
  Column C contains the Fourier transform values as returned by the Fourier analysis function in Excel. The values are complex numbers and should not be changed.
4. 4.
  Column D contains the magnitude of the values in Column C, that is, =abs(C2). This column is used to construct the full periodogram.
5. 5.
  Column E contains the frequency corresponding to each sample, that is, =(A2-1)/COUNT(data).
6. 6.
  Column F contains the half periodogram frequencies, which is basically the first 2^{n – 1} values from Column E with the remaining values set to #N/A, so that they will be ignored. The formula used is = IF(A2-1<$I$2,E2,#N/A). It should be noted that cell $I$2 contains the centre point value.
7. 7.
  Column G contains the half periodogram magnitudes, which is basically twice the corresponding value in Column D, up to the centre point value, after which the values are arbitrarily set to #N/A. This allows the half periodogram to be plotted for an arbitrary number of values. The formula used is = IF(A2-1<$I$2,D2*2,"NaN").
8. 8.
  Full Periodogram: the full periodogram is created by plotting Column D as the y-axis and Column E as the x-axis.
9. 9.
  Half Periodogram: the half periodogram is created by plotting Column G as the y-axis and Column F as the x-axis.
Warnings: the Fourier transform function must be rerun each time the data are changed. Furthermore, the data length must always be a multiple of 2ⁿ, where n is an integer.

8.6.4 Linear Regression Template

Requirements: basic Excel installation plus ability to use macros.
Goal: perform linear regression in Excel in an easy and straightforward manner.
Filename: linearregresion.xltm
Description: a screenshot of the plain template is shown in Fig. 8.25. The yellow blocks are where the required data are entered. The green block represents the row in which an array formula needs to be entered. The complete green row should be selected and then the first cell highlighted. Finally, press Ctrl+Shift+Enter to copy the array formula to the entire green row. Adding additional parameters and data points will also require that the formulae be appropriately copied down. The spreadsheet automatically creates the normal probability plot for the residuals and plots of the residuals as a function of y and $ \widehat{y} $, as well as a time series plot of the residuals. Additional plots can be created by the user. An example of how to use the template is provided in Sect. 8.7.1: Linear Regression Example.
Fig. 8.25
Linear regression template
Full size image
Warning: this template requires that the internal macros be enabled. Also, the array formulae need to be properly entered.

8.6.5 Nonlinear Regression Template

Requirements: basic Excel installation plus installation of Solver and the ability to use macros.
Goal: perform nonlinear regression in Excel in an easy and straightforward manner.
Filename: nonlinearregresion.xltm
Description: a screenshot of the plain template is shown in Fig. 8.26. The yellow blocks are where the required data are entered. Note that Solver needs to be used to obtain a solution to the problem. The configuration of Solver is shown as an inset in Fig. 8.26. The layout and formatting of the results are similar to the linear regression case. Two important differences are that the model and its Jacobian must be entered as a macro and that Solver must be used. The spreadsheet automatically creates the normal probability plot for the residuals and plots of the residuals as a function of y and $ \widehat{y} $, as well as a time series plot of the residuals. Additional plots can be created by the user. An example of how to use the template is provided in Sect. 8.7.2: Nonlinear Regression Example.
Fig. 8.26
Nonlinear regression template. The inset shows how to set up the Solver (Excel 2013)
Full size image

The template comes with four predefined functions for creating the model and the corresponding Jacobian. Each function takes the same inputs: the range corresponding to the parameters and the range corresponding to the inputs. The fours functions are model, dydb1, dydb2, and dydb3. This approach is very similar to what MATLAB^® requires and provides the most flexibility in defining the relevant functions.
Warning: this template requires that the internal macros be enabled and Solver installed.

8.6.6 Factorial Design Analysis Template

Requirements: basic Excel installation and appropriate macro security.
Goal: perform the analysis of a factorial design experiment in Excel in an easy and straightforward manner.
Filename: factorialdesigntemplate.xltm
Description: a screenshot of the plain template is shown in Fig. 8.27. The yellow blocks are where the required data are entered. The green block represents the row in which an array formula needs to be entered. The complete green row should be selected and then the first cell highlighted. Finally, press Ctrl+Shift+Enter to copy the array formula to the entire green row. Adding additional parameters and data points will also require that the formulae be appropriately copied down.
Fig. 8.27
Analysis of factorial experiments template
Full size image

The spreadsheet automatically creates the normal probability plot for the parameters and residuals as well as plots of the residuals as a function of y and $ \widehat{y} $ and a time series plot of the residuals. Additional plots can be created by the user. An example of how to use the template is provided in Sect. 8.7.3: Factorial Design Examples.
Warning: this template requires that the internal macros be enabled. Also, the array formulae need to be properly entered.

8.7 Excel Examples

This section presents three examples that show how to implement various forms of regression analysis in Excel. The topics considered are linear regression, nonlinear regression, and analysis of factorial design. All examples are based on real data obtained from experiments. The procedures use the appropriate templates for solving the problem faster. The final form of the spreadsheet including all required information is provided as a reference for the user.

8.7.1 Linear Regression Example in Excel

This example examines the problem of fitting a theoretical equation to experimental data in order to obtain the values of the different constants in the system. Detailed information about the problem can be found in Prickett et al. (2011); Elliott et al. (2007); Prickett et al. (2010); and Jochem and Körber (1987). Data provided courtesy of Dr. Richelle Prickett.

8.7.1.1 Problem Statement for Linear Regression Example

Consider the problem of obtaining the values of the parameters in a theoretical equation that describes the osmotic pressure of the sodium chloride (NaCl) salt and hydroxyethyl starch (HES, chemical formula (C₆H₁₀O₅)_m(C₂H₅O)_n). Based on the virial equation of state, it is assumed that the following equation can be used to describe the osmolality (Π) of such a mixture:

$$ \varPi ={B}_3{m}_3^2+{B}_3{k}_{diss}{m}_2{m}_3+{C}_3{m}_3^3+{k}_c $$

(8.1)

where B ₃ and C ₃ are the virial parameters to be determined, m ₂ is the molality of NaCl in millimol/kg of solvent, m ₃ is the molality of HES in millimol/kg of solvent, k _diss is the disassociation constant that is equal to 1.678, and k _c is a known constant that depends on the system being analysed. An experiment was run where the ratio of the mass of HES to the mass of NaCl was fixed to 0.5. The results obtained are shown in Table 8.3.

Table 8.3 Fitting the virial equation (Excel example)

Full size table

8.7.1.2 Problem Solution for Linear Regression Example

Before linear regression can be applied, the above equation must be rearranged so that all known constant information is on the left-hand side and all the unknown variables are on the right-hand side. Thus, the equation would be rewritten as

$$ \varPi -{k}_c={B}_3\left({m}_3^2+{k}_{diss}{m}_2{m}_3\right)+{C}_3{m}_3^3 $$

(8.2)

The required variables would be defined as

$$ \begin{array}{c}y=\varPi -{k}_c\\ {}\overrightarrow{x}=\left\langle {m}_3^2+{k}_{diss}{m}_2{m}_3,{m}_3^3\right\rangle \\ {}\overrightarrow{\beta}={\left\langle {B}_3,{C}_3\right\rangle}^T\end{array} $$

(8.3)

Entering all the data in the Excel spreadsheet gives the results shown in Figs. 8.28 and 8.29.

Using the original data shows that the second point (Π = 1,314) is potentially an outlier, since its residual is extremely large. Thus, the row corresponding to this point (row 4 in the original layout) was deleted and the regression analysis was redone. The results are shown in Figs. 8.30 and 8.31. The results are much better as there are now no clear outliers and the data confidence intervals, especially for C ₃, are much smaller.

8.7.2 Nonlinear Regression Example in Excel

This example examines the problem of fitting a theoretical equation to experimental data in order to obtain the values of the different constants in the system. Unlike the previous case, nonlinear regression must be performed in order to obtain a result. Detailed information about the problem can be found in Ross-Rodriguez (2009). Data provided courtesy of Dr. Lisa Ross-Rodriguez.

8.7.2.1 Problem Statement for Nonlinear Regression Example

Consider the problem of obtaining a relationship for the ratio between the equilibrium and isotonic cell volumes given the osmotic pressure. The theoretical relationship can be written as

$$ \frac{V}{V_0}=\left(1-{b}^{*}\right)\frac{-1+\sqrt{1+4B{\varPi}_0}}{-1+\sqrt{1+4B\varPi }}+{b}^{*} $$

(8.4)

where both B and b* are the parameters to be determined and Π ₀ is a known osmotic value. The experimental data are provided in Table 8.4. For this data set, Π ₀ has a value of 0.293.

Table 8.4 Equilibrium cell volume data (Excel example)

Full size table

8.7.2.2 Problem Solution for Nonlinear Regression Example

Before we set up the problem in Excel, it is first necessary to compute some preliminary information. First, we need to obtain the derivatives of Eq. (8.4) with respect to the parameters, that is,

$$ \frac{d\left(V/{V}_0\right)}{d{b}^{*}}=1-\frac{1-\sqrt{1+4B{\varPi}_0}}{1-\sqrt{1+4B\varPi }} $$

(8.5)

$$ \frac{d\left(V/{V}_0\right)}{dB}=2\left(1-{b}^{*}\right)\left[\frac{\varPi_0}{\sqrt{1+4B{\varPi}_0}\left(-1+\sqrt{1+4B\varPi}\right)}-\frac{\varPi \left(-1+\sqrt{1+4B{\varPi}_0}\right)}{\sqrt{1+4B\varPi }{\left(-1+\sqrt{1+4B\varPi}\right)}^2}\right] $$

(8.6)

It can clearly be seen that this equation is nonlinear in the parameters. Thus, nonlinear regression using Solver will be performed. In order to obtain values for the parameter confidence intervals using Equation (198), the grand Jacobian will be calculated using the “best” estimated values of the parameters and the above derivatives.

The nonlinear regression Excel template used is set up identically to that of the linear regression template. The only difference is that now the estimated parameter values are not computed using a formula. Instead, they must be determined using Solver. Given the problem set-up, initial parameter estimates can be a bit of an issue, as the solution is sensitive to them. A recommended initial guess would be 0.5 for b ^* and 2.5 for B. The macros are shown in Sect. 8.7.2.3: VB Macros.

The final results are shown in Fig. 8.32. Figure 8.33 shows the normal probability plot and a time series plot of the residuals. It is easy to note that the B parameter is not significant and its value could be zero. Given the overall good fit and the relative well-behaved nature of the residuals, this would suggest that potentially not enough data have been collected to make an appropriate estimate. This situation partly explains why the Solver can have issues with obtaining a good value for B. The residual plots are shown in Fig. 8.33. Overall the results are decent, given the small sample. Since it has been assumed that the given equation holds, in order to obtain a better understanding of the data, additional experiments should be performed.

8.7.2.3 VB Macros

The macros required for performing the detailed regression analysis are shown here. There are three main macros: (1) to compute the model parameters (model), (2) to compute the derivative of the model with respect to b ^* (dydb1), and (3) to compute the derivative of the model with respect to B (dydb3).

Public Function model(parameter, x)

bs = parameter(1)

B = parameter(2)

model = (1 - bs) * (-1 + Sqr(1 + 4 * B * 0.293)) / (-1 + Sqr(1 + 4 * B * x(1))) + bs

End Function

Function dydb1(parameter As Range, x As Range)

bs = parameter(1)

B = parameter(2)

dydb1 = 1 - (-1 + Sqr(1 + 4 * B * 0.293)) / (-1 + Sqr(1 + 4 * B * x(1)))

End Function

Function dydb3(parameter As Range, x As Range)

bs = parameter(1)

B = parameter(2)

Pio = 0.293

so = Sqr(1 + 4 * B * Pio)

s = Sqr(1 + 4 * B * x(1))

dydb3 = 2 * (1 - bs) * (Pio / so / (s - 1) - x(1) * (so - 1) / s / (s - 1) ^ 2)

End Function

8.7.3 Factorial Design Examples Using Excel

This section presents the Excel spreadsheets for analysing some of the factorial design experiments presented in Chap. 4. The examples are all based on the factorial design template. The following examples have a corresponding Excel spreadsheet:

1.
Figure 8.34: Example 4.2, Analysis of a Full Factorial Experiment, from Sect. 4.4.4, Projection;
Fig. 8.34
Factorial design: full factorial example
Full size image
2.
Figure 8.35: Sect. 4.7.4, Detailed Mixed-Level Example; and
Fig. 8.35
Factorial design: mixed-level example
Full size image
3.
Figure 8.36: Sect. 4.8.2, Factorial Design with Centre Point Example.
Fig. 8.36
Factorial design: combined factorial and centre point example
Full size image

8.8 Further Reading

The following are references that provide additional information about the topic:

1.
General Excel Help:
1. (a)
  Harvey G (2013) Excel^® 2013 All-in-One for dummies. Wiley, Hoboken
2. (b)
  Brillo J (2007) Excel for scientists and engineers: numerical methods. Wiley, Hoboken
3. (c)
  Schmuller J (2013) Statistical analysis with Excel^® for dummies, 3rd edn. Wiley, Hoboken
2.
Linear Regression Data Set:
1. (a)
  Elliott JA, Prickett RC, Elmoazzen HY, Porter KR, McGann LE (2007) A multisolute osmotic virial equation for solutions of interest in biology. J Phys Chem B 111:1775–1785
2. (b)
  Prickett RC, Elliott JA, McGann LE (2010) Application of the osmotic virial equation in cyrobiology. Cryobiology 2010:30–42
3. (c)
  Prickett RC, Elliott JA, McGann LE (2011) Application of the multisolute osmotic virial equation to solutions containing electrolytes. J Phys Chem B 115:14531–14543
4. (d)
  Jochem M, Körber C (1987) Extended phase diagrams for the ternary solutions H₂O − NaCl − glycerol and H₂O − NaCl − hydroxyethylstarch (HES) determined by DSC. Cryobiology 24:513–536
3.
Nonlinear Regression Data Set:
1. (a)
  Ross-Rodriguez LU (2009) Cellular osmotic properties and cellular responses to cooling. University of Alberta, Edmonton

References

Elliott JA, Prickett RC, Elmoazzen HY, Porter KR, McGann LE (2007) A multisolute osmotic virial equation for solutions of interest in biology. J Phys Chem B 111:1775–1785
Article CAS Google Scholar
Harvey G (2013) Excel® 2013 All-in-One for dummies. Wiley, Hoboken
Google Scholar
Jochem M, Körber C (1987) Extended phase diagrams for the ternary solutions H₂O − NaCl − glycerol and H₂O − NaCl − hydroxyethylstarch (HES) determined by DSC. Cryobiology 24:513–536
Article CAS Google Scholar
Prickett RC, Elliott JA, McGann LE (2010) Application of the osmotic virial equation in cyrobiology. Cryobiology 2010:30–42
Article Google Scholar
Prickett RC, Elliott JA, McGann LE (2011) Application of the multisolute osmotic virial equation to solutions containing electrolytes. J Phys Chem B 115:14531–14543
Article CAS Google Scholar
Ross-Rodriguez LU (2009) Cellular osmotic properties and cellular responses to cooling. University of Alberta, Edmonton
Google Scholar

Download references

Author information

Authors and Affiliations

Institute of Automation and Complex Systems (AKS), University of Duisburg-Essen, Duisberg, North Rhine-Westphalia, Germany
Yuri A. W. Shardt

Authors

Yuri A. W. Shardt
View author publications
You can also search for this author in PubMed Google Scholar

Rights and permissions

Reprints and permissions

Copyright information

About this chapter

Cite this chapter

Shardt, Y.A.W. (2015). Using Excel^® to Do Statistical Analysis. In: Statistics for Chemical and Process Engineers. Springer, Cham. https://doi.org/10.1007/978-3-319-21509-9_8

Download citation

DOI: https://doi.org/10.1007/978-3-319-21509-9_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-21508-2
Online ISBN: 978-3-319-21509-9
eBook Packages: Chemistry and Materials ScienceChemistry and Material Science (R0)

Publish with us

Policies and ethics

Using Excel® to Do Statistical Analysis

Abstract

Similar content being viewed by others

Sample Size, Mean, Standard Deviation, and Standard Error of the Mean

Sample Size, Mean, Standard Deviation, and Standard Error of the Mean

Design of Experiments Using R

Keywords

8.1 Ranges and Arrays in Excel

8.2 Useful Excel Functions

8.2.1 Array Functions in Excel

8.2.2 Statistical Functions in Excel

8.3 Excel Macros and Security

8.3.1 Security in Excel

8.3.1.1 Dealing with Security in Excel 2003 or Older

8.3.1.2 Dealing with Security in Excel 2007 or Newer

8.4 The Excel Solver Add-In

8.4.1 Installing the Solver Add-In

8.4.2 Using the Solver Add-In

8.5 The Excel Data Analysis Add-In

8.6 Excel Templates

8.6.1 Normal Probability Plot Template

8.6.2 Box-and-Whisker Plot Template

8.6.3 Periodogram Template

8.6.4 Linear Regression Template

8.6.5 Nonlinear Regression Template

8.6.6 Factorial Design Analysis Template

8.7 Excel Examples

8.7.1 Linear Regression Example in Excel

8.7.1.1 Problem Statement for Linear Regression Example

8.7.1.2 Problem Solution for Linear Regression Example

8.7.2 Nonlinear Regression Example in Excel

8.7.2.1 Problem Statement for Nonlinear Regression Example

8.7.2.2 Problem Solution for Nonlinear Regression Example

8.7.2.3 VB Macros

8.7.3 Factorial Design Examples Using Excel

8.8 Further Reading

References

Author information

Authors and Affiliations

Rights and permissions

Copyright information

About this chapter

Cite this chapter

Download citation

Share this chapter

Publish with us

Search

Navigation

Using Excel^® to Do Statistical Analysis