Projection methods in Excel

© Oxana Rodionova, Alexey Pomerantsev 
Russian Chemometrics Society http://rcs.chph.ras.ru/english.htm

Contents

Introduction
1. Projection methods
1.1. Concept and Notation
1.2. PCA 
1.3. PLS
1.4. Data pre-processing
1.5. Software
1.6. Didactic example
2. Application of Chemometrics Add-In
2.1. Installation 
2.2. General information
3. PCA
3.1. ScoresPCA
3.2. LoadingsPCA
4.PLS
4.1. ScoresPLS 
4.2. UScoresPLS 
4.3. LoadingsPLS 
4.4. WLoadingsPLS
4.5. QLoadingsPLS
5. PLS2 
5.1. ScoresPLS2 
5.2. UScoresPLS2
5.3. LoadingsPLS2 
5.4. WLoadingsPLS2 
5.5. QLoadingsPLS2 
6. Additional Matrix Functions
6.1. MIdent 
6.2. MIdentD2
6.3. MCutRows
6.4. MTrace
Conclusion
 

Introduction

The manual describes application of software Chemometrics Add-In that is a specially designed add-in for Microsoft Excel. Chemometrics Add-In is aimed to data analysis based on the following projection methods: Principal Component Analysis (PCA), Projection on Latent Structures (PLS1 and PLS2). All calculations are performed inside an active Excel workbook by means of the special user-defined worksheet functions in the same manner as it is done by standard math functions, e.g. MMULT. It is supposed that a reader has an experience in working within Excel environment and familiar with basic principals of array calculations by means of standard worksheet functions. 

Notation and methods of matrix algebra are intensively used in the manual. Readers who have lack of skills in this subject are highly recommended to study or at least get acquainted with the pertinent literature. The manual is not a textbook on multivariate data analysis. Only basic formulae essential for understanding of Chemometrics Add-In implementation are described here. A reader poorly acquainted with these methods is asked to address to the special literature 

The manual is illustrated by the examples placed in Projection.xls workbook, which accompanies the manual.

Important information on Projection.xls application

References to the examples are placed in the text of the manual as Excel objects. All examples are of abstract modeling type but their background is tightly connected with real-world cases.

Contents

1. Projection methods

 1.1. Concept and Notation

Projection methods are widely used for multivariate data analysis especially in chemometrics. They are applied both to a one-block data X (classification, e.g. PCA) and to a double-block data as X and Y (calibration, e.g.  PCR and  PLS). 

Let us consider the (I×J) data matrix X, where I is the number of objects (rows) and J is the number of independent variables (columns). Ordinary number of variables is rather high (J>>1). 

We can simultaneously analyze the (I×K) matrix Y, where I is the same number of objects and K is the number of responses.

 

Fig. 1 Multivariate data

The essence of the projection techniques is considerable reduction of the data dimensionality both for blocks X and Y. The are many reviews of the projection methods and the reader is referred to them for more details.

Contents

1.2. PCA

Principal Component Analysis (PCA) is the oldest projection method. This method uses new formal (or latent) variables ta (a=1,…A), which are linear combination of the original variables xj (j=1,…J). i.e.

ta=pa1x1+… paJxJ ,

or in the matrix notation

(1)

In this equation T is called the score matrix, or scores. Its dimension is (I×A). Matrix P is called the loading matrix, or loadings, and it has dimension (A×J). E is the residual (I×J) matrix.

Fig. 2 PCA graphic representation

New variables ta are often called principal components (PC) therefore the method is called Principal Component Analysis (PCA). The number of columns ta in matrix T, and  pa in matrix P, is equal to number A that is called the number of principal components. It defines the projection model complexity. Value of A is certainly less than the number of variables J and the number of objects I. Score and loading matrices have the following properties

The recurrent algorithm NIPALS, is often used for calculation of PCA scores and loadings. NIPALS extracts one principal component at a time. At first, the original matrix X is pre-processed (at least it is centered; see section 1.4) and thus it turns into matrix E0, a=0. Afterwards the following algorithm is applied

1. Select an initial vector t 
2. pt = tt Ea / ttt 
3. p = p / (ptp)½
4. t = Ea p / ptp
5. Return to step 2 until the convergence

As soon as component a is calculated, the following substitution is done:  ta=t, and pa=p. For calculation of the next principal component the new residuals are calculated in accordance with the following expression Ea+1 = Eat pt and the same algorithm is repeated with a+1 substituted for a.  

After principal component space is constructed, the new objects Xnew can be projected onto this space. In other words the score matrix Tnew can be calculated. In PCA this is easily done by the expression

TnewXnew P

Naturally Xnew matrix for the new objects should be pre-processed in the same way as the training matrix X, which was used for the PCA decomposition

Contents

1.3. PLS 

Projection on Latent Structures (PLS) can be considered as a generalization of PCA. In PLS the decompositions of matrices X and Y are conducted simultaneously 
 

Projection is built in order to maximize correlation between corresponding vectors of X-scores ta and Y-scores ua.

Fig. 3 PLS graphic representation

If Y block consists of several responses (i.e. K>1), two types of projections can be built; they are PLS1 and PLS2. In the first case the projection space is built separately for each response variable  yk . Scores  T (U) and loadings  P (W, Q) depend on the response in use. Such an approach is called PLS1. For the PLS2 method one common space is built for all responses. The following expressions show the properties of PLS projection matrices

The recurrent algorithm is used for calculation of PLS scores and loadings. This algorithm calculates one PLS component at a time. 

For PLS1 the algorithm is as follows. The initial matrix X and vector y are preprocessed (at least they are centered; see section 1.4) and thus they turn into matrix  E0 and vector f0, a=0. Afterwards the following steps are performed.

1. wt = fat Ea 
2. w = w / (wtw)½
3. t = Ea w
4. q = tt fa / ttt 
5. u = qfa / q2 
6. pt = tt Ea / ttt 

After current (a-th) PLS1 component has been calculated the following substitutions are done: ta=t, pa=p, wa=w, ua=u, and qa=q. For calculation of the next component new residuals should be calculated in accordance with the following expressions Ea+1 = Ea t pt and  fa+1 = faqt. The same algorithm is repeated for a+1 substituted for a

For PLS2 the algorithm is as follows. At first the initial matrices X and Y are preprocessed (at least they are centered; see section 1.4) and they turn into matrices E0 and F0, a=0. Afterwards the following steps are performed.

1. Select initial vector u 
2. wt = ut Ea 
3. w = w / (wtw)½ 
4. t = Ea w
5. qt = tt Fa / ttt 
6. u = Fa q/ qtq 
7. Return to step 2 until the convergence
8. pt = tt Ea / ttt 

After current (a-th) PLS2 component has been calculated the following substitutions are done: ta=t, pa=p, wa=w, ua=u, and  qa=q. For calculation of the next component, new residuals should be calculated in accordance with the following expressions Ea+1 = Ea t pt and  Fa+1 = Fatqt . Afterwards the same algorithm is repeated with a+1 substituted for a

Calculation of PLS scores Tnew and Unew for new objects (Xnew , Ynew) is more complicated than in PCA. 

For PLS1, matrix Xnew and vector  ynew are pre-processed in the same way as in the calibration dataset, and thus they turn into matrix E0 and vector  f0, a=0. Further, the following algorithm for components a=1,…, A is applied

1. t = Ea w
2. u =fa

After calculation of the current PLS1 component (a-th), new residuals are calculated as Ea+1 = Ea tpt and  fa+1 = faqt . The same procedure is then applied after substituting index a+1 for a

In PLS2 new matrices Xnew and Ynew are pre-processed in the same manner as training data and they turn into matrices E0 and F0, a=0. For each PLS2 component a=1,…, A , the following algorithm is applied

1. t = Ea w
2. u = Fa q/ qtq 

After calculation of the a-th PLS2 component new residuals are calculated as Ea+1 = Ea t pt and Fa+1 = Fatqt  . After substituting index a+1 for a, the same procedure is used.

Contents

1.4. Data pre-processing

It is worthy of mentioning that the PCA and PLS methods do not take into consideration the free term in the course of X and Y decomposition. This could be seen from Eq (1) and (2). It is initially supposed that all columns in matrices X и Y have zero mean values, i.e.

and      for each i=1,…, I and k=1,…, K

This condition can be easily satisfied by data centering. Centering means subtraction of matrix M from the original matrix X, i.e.

 

Centering is a column-wise operation. For each vector xj the mean value is calculated as

In that case M=(m11,..., mJ1), where 1 is the vector of identities with dimension I.

Centering is always an obligatory procedure before application of the projection methods. The second simplest pre-processing technique is scaling. 

Scaling is not as necessary as centering. In comparison with centering, scaling does not change the data structure but simply modifies the weights in different parts of data. The most widely used scaling is a column-wise one. This can be expressed as a right multiplication of matrix X by matrix W, i.e. 

Matrix W  is the diagonal matrix with dimension J×J.  As a rule, diagonal elements wjj are equal to the inverse values of standard deviations, i.e. 

calculated for each column xj. Row-wise scaling (also called normalization) is a left multiplication of matrix X by a diagonal matrix W, i.e.

In this case dimension of W is I×I, and its elements wii are ordinary the inverse values of standard deviations calculated for each row xit

Combination of centering and column-wise scaling

is called autoscaling

Data scaling is often used to compensate contribution of various variables to a model (i.e. in hyphenated methods such as LC-MS), to take into account the heterogeneous errors, or in the case when different data blocks should be processed in one model. Scaling can also be seen as a method for numerical calculations stabilization. At the same time scaling should be used with a caution as such a pre-processing could essentially change the results of quality analysis. Each pre-processing (centering, scaling, etc) is firstly applied to the calibration data set. This set is used for calculation of mj and dj, which afterwards are applied for the pre-processing both of the training and test sets.

Contents

1.5. Software

Projection methods are implemented in various software packages. Short review of some popular programs is presented in this section. End-users mainly apply special packages. Among them are the following: SIMCA developed by  Umetrics, and a similar software the Unscrambler developed by  Camo. Among the most popular packages is also PLS ToolBox developed by  Eigenvector Research, Inc. First two packages work in a stand alone mode, whilst the latter works in the MatLab environment. 

Many projection methods are included in the general statistical packages such as  Statistica, SPSS , and others.

There is one shareware program. This is Multivariate Analysis Add-in  for Excel developed in Bristol Center for Chemometrics headed by prof. R. Brereton

Contents

1.6. Didactic example 

File Projection.xls is used for illustration of the Chemometrics add-in facilities. The performance of all the above-mentioned methods is illustrated with a simulated data set (X, Y), which should be centered but not scaled. 

File Projection.xls includes the following worksheets: 

Intro: short introduction  

Data: data used in the example. The block of predictors X consists of 14 objects (9 in the calibration and 5 in the test sets) and 50 variables. The Y block includes two responses, which correspond to the same 14 objects. The scheme, that explains arrays' names is also presented on the same worksheet. 

PCA: application of worksheet functions ScoresPCA and LoadingsPCA

PLS1: application of worksheet functions ScoresPLS, UScoresPLS, LoadingsPLS, WLoadingsPLS and QLoadingsPLS

PLS2: application of worksheet functions ScoresPLS2, UScoresPLS2, LoadingsPLS2, WLoadingsPLS2 and QLoadingsPLS2.

Plus: application of additional worksheet functions MIdent, MIdentD2, MTrace, and MCutRows.

Unscrambler: comparison the results obtained by Chemometrics Add-In and the Unscrambler program

SIMCA-P: comparison the results obtained by Chemometrics Add-In and  SIMCA program

Install: instruction for Chemometrics Add-In installation

Contents

2. Application of Chemometrics Add-In

2.1. Installation

To start working with Chemometrics Add-In, the software should be properly installed. The setup package consists of two files: Chemometrics.dll and Chemometrics.xla.

The setup package with Demo version of  Chemometrics Add-In can be downloaded  here. This version is not limited in functionality, but it can be used with "our own" files only. These files (Calibration.xls, Excel.xls, Projections.xls, etc) may be modified in any way, but the changes cannot be saved. Full version may be granted by demand.

The setup files can be placed manually in accordance with the instruction given in worksheet Install . Afterwards Chemometrics Add-In should be installed as any other Excel add-in

Chemometrics Add-In can be installed automatically with the help of button  Install  in worksheet Install 

 

Download files Chemometrics.dll and Chemometrics.xla to some folder in your computer, e.g. G:\InstallAddIn.

If Chemometrics Add-In has already been installed on the computer, the procedure will be terminated. In other case, after confirmation of the intention, the following dialog appears.

 

Here you should select the folder where Chemometrics setup package is located. If you point out a wrong folder, the following message appears

 

and you should find a proper folder on your computer.

In the course of installation various information boxes could appear, e.g.  

 

It is recommended to confirm all such queries.

If installation succeeded, you will see the message box    

 

in which all operations are listed.

If installation failed or cancelled, the following message is displayed  

 

After successful completion of installation, no other Add-Ins related operations are required, and you can start to explore Chemometrics Add-In immediately without reloading Excel. 

In some cases, after successful Add-in installation you will see "########" or "#VALUE!" in cells with chemometric functions. Press the ALT-Ctrl-F9 key for immediate recalculation of the whole Projection.xls workbook

Contents

2.2. General information

Chemometrics Add-In takes all input information from an active Excel workbook. This information should be placed directly in the areas of worksheets (data X and Y).

A user may organize one's working space in the most convenient way, to place all data onto one worksheet, or share them between several worksheets, or even use the worksheets in various workbooks. The results are output as the arrays in the worksheets. Software does not have any limitations on the input arrays' dimension, i.e. the number of objects (I), the number of variables (J), and the number of responses (K). The input data size is limited by the computer memory supported by the operation system in use and also by limitations of Excel software

Chemometrics Add-In includes several functions, which can be used as standard worksheet functions. For this purpose one can use Insert Function dialog box and select User Defined category. In the Select a function window all functions described bellow can be found.

Returned values are arrays, therefore the functions should be entered as array formula, i.e. applying Ctrl+Shift+Enter at the end of formula input. 

Function arguments can be numbers, names, arrays, or references containing numbers. 

To input an array formula first it is necessary to select the area on the worksheet, which corresponds to the dimension of the output array. If the selected area is larger than the output array, the superfluous cells are filled with the #N/A symbols. On the contrary, in case the selected area is smaller than the output array, part of the output information is lost.

Number of principal/PLS components 
Each function has an optional argument PC that defines the number of PCs (A). If PC is omitted, the output corresponds to the selected area. If PC value is more than min (I, J), then decomposition is done for the maximum possible number of PCs and superfluous cells are filled with the #N/A symbols.

Centering and/or scaling
Each function has the optional arguments CentWeightX and CentWeightY, which define whether centering and/or scaling for X and Y arrays are performed. These arguments can be as follows

0 - no centering, and no scaling (default value)
1 - only centering, i.e. subtraction of the column-wise mean values
2 - only scaling by the column-wise standard deviations 
3 - centering and scaling, i.e. autoscaling

If any argument CentWeightX or CentWeightY is omitted it is assumed to be equal to 0.

Contents

3. PCA

3.1. ScoresPCA 

Performs decomposition of matrix X by using the principal component analysis (PCA), and then returns an array that presents the score values Tnew calculated for matrix Xnew.

Syntax

ScoresPCA(X [, PC] [,CentWeightX] [, Xnew])

X is the array of X-values (calibration set) 

PC is an optional argument (integer), which defines the number of principal components (A), used in the PCA decomposition 

CentWeightX is an optional argument (integer) that indicates whether centering and/or scaling is done 

Xnew is an optional argument that presents an array of new values Xnew (test set) for which the score values are calculated

Remarks

  • Arrays Xnew and X must have the same number of columns

  • If argument Xnew is omitted, it is assumed to be the same as X, and thus the calibration score values T are returned

  • The result is an array (matrix) with the number of rows equals the number of rows in array Xnew, and the number of columns equals the number of PCs (A)

  • Formula must be entered as an array formula terminated by CTRL+SHIFT+ENTER

  • Similar standard worksheet function is TREND

Example

Fig. 4 Example of input for function ScoresPCA

Contents

3.2. LoadingsPCA 

Performs decomposition of matrix X by using the principal component method, and then returns an array that presents the loading values P 

Syntax

LoadingsPCA (X [, PC] [,CentWeightX])

X is the array of X-values (calibration set) 

PC is an optional argument (integer), which defines the number of principal components (A), used in the PCA decomposition 

CentWeightX is an optional argument (integer) that indicates whether centering and/or scaling is done 

Remarks

  • The result is an array (matrix) with the number of rows equals the number of columns (J) in array X, and the number of columns equals the number of PCs (A)

  • Formula must be entered as an array formula terminated by CTRL+SHIFT+ENTER

  • Similar standard worksheet function is: MINVERSE.

Example

Fig. 5 Example of input for function LoadingsPCA

Contents

4. PLS

4.1. ScoresPLS 

Performs decomposition of matrices X and Y by using the method of projection on latent structure (PLS), and then returns an array that presents the PLS score values Tnew calculated for matrices Xnew.

Syntax

ScoresPLS(X, Y [, PC] [,CentWeightX] [, CentWeightY] [, Xnew])

X is the array of X-values (calibration set) 

Y is the array of Y-values (calibration set) 

PC is an optional argument (integer), which defines the number of PLS components (A), used in the PLS decomposition 

CentWeightX is an optional argument (integer) that indicates whether centering and/or scaling for matrix X  is done 

CentWeightY is an optional argument (integer) that indicates whether centering and/or scaling for matrix Y  is done 

Xnew is an optional argument that presents an array of new values Xnew (test set) for which the PLS score values are calculated

Remarks 

  • Array Y must have only one column (K=1)

  • Arrays Y and X must have the same number of rows (I)

  • Arrays Xnew and X must have the same number of columns (J)

  • If argument Xnew is omitted, it is assumed to be the same as X, and thus the calibration PLS score values T are returned

  • The result is an array (matrix) with the number of rows equals the number of rows in array Xnew, and the number of columns equals the number of PCs (A)

  • Formula must be entered as an array formula terminated by CTRL+SHIFT+ENTER

  • Similar standard worksheet function is TREND 

Example

Fig.6 Example of output for function ScoresPLS

Contents

4.2. UScoresPLS 

Performs decomposition of matrices X and Y by using the method of projection on latent structure (PLS), and then returns an array that presents the PLS score values Unew calculated for matrices Xnew and Ynew.

Syntax

UScoresPLS(X, Y [, PC] [,CentWeightX] [, CentWeightY] [, Xnew] [, Ynew])

X is the array of X-values (calibration set) 

Y is the array of Y-values (calibration set) 

PC is an optional argument (integer), which defines the number of PLS components (A), used in the PLS decomposition 

CentWeightX is an optional argument (integer) that indicates whether centering and/or scaling for matrix X  is done 

CentWeightY is an optional argument (integer) that indicates whether centering and/or scaling for matrix Y  is done 

Xnew is an optional argument that presents an array of new values Xnew (test set) for which the PLS score values are calculated

Ynew is an optional argument that presents an array of new values Ynew (test set) for which the PLS score values are calculated

Remarks 

  • Arrays Y and Ynew must have only one column (K=1)

  • Arrays Y and X must have the same number of rows (I)

  • Arrays Xnew and X must have the same number of columns (J)

  • If argument Xnew is omitted, it is assumed to be the same as X, and thus the calibration PLS score values U are returned

  • If argument Ynew is omitted, it is assumed to be the same as Y, and thus the calibration PLS score values U are returned

  • The result is an array (matrix) with the number of rows equals the number of rows in array Xnew, and the number of columns equals the number of PCs (A)

  • Formula must be entered as an array formula terminated by CTRL+SHIFT+ENTER

  • Similar standard worksheet function is TREND 

Example

Fig.7 Example of output for function UScoresPLS

Contents

4.3 LoadingsPLS 

Performs decomposition of matrices X and Y by using the method of projection on latent structure (PLS), and then returns an array that presents the loadings values P

Syntax

LoadingsPLS(X, Y [, PC] [,CentWeightX] [, CentWeightY])

X is the array of X-values (calibration set) 

Y is the array of Y-values (calibration set) 

PC is an optional argument (integer), which defines the number of PLS components (A), used in the PLS decomposition 

CentWeightX is an optional argument (integer) that indicates whether centering and/or scaling for matrix X  is done 

CentWeightY is an optional argument (integer) that indicates whether centering and/or scaling for matrix Y  is done 

Remarks 

  • Array Y must have only one column (K=1)

  • Arrays Y and X must have the same number of rows (I)

  • The result is an array (matrix) with the number of rows equals the number of columns (J) in array X , and the number of columns equals the number of PCs (A)

  • Formula must be entered as an array formula terminated by CTRL+SHIFT+ENTER

  • Similar standard worksheet function is MMULT.

Example

Fig. 8 Example of input for function LoadingsPLS

Contents

4.4 WLoadingsPLS 

Performs decomposition of matrices X and Y by using the method of projection on latent structure (PLS), and then returns an array that presents the loading weights values W

Syntax

WLoadingsPLS(X, Y [, PC] [,CentWeightX] [, CentWeightY])

X is the array of X-values (calibration set) 

Y is the array of Y-values (calibration set) 

PC is an optional argument (integer), which defines the number of PLS components (A), used in the PLS decomposition 

CentWeightX is an optional argument (integer) that indicates whether centering and/or scaling for matrix X  is done 

CentWeightY is an optional argument (integer) that indicates whether centering and/or scaling for matrix Y  is done 

Remarks 

  • Array Y must have only one column (K=1)

  • Arrays Y and X must have the same number of rows (I)

  • The result is an array (matrix) with the number of rows equals the number of columns (J) in array X, and the number of columns equals the number of PCs (A)

  • Formula must be entered as an array formula terminated by CTRL+SHIFT+ENTER

  • Similar standard worksheet function is MMULT.

Example

Fig. 9 Example of input for function WLoadingsPLS

Contents

4.5 QLoadingsPLS 

Performs decomposition of matrices X and Y by using the method of projection on latent structure (PLS), and then returns an array that presents the loadings values Q

Syntax

QLoadingsPLS(X, Y [, PC] [,CentWeightX] [, CentWeightY])

X is the array of X-values (calibration set) 

Y is the array of Y-values (calibration set) 

PC is an optional argument (integer), which defines the number of PLS components (A), used in the PLS decomposition 

CentWeightX is an optional argument (integer) that indicates whether centering and/or scaling for matrix X  is done 

CentWeightY is an optional argument (integer) that indicates whether centering and/or scaling for matrix Y  is done 

Remarks 

  • Array Y must have only one column (K=1)

  • Arrays Y and X must have the same number of rows (I)

  • The result is an array (vector) with the number of columns equals the number of PCs (A), and rows equals the number of responses K, i.e. 1 

  • Formula must be entered as an array formula terminated by CTRL+SHIFT+ENTER

  • Similar standard worksheet function is MMULT

Example

Fig. 10 Example of input for function QLoadingsPLS

Contents

5. PLS2

5.1. ScoresPLS2 

Performs decomposition of matrices X and Y by using the method of projection on latent structure (PLS2), and then returns an array that presents the PLS2 score values Tnew calculated for matrices Xnew.

Syntax

ScoresPLS2(X, Y [, PC] [,CentWeightX] [, CentWeightY] [, Xnew])

X is the array of X-values (calibration set) 

Y is the array of Y-values (calibration set) 

PC is an optional argument (integer), which defines the number of PLS components (A), used in the PLS decomposition 

CentWeightX is an optional argument (integer) that indicates whether centering and/or scaling for matrix X  is done 

CentWeightY is an optional argument (integer) that indicates whether centering and/or scaling for matrix Y  is done 

Xnew is an optional argument that presents an array of new values Xnew (test set) for which the PLS score values are calculated

Remarks 

  • Arrays Y and X must have the same number of rows (I)

  • Arrays Xnew and X must have the same number of columns (J)

  • If argument Xnew is omitted, it is assumed to be the same as X, and thus the calibration PLS2 score values T are returned

  • The result is an array (matrix) with the number of rows equals the number of rows in array Xnew, and the number of columns equals the number of PCs (A)

  • Formula must be entered as an array formula terminated by CTRL+SHIFT+ENTER

  • Similar standard worksheet function is TREND 

Example

Fig.11 Example of output for function ScoresPLS2

Contents

5.2. UScoresPLS2 

Performs decomposition of matrices X and Y by using the method of projection on latent structure (PLS2), and then returns an array that presents the PLS score values Unew calculated for matrices Xnew and Ynew.

Syntax

UScoresPLS2(X, Y [, PC] [,CentWeightX] [, CentWeightY] [, Xnew] [, Ynew])

X is the array of X-values (calibration set) 

Y is the array of Y-values (calibration set) 

PC is an optional argument (integer), which defines the number of PLS components (A), used in the PLS2 decomposition 

CentWeightX is an optional argument (integer) that indicates whether centering and/or scaling for matrix X  is done 

CentWeightY is an optional argument (integer) that indicates whether centering and/or scaling for matrix Y  is done 

Xnew is an optional argument that presents an array of new values Xnew (test set) for which the PLS2 scores values U are calculated

Ynew is an optional argument that presents an array of new values Ynew (test set) for which the PLS2 scores values U are calculated

Remarks 

  • Arrays Y and Ynew must have  the same number of column (K)

  • Arrays Y and X must have the same number of rows (I)

  • Arrays Xnew and X must have the same number of columns (J)

  • If argument Xnew is omitted, it is assumed to be the same as X, and thus the calibration PLS2 score values U are returned

  • If argument Ynew is omitted, it is assumed to be the same as Y, and thus the calibration PLS2 score values U are returned

  • The result is an array (matrix) with the number of rows equals the number of rows in array Xnew, and the number of columns equals the number of PCs (A)

  • Formula must be entered as an array formula terminated by CTRL+SHIFT+ENTER

  • Similar standard worksheet function is TREND 

Example

Fig.12 Example of output for function UScoresPLS2

Contents

5.3 LoadingsPLS2 

Performs decomposition of matrices X and Y by using the method of projection on latent structure (PLS2), and then returns an array that presents the loadings values P

Syntax

LoadingsPLS2(X, Y [, PC] [,CentWeightX] [, CentWeightY])

X is the array of X-values (calibration set) 

Y is the array of Y-values (calibration set) 

PC is an optional argument (integer), which defines the number of PLS components (A), used in the PLS2 decomposition 

CentWeightX is an optional argument (integer) that indicates whether centering and/or scaling for matrix X  is done 

CentWeightY is an optional argument (integer) that indicates whether centering and/or scaling for matrix Y  is done 

Remarks 

  • Arrays Y and X must have the same number of rows (I)

  • The result is an array (matrix) with the number of rows equals the number of columns (J) in array X, and the number of columns equals the number of PCs (A)

  • Formula must be entered as an array formula terminated by CTRL+SHIFT+ENTER

  • Similar standard worksheet function is MMULT.

Example

Fig. 13 Example of output for function LoadingsPLS2

Contents

5.4 WLoadingsPLS2 

Performs decomposition of matrices X and Y by using the method of projection on latent structure (PLS2), and then returns an array that presents the loadings weights values W

Syntax

WLoadingsPLS2(X, Y [, PC] [,CentWeightX] [, CentWeightY])

X is the array of X-values (calibration set) 

Y is the array of Y-values (calibration set) 

PC is an optional argument (integer), which defines the number of PLS components (A), used in the PLS2 decomposition 

CentWeightX is an optional argument (integer) that indicates whether centering and/or scaling for matrix X  is done 

CentWeightY is an optional argument (integer) that indicates whether centering and/or scaling for matrix Y  is done 

Remarks 

  • Arrays Y and X must have the same number of rows (I)

  • The result is an array (matrix) with the number of rows equals the number of columns (J) in array X, and the number of columns equals the number of PCs (A)

  • Formula must be entered as an array formula terminated by CTRL+SHIFT+ENTER

  • Similar standard worksheet function is MMULT.

Example

Fig. 14 Example of input for function WLoadingsPLS2

Contents

5.5 QLoadingsPLS2 

Performs decomposition of matrices X and Y by using the method of projection on latent structure (PLS2), and then returns an array that presents the loadings values Q

Syntax

QLoadingsPLS2(X, Y [, PC] [,CentWeightX] [, CentWeightY])

X is the array of X-values (calibration set) 

Y is the array of Y-values (calibration set) 

PC is an optional argument (integer), which defines the number of PLS components (A), used in the PLS2 decomposition 

CentWeightX is an optional argument (integer) that indicates whether centering and/or scaling for matrix X  is done 

CentWeightY is an optional argument (integer) that indicates whether centering and/or scaling for matrix Y  is done 

Remarks 

  • Arrays Y and X must have the same number of rows (I)

  • The result is an array (matrix) with the number of columns equals the number of PCs (A), and rows equals the number of columns in array Y, i.e. the number of responses K 

  • Formula must be entered as an array formula terminated by CTRL+SHIFT+ENTER

  • Similar standard worksheet function is MMULT.

Example

Fig. 15 Example of input for function QLoadingsPLS2

Contents

6. Additional Matrix Functions

6.1. MIdent  

Returns a square identity matrix

Syntax

MIdent(Size)

Size is argument (integer), which defines the matrix dimension

Remarks 

  • The result is a square array (matrix) with the number of rows and columns equal Size

  • Formula must be entered as an array formula terminated by CTRL+SHIFT+ENTER

  • No similar standard worksheet function can be found

Example

Fig. 16 Example of input/output for function MIdent

Contents

6.2. MIdentD2 

Returns two-diagonal rectangular matrix of zeros and units

Syntax

MIdentD2(Size, CutFrom, CutOff)

Size is argument (integer > 1), which defines the dimension of initial identity matrix

CutFrom is argument (integer > 0), which defines the row number where cut out starts

CutOff is argument (integer ≥ 0), which defines the number of rows that must be deleted from initial identity matrix

Remarks 

  • The result is a rectangular array (matrix) with the number of rows equal Size-CutOff and columns equal Size

  • In case (CutFrom + CutOff -1) > Size function returns #VALUE!

  • Formula must be entered as an array formula terminated by CTRL+SHIFT+ENTER

  • No similar standard worksheet function can be found

Example

Fig. 17 Example of input/output for function MIdentD2

Contents

6.3. MCutRows 

Returns a matrix with cut out rows

Syntax

MCutRows(X, CutFrom, CutOff)

X is an array to be cut out

CutFrom  is argument (integer > 0), which defines the row number where cut out starts

CutOff  is argument (integer 0), which defines the number of rows that must be deleted from an array X

Remarks 

  • Function calculates the initial dimensionality of array X: nRows and nColumns. The result is a rectangular array (matrix) with the number of rows equal nRows-CutOff and columns equal nColumns

  • In case (CutFrom + CutOff - 1) > nRows function returns #VALUE!

  • Formula must be entered as an array formula terminated by CTRL+SHIFT+ENTER

  • Similar standard worksheet function is OFFSET

Example

Fig. 18 Example of input/output for function MCutRows

Contents

6.4. MTrace

Returns the matrix trace of an array

Syntax

MTrace(X) 

X is a square array

Remarks 

  • An array X must have the same number of columns and rows

  • Formula must be entered as an array formula terminated by CTRL+SHIFT+ENTER

  • Similar standard worksheet function is MDETERM

Example

Fig. 19 Example of input/output for function MTrace

Contents

 

Conclusion

Chemometrics Add-In provides possibilities for calculation of various scores and loadings for projection methods using original data. Furthermore, applying these results, typical graphic representation used in multivariate data analysis can be obtained, i.e. scores plots, loading plots, and others. Such an approach also provides possibilities for building calibration models, calculation of root mean square error of calibration (RMSEC) and prediction (RMSEP), and etc. 

Examples are demonstrated in files 

Contents