Our Scientific Activities
Simple Interval Calculation (SIC)
Simple Interval Calculation (SIC) is a method of linear modeling
y = Xa + errors
that gives the result of prediction directly in interval form. The SIC approach also provides wide possibilities for Object Status Classification, i.e. a leverage-type classification of the relative importance of calibration and test set samples with respect to a model.
Click the icon to open a PowerPoint file (= 870 kB) "Simple Interval Calculation (SIC) - Theory and Applications", presented at the Second Winter School on Chemometrics (WSC-2), Barnaul, Russia, 2003.
The SIC approach is based on the single assumption that all errors (sampling errors, measurement errors, modelling errors) are limited, which appears reasonable in many practical applications. For prediction modelling, this leads to results in a convenient interval form. The SIC approach assesses the uncertainty of predicted values in such a way that each point of the resulting interval has equal 'possibility'. The SIC interval thus contrasts with traditional confidence interval estimators, which rest on assumptions about the error distribution that rarely hold in practical data analysis of technological and natural systems. No probabilistic measure is introduced on the error domain; therefore one does not have to evaluate the likelihood of the values within the resulting prediction interval.
The finiteness of the error helps to construct the Region of Possible Values (RPV) (Fig. 1), a limited region in the parameter space that includes all possible parameter values that satisfy the data set and model under consideration.
Fig. 1: Illustration of the RPV in model parameter space. The initial data set contains 24 objects, but only 12 were necessary to form the RPV.
The SIC approach does not use any objective function (e.g. sum of squares) to search for parameter estimates. In conventional regression analysis, the estimates are the values of the unknown parameters that agree best with the experimental data. In the SIC method, any model parameter value that does not contradict the experimental data, i.e. lies inside or on the border of the RPV, is accepted as a feasible estimate.
The RPV concept provides wide possibilities for selecting the samples from the calibration set that are most important for model construction. This is because the RPV is formed not by all objects from the calibration set, but only by the so-called boundary objects. Therefore, if we exclude all objects from the calibration set except the boundary ones, the RPV will not change.
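The RPV and boundary-object ideas can be illustrated with a minimal Python sketch. It assumes the simplest possible case, a one-parameter model y = a*x + error with |error| <= beta and all x > 0, so that the RPV is just an interval of a-values. The function names, the data and the value of beta are hypothetical and serve only as an illustration; the general multivariate case relies on linear programming rather than on this shortcut.

```python
def sic_rpv_1d(x, y, beta):
    """RPV for the model y = a*x + error with |error| <= beta (all x > 0 assumed).

    Every object restricts a to [(y - beta)/x, (y + beta)/x];
    the RPV is the intersection of these intervals.
    Returns (a_min, a_max, indices_of_boundary_objects).
    """
    lows = [(yi - beta) / xi for xi, yi in zip(x, y)]
    highs = [(yi + beta) / xi for xi, yi in zip(x, y)]
    a_min, a_max = max(lows), min(highs)
    if a_min > a_max:
        raise ValueError("RPV is empty: beta is too small for these data")
    # Boundary objects are those whose constraints actually define the RPV.
    boundary = sorted({lows.index(a_min), highs.index(a_max)})
    return a_min, a_max, boundary


def sic_predict(x_new, a_min, a_max):
    """Prediction interval for a new x: the image of the RPV under a -> a*x_new."""
    lo, hi = sorted((a_min * x_new, a_max * x_new))
    return lo, hi


# Hypothetical calibration data.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.1, 1.9, 3.2, 3.9]
beta = 0.3

a_min, a_max, boundary = sic_rpv_1d(x, y, beta)
v_minus, v_plus = sic_predict(5.0, a_min, a_max)
print("RPV:", (a_min, a_max), "boundary objects:", boundary)
print("prediction interval at x = 5:", (v_minus, v_plus))
```

Recomputing the RPV from the boundary objects alone gives the same interval, which is the property described in the paragraph above.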
The position of new objects (e.g. test set objects, or new X-data alone) in relation to the RPV helps to understand how similar or dissimilar they are to the objects from the calibration set. The object status map (see Fig. 2), or the so-called SIC influence plot, can be constructed for any dimensionality of the initial data set [X, y] and any number of estimated model parameters.
Fig. 2: An example of an Object Status map for real-world data. Samples C1-C11 are the calibration objects. Samples T1-T4 are the test objects. Samples C2, C1, C6, and C11 are the boundary objects. Sample T1 is an insider, sample T2 is an outsider, sample T3 is an absolute outsider, and sample T4 is an outlier.
SIC returns an object status classification that divides the SIC-residual vs. SIC-leverage plane into categories: 'insiders' (new objects very similar to the calibration set) and 'outsiders' (all other objects in the rest of this plane). It is possible to draw a further distinction between 'absolute outsiders' and more extreme 'outliers'.
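A minimal Python sketch of such a classification is given below. The text above does not state the formulas, so the definitions used here are assumptions: for a prediction interval [v_minus, v_plus] and maximum error beta, the SIC-residual is taken as r = (y - (v_plus + v_minus)/2)/beta and the SIC-leverage as h = (v_plus - v_minus)/(2*beta), with insiders satisfying |r| <= 1 - h, outliers |r| > 1 + h, and absolute outsiders h > 1. The cited paper should be consulted for the exact definitions.

```python
def sic_object_status(y, v_minus, v_plus, beta):
    """Classify an object from its reference value y and SIC prediction interval.

    The thresholds below are assumed definitions used for illustration only.
    Returns (sic_residual, sic_leverage, status).
    """
    center = 0.5 * (v_plus + v_minus)
    r = (y - center) / beta                # SIC-residual (assumed definition)
    h = (v_plus - v_minus) / (2.0 * beta)  # SIC-leverage (assumed definition)
    if abs(r) > 1.0 + h:
        status = "outlier"
    elif h > 1.0:
        status = "absolute outsider"
    elif abs(r) <= 1.0 - h:
        status = "insider"
    else:
        status = "outsider"
    return r, h, status


# Hypothetical objects: (y, v_minus, v_plus), all with beta = 1.
examples = [(0.1, -0.2, 0.2), (0.9, -0.3, 0.3), (0.5, -1.5, 1.5), (3.0, -0.5, 0.5)]
statuses = [sic_object_status(y, lo, hi, beta=1.0)[2] for y, lo, hi in examples]
print(statuses)
```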
A description of the SIC method and the Object Status Classification approach is published in:
O. Ye. Rodionova, K. H. Esbensen, and A.L. Pomerantsev, "Application of SIC (Simple Interval Calculation) for object status classification and outlier detection - comparison with PLS/PCR", J. Chemometrics, 18, 402-413 (2004)
DOI:10.1002/cem.885
Multivariate problems in which the data matrix is rank-deficient are undoubtedly of great practical interest. To apply the SIC method to such problems, we combine it with traditional projection methods (e.g., principal component analysis or partial least squares).
We believe that the criteria for the quality of interval prediction used in the SIC procedure allow us to look at old problems of multivariate data analysis from a new point of view. These problems include the optimal number of PCs, outlier detection, missing data, and insignificant observations. The roots of the method lie in Kantorovich's old idea of applying linear programming to data analysis. The computational aspects of the SIC method are rather simple, since they are founded on the well-established Simplex algorithm.
The SIC method is now implemented in the MATLAB scripting language. A software description is available, and the program may be downloaded as a zip file.
Examples of the SIC method are published in:
A.L. Pomerantsev, O.Ye. Rodionova, "Hard and soft methods for prediction of antioxidants' activity based on the DSC measurements", Chemom. Intell. Lab. Syst., 79 (1-2), 73-83 (2005)
DOI:10.1016/j.chemolab.2005.04.004
A.L. Pomerantsev, O.Ye. Rodionova, A. Höskuldsson, "Process control and optimization with simple interval calculation method", Chemom. Intell. Lab. Syst., 81 (2), 165-179 (2006)
DOI:10.1016/j.chemolab.2005.12.005
A.L. Pomerantsev and O.Ye. Rodionova, "Prediction of antioxidants activity using DSC measurements. A feasibility study", In: Aging of Polymers, Polymer Blends and Polymer Composites, 2, Nova Science Publishers, NY, 2002, pp. 19-29 (ISBN 1-59033-256-3)
O. Ye. Rodionova, A.L. Pomerantsev, "Principles of Simple Interval Calculations", In: Progress In Chemometrics Research, Ed.: A.L. Pomerantsev, 43-64, Nova Science Publishers, NY, 2005 (ISBN 1-59454-257-0)
A.L. Pomerantsev and O.Ye. Rodionova, "Multivariate Statistical Process Control and Optimization", Ibid, 209-227
The main purposes of non-linear regression are to fit data with a non-linear model, to predict the response for predictor values that are far from the observed ones, and to estimate the uncertainties in prediction.
Click the icon to open a PowerPoint file (=1290 kB) "Introduction to non-linear regression analysis" (in Russian), presented at the Second Winter School on Chemometrics (WSC-2), Barnaul, Russia, 2003.
Consider the example of rubber aging prediction. Data from accelerated aging tests performed at temperatures T = 140 °C, 125 °C and 110 °C are presented in Fig. 3.
Fig. 3: Experimental data (left Y- and bottom X-axes) and predicted kinetics (right Y- and top X-axes)
The response ELB is the 'Elongation at break' property, measured in accordance with ASTM D412-87. The data are fitted with first-order kinetics, whose rate constant k depends on temperature according to the Arrhenius law:
ELB=ELB1+(ELB0-ELB1)*exp(-k*t), k=k0*exp[-E/(RT)],
where ELB0, ELB1, k0, and E are unknown parameters. Prediction is performed at the normal temperature of 20 °C, and the left (lower) limits of the confidence intervals are obtained. This example is presented in:
E.V. Bystritskaya, O.Ye. Rodionova, and A.L. Pomerantsev, "Evolutionary Design of Experiment for Accelerated Aging Tests", Polymer Testing, 19, 221-229 (1999)
DOI:10.1016/S0142-9418(98)00077-4
O. Y. Rodionova, A. L. Pomerantsev, "Prediction of Rubber Stability by Accelerated Aging Test Modeling", J Appl Polym Sci, 95 (5), 1275-1284 (2005)
DOI:10.1002/app.21347
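The first-order kinetic model with an Arrhenius rate constant, as written in the formula above, can be evaluated with a short Python sketch. The parameter values are hypothetical and chosen only for illustration; they are not the estimates obtained in the cited papers, and the time unit is arbitrary.

```python
import math

R = 8.314  # universal gas constant, J/(mol*K)


def rate_constant(k0, E, temp_celsius):
    """Arrhenius law k = k0*exp(-E/(R*T)), with T converted to kelvin."""
    T = temp_celsius + 273.15
    return k0 * math.exp(-E / (R * T))


def elb(t, temp_celsius, ELB0, ELB1, k0, E):
    """First-order kinetics: ELB = ELB1 + (ELB0 - ELB1)*exp(-k*t)."""
    k = rate_constant(k0, E, temp_celsius)
    return ELB1 + (ELB0 - ELB1) * math.exp(-k * t)


# Hypothetical parameter values, for illustration only.
params = dict(ELB0=400.0, ELB1=100.0, k0=1.0e9, E=90000.0)

for temp in (140.0, 125.0, 110.0, 20.0):
    print(temp, "C:", elb(100.0, temp, **params))
```

With these values the decay at 20 °C is many orders of magnitude slower than at the test temperatures, which is why the prediction is an extrapolation far from the observed conditions.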
When making a forecast, it is essential not only to find the point prediction but also to characterize its uncertainty, which depends primarily on the extrapolation distance. The most convenient way is to present the result of prediction as a confidence interval.
Fig. 4: Upper bounds of confidence intervals versus confidence probability P for various methods: F, A, M, B, L, S, and "exact" values T
We suggest a new method of confidence estimation for NLR in which, unlike the bootstrap, we simulate parameter estimates rather than the initial data. The details are presented in:
A.L. Pomerantsev, "Confidence Intervals for Non-linear Regression Extrapolation", Chemom. Intell. Lab. Syst., 49, 41-48 (1999)
DOI:10.1016/S0169-7439(99)00026-X
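The general idea of simulating parameter estimates instead of the initial data can be sketched in Python as follows. This is only a rough illustration under strong assumptions, not the procedure of the cited paper: the model has a single parameter k, its estimate is assumed to be normally distributed around k_hat with standard error k_se, and the upper bound is taken as an empirical quantile of the simulated predictions. All names and numbers are hypothetical.

```python
import math
import random


def simulated_upper_bound(predict, k_hat, k_se, P=0.95, n_sim=10000, seed=0):
    """Upper P-bound of a prediction, obtained by simulating parameter estimates.

    Parameter values are drawn from N(k_hat, k_se**2) (an assumption),
    pushed through the model, and the empirical P-quantile is returned.
    """
    rng = random.Random(seed)
    values = sorted(predict(rng.gauss(k_hat, k_se)) for _ in range(n_sim))
    index = min(n_sim - 1, max(0, math.ceil(P * n_sim) - 1))
    return values[index]


def remaining_fraction(k, t=50.0):
    """Hypothetical model: first-order decay extrapolated to time t."""
    return math.exp(-k * t)


k_hat, k_se = 0.10, 0.01  # hypothetical estimate and its standard error
point = remaining_fraction(k_hat)
upper_95 = simulated_upper_bound(remaining_fraction, k_hat, k_se, P=0.95)
upper_99 = simulated_upper_bound(remaining_fraction, k_hat, k_se, P=0.99)
print(point, upper_95, upper_99)
```

No refitting of the model is needed inside the simulation loop, which is the practical difference from a bootstrap over the initial data.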
The difference between the confidence intervals constructed for a nonlinear model by various methods can be very large (see Fig. 4), but in some cases this difference may be negligible from the "engineering" point of view. To explain this, we suggest a new coefficient of nonlinearity, which is used to decide which method can be applied to a given task. It is calculated by a Monte Carlo procedure and accounts for the model structure as well as for the features of the experimental design. More information about the coefficient of nonlinearity is published in:
E.V. Bystritskaya, A.L. Pomerantsev, and O.Ye. Rodionova, "Nonlinear Regression Analysis: New Approach to Traditional Implementations", J. Chemometrics, 14, 667-692 (2000)
DOI: 10.1002/1099-128X(200009/12)14:5/6<667::AID-CEM614>3.0.CO;2-T
These ideas were implemented in the software FITTER, a new Excel Add-In.
Successive Bayesian Estimation
The successive Bayesian estimation (SBE) of regression parameters is an effective technique applied in nonlinear regression analysis. The main concept of SBE is to split the whole data set into several parts. The parameters are then estimated successively, part by part, with the Maximum Likelihood Method. Importantly, the results obtained at the previous step are used as a priori values (in the Bayesian form) for the next part. This procedure produces a sequence of parameter estimates, and its last term is the ultimate estimate. A description of SBE is published in:
G.A. Maksimova, A.L. Pomerantsev, "Successive Bayesian Estimation of Regression Parameters", Zavod. Lab., 61, 432-435, (1995)
It was shown that this technique is correct and that, for linear regression, it gives the same estimates as the traditional OLS approach. Moreover, in that case the result does not depend on the order of the series. In the non-linear regression case the situation is more difficult, but we can assume that all these properties hold asymptotically.
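The equivalence with OLS in the linear case can be checked with a minimal Python sketch. It assumes the simplest setting, a one-parameter model y = a*x with Gaussian errors of unit variance, where the a priori information from the previous step is carried by the current estimate and its accumulated information (the sum of x squared). The function names and data are hypothetical.

```python
def sbe_step(prior_a, prior_info, x, y):
    """One SBE step for y = a*x: combine the prior (estimate, information) with a new part."""
    info = prior_info + sum(xi * xi for xi in x)
    a = (prior_info * prior_a + sum(xi * yi for xi, yi in zip(x, y))) / info
    return a, info


def successive_estimates(parts):
    """Process the parts one by one; the last estimate is the ultimate one.

    Starts with no prior information; the first part must contain a nonzero x.
    """
    a, info = 0.0, 0.0
    history = []
    for x, y in parts:
        a, info = sbe_step(a, info, x, y)
        history.append(a)
    return history


# Hypothetical data split into three parts.
parts = [([1.0, 2.0], [1.1, 2.1]), ([3.0, 4.0], [2.9, 4.2]), ([5.0], [4.8])]

all_x = [xi for x, _ in parts for xi in x]
all_y = [yi for _, y in parts for yi in y]
ols = sum(xi * yi for xi, yi in zip(all_x, all_y)) / sum(xi * xi for xi in all_x)

forward = successive_estimates(parts)
backward = successive_estimates(parts[::-1])
print(forward, backward[-1], ols)
```

The intermediate estimates differ between the two orderings, but the ultimate estimate coincides with the OLS value in both cases, up to rounding.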
Click the icon to open a PowerPoint file (=1890 kB) "Successive Bayesian estimation for linear and non-linear modeling", presented at the Second Winter School on Chemometrics (WSC-2), Barnaul, Russia, 2003.
This method is used to obtain kinetic information from spectral data without any pure component spectra (Fig. 5, left). With the help of a real-world example, this approach is compared with known methods of kinetic modeling (Fig. 5, right).
Fig. 5: Successive estimates of kinetic parameters (left panel) and ultimate estimates with various methods presented by the 0.95 confidence ellipses (right panel)
This example is presented in:
A.L. Pomerantsev, "Successive Bayesian estimation of reaction rate constants from spectral data", Chemom. Intell. Lab. Syst., 66 (2), 127-139 (2003)
DOI:10.1016/S0169-7439(03)00028-5
O.Ye. Rodionova, A.L. Pomerantsev, "On One Method of Parameter Estimation in Chemical Kinetics Using Spectra with Unknown Spectral Components", Kinetics and Catalysis, 45 (4), 455-466 (2004)
DOI: 10.1023/B:KICA.0000038071.51067.d5
FITTER is an Add-In procedure for Excel. From within Excel, you can open FITTER like any add-in file using the Tools/Add-Ins menu command. This adds the new menu item Fitter to the Tools menu. Clicking it opens the main Fitter dialog, from which FITTER is started.
Fig. 6: Main Fitter dialog
FITTER is a powerful instrument for statistical analysis. With it you can solve multivariate nonlinear regression problems. Much of the power of FITTER comes from its ability to estimate parameter values of complicated user-defined functions, which may be entered in ordinary algebraic notation as a set of explicit, implicit and ordinary differential equations. FITTER uses a unique procedure for analytic calculation of derivatives and a special optimization algorithm that provides high accuracy even for significantly nonlinear models. All complicated calculations are performed in a special DLL library created with a C++ compiler, which provides high-speed processing. FITTER allows you to include prior knowledge about parameters and measurement accuracy in addition to experimental data. Using Bayesian estimation, you can process both unlimited arrays of single-response data and data referring to different responses.
Click the icon to open a PowerPoint file (=1290 kB) "Non-linear Regression Analysis with Fitter Software Application", presented at the First Winter School on Chemometrics (WSC-1), Kostroma, Russia, 2002.
With the help of FITTER you can obtain a great deal of additional statistical information about the input data and the quality of fit: parameter estimates, variances, the covariance matrix, the correlation matrix and the F-matrix; the starting and final values of the sum of squares and of the objective function, the error variance, and the spread in eigenvalues of the Hessian matrix; and the error variance for each observation point calculated by fit and by population. In addition, the following hypothesis tests are available: Student's test for outliers, the test of series for residual correlation, Bartlett's test for homoscedasticity, and Fisher's test for goodness of fit. You can also calculate confidence intervals for each observation point by the linearization method or with the help of a modified bootstrap technique. A detailed description of the Fitter application is presented in:
E.V. Bystritskaya, A.L. Pomerantsev, and O.Ye. Rodionova, "Nonlinear Regression Analysis: New Approach to Traditional Implementations", J. Chemometrics, 14, 667-692 (2000)
DOI: 10.1002/1099-128X(200009/12)14:5/6<667::AID-CEM614>3.0.CO;2-T
FITTER takes all information from an open Excel workbook. Information should be placed directly on worksheets (Data and Parameters) or written in a text box (Model). All results are also output as tables on the worksheets. To specify what information you want to use, you need to register it with the help of FITTER wizards. There are DATA, MODEL and BAYES registration wizards. While working with the FITTER wizards, you only register the required information, change options and review the registration process. You may change data only on the worksheets, not inside the wizards. Once your information (Data, Model, ...) has been registered, it is kept in memory until you replace it with another registration.
An example of the application of Fitter to the solution of diffusion problems is published in:
A. L. Pomerantsev, "Phenomenological modeling of anomalous diffusion in polymers", J Appl Polym Sci, 96 (4), 1102-1114 (2005)
DOI:10.1002/app.21540
Estimation of the parameters of the Arrhenius equation often leads to multicollinearity, or, in other words, a degenerate set of equations in the least-squares procedure. This circumstance makes it difficult to estimate the unknown parameters. Simple expedients for model modification are considered that reduce multicollinearity.
O. E. Rodionova, A. L. Pomerantsev, "Estimating the Parameters of the Arrhenius Equation", Kinetics and Catalysis, 46, 305–308 (2005)
DOI: 10.1007/s10975-005-0077-9
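The multicollinearity can be seen in a small Python sketch of the linearized Arrhenius fit ln k = c0 + c1*z, where z = 1/T. The sketch compares the correlation between the two estimates for the original form and for one commonly used modification, centering 1/T at a reference temperature. This modification is an assumption made for illustration and is not necessarily the expedient considered in the cited paper; the temperatures are those of the rubber aging example.

```python
import math


def estimate_correlation(z):
    """Correlation between the intercept and slope estimates in ln k = c0 + c1*z.

    Follows from the inverse of the normal matrix [[n, sum z], [sum z, sum z^2]].
    """
    n = len(z)
    return -sum(z) / math.sqrt(n * sum(v * v for v in z))


temps_K = [t + 273.15 for t in (110.0, 125.0, 140.0)]
inv_T = [1.0 / T for T in temps_K]

# Original form: ln k = ln k0 - (E/R) * (1/T)
corr_original = estimate_correlation(inv_T)

# Modified form: ln k = ln k_ref - (E/R) * (1/T - 1/T_ref),
# with 1/T_ref taken as the mean of 1/T (an assumed choice).
inv_T_ref = sum(inv_T) / len(inv_T)
corr_centered = estimate_correlation([v - inv_T_ref for v in inv_T])

print(corr_original, corr_centered)
```

In the original form the two estimates are almost perfectly correlated because 1/T varies little over the experimental range, whereas after centering the correlation practically vanishes.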
The problem of counterfeit drugs is important all over the world. The World Health Organization (WHO) first received information about forgeries in 1982. At that time counterfeit drugs were mainly found in developing countries. The WHO definition of a “counterfeit drug” is as follows: “A counterfeit medicine is one which is deliberately and fraudulently mislabeled with respect to identity and/or source. Counterfeiting can apply to both branded and generic products and counterfeit products may include products with the correct ingredients or with the wrong ingredients, without active ingredients, with insufficient active ingredient or with fake packaging”.
Nowadays there are “high quality” counterfeit drugs that are very difficult to detect. It is worth mentioning that fake drugs include dietary supplements too. In such products, non-declared substances such as hormones, ephedrine, etc., may be found. According to WHO information, the spread of counterfeit drugs across countries is as follows: 70% of turnover is in developing countries and 30% is in market-economy countries. The distribution of fake drugs over therapeutic groups is as follows: (1) antimicrobial drugs 28%; (2) hormone-containing drugs 22% (including 10% of steroids); (3) antihistamine medicines 17%; (4) vasodilators 7%; (5) drugs used for treatment of sexual disorders 5%; (6) anticonvulsants 2%; (7) others 19%. Visual control, disintegration tests or simple color reaction tests reveal only very rough forgeries. More complicated chemical methods are also used, but all of them try to prove or disprove the content and concentration of an active ingredient. The main goal, however, is to discriminate between genuine and counterfeit drugs, even when the counterfeit drug contains a sufficient concentration of the active ingredient, and thus to answer the question: “Does a given drug correspond to the original as marked on the package?”
Express methods for the detection of counterfeit drugs are vitally necessary. In many cases dosage forms contain not only active substances but also excipients. The exact content of excipients may differ between genuine and fake drugs. It is proposed to apply near-infrared (NIR) spectroscopy, which can be used for the identification of both pharmaceutical substances and dosage forms independently of the content of an active ingredient. NIR can also give information about the excipients in a pharmaceutical preparation and is thereby able to detect counterfeit drugs even when they contain the proper active substance. A feasibility study has been published in:
O.Ye. Rodionova, L.P. Houmøller, A.L. Pomerantsev, P. Geladi, J. Burger, V.L. Dorofeyev, A.P. Arzamastsev, "NIR spectrometry for counterfeit drug detection", Anal. Chim. Acta, 549, 151-158 (2005)
DOI:10.1016/j.aca.2005.06.018
Two grades of tablets (antispasmodic drug, uncoated tablets, 40 mg) are investigated. Ten genuine tablets, subset N1, and 10 forgeries, subset N2 were measured using the InAs detector. After that one tablet from set N1 was cut in half and a spectrum of the interior of a cut tablet was measured; this was named N1Cut. The same procedure was done for one tablet from set N2. As a result, in total 22 spectra were obtained. These spectra were pre-treated by MSC and are shown in Fig. 7.
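The MSC (multiplicative scatter correction) pre-treatment mentioned above can be sketched in a few lines of Python. This is a textbook form of MSC, assumed here for illustration: each spectrum is regressed on the mean spectrum, and the fitted offset and slope are then removed. The toy data are hypothetical and are not the tablet spectra.

```python
def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum.

    Each spectrum s is fitted as s = a + b*ref by least squares and
    replaced by (s - a)/b. Returns the list of corrected spectra.
    """
    n_var = len(spectra[0])
    ref = [sum(s[j] for s in spectra) / len(spectra) for j in range(n_var)]
    ref_mean = sum(ref) / n_var
    sxx = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = sum(s) / n_var
        b = sum((r - ref_mean) * (v - s_mean) for r, v in zip(ref, s)) / sxx
        a = s_mean - b * ref_mean
        corrected.append([(v - a) / b for v in s])
    return corrected


# Hypothetical spectra: one shape with different offsets and scalings.
base = [1.0, 2.0, 4.0, 3.0]
spectra = [base, [2.0 * v + 1.0 for v in base], [0.5 * v - 0.2 for v in base]]
corrected = msc(spectra)
print(corrected)
```

Because the toy spectra differ only by an offset and a scale factor, all corrected spectra coincide; for real spectra the remaining differences carry the chemical information.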
Fig. 7: MSC pre-treated spectra. Blue lines (N1) are the 11 spectra of genuine tablets and red lines (N2) are the 11 spectra of counterfeit tablets.
The data are subjected to a principal component analysis. Taking two principal components (PCs) into account, we obtain the following results (Fig. 8, left). Two distinct clusters are seen in the PC1–PC2 plane. Thus, the subsets N1 and N2 may easily be discriminated. The variance between objects in subset N2 (counterfeit drug) is significantly greater than the variance between objects in subset N1 (genuine tablets). This may be explained by better manufacturing control for genuine tablets. Spectra of the cut tablets do not differ from those of the whole tablets (compare open dots).
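How a PCA scores plot separates two groups can be illustrated with a compact Python sketch. It uses the NIPALS algorithm on mean-centered data; the choice of algorithm is an assumption made for illustration, since the text does not say how the decomposition was computed, and the small two-cluster data set is hypothetical.

```python
def pca_nipals(X, n_pc, n_iter=500, tol=1e-12):
    """PCA of mean-centered data by NIPALS. Returns (scores, loadings), one list per PC."""
    n, m = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(m)]
    E = [[row[j] - means[j] for j in range(m)] for row in X]
    scores, loadings = [], []
    for _ in range(n_pc):
        # Start from the column with the largest residual sum of squares.
        j0 = max(range(m), key=lambda j: sum(E[i][j] ** 2 for i in range(n)))
        t = [E[i][j0] for i in range(n)]
        for _ in range(n_iter):
            tt = sum(v * v for v in t)
            p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
            norm = sum(v * v for v in p) ** 0.5
            p = [v / norm for v in p]
            t_new = [sum(E[i][j] * p[j] for j in range(m)) for i in range(n)]
            change = sum((a - b) ** 2 for a, b in zip(t_new, t))
            t = t_new
            if change < tol * sum(v * v for v in t):
                break
        # Deflate: remove the part of the data explained by this component.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        scores.append(t)
        loadings.append(p)
    return scores, loadings


# Hypothetical data: two clusters of three objects each.
X = [[1.0, 1.0, 0.0], [1.1, 0.9, 0.1], [0.9, 1.1, 0.0],
     [3.0, 3.0, 0.1], [3.1, 2.9, 0.0], [2.9, 3.1, 0.1]]
scores, loadings = pca_nipals(X, n_pc=2)
print("PC1 scores:", scores[0])
print("PC2 scores:", scores[1])
```

The first three objects and the last three receive PC1 scores of opposite sign, so the two groups fall into separate clusters along PC1.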
Fig. 8: PCA scores plot (left panel) and SIMCA plot (right panel). Blue dots represent genuine tablets (N1) and red squares represent counterfeit tablets (N2). Open dots and squares show cut tablets.
The SIMCA method is applied to discriminate class N1 (genuine tablets) from any counterfeit tablets. The “membership plot”, which presents the distance to the model s_i versus the leverage h_i, is shown in Fig. 8, right panel. The limits are shown as white lines: horizontal for the distance to the model and vertical for the leverage. It may easily be seen that the N1Cut object has a low leverage, but its distance to the model exceeds the limit, although it lies not far from the model. Samples from set N2 are very far from the model and can undoubtedly be classified as non-members of this class.
In general, there is one class of genuine drug samples, and there may be many forgeries with different degrees of similarity. Due to the production quality demands in large pharmaceutical plants, the differences between genuine items are rather small. Nevertheless, we consider this investigation a feasibility study that yields promising results. For more trustworthy modeling it is necessary to collect a representative set of genuine samples of the drug produced at different times, with different shelf life, etc. On the other hand, the diversity among counterfeit samples is substantial. Sometimes the difference between genuine and counterfeit drugs can be seen visually in the NIR spectra, but in other situations the answer is not so evident. To claim that a sample is a forgery, it is not necessary to compare the concentrations of active ingredients. All that is needed is to check whether a given sample is identical to the genuine drug or not. The above analysis shows that the NIR approach together with PCA has good prospects and may efficiently substitute for wet chemistry.
Chemometric functions in Excel
Chemometrics uses a very large variety of software: special chemometric packages, general mathematical packages, and various environments such as Matlab or VBA. As a result, a student or analyst taking the first steps has to obtain special software and become acquainted with it. To make the start in chemometrics quick and easy, we propose to implement the basic projection methods as worksheet functions in Excel, one of the most widespread data handling environments. In this case all calculations are carried out in open Excel workbooks. Moreover, all regular Excel capabilities can be applied for additional calculations, graphical presentation, export and import of data and results, customizing individual templates, etc. Excel 2007 gives additional incentive to this idea, as very large arrays (1,048,576 rows by 16,384 columns) can now be input and processed directly in the worksheets. We have designed the core functions for the PCA/PLS decompositions and ensured that calculations are performed very quickly even for rather large data sets (200 samples by 4500 variables). These functions are programmed in C++ and linked to Excel as an Add-In tool named Chemometrics.
We designed "Chemometrics" as an Add-In procedure for Excel. The add-in file is opened with the Tools/Add-Ins menu command. After that, the main projection functions can be used as ordinary user-defined functions in Excel.
All calculations are carried out in the open Excel books.
All results are also output as tables on the worksheets.
All calculations are made "on the fly".
As soon as a user changes any cell in the input data, output data are recalculated automatically (if "automatic" option is switched on in a Tools/Options/Calculation menu).

Fig 9: Common worksheet layout for application of Chemometrics Add-In
List of user-defined functions
PCA Decomposition
ScoresPCA (X, PC, CentWeight, Xnew)
LoadingsPCA (X, PC, CentWeight)
PLS Decomposition
ScoresPLS (X, Y, PC, CentWeightX, CentWeightY, Xnew)
UScoresPLS (X, Y, PC, CentWeightX, CentWeightY, Xnew, Ynew)
LoadingsPLS (X, Y, PC, CentWeightX, CentWeightY)
WLoadingsPLS (X, Y, PC, CentWeightX, CentWeightY)
QLoadingsPLS (X, Y, PC, CentWeightX, CentWeightY)
PLS2 Decomposition
ScoresPLS2 (X, Y, PC, CentWeightX, CentWeightY, Xnew)
UScoresPLS2 (X, Y, PC, CentWeightX, CentWeightY, Xnew, Ynew)
LoadingsPLS2 (X, Y, PC, CentWeightX, CentWeightY)
WLoadingsPLS2 (X, Y, PC, CentWeightX, CentWeightY)
QLoadingsPLS2 (X, Y, PC, CentWeightX, CentWeightY)
Last update 06.12.09