Multiblock Data Fusion in Statistics and Machine Learning. Tormod Næs

Title Multiblock Data Fusion in Statistics and Machine Learning
Author Tormod Næs
Genre Chemistry
Series
Publisher Chemistry
Year of publication 0
isbn 9781119600992




information for three of the methods covered. The upper figure illustrates that the two input blocks share some information (C1 and C2), but also have substantial distinct components and noise (see Chapter 2), here contained in the X (as the darker blue and darker yellow). The lower three figures show how different methods handle the common information. For MB-PLS, no initial separation is attempted since the data blocks are concatenated before analysis starts. For SO-PLS, the common predictive information is handled as part of the X1 block before the distinct part of the X2 block is modelled. The extra predictive information in X2 corresponds to the additional variability, as will be discussed in the SO-PLS section. For PO-PLS, the common information is explicitly separated from the distinct parts before regression.

      Figure 7.3 Cross-validated explained variance for various choices of number of components for single- and two-response modelling with MB-PLS.

      Figure 7.4 Super-weights (w) for the first and second component from MB-PLS on Raman data predicting the PUFA sample response. Block-splitting indicated by vertical dotted lines.

      Figure 7.5 Block-weights (wm) for first and second component from MB-PLS on Raman data predicting the PUFA sample response. Block-splitting indicated by vertical dotted lines.

      Figure 7.6 Block-scores (tm, for left, middle, and right Raman block, respectively) for first and second component from MB-PLS on Raman data predicting the PUFA sample response. Colours of the samples indicate the PUFA concentration as % in fat (PUFA fat) and size indicates % in sample (PUFA sample). The two percentages given in each axis label are cross-validated explained variance for PUFA sample weighted by relative block contributions and calibrated explained variance for the block (Xm), respectively.

      Figure 7.7 Classification by regression. A dummy matrix (here with three classes, c for class) is constructed according to which group the different objects belong to. Then this dummy matrix is related to the input blocks in the standard way described above.
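
The dummy-matrix construction described in the caption for Figure 7.7 can be illustrated with a minimal sketch. The function name `dummy_matrix`, the example labels, and the largest-predicted-value assignment rule are assumptions made here for illustration; they are not taken from the book.

```python
import numpy as np

def dummy_matrix(labels):
    """Return (Y, classes): one 0/1 column per class, one row per object."""
    classes, idx = np.unique(labels, return_inverse=True)
    Y = np.zeros((len(idx), len(classes)))
    Y[np.arange(len(idx)), idx] = 1.0  # mark the class each object belongs to
    return Y, classes

# Hypothetical class memberships for five objects and three classes.
labels = ["c1", "c2", "c3", "c1", "c3"]
Y, classes = dummy_matrix(labels)

# A multi-response regression of Y on the input blocks yields a predicted Y.
# Assigning each object to the column with the largest predicted value is
# one common rule (an assumption here). Y_hat is a stand-in, not a real fit.
Y_hat = Y + 0.1
predicted = classes[np.argmax(Y_hat, axis=1)]
```

Each row of `Y` contains a single 1, so the matrix can be used as the response block in any of the multiblock regression methods listed here.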

      Figure 7.8 AUROC values of different classification tasks. Source: (Deng et al., 2020). Reproduced with permission from ACS Publications.

      Figure 7.9 Super-scores (called global scores here) and block-scores for the sparse MB-PLS model of the piglet metabolomics data. Source: (Karaman et al., 2015). Reproduced with permission from Springer.

      Figure 7.10 Linking structure of SO-PLS. Scores for both X1 and the orthogonalised version of X2 are combined in a standard LS regression model with Y as the dependent block.

      Figure 7.11 The SO-PLS iterates between PLS regression and orthogonalisation, deflating the input block and responses in every cycle. This is illustrated using three input blocks X1, X2, and X3. The upper figure represents the first PLS regression of Y onto X1. Then the residuals from this step, obtained by orthogonalisation, go to the next step (middle figure), where the same PLS procedure is repeated. The same continues for the last block X3 in the lower part of the figure. In each step, loadings, scores, and weights are available.
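
The cycle described in the caption for Figure 7.11 can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions: a single response, column-centred blocks, a plain NIPALS PLS1 for the score extraction, and no prediction of new samples or cross-validation. The function names (`pls_scores`, `project_out`, `so_pls_fit`) and the simulated data are hypothetical and not the book's implementation.

```python
import numpy as np

def pls_scores(X, y, n_comp):
    """Orthogonal PLS1 scores via NIPALS (single response, centred data)."""
    X, y = X.copy(), y.copy()
    T = np.zeros((X.shape[0], n_comp))
    for a in range(n_comp):
        w = X.T @ y
        w /= np.linalg.norm(w)          # weight vector
        t = X @ w                       # score vector
        p = X.T @ t / (t @ t)           # loading vector
        X -= np.outer(t, p)             # deflate the block
        y -= t * (t @ y) / (t @ t)      # deflate the response
        T[:, a] = t
    return T

def project_out(T, M):
    """Remove from M the part lying in the column space of T."""
    if T.shape[1] == 0:
        return M
    return M - T @ np.linalg.lstsq(T, M, rcond=None)[0]

def so_pls_fit(blocks, y, n_comps):
    """Sequential PLS on blocks orthogonalised against earlier scores."""
    y_c = y - y.mean()
    T_all = np.zeros((len(y), 0))
    for X, a in zip(blocks, n_comps):
        X_orth = project_out(T_all, X - X.mean(axis=0))  # orthogonalise block
        y_res = project_out(T_all, y_c)                  # current Y residual
        T_all = np.hstack([T_all, pls_scores(X_orth, y_res, a)])
    # Least-squares fit of y on all scores, written as y minus its residual.
    fitted = y - project_out(T_all, y_c)
    return T_all, fitted

# Simulated data, for illustration only.
rng = np.random.default_rng(0)
X1 = rng.normal(size=(30, 5))
X2 = rng.normal(size=(30, 4))
y = X1[:, 0] + X2[:, 1] + 0.1 * rng.normal(size=30)
T_all, fitted = so_pls_fit([X1, X2], y, n_comps=(2, 2))
```

Because each later block is orthogonalised against all earlier scores, the columns of `T_all` are mutually orthogonal, which is why the final step reduces to an ordinary least-squares regression of the response on the stacked scores.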

      Figure 7.13 Måge plot showing cross-validated explained variance for all combinations of components for the four input blocks (up to six components in total) for the wine data (the digits for each combination correspond to the order A, B, C, D, as described above). The different combinations of components are visualised by four numbers separated by a dot. The panel to the lower right is a magnified view of the most important region (2, 3, and 4 components) for selecting the number of components. Coloured lines show prediction ability (Q2, see cross-validation in Section 2.7.5) for the different input blocks, A, B, C, and D, used independently.
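
The set of component combinations shown in a Måge plot such as Figure 7.13 can be enumerated as in the sketch below. It only generates the dot-separated labels and groups them by total number of components; the cross-validated explained variance for each combination is not computed. The function name `component_combinations` is hypothetical, and the sketch assumes no per-block limit other than the total.

```python
from itertools import product

def component_combinations(n_blocks, max_total):
    """All tuples of per-block component counts whose sum is <= max_total."""
    return [c for c in product(range(max_total + 1), repeat=n_blocks)
            if sum(c) <= max_total]

# Four blocks (A, B, C, D) and up to six components in total.
combos = component_combinations(n_blocks=4, max_total=6)
labels = [".".join(map(str, c)) for c in combos]  # e.g. "0.2.1.1"

# In the plot, each label sits at x = total number of components and
# y = cross-validated explained variance of that model (not computed here).
by_total = {}
for c, lab in zip(combos, labels):
    by_total.setdefault(sum(c), []).append(lab)
```

With four blocks and at most six components in total this gives 210 candidate models, which indicates how quickly the search grows with the number of blocks.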

      Figure 7.14 PCP plots for wine data. The upper two plots are the score and loading plots for the predicted Y, the other three are the projected input X-variables from the blocks B, C, and D. Block A is not present since it is not needed for prediction. The sizes of the points for the Y scores follow the scale of the ‘overall quality’ (small to large) while colour follows the scale of ‘typical’ (blue, through green to yellow).

      Figure 7.15 Måge plot showing cross-validated explained variance for all combinations of components from the three blocks with a maximum of 10 components in total. The three coloured lines indicate pure block models, and the inset is a magnified view around maximum explained variance.

      Figure 7.16 Block-wise scores (Tm) with 4+3+3 components for left, middle, and right block, respectively (two first components for each block shown). Dot sizes show the percentage PUFA in sample (small = 0%, large = 12%), while colour shows the percentage PUFA in fat (see colour-bar on the left).

      Figure 7.17 Block-wise (projected) loadings with 4+3+3 components for left, middle, and right block, respectively (two first for each block shown). Dotted vertical lines indicate transition between blocks. Note the larger noise level for components six and nine.

      Figure 7.18 Block-wise loadings from restricted SO-PLS model with 4+3+3 components for left, middle, and right block, respectively (two first for each block shown). Dotted vertical lines indicate transition between blocks.

      Figure 7.19 Måge plot for restricted SO-PLS showing cross-validated explained variance for all combinations of components from the three blocks with a maximum of 10 components in total. The three coloured lines indicate pure block models, and the inset is a magnified view around maximum explained variance.

      Figure 7.21 Loadings from Principal Components of Predictions applied to the 5+4+0 component solutions of SO-PLS on Raman data.

      Figure 7.22 RMSEP for fish data with interactions. The standard SO-PLS procedure is used with the order of blocks described in the text. The three curves correspond to different numbers of components for the interaction part. The symbol * in the original figure (see reference) between the blocks is the same interaction operator as described by the above. Source: (Næs et al., 2011b). Reproduced with permission from John Wiley and Sons.

      Figure 7.23 Regression coefficients for the interactions for the fish data with 4+2+2 components for blocks X1, X2 and the interaction block X3. Regression coefficients are obtained by back-transforming the components