Multiblock Data Fusion in Statistics and Machine Learning. Tormod Næs

Title: Multiblock Data Fusion in Statistics and Machine Learning
Author: Tormod Næs
Genre: Chemistry
Publisher: Химия
ISBN: 9781119600992




and X3 are represented with proper I × I representation matrices which are subsequently analysed simultaneously with an IDIOMIX model generating scores and loadings. Source: Smilde et al. (2020). Reproduced with permission of John Wiley and Sons.

      Figure 9.3 True design used in mixture preparation (blue) versus the columns of associated factor matrix corresponding to the mixture mode extracted by the BIBFA model (red) and the ACMTF model (red). Source: Acar et al. (2015). Reproduced with permission from IEEE.

      Figure 9.4 Cross-validation results for the penalty parameter λbin of the mutation block (left) and for the drug response, transcriptome, and methylation blocks (λquan, right) in the PESCA model. For more explanation, see text. Adapted from Song et al. (2019).

      Figure 9.5 Explained variances of the PESCA (a) and MOFA (b) models on the CCL data. From top to bottom: drug response, methylation, transcriptome, and mutation data. The values are percentages of explained variation. For more explanation, see text. Adapted from Song et al. (2019).

      Figure 9.6 From multiblock data to three-way data.

      Figure 9.7 Decision tree for selecting an unsupervised method. For abbreviations, see the legend of Table 9.1. The furthest left leaf is empty, but CD methods can also be used in that case. For more explanation, see text.

      Figure 10.1 Results from multiblock redundancy analysis of the Wine data, showing Y scores (ur) and block-wise weights for each of the four input blocks (A, B, C, D).

      Figure 10.2 Pie chart of the sources of contribution to the total variance (arbitrary sector sizes for illustration).

      Figure 10.3 Flow chart for the NI-SL method.

      Figure 10.4 An illustration of SO-N-PLS, modelling a response using a two-way matrix, X1, and a three-way array, X2.

      Figure 10.5 Path diagram for a wine tasting study. The blocks represent the different stages of a wine tasting experiment and the arrows indicate how the blocks are linked. Source: Næs et al. (2020). Reproduced with permission from Wiley.

      Figure 10.6 Wine data. PCP plots for prediction of block D from blocks A, B, and C. Top: scores and loadings from PCA on the predicted y-values. Bottom: the loadings from projecting the orthogonalised X-blocks (except the first, which is used as is) onto the scores. Source: Romano et al. (2019). Reproduced with permission from Wiley & Sons.

      Figure 10.7 An illustration of the multigroup setup, where variables are shared among X blocks and related to responses, Y, also sharing their own variables.

      Figure 10.8 Decision tree for selecting a supervised method. For more explanation, see text.

      Figure 11.1 Output from use of scoreplot() on a pca object.

      Figure 11.3 Output from use of scoreplot(pot.sca, labels = "names") (SCA scores in 2 dimensions).

      Figure 11.4 Output from use of loadingplot(pot.sca, block = "Sensory", labels = "names") (SCA loadings in 2 dimensions).

      Figure 11.5 Output from use of plot(can.statis$statis) (STATIS summary plot).

      Figure 11.6 Output from use of scoreplot() (ASCA scores in 2 dimensions).

      Figure 11.7 Output from use of scoreplot() (ASCA scores in 1 dimension).

      Figure 11.8 Output from use of loadingplot() (ASCA loadings in 2 dimensions).

      Figure 11.9 Output from use of scoreplot() (block-scores).

      Figure 11.10 Output from use of loadingplot() (block-loadings).

      Figure 11.11 Output from use of scoreplot() and loadingweightplot() on an object from sMB-PLS.

      Figure 11.12 Output from use of maage().

      Figure 11.13 Output from use of maageSeq().

      Figure 11.14 Output from use of loadingplot() on an sopls object.

      Figure 11.15 Output from use of scoreplot() on an sopls object.

      Figure 11.16 Output from use of scoreplot() on a pcp object.

      Figure 11.17 Output from use of plot() on a cvanova object.

      Figure 11.18 Output from use of scoreplot() on a popls object.

      Figure 11.19 Output from use of loadingplot() on a popls object.

      Figure 11.20 Output from use of loadingplot() on a rosa object.

      Figure 11.21 Output from use of scoreplot() on a rosa object.

      Figure 11.22 Output from use of image() on a rosa object.

      Figure 11.23 Output from use of image() with parameter "residual" on a rosa object.

      Figure 11.24 Output from use of scoreplot() on an mbrda object.

      Figure 11.25 Output from use of plot() on an lpls object. Correlation loadings from the blocks are coloured and overlaid on each other to visualise relations across blocks.

       Table 1.1 Overview of methods. Legend: U = unsupervised, S = supervised, C = complex, HOM = homogeneous data, HET = heterogeneous data, SEQ = sequential, SIM = simultaneous, MOD = model-based, ALG = algorithm-based, C = common, CD = common/distinct, CLD = common/local/distinct, LS = least squares, ML = maximum likelihood, ED = eigendecomposition, MC = maximising correlations/covariances. For abbreviations of the methods, see Section 1.11.

       Table 1.2 Abbreviations of the different methods.

       Table 2.1 Formal treatment of types of data scales. The first column refers to the scale-type. The second column gives examples of such scale-types. The third column defines the scale-type in terms of permissible transformations (see text). Finally, the fourth column gives the permissible statistics for the types of scales.

       Table 2.2 Different methods for fusing two data blocks, indicating the properties in terms of explained variation within and between the blocks. The last two columns refer to whether the methods favour explaining within- or between-block variation. For more explanation, see text.

       Table 2.3 The matrices of which the weights w are eigenvectors, in their original form and using the SVDs of X and Y.

       Table 4.1 Overview of the data sets used in the genomics example.

       Table 5.1 Overview of methods. Legend: U = unsupervised, S = supervised, C = complex, HOM = homogeneous data, HET = heterogeneous data, SEQ = sequential, SIM = simultaneous, MOD = model-based, ALG = algorithm-based, C = common, CD = common/distinct, CLD = common/local/distinct, LS = least squares, ML = maximum likelihood, ED = eigendecomposition, MC = maximising correlations/covariances. For abbreviations of the methods, see Section 1.11.

       Table 5.2 Different types of SCA, where Dm is a diagonal matrix and Φ is a positive definite matrix (see Section 2.8). The correlations and variances pertain to the block-scores (see text).

       Table 5.3 Proportions of explained variance per component (C1, C2, …) and total in each of the blocks for the two different methods. Legend: conc is the abbreviation of concatenated; yellow is distinct for TIV; red is distinct for LAIV; green is common (see