Multiblock Data Fusion in Statistics and Machine Learning. Tormod Næs

Title: Multiblock Data Fusion in Statistics and Machine Learning
Author: Tormod Næs
Genre: Chemistry
Publisher: Chemistry
ISBN: 9781119600992




Green arrows are variables of Xm; red arrow is the consensus component t; blue arrow is the common component tm. Dotted lines represent projections.

      Figure 5.7 The logistic function η(θ) = (1 + exp(−θ))⁻¹ visualised. Only the part between [−4, 4] is shown, but the function goes from −∞ to +∞.

      Figure 5.8 CNA data visualised. Legend: (a) each line is a sample (cell line), blanks are zeros and black dots are ones; (b) the proportion of ones per variable illustrating the unbalancedness. Source: Song et al. (2021). Reproduced with permission of Elsevier.

      Figure 5.9 Score plot of the CNA data. Legend: (a) scores of a logistic PCA on CNA; (b) consensus scores of the first two GSCA components of a GSCA model (MITF is a special gene). Source: Smilde et al. (2020). Licensed under CC BY 4.0.

      Figure 5.10 Plots for selecting numbers of components for the sensory example. (a) SCA: the curve represents cumulative explained variance for the concatenated data blocks. The bars show how much variance each component explains in the individual blocks. (b) DISCO: each point represents the non-congruence value for a given target (model). The plot includes all possible combinations of common and distinct components based on a total rank of three. The horizontal axis represents the number of common components and the numbers in the plot represent the number of distinct components for SMELL and TASTE, respectively. (c) PCA-GCA: black dots represent the canonical correlation coefficients between the PCA scores of the two blocks (×100) and the bars show how much variance the canonical components explain in each block. Source: Smilde et al. (2017). Reproduced with permission of John Wiley and Sons.

      Figure 5.11 Biplots from PCA-GCA, showing the variables as vectors and the samples as points. The samples are labelled according to the design factors flavour type (A/B), sugar level (40, 60, 80) and flavour dose (2, 5, 8). The plots show the common component (horizontal) against the first distinct component for each of the two blocks. Source: Smilde et al. (2017). Reproduced with permission of John Wiley and Sons.

      Figure 5.12 Amount of explained variation in the SCA model (a) and PCA models (b) of the medical biology metabolomics example. Source: Smilde et al. (2017). Reproduced with permission of John Wiley and Sons.

      Figure 5.13 Amount of explained variation in the DISCO and PCA-GCA model. Legend: C-ALO is common across all blocks; C-AL is local between block A and L; D-A, D-O, D-L are distinct in the A, O and L blocks, respectively. Source: Smilde et al. (2017). Reproduced with permission of John Wiley and Sons.

      Figure 5.14 Scores (upper part) and loadings (lower part) of the common DISCO component. Source: Smilde et al. (2017). Reproduced with permission of John Wiley and Sons.

      Figure 5.16 True design used in mixture preparation (blue) versus the columns of the associated factor matrix corresponding to the mixtures mode extracted by the JIVE model (red). Source: Acar et al. (2015). Reproduced with permission of IEEE.

      Figure 5.17 True design used in mixture preparation (blue) versus the columns of the associated factor matrix corresponding to the mixtures mode extracted by the ACMTF model (red). Source: Acar et al. (2015). Reproduced with permission of IEEE.

      Figure 5.18 Example of the properties of group-wise penalties. Left panel: the family of group-wise L-penalties. Right panel: the GDP penalties. The x-axis shows the L2 norm of the original group of elements to be penalised; the y-axis shows the value of this norm after applying the penalty. For more explanation, see text. Source: Song et al. (2021). Reproduced with permission of John Wiley and Sons.

      Figure 5.19 Quantification of modes and block-association rules. The matrix V ‘glues together’ the quantifications T and P using the function f = (T, P, V) to approximate X.

      Figure 5.20 Linking the blocks through their quantifications.

      Figure 5.21 Decision tree for selecting an unsupervised method for the shared variable mode case. For abbreviations, see the legend of Table 5.1. For more explanation, see text.

      Figure 5.22 Decision tree for selecting an unsupervised method for the shared sample mode case. For abbreviations, see the legend of Table 5.1. For more explanation, see text.

      Figure 6.1 ASCA decomposition for two metabolites. The break-up of the original data into factor estimates due to the factors Time and Treatment is shown.¹

      Figure 6.2 A part of the ASCA decomposition. Similar to Figure 6.1 but now for 11 metabolites.

      Figure 6.3 The ASCA scores on the factor light in the plant example (panel (a); expressed in terms of increasing amount of light) and the corresponding loading for the first ASCA component (panel (b)).

      Figure 6.4 The ASCA scores on the factor time in the plant example (panel (a)) and the corresponding loading for the first ASCA component (panel (b)).

      Figure 6.5 The ASCA scores on the interaction between light and time in the plant example (panel (a)) and the corresponding loading for the first ASCA component (panel (b)).

      Figure 6.6 PCA on toxicology data. Source: Jansen et al. (2008). Reproduced with permission of John Wiley and Sons.

      ¹ We thank Frans van der Kloet for making these figures.

      Figure 6.7 ASCA on toxicology data. Component 1: left; component 2: right. Source: Jansen et al. (2008). Reproduced with permission of John Wiley and Sons.

      Figure 6.9 Permutation example. Panel (a): null-distribution for the first case with an effect (with size indicated with red vertical line). Panel (b): the data of the case with an effect. Panel (c): the null-distribution of the case without an effect and the size (red vertical line). Panel (d): the data of the case with no effect. Source: Vis et al. (2007). Licensed under CC BY 2.0.

      Figure 6.10 Permutation test for the factor light (panel (a)) and interaction between light and time (panel (b)). Legend: blue is the null-distribution and effect size is indicated by a red vertical arrow. SSQ is the abbreviation of sum-of-squares.

      Figure 6.11 ASCA candy scores from candy experiment. The plot to the left is based on the ellipses from the residual approach in Friendly et al. (2013). The plot to the right is based on the method suggested in Liland et al. (2018). Source: Liland et al. (2018). Reproduced with permission of John Wiley and Sons.

      Figure 6.12 ASCA assessor scores from candy experiment. The plot to the left is based on the ellipses from the residual approach in Friendly et al. (2013). The plot to the right is based on the method suggested in Liland et al. (2018). Source: Liland et al. (2018). Reproduced with permission of John Wiley and Sons.

      Figure 6.13 ASCA assessor and candy loadings from the candy experiment. Source: Liland et al. (2018). Reproduced with permission of John Wiley and Sons.

      Figure 6.14 PE-ASCA of the NMR metabolomics of pig brains. Stars in the score plots are the factor estimates and circles are the back-projected individual measurements (Zwanenburg et al., 2011). Source: Alinaghi et al. (2020). Licensed under CC BY 4.0.

      Figure 6.15 Tree for selecting an ASCA-based method. For abbreviations, see the legend of Table 6.1; BAL = Balanced data, UNB = Unbalanced data. For more explanation, see text.

      Figure 7.1 Conceptual illustration of the