Multiblock Data Fusion in Statistics and Machine Learning. Tormod Næs.

Title	Multiblock Data Fusion in Statistics and Machine Learning
Author	Tormod Næs
Genre	Chemistry
Publisher	Chemistry
ISBN	9781119600992

decompositions (ED) or maximising covariance or correlations (MC).

The first item (A) is used to organise the different chapters. Some methods can deal with data of different measurement scales (heterogeneous data), whereas others can only handle homogeneous data (B). The difference between simultaneous and sequential methods (C) is explained in more detail in Chapter 2. Some methods are defined by a clear model and others by an algorithm (D). The already discussed topic of common and distinct variation (E) is also an important distinguishing feature of the methods, and the sections in some of the chapters are organised according to this principle. Finally, there are different ways of estimating the parameters (weights, scores, loadings, etc.) of the multiblock models (F); this is also explained in more detail in Chapter 2.

Table 1.1 is an example of such a table for Chapter 6. It presents a bird's-eye view of the properties of the methods. Each chapter discussing methods will start with a table of this kind to set the scene. We will end most chapters with some recommendations for practitioners on which method to use in which situation.

Table 1.1 Overview of methods. Legend: U = unsupervised, S = supervised, C = complex, HOM = homogeneous data, HET = heterogeneous data, SEQ = sequential, SIM = simultaneous, MOD = model-based, ALG = algorithm-based, C = common, CD = common/distinct, CLD = common/local/distinct, LS = least squares, ML = maximum likelihood, ED = eigendecomposition, MC = maximising correlations/covariances. For abbreviations of the methods, see Section 1.11.

		A			B		C		D		E			F
Method	Section	U	S	C	HOM	HET	SEQ	SIM	MOD	ALG	C	CD	CLD	LS	ML	ED	MC
ASCA	6.1
ASCA+	6.1.3
LiMM-PCA	6.1.3
MSCA	6.2
PE-ASCA	6.3

1.10 Notation and Terminology

Throughout this book, we will make use of the following generic notation. When needed, extra notation is explained in local paragraphs. For notational ease, we will not make a distinction between population and estimated weights, scores and loadings, which is the tradition in chemometrics and data analysis. For regression equations, we do make that distinction where it is natural, and there we will use the symbols $\hat{b}$ and $\hat{y}$ for the estimated parameters and fitted values, respectively.

$x$	a scalar
$\mathbf{x}$	column vector: bold lowercase
$\mathbf{X}$	matrix: bold uppercase
$\mathbf{X}^t$	transpose of $\mathbf{X}$
$\underline{\mathbf{X}}$	three-way array: bold uppercase underlined
$m = 1,\ldots,M$	index for block
$i_m = 1,\ldots,I_m$	index for first way (e.g., sample) in block $m$ (not shared first way)
$i = 1,\ldots,I$	index for first shared way of blocks
$j_m = 1,\ldots,J_m$	index for second way (e.g., variable) in block $m$ (not shared second way)
$j = 1,\ldots,J$	index for second shared way of blocks
$r = 1,\ldots,R$	index for latent variables/principal components
$\mathbf{R}$	matrix used to compute scores for PLS
$\mathbf{X}_m$	block $m$
$\mathbf{x}_{mi}$	$i$-th row of $\mathbf{X}_m$ (a column vector)
$\mathbf{x}_{mj}$	$j$-th column of $\mathbf{X}_m$ (a column vector)
$\mathbf{W}$	matrix of weights
$\mathbf{I}_L$	identity matrix of size $L \times L$
$\mathbf{T}$	score matrix
$\mathbf{P}$	loading matrix
$\mathbf{E}, \mathbf{F}$	matrices of residuals
$\mathbf{1}_L$	column vector of ones of length $L$
$\mathrm{diag}(\mathbf{D})$	column vector containing the diagonal of $\mathbf{D}$
⊗	Kronecker product
⊙	Khatri–Rao product (column-wise Kronecker product)
*	Hadamard or element-wise product
⊕	Direct sum of spaces
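
To make the three matrix products in the list above concrete, the following minimal sketch computes each of them with NumPy on two small matrices. The matrices `A` and `B`, the helper name `khatri_rao`, and the choice L = 3 are assumptions made purely for illustration; they do not come from the book.

```python
import numpy as np

# Hypothetical 2 x 2 matrices, chosen only to keep the results easy to check.
A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])


def khatri_rao(a, b):
    """Column-wise Kronecker product of two matrices with equal column counts."""
    if a.shape[1] != b.shape[1]:
        raise ValueError("both matrices need the same number of columns")
    # Column r of the result is the Kronecker product of column r of a and of b.
    return np.column_stack(
        [np.kron(a[:, r], b[:, r]) for r in range(a.shape[1])]
    )


kron_AB = np.kron(A, B)        # Kronecker product, shape (4, 4)
kr_AB = khatri_rao(A, B)       # Khatri-Rao product, shape (4, 2)
hadamard_AB = A * B            # Hadamard (element-wise) product, shape (2, 2)

I_L = np.eye(3)                # identity matrix of size L x L, here L = 3
ones_L = np.ones((3, 1))       # column vector of ones of length L
diag_A = np.diag(A).reshape(-1, 1)  # diagonal of A as a column vector
```

With R columns in each factor, column r of the Khatri–Rao product is column r·R + r (counting from zero) of the Kronecker product, so here `kr_AB` equals `kron_AB[:, [0, 3]]`.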

When
