Rank-Based Methods for Shrinkage and Selection. A. K. Md. Ehsanes Saleh

Title: Rank-Based Methods for Shrinkage and Selection
Author: A. K. Md. Ehsanes Saleh
Genre: Mathematics
ISBN: 9781119625421




12.13 MNIST training with 90 outliers.

       12.14 MNIST training with 180 outliers.

       12.15 MNIST training with 270 outliers.

      It is my pleasure to write this foreword for Professor Saleh’s latest book, “Rank-based Methods for Shrinkage and Selection with Application to Machine Learning”. I have known Professor Saleh for many decades as a leader in Canadian statistics and looked forward to meeting him regularly at the Annual Meeting of the Statistical Society of Canada.

      We are well into the golden age of probability and statistics with the emergence of data science and machine learning. Many decades ago, we could attract many bright students to this field of endeavor but today the interest is overwhelming. The connection between theoretical statistics and applied statistics is an important part of machine learning and data science. In order to engage fully in data science, one needs a solid understanding of both the theoretical and the practical aspects of probability and statistics.

      The book is unique in presenting a comprehensive approach to inference in regression models based on ranks. It starts with the basics, which enables rapid understanding of the innovative ideas in the rest of the book. In addition to the more familiar aspects of rank-based methods such as comparisons among groups, linear regression, and time series, the authors show how many machine-learning tools can be made more robust via rank-based methods. Modern approaches to model selection, logistic regression, neural networks, elastic net, and penalized regression are studied through this lens. The work is presented clearly and concisely, and highlights many areas for further investigation.

      The authors have identified many areas of useful future research that could be pursued by graduate students and practitioners alike. In this regard, this book is an important contribution in the ongoing research towards robust data science.

       Professor N. Reid

       University of Toronto

      June 2021

      The objective of this book is to introduce the audience to the theory and application of robust statistical methodologies using rank-based methods. We present a number of new ideas and research directions in machine learning and statistical analysis that the reader can and should pursue in the future. We begin by noting that the well-known least squares and likelihood principles are traditional methods of estimation in machine learning and data science. One of the most widely read books is An Introduction to Statistical Learning (James et al., 2013), which describes these and other methods. However, it also properly identifies many of their shortcomings, especially in terms of robustness in the presence of outliers. Our book describes a number of novel ideas and concepts to resolve these problems, many of which are worthy of further investigation. Our goal is to motivate more researchers to pursue further work in this field. We build on this motivation to carry out a rigorous mathematical analysis of rank-based penalty estimators.

      Rank regression is based on the linear rank dispersion function described by Jaeckel (1972). The dispersion function replaces the least squares loss function to enable estimates based on the median rather than the mean. This book is intended to guide the reader in this direction starting with basic principles such as the importance of the median vs. the mean, comparisons of rank vs. least squares methods on simple linear problems, and the role of penalty functions in improving the accuracy of prediction. We present new practical methods of data cleaning, subset selection and shrinkage estimation in the context of rank-based methods. We then begin our theoretical journey starting with basic rank statistics for location and simple linear models, and then move on to multiple regression, ANOVA and problems in a high-dimensional setting. We conclude with new ideas not published elsewhere in the literature in the area of rank-based logistic regression and neural networks to address classification problems in machine learning and data science.
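      To make the contrast between the dispersion function and least squares concrete, here is a minimal sketch for a single predictor. It assumes Wilcoxon scores, which the text above does not specify, and it estimates the intercept as the median of the residuals, a common convention rather than something stated here. All function names are illustrative and are not taken from the book.

```python
from statistics import median


def wilcoxon_scores(n):
    """Centered Wilcoxon scores a(k) = sqrt(12) * (k/(n+1) - 1/2), k = 1..n."""
    return [12 ** 0.5 * (k / (n + 1) - 0.5) for k in range(1, n + 1)]


def jaeckel_dispersion(x, y, slope):
    """Dispersion D(slope) = sum_k a(k) * e_(k), with e_(k) the sorted residuals.

    The scores sum to zero, so D does not depend on the intercept.
    """
    residuals = sorted(yi - slope * xi for xi, yi in zip(x, y))
    scores = wilcoxon_scores(len(residuals))
    return sum(a * e for a, e in zip(scores, residuals))


def rank_fit(x, y):
    """Slope minimizing the Wilcoxon dispersion; intercept = median residual."""
    # With Wilcoxon scores, D is proportional to sum_{i<j} |e_i - e_j|, so a
    # minimizer is the weighted median of the pairwise slopes, with weights
    # |x_i - x_j|.
    pairs = []
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            if x[i] != x[j]:
                pairs.append(((y[j] - y[i]) / (x[j] - x[i]), abs(x[j] - x[i])))
    pairs.sort()
    half = sum(w for _, w in pairs) / 2
    running = 0.0
    slope = pairs[0][0]
    for candidate, w in pairs:
        running += w
        slope = candidate
        if running >= half:
            break
    intercept = median([yi - slope * xi for xi, yi in zip(x, y)])
    return intercept, slope


def least_squares_fit(x, y):
    """Ordinary least squares for one predictor, for comparison."""
    xbar = sum(x) / len(x)
    ybar = sum(y) / len(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


# Toy data (made up for illustration): y = 1 + 2x with one gross outlier.
x = [float(i) for i in range(10)]
y = [1.0 + 2.0 * xi for xi in x]
y[-1] = 100.0

rank_intercept, rank_slope = rank_fit(x, y)
ls_intercept, ls_slope = least_squares_fit(x, y)
print("rank fit:         ", rank_intercept, rank_slope)
print("least squares fit:", ls_intercept, ls_slope)
```

      On this toy data the rank fit recovers the line y = 1 + 2x, while the single outlier pulls the least squares slope well above 2. The pairwise-slope shortcut only works for one predictor; with several predictors the dispersion would have to be minimized numerically.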

      We believe that most practitioners today still employ least squares and log-likelihood methods that are not robust in the presence of outliers. This is due to the long history of these estimation methods in statistics and their natural adoption in the machine learning community over the past two decades. However, the history of estimation theory changed course radically many decades earlier, when Stein (1956) and James and Stein (1961) proved that the sample mean based on a sample from a p-dimensional multivariate normal distribution is inadmissible under a quadratic loss function for p ≥ 3. This result gave birth to a class of shrinkage estimators in various forms and set-ups. Due to the immense impact of Stein’s theory, scores of technical papers appeared in the literature covering many areas of application. Beginning in the 1970s, the pioneering work of Saleh and Sen (1978, 1983, 1984a, b, 1985a, b, c, d, e, 1986, 1987) expanded the scope of this class of shrinkage estimators using the “quasi-empirical Bayes” method to obtain robust Stein-type estimators based on R-, L-, and M-estimation. Details are provided in Saleh (2006).
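      For readers who want the inadmissibility result in symbols, it can be sketched in its simplest textbook form. The sketch assumes a single observation X with identity covariance; the superscript JS is a label introduced here for illustration, not the book's notation.

```latex
X \sim N_p(\theta, I_p), \qquad
\hat{\theta}^{\mathrm{JS}} = \left(1 - \frac{p-2}{\lVert X \rVert^{2}}\right) X,
\qquad
E\,\lVert \hat{\theta}^{\mathrm{JS}} - \theta \rVert^{2}
\;<\; p \;=\; E\,\lVert X - \theta \rVert^{2}
\quad \text{for all } \theta \text{ when } p \ge 3 .
```

      Under quadratic loss, the shrunken estimator has strictly smaller risk than X for every θ, which is what inadmissibility of X means.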

      Unlike the RR estimator, LASSO simultaneously selects and estimates variables. It is reminiscent of “subset selection”. The subset selection rule is extremely variable due to its inherent discreteness (Breiman, 1996; Fan and Li, 2001), and it is often trapped in a locally optimal solution rather than the globally optimal one. LASSO is a continuous and stable process; however, it is not recommended in multicollinear situations. Zou and Hastie (2005) proposed a compromise penalty function