Modern Big Data Architectures. Dominik Ryzko

ISBN 9781119597933



solving some of its challenges, the idea never became widely adopted.

      Finally, a few years back, I was able to formulate a more concrete conclusion, which can be used as a working thesis for this book: mainstream computer science is on a convergence path with multi-agent paradigms. To be more specific, the fundamental building blocks of modern information systems have been acquiring the properties attributed to agents in MAS, and thus whole systems have become more adaptive, autonomous, and intelligent. I decided to devote some time to studying these analogies, both by comparing the fundamental assumptions and paradigms and by looking at applications of MAS to various problems in the big data area. This book summarizes that research by taking a journey through modern big data architectures viewed through the eyes of the MAS domain.

      I hope the view taken in this book will be fresh and interesting and will inspire further critical thinking about the evolution of contemporary information systems and the direction they are heading.

      Dominik Ryżko

      Warsaw

      August 2019

      Because the work on this book, from the initial idea to its completion, stretched over several years, it is not possible to mention all the people with whom I discussed the ideas and the book itself during this period. However, a few of them have had significantly more influence on my thoughts and on the final shape of the work.

      Most of all, I want to thank my family for supporting me and accepting the effort and time needed for such an endeavor. I want to thank my supervisors and directors at the Institute of Computer Science, Warsaw University of Technology, Professors Marzena Kryszkiewicz, Henryk Rybiński, Mieczysław Muraszkiewicz, and Jarosław Arabas, for encouraging me to take up this project and for offering valuable advice. Special thanks go to my friend and colleague Bartłomiej Trwardowski, with whom I have spent numerous hours exchanging thoughts and ideas on various scientific topics and who was kind enough to provide feedback on an early draft. Last but not least, I thank my past and future students, who are among the main recipients of this work. Your open and curious minds were a big motivation to make this book insightful, covering the most important ideas while also focusing on practical topics. I hope you will find it so.

      1.1 Motivation

      In recent years, big data has emerged as one of the leading trends not only in computer science but, due to its potential, also in the economy, science, and major branches of industry. People have realized that huge data sets are a key asset that should be taken into account when evaluating business opportunities, company valuations, or product development. Several major mergers and acquisitions in recent years have been driven not only by the pursuit of synergies, customer base, or market access, but also by the desire to obtain valuable customer data. For example, Microsoft's acquisition of LinkedIn gave it data on jobs, skills, career paths, and a contact network of millions of workers across the globe.

      For technology vendors, consultancies, and numerous startups, this rapid growth has opened up huge new business opportunities. According to IDC, the market value of big data and business analytics is expected to grow beyond $200 billion by the year 2020 (Forbes [2017]). These forecasts have fueled huge investments in big data-related research and development, both in academia and in industry, leading to a wide range of proposed architectures, solutions, models, and algorithms, as well as commercial products.

      Large industry players have made the big data concept fundamental to their products, architectures, and strategies. Every day, new ventures emerge that concentrate solely on big data as an opportunity for innovation and growth. Those who failed to follow the trend early now face rising competition and disruption, even in well-established and heavily regulated industries such as banking or insurance, as the growing number of fintech and insurtech ventures shows.

      Academia has been intensively updating curricula to educate the next generation of data scientists, big data engineers, DevOps specialists, etc. The research areas and goals of computer science departments have followed suit. New dedicated MOOCs (Massive Open Online Courses) become available every month and attract thousands of participants. The number of conference tracks and entirely new events devoted to big data analytics and processing is growing rapidly each year.

      Multi-Agent Systems (MAS) use the concept of the agent as the central entity for building systems. The term is often confusing, as it is heavily overloaded even within computer science, not to mention its use in other disciplines such as economics, sociology, and cognitive science. MAS, however, enumerates specifically the properties an agent should have. It should be autonomous, that is, it should make its own decisions based on its internal state, goals, and observations. It should be proactive, acting when it believes it is appropriate and not only when explicitly called. Finally, it should be intelligent in the AI sense, and therefore capable of solving complex tasks and learning from past experience. Building on such components, MAS tries to assemble complex systems in which agents communicate asynchronously and collaboratively solve given tasks.
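
      The agent properties listed above can be made concrete with a small sketch. The example below is only an illustration under stated assumptions, not an implementation taken from any MAS framework: the scenario (a price negotiation), the class names `Agent`, `Seller`, and `Buyer`, the message tuples, and the `negotiate` helper are all hypothetical. It shows autonomy (the seller decides using only its private reserve price), proactivity (the buyer opens the exchange without being called), and asynchronous message passing through per-agent inboxes. Intelligence in the AI sense is not modeled; the buyer merely raises its offer after each rejection.

```python
import asyncio
from typing import Optional


class Agent:
    """Toy agent with a private inbox; all names here are hypothetical."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.inbox: asyncio.Queue = asyncio.Queue()

    async def send(self, other: "Agent", message: tuple) -> None:
        # Asynchronous communication: put the message in the peer's inbox.
        await other.inbox.put(message)


class Seller(Agent):
    def __init__(self, name: str, reserve_price: int) -> None:
        super().__init__(name)
        self.reserve_price = reserve_price  # private internal state

    async def run(self, buyer: Agent) -> None:
        while True:
            kind, value = await self.inbox.get()
            if kind == "quit":
                return
            # Autonomy: the decision depends only on the seller's own state.
            reply = "accept" if value >= self.reserve_price else "reject"
            await self.send(buyer, (reply, value))
            if reply == "accept":
                return


class Buyer(Agent):
    def __init__(self, name: str, first_offer: int, budget: int, step: int) -> None:
        super().__init__(name)
        self.first_offer = first_offer
        self.budget = budget
        self.step = step

    async def run(self, seller: Agent) -> Optional[int]:
        offer = self.first_offer
        # Proactivity: the buyer starts the negotiation on its own initiative.
        while offer <= self.budget:
            await self.send(seller, ("offer", offer))
            reply, value = await self.inbox.get()
            if reply == "accept":
                return value
            # Crude adaptation (an assumption): concede a fixed step per rejection.
            offer += self.step
        await self.send(seller, ("quit", 0))
        return None


async def negotiate(reserve_price: int, first_offer: int, budget: int,
                    step: int = 10) -> Optional[int]:
    """Run one buyer and one seller concurrently; return the agreed price or None."""
    seller = Seller("seller", reserve_price)
    buyer = Buyer("buyer", first_offer, budget, step)
    seller_task = asyncio.create_task(seller.run(buyer))
    price = await buyer.run(seller)
    await seller_task
    return price


if __name__ == "__main__":
    print(asyncio.run(negotiate(reserve_price=70, first_offer=50, budget=100)))  # prints 70
```

      Neither agent reads the other's fields; each sees only the messages it receives, which is the sense in which control is distributed. A real MAS platform would add richer message semantics, agent discovery, and monitoring, none of which this sketch attempts.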

      Even though MAS emerged as a separate field of research much earlier than big data, it has failed to achieve comparable adoption and popularity. We can identify several reasons for this. One is that, until recently, there were no advanced and mature architectures for efficient distributed asynchronous processing. Only in the last decade have the limits of Moore's Law intensified efforts toward parallel computation. Another reason is the radical approach to the distribution of control in MAS. Agents were proposed as highly independent, autonomous, proactive entities communicating through “soft” protocols such as negotiation and argumentation. These assumptions were not in line with the available means of monitoring such systems, and so were unacceptable for several practical industry applications where strict control and risk minimization are key, e.g. energy grid management, financial systems, and traffic monitoring.

      It seems we have arrived at the point where research results achieved in both fields can be combined and can benefit from cross-fertilization of ideas, tools, and architectures. Mobile agents for sensor networks can be applied for real-time analytics in the fast-growing area of the