Smarter Data Science. Cole Stryker

Title: Smarter Data Science
Author: Cole Stryker
Genre: Databases
ISBN: 9781119693420




features to include (feature engineering) or which features to exclude (feature selection), this chapter will help you determine which features you'll need for the models that you develop. You'll also learn about the importance of organizing data and the purpose of democratizing data.

      The most advanced algorithms cannot overcome a lack of data. Organizations that seek to prosper from AI by acting upon its revelations must have access to sufficient and relevant data. But even if an organization possesses the data it requires, the organization does not automatically become data-driven. A data-driven organization must be able to trust the data that goes into an AI model as well as the data that comes out of it. The organization then needs to act on that data rather than on intuition, prior experience, or longstanding business policies.

      Practitioners often communicate something like the following sentiment:

       [O]rganizations don't have the historical data required for the algorithms to extract patterns for robust predictions. For example, they'll bring us in to build a predictive maintenance solution for them, and then we'll find out that there are very few, if any, recorded failures. They expect AI to predict when there will be a failure, even though there are no examples to learn from.

       From “Reshaping Business with Artificial Intelligence: Closing the Gap Between Ambition and Action” by Sam Ransbotham, David Kiron, Philipp Gerbert, and Martin Reeves, September 06, 2017 (sloanreview.mit.edu/projects/reshaping-business-with-artificial-intelligence)

      Even if an organization has a defined problem that could be solved by applying machine learning or deep learning algorithms, an absence of data can result in a negative experience if a model cannot be adequately trained. Deep learning models work through hidden neural layers rather than explicit, deterministic rules, so special attention needs to be paid to tracing the decision-making process in order to provide fairness and transparency in line with organizational and legal policies.
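      The predictive maintenance scenario quoted above can be checked for before any modeling starts. The following is a minimal sketch, not a method from the book: the function name `has_enough_failures`, the label values, and the threshold of 50 are all assumptions chosen for illustration, and the maintenance log is made-up data.

```python
from collections import Counter

# Hypothetical threshold: the minimum number of recorded failures wanted
# before attempting to train a model. The right value depends on the problem.
MIN_FAILURE_EXAMPLES = 50


def has_enough_failures(labels, positive_label="failure", minimum=MIN_FAILURE_EXAMPLES):
    """Return (enough, count) for the positive class in a list of labels."""
    counts = Counter(labels)
    count = counts[positive_label]  # Counter returns 0 for a missing label
    return count >= minimum, count


# Illustrative (made-up) maintenance log: 10,000 readings, only 3 failures.
labels = ["ok"] * 9997 + ["failure"] * 3
enough, count = has_enough_failures(labels)
print(f"recorded failures: {count}, enough to train: {enough}")
```

      A check like this does not say how many examples a given algorithm needs; it only makes the gap between expectation and available history visible early.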

      An issue arises as to how to know when it is appropriate to be data-driven. For many organizations, loose terms such as a system of record are qualitative signals that the data should be safe to use. In the absence of being able to apply a singular rule to grade data, other approaches must be considered. The primary interrogatives constitute a reasonable starting point to help gain insight for controlling all risk-based decisions associated with being a data-driven organization.

      Using Interrogatives to Gain Insight

      In Rudyard Kipling's 1902 book Just So Stories, the story of “The Elephant's Child” contains a poem that begins like this:

       I keep six honest serving-men (They taught me all I knew);

       Their names are What and Why and When and How and Where and Who.

      Kipling had codified the six primitive interrogatives of the English language. Collectively, these six words of inquiry—what, where, when, how, why, and who—can be regarded as a means to gain holistic insight into a given topic. It is why Kipling tells us, “They taught me all I knew.”

      The interrogatives became a foundational aspect of John Zachman's seminal 1987 and 1992 papers: “A Framework for Information Systems Architecture” and “Extending and Formalizing the Framework for Information Systems Architecture.” Zachman correlated the interrogatives to a series of basic concepts that are of interest to an organization. While the actual sequence in which the interrogatives are presented is inconsequential and no one interrogative is more or less important than any of the others, Zachman typically used the following sequence: what, how, where, who, when, why.

       What: The data or information the organization produces

       How: A process or a function

       Where: A location or communication network

       Who: A role played by a person or computational agent

       When: A point in time, potentially associated with triggers that are fired or signals that are raised

       Why: A goal or subgoal revealing motivation

      NOTE

       Zachman's article “A Framework for Information Systems Architecture” can be found at ieeexplore.ieee.org/document/5387671. “Extending and Formalizing the Framework for Information Systems Architecture” is available at ieeexplore.ieee.org/document/5387433.

      By using Zachman's basic concepts of the six interrogatives, an organization can begin to understand or express how much the organization knows about something in order to infer a degree of trust and to help foster data-driven processes.
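      One way to express how much is known about a data asset is to record an answer per interrogative and see which ones remain open. This is a sketch under stated assumptions: the class `InterrogativeProfile`, its `coverage` ratio, and the example values are hypothetical, and treating the share of answered interrogatives as a rough signal is an illustration rather than a scoring rule from Zachman or from this book.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class InterrogativeProfile:
    """What is known about a data asset, one field per interrogative."""
    what: Optional[str] = None   # the data or information produced
    how: Optional[str] = None    # a process or a function
    where: Optional[str] = None  # a location or communication network
    who: Optional[str] = None    # a role played by a person or agent
    when: Optional[str] = None   # a point in time
    why: Optional[str] = None    # a goal or subgoal

    def unanswered(self) -> List[str]:
        """Names of the interrogatives that have no answer yet."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def coverage(self) -> float:
        """Fraction of the six interrogatives that have an answer."""
        total = len(fields(self))
        return (total - len(self.unanswered())) / total


# Made-up example: three of the six questions can be answered.
profile = InterrogativeProfile(
    what="monthly sales totals",
    where="finance data warehouse",
    who="regional sales analyst",
)
print(profile.unanswered())  # ['how', 'when', 'why']
print(profile.coverage())    # 0.5
```

      The list of unanswered interrogatives is usually more useful than the ratio itself, because it points to the specific questions that still need an owner.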

      The Trust Matrix

      To help visualize how the six interrogatives, taken together, can support trust and a data-driven approach, they can be mapped onto the x-axis of a trust matrix (shown in Figure 2-1). The y-axis reflects the time horizons: past, present, and future.


      Figure 2-1: Trust matrix
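      The layout of the trust matrix, six interrogatives across and three time horizons down, can be sketched as a small grid. The helper names (`empty_trust_matrix`, `record`, `filled_cells`, `render`) and the notes stored in the cells are assumptions for illustration; the book's figure defines only the two axes.

```python
INTERROGATIVES = ["what", "how", "where", "who", "when", "why"]  # x-axis
TIME_HORIZONS = ["past", "present", "future"]                    # y-axis


def empty_trust_matrix():
    """Build a 3 x 6 grid with no notes recorded in any cell."""
    return {h: {i: None for i in INTERROGATIVES} for h in TIME_HORIZONS}


def record(matrix, horizon, interrogative, note):
    """Store a free-text note in one cell; reject unknown axis values."""
    if horizon not in matrix or interrogative not in matrix[horizon]:
        raise KeyError(f"unknown cell: {horizon!r}, {interrogative!r}")
    matrix[horizon][interrogative] = note


def filled_cells(matrix):
    """Count the cells that currently hold a note."""
    return sum(1 for row in matrix.values() for note in row.values() if note)


def render(matrix):
    """Return a plain-text grid: 'x' marks a cell with a note, '.' an empty one."""
    rows = [" " * 8 + "".join(f"{name:^7}" for name in INTERROGATIVES)]
    for horizon in TIME_HORIZONS:
        marks = ["x" if matrix[horizon][name] else "." for name in INTERROGATIVES]
        rows.append(f"{horizon:<8}" + "".join(f"{mark:^7}" for mark in marks))
    return "\n".join(rows)


# Made-up notes for two cells of the matrix.
trust = empty_trust_matrix()
record(trust, "past", "what", "units sold last quarter")
record(trust, "future", "why", "forecast demand for budgeting")
print(render(trust))
```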

      The past represents something that has occurred. The past is a history and can inform as to what happened, what was built, what was bought, what was collected (in terms of money), and so on. The present is about the now and can inform us as to things that are underway or in motion. The present addresses what is happening, what is being built, who is buying, etc. The future is about things to be. We can prepare for the future by planning or forecasting. We can budget, and we can predict.

      Examining the past can yield hindsight, the present insight, and the future foresight. The spectrum across the time horizons provides the viewpoints for what happened, is happening, and could/will happen. While the divisions are straightforward, the concept of the present can actually span the past and the future. Consider “this year.” This year is part of the present, but the days gone are also part of the past, and the days to come are also part of the future. Normally, the context of inquiry can help to remove any untoward temporal complications.
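      The “this year” example can be made concrete by asking which time horizons a period touches relative to a reference date. This is a small sketch; the function name `horizons_for_period` and the fixed reference date are assumptions chosen so that the example is reproducible.

```python
from datetime import date


def horizons_for_period(start, end, today):
    """Return the time horizons that the inclusive period [start, end] touches."""
    if start > end:
        raise ValueError("start must not be after end")
    horizons = []
    if start < today:
        horizons.append("past")      # some days have already gone by
    if start <= today <= end:
        horizons.append("present")   # the reference date falls inside
    if end > today:
        horizons.append("future")    # some days are still to come
    return horizons


# Assumed reference date, fixed for reproducibility.
today = date(2024, 6, 15)
this_year = horizons_for_period(date(2024, 1, 1), date(2024, 12, 31), today)
last_year = horizons_for_period(date(2023, 1, 1), date(2023, 12, 31), today)
print(this_year)  # ['past', 'present', 'future']
print(last_year)  # ['past']
```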


      Figure 2-2: Breadth and depth slivers

      Conversely, depth is a reflection of detail. The topic of ethnography is addressed here. For example, a person may purchase a product, and if that product is gifted to