Practical Data Analysis with JMP, Third Edition. Robert Carver

ISBN 9781642956122




set of data, we will summarize life expectancy in different parts of the world. Don’t worry about the details of these steps. The goal right now is just for you to see a typical JMP platform and its output.

      Windows users: the next instruction asks you to select an option from the Analyze menu, but there is no visible menu bar in the Graph Builder window. At the top of the window, just above Graph Builder, find the gray horizontal bar with three dots. (See Figure 1.7.) Hover over the bar and the menus will appear.

      1. Select Analyze ► Fit Y by X. This analysis platform lets us plot one variable (life expectancy) versus another (region).

      Why “fit” Y by X? Analysts often speak of fitting an abstract or theoretical model to a set of data. We can think of models as common or standard patterns of variation, and the process of model fitting begins with exploring how a Y column varies across categories or values of an X column.

      Figure 1.8: Fit Y by X Dialog Box

      By design, the initial output of a JMP analysis platform includes one or more graphs. In this case, the initial report includes only a graph, as shown in Figure 1.9.

      Figure 1.9: Initial Report of Life Expectancy by Region

      We saw a very similar graph earlier in Graph Builder, although in this graph the points are not initially jittered. This graph has two additional features. First, the horizontal line between 70 and 75 years marks the mean (average) of all the values. Second, the note in the lower left reports 18 Missing Rows, which means that 18 countries did not report life expectancy data.

      The overall grand mean of all countries does not really describe any of the continental regions. We might want to dig deeper and display the regional averages.

      3. Click the red triangular hotspot in the upper left next to Oneway Analysis, choose Display Options, and then check Connect Means. If you wish, turn on the Points Jittered option as well.

      Look again at the modified graph. The new blue line on your graph represents the mean life expectancy of the countries in each region. As a group, the nations of North America appear to have the longest life expectancies, whereas countries in South Asia and Sub-Saharan Africa have far shorter life expectancies. The visual comparison of means is revealing, but suppose we want to know the numerical values of the seven averages.

      4. Click the red triangle once more, and this time choose Means and Std Dev (standard deviations).

      This will generate a table of values beneath the graph, as shown in Figure 1.10. For the current discussion, we will focus our attention only on the first three columns, which report the number of countries and the mean for each region. Later in the book, we will learn the meaning of the other columns.

      Figure 1.10: Table of Means and Standard Deviations
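      The kind of summary that the Means and Std Dev option produces can be sketched outside JMP. The short Python example below is only an illustration under stated assumptions: the region names and values are made up (they are not the Life Expectancy data), the helper name `summarize` is hypothetical, and the sample standard deviation (n − 1 divisor) is assumed. It shows how a grand mean can differ from the group means, and how missing values are counted separately rather than entering the calculations.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical (region, life expectancy) pairs, NOT the book's data.
# None marks a country that did not report a value (a "missing row").
rows = [
    ("Region A", 80.0),
    ("Region A", 78.0),
    ("Region B", 62.0),
    ("Region B", 66.0),
    ("Region B", None),
    ("Region C", 71.0),
    ("Region C", 73.0),
    ("Region C", 75.0),
]


def summarize(rows):
    """Return {region: (count, mean, sample std dev)}, skipping missing values."""
    groups = defaultdict(list)
    for region, value in rows:
        if value is not None:
            groups[region].append(value)
    return {
        region: (len(values), mean(values), stdev(values))
        for region, values in groups.items()
    }


summary = summarize(rows)

# The grand mean pools every reported value, ignoring the grouping.
reported = [value for _, value in rows if value is not None]
grand_mean = mean(reported)

# Missing values are counted, not averaged.
n_missing = sum(1 for _, value in rows if value is None)

for region, (n, avg, sd) in sorted(summary.items()):
    print(f"{region}: n={n}, mean={avg:.2f}, std dev={sd:.2f}")
print(f"Grand mean: {grand_mean:.2f}; missing rows: {n_missing}")
```

      In this toy example the grand mean falls between the group means and matches none of them, which is the same reason the text suggests looking at regional averages rather than the single overall line.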

      Our data table contains 1,075 cells: five variables with 215 observations each, arrayed in five columns and 215 rows. One guiding principle in statistical analysis is that we generally want to use all of our data. We do not casually discard or omit any portion of the data that we have collected (often at substantial effort or expense). There are times, however, when we might want to focus attention on a portion of the data table or examine the impact of a small number of extraordinary observations.

      By default, when we analyze one or more variables using JMP, every observation is included in the resulting graphs and computations. You can use row states to confine the analysis to particular observations or to highlight certain observations in graphs.

      There are four fundamental row states in JMP. Rows can be:

      ● Selected: selected rows appear bolded or otherwise highlighted in a graph.

      ● Excluded: when you exclude rows, those observations are temporarily omitted from calculated statistics such as the mean. The rows remain in the data table, but as long as they are excluded, they play no role in any computations.

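      ● Hidden: when you hide rows, those observations do not appear in graphs, but are included in any calculations such as the mean.

      ● Labeled: when you label rows, a label (by default, the row number) appears next to the corresponding points in graphs.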
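      One way to keep the row states straight is to think of each one as an independent on/off flag attached to a row. The Python sketch below models only that idea; it is not how JMP is implemented, and the names `Row`, `rows_for_statistics`, and `rows_for_graph` are hypothetical. It also assumes that excluding a row does not by itself remove it from a graph, which the list above does not address.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Row:
    value: float
    selected: bool = False  # highlighted in graphs
    excluded: bool = False  # omitted from calculated statistics
    hidden: bool = False    # omitted from graphs


def rows_for_statistics(rows):
    """Rows that take part in computations: everything that is not excluded."""
    return [r for r in rows if not r.excluded]


def rows_for_graph(rows):
    """Rows that are drawn: everything that is not hidden.

    Assumption: exclusion alone does not remove a point from the graph.
    """
    return [r for r in rows if not r.hidden]


# Four made-up observations.
table = [Row(60.0), Row(70.0), Row(80.0), Row(90.0)]
table[0].excluded = True  # still in the table, but ignored by the mean
table[1].hidden = True    # not drawn, but still counted in the mean

stat_mean = mean([r.value for r in rows_for_statistics(table)])
plotted = [r.value for r in rows_for_graph(table)]

print(stat_mean)  # mean of 70, 80, 90
print(plotted)    # values of the rows that would be drawn
```

      Because the flags are independent, a row can be both excluded and hidden, in which case it would drop out of the statistics and the graph alike.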

      Let’s see how the row states change the output that we have already run by altering the row states of rows 3 and 4.

      1. First, arrange the open windows so that you can clearly see both the Fit Y by X report window and the data table. Then click anywhere in the data table window to make it the active window.

      2. Move your cursor into the column of row numbers in the data table. Within this column your cursor will become a “fat cross.” Select rows 3 and 4 by clicking and dragging on the row numbers 3 and 4. You will see the two rows highlighted within the data table.

      Look at your graph. Almost all the points are dim, except for two bright dots—one above the mean value of Latin America & Caribbean and the other well below the mean of South Asia. That is the effect of selecting these rows. Notice also that the Rows panel in the Life Expectancy data window now shows that two rows have been selected.

      3. Click on another row, and then drag your mouse slowly down the column of row numbers. Do you notice the rows highlighted in the table and the corresponding data points “lighting up” in the graph?

      4. Press Esc or click in the triangular area above the row numbers in the data table to deselect all rows.

      Next, we will exclude two observations and show that the calculated statistics change when they are omitted from the computations. To see the effect, we first need to instruct JMP to automatically recalculate statistics when the data table changes.

      5. Click the red triangle next to Oneway Analysis in the report window and choose Redo ► Automatic Recalc.

      6. Now let’s exclude rows 3 and 4 from the calculations. To do this, first select them as you did before.

      7. Select Rows ► Exclude/Unexclude (you can also find this option by clicking the red triangle above the row numbers; in Windows, you could also right-click). This will exclude the rows.

      Now look at the analysis output. The number of observations in the Latin America & Caribbean region drops from 36 to 35, and the mean value for that group has changed very slightly. Likewise, in South Asia we have 7 rather than 8 observations, and mean life expectancy increased from 70.82375 years to 71.791857 years. Toggle between the exclude and unexclude states of these two rows until you understand clearly what happens when you exclude observations.
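      The change in the South Asia mean can be checked with a little arithmetic. A mean is a total divided by a count, so the two reported means imply the value of the country that was excluded. The Python sketch below uses only the four numbers quoted above. The result is an implied value subject to rounding in the reported means, not a figure read from the data table, and the variable names are illustrative.

```python
# Figures reported in the text for South Asia.
n_before, mean_before = 8, 70.82375    # all rows included
n_after, mean_after = 7, 71.791857     # one row excluded

# A mean times its count recovers the group total.
total_before = n_before * mean_before
total_after = n_after * mean_after

# The difference between the totals is the excluded observation
# (approximately, because the reported means are rounded).
implied_value = total_before - total_after

print(f"Implied life expectancy of the excluded country: {implied_value:.3f}")
print(f"Below the original group mean: {implied_value < mean_before}")
```

      The implied value is several years below the original South Asia mean, which is consistent with the selected point that appeared well below that mean in the graph. Removing a below-average observation is what pushes the group mean up.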

      8. Finally, let’s hide the rows. First, be sure to unexclude rows 3 and 4 so that all points appear in the graph and in the calculations. If you are not sure whether you have reversed the earlier actions, choose Rows ► Clear Row States and then confirm in the Rows panel that the four row state categories show 0 rows.

      9. Once again select rows 3 and 4 and choose Rows ► Hide/Unhide. This will hide the rows (check out the very cool dark glasses icon).

      Look closely at the graph and at the table of means. It is a subtle change, but the two bright dots are gone, leaving only dim points. The numbers in the table of means are unaffected by hiding points. If you toggle the Hide/Unhide state, you will notice the two highlighted points come and go, but the number of observations in each region is stable.