Title | Introduction to Python Programming for Business and Social Science Applications |
---|---|
Author | Frederick Kaefer |
Genre | Foreign business literature |
Series | |
Publisher | Foreign business literature |
Year of publication | 0 |
ISBN | 9781544377452 |
Lessons learned: In this section, we learned how to write and execute Python code that we save in plain text files. Using files enables the storing and execution of many lines of code as a program that we can save and run later.
Package Managers
A package manager is a program that installs libraries of code. These libraries, or packages, contain previously developed code. Once a package is installed, its code is available to other Python code, saving a great deal of time and effort. Using a package not only avoids “reinventing the wheel” but also usually brings the benefit of prior development and testing by an entire community of developers. Python comes with a package manager named pip already installed (in Versions 3.4 and later). The Python Packaging Authority (PyPA) maintains pip and its documentation, which is online at https://pip.pypa.io/en/stable/. We will be using pip to install several packages throughout this textbook.
Another way to set up a Python development environment is to install a Python distribution, such as the Anaconda distribution, found at the following URL: https://www.anaconda.com/download/. Python distributions are alternative bundles, that is, modified packagings of Python that include additional functionality. Alternative bundles may not include the latest versions of Python or other libraries and are not maintained by the core Python team (Python Software Foundation, 2019, “Alternative bundles”). In this textbook, we use the pip package manager to install individual packages. Learning to install individual packages is an important skill for Python programmers because it enables the use of packages developed within an organization as well as packages that are not included in any Python distribution.
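Before installing a package, it can help to check whether Python can already find it. The following is a minimal sketch using only the standard library. The function name is_installed is our own illustration, and pandas is named only as an arbitrary example of a third-party package. A typical way to install a missing package is to run python -m pip install followed by the package name at a command prompt.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate a top-level module with this name."""
    return importlib.util.find_spec(module_name) is not None


# "json" is part of Python's standard library, so it should always be found.
print(is_installed("json"))

# "pandas" is a third-party package used here only as an example. This prints
# False until the package has been installed, for example by running
#   python -m pip install pandas
# at a command prompt (not inside the Python shell).
print(is_installed("pandas"))
```

The check looks only for a module that Python can import. It does not report which version is installed or whether pip was the tool that installed it.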
Lessons learned: In this section, we learned how to use package managers for convenient organization and management of libraries of code. Learning to install individual packages is an important skill to take advantage of the Python packages for both business and social sciences purposes that are available to the Python programming community.
Data Sets Used Throughout the Book
We use two data sets throughout this book to illustrate numerous issues faced when working with data. The data sets are the City of Chicago’s Taxi Trips data set and data from the General Social Survey. We begin with some simple examples in the next chapter to become acquainted with the nature of the data in these data sets, and in later chapters, we work directly with files containing the data sets as well as retrieve the data directly from the World Wide Web. These data sets provide the basis for our coverage of topics later in the book, including statistical analysis, data visualization, and machine learning.
Taxi Trips Data Set
The Chicago Taxi Trips data set has over 100 million records with 26 fields (variables) per record. Table 1.2 presents a subset of fields and their meaning as described in the Taxi Trips documentation (Levy, 2017). We will later see that the formatting of the data in the Taxi Trips data set presents some challenges when working with the data in Python. On a positive note, the challenges of working with real data provide a way to gain practical insight into programming with Python. Table 1.3 has sample data selected from the Taxi Trips data set that correspond to the fields in Table 1.2.
Table 1.2
Table 1.3
General Social Survey (GSS) Data Set
The General Social Survey has over 5,000 variables collected over a period of more than 40 years. You can explore the data online using a data explorer or download the complete data sets (http://www.gss.norc.org/Get-The-Data). Table 1.4 presents a subset of fields from the GSS and their meaning as described in the GSS Codebook (Smith, Davern, Freese, & Hout, 1972–2016), which is available at http://gss.norc.org/get-documentation.
Table 1.4
Table 1.5
The sample data shown in Table 1.5 is in ascending order of the ID value for each record. Unlike the Trip_ID in the Taxi Trips data set, the ID value is not unique in the GSS data, as we can see by the duplication of both ID 10 and ID 16 in the data in Table 1.5. In the GSS data, it is the combination of the YEAR and ID fields that is unique (we call using several fields to uniquely identify a record in a data set a composite identifier or composite key). For example, the respondent with ID 10 in YEAR 1990 is not the same as the respondent with ID 10 in YEAR 1991. Another important difference is that the data in the GSS all appear to be numeric; however, the values are not all quantitative. For example, the values for HAPPY are coded responses to a survey where 1 = very happy, 2 = pretty happy, and 3 = not too happy. Another important point is that the values for REALINC are not actually continuous (even though they might appear to be) but are discrete. These values correspond to the midpoints of income ranges specified in a survey, and the values prior to 1986 have been recoded in six-digit numbers and converted to 1986 dollars (Ligon, 1994; Smith et al., 1972–2016).
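The ideas of a composite key and of coded values can be sketched in a few lines of Python. The records below are hypothetical and are not rows from Table 1.5; only the field names YEAR, ID, and HAPPY and the three HAPPY codes come from the discussion above. The helper name is_unique_key is also our own. Later chapters introduce the Python elements used here.

```python
from collections import Counter

# Hypothetical records for illustration only (not taken from Table 1.5).
records = [
    {"YEAR": 1990, "ID": 10, "HAPPY": 1},
    {"YEAR": 1991, "ID": 10, "HAPPY": 2},
    {"YEAR": 1990, "ID": 16, "HAPPY": 3},
    {"YEAR": 1991, "ID": 16, "HAPPY": 2},
]

# The HAPPY values are codes for survey responses, not quantities.
HAPPY_LABELS = {1: "very happy", 2: "pretty happy", 3: "not too happy"}


def is_unique_key(rows, fields):
    """Return True if the combination of fields identifies each row uniquely."""
    counts = Counter(tuple(row[field] for field in fields) for row in rows)
    return all(count == 1 for count in counts.values())


# ID alone repeats across years, so it cannot identify a record by itself.
id_alone_unique = is_unique_key(records, ["ID"])

# The combination of YEAR and ID (a composite key) is unique in these records.
year_id_unique = is_unique_key(records, ["YEAR", "ID"])

# Translate each numeric code into the response it stands for.
happy_text = [HAPPY_LABELS[row["HAPPY"]] for row in records]

print(id_alone_unique)
print(year_id_unique)
print(happy_text)
```

Because the HAPPY codes stand for categories, replacing them with their labels makes it clear that arithmetic such as averaging the codes needs careful interpretation.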
Lessons learned: In this section, we learned about the Chicago Taxi Trips and General Social Survey data sets, which we will use throughout the text.
Chapter Summary
In this chapter, we learned that Python is free and open-source software (FOSS) and that more than 212,000 projects with packages written in Python are available to use and modify in the Python Package Index. The specific goal of this book is to teach Python programming to those in the fields of social sciences and business to develop applications using Python packages for data analytics. We next learned how to install Python on our computer and that there are different versions of Python. We also learned that using different operating systems and different versions of Python can affect how we write and execute Python code. We then learned how to write and execute Python code in the IDLE shell window and how to write and execute Python code that we save in plain text files. Using files enables the storing and execution of many lines of code as a program that we can save and run later. We also learned how to use package managers for convenient organization and management of libraries of code. Learning to install individual packages is an important skill to take advantage of the many Python packages for both business and social sciences purposes that are available to the Python programming community. Last, we learned about the Chicago Taxi Trips and General Social Survey data sets, which we will use throughout the text.
In the next chapter, we will cover the basic elements of Python code, using the IDLE IDE to illustrate the outcomes of executing each code example. All coding examples in the textbook (including all Stop, Code, and Understand! exercises and their solutions) are available