Название | Designing Geodatabases for Transportation |
---|---|
Автор произведения | J. Allison Butler |
Жанр | Базы данных |
Серия | |
Издательство | Базы данных |
Год выпуска | 0 |
isbn | 9781589482913 |
Data dependencies and redundancies are inefficient, but data redundancies can be downright dangerous, as they can undermine database integrity. Try to put the same information in more than one place, and you will likely find those places have different information. Put a road’s speed limit in several places, and you must search for places where the speed limit changes. You also have to change them all at once. Otherwise, some users will get different answers when they read the record. Normalization is a big deal with large datasets.
Normalization eliminates redundancies and dependencies so that each piece of data is in only one place for editing. There are five cumulative forms of normalization:
• First Order (First Normal Form, or 1NF) = Attribute domains consist of only scalar values (field contains only a single value) and each row has the same columns
• Second Order (2NF) = 1NF plus every nonkey attribute is irreducibly dependent on the primary key
• Third Order (3NF) = 2NF plus every nonkey attribute is nontransitively dependent on the primary key
• Fourth Order (4NF) = 3NF plus cannot contain two or more independent multivalued facts
• Fifth Order (5NF) = 4NF plus a symmetry constraint
Do not worry about trying to figure out what all this jargon means. An example will make it simple. If you are a computer scientist, you can skip ahead to the end of the chapter.
The example involves an employee database for a national company with offices in multiple cities. The primary key is EmployeeID. The Employee table stores information about each employee, such as where they work.
It is your job to ensure a normalized database design for the Employee table. First Normal Form (1NF) is easy to do. You just have to make sure that each row includes the same columns, and that all the columns contain only scalar values. A scalar value is one with a single value that is atomic, or indivisible. It means here that you cannot just pile a bunch of different people into one row. You get one employee ID, one department, one building, and one city for each employee in the table. There are two skill fields, but they are scalar. 1NF: Mission accomplished.
Table 3.1a Employee table
The Employee table is now in 1NF. The next step, to reach 2NF, removes nonkey dependencies, which means you must eliminate fields that depend on each other rather than the thing you are talking about. The Employee table talks about employees. Reaching 2NF means removing fields from the table that do not relate to an employee. In this case, the city in which the building is located is determined by the building, not the employee. (One city to a building.) So, you can create a lookup table for buildings that lists the building name and the city it is in.
Table 3.1b Employee table
Table 3.1c Building table
Achieving 3NF requires more work. The City field is an attribute that is unique to each building; i.e., building determines city. It turns out that each department is contained in one and only one building, which makes building dependent on department, reducing city to a secondary dependency. Reaching 3NF involves eliminating secondary dependencies, which is where the value for one attribute depends on an attribute that is a nonkey dependency. The Building field has a nonkey dependency in that the value of Dept, not EmployeeID, determines it. A Department table needs to be added to the design.
Table 3.1d Employee table
Table 3.1e Department table
Table 3.1f Building table
Incidentally, you cannot simply turn the Building table into a Department table. That would cause the Building table to violate 3NF. The City field would have a secondary dependency on DeptID.
4NF removes redundant columns, like the two employee skill columns in the Employee table. The original design with two skill columns would force you to look in both columns to see if an employee had a particular skill. You need to work smarter, not harder, and move employee skills to another table where you can have multiple records for each employee, one row for each skill an employee possesses. The database design shown below is in 4NF. You now have four tables, one for assigning employees to departments, one for identifying the building in which a department is located, one for saying where a building is located, and one for describing the special skills each employee possesses. The database design is now fully normalized because there are no redundancies or secondary dependencies.
employeeID | dept |
101 | shipping |
104 | personnel |
102 | sales |
106 | shipping |
Table 3.1g Employee table
Table 3.1h Department table
Table 3.1i Building table
Table 3.1j EmployeeSkill table
Normalization puts each piece of information in one place. If you move the Personnel Department to a new building in Chicago, updating the entire list of employee locations requires only a single change in the Department table. With the original design, you would have needed to change the record for each employee in that department.
Linear transportation facilities are segmented to provide a one-to-one relationship between attribute records and geometry. Such a segmented transportation database is not normalized in that changing a single attribute will often require that multiple records be updated. Consider the following example Street table rows, where SegmentID is the primary key.
Table 3.2a STREET table
There are four rows with information about Sesame Street, which is segmented according to address block. The STREET table has redundancies.