Domain-Sensitive Temporal Tagging. Jannik Strötgen

Title: Domain-Sensitive Temporal Tagging
Author: Jannik Strötgen
Series: Synthesis Lectures on Human Language Technologies
ISBN: 9781681731858




between them can always be determined, for example, as before or identical. In general, the relationship can be assumed to be one of the temporal relations defined by Allen [1983] in the context of temporal reasoning. In addition to the equality relation, there are six basic relations, namely before, meets, overlaps, during, starts, and finishes, each of which has an inverse [Allen, 1983]. In Figure 2.1, these relations are visualized following Allen’s presentation.

      Figure 2.1: Temporal information is well-defined so that exactly one of the relations defined by Allen [1983] holds between any two intervals X and Y. Note that every relation except the equality relation has an inverse, so that in total there are 13 possible relations between X and Y.
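      The following is a minimal sketch of how the relation between two intervals could be determined, assuming that intervals are given as numeric start and end points with start < end. The names Interval, INVERSE, and allen_relation, as well as the labels chosen for the inverse relations (e.g., "met-by"), are illustrative assumptions and are not taken from the book.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A proper interval on a timeline (hypothetical helper type)."""
    start: float
    end: float

    def __post_init__(self):
        if not self.start < self.end:
            raise ValueError("an interval needs start < end")


# Labels for the inverses of the six non-equality relations (assumed naming).
INVERSE = {
    "before": "after",
    "meets": "met-by",
    "overlaps": "overlapped-by",
    "during": "contains",
    "starts": "started-by",
    "finishes": "finished-by",
}


def _base_relation(x, y):
    """Return equality or one of the six basic relations if it holds, else None."""
    if x.start == y.start and x.end == y.end:
        return "equals"
    if x.end < y.start:
        return "before"
    if x.end == y.start:
        return "meets"
    if x.start < y.start < x.end < y.end:
        return "overlaps"
    if y.start < x.start and x.end < y.end:
        return "during"
    if x.start == y.start and x.end < y.end:
        return "starts"
    if x.end == y.end and y.start < x.start:
        return "finishes"
    return None


def allen_relation(x, y):
    """Return the single relation (out of 13) that holds between x and y."""
    relation = _base_relation(x, y)
    if relation is not None:
        return relation
    # Otherwise one of the six basic relations holds in the other direction.
    return INVERSE[_base_relation(y, x)]
```

      For example, allen_relation(Interval(1, 3), Interval(3, 5)) yields "meets", and swapping the arguments yields "met-by". The seven labels returned by _base_relation plus the six inverse labels give the 13 relations mentioned in the caption.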

       TEMPORAL INFORMATION CAN BE NORMALIZED

      Regardless of the terms and even the languages used, two temporal expressions that carry the same meaning can be normalized to the same value in some standard format. Thus, temporal information can be considered term- and language-independent. Understanding how temporal expressions can be normalized is an important step toward realizing how temporal information can be exploited in all kinds of application and research scenarios. While we will discuss the details when introducing annotation standards for temporal information in Section 3.1, an example with different temporal expressions carrying the same meaning is shown in Figure 2.2. Note that the expressions are uttered at various reference times (t_ref) and are normalized to the same value on the timeline t.

      Figure 2.2: Temporal information can be normalized; the expressions uttered at various times t_ref have the same value in standard format (2015-10-12). Note that explicit expressions such as “October 12, 2015” are normalized independently of when they are stated. The terms “heute” and “hoy” are German and Spanish translations of “today”.
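      The idea can be sketched in a few lines of Python. This is only an illustration under strong assumptions: the tiny lexicon RELATIVE_OFFSETS and the function normalize are hypothetical, only one explicit date format is handled, and English month names are assumed (the default behavior of strptime when no locale has been set). A real temporal tagger covers far more expressions and languages.

```python
from datetime import date, datetime, timedelta

# Hypothetical mini-lexicon: offset in days relative to the reference time.
RELATIVE_OFFSETS = {
    "today": 0,
    "heute": 0,      # German
    "hoy": 0,        # Spanish
    "yesterday": -1,
    "tomorrow": 1,
}


def normalize(expression, t_ref):
    """Return the ISO 8601 day value (YYYY-MM-DD) of a simple expression.

    expression: the surface string, e.g. "today" or "October 12, 2015"
    t_ref:      a datetime.date giving the time at which it was uttered
    """
    key = expression.strip().lower()
    if key in RELATIVE_OFFSETS:
        # The value depends on the reference time.
        return (t_ref + timedelta(days=RELATIVE_OFFSETS[key])).isoformat()
    # Explicit expression: the reference time is not needed.
    return datetime.strptime(expression.strip(), "%B %d, %Y").date().isoformat()
```

      With this sketch, normalize("heute", date(2015, 10, 12)), normalize("tomorrow", date(2015, 10, 11)), and normalize("October 12, 2015", date(1999, 1, 1)) all return "2015-10-12", which mirrors the point of the figure: different terms, languages, and reference times can lead to the same normalized value.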

       TEMPORAL INFORMATION CAN BE ORGANIZED HIERARCHICALLY

      Temporal expressions can be of different granularities. For example, they can be of granularity day (e.g., “August 3, 1992”), month (e.g., “August 1992”), or year (e.g., “1992”). Because years consist of months and months consist of days, expressions of one granularity (e.g., day) can be mapped to coarser granularities (e.g., month or year) based on the hierarchy of temporal information. In Figure 2.3, this hierarchy is shown using the concept of timelines. A timeline is associated with a specific granularity (e.g., t_day, t_month, t_quarter, t_year) so that expressions of the respective granularity can be placed on it as points in time. Note, however, that a coarse expression represents a point on the timeline of its own granularity (e.g., “August 1992” on t_month) but spans a time interval on timelines of finer granularities (e.g., “August 1992” spans from “August 1, 1992” to “August 31, 1992” on t_day).

      Figure 2.3: Temporal information can be organized hierarchically. The blue triangles show how points on coarser timelines (e.g., “1990s” on t_decade) span an interval on finer timelines (e.g., “1990s” spans from “1990” to “1999” on t_year).
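      Both directions of the hierarchy, mapping a fine-grained point to coarser timelines and expanding a coarse point into an interval on a finer timeline, can be sketched as follows. The function names and the label formats (e.g., "1992-Q3" for a quarter) are assumptions made for this illustration only.

```python
import calendar
from datetime import date


def coarsen(day):
    """Map a point on the day timeline to its points on coarser timelines."""
    return {
        "day": day.isoformat(),
        "month": f"{day.year:04d}-{day.month:02d}",
        "quarter": f"{day.year:04d}-Q{(day.month - 1) // 3 + 1}",
        "year": f"{day.year:04d}",
        "decade": f"{day.year // 10 * 10}s",
    }


def month_span_in_days(year, month):
    """Return the first and last day that a month spans on the day timeline."""
    last_day = calendar.monthrange(year, month)[1]  # number of days in the month
    return date(year, month, 1), date(year, month, last_day)


def decade_span_in_years(decade_start):
    """Return the first and last year that a decade spans on the year timeline."""
    return decade_start, decade_start + 9
```

      For instance, coarsen(date(1992, 8, 3)) places “August 3, 1992” at "1992-08" on the month timeline and at "1990s" on the decade timeline, month_span_in_days(1992, 8) returns August 1 and August 31, 1992, and decade_span_in_years(1990) returns (1990, 1999).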

      There are different types of temporal expressions according to what kind of temporal information an expression refers to, for example, a point in time or a duration. Note that we use the term point in time to refer to an expression if it can be anchored on a timeline of any granularity although, strictly speaking, expressions of coarse granularities span a time interval on finer granularities (cf. Figure 2.3).

      In the context of temporal tagging, it is common practice to distinguish between the following four types of expressions, as specified in the temporal markup language TimeML, which will be detailed in Section 3.1 together with further annotation standards.

      • Date expressions: A date expression refers to a point in time of the granularity “day” (e.g., “July 10, 2015”) or any other coarser granularity, for example, “month” (e.g., “July 2015”) or “year” (e.g., “2015”).

      • Time expressions: A time expression refers to a point in time of any granularity finer than “day” such as a part of a day (e.g., “Friday morning”) or a time of day (e.g., “3:30 pm”).

      • Duration expressions: A duration expression provides information about the length of an interval. Duration expressions can refer to intervals of different granularities (e.g., “three hours” or “five years”). In addition to the length of the interval, the point in time at which the interval starts or ends may also be specified. However, the main semantics of a duration expression concerns the length of the interval.

      • Set expressions: A set expression refers to the periodic aspect of an event, that is, it describes a set of times or dates (e.g., “every Monday”) or a frequency within a time interval (e.g., “twice a week”).

      As mentioned above, date expressions and (coarse) time expressions can also be considered as time intervals, since every such expression consists of smaller temporal units. For example, a single “day” as a point in time consists of hours and could thus be regarded as a duration of the granularity “hour”. However, time and date expressions can be placed on timelines as single points, although the timelines are of different granularities depending on the expressions, as exemplified in Figure 2.3. In contrast, a duration expression cannot be placed on a timeline as a single point, although the point in time when the interval starts or ends might be specified in addition to the length of the interval. Thus, time and date expressions of any granularity are treated as points in time rather than as durations, even though they often have a temporal extent.
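      As a rough illustration of this distinction, the four types and the point-versus-duration treatment could be modeled as below. The class and attribute names are hypothetical, and the examples are the ones used in the text. The text only states that date and time expressions are placed on timelines as points and that durations are not; treating set expressions as non-points here is an assumption of this sketch.

```python
from dataclasses import dataclass
from enum import Enum


class TimexType(Enum):
    """The four expression types distinguished in temporal tagging."""
    DATE = "DATE"
    TIME = "TIME"
    DURATION = "DURATION"
    SET = "SET"


@dataclass(frozen=True)
class TemporalExpression:
    """A temporal expression with its surface text and type (hypothetical)."""
    text: str
    type: TimexType

    def is_point(self):
        # Date and time expressions can be anchored on a timeline of some
        # granularity. Durations cannot; sets are treated likewise here,
        # which is an assumption of this sketch.
        return self.type in (TimexType.DATE, TimexType.TIME)


EXAMPLES = [
    TemporalExpression("July 10, 2015", TimexType.DATE),
    TemporalExpression("July 2015", TimexType.DATE),
    TemporalExpression("3:30 pm", TimexType.TIME),
    TemporalExpression("three hours", TimexType.DURATION),
    TemporalExpression("every Monday", TimexType.SET),
]

# Surface strings of all examples that can be placed on a timeline as points.
POINT_EXAMPLES = [e.text for e in EXAMPLES if e.is_point()]
```

      Here POINT_EXAMPLES contains “July 10, 2015”, “July 2015”, and “3:30 pm”; note that “July 2015” counts as a point even though it spans an interval on the day timeline.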

      Temporal expressions, in particular those of the types “date” and “time”, can be realized in natural language in several different ways. A temporal tagger should cover, and thus extract, the full variety of realizations. Beyond that, a major issue is that the difficulty of normalizing date and time expressions varies significantly with the realization.

      Many different terms have been used in the literature to describe various realizations and characteristics of point expressions, and a brief survey of alternative names and their descriptions is given below. In this book, we use the four types of realizations described by Strötgen [2015], whose names are motivated by observations discussed earlier in the literature. The goal of the four types, however, is to cover those characteristics of point expressions that are particularly relevant for temporal tagging. In Table 2.1, the four categories are shown with sample expressions and an explanation of what information is required for their normalization.

      • Explicit expressions: Explicit expressions are date and time expressions that carry all the information required for their normalization. Thus, no further knowledge or context information is required; the expressions are fully specified and context-independent.