Deep Learning Approaches to Text Production. Shashi Narayan

Series: Synthesis Lectures on Human Language Technologies
ISBN: 9781681738215




more complex MR-to-text transformations. They are, in particular, prevalent when the input MR is an Abstract Meaning Representation (AMR) or a deep unordered dependency tree.

      For AMR-to-text generation, Flanigan et al. [2016] convert input graphs to trees by splitting reentrancies (nodes with multiple parents) and then translate the trees into sentences using a tree-to-string transducer. Song et al. [2017] use a synchronous node replacement grammar to simultaneously parse input AMRs and generate sentences. Pourdamghani et al. [2016] first linearise input graphs by breadth-first traversal and then use a phrase-based machine translation (MT) system to generate text from the resulting linearised sequences. Castro Ferreira et al. [2017] frame generation as a translation task, comparing two MT approaches (phrase-based and neural MT) and providing a systematic study of the impact of three AMR pre-processing steps (delexicalisation, compression, and linearisation) applied before the MT phase.
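
      To make the linearisation step concrete, the Python sketch below flattens a small AMR-like graph by breadth-first traversal. The graph encoding (CONCEPTS, EDGES) and the output format are illustrative assumptions, not the exact scheme of Pourdamghani et al. [2016]. In particular, this flat sequence drops the bracketing that would record which parent each role belongs to, and a reentrant node is emitted as its bare variable.

```python
from collections import deque

# Hypothetical toy AMR for "The boy wants to go": concept labels per
# variable, plus (role, child) edges for each node.
CONCEPTS = {"w": "want-01", "b": "boy", "g": "go-01"}
EDGES = {
    "w": [(":ARG0", "b"), (":ARG1", "g")],
    "b": [],
    "g": [(":ARG0", "b")],  # reentrancy: "b" has two parents
}


def bfs_linearise(root: str) -> list[str]:
    """Flatten the graph into a token sequence by breadth-first traversal.

    A node reached a second time (a reentrancy) is emitted as its bare
    variable, so each concept label appears only once.
    """
    tokens = [CONCEPTS[root]]
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for role, child in EDGES[node]:
            if child in seen:
                tokens += [role, child]
            else:
                tokens += [role, CONCEPTS[child]]
                seen.add(child)
                queue.append(child)
    return tokens


print(" ".join(bfs_linearise("w")))
# want-01 :ARG0 boy :ARG1 go-01 :ARG0 b
```

      In this toy graph the boy is both the one who wants and the one who goes, so "b" appears once as the concept "boy" and once as a bare variable.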

      For deep dependency trees, Bohnet et al. [2010] use a cascade of support vector machine (SVM) classifiers: an initial classifier decodes the semantic input into the corresponding syntactic features, and two subsequent classifiers first linearise the syntax and then perform morphological realisation to inflect the input lemmas.

      Statistical approaches have also been used to generate from shallow dependency trees. For instance, Filippova and Strube [2007, 2009] propose a two-step linearisation approach using maximum entropy classifiers, first determining which constituent should occupy sentence-initial position, then ordering the constituents in the remainder of the sentence.
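
      The two-step procedure can be sketched as follows. The trained maximum entropy classifiers are replaced here by hypothetical hand-set scores (INITIAL_SCORE, PRECEDES), and the greedy pairwise ordering in step 2 is an assumption made for illustration, not necessarily the decoding used by Filippova and Strube.

```python
# Hypothetical stand-ins for the two trained classifiers (toy numbers):
# a score for occupying the sentence-initial position, and a pairwise
# probability that a constituent with label a precedes one with label b.
INITIAL_SCORE = {"SUBJ": 0.6, "TEMP": 0.3, "OBJ": 0.1}
PRECEDES = {("SUBJ", "OBJ"): 0.9, ("TEMP", "OBJ"): 0.7, ("SUBJ", "TEMP"): 0.6}


def p_precedes(a: str, b: str) -> float:
    """Probability that label a precedes label b (0.5 if unknown)."""
    if (a, b) in PRECEDES:
        return PRECEDES[(a, b)]
    if (b, a) in PRECEDES:
        return 1.0 - PRECEDES[(b, a)]
    return 0.5


def linearise(constituents: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Two-step ordering of (label, text) constituents."""
    # Step 1: pick the constituent most likely to occupy initial position.
    first = max(constituents, key=lambda c: INITIAL_SCORE.get(c[0], 0.0))
    rest = [c for c in constituents if c is not first]
    # Step 2: greedily order the remainder, each time choosing the
    # constituent with the highest total probability of preceding the others.
    ordered = [first]
    while rest:
        nxt = max(
            rest,
            key=lambda c: sum(p_precedes(c[0], o[0]) for o in rest if o is not c),
        )
        ordered.append(nxt)
        rest.remove(nxt)
    return ordered


order = linearise([("OBJ", "the book"), ("TEMP", "yesterday"), ("SUBJ", "Peter")])
print(" ".join(text for _, text in order))  # Peter yesterday the book
```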


      Figure 2.3: Simplifying a sentence.

      2.3 TEXT-TO-TEXT GENERATION

      Just as data- and MR-to-text production usually decompose the generation task into several sub-tasks, pre-neural text-to-text generation focuses on modelling several operations and their interactions. Simplification, compression, and paraphrasing are generally viewed as involving all or some of four main operations: sentence splitting, phrase rewriting, phrase reordering, and phrase deletion. Summarisation, by contrast, is generally decomposed into a three-step process involving content selection (selecting the key information in the input), aggregation (grouping together related information), and generalisation (abstracting from the input document to produce a better text).

      Sentence Simplification. As illustrated in Figure 2.3, sentence simplification maps a sentence to a simpler, more readable text approximating its content. Typically, a simplified text differs from the original version in that it involves simpler words, simpler syntactic constructions, shorter sentences, and fewer modifiers. In pre-neural approaches, simplification has thus often been modelled using four main operations:

      • splitting a complex sentence into several simpler sentences;

      • deleting words or phrases;

      • reordering phrases or constituents;

      • substituting words/phrases with simpler ones.
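
      The four operations can be illustrated on token lists. In the Python sketch below, the helpers split, delete, reorder, and substitute are hypothetical names, and which rewrite applies where is specified by hand; pre-neural systems instead learn, or encode in rules, when to apply each operation. The toy split also does not regenerate shared material such as a subject, which real splitting rules often have to do.

```python
def split(tokens, boundary):
    """Splitting: cut one sentence into two at a boundary word (dropped)."""
    i = tokens.index(boundary)
    return tokens[:i], tokens[i + 1:]


def delete(tokens, drop):
    """Deletion: remove every token listed in `drop`."""
    return [t for t in tokens if t not in drop]


def reorder(tokens, order):
    """Reordering: permute tokens according to a list of source positions."""
    return [tokens[i] for i in order]


def substitute(tokens, lexicon):
    """Substitution: replace words with simpler ones from a lexicon."""
    return [lexicon.get(t, t) for t in tokens]


sentence = "yesterday Mary purchased a car and John sold his old bike".split()
first, second = split(sentence, "and")
first = reorder(first, [1, 2, 3, 4, 0])
first = substitute(first, {"purchased": "bought"})
second = delete(second, {"old"})
print(" ".join(first), "/", " ".join(second))
# Mary bought a car yesterday / John sold his bike
```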

      As with data-to-text generation, existing approaches vary in whether they capture all or only some of these operations and in how these operations are integrated (pipeline vs. joint approach).

      Earlier work on sentence simplification focused mostly on splitting and rewriting, relying on handcrafted rules to capture syntactic simplification, e.g., to split coordinated and subordinated sentences into several simpler clauses or to model active/passive transformations [Bott et al., 2012, Canning, 2002, Chandrasekar and Srinivas, 1997, Siddharthan, 2002, 2011].


      Figure 2.4: A Sentence/Compression pair.

      A large strand of work has focused on developing machine learning approaches to sentence simplification using the parallel data set formed by Simple English Wikipedia (SWKP)1 and traditional English Wikipedia (EWKP)2. Zhu et al. [2010] learn a simplification model inspired by syntax-based machine translation techniques [Yamada and Knight, 2001], which encodes the probabilities of the four rewriting operations (substitution, reordering, splitting, and deletion) on the parse tree of the input sentence. This model is combined with a language model to improve grammaticality, and the decoder translates sentences into simpler ones by greedily selecting the output sentence with the highest probability. Woodsend and Lapata [2011] learn a quasi-synchronous grammar [Smith and Eisner, 2006] describing a loose alignment between the parse trees of complex and simple sentences. Following Dras [1999], they then generate all possible rewrites of a source tree and use integer linear programming to select the best simplification. Coster and Kauchak [2011] and Wubben et al. [2012] view simplification as a monolingual translation task in which a complex sentence is the source and a simpler one is the target. To account for deletion, reordering, and substitution, Coster and Kauchak [2011] train a phrase-based machine translation system on the PWKP corpus, modifying the word alignments output by GIZA++ in Moses to allow for empty alignments. Similarly, Wubben et al. [2012] train a phrase-based machine translation system augmented with a post hoc reranking procedure that ranks the outputs by their dissimilarity from the source. Finally, Narayan and Gardent [2014] present a hybrid approach combining a probabilistic model for sentence splitting and deletion with a statistical machine translation system trained on PWKP for substitution and reordering.
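
      The dissimilarity-based reranking step attributed to Wubben et al. [2012] can be sketched as follows: from the top of the n-best list produced by the translation system, choose the candidate that differs most from the source. The word-level Levenshtein distance, the shortlist size (top_n=3), and the helper names are assumptions for illustration.

```python
def word_edit_distance(a: list[str], b: list[str]) -> int:
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        cur = [i]
        for j, wb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete wa
                           cur[j - 1] + 1,            # insert wb
                           prev[j - 1] + (wa != wb)))  # substitute or match
        prev = cur
    return prev[-1]


def rerank_by_dissimilarity(source, nbest, top_n=3):
    """Keep the top_n candidates by model score, then sort them so the one
    most dissimilar from the source comes first."""
    shortlist = sorted(nbest, key=lambda c: c[1], reverse=True)[:top_n]
    src = source.split()
    return sorted(shortlist,
                  key=lambda c: word_edit_distance(src, c[0].split()),
                  reverse=True)


source = "the cat sat on the mat"
nbest = [  # (candidate, hypothetical model log-score)
    ("the cat sat on the mat", -1.0),
    ("the cat sat on a mat", -1.5),
    ("a cat sat on the rug", -2.0),
    ("dogs bark", -9.0),
]
print(rerank_by_dissimilarity(source, nbest)[0][0])  # a cat sat on the rug
```

      Restricting the search to a short list of high-scoring candidates keeps fluency under the translation model's control, while the reranker pushes the output away from a verbatim copy of the input.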

      Sentence Compression. Most work on sentence compression is extractive.3 That is, the generated compressions are composed of sub-sequences of the input sentence. As a result, work on sentence compression mainly focuses on deletion. However, syntax is often used to optimise the well-formedness of the compressed output. This partially captures syntactic transformations that may result from extractive compression.
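
      Because an extractive compression is obtained purely by deletion, a candidate can be checked for extractiveness with a subsequence test. The function below is a hypothetical helper assuming whitespace tokenisation.

```python
def is_extractive(source: str, compression: str) -> bool:
    """True if the compression is a subsequence of the source tokens."""
    remaining = iter(source.split())
    # `tok in remaining` advances the iterator past the first match, so
    # tokens must be found in the same left-to-right order as in the source.
    return all(tok in remaining for tok in compression.split())


source = "the old man slowly walked home"
print(is_extractive(source, "the man walked home"))  # True: only deletions
print(is_extractive(source, "the man went home"))    # False: "went" is new
print(is_extractive(source, "home the man"))         # False: order changed
```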

      Dorr et al. [2003], Jing and McKeown [2000], Zajic et al. [2007] rely on rule-based approaches to determine which words should be kept and which should be deleted. Galanis and Androutsopoulos [2010], Gillick and Favre [2009], Knight and Marcu [2000], Turner and Charniak [2005] developed supervised models trained on parallel data. Finally, Clarke and Lapata [2006], Filippova and Strube [2008], Woodsend and Lapata [2011], Woodsend et al. [2010] present unsupervised approaches which rely on integer linear programming and a set of linguistically motivated constraints to infer globally optimal compressions.
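
      The ILP-based approaches search for the globally best subset of words under constraints. The sketch below replaces the ILP solver with exhaustive enumeration, which is only feasible for very short inputs, and uses hypothetical word scores, dependency heads, a length budget, and a single illustrative constraint (a kept word's head must also be kept). It does not reproduce the objective or constraints of any of the cited systems.

```python
from itertools import product

# Toy input with hypothetical dependency heads and word-importance scores.
TOKENS = ["the", "old", "man", "slowly", "walked", "home"]
HEADS = [2, 2, 4, 4, -1, 4]  # index of each token's head; -1 marks the root
SCORES = [0.5, 0.1, 0.9, 0.3, 1.0, 0.6]


def best_compression(max_len: int) -> list[str]:
    """Exhaustively find the highest-scoring compression (ILP stand-in)."""
    best, best_score = None, float("-inf")
    for keep in product((0, 1), repeat=len(TOKENS)):
        # Length constraint.
        if sum(keep) > max_len:
            continue
        # Structural constraint: a kept word's head must also be kept.
        if any(k and h >= 0 and not keep[h] for k, h in zip(keep, HEADS)):
            continue
        score = sum(s for s, k in zip(SCORES, keep) if k)
        if score > best_score:
            best, best_score = keep, score
    # Kept tokens stay in source order, so the result is extractive.
    return [t for t, k in zip(TOKENS, best) if k]


print(" ".join(best_compression(max_len=4)))  # the man walked home
```

      An ILP solver explores the same space of keep/delete decisions implicitly, which is what makes the globally optimal solution tractable for realistic sentence lengths.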

      Sentence Paraphrasing. Unsurprisingly, work on paraphrasing has mainly been concerned with modelling phrase reordering and substitution. Pre-neural approaches include rule-based approaches drawing on linguistic resources, and approaches based on statistical machine translation (SMT).

      For instance, McKeown [1983] presents a rule-based method for deriving a paraphrase from the input sentence and its syntactic tree. Bolshakov and Gelbukh [2004] use WordNet and internet statistics on stable word combinations (collocations) to generate paraphrases: words or expressions are replaced with WordNet synonyms only if the latter form valid collocations with the surrounding words, according to statistics gathered from the internet.
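
      The collocation filter can be sketched as follows. The synonym table stands in for WordNet, the bigram counts are invented numbers standing in for web statistics, and the threshold and the check against the right-hand neighbour only are simplifying assumptions.

```python
# Hypothetical stand-ins: a tiny synonym table (in place of WordNet) and
# invented bigram counts (in place of statistics gathered from the web).
SYNONYMS = {"strong": ["powerful", "heavy"]}
COLLOCATION_COUNTS = {
    ("strong", "tea"): 5000,
    ("powerful", "tea"): 3,
    ("heavy", "tea"): 40,
    ("strong", "argument"): 8000,
    ("powerful", "argument"): 6000,
}
THRESHOLD = 1000  # minimum count for a bigram to count as a valid collocation


def paraphrase(tokens: list[str]) -> list[str]:
    """Replace a word by a synonym only if the synonym collocates with the next word."""
    out = list(tokens)
    for i in range(len(tokens) - 1):
        for syn in SYNONYMS.get(tokens[i], []):
            if COLLOCATION_COUNTS.get((syn, tokens[i + 1]), 0) >= THRESHOLD:
                out[i] = syn
                break
    return out


print(paraphrase(["a", "strong", "argument"]))  # ['a', 'powerful', 'argument']
print(paraphrase(["strong", "tea"]))            # ['strong', 'tea']
```

      The second call shows the point of the filter: "powerful" is a valid synonym of "strong" in general, but "powerful tea" is not a stable combination, so the word is left unchanged.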

      Paraphrase generation can be viewed as a monolingual machine translation task. Building on this intuition, Quirk et al. [2004] train a paraphrasing SMT system on large volumes of sentence pairs automatically extracted from clustered news articles available on the World Wide Web. Zhao et al. [2008] use SMT and multiple resources to generate paraphrases: a phrasal paraphrase table and a feature function are derived from each resource, and these are then combined in a log-linear SMT model for sentence-level paraphrase generation. Similarly, Napoles et al. [2016] treat paraphrasing as black-box machine translation, combining the PPDB paraphrase database with a statistical machine translation model to generate paraphrases.
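
      In a log-linear model, each candidate paraphrase t of a source s receives the score Σ_i λ_i h_i(s, t), where each h_i is a feature function (here, one per resource plus a language model) and λ_i its weight. The Python sketch below uses hypothetical feature values and weights to show how such a model ranks candidates and normalises scores into probabilities; it is not the feature set of Zhao et al. [2008].

```python
import math

# Hypothetical feature values h_i(s, t) for two candidate paraphrases t of one
# source sentence s: log scores from two paraphrase tables and a language model.
CANDIDATES = {
    "he bought a car": {
        "table_a": math.log(0.6), "table_b": math.log(0.5), "lm": math.log(0.02),
    },
    "he acquired a car": {
        "table_a": math.log(0.3), "table_b": math.log(0.4), "lm": math.log(0.005),
    },
}
WEIGHTS = {"table_a": 1.0, "table_b": 0.5, "lm": 0.8}  # hypothetical lambda_i


def score(features: dict[str, float]) -> float:
    """Log-linear score: sum_i lambda_i * h_i(s, t)."""
    return sum(WEIGHTS[name] * value for name, value in features.items())


def posterior(candidates: dict[str, dict[str, float]]) -> dict[str, float]:
    """Normalise the scores into P(t | s) = exp(score) / Z."""
    exp_scores = {t: math.exp(score(f)) for t, f in candidates.items()}
    z = sum(exp_scores.values())
    return {t: v / z for t, v in exp_scores.items()}


best = max(CANDIDATES, key=lambda t: score(CANDIDATES[t]))
print(best)  # he bought a car
```

      Because the normaliser Z is the same for every candidate of a given source, the argmax can be taken over the unnormalised scores, as the last lines do.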

      Grammar-based models of paraphrases have also been explored. For instance, Narayan et al. [2016] introduce a grammar model for paraphrase generation trained on the Paralex corpus, a large monolingual parallel corpus containing 18 million pairs of question paraphrases.

      Intuitively,