PROPOSALS FOR ORDERING WELL-FORMED SYNTACTIC STATEMENTS

Floyd Billings and Tracy Thomson
Translation Sciences Institute
Brigham Young University

March 1972

Explanatory Note

The components of deep structure (J-trees) in Junction Grammar are not chronologically ordered in relation to each other but, rather, in their joint structural capacity constitute a mobile whose elements remain undisturbed by lexical interpretation. The sequential ordering that is characteristic of speech or writing is introduced during the process of transposition from Language-Of-Mind (junction trees), as it were, to Language-Of-Mouth/Hand.

This article is adapted from a presentation by two members of BYU’s One-To-Many translation project in 1972. The objective of the presentation was to demonstrate how word order for diverse languages can be obtained by formulating Lexical Ordering Rules (LO rules). The nature of such rules and how they operate is the subject of the article. The languages for which ordering is generated are German and Japanese.

This discussion of lexical ordering does not reflect the incorporation of the referment template into J-trees, an upgrade which was implemented computationally shortly thereafter and provided explicit nodes for articles and auxiliaries, as well as a number of other syntacto-semantic elements.

We note parenthetically that the referment template, while predating in its formulation the X-bar generalization of Transformational Grammar, is comparable in many respects. See Doing More with Structure, included in the present collection of articles, for further discussion of the referment.

Ordering German

The Synthesis[1] programs generate, from a Level II representation, a language-specific Level IV string, which may be either a sequence of sounds or a sequence of written symbols (Figure 1).

Figure 1

Several processes are involved in synthesis: Lexemes must be produced which correspond to the underlying semantic indices; inflection and other types of agreement must be considered; duplicate and unnecessary elements must be suppressed. In addition, of course, the string must also be ordered.

In this paper we will discuss the ordering aspect of synthesis. In particular we will present some proposals which have been found useful in ordering Well-formed Syntactic Statements (each hereafter referred to as a W-F-S-S, or "WFSS").[2] Although parts of these proposals were developed while working on German synthesis, their applications can be extended to all languages.

Before approaching my subject directly, I would first like to explain the basic cycle of junction grammar synthesis. In transformational grammar, structure is processed in what might be called a horizontal cycle (see Slide 1 below). A basic, or kernel, structure is first generated. The basic structure may then be processed (that is, transformed) into a more complex structure. This resulting structure may be further transformed, perhaps repeating the cycle several times. Only when all transformations are complete is it possible to apply other rule components.

In a generative junction grammar, on the other hand, a structure is processed only once. When a WFSS, or any part of a WFSS, is generated, it is never changed (except for transfer during translation between languages). This makes possible a vertical cycle (Slide 2) in which the first part of a WFSS may be completely processed and lexicalized before the last part is even generated.

Slide 2

This seems to approximate the natural speech process in which a person may begin a sentence before he has decided how it will end. The only restriction on the vertical cycle is that it must be of sufficient width to encompass a logical unit of meaning. Ordering is accomplished within the framework of the vertical cycle through the application of Lexical-Ordering Rules, or LO Rules, as we refer to them. These rules may be defined as a group of statements which specify ways in which Level II structure is manifest in Level IV word order (Slide 3 below).

LO Rules are, of course, language specific. The most fundamental LO Rules are those which specify that a particular structure is to be processed from right to left or from left to right (see figure next below). When one of these rules is applied to a junction rule, it determines the order in which the nodes of that rule are processed and hence the order in which the Level IV string is generated. For example, the rule which adjoins a V and an N to form a PV will place the direct object before the verb in Level IV when ordered right to left (slide 4). The same rule ordered left to right will place the direct object after the verb in Level IV. Neither ordering changes or transforms the Level II structure in any way.

In analysis the grapheme sequence Verb-Object and the sequence Object-Verb are found to have the identical Level II structure (see Slide 5).

When two or more junction rules are combined in one Reverse polish[3] (hereafter, simply polish) string, the possibilities for ordering the string are increased. A polish string with two junction rules, and consequently two equivalence operators, can be processed in four orders. Each of these orders generates a distinct sequence of graphemes (see Slide 6 below). In general, a polish string with N equivalence operators can be ordered in 2^N different ways. Thus a polish string with three equivalence operators can generate 2^3 = 8 different sequences in Level IV by applying different language-specific LO rules to each junction rule involved. A junction grammar is therefore equipped with a powerful device for generating a great variety of Level IV strings with language-specific ordering from a single, universal, neutrally-ordered Level II structure.

However, this process cannot generate all possible Level IV sequences. As we saw, a polish string having two equivalence operators can be ordered in four ways so as to generate four different sequences of Level IV graphemes. A simple calculation, though, will show that the three elements - Subject, Object, Verb - can be arranged not only in these four ways but also in two other ways: namely, Verb-Subject-Object and Object-Subject-Verb. Thus the polish string with two equivalence operators can be ordered to generate four out of six possible Level IV sequences (slide 7).
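The count of four orders out of six can be checked with a small Python sketch. It is only an illustration, not the project's synthesis program: the `Junction` class, the `linearize` function, and the direction names are hypothetical, and the tree assumes the analysis V + N = PV followed by PV + N = SV, with the subject as the secondary operand of the second rule.

```python
from dataclasses import dataclass
from itertools import permutations, product
from typing import Dict, List, Union


@dataclass(frozen=True)
class Junction:
    """One junction rule: primary (X) joined with secondary (Y) under a label (Z)."""
    label: str
    primary: "Node"
    secondary: "Node"


Node = Union[str, Junction]  # a terminal node is represented by a plain string


def linearize(node: Node, directions: Dict[str, str]) -> List[str]:
    """Read off the terminals, taking each rule's operands in that rule's direction."""
    if isinstance(node, str):
        return [node]
    operands = [node.primary, node.secondary]
    if directions[node.label] == "right-left":
        operands.reverse()
    return [word for operand in operands for word in linearize(operand, directions)]


# Assumed analysis (hypothetical): V + N = PV, then PV + N = SV.
pv = Junction("PV", "Verb", "Object")
sv = Junction("SV", pv, "Subject")

reachable = set()
for sv_dir, pv_dir in product(["left-right", "right-left"], repeat=2):
    order = tuple(linearize(sv, {"SV": sv_dir, "PV": pv_dir}))
    reachable.add(order)
    print(f"SV {sv_dir:10} PV {pv_dir:10} -> {'-'.join(order)}")

# Every permutation of the three elements that no direction assignment produces.
unreachable = set(permutations(["Subject", "Verb", "Object"])) - reachable
print("not generated:", sorted("-".join(order) for order in unreachable))
```

Under these assumptions the two orders left over are Verb-Subject-Object and Object-Subject-Verb: both separate the verb from its object, which no choice of direction for the PV rule can do.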

With longer polish strings and longer Level IV sequences, the proportion of the total possible orders that can be generated drops. In general, as we saw, the polish string with N equivalence operators is capable of 2^N orders. However, the N + 1 elements that N binary junction rules join in Level IV have (N + 1)! (factorial) potential orders, as in the example above of three elements with six orders. This is not to suggest that all, or even most, of these (N + 1)! orders would ever occur. However, it does raise the possibility that the basic LO rules, as powerful as they may be, may still not be adequate to account for all language-specific orders.
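How quickly the proportion drops can be tabulated in a few lines of Python. The sketch assumes binary junction rules, so that N equivalence operators join N + 1 Level IV elements, as in the Subject-Object-Verb example; the function name `ordering_coverage` is invented for illustration.

```python
from math import factorial
from typing import Tuple


def ordering_coverage(n_rules: int) -> Tuple[int, int, float]:
    """Return (orders generable by LO rules, all possible orders, their ratio)."""
    elements = n_rules + 1            # assumption: each rule joins exactly two operands
    generable = 2 ** n_rules          # one left-right / right-left choice per rule
    possible = factorial(elements)    # every permutation of the Level IV elements
    return generable, possible, generable / possible


for n in range(1, 7):
    generable, possible, share = ordering_coverage(n)
    print(f"{n} rules, {n + 1} elements: {generable} of {possible} orders ({share:.1%})")
```

With two rules the figures are 4 of 6, and with three rules 8 of 24.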

As we formulate LO rules for German synthesis we encounter such word orders. Take for example the sentence (slide 8) "Er fuhr das Auto in die Stadt." "He drove the car into the city."

In order to obtain the Level IV string in the proper order, it is necessary to process the terminal nodes of the polish string in the order shown. There is no combination of LO rules which can generate this sequence. When the same sentence appears in a compound tense (slide 9), the situation seems even worse. Now Level IV not only seems to require an impossible sequence, but it also requires that the V node be lexicalized in segments which appear in different parts of the string.

However, this seemingly more complicated situation also suggests a solution. Except for one element, the finite verb, everything conforms to an order that can be generated (slide 10). We can order the entire polish string right to left and generate 1, 3, 4, and 5 in the proper sequence. What we need is a means of generating 2 between 1 and 3, that is, from the PV node. Of course we can program this to happen, but can we find any justification for it?

Let's consider the nature of the equivalence operator and the PV node. Every equivalence operator signifies a junction rule. Every equivalence operator has a domain which specifies the amount of the polish string which is affected by the particular operation which the rule defines (slide 11). In each case the first node to the left of the equivalence operator, called a dominant node, is actually a label which specifies which structure is formed by the operation of the rule. According to the LO rule used, the dominant node is either the first or the last node processed when synthesizing the part of the Level II string within a particular domain.

The concept of the domain is important in Level II. Therefore it should not be surprising if the entrance into or exit from a domain should somehow be manifest in Level IV. This appears to be what happens here (slide 12). The elements 1, 3, 4, and 5 are generated in proper sequence by use of the right to left LO rules throughout. The finite verb is generated at the point of entrance into the domain of the PV. This concept, with a few amplifications, properly places verbs according to the requirements of German word order.

There are also other manifestations of entrance into a domain. Consider the common article-adjective-noun sequence (slide 13). Theoretically the article is specified as a feature on the noun. Therefore to obtain the proper sequence it is necessary to first process the terminal N node for the article, then the subjunct for the adjective, and then the N node a second time for the noun. There is no LO rule which can do this.

If we employ the concept just developed for placing German verbs (Figure 13), we can generate the article from the dominant node and make it appear in its proper Level IV sequence. Theoretically we are saying that the article is a Level IV manifestation of the Level II entrance into the domain of the rule which joins an N and a subjoined structure to form an N (slide 14).

In German there seems to be a strong feeling for the limits of this particular domain. This is shown by the degree to which the subjunct can be extended without losing the continuity of thought (slide 15). One can say, for example, "Ein vor Jahren durch Erdbeben und Feuer zerstörtes Haus... A years-ago-by-earthquake-and-fire-destroyed house."

This principle of generating elements from the dominant node supplements the right to left and left to right LO rules in a most effective manner. It does not allow us to generate all of the orders that those LO rules cannot generate, or even any major portion of them, but it does enable us to generate precisely the sequences we need most.
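A minimal Python sketch of generating an element on entrance into a domain follows. Because the slides are not reproduced here, the tree shapes are assumptions: the verb and its directional phrase are joined first under an invented label "V+PP", the object is then adjoined to form the PV, and the subject is the secondary operand of the SV rule. The `Domain` class, its `on_entry` field, and `synthesize` are hypothetical names, not part of the original programs.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Domain:
    """A junction rule whose dominant node may emit words when its domain is entered."""
    label: str
    primary: "Node"
    secondary: "Node"
    on_entry: List[str] = field(default_factory=list)


Node = Union[str, Domain]  # a terminal node is represented by a plain string


def synthesize(node: Node) -> List[str]:
    """Order every rule right to left, emitting on_entry words at each domain entrance."""
    if isinstance(node, str):
        return [node] if node else []  # an empty terminal contributes nothing
    return list(node.on_entry) + synthesize(node.secondary) + synthesize(node.primary)


# Compound tense: the finite verb "hat" comes from the entrance into the PV domain,
# while the participle is lexicalized at the V terminal itself.
compound = Domain(
    "SV",
    Domain("PV", Domain("V+PP", "gefahren", "in die Stadt"), "das Auto", on_entry=["hat"]),
    "Er",
)

# Simple tense: the whole verb is finite, so the V terminal is left empty.
simple = Domain(
    "SV",
    Domain("PV", Domain("V+PP", "", "in die Stadt"), "das Auto", on_entry=["fuhr"]),
    "Er",
)

# Article-adjective-noun: the article comes from the entrance into the N domain.
noun_phrase = Domain("N", "Haus", "zerstörtes", on_entry=["ein"])

for tree in (compound, simple, noun_phrase):
    print(" ".join(synthesize(tree)))
```

All three trees are processed strictly right to left; only the `on_entry` words depart from what the basic LO rules alone would produce.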

There is another type of sequence that needs to be dealt with, namely, the discontinuity. Discontinuities are manifest in sentences such as: Es überrascht mich, dass er kommt. "It surprises me that he comes." If this sentence were ordered continuously, it would read something like: Es, dass er kommt, überrascht mich. "It, that he comes, surprises me." Such an order is not considered normal in either English or German.

To order discontinuities we have developed an LO procedure which requires only that we identify three points in the polish string (slide 16 below):

1. the beginning of the discontinuous element (called PRE),

2. the end of the discontinuous element (called END), and

3. the insertion point (called INS).

When these three points are marked, a simple program will then redirect the normal processing at these points so that the proper sequence is generated (slide 16).
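The redirection can be sketched in Python as follows. This is a simplification under stated assumptions: the three points are marked as positions in the continuously ordered word sequence rather than in the polish string itself, INS is taken to mean "insert after this token", punctuation is ignored, and the function name `place_discontinuous` is invented.

```python
from typing import List


def place_discontinuous(tokens: List[str], pre: int, end: int, ins: int) -> List[str]:
    """Move tokens[pre..end] (inclusive) so that they follow the token at index ins."""
    if not 0 <= pre <= end < len(tokens):
        raise ValueError("PRE and END must delimit a span inside the string")
    if not 0 <= ins < len(tokens) or pre <= ins <= end:
        raise ValueError("INS must mark a token outside the PRE..END span")
    held = tokens[pre:end + 1]           # the discontinuous element, set aside
    output: List[str] = []
    for index, token in enumerate(tokens):
        if pre <= index <= end:
            continue                     # skipped here; emitted at INS instead
        output.append(token)
        if index == ins:
            output.extend(held)          # insertion point reached
    return output


# Continuous order: "Es, dass er kommt, überrascht mich"
continuous = ["Es", "dass", "er", "kommt", "überrascht", "mich"]
print(" ".join(place_discontinuous(continuous, pre=1, end=3, ins=5)))
```

The same function also handles an insertion point that precedes the marked span, since the span is removed from its original position before the output is assembled.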

With the three ordering procedures very briefly outlined here we have the needed tools to develop most of the LO rules we need for German synthesis:

1. The application of the left to right and right to left LO rules enables us to generate many basic patterns.

2. The understanding and application of the concept of marking the point of entrance into domains enables us to handle many additional patterns.

3. The use of a system for the insertion of discontinuous elements allows us to order a number of other patterns correctly.

It is assumed that future research will show other principles which we do not yet fully understand and also suggest additional applications of the concepts we are already using.

Ordering Japanese

We will now consider the capabilities of this ordering system in providing for the synthesis of Japanese. The study of this language was undertaken as an opportunity to test Junction Grammar concepts, and expand or revise them if necessary through study of a language whose Level IV phenomena were vastly different from those of the Indo-European group previously examined. In spite of the Level IV differences, we found that a very simple LO system in combination with the Junction Grammar rules proved to be a very powerful tool in the synthesis of Japanese.

The discussion will include an examination of some Japanese examples, the formulation of a general LO rule, and a brief review of the application of this general LO rule to machine programs.

As has been stated, Level II structure is neutral in order. The Junction Grammar rules, however, have a specific format, e.g., X (o Y)^1..n = Z, where X is the primary operand, followed by an adjunctive, conjunctive, or subjunctive operation (o) and the secondary operand (Y), then the equivalence operation (=) and finally the label or dominant node (Z).

"Right-left" and "left-right" refer to the order in which the nodes of junctions are lexically realized. For example, ordering the rule V + N = PV in the order left-right yields a verb-object Level IV order, while ordering it right-left yields an object-verb order (slide 17).

To facilitate the following discussion, the trees will be diagrammed to reflect the rule format as nearly as possible, by positioning the primary operand to the left of the secondary operand under each label node. This will necessitate the duplication of some topic nodes, which will be indicated by circles around the duplicated nodes, with a line drawn to connect the circles.

We would call attention at this point to our intention of choosing Japanese examples which exhibit the greatest degree of neutrality. By neutral, we mean those examples which, for a given structure, carry the fewest emphatic overtones. Japanese allows several alternate word orders for most of its sentences depending upon whether something is to be emphasized or not. As nearly as possible, the Level IV forms examined were void of emphasis.

Slide 18 depicts the tree diagram and Junction Grammar rules for the Japanese equivalent of "Tom eats apples." English word order is obtained by ordering rule (1) right-left and rule (2) left-right. Japanese order requires that both rules be ordered right-left.

Slide 19 compares English and Japanese when a PV modifier, "at school," is added to the structure. Although English order requires both right-left and left-right LO rules, Japanese order is achieved by right-left ordering of all rules.
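The contrast between the two languages can be sketched in Python by applying two tables of LO directions to a single tree. The tree is an assumption, since the slide is not reproduced here: V + N = PV, a hypothetical modifier rule (labelled "PV-modifier") that subjoins "at school" to the PV, and PV + N = SV. English glosses stand in for the Japanese lexemes, and particles are left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class Junction:
    """One junction rule: primary (X) joined with secondary (Y) under a label (Z)."""
    label: str
    primary: "Node"
    secondary: "Node"


Node = Union[str, Junction]  # a terminal node is represented by a plain string


def linearize(node: Node, directions: Dict[str, str]) -> List[str]:
    """Read off the terminals, taking each rule's operands in that rule's direction."""
    if isinstance(node, str):
        return [node]
    operands = [node.primary, node.secondary]
    if directions[node.label] == "right-left":
        operands.reverse()
    return [word for operand in operands for word in linearize(operand, directions)]


# One neutrally ordered structure shared by both languages (assumed analysis).
tree = Junction(
    "SV",
    Junction("PV-modifier", Junction("PV", "eats", "apples"), "at school"),
    "Tom",
)

# Language-specific LO rule tables; the English assignments are an assumption.
ENGLISH = {"SV": "right-left", "PV-modifier": "left-right", "PV": "left-right"}
JAPANESE = {"SV": "right-left", "PV-modifier": "right-left", "PV": "right-left"}

print("English order: ", " | ".join(linearize(tree, ENGLISH)))
print("Japanese order:", " | ".join(linearize(tree, JAPANESE)))
```

Only the direction tables differ between the two calls; the tree itself is never changed.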

Slide 20 indicates the ordering of the Japanese equivalent of "The man who ate the apple died." Right-left ordering of all rules will yield the proper Japanese order. Note also that Japanese does not lexicalize the topic node, as English does in the case of "who."

We studied numerous examples and prepared a list of the Junction Grammar rules used in Japanese and their respective ordering patterns (slide 21). A perusal of the chart indicates Japanese ordering consistency.

Slide 22 depicts ten Japanese sentences of moderate complexity. All sentences are examples of straight right-left ordering of all rules. Sentence (9a) requires that one rule be ordered discontinuously to achieve a more comfortable order. Sentence (9b) is the better form of the sentence.

The application of a general right-left ordering pattern to machine synthesis of Japanese proves very successful, with two main exceptions:

(1) Discontinuities, some of which may be attributable to vague antecedent-topic relationships as a result of distance in the output string;

(2) Conjunctions, whose elements order themselves chronologically rather than as a reflection of rule ordering. For example, if one said, "I saw John, and Bill, and Mary," it would normally be a reflection of the order in which they came to mind, or their chronological order: John first, then Bill, then Mary.

There is evidence that in Japanese, conjunction rules, like the other rules, order right-left. In language, there is a natural pause between elements of a conjunction. This is expressed as a comma in written English. Note the form of the general conjunction rule (slide 23). If we carry the brackets into the expanded rule, the pauses are found to come immediately preceding each new bracket. Note the parallel between the English output string and the rule. On the other hand, notice that if the rule is ordered right-left, the English output string looks very much like the Japanese. With chronological ordering of the sememes, and a right-left LO rule, the proper Japanese is obtained.

For most cases in translation, the order of the conjoined sememes must reflect the order of the sememes in the source text to preserve chronological order. It is easier for the computer to process the rule left-right (as in English), which preserves the chronological order of the English input, than to order it right-left and attempt a change of sememic processing order. We have therefore chosen for the present to order conjunctions left-right in the computer and let our system of graphological rules insert the proper particles at the proper places to achieve the desired form.

[1] Computer translation models typically divide the translation task into three components, namely, Analysis, Transfer, and Synthesis (or Generation). Word order determination by way of Lexical Ordering Rules is a function of Synthesis, the third and final phase of the sequence.

[2] Junction trees are referred to formally as well-formed-syntactic-statements, or WFSS’s, for short.

[3] Reverse polish notation places the operator after its operands rather than between them.




	Copyright© 2004 Linguistic Technologies, Inc.