Specialty operators

This describes specialty FST functions for grammar compilation.

fst::CrossProduct

The cross-product operation generates a transducer from two acceptors A, B. The resulting transducer represents a relation mapping from any string in A to any string in B (hence the name "cross-product").

In the case that A is a transducer, it is implicitly reinterpreted as the acceptor given by its input projection; similarly, in the case that B is a transducer, it is implicitly reintrepreted as the acceptor given by its output projection. The user may also specify a final weight; this is multiplied by the weights of all final states.

The algorithm is as follows:

  1. Replace all output labels of A with epsilon.
  2. Replace all input labels of B with epsilon.
  3. Compose (1) and (2).
  4. If a final weight (other than semiring One, the default), is specified, multiple the weight of each final state by it.

fst::LenientlyCompose

The lenient composition (Karttunen 1998) of two FSTs A, B is simply the priority union (to be defined) of (AB) and A. The priority union of two FSTs C, D is similar to the union of C and D except that the relations in C have "priority" over those in D. For example, let it be the case that:

C(a)b

D(a)c

The ("vanilla") union of C, D is then the relation a → { b, c }, whereas the priority union of C, D is the narrower relation ac. We can implement this in Thrax as follows:

func PriorityUnion[C, D, sigma_star] {
  input = Determinize[RmEpsilon[Project[C, 'input']]];
  return C | ((sigma_star - input) @ D);
}

where sigma_star represents the closure over the alphabet.

The lenient composition of FSTs A, B is simply the priority union of (AB), A. We can implement this in Thrax as follows:

func LenientlyCompose[A, B, sigma_start] {
   return PriorityUnion[A @ B, A, sigma_star];
}

where sigma_star represents the closure over the alphabet.

fst::ConcatRange

The range-based concatenation operation is a generalization of closure. A path through the closure of an FST A is valid if it can be generated by taking zero or more paths through A.

The first argument to fst::ConcatRange represents a lower bound on the number of paths through the input FST required to form a valid path in the output FST. The second argument represents the upper bound on the number of paths through the input FST; by convention, 0 is used to indicate an infinite upper bound. Naturally, the lower bound must be less than or equal to the upper bound.

We can then derive the following familiar operations as special cases:

  • To compute "star"-closure, use arguments 0, 0.
  • To compute "plus"-closure, use arguments 1, 0.
  • To generate exactly k concatenations of an FST, use arguments k, k.

References

Karttunen, L. 1998. The proper treatment of Optimality Theory in computational phonology. In Finite State Methods in Natural Language Processing, pages 1-12.

Edit | Attach | Watch | Print version | History: r8 < r7 < r6 < r5 < r4 | Backlinks | Raw View | WYSIWYG | More topic actions
Topic revision: r8 - 2021-05-28 - KyleGorman
 
This site is powered by the TWiki collaboration platform Powered by PerlCopyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback