- OpenFst Forum 2015 Archive
- converting standard lattice format into and FST file
- Minimization for fsts with negative weights
- Gentle Intro to Lazy FST Operations
- Wildcard arcs
- TopSort not working on FST
- Generation of words accepted by a Finite-State Machine
- Arcs labeled with strings
- Putting only one expression on arcs
- Expected edit distance
- Assigning the symbol table to FSTs in C++
- From weighted words to weighted phrase: how to isolate word scores at phrase level?
- Compiling with mingw32
- Is there an existing implementation to check if one node is reachable from another in an fst ?
- Lookahead composition without PushWeightsComposeFilter
- fstshortestpath --unique?
- farcreate bug

I get weird results when I try to minimize a tropical weight fst with negative weights as shown below. Additionally, the minimization takes quite long (around 20 seconds) although the fst only has 2 states.

$ cat bar.txt 0 1 1 1 -5 1 1 2 2 -1 1 0 $ fstcompile bar.txt bar.ofst $ fstminimize bar.ofst bar.min.ofst $ fstprint bar.min.ofst 0 1 1 1 -16777220 1 1 2 2 1 16777216

As far as I understand, this fst is already minimal so `fstminimize`

shouldn't really do anything.

If I convert all weights to positive weights, the problem vanishes

$ cat baz.txt 0 1 1 1 5 1 1 2 2 1 1 0 $ fstcompile baz.txt baz.ofst $ fstminimize baz.ofst baz.min.ofst $ fstprint baz.min.ofst 0 1 1 1 5 1 1 2 2 1 1

I'm using the latest stable release (http://openfst.org/twiki/pub/FST/FstDownload/openfst-1.5.0.tar.gz).

The minimization algorithm first applies weight pushing to normalize the distribution of weights along the paths of the input automaton. Weight pushing algorithm requires shortest distance from any state to final states to be defined. The example automaton with negative weights violates this condition due to the negative weight cycle.

You can look into Jason Eisner's WFSA minimization algorithm if you need to minimize automata with negative weights, particularly if they have negative weight cycles.

Cheers, Dogan

I don't know if there is a publicly available implementation of Eisner's algorithm.

Cheers, Dogan

- accept any as-yet unspecified symbol as input
- output the same symbol

So in my head I imagine something like:

0 1 a a [wgt_a] 0 1 * * [wgt_other]

Where the asterisks represent "any other symbol", i.e. that arc operates as a catch-all for any symbol that is not "a".

(Update 17:35 - I think what I want is a "rho" matcher. Can I specify this inside a text file? Or is there another way to do it?)

Thanks!

In OpenFst, special symbol information is not stored in the FSTs themselves, but are handled by the FST operations that accept these symbols as arguments. Unfortunately, special symbol matching operations are not available through the command line interface of standard OpenFst binaries. What you can do instead is to use a custom FST composition binary that can handle `rho`

symbols.

Check out https://github.com/dogancan/lexicon-unk. There you will find a tool called `fstphirhocompose`

which allows bare bones matching of `phi`

and `rho`

symbols on the second FST. This tool won't do what you need out of the box since you want matched symbols to be copied to the output side. To fix that, simply change each occurrence of `MATCHER_REWRITE_NEVER`

with `MATCHER_REWRITE_ALWAYS`

in `fstphirhocompose.cc`

before compiling the code. You use this tool exactly the same way you would use fstcompose, however you can also specify the labels corresponding to the `phi`

and `rho`

symbols in the second FST through the command line interface. Take a look at the `run.sh`

script in that repo to see how this tool can be used to match special symbols.

Cheers, Dogan

5 0 1 66 66 0 2 81 81 0 3 2837 2837 1 2 10 10 1 3 749 749 1 4 930 930 2 3 66 66 2 4 324 324 2 5 864 864 3 4 8 8 3 5 146 146 4 5 30 30

fsttopsort on this fst does not sort the states either. I overcame this by just piping fstprint X | sort -k 1 -n | fstcompile

Any ideas as to why this might be happening and how it can be fixed inside the C++ code?

It is hard to debug your problem without seeing the code. My best guess is that you are not setting state 0 as the initial state.

Cheers, Dogan

I would like to know how to print all the words that can be accepted for a Finite-State Machine. Note that I'm using acceptors. There is a library called Lextools who does the Job, specifically using a function of that library called lexfsmstrings , but unfortunately it's not longer available free by the guys of AT&T.

If the FST is not acyclic, it would, of course, take not just "somewhat" longer but infinitely longer

I have another question. I need to print using Graphviz a FST whose arc labels could be displayed as letters. I will highly appreciate a solution for this case. Thanks

Julio

I need to put only one expression in the arcs of my FST since I'm not<br> modelling transducers, but acceptors. <br>

I mean, for example, to put in an arc only "x" instead of "x:y". <br>

If someone has a quick solution to this, I will highly appreciate it.

Regards, <br> Julio Carrasquel <br> juliocarrasquel@gmail.com

I found a solution using the option --acceptor=true through the command fstdraw.

Thanks in any case.

I was wondering if it is possible using the library to compute the expected edit distance between two distributions over strings.

So, if X and Y are FSAs that contain a single sequence each, and T is my edit distance transducer then shortestdist(X o T o Y) over the tropical SR computes the edit distance.

(a) Now assume X and Y are probabilistic FSAs (log SR), i.e. they define a valid distribution over a set of sequences, how do I compute the expected edit distance, i.e. \sum_x \sum_y P(x) P(y) ED(x,y)?

(b) And finally, given a probabilistic FST that defines a joint distribution for X and Y, how do I compute the expected edit distance over the joint, i.e. \sum_x \sum_y P(x,y) ED(x,y)?

I guess in the case (a) the expectation SR might help? But how do I do it in the case (b)?

Any help would really be appreciated.

Thanks,

Roland

There is an algorithm by Mohri for computing expected edit distance between weighted automata. See this paper for details.

I implemented a slightly modified version of this algorithm some time ago. I just put together a simple setup based on that implementation and pushed it to GitHub in case you want to take a look at it.

https://github.com/dogancan/expected-edit-distance

Cheers, Dogan

Roland

Take a simple example: the input is a character stream, made of words inter-spaced with blanks. A non-weighted and functional lexicon FST maps sequences of letters to words. To deal with uncertainty, a weighted and non-functional extension of it maps a sequence of letters to word/score pairs.

For handling a sequence of words, a concatenation of WFSTlex is needed, resulting in multiplying (in the weight semiring) word scores but eventually, the contribution of each word is lost: the only score available encompasses the entire input.

So a language model FST, further down in the cascade has no access to individual word scores, preventing it to compute a context-dependent weighted average for example, associating a word x with a different coefficient depending on it preceding y or z.

make LDFLAGS=-lmman

Here is the patch:

diff -rupN old/src/lib/mapped-file.cc new/src/lib/mapped-file.cc

old/src/lib/mapped-file.cc 2014-04-30 00:15:18.000000000 +0200 +++ new/src/lib/mapped-file.cc 2015-02-26 08:53:46.339405863 +0100 @@ -20,6 +20,10 @@ #include <fcntl.h> #include <unistd.h>

+#if (defined _WIN32 || defined _WIN64 || defined ** WINDOWS** || defined

// Alignment required for mapping structures (in bytes.) Regions of memory
@@ -76,7 +80,13 @@ MappedFile* MappedFile::Map(istream* s,
size_t pos = spos;
int fd = open(source.c_str(), O_RDONLY);
if (fd = -1) {
+#if (defined _WIN32 || defined _WIN64 || defined ** WINDOWS** || defined

typedef PUSH_LABELS_FILTER COMPOSE_FILTER;

// My compose options fst::CacheOptions cacheOptions; fst::ComposeFstImplOptions<LOOK_MATCHER, SORTED_MATCHER, COMPOSE_FILTER> composeOptions(cacheOptions); composeOptions.matcher1 = new LOOK_MATCHER(_graph1, fst::MATCH_OUTPUT);

//Relabel _graph1 such as fst::StdOLabelLookAheadFst graph1Look(_graph1); LOOK_MATCHER::MatcherData* matcherData = composeOptions.matcher1->GetData(); fst::LabelReachable<Arc> graph1Reachable(matcherData); std::auto_ptr<GRAPH> graph1Relabeled(&_graph1); graph1Reachable.Relabel(graph1Relabeled.get(), false); graph1Relabeled->SetInputSymbols(NULL);

//Relabel _graph2 such as fst::LabelLookAheadRelabeler<GRAPH::Arc>::Relabel(graph2TrueRelabel.get(), graph1Look, true); fst::LabelReachable<Arc> graph2Reachable(matcherData); std::auto_ptr<GRAPH> graph2Relabeled(&_graph2); graph2Reachable.Relabel(graph2Relabeled.get(), true);

fst::ArcSort(graph2Relabeled.get(), fst::StdILabelCompare());

composeOptions.matcher2 = new SORTED_MATCHER(*graph2Relabeled.get(), fst::MATCH_INPUT); composeOptions.filter = new COMPOSE_FILTER(*graph1Relabeled.get(), *graph2Relabeled.get(), composeOptions.matcher1, composeOptions.matcher2);

std::auto_ptr<GRAPH> composedRes(new GRAPH); *composedRes = fst::ComposeFst<Arc>(*graph1Relabeled.get(), *graph2Relabeled.get(), composeOptions); </verbatim>

But result of my lookahead composition code is not equal to result of etalon code: my graph is much smaller than etalon. What I'm doing wrong? May be exists other way?

Etalon lookahead composition code: http://pastebin.com/wbD2ZudV My lookahead composition code: http://pastebin.com/LjAUdv91

My questions: 1) What is the meaning of the --unique parameter? 2) How can I get the behavior I want using OpenFst (i.e. all the possible output strings)?

Thanks! Géza

(Presumably the reason --unique doesn't have the semantics you expected it to is that the machine might be cyclic, in which case termination would not be guaranteed. But any suggestions to improve the documentation would be taken under consideration.)

I just wanted to report a small bug in `farcreate`

. When the `--file_list_input`

option is set to `true`

, `farcreate`

skips the first input file on the command line due to an off by one error. Below is the relevant fix.

Cheers, Dogan

diff -ur openfst-1.4.1.orig/src/include/fst/extensions/far/create.h openfst-1.4.1/src/include/fst/extensions/far/create.h --- openfst-1.4.1.orig/src/include/fst/extensions/far/create.h 2014-04-25 02:55:40.000000000 -0700 +++ openfst-1.4.1/src/include/fst/extensions/far/create.h 2014-04-25 02:57:48.000000000 -0700 @@ -48,7 +48,7 @@ vector<string> inputs; if (file_list_input) { - for (int i = 1; i < in_fnames.size(); ++i) { + for (int i = 0; i < in_fnames.size(); ++i) { ifstream istrm(in_fnames[i].c_str()); string str; while (getline(istrm, str))

Edit | Attach | ~~Watch~~ | Print version | History: r1 | Backlinks | Raw View | WYSIWYG | More topic actions

Topic revision: r1 - 2018-05-30 - MichaelRiley

Copyright © 2008-2019 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.

Ideas, requests, problems regarding TWiki? Send feedback

Ideas, requests, problems regarding TWiki? Send feedback