---+ NGramCount

---++ Description

This utility counts n-grams from an input FST archive. It produces a count FST with the same topology as the eventual normalized model, complete with backoff transitions. The option _--order_ specifies the maximum n-gram order to count; the utility counts n-grams of every order up to and including that maximum. The option _--epsilon_as_backoff_ causes the counter to interpret _&lt;epsilon&gt;_ as a backoff transition while counting, which is only appropriate in very specialized circumstances (see caveats below).

---++ Usage

|<verbatim>
ngramcount [--options] [in.far [out.fst]]
  --order: type = int64, default = 3
  --epsilon_as_backoff: type = bool, default = false
</verbatim> | |
|<verbatim>
class NGramCounter(size_t order);
</verbatim>| |

---++ Examples

With the default order of 3, the utility counts trigrams, bigrams and unigrams from an input corpus:

<verbatim>
ngramcount earnest.far >earnest.3g.cnts
</verbatim>

---

To count trigrams, bigrams and unigrams from a single FST using the library functions:

<verbatim>
// Count n-grams up to order 3.
NGramCounter<Log64Weight> ngram_counter(3);
// Read the input FST and add its n-grams to the counter.
StdMutableFst *in_fst = StdMutableFst::Read("in.fst", true);
ngram_counter.Count(*in_fst);
// Retrieve the count FST and write it out.
VectorFst<StdArc> out_fst;
ngram_counter.GetFst(&out_fst);
out_fst.Write("out.fst");
delete in_fst;
</verbatim>

---++ Caveats

Backoff transitions, labeled with _&lt;epsilon&gt;_, have weight One() in the semiring. By default, the count FSTs are in the tropical semiring, hence the backoff weight is 0 and n-gram transitions have weight -log(count).

The _--epsilon_as_backoff_ switch interprets _&lt;epsilon&gt;_ in the input FST archive as a backoff transition. This is only appropriate when the corpus is randomly sampled from a model and shows where backoff transitions were taken. It allows for the use of the _presmoothed_ method in _ngrammake_. This is not a typical scenario, so the option should be used with care.
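
To illustrate what "counting all n-gram orders less than or equal to the maximum order" means, and how a raw count maps to the -log(count) weight mentioned in the caveats, here is a minimal standalone sketch. It does not use the NGram library or OpenFst. The names =CountNGrams= and =NegLogWeight= are hypothetical helpers introduced only for this illustration. It works on a plain token vector rather than an FST, and it ignores any sentence-boundary handling that the real utility may perform.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// An n-gram is represented as a sequence of tokens.
using NGram = std::vector<std::string>;

// Hypothetical helper (not part of the NGram library): counts every n-gram
// of length 1..order that occurs in a single tokenized sentence.
std::map<NGram, int> CountNGrams(const std::vector<std::string> &tokens,
                                 std::size_t order) {
  assert(order >= 1);
  std::map<NGram, int> counts;
  for (std::size_t start = 0; start < tokens.size(); ++start) {
    NGram ngram;
    // Grow the n-gram one token at a time, stopping at the maximum order
    // or at the end of the sentence, whichever comes first.
    for (std::size_t len = 1;
         len <= order && start + len <= tokens.size(); ++len) {
      ngram.push_back(tokens[start + len - 1]);
      ++counts[ngram];
    }
  }
  return counts;
}

// Hypothetical helper: the negative-log weight of a raw count, -log(count).
double NegLogWeight(int count) {
  assert(count > 0);
  return -std::log(static_cast<double>(count));
}
```

For example, calling =CountNGrams({"a", "b", "a", "b"}, 3)= yields counts for the unigrams =a= and =b=, the bigrams =a b= and =b a=, and the trigrams =a b a= and =b a b=. An n-gram seen once gets weight -log(1) = 0 under this mapping.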
Topic revision: r5 - 2012-03-04 - BrianRoark
Copyright © 2008-2024 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.