Class MakeKneserNeyArpaFromText

java.lang.Object
edu.berkeley.nlp.lm.io.MakeKneserNeyArpaFromText

public class MakeKneserNeyArpaFromText extends Object
Estimates a Kneser-Ney language model from raw text and writes it out in ARPA format. This is meant to closely resemble the functionality of SRILM's ngram-count -text <text file> -ukndiscount -lm <outputfile>, with two main exceptions:
(a) Rather than calculating the discount for each n-gram order from counts, this class uses a constant discount of 0.75 for all orders.
(b) Count thresholding is currently not implemented (SRILM by default thresholds counts for n-grams with n > 3).
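
To make exception (a) concrete, the following is a minimal sketch, not this class's actual implementation. It contrasts a discount estimated from counts with the constant 0.75. The class DiscountSketch and its methods are hypothetical names introduced for illustration. The count-based formula D = n1 / (n1 + 2 * n2), where n1 and n2 are the numbers of n-grams seen exactly once and exactly twice, is the standard estimate for unmodified Kneser-Ney; that SRILM computes exactly this is an assumption here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration class; not part of edu.berkeley.nlp.lm.
final class DiscountSketch {

    // The constant discount described in exception (a).
    static final double CONSTANT_DISCOUNT = 0.75;

    // Count-based estimate D = n1 / (n1 + 2 * n2), computed for one n-gram order.
    // Assumed to correspond to what a count-derived discount would look like.
    static double estimateDiscount(Map<String, Integer> ngramCounts) {
        long n1 = 0; // n-grams seen exactly once
        long n2 = 0; // n-grams seen exactly twice
        for (int c : ngramCounts.values()) {
            if (c == 1) {
                n1++;
            } else if (c == 2) {
                n2++;
            }
        }
        if (n1 == 0) {
            throw new IllegalArgumentException("no singleton n-grams; estimate is undefined or zero");
        }
        return (double) n1 / (n1 + 2.0 * n2);
    }

    // Discounted relative frequency max(count - D, 0) / contextCount.
    // This is only the first term of an interpolated Kneser-Ney estimate;
    // the back-off term is omitted to keep the sketch short.
    static double discountedFrequency(int count, int contextCount, double discount) {
        if (contextCount <= 0) {
            throw new IllegalArgumentException("contextCount must be positive");
        }
        return Math.max(count - discount, 0.0) / contextCount;
    }

    public static void main(String[] args) {
        Map<String, Integer> bigramCounts = new HashMap<>();
        bigramCounts.put("the cat", 1);
        bigramCounts.put("the dog", 1);
        bigramCounts.put("a cat", 2);
        bigramCounts.put("a dog", 3);

        double estimated = estimateDiscount(bigramCounts);
        System.out.println("count-based discount: " + estimated);
        System.out.println("constant discount:    " + CONSTANT_DISCOUNT);
        System.out.println("discounted frequency: "
                + discountedFrequency(3, 5, CONSTANT_DISCOUNT));
    }
}
```

With a constant discount, no count-of-counts statistics need to be gathered per order, at the cost of a discount that does not adapt to the corpus.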

Note that if the input/output files have a .gz suffix, they will be unzipped/zipped as necessary. If no input files are given (or "-" is specified), lines will be read from standard input.
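
The suffix-based behavior described above can be sketched roughly as follows. This is an illustrative sketch only, not the code this class uses: GzAwareIoSketch, openReader, and openWriter are hypothetical names, and the UTF-8 encoding is an assumption.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper; not part of edu.berkeley.nlp.lm.
final class GzAwareIoSketch {

    // Opens a reader over the given path. A null path or "-" means standard
    // input; a ".gz" suffix means the file is decompressed while reading.
    static BufferedReader openReader(String path) throws IOException {
        InputStream in;
        if (path == null || path.equals("-")) {
            in = System.in;
        } else {
            in = new FileInputStream(path);
            if (path.endsWith(".gz")) {
                in = new GZIPInputStream(in);
            }
        }
        // UTF-8 is an assumption made for this sketch.
        return new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
    }

    // Opens a writer for the given path, compressing when the suffix is ".gz".
    static BufferedWriter openWriter(String path) throws IOException {
        OutputStream out = new FileOutputStream(path);
        if (path.endsWith(".gz")) {
            out = new GZIPOutputStream(out);
        }
        return new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
    }
}
```

Closing the returned writer also finishes the gzip stream, so callers should use try-with-resources or close it explicitly before reading the file back.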

Author:
adampauls
  • Constructor Details

    • MakeKneserNeyArpaFromText

      public MakeKneserNeyArpaFromText()
  • Method Details

    • main

      public static void main(String[] argv)