Fix some typos (most of them found by codespell) #30

Open · wants to merge 4 commits into master

README.md: 10 changes (5 additions, 5 deletions)
@@ -1,6 +1,6 @@
# pero-enhance

- Tool for text-guided textual document scan quality enhancement. The method works on lines of text that can be input through a PAGE XML or detected automatically by a buil-in OCR. By using text input along with the image, the results can be correctly readable even with parts of the original text missing or severly degraded in the source image. The tool includes functionality for cropping the text lines, processing them with our provided models for either text enhancement and inpainting, and for blending the enhanced text lines back into the source document image. We currently provide models for OCR and enhancement of czech newspapers optimized for low-quality scans from micro-films.
+ Tool for text-guided textual document scan quality enhancement. The method works on lines of text that can be input through a PAGE XML or detected automatically by a built-in OCR. By using text input along with the image, the results can be correctly readable even with parts of the original text missing or severely degraded in the source image. The tool includes functionality for cropping the text lines, processing them with our provided models for either text enhancement and inpainting, and for blending the enhanced text lines back into the source document image. We currently provide models for OCR and enhancement of czech newspapers optimized for low-quality scans from micro-films.

This package can be used as a standalone commandline tool to process document pages in bulk. Alternatively, the package provides a python class that can be integrated in third-party software.

@@ -11,9 +11,9 @@ The method is based on Generative Adversarial Neural Networks (GAN) that are tra
## Installation
The module requires python 3 and CUDA capable GPU.

- Clone the repository (which clones pero-ocr as submodule) and add the pero_enhance and pero_ocr package to your `PYTHONPATH`:
+ Clone the repository (which clones pero-ocr as submodule) and add the pero-enhance and pero-ocr package to your `PYTHONPATH`:
```
- clone --recursive https://github.com/DCGM/pero-enhance.git
+ git clone --recursive https://github.com/DCGM/pero-enhance.git
cd pero-enhance
export PYTHONPATH=/abs/path/to/repo/pero-enhance:/abs/path/to/repo/pero-enhance/pero-ocr:$PYTHONPATH
```

Author suggested change:

```
- export PYTHONPATH=/abs/path/to/repo/pero-enhance:/abs/path/to/repo/pero-enhance/pero-ocr:$PYTHONPATH
+ export PYTHONPATH=$PWD:$PWD/pero-ocr:$PYTHONPATH
```

@@ -33,15 +33,15 @@ Images in a folder can be enhanced by running following:
```
python repair_page.py -i ../example/ -x ../example/ -o /path/to/outputs
```
- The above command runs OCR, stores the OCR output in ./example/, and stores the enhance images in /path/to/outputs. The generated OCR Page XML files can be manualy revised if the OCR quality is not satisfactory, and the command can be repeated to use these changes for better image enhancement.
+ The above command runs OCR, stores the OCR output in ./example/, and stores the enhance images in /path/to/outputs. The generated OCR Page XML files can be manually revised if the OCR quality is not satisfactory, and the command can be repeated to use these changes for better image enhancement.

Alternatively, you can run interactive demo by running the following, where the xml file is optional:
```
python demo.py -i ../example/82f4ac84-6f1e-43ba-b1d5-e2b28d69508d.jpg -x ../example/82f4ac84-6f1e-43ba-b1d5-e2b28d69508d.xml
```
When Page XML file is not provided, automatic text detection and OCR is done using `PageParser` from the pero-ocr package.

- The commands use by default models and settings optimized for czech newspapers downloaded during instalation. The models can be changed Different models for enhancement can be specified by `-r /path/to/enhancement-model/repair_engine.json` and OCR models by `-p /path/to/ocr-model/config.ini`.
+ The commands use by default models and settings optimized for czech newspapers downloaded during installation. The models can be changed Different models for enhancement can be specified by `-r /path/to/enhancement-model/repair_engine.json` and OCR models by `-p /path/to/ocr-model/config.ini`.

### EngineRepairCNN class
In your code, you can directly use the EngineRepairCNN class to enhance individual text line images normalized to height of 32 pixels or of whole page images when the content is defined by pero.layout class. The processed images should have three channels represented as numpy arrays.
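For instance, a minimal sketch of that class-based usage could look like the following; the import paths, the `PageLayout(file=...)` constructor, the `EngineRepairCNN` constructor argument, and the `enhance_page`/`repair_line` method names are assumptions for illustration only and may differ from the actual pero-enhance API:

```
# Minimal sketch; the import paths, constructor arguments and the
# enhance_page / repair_line method names below are assumed, not confirmed.
import cv2

from pero_ocr.document_ocr.layout import PageLayout   # assumed import path
from repair_engine import EngineRepairCNN              # assumed import path

# Page image as an (H, W, 3) numpy array plus its Page XML layout.
page_img = cv2.imread("../example/82f4ac84-6f1e-43ba-b1d5-e2b28d69508d.jpg")
layout = PageLayout(file="../example/82f4ac84-6f1e-43ba-b1d5-e2b28d69508d.xml")

enhancer = EngineRepairCNN("/path/to/enhancement-model/repair_engine.json")

# Enhance the whole page, guided by the layout and its line transcriptions.
enhanced_page = enhancer.enhance_page(page_img, layout)   # assumed method

# Or enhance one line crop: three channels, normalized to a height of 32 px.
# line_img = cv2.resize(line_crop, (line_crop.shape[1], 32))
# enhanced_line = enhancer.repair_line(line_img, "line transcription")

cv2.imwrite("/path/to/outputs/enhanced.jpg", enhanced_page)
```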
training/ocr_engine/line_ocr_engine.py: 2 changes (1 addition, 1 deletion)
@@ -72,7 +72,7 @@ def process_lines(self, lines):
if line.shape[0] == self.line_px_height:
ValueError("Line height needs to be {} for this ocr network and is {} instead.".format(self.line_px_height, line.shape[0]))
if line.shape[2] == 3:
ValueError("Line crops need three color channes, but this one has {}.".format(line.shape[2]))
ValueError("Line crops need three color channels, but this one has {}.".format(line.shape[2]))

all_transcriptions = [None]*len(lines)
all_logits = [None]*len(lines)
training/transformer/compute_bleu.py: 2 changes (1 addition, 1 deletion)
@@ -65,7 +65,7 @@ def bleu_tokenize(string):
except when a punctuation is preceded and followed by a digit
(e.g. a comma/dot as a thousand/decimal separator).

- Note that a numer (e.g. a year) followed by a dot at the end of sentence
+ Note that a number (e.g. a year) followed by a dot at the end of sentence
is NOT tokenized,
i.e. the dot stays with the number because `s/(\p{P})(\P{N})/ $1 $2/g`
does not match this case (unless we add a space after each sentence).
training/transformer/data_download.py: 2 changes (1 addition, 1 deletion)
@@ -83,7 +83,7 @@
_TARGET_THRESHOLD = 327 # Accept vocabulary if size is within this threshold
VOCAB_FILE = "vocab.ende.%d" % _TARGET_VOCAB_SIZE

- # Strings to inclue in the generated files.
+ # Strings to include in the generated files.
_PREFIX = "wmt32k"
_TRAIN_TAG = "train"
_EVAL_TAG = "dev" # Following WMT and Tensor2Tensor conventions, in which the
training/transformer/model/beam_search.py: 2 changes (1 addition, 1 deletion)
@@ -521,7 +521,7 @@ def _gather_beams(nested, beam_indices, batch_size, new_beam_size):
Nested structure containing tensors with shape
[batch_size, new_beam_size, ...]
"""
- # Computes the i'th coodinate that contains the batch index for gather_nd.
+ # Computes the i'th coordinate that contains the batch index for gather_nd.
# Batch pos is a tensor like [[0,0,0,0,],[1,1,1,1],..].
batch_pos = tf.range(batch_size * new_beam_size) // new_beam_size
batch_pos = tf.reshape(batch_pos, [batch_size, new_beam_size])
training/transformer/model/embedding_layer.py: 2 changes (1 addition, 1 deletion)
@@ -32,7 +32,7 @@ def __init__(self, vocab_size, hidden_size, method="gather"):
hidden_size: Dimensionality of the embedding. (Typically 512 or 1024)
method: Strategy for performing embedding lookup. "gather" uses tf.gather
which performs well on CPUs and GPUs, but very poorly on TPUs. "matmul"
- one-hot encodes the indicies and formulates the embedding as a sparse
+ one-hot encodes the indices and formulates the embedding as a sparse
matrix multiplication. The matmul formulation is wasteful as it does
extra work, however matrix multiplication is very fast on TPUs which
makes "matmul" considerably faster than "gather" on TPUs.
training/transformer/model/transformer.py: 2 changes (1 addition, 1 deletion)
@@ -40,7 +40,7 @@ class Transformer(object):
Implemented as described in: https://arxiv.org/pdf/1706.03762.pdf

The Transformer model consists of an encoder and decoder. The input is an int
- sequence (or a batch of sequences). The encoder produces a continous
+ sequence (or a batch of sequences). The encoder produces a continuous
representation, and the decoder uses the encoder output to generate
probabilities for the output sequence.
"""
training/transformer/utils/tokenizer.py: 4 changes (2 additions, 2 deletions)
@@ -498,7 +498,7 @@ def _gen_new_subtoken_list(
subtoken_counts, min_count, alphabet, reserved_tokens=None):
"""Generate candidate subtokens ordered by count, and new max subtoken length.

- Add subtokens to the candiate list in order of length (longest subtokens
+ Add subtokens to the candidate list in order of length (longest subtokens
first). When a subtoken is added, the counts of each of its prefixes are
decreased. Prefixes that don't appear much outside the subtoken are not added
to the candidate list.
@@ -516,7 +516,7 @@ def _gen_new_subtoken_list(

Args:
subtoken_counts: defaultdict mapping str subtokens to int counts
- min_count: int minumum count requirement for subtokens
+ min_count: int minimum count requirement for subtokens
alphabet: set of characters. Each character is added to the subtoken list to
guarantee that all tokens can be encoded.
reserved_tokens: list of tokens that will be added to the beginning of the