Streak2O: Data Augmentation for Handwritten Text Recognition in Neural Networks
Abstract
Streak2O is a data augmentation algorithm for machine learning that combines two independent algorithms, Streak and Droplet. All three augmentations are implemented as non-trainable custom Keras layers in TensorFlow to optimize execution time in a GPU-based environment. They generate configurable random artifacts that imitate the water damage and mishandling found in real-life handwritten historical documents and manuscripts. Tests with small subsets of the NIST-SD19 dataset on a convolutional neural network architecture show that these augmentations can help reduce overfitting. The augmentations fall partially into the category of synthetic data generation.
Key Terms ⎯ Handwritten Text Recognition, Machine Learning, Synthetic Data Augmentation, TensorFlow.
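To illustrate what a non-trainable custom Keras augmentation layer can look like, the following minimal sketch overwrites a random vertical band of each image during training and passes images through unchanged at inference. It is not the Streak, Droplet, or Streak2O algorithm: the class name RandomBandArtifact, its parameters max_width and fill_value, and the band-shaped artifact are all hypothetical, chosen only to show the layer structure. It assumes image batches shaped (batch, height, width, channels).

```python
import tensorflow as tf


class RandomBandArtifact(tf.keras.layers.Layer):
    """Hypothetical stand-in artifact layer (not the paper's Streak or Droplet)."""

    def __init__(self, max_width=4, fill_value=1.0, **kwargs):
        # The layer holds no weights, so it is marked non-trainable.
        super().__init__(trainable=False, **kwargs)
        self.max_width = max_width    # assumed parameter: widest band in pixels
        self.fill_value = fill_value  # assumed parameter: intensity written into the band

    def call(self, images, training=False):
        # Augment only while training; inference inputs stay untouched.
        if not training:
            return images
        shape = tf.shape(images)
        batch, width = shape[0], shape[2]
        # Draw one random start column and one band width per image.
        start = tf.random.uniform([batch, 1, 1, 1], 0, width, dtype=tf.int32)
        band = tf.random.uniform(
            [batch, 1, 1, 1], 1, self.max_width + 1, dtype=tf.int32
        )
        # Boolean mask that is True inside each image's band, broadcast over rows and channels.
        cols = tf.reshape(tf.range(width), [1, 1, -1, 1])
        mask = (cols >= start) & (cols < start + band)
        fill = tf.cast(self.fill_value, images.dtype)
        return tf.where(mask, fill, images)
```

Because every step is a tensor operation, such a layer can sit at the front of a model and run on the GPU alongside the rest of the network, for example as the first layer of a tf.keras.Sequential model.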