Despite current technological advances, there are still no algorithms that
allow a computer to transcribe the content of an arbitrary “difficult”
handwritten document (e.g. a historical document). The general handwriting
recognition problem presents many difficulties caused by interpersonal and
intrapersonal variations in writing, the cursive nature of handwriting, the use
of different pen types, and paper with a noisy background. The individuality of
handwriting has been studied and established with scientific rigor. The
handwriting recognition problem has two variants: offline and online
recognition. The offline problem consists of recognizing handwritten text that
was previously written on paper and then digitized. The online problem aims to
recognize text written with some kind of electronic device, whose sensors also
record a set of dynamic measures of how the writing is produced (e.g. writing
pressure, pen altitude and azimuth, among others). In recent years, more
progress has been made on the online modality, while the offline one is still
far from being solved in an unrestricted manner.
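The difference between the two modalities can be made concrete by looking at
the data each one provides. The following is a minimal sketch in Python, not a
format used by any specific device or database: the names PenSample,
OfflineImage and rasterize, and the integer pixel coordinates, are assumptions
made for illustration. It shows that an online recording can be reduced to an
offline image, while the dynamic measures are lost in the process.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PenSample:
    """One online sample (hypothetical layout): position plus dynamics."""
    t: float         # time stamp in seconds
    x: int           # horizontal pen position, in pixels
    y: int           # vertical pen position, in pixels
    pressure: float  # 0.0 means the pen is lifted
    altitude: float  # pen altitude angle, in degrees
    azimuth: float   # pen azimuth angle, in degrees


# An offline sample is only a grid of pixels (1 = ink, 0 = background).
OfflineImage = List[List[int]]


def rasterize(samples: List[PenSample], width: int, height: int) -> OfflineImage:
    """Reduce an online recording to an offline image.

    Timing, pressure and pen angles are discarded: only where ink was
    deposited survives, which is all an offline recognizer ever sees.
    """
    image = [[0] * width for _ in range(height)]
    for s in samples:
        inside = 0 <= s.x < width and 0 <= s.y < height
        if s.pressure > 0 and inside:
            image[s.y][s.x] = 1
    return image
```

The reverse conversion is not possible in general, since stroke order and pen
dynamics cannot be read directly from the pixels.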
Psychology can also benefit from research on handwriting style, since it may be
possible to identify correlations between handwriting and certain personality
attributes of the writer. In the field of Human-Computer Interaction, if the
gender of a user can be predicted automatically, computer applications could
offer that user a more personalized interaction (e.g. gender-oriented
advertising). Biometric security can also benefit from handwriting-based
prediction, since its output can be combined with other biometric modalities to
improve security when accessing computer systems.
These handwriting-based prediction problems include the gender, handedness,
age range or even nationality of a person. These supervised learning problems
can be treated as binary or multi-class. The most common binary problems are
gender prediction (where handwritten texts are classified as written by men or
by women) and handedness prediction (where handwritten texts are classified as
produced by right-handed or left-handed writers). Among the multi-class
problems, one can discriminate among texts written by people belonging to
different age intervals, specific human races or even groups of nationalities.
These problems can be either balanced (i.e. each class covers a roughly equal
share of the population, about half in the binary case), as in gender
classification, or unbalanced, as in handedness classification.
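Class balance matters when classification rates are compared across problems.
The following minimal Python sketch computes the share of each class and the
accuracy of a trivial classifier that always predicts the majority class. The
function names and the label counts (50/50 and 90/10) are invented for
illustration and are not taken from any database discussed here.

```python
from collections import Counter


def class_shares(labels):
    """Return the fraction of samples that belongs to each class."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    total = len(labels)
    return {label: n / total for label, n in counts.items()}


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent class."""
    return max(class_shares(labels).values())


# Hypothetical label lists, invented for illustration only.
gender = ["female"] * 50 + ["male"] * 50
handedness = ["right"] * 90 + ["left"] * 10

print(majority_baseline_accuracy(gender))      # balanced problem
print(majority_baseline_accuracy(handedness))  # unbalanced problem
```

On an unbalanced problem the trivial baseline already scores high, so a
reported classification rate is only meaningful relative to that baseline.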
In general, these demographic classification problems are very complex, even
for humans, since it is quite difficult to find which handwriting features
properly characterize each class. Gender classification is an example.
Although it is generally accepted that feminine writing is rounder and neater
than masculine writing, in some cases masculine writing may have a “feminine”
appearance and vice versa. In this
paper, we additionally aim to analyze the relationships between the gender
There are relatively few works in the literature on these problems, which have
only recently begun to be investigated with automatic methods. One important
difficulty is that few handwriting databases are annotated with demographic
information about the writers. Other aspects that hinder this problem are
similar to those of the general handwriting recognition problem (e.g. cursive
features).
Neural networks have been applied for many years to the analysis of
high-dimensional, nonlinear and complex classification problems, such as
automatic handwriting recognition. Handwriting recognition has been
investigated for many years using different types of neural networks, for both
the online and offline cases, and also for non-Latin alphabets.
Two main situations can be distinguished in the automatic offline recognition
of handwritten text. The first is the recognition of isolated characters, which
is currently considered solved, with error rates below 1%. The second is the
recognition of groups of connected characters (e.g. words or text patches),
where success rates are still far from this value. Traditionally,
continuous handwriting recognition from digitized documents followed a sequence
of stages: preprocessing, segmentation, feature extraction and classification.
Handwritten character segmentation is a particularly complex problem because it
is sometimes impossible to determine where one letter ends and the next begins.
To overcome this difficulty, holistic methods, which handle each word as a
whole, have been proposed. These solutions were usually based on Hidden Markov
Models (HMM) or Neural Networks (NN). In recent years, this has changed with
the emergence of algorithms that allow training deep networks with multiple
hidden layers, which can extract more complex and relevant features. Since each
hidden layer computes a non-linear transformation of the previous layer's
output, a deep network can have significantly greater representational capacity
(i.e. it can learn more complex functions) than a shallow network.
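The effect of adding a non-linear layer can be seen in a toy case that is
unrelated to handwriting: the XOR function. The Python sketch below is an
illustration under stated assumptions, not a method from the works discussed
here. It uses threshold units with hand-set weights, whereas real deep networks
learn their weights and use differentiable activations. A single unit cannot
reproduce XOR, which is not linearly separable, but one hidden layer of two
units followed by an output unit can.

```python
from itertools import product

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
XOR = [0, 1, 1, 0]
OR = [0, 1, 1, 1]


def step(z):
    """Threshold activation: the non-linearity applied by each unit."""
    return 1 if z > 0 else 0


def unit(inputs, weights, bias):
    """One unit: weighted sum of the inputs followed by the threshold."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


def two_layer_xor(x1, x2):
    """Hidden layer computes OR and AND; the output unit combines them."""
    h_or = unit((x1, x2), (1, 1), -0.5)
    h_and = unit((x1, x2), (1, 1), -1.5)
    return unit((h_or, h_and), (1, -1), -0.5)  # OR and not AND


def single_unit_fits(targets, grid=(-2, -1.5, -1, -0.5, 0, 0.5, 1, 1.5, 2)):
    """Search a small weight grid for a single unit matching the targets."""
    for w1, w2, b in product(grid, repeat=3):
        if [unit(x, (w1, w2), b) for x in INPUTS] == targets:
            return True
    return False


print([two_layer_xor(a, b) for a, b in INPUTS])  # matches XOR
print(single_unit_fits(OR), single_unit_fits(XOR))
```

The grid search is only a demonstration: it finds a single unit for OR and none
for XOR, consistent with the fact that no single linear threshold unit can
represent XOR for any choice of weights.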
In a 2015 survey, M. Patel and S. Thakkar [Patel2015] pointed out that a 100%
success rate is still far from being achieved in continuous handwriting
recognition. Holistic methods eliminate the need to perform complex
segmentation tasks on handwriting. In 2016, Bluche and colleagues presented a
system that uses a modified Long Short-Term Memory (LSTM) neural network to
process and recognize complete paragraphs. However, these methods limit the
vocabulary that may appear in the text. For this reason, good recognition
results are obtained only for limited vocabularies. To overcome this vocabulary
restriction, some authors are successfully employing recurrent networks such as
Connectionist Temporal
analysis of handwriting, a system for demographic classification of individuals
is presented in [2], where the authors predict the age group, gender and
handedness of the writer with an average classification rate of around 70%.
Liwicki et al. [18] extracted a set of online and offline features to predict
gender and handedness from online handwriting samples. Classification is
carried out using Support Vector Machines (SVM) and Gaussian Mixture Models
(GMM), and classification rates of 67% and 85% are reported for gender and
handedness prediction, respectively, on a database of 200 writers. In another
study [19], the authors propose combining Fourier descriptors with tangent and
curvature information and bending energy to classify gender from handwritten
samples. Likewise, Siddiqi et al. [20] compute a set of global and local
features capturing information on the curvature, slant, texture and legibility
of writing. These features are used to train two classifiers, a Support Vector
Machine and an Artificial Neural Network. Results of the study are reported on
the QUWI and MSHD databases, with classification rates from 68% to 74%.
In a recent study, geometric features are exploited to characterize the gender,
age group and handedness of writers. For classification, the authors employ
random forests and kernel discriminant analysis. Evaluations are carried out on
the writing samples of the QUWI database in text-independent as well as
text-dependent mode, and classification rates of up to 74% are reported. In
another study, a dimensionality reduction scheme is proposed and evaluated on
handedness detection from handwriting. The authors conclude that the
dimensionality of the feature vector can be reduced by more than 30% while
maintaining high classification rates. Bouadjenek et al. employed the Histogram
of Oriented Gradients (HOG) and Local Binary Patterns (LBP) with an SVM
classifier to detect gender from handwriting. Evaluations on 200 writers of the
IAM-OnDB database achieved a classification rate of 74%. The same system was
extended to classify gender, age and handedness from handwriting and was
evaluated on the databases. In addition to HOG, the authors also investigated
the effectiveness of gradient local binary patterns (GLBP) for characterizing
gender from handwriting.
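To make the texture descriptors mentioned above more concrete, the following
Python sketch implements the basic 3x3 Local Binary Pattern operator and its
normalized histogram. It is a minimal illustration, not the configuration used
in the cited works: the neighbor ordering, the "greater than or equal"
comparison and the function names are assumptions. The resulting 256-bin
histogram is the kind of feature vector that could then be passed to a
classifier such as an SVM.

```python
# Clockwise neighbor offsets, starting at the top-left pixel (assumed order).
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
             (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, row, col):
    """8-bit LBP code of an interior pixel of a grayscale image.

    Bit k is set when the k-th neighbor is at least as bright as the center.
    """
    center = image[row][col]
    code = 0
    for bit, (dr, dc) in enumerate(NEIGHBORS):
        if image[row + dr][col + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """Normalized 256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for row in range(1, len(image) - 1):
        for col in range(1, len(image[0]) - 1):
            hist[lbp_code(image, row, col)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# Tiny hypothetical grayscale patch (list of rows), for illustration only.
patch = [
    [9, 0, 0, 0],
    [0, 5, 5, 0],
    [0, 0, 0, 0],
]
features = lbp_histogram(patch)
```

Border pixels are skipped because they lack a full 3x3 neighborhood; practical
implementations usually pad the image or use circular, interpolated
neighborhoods instead.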
Recent works present results for more than one demographic problem using
handwriting (e.g. they handle the gender and handedness problems separately).
Other recent papers additionally include multi-class problems such as age range
and nationality prediction.