Despite current technological advances, there are still no algorithms that allow a computer to transcribe the content of any “difficult” handwritten document (e.g. a historical document). The general handwriting recognition problem presents many difficulties caused by interpersonal and intrapersonal variations in writing, the cursive nature of handwriting, the use of different pen types, and the presence of paper with a noisy background. The individuality of handwriting has been studied and established with scientific rigor. The handwriting recognition problem has two variants: offline and online recognition. The offline problem consists of recognizing handwritten text that was previously written on paper and then digitized.
The online handwriting problem aims to recognize text written using some kind of electronic device. The sensors of this device also record a set of dynamic measures of how the writing is produced (e.g. writing pressure, pen altitude and azimuth, among others). In recent years, there has been more progress on the online modality, but the offline one is still far from being solved in an unrestricted manner.

Psychology can also benefit from research on handwriting style, since it may be possible to identify correlations between handwriting and some personality attributes of the writer.
In the field of Human-Computer Interaction, if the gender of a user can be automatically predicted, computer applications could offer that user a more personalized interaction (e.g. gender-oriented advertising). Biometric Security can also benefit from handwriting-based prediction, since its output can be combined with other biometric modalities to improve security when accessing computer systems. These handwriting-based prediction problems include gender, handedness, age range or even nationality of a person.
This group of supervised learning problems can be treated as binary or multi-class. The most common binary problems are gender prediction (where handwritten texts are classified as written by men or by women) and handedness prediction (where handwritten texts are classified as produced by right-handed or left-handed writers). Among the multi-class problems, one can discriminate among texts written by people in different age intervals, in specific human races or even in groups of nationalities. A property of all these problems is that they can be either balanced (i.e. approximately half of the population belongs to each class), as in the case of gender classification, or unbalanced, as in the case of handedness classification.
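The practical consequence of the balanced/unbalanced distinction above can be illustrated with a minimal sketch. The label proportions below (50/50 for gender, 90/10 for handedness) are hypothetical values chosen for illustration, not figures from any handwriting database, and the helper names are our own. The sketch shows why plain accuracy can be misleading on an unbalanced problem, and how the mean of per-class recalls (often called balanced accuracy) exposes a trivial majority-class predictor.

```python
from collections import Counter


def class_priors(labels):
    """Return the fraction of samples belonging to each class."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: n / total for cls, n in counts.items()}


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls; each class counts equally."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical label sets (illustrative proportions only).
gender = ["female"] * 50 + ["male"] * 50
handedness = ["right"] * 90 + ["left"] * 10

# A trivial "classifier" that always predicts the majority class.
always_right = ["right"] * len(handedness)

print(class_priors(gender))
print(class_priors(handedness))
print(accuracy(handedness, always_right))
print(balanced_accuracy(handedness, always_right))
```

Under these assumed proportions, the majority-class predictor reaches high plain accuracy on handedness while recognizing no left-handed writer at all, which is why unbalanced problems are usually reported with class-aware measures.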
In general, these demographic classification problems are very complex, even for humans, since it is quite difficult to find which handwriting features properly characterize each class involved. Gender classification is an example. Although it is commonly accepted that feminine writing is rounder and neater than masculine writing, there are cases where masculine writing has a “feminine” appearance and vice versa. In this paper, we additionally aim to analyze the relationships between gender and handwriting features.

Related Work

There are relatively few works in the literature on these problems, whose automatic treatment has only recently begun to be investigated. One important difficulty is that there are few handwriting databases with annotated demographic information about the writers. Other aspects that hinder this problem are similar to those of the general handwriting recognition problem (e.g. cursive features).

Neural networks have been applied for many years to the analysis of high-dimensional, nonlinear and complex classification problems, as is the case of automatic handwriting recognition. The handwriting problem has been investigated for many years using different types of neural networks for both the online and offline cases, and also for alphabets other than Latin. Two main situations can be distinguished in the automatic offline recognition of handwritten text. The first is the recognition of isolated characters, which is now practically solved, with error rates lower than 1%. The second is the recognition of groups of connected characters (e.g. words or text patches), where the success rates are still far from this value. Traditionally, continuous handwriting recognition from digitized documents followed a sequence of stages: preprocessing, segmentation, feature extraction and classification. Handwritten character segmentation is a particularly complex problem because it is sometimes impossible to determine where one letter ends and the next one begins. To overcome this difficulty, holistic methods have recently been proposed, which handle each word as a whole.
These solutions were usually based on Hidden Markov Models (HMM) or Neural Networks (NN). In recent years, this has changed with the emergence of algorithms that allow training deep networks with multiple hidden layers, which are able to extract more complex and relevant features. Since each hidden layer computes a non-linear transformation of the previous layer, a deep network can have significantly greater representational capacity (i.e. it can learn more complex functions) than a shallow network.

In a 2015 survey, M. Patel and S. Thakkar [Patel2015] pointed out that a 100% success rate is still far from being reached in continuous handwriting recognition.
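The remark above that each hidden layer computes a non-linear transformation of the previous layer can be sketched as a plain forward pass. This is only an illustrative toy, not the architecture of any system cited here: the layer sizes, the tanh activation, the random weights and the function names are all assumptions made for the example.

```python
import math
import random


def dense_layer(inputs, weights, biases):
    """One hidden layer: an affine map followed by a tanh non-linearity."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def random_layer(n_in, n_out, rng):
    """Create random (untrained) weights and biases for one layer."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-1, 1) for _ in range(n_out)]
    return weights, biases


def forward(inputs, layers):
    """Feed the input through every layer; each layer transforms the previous output."""
    activation = inputs
    for weights, biases in layers:
        activation = dense_layer(activation, weights, biases)
    return activation


rng = random.Random(0)
layer_sizes = [4, 8, 8, 2]  # hypothetical: 4 inputs, two hidden layers, 2 outputs
layers = [random_layer(a, b, rng) for a, b in zip(layer_sizes, layer_sizes[1:])]

features = [0.2, -0.5, 0.9, 0.1]  # hypothetical input feature vector
output = forward(features, layers)
print(len(layers), len(output))
```

Adding entries to `layer_sizes` deepens the network: the composition of more non-linear maps is what gives a deep model its larger family of representable functions compared with a single hidden layer.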
Holistic methods eliminate the need to perform complex segmentation tasks on handwriting. In 2016, Bluche and his colleagues presented a system that uses a modification of a Long Short-Term Memory (LSTM) neural network to process and recognize complete paragraphs. However, these methods limit the vocabulary that may appear in the text. For this reason, good recognition results are obtained only with limited vocabularies. To overcome this vocabulary restriction, some authors are successfully employing recurrent networks trained with Connectionist Temporal Classification (CTC).

Regarding the automated analysis of handwriting, a system for demographic classification of individuals is presented in [2], where the authors predict the age group, gender and handedness of the writer with an average classification rate of around 70%. Liwicki et al. [18] extracted a set of online and offline features to predict gender and handedness from online handwriting samples. Classification is carried out using Support Vector Machines (SVM) and Gaussian Mixture Models (GMM), and classification rates of 67% and 85% are achieved for gender and handedness prediction, respectively, on a database of 200 writers. In another study [19], the authors propose combining Fourier descriptors with tangent and curvature information and bending energy to classify gender from handwritten samples. Likewise, Siddiqi et al. [20] compute a set of global and local features capturing information on the curvature, slant, texture and legibility of writing. These features are used to train two classifiers, a Support Vector Machine and an Artificial Neural Network. Results of the study are reported on the QUWI and MSHD databases, with classification rates ranging from 68% to 74%. In another recent study, geometric features are exploited to characterize the gender, age group and handedness of writers. For classification, the authors employ random forests and kernel discriminant analysis.
Evaluations are carried out on the writing samples in the QUWI database in text-independent as well as text-dependent modes, and classification rates of up to 74% are reported. In another study, a dimensionality reduction scheme is proposed and evaluated on handedness detection from handwriting. The authors conclude that a reduction of more than 30% in the dimensionality of the feature vector is achieved while maintaining high classification rates. Bouadjenek et al. employed the Histogram of Oriented Gradients (HOG) and Local Binary Patterns (LBP) with an SVM classifier to detect gender from handwriting. Evaluations on 200 writers of the IAM-OnDB database yielded a classification rate of 74%. The same system was extended to classify gender, age and handedness from handwriting and was evaluated on the databases. In addition to HOG, the authors also investigated the effectiveness of gradient local binary patterns (GLBP) for characterizing gender from handwriting.

Most recent works present results for more than one demographic problem using handwriting (e.g. they separately handle both the gender and handedness problems). Other recent papers additionally include multi-class problems such as age range and nationality prediction.
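As an illustration of the Local Binary Patterns descriptor mentioned above, the following minimal sketch computes the basic 8-neighbour LBP code and its 256-bin histogram. It assumes the simplest LBP variant (3x3 neighbourhood, threshold at the centre value, clockwise bit order starting at the top-left neighbour); the cited works may use different variants or parameters, and the function names are our own.

```python
def lbp_code(patch):
    """Basic 8-neighbour LBP code for the centre pixel of a 3x3 patch."""
    center = patch[1][1]
    # Neighbours visited clockwise, starting at the top-left corner.
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (r, c) in enumerate(offsets):
        if patch[r][c] >= center:  # a neighbour at least as bright sets its bit
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels of a grey image."""
    hist = [0] * 256
    rows, cols = len(image), len(image[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            patch = [row[c - 1:c + 2] for row in image[r - 1:r + 2]]
            hist[lbp_code(patch)] += 1
    return hist


# Hypothetical 3x3 grey-level patch and 4x4 image, for illustration only.
patch = [[5, 9, 1],
         [3, 6, 7],
         [8, 2, 6]]
image = [[0, 0, 0, 0],
         [0, 9, 9, 0],
         [0, 9, 9, 0],
         [0, 0, 0, 0]]

print(lbp_code(patch))
print(sum(lbp_histogram(image)))
```

In a pipeline of the kind described, a histogram like this (typically computed per image region and concatenated) would serve as the feature vector passed to a classifier such as an SVM.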