Information explosion highlights the need for machines to better understand natural language texts.
In this paper, we focus on short texts, that is, texts with limited context. Many applications, such as web search and microblogging services, need to handle a large number of short texts. Clearly, a better understanding of short texts would bring tremendous value. One of the most important tasks of text understanding is to discover hidden semantics from texts.
Many efforts have been devoted to this field. For instance, named entity recognition (NER) [1], [2] locates named entities in a text and classifies them into predefined categories such as persons, organizations, and locations. Topic models [3], [4] attempt to recognize “latent topics”, which are represented as probabilistic distributions on words, from a text. Entity linking [5], [6], [7], [8] focuses on retrieving “explicit topics” expressed as probabilistic distributions on an entire knowledge base.
However, categories, “latent topics”, and “explicit topics” all still leave a semantic gap with humans’ mental world. As stated in psychologist Gregory Murphy’s highly acclaimed book [9], “concepts are the glue that holds our mental world together”. Therefore, we define short text understanding as the detection of concepts mentioned in a short text.
Fig. 1 demonstrates a typical strategy for short text understanding, which consists of three steps: 1) text segmentation: divide a short text into a collection of terms contained in a vocabulary (e.g., “book disneyland hotel california” is segmented as {book disneyland hotel california}); 2) type detection: determine the types of terms and recognize instances (e.g., “disneyland” and “california” are recognized as instances, while “book” is a verb and “hotel” is a concept); 3) concept labeling: infer the concept of each instance (e.g., “disneyland” and “california” refer to the concepts theme park and state, respectively). Overall, three concepts are detected from the short text “book Disneyland hotel California” using this strategy, namely theme park, hotel, and state, as shown in Fig. 1.
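The three steps above can be illustrated with a minimal Python sketch. It is a toy illustration only, not the method proposed in this paper: the lexicon `TERM_TYPES`, the mapping `INSTANCE_CONCEPTS`, and the function names are hypothetical, and greedy longest-match lookup stands in for whatever segmentation, type detection, and concept labeling models an actual system would use.

```python
# Hypothetical toy lexicon: vocabulary term -> type.
TERM_TYPES = {
    "book": "verb",
    "hotel": "concept",
    "disneyland": "instance",
    "california": "instance",
}

# Hypothetical toy mapping: instance -> concept.
INSTANCE_CONCEPTS = {
    "disneyland": "theme park",
    "california": "state",
}


def segment(text, vocabulary, max_len=3):
    """Step 1: greedy longest-match segmentation into vocabulary terms."""
    tokens = text.lower().split()
    terms = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in vocabulary:
                terms.append(candidate)
                i += n
                break
        else:
            i += 1  # no vocabulary term starts here; skip the token
    return terms


def detect_types(terms):
    """Step 2: attach a type (verb, concept, instance) to each term."""
    return [(term, TERM_TYPES[term]) for term in terms]


def label_concepts(typed_terms):
    """Step 3: map instances to concepts; keep terms that are concepts."""
    concepts = []
    for term, kind in typed_terms:
        if kind == "instance":
            concepts.append(INSTANCE_CONCEPTS[term])
        elif kind == "concept":
            concepts.append(term)
    return concepts


def understand(text):
    """Run the three steps in sequence on a short text."""
    terms = segment(text, TERM_TYPES.keys())
    return label_concepts(detect_types(terms))


print(understand("book Disneyland hotel California"))
```

On the running example, this sketch detects the same three concepts as Fig. 1. A dictionary lookup cannot resolve ambiguity (for instance, “book” as a verb versus a noun), which is exactly where context-aware models are needed.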