Last updated: September 6, 2019

Computers have become an integral part of the scientific world. The real-life problems we face are handled with an algorithmic approach. Because algorithms are independent of any programming language, they can be developed in whatever natural language a person is comfortable with. The difficulty lies in implementation: people without a programming background often find it hard to write efficient code for their algorithms, and the need for a formal programming language to communicate with a computer has always been taken for granted. In this project, we approach this problem by carrying out the mapping at the semantic level using NLP and ontologies, and then applying ontology matching techniques to derive an automatic translator from a natural language problem statement into an artificial language (here, Java). We look at a corpus of English descriptions used as programming assignments and develop techniques for mapping linguistic constructs onto program structures, which we refer to as programmatic semantics. It is believed that modern natural language processing techniques can make it possible to use natural language to express programming ideas, at least partially, thus drastically increasing the accessibility of programming to non-expert users. Overall, this is a knowledge-based expert system that uses facts and rules to build a solution.

Keywords: Natural Language Processing, Ontology, SPARQL.

Introduction:

An expert system is a computer system that emulates the decision-making ability of a human expert. Expert systems are designed to solve complex problems by reasoning about knowledge, represented mainly as if-then rules rather than through conventional procedural code. An expert system is considered an application of artificial intelligence.

In such a system, the interface is the user interface, which can be graphical or command-line.
The proposed system will have a simple graphical user interface built with Java Swing.

Next is the inference engine, the part of the system that applies logical rules to the knowledge base to extract information. In this system, the inference engine will be written in Python, using NLTK (Natural Language Toolkit) to apply syntactic and semantic analysis to the problem statement. NLTK has plenty of parsers and libraries for analyzing a given natural language statement. It also supports feature-based grammars.

Working memory holds the facts that are used to build the solution by applying logical rules. The output of this component can be considered the extracted intermediate solution for the given problem.

Domain knowledge is the large database that contains all information related to problem statements. In the proposed system it will be represented in the form of ontologies, which capture all entities of the domain and their interactions. Many open-source ontologies covering different knowledge are available on the internet. We are going to build an ontology of Java programming language syntax and libraries, which will serve as the set of rules and will also be used to extract the proper functions by comparing it with the domain ontology. Most of the time, the two ontologies will have different representations because they were constructed by different knowledge engineers. The system will therefore have to use heterogeneous entity-based ontology matching techniques.

The semantic gap between natural language and programming language can be overcome using natural language programming. It covers both the descriptive and the procedural aspects of programming. NLP can be used to identify steps, loops, conditions, etc. in natural language text, which can then be converted into intermediate programming code (cite reference 1 and reference 5).
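
As a rough illustration of identifying steps, loops, and conditions in natural language text, the sketch below tags each sentence of a problem statement using plain keyword patterns from the Python standard library. It is not the NLTK-based analysis the proposed system would use; the cue-word lists, the function names `classify_sentence` and `to_intermediate`, and the sample problem statement are all assumptions made for this example.

```python
import re
from typing import List, Tuple

# Hypothetical cue words; a real system would rely on parsing, not keyword lists.
LOOP_CUES = ("for each", "for every", "repeat", "while", "until")
CONDITION_CUES = ("if", "otherwise", "unless", "else")


def has_cue(text: str, cues: Tuple[str, ...]) -> bool:
    """Return True if any cue occurs in the text as a whole word or phrase."""
    return any(re.search(r"\b" + re.escape(cue) + r"\b", text) for cue in cues)


def classify_sentence(sentence: str) -> str:
    """Label a sentence as 'loop', 'condition', or 'step' using cue words."""
    text = sentence.lower()
    if has_cue(text, LOOP_CUES):
        return "loop"
    if has_cue(text, CONDITION_CUES):
        return "condition"
    return "step"


def to_intermediate(description: str) -> List[Tuple[str, str]]:
    """Split a description into sentences and tag each with a construct."""
    parts = re.split(r"(?<=[.!?])\s+", description)
    return [(classify_sentence(s), s.strip()) for s in parts if s.strip()]


if __name__ == "__main__":
    problem = (
        "Read a list of numbers. "
        "For each number, add it to the total. "
        "If the total is greater than 100, print a warning. "
        "Print the total."
    )
    for construct, sentence in to_intermediate(problem):
        print(f"{construct:10} {sentence}")
```

The tagged sentences form a very crude intermediate representation. Keyword matching breaks down quickly on real text (for example, "while" used in a non-looping sense), which is why syntactic and semantic analysis is needed in practice.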

A similar mapping approach has been used to derive an automatic translator from natural language questions into their associated SQL queries, by carrying out the mapping at the syntactic level and then applying machine learning algorithms such as support vector machines (SVMs) with kernel functions (cite reference 2).

NLP is a branch of AI whose goal is to build systems that analyze and generate language in order to reduce the man-machine communication gap. NLP consists of many phases of language analysis, such as lexical analysis, syntactic analysis, semantic analysis, morphology, pragmatics, and discourse. These phases are implemented in open-source tools such as NLTK, R, OpenNLP, and LingPipe, and in commercial tools such as SAS text analysis and SPSS. This system will use NLTK. NLTK includes multiple corpora, such as Brown, Webtext, Reuters, Inaugural, and udhr. It also includes WordNet, a structured and semantically oriented English dictionary. Finally, NLTK offers various parsers, which helps in obtaining more accurate results.

The system uses ontologies to represent knowledge; researchers generally use them for information retrieval. Ontologies represent data in a form that is both machine-interpretable and human-readable. The semantic gap between the natural language and programming language domains can be reduced by ontology matching between the different domain ontologies. For English, a well-developed ontology already exists: WordNet. The ontology for the other domain, the programming (artificial) language, has to be built manually or by some automated method. Constructing an ontology manually requires an editor such as Protégé and a language such as OWL, while constructing one automatically involves NLP and various machine learning algorithms (cite reference 4).

SPARQL is a query language used to retrieve data from an ontology. Apache Jena is a Semantic Web framework for Java that provides APIs for RDF graphs, ontologies, and SPARQL.
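
To make the idea of querying an ontology concrete, the sketch below stores a few subject-predicate-object triples in plain Python and matches a single triple pattern against them, loosely in the spirit of a SPARQL `SELECT` with one pattern. It does not use Apache Jena or a real SPARQL engine; the triples, the `java:` names, and the `bind` and `match` functions are hypothetical and exist only for this illustration.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical fragment of a Java-language ontology as (subject, predicate, object).
TRIPLES: List[Triple] = [
    ("java:ForLoop", "rdf:type", "java:LoopConstruct"),
    ("java:WhileLoop", "rdf:type", "java:LoopConstruct"),
    ("java:IfStatement", "rdf:type", "java:ConditionalConstruct"),
    ("java:ForLoop", "java:keyword", "for"),
    ("java:WhileLoop", "java:keyword", "while"),
    ("java:IfStatement", "java:keyword", "if"),
]


def bind(pattern: Triple, triple: Triple) -> Optional[Dict[str, str]]:
    """Match one pattern against one triple.

    Terms starting with '?' are variables; all other terms must match exactly.
    Returns the variable bindings, or None if the triple does not match.
    """
    bindings: Dict[str, str] = {}
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A repeated variable must bind to the same value each time.
            if bindings.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return bindings


def match(pattern: Triple, triples: List[Triple]) -> List[Dict[str, str]]:
    """Return the bindings of every triple that matches the pattern."""
    results = []
    for triple in triples:
        bindings = bind(pattern, triple)
        if bindings is not None:
            results.append(bindings)
    return results


if __name__ == "__main__":
    # Roughly: SELECT ?c WHERE { ?c rdf:type java:LoopConstruct }
    for row in match(("?c", "rdf:type", "java:LoopConstruct"), TRIPLES):
        print(row["?c"])
```

A real SPARQL engine joins several such patterns and supports filters, optional parts, and much more; this sketch covers only the single-pattern case.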
Jena can be used to implement functions such as query processing and ontology matching on the domain ontologies.

All results have to be expressed in some intermediate representation; a graphical representation such as a tree gives a more robust structure. Object-oriented concepts and rules are easily mapped and verified using trees.

The proposed system is much like a natural language compiler, but instead of directly generating machine code, it generates Java code. Hence, all compiler phases must be defined and implemented to get an actual result. The technologies above provide the abstract analysis and the intermediate representation of the given natural language statement. This result then needs to be converted into Java code. For this purpose, the system will follow modern compiler implementation techniques in Java (cite: compiler book). A similar system for automated code generation from unstructured algorithms has been developed using a technique inspired by the way a microprocessor works (cite reference 3).
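
The step from a tree-shaped intermediate representation to Java source, as described above, can be sketched as follows. This is a minimal illustration in Python, not the proposed system's code generator; the `Node` class, the node kinds (`block`, `loop`, `condition`, `statement`), and the sample tree are assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One node of a hypothetical intermediate tree."""
    kind: str                      # 'block', 'loop', 'condition', or 'statement'
    text: str = ""                 # Java header expression or statement text
    children: List["Node"] = field(default_factory=list)


def emit_java(node: Node, indent: int = 0) -> List[str]:
    """Walk the tree depth-first and return Java source lines."""
    pad = "    " * indent
    if node.kind == "statement":
        return [f"{pad}{node.text};"]
    if node.kind == "block":
        lines: List[str] = []
        for child in node.children:
            lines.extend(emit_java(child, indent))
        return lines
    if node.kind in ("loop", "condition"):
        keyword = "for" if node.kind == "loop" else "if"
        lines = [f"{pad}{keyword} ({node.text}) {{"]
        for child in node.children:
            lines.extend(emit_java(child, indent + 1))
        lines.append(f"{pad}}}")
        return lines
    raise ValueError(f"unknown node kind: {node.kind}")


if __name__ == "__main__":
    tree = Node("block", children=[
        Node("statement", "int total = 0"),
        Node("loop", "int n : numbers", [
            Node("statement", "total += n"),
        ]),
        Node("condition", "total > 100", [
            Node("statement", 'System.out.println("Warning")'),
        ]),
        Node("statement", "System.out.println(total)"),
    ])
    print("\n".join(emit_java(tree)))
```

Keeping the tree separate from the emitter means the same intermediate form could be checked against the ontology's rules before any Java text is produced.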