Recently I found myself looking for a way to parse the source code of several languages. Rather than write my own parser, I began looking for an existing open source solution. The first option I came across was JavaCC (https://javacc.dev.java.net/), which has plenty of documentation on how to install it, create grammars, and generate parsing code from those grammars, but not much on how to actually use it. Also, according to the JavaCC FAQ:
JavaCC does not automate the building of trees (or any other specific parser output), although there are at least two tree building tools JJTree and JTB (see Chapter 6.) based on JavaCC, and building trees "by hand" with a JavaCC based parser is easy. (http://www.engr.mun.ca/~theo/JavaCC-FAQ/javacc-faq-moz.htm)
While searching for an easy way to build trees "by hand" using JavaCC, I came across another tool called ANTLR. ANTLR, ANother Tool for Language Recognition, (formerly PCCTS) is a language tool that provides a framework for constructing recognizers, compilers, and translators from grammatical descriptions containing Java, C#, Python, or C++ actions (http://www.antlr.org/about.html). With an abundance of grammars (http://www.antlr.org/grammar/list) and articles (http://www.antlr.org/article/list), it was pretty easy to get what I wanted working quickly.
My specific example is a program that parses Java 1.5 source code for its imports, its "has a" relationships, its "is a" relationships, and the classes that it realizes (implements). Here are the steps to get it working:
- Download and install ANTLR (version 2.7.7) from http://www.antlr.org/download.html
- Download Michael Studman's Java 1.5 grammar for ANTLR 2.7.7, located at http://www.antlr.org/grammar/1090713067533/index.html.
- If running on Windows, add the ANTLR JAR to the CLASSPATH environment variable. For example, mine is located at "C:\antlr\277\lib\antlr.jar".
- Open a command window and navigate to the location of the grammar files. For example, I put the Java 1.5 grammar in "C:\antlr\277\examples\java15-grammar".
- Run the command "java antlr.Tool java15.tree.g java15.g", assuming you are using Michael Studman's Java 1.5 grammar. This will generate the source code for the parser to be included in your project:
- Create a Java project using the above files along with the ANTLR JAR, and then use the following as the main class:
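As a rough illustration of the kind of tree walk such a main class performs, here is a minimal sketch that collects import names from an AST. It is not the full program and does not depend on the ANTLR JAR: the `Node` class is a hypothetical stand-in that mirrors the child/sibling accessors of `antlr.collections.AST` (`getType()`, `getText()`, `getFirstChild()`, `getNextSibling()`), and the token type constants are made-up values standing in for the ones a generated `JavaTokenTypes` interface would supply. It also assumes that top-level nodes are chained as siblings and that a qualified name is built as nested DOT nodes, which is how I understand the grammar to construct its tree.

```java
import java.util.ArrayList;
import java.util.List;

public class ImportWalkSketch {

    // Hypothetical token types; the generated JavaTokenTypes interface
    // would supply the real values.
    static final int IMPORT = 1;
    static final int DOT = 2;
    static final int IDENT = 3;
    static final int CLASS_DEF = 4;

    /** Minimal stand-in for antlr.collections.AST (first child / next sibling). */
    static class Node {
        final int type;
        final String text;
        Node firstChild;
        Node nextSibling;

        Node(int type, String text, Node... children) {
            this.type = type;
            this.text = text;
            Node prev = null;
            // Link the children into a sibling chain hanging off firstChild.
            for (Node c : children) {
                if (prev == null) firstChild = c; else prev.nextSibling = c;
                prev = c;
            }
        }

        int getType() { return type; }
        String getText() { return text; }
        Node getFirstChild() { return firstChild; }
        Node getNextSibling() { return nextSibling; }
    }

    /** Rebuilds "a.b.C" from a nested DOT tree such as (. (. a b) C). */
    static String dottedName(Node n) {
        if (n.getType() != DOT) {
            return n.getText();
        }
        Node left = n.getFirstChild();
        Node right = left.getNextSibling();
        return dottedName(left) + "." + dottedName(right);
    }

    /** Collects the name under every IMPORT node among root and its siblings. */
    static List<String> collectImports(Node root) {
        List<String> imports = new ArrayList<String>();
        for (Node n = root; n != null; n = n.getNextSibling()) {
            if (n.getType() == IMPORT) {
                imports.add(dottedName(n.getFirstChild()));
            }
        }
        return imports;
    }

    public static void main(String[] args) {
        // Hand-built tree for "import java.io.File;" followed by a class node.
        Node imp = new Node(IMPORT, "import",
            new Node(DOT, ".",
                new Node(DOT, ".", new Node(IDENT, "java"), new Node(IDENT, "io")),
                new Node(IDENT, "File")));
        imp.nextSibling = new Node(CLASS_DEF, "CLASS_DEF");

        for (String name : collectImports(imp)) {
            System.out.println(name);
        }
    }
}
```

With the real ANTLR classes, the same loop would start from the AST returned by the generated parser rather than from a hand-built tree, and the "is a" and "has a" relationships could be gathered by matching other node types in the same way.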
Using the following test file, UMLHelper.java, as the input to the above program:
The following output is generated:
Full Class Name: a.b.c.umlhelper.UMLHelper
Imports:
    umlhelper.service.ServiceInterface
    umlhelper.gui.MainFrameInterface
    umlhelper.gui.MainFrame
    umlhelper.service.Service
    java.io.File
    a.b.c.d.E
Extends (Is A):
    AB
Has A:
    MainFrame
    Service
    ArrayList
The following tree is displayed:
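If a graphical viewer is not convenient, a tree can also be dumped as indented text. The sketch below shows one way to do that, again using a hypothetical `Node` class as a stand-in for `antlr.collections.AST` so that it runs without the ANTLR JAR. The node labels in `main` are a toy example chosen for illustration, not the actual tree produced for UMLHelper.java.

```java
public class TreeDumpSketch {

    /** Minimal stand-in for antlr.collections.AST (first child / next sibling). */
    static class Node {
        final String text;
        Node firstChild;
        Node nextSibling;

        Node(String text, Node... children) {
            this.text = text;
            Node prev = null;
            // Link the children into a sibling chain hanging off firstChild.
            for (Node c : children) {
                if (prev == null) firstChild = c; else prev.nextSibling = c;
                prev = c;
            }
        }
    }

    /** Appends one line per node, indenting two spaces per tree level. */
    static void dump(Node n, int depth, StringBuilder out) {
        for (; n != null; n = n.nextSibling) {
            for (int i = 0; i < depth; i++) {
                out.append("  ");
            }
            out.append(n.text).append('\n');
            dump(n.firstChild, depth + 1, out);
        }
    }

    /** Convenience wrapper that returns the whole dump as a string. */
    static String dump(Node root) {
        StringBuilder out = new StringBuilder();
        dump(root, 0, out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Toy tree for illustration only.
        Node tree = new Node("CLASS_DEF",
            new Node("MODIFIERS"),
            new Node("UMLHelper"),
            new Node("EXTENDS_CLAUSE", new Node("AB")));
        System.out.print(dump(tree));
    }
}
```

Because each node only exposes its first child and its next sibling, the walk loops over siblings and recurses into children, which is the same traversal shape any hand-written pass over this kind of AST ends up using.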