Thursday, February 14, 2008

Code Generator Part III: Code Generation using MDA

Code Generation using MDA mainly involves three phases.
  • Parsing the code that is supplied as the input. This is known as import.

  • Collecting the theme or model of a language from the input, known as internal modeling.

  • Generating the theme in the form of code that can be understood by the target, known as export. The target might be hardware or another software interpreter.

Based on this functionality, the architecture of a typical code generator can be viewed as three different modules (layers), as shown in Fig 2:

Fig 2

Importer – This module reads the model as input and translates it into a platform-independent internal format based on an object model. An importer can be seen as a sort of XML parser, but with a sufficiently general design an importer can read any kind of input.

Internal Object Model (IOM) – This is the platform-independent internal format of the input model and can be considered the core of the code-generator architecture. The IOM contains a set of classes that make it easy to manipulate the information coming from the model in order to generate outputs. The structure of the IOM is very important: if designed carefully, it becomes a powerful representation that is independent of the target technology (Java, C++, etc.) and can be converted easily into source code.

Exporter – This module accesses the IOM and takes the relevant information to generate code. It could use templates, which drive the generation process and make the exporter more generic and independent of the output language syntax. In the current version of the code generator, the exporter has been designed without the template-based approach.
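The three modules can be sketched in Java as follows. This is only a minimal illustration under stated assumptions, not the actual design of the generator discussed in these posts: the class names (ClassModel, FieldModel, LineImporter, JavaExporter) and the one-line input format "Person: name String, age int" are hypothetical. The exporter builds its output by plain string concatenation, i.e. without templates.

```java
import java.util.ArrayList;
import java.util.List;

class PipelineDemo {
    // Internal Object Model (IOM): a platform-independent description of a class.
    static final class FieldModel {
        final String name;
        final String type;
        FieldModel(String name, String type) { this.name = name; this.type = type; }
    }

    static final class ClassModel {
        final String name;
        final List<FieldModel> fields = new ArrayList<>();
        ClassModel(String name) { this.name = name; }
    }

    // Importer: turns some input format into the IOM.
    interface Importer {
        ClassModel read(String input);
    }

    // Exporter: turns the IOM into text for one target.
    interface Exporter {
        String write(ClassModel model);
    }

    // Hypothetical input format (an assumption): "ClassName: field type, field type"
    static final class LineImporter implements Importer {
        public ClassModel read(String input) {
            String[] parts = input.split(":", 2);
            ClassModel model = new ClassModel(parts[0].trim());
            for (String declaration : parts[1].split(",")) {
                String[] tokens = declaration.trim().split("\\s+");
                model.fields.add(new FieldModel(tokens[0], tokens[1]));
            }
            return model;
        }
    }

    // Exporter without templates: builds Java source by string concatenation.
    static final class JavaExporter implements Exporter {
        public String write(ClassModel model) {
            StringBuilder out = new StringBuilder();
            out.append("public class ").append(model.name).append(" {\n");
            for (FieldModel f : model.fields) {
                out.append("    private ").append(f.type).append(' ')
                   .append(f.name).append(";\n");
            }
            out.append("}\n");
            return out.toString();
        }
    }

    public static void main(String[] args) {
        Importer importer = new LineImporter();
        Exporter exporter = new JavaExporter();
        // The importer and the exporter only meet through the IOM.
        ClassModel model = importer.read("Person: name String, age int");
        System.out.print(exporter.write(model));
    }
}
```

Because the importer and the exporter know only the IOM classes, either one could be swapped (for example, an XML importer or a C++ exporter) without touching the other.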

The architecture described above has the following benefits for code generation:

  • The input is completely independent of the output in terms of technologies. For instance, one importer can read a UML diagram while one exporter writes C#-ADO code and another writes Java-JDBC code, and both exporters work with the same UML model as input.

  • If the IOM is implemented following UML, everything that can be represented with UML can be represented with the IOM as well, which means that every kind of input can be captured and converted into a real object-oriented design. This is a big advantage when writing importers and exporters that manipulate the model. The IOM layer adds flexibility to the code generator and introduces loose coupling between the importer and the exporter.

  • By implementing multiple exporters and applying a simple pattern, different layers of code can be created—source code, descriptors, documentation, scripts, and so on—with a single pass that assures consistency and synchronization among the different outputs.
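The last benefit can be illustrated with a small Java sketch in which several exporters run over one model instance in a single pass. It is an assumption-laden illustration rather than the generator's real code: the Exporter interface, the exportAll helper, the neutral type names ("string", "integer") and the type mappings are all hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MultiExportDemo {
    // Minimal stand-in for the IOM: a class name plus fields (name -> neutral type).
    static final class ClassModel {
        final String name;
        final Map<String, String> fields = new LinkedHashMap<>();
        ClassModel(String name) { this.name = name; }
    }

    // Every exporter names its output and renders it from the same model.
    interface Exporter {
        String fileName(ClassModel model);
        String write(ClassModel model);
    }

    static final class JavaExporter implements Exporter {
        public String fileName(ClassModel model) { return model.name + ".java"; }
        public String write(ClassModel model) {
            StringBuilder out = new StringBuilder("public class " + model.name + " {\n");
            for (Map.Entry<String, String> f : model.fields.entrySet()) {
                // Map the neutral type to a Java type; anything unknown becomes String.
                String type = f.getValue().equals("integer") ? "int" : "String";
                out.append("    private ").append(type).append(' ')
                   .append(f.getKey()).append(";\n");
            }
            return out.append("}\n").toString();
        }
    }

    static final class SqlExporter implements Exporter {
        public String fileName(ClassModel model) { return model.name + ".sql"; }
        public String write(ClassModel model) {
            StringBuilder out = new StringBuilder("CREATE TABLE " + model.name + " (\n");
            int remaining = model.fields.size();
            for (Map.Entry<String, String> f : model.fields.entrySet()) {
                // Map the neutral type to an SQL type; anything unknown becomes VARCHAR.
                String type = f.getValue().equals("integer") ? "INTEGER" : "VARCHAR(255)";
                out.append("    ").append(f.getKey()).append(' ').append(type);
                out.append(--remaining > 0 ? ",\n" : "\n");
            }
            return out.append(");\n").toString();
        }
    }

    // Single pass: every exporter reads the same model instance,
    // so the outputs cannot drift apart.
    static Map<String, String> exportAll(ClassModel model, List<Exporter> exporters) {
        Map<String, String> outputs = new LinkedHashMap<>();
        for (Exporter exporter : exporters) {
            outputs.put(exporter.fileName(model), exporter.write(model));
        }
        return outputs;
    }

    public static void main(String[] args) {
        ClassModel person = new ClassModel("Person");
        person.fields.put("name", "string");
        person.fields.put("age", "integer");
        List<Exporter> exporters =
            Arrays.<Exporter>asList(new JavaExporter(), new SqlExporter());
        for (Map.Entry<String, String> e : exportAll(person, exporters).entrySet()) {
            System.out.println("== " + e.getKey());
            System.out.print(e.getValue());
        }
    }
}
```

Adding a documentation or descriptor output would then mean adding one more Exporter implementation to the list, with no change to the model or to the other exporters.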

Tuesday, February 12, 2008

Code Generator Part II: Architecture of a typical Code Generator

A code generator is a program that takes a model as input and produces source code that implements that model. The model consists of a set of metadata containing information about the code that will be generated. Depending on the requirement, a model can be a Unified Modeling Language (UML) design, a proprietary descriptor file, a database schema or even other source code. The model format can be XML, plain text, CSV, or other kinds of sources such as directories, databases, or repositories. Starting from the information contained in the model, a code generator creates source code in a programming language (C, C++, Java, C#, VB, and so on) or another type of output: documentation, descriptors, configuration files, SQL code, and so on (Fig 1).




Fig 1

Depending on the type of input, code generators can be classified into two types: code-driven and model-driven. A code-driven generator takes as input a file containing source code and special tags that drive the code generation process, e.g. JavaDoc. Model-driven generators can be further divided into two types: custom and MDA. A custom generator takes as input a proprietary model representing the information that must be converted into source code, e.g. Apache Velocity. When the input model is a representation of UML, the code generator follows the Model-Driven Architecture (MDA). An MDA code generator takes a platform-independent model (usually XMI, an XML representation of UML) as input and turns it into a platform-specific model, which can be converted easily into source code by means of templates.

In this article and the following ones I will try to explain a code generator that converts a set of object definitions (expressed in an object-oriented paradigm, i.e. UML) of simple Data Transfer Objects (DTOs) from an XML text into a Java program. The same code generator can be enhanced to generate C++ programs, SQL scripts or PHP scripts from the same XML input. In a real-world scenario, a programmer might not be skilled in all of these languages, but he/she can be well versed in the concepts of the object-oriented (OO) paradigm and able to design an OO system. The concepts or design of a system can be stored in a widely used common format, e.g. XML. The purpose of this tool is to facilitate the transformation of the concepts and design from XML into a desired language. Most Computer-Aided Software Engineering (CASE) tools, such as Rational Rose, support this sort of code generation.
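To make the idea concrete, here is a minimal Java sketch of an XML-to-DTO conversion using the standard DOM parser that ships with the JDK. The XML layout (a class element with name, and field elements with name and type attributes) is purely an assumption made for this illustration, as are the names XmlToDtoDemo and generate; it is not the input format of the generator described in these posts.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class XmlToDtoDemo {
    // Hypothetical input: element and attribute names are assumptions.
    static final String SAMPLE =
          "<class name=\"Customer\">"
        + "<field name=\"id\" type=\"int\"/>"
        + "<field name=\"email\" type=\"String\"/>"
        + "</class>";

    // Reads one class definition from XML and returns Java source for a DTO.
    static String generate(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse the XML model", e);
        }
        Element root = doc.getDocumentElement();
        StringBuilder fields = new StringBuilder();
        StringBuilder accessors = new StringBuilder();
        NodeList nodes = root.getElementsByTagName("field");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element field = (Element) nodes.item(i);
            String name = field.getAttribute("name");
            String type = field.getAttribute("type");
            // Capitalise the field name for the getter and setter names.
            String cap = Character.toUpperCase(name.charAt(0)) + name.substring(1);
            fields.append("    private ").append(type).append(' ')
                  .append(name).append(";\n");
            accessors.append("    public ").append(type).append(" get").append(cap)
                     .append("() { return ").append(name).append("; }\n");
            accessors.append("    public void set").append(cap).append('(')
                     .append(type).append(' ').append(name).append(") { this.")
                     .append(name).append(" = ").append(name).append("; }\n");
        }
        return "public class " + root.getAttribute("name") + " {\n"
             + fields + accessors + "}\n";
    }

    public static void main(String[] args) {
        System.out.print(generate(SAMPLE));
    }
}
```

A C++ or SQL variant would reuse the same parsing step and differ only in the text it emits, which is the motivation for separating import from export.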

Thursday, February 7, 2008

Code Generator Part I: Compiler, Interpreter and Code Generator

The evolution of computer programming languages started with machine and assembly languages and has come a long way through the first, second, third and fourth generations. The key to this advancement has mostly been abstraction: we have always tried to use a programming language to make the hardware understand human instructions. At the same time, there has been a continuous effort to make our programming languages look more like a human language (e.g. English). So every time a new language evolved, it came along with an interpretation tool such as a compiler or an interpreter. The role of an interpretation tool is basically to translate the human-like language into something that the machine can understand.

For example, suppose I write a computer program to make the computer print “Hello World” on the screen. The program might look like: Print("Hello World"); The interpretation tool transforms the program above into binary code like “!#$233#4@36644!~~2%4^ Hello$@@$63World#&&^##%” (believe me, this is purely an illustration, not abusive language, and it has no link to actual machine/binary code). Finally, the hardware understands this binary code and generates signals to print “Hello World” on the monitor.

This is a very basic instance of code generation: from a human-like programming language to machine code. Nowadays there are nice graphical tools that let you create a document (word processors), draw a picture (drawing tools), send mail (mailing software) and browse the internet (browsers); even a user interface can be designed graphically with a mouse. They all use code generation: to save and read documents or pictures, to interpret mail or web pages and show them in graphical interfaces, and to generate code for the designed user interfaces that the machine can execute later.

In the internet age, the most popular means of expression and communication are markup languages such as HTML, XML, WML, SGML and SVG. These languages are popular because they are text based and easy for both people and other software systems to understand. They are very helpful for communication in the global community of heterogeneous software systems.
Think of a scenario where my program, built using a recent technology (e.g. C++, Java, J2EE, .NET), is trying to talk to another program that is built in COBOL and runs on the other side of the world. We need a language or protocol of communication that both of them can understand. This is where markup languages can help a lot. My program can convert its data to an XML text and send it to the other program, which can translate the data back from XML into a format that it can understand. The other program can reply to my program in a similar fashion.
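The exchange described above can be sketched in Java with the DOM and transformer APIs from the JDK. This is only an illustration under assumptions: the record/field layout of the XML and the helper names toXml and fromXml are made up for the example, and both ends of the conversation are simulated in one program.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XmlExchangeDemo {
    // Sender side: convert in-memory data to XML text.
    static String toXml(Map<String, String> record) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            Element root = doc.createElement("record");
            doc.appendChild(root);
            for (Map.Entry<String, String> e : record.entrySet()) {
                Element field = doc.createElement("field");
                field.setAttribute("name", e.getKey());
                field.setTextContent(e.getValue());
                root.appendChild(field);
            }
            StringWriter out = new StringWriter();
            // The transformer serialises the DOM tree and escapes special characters.
            TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception ex) {
            throw new IllegalStateException("Could not write XML", ex);
        }
    }

    // Receiver side: translate the XML text back into its own data format.
    static Map<String, String> fromXml(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            Map<String, String> record = new LinkedHashMap<>();
            NodeList fields = doc.getDocumentElement().getElementsByTagName("field");
            for (int i = 0; i < fields.getLength(); i++) {
                Element field = (Element) fields.item(i);
                record.put(field.getAttribute("name"), field.getTextContent());
            }
            return record;
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not read XML", ex);
        }
    }

    public static void main(String[] args) {
        Map<String, String> data = new LinkedHashMap<>();
        data.put("customer", "Smith & Sons");
        data.put("amount", "120");
        String message = toXml(data);       // what would travel over the wire
        System.out.println(message);
        System.out.println(fromXml(message)); // what the receiver reconstructs
    }
}
```

Neither side needs to know how the other stores its data internally; they only have to agree on the XML layout.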

I will try to explain the Architecture of a typical Code Generator in the following posts.