Thursday, February 7, 2008

Code Generator Part I: Compiler, Interpreter and Code Generator

The evolution of computer programming languages started with machine code and assembly languages (the first and second generations) and has come a long way through the third and fourth generations. The key to this progress has been abstraction: with each step we tried to give the hardware its instructions in a form closer to the way humans express them. At the same time there has been a continuous effort to make programming languages readable and similar to a human language (e.g. English). So every new language came along with a translation tool such as a compiler or an interpreter. The role of such a tool is basically to turn the human-like language into something that the machine can understand.

For example, suppose I write a computer program to make the computer print "Hello World" on the screen. The program might look like this: Print("Hello World"); The translation tool transforms the program above into binary code, something like "!#$233#4@36644!~~2%4^ Hello$@@$63World#&&^##%" (believe me, this is purely an illustration, not abusive language, and it has no link to actual machine code). Finally the hardware executes this binary code and generates the signals that print "Hello World" on the monitor.
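
To make the Print example a little more concrete, here is a minimal sketch in Python. It assumes a made-up source language with a single statement form and a made-up target machine: the opcodes PUSH_STR, CALL_PRINT and HALT and the function names generate_code and run are invented for illustration and do not correspond to any real compiler or instruction set.

```python
import re

# Hypothetical one-statement source language: Print("some text");
PRINT_STATEMENT = re.compile(r'\s*Print\(\s*"([^"]*)"\s*\)\s*;\s*')


def generate_code(source):
    """Translate one Print statement into instructions for a made-up machine."""
    match = PRINT_STATEMENT.fullmatch(source)
    if match is None:
        raise SyntaxError('expected a statement of the form Print("...");')
    text = match.group(1)
    # PUSH_STR, CALL_PRINT and HALT are invented opcodes, not real machine code.
    return [("PUSH_STR", text), ("CALL_PRINT",), ("HALT",)]


def run(instructions):
    """Stand-in for the hardware: execute the instructions, return the screen text."""
    stack, screen = [], []
    for instruction in instructions:
        opcode = instruction[0]
        if opcode == "PUSH_STR":
            stack.append(instruction[1])
        elif opcode == "CALL_PRINT":
            screen.append(stack.pop())
        elif opcode == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return "\n".join(screen)


program = generate_code('Print("Hello World");')
print(program)       # the generated instructions
print(run(program))  # what the toy machine shows on its "screen"
```

The split into generate_code and run mirrors the two steps described above: a tool translates the human-like statement, and a separate machine executes the result. A real compiler would emit instructions for actual hardware rather than Python tuples.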

This is a very basic instance of code generation: from a human-like programming language to machine code. Nowadays there are nice graphical tools that let you create a document (word processors), draw a picture (drawing tools), send mail (mail clients), browse the internet (browsers) and even design a user interface graphically with the mouse. In a broad sense they all use code generation: to save and read documents or pictures, to interpret mail or web pages and show them in a graphical interface, and to generate user interface code that the machine can execute after the design is done.

In the internet age, a great deal of expression and communication happens through markup languages such as HTML, XML, WML, SGML and SVG. The reason behind the popularity of these languages is that they are plain-text based, which makes them reasonably readable for people and easy to process for other software systems. They are very helpful for communication in the global community of heterogeneous software systems.
Think of a scenario where my program, built with a recent technology (e.g. C++, Java/J2EE or .NET), is trying to talk to another program that is written in COBOL and runs on the other side of the world. We need a language or protocol of communication that both of them can understand. This is where markup languages can help a lot. My program can convert its data into XML text and send it to the other program, which translates the data from XML back into a format that it can understand. The other program can reply to my program in a similar fashion.
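
The exchange described above can be sketched roughly as follows, using Python's standard xml.etree.ElementTree module. The record layout, the customer element name and the helper names to_xml and from_xml are assumptions made for this illustration; a real exchange would follow whatever schema the two programs have agreed on.

```python
import xml.etree.ElementTree as ET


def to_xml(record):
    """Sender side: turn a flat dictionary into XML text."""
    root = ET.Element("customer")  # hypothetical element name
    for field, value in record.items():
        ET.SubElement(root, field).text = str(value)
    return ET.tostring(root, encoding="unicode")


def from_xml(xml_text):
    """Receiver side: turn the XML text back into a dictionary of strings."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


message = to_xml({"id": 42, "name": "Ada"})
print(message)            # the text that travels between the two programs
print(from_xml(message))  # what the receiving program recovers
```

Note that every value comes back as a string: plain XML carries text only, so both sides must agree separately on which fields are numbers, dates and so on.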

I will try to explain the architecture of a typical code generator in the following posts.
