Separating DSL Semantics from Implementation

James Lapalme

December 2007

Revised February 2008

Summary: This article will discuss the need for separation between the semantics of an internal DSL and its implementation, and how to achieve it. (13 printed pages)

"All problems in computer science can be solved by another level of indirection (abstraction)."
–Butler Lampson

Contents

Introduction
A Simple Look at DSLs
Internal and External Domain-Specific Languages
A Quick Look at NMock2
Programming and Modeling
Separation of Concerns Between Modeling and Simulation
The Rules of the Game
A Simple Example
Implementing a Simulation Framework
Putting Everything Together
Benefits of the Approach
Conclusion
Acknowledgements
References

Introduction

Many people believe that by using Domain-Specific Languages (DSLs), it is possible to be more productive when developing applications, because of the higher level of abstraction and the richer semantics. Many people also have debated about the best approach to use when implementing a DSL; the two basic schools of thought are embedding (or internal) and stand-alone (or external).

The question now is, "Should an internal DSL be separated into two parts when implemented: (1) its semantics (or modeling semantics) and (2) the implementation of those semantics?" With Microsoft .NET Framework 2.0/C#, it is possible to create a clean separation of concerns between the semantics of an internal DSL and the implementation of those semantics. A clean separation can be achieved by creating internal DSLs that are based on a few simple rules. Moreover, because of the reflective capabilities of .NET Framework 2.0 (especially on generics), it becomes possible to bind a solution that is written with an internal DSL to an implementation for the DSL by using modern software patterns such as Inversion of Control (IoC) and Proxies.

A Simple Look at DSLs

"What is a DSL?" might be such a simple question, but the answer is, well, not so simple. No official and universally accepted definition exists for the moment. Because many people are still arguing over the correct answer, I propose to jump in and offer a definition of my own. Simply put, a DSL can be defined by the combination of semantics and syntax in order to provide a tool to define solutions for a specific group of problems. A solution that is expressed with a DSL defines the "what" of the solution, but not the "how" of its implementation.

Because nothing is better than a concrete example, let's look at the SQL language. The SQL language allows the definition of solutions to data problems that can be solved by using relational calculus, the backbone of modern databases. When we write a SQL program, we are basically expressing the solution of a certain data problem by using a certain number of standardized semantic and syntactical elements. However, our SQL program abstracts how the solution will be executed or implemented. The implementation is left to the relational database engine, which will compile our solution in order to create and execute a specific execution plan. The SQL programming language allows us to be more productive by permitting us to concentrate on the solution of a data problem, and not on the details of its implementation.

If one is ready to accept my simple DSL definition, DSLs are not a new concept; we have been creating them for many years now. We can view the various generations of programming languages as a stack of DSLs, in which each layer of the stack is the "how" of the next:

· First-generation programming languages (1GL), or binary languages—Allow the definition of solutions that are based on the semantics of a specific instruction-set architecture, while abstracting how the instruction set is implemented with transistors.

· Second-generation programming languages (2GL), or assembler languages—Allow the definition of solutions that are based on the semantics of a specific microarchitecture, while abstracting the encoding of the instructions.

· Third-generation languages (3GL), or general-purpose programming languages (such as C and C#)—Allow the definition of solutions to problems that can be solved by algorithmic approaches, by abstracting the microarchitecture that will execute the solution.

· Fourth-generation languages (4GL), such as SQL and Mathematica—Allow the definition of solutions by using semantics that permit the declaration of the solution, without specifying the individual steps; hence, abstracting the algorithmic nature of the solution implementation.

· Fifth-generation languages (5GL), such as Prolog and Mercury—Allow the definition of a solution by specifying the constraints of the problem; hence, abstracting the solution definition altogether.


Figure 1. DSL layering

Internal and External Domain-Specific Languages

Most people recognize two approaches to the definition of DSLs: external and internal. External DSLs are defined by creating the necessary semantic and syntax elements from scratch. After the language has been defined, it is necessary to create a compiler or an interpreter to implement the execution semantics of the language. A simple example of a language of this type would be HTML.

In contrast, internal DSLs are defined by using as a starting point the semantics and syntax of a host language, such as a (usually) general-purpose programming language. Hence, the host language becomes both a basis and a limit for the definition of the DSL. A simple example of a language of this type would be Ruby on Rails.

Many people have debated over the better approach between internal and external DSLs. However, internal DSLs are rapidly gaining popularity, because of strong Ruby, Smalltalk, and Lisp communities. These communities have demonstrated the effectiveness of their respective host languages to support elegant internal DSLs that can be developed rapidly. Despite all of the debate about internal versus external and which language is best suited for hosting, I have seen very little debate about the separation between the semantics of an internal DSL and its implementation. This article will discuss the need for this separation and how to achieve it.

A Quick Look at NMock2

Before discussing the core of this article, I want to take a look at and discuss a real internal DSL.

The official NMock Web site defines NMock as a "dynamic mock-object library for .NET." A Mock object is like a proxy for a type instance that does not delegate to the type instance, but instead impersonates it. Mock objects are very useful when testing software, because they allow the isolation of a software component from its dependencies in order to test the component independently of its dependencies.

Although the official site defines NMock as an object library, it should be viewed as an internal DSL that permits the definition of the behavior of a mock object, and I will make that case. The following is some code from the official cheat sheet:

1.  Mockery mocks = new Mockery();
2.  InterfaceToBeMocked aMock = mocks.NewMock();
3.  Expect.Once.On(aMock).Method(...).With(...).Will(Return.Value(...));
4.  Expect.Once.On(aMock).GetProperty(...).Will(Return.Value(...));
5.  Expect.Once
6.  Expect.Never
7.  Expect.AtLeastOnce
8.  Expect.AtLeast(<# times>)
9.  Expect.Exactly(<# times>)
10. Expect.Between(<# times>, <# times>)
11. mocks.VerifyAllExpectationsHaveBeenMet();
 

The preceding lines of code are elements of the library. The NMock2 library is implemented as an object-oriented framework. Lines 1-2 are necessary to create the mock. Lines 3-10 are the method calls that are used to specify the behavior of the mock object. I want to emphasize the use of the word specify in the last line. Method calls, such as those on lines 3-10, enable developers to declare/define the behavior that they want the mock to exhibit, but they do NOT define the implementation of the behavior.

If we return to the definition of a solution expressed with a DSL that I proposed earlier—"A solution that is expressed with a DSL defines the 'what' of the solution, but not the 'how' of its implementation"—a program that used NMock2 would definitely fall into this category.

NMock2 uses the syntax and semantics of method calls of the .NET platform to define new semantic elements that are the specific method definitions in the framework.

Programming and Modeling

When we work on a problem, we often use tools such as UML to create models. We also use UML to model solutions to problems. Again, I want to emphasize the use of the word model. When we use UML (no matter the level of the diagrams), we consider that we are modeling a solution and not implementing the solution, because we are not writing code. Through mapping/interpretation, most modern UML tools will allow the generation of code for a specific technology. But, why are we modeling and not coding? Or, better yet, is there a difference between modeling and coding, or is it just a question of perspective?

I would like to point out two things. Firstly, if we replace the word model with define in the second sentence of the last paragraph we get: "We also use UML to define solutions to problems." It seems that UML would also be a DSL for defining solutions, while abstracting the final implementation that is usually a 3GL language. Secondly, it seems that a DSL allows us to program, specify, or model a solution—all three words being interchangeable.

If we accept the last two points, and accept that defining or programming a solution is, in fact, modeling a solution, the next question is: "What is the role of an interpreter or compiler towards a model?" One possible point of view is that both interpreters and compilers have the role of understanding a model that is defined with the semantics and syntax of a DSL. Then, they create a representation of that model that can be executed. The main difference between an interpreter and a compiler is that the interpreter executes the executable representation, while the compiler generates it but does not execute it. In some communities, the execution of a model is often referred to as the simulation of the model.

If we take a moment to summarize what was just discussed, we started the article by proposing a definition for DSLs. We then worked our way to creating a parallel between programming with a DSL and creating models. We then finished with the conclusion that executing programs that are written with DSLs can be viewed as simulating models.

So, if we come back to our discussion on NMock2 with our new point of view, we can state that creating an NMock2 program is, in fact, the activity of creating a model of a behavior that we want to define; and executing the program is, in fact, simulating that behavior. So, now the question is: "What does this have to do with this article?"

Separation of Concerns Between Modeling and Simulation

Frameworks such as NMock2 and Ruby on Rails actually define two things: a modeling language (or domain-specific language) and an infrastructure that is capable of simulating the modeling language. When working with external DSLs, there exists a clear separation of concerns between both elements: The modeling language is defined by a language specification (usually textual), and the simulation infrastructure that implements the semantics of the language is implemented in a compiler or interpreter. This separation of concerns is NOT present in most internal DSLs. The rest of this article will discuss how it is possible to use the .NET Framework in order to create pairs of modeling and simulation frameworks, to achieve internal DSLs that possess a clear separation of concerns between modeling and simulation aspects.

By separating the modeling and simulation aspects of an internal DSL, we are creating a true language, because no implementation details are present. However, it is clear that, to execute (simulate) a model that is created with a modeling framework, it is necessary to have the model interpreted by the associated simulation framework. Figure 2 and Figure 3 contrast the typical approach with the ideal separation of concerns that we should try to achieve.


Figure 2. Typical separation


Figure 3. Ideal separation

The Rules of the Game

To keep a modeling framework "clean" of all simulation semantics and artifacts, it is necessary to eliminate all traces of implementation details. If a framework does not contain implementation aspects, it necessarily contains only semantic and syntactical elements.

Object-oriented platforms such as .NET offer a well-defined set of semantic elements, such as methods, properties, inheritance, interfaces, and so forth. The objective is to use the right combination of these elements as a basis for defining a modeling framework. Because our objective is to eliminate all implementation artifacts, we must use only elements that by definition do not contain implementation, such as interfaces, abstract classes, abstract methods (and properties), inheritance, attributes, and class variable declarations (not instantiation). We can consider that the aforementioned elements do not contain implementation aspects, because they are by definition not executable; they do not add any behavioral information to the framework.

All other elements (such as static/virtual/instance members and class variable instantiations) must have implementation details. A modeling framework that is defined with only non-implementation-oriented semantic elements of the .NET platform will be inherently non-executable; hence, it will contain only the semantics of the modeling language that is represented by the framework. By our rules, a modeling-framework design basically defines a modeling-language specification—but in binary form, instead of textual.

NMock2 is almost designed according to these rules; only a couple of slight modifications would be needed, such as making the Expect class abstract, and making all of the methods of the class protected abstract. To get access to the methods, a programmer would derive a class from the Expect class. In this context, the Expect class can be compared to a scope; the derived class in a model would be like a scope instantiation.
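To make the suggested modifications concrete, here is a hedged sketch of what such a reworked Expect class might look like. This is not the actual NMock2 source; the member names follow the cheat sheet above, and IMatchSyntax is a hypothetical placeholder for the rest of the chain.

```csharp
// Hypothetical sketch of an "abstract Expect" scope. The class carries
// only modeling semantics: no member has an implementation.
public interface IMatchSyntax { /* On(...), Method(...), and so forth */ }

public abstract class Expect
{
    protected abstract IMatchSyntax Once { get; }
    protected abstract IMatchSyntax Never { get; }
    protected abstract IMatchSyntax AtLeast(int times);
}

// Deriving a class is the "scope instantiation": the symbols Once,
// Never, and AtLeast come into view inside the derived class. The
// class stays abstract, because implementations will be supplied
// later by a simulation engine.
public abstract class MyExpectations : Expect
{
    public void DefineBehavior()
    {
        IMatchSyntax match = this.AtLeast(2);
    }
}
```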

The modeling-framework design approach that we just discussed was successfully used in SoCML, a software/hardware-modeling framework. For more information about this DSL, see the "References" section of this article.

A Simple Example

To demonstrate the design approach and the separation of concerns that I propose, we will create a very simple DSL. All traditional programming languages have three basic concepts: scopes, operational semantics, and symbols. A scope permits the definition of an enclosing context. This context allows the declaration of identifiers, and determines the possible operational semantics that are available. Symbols are used to represent operational semantics in a concrete way in programs. For example, in a language such as C#, a method body is a scope, the "+" token is a symbol, and the operational semantics of "+" are either addition or concatenation.

Let's use the NMock2 API as a starting point for our language, which we will call NMock3. We first must create an implementation for the concept of a scope. Our scope will define a context in which the usage of an "Expect" operational semantic will be possible. The semantics of "Expect" will be the same as in NMock2.

To begin, we will create our scope. To do this, we will use an abstract class to create the definition of a scope, and use inheritance from this class to instantiate scopes. We will call our scope "MockingContext".

public abstract class MockingContext{}
 

In this scope definition, we will make our "Expect" operational semantic available by defining an Expect method within the scope definition.

public abstract class MockingContext{
 
           protected something Expect(){body}
 
}
 

The preceding code implements our scope definition, the "Expect" symbol, and the "Expect" operational semantic itself; note that the method body is the implementation of that operational semantic. In the example, something and body are left intentionally vague, because they are not important for the moment.

Now, if we wanted to instantiate our scope and use the "Expect" operational semantic, we might do something like the following:

public class MyMockingScope: MockingContext{
 
           void MyTest(){
                 ...
                 Expect()...;
                 ...
           }
}
 

If we wanted to achieve a chaining of symbols, such as the following:

Expect.Once.On(aMock).Method(...).With(...).Will(Return.Value(...));

from the NMock2 API, we could use properties instead of methods to eliminate unwanted parentheses, and then use a chaining of compatible properties and return values to chain the symbols.

public abstract class MockingContext{
 
           protected MockingContext Expect{get{body}}
           protected MockingContext Once{get{body}}
           protected MockingContext On(Mock mock){body}
           protected MockingContext Method(...){body}
           protected MockingContext With(...){body}
 
}
 

The preceding definitions would permit the same chaining of symbols as in NMock2. It would also permit a lot of unwanted chains, such as the following:

Once.Expect.With(...)

The sequencing of the symbols (of the grammar) can be achieved by designing the types of the properties and the return values of the methods, to create constraints.

public abstract class ExpectContext{
           protected abstract OnceExpression Expect{get;}
}
     
public interface OnceExpression{
           OnExpression Once{get;}
}
 
public interface OnExpression{
           MethodExpression On(Mock mock);
}
 
public interface MethodExpression{
           WithExpression Method();
}
 
public interface WithExpression{
           void With();
}
 

Our new API achieves two things: (1) It implements a basic grammar for the symbols by designing the return types of the properties, and (2) it eliminates all of the implementation of the operational semantics by using only interfaces and abstract classes. In our first design, the Expect method had an implementation, so that the framework combined both modeling and simulation (implementation) semantics. Because our new design does not have implementation, it achieves perfect separation of concerns between modeling and simulation aspects.
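The effect of a type-enforced grammar can be demonstrated with a self-contained miniature (all names here are illustrative, not part of NMock3). Because each interface exposes only the symbols that may legally follow it, a legal chain compiles, while an out-of-order chain such as calling Method before On simply does not exist through these interfaces.

```csharp
// Miniature type-enforced grammar: each interface exposes only the
// symbols that may legally come next in a chain.
public interface IOnStep { IMethodStep On(object mock); }
public interface IMethodStep { IWithStep Method(string name); }
public interface IWithStep { void With(object arg); }

// A single recorder implements every step and logs the symbol order,
// so we can observe that chaining works through the interfaces.
public class ChainRecorder : IOnStep, IMethodStep, IWithStep
{
    public string Trace = "";

    public IMethodStep On(object mock) { this.Trace += "On;"; return this; }
    public IWithStep Method(string name) { this.Trace += "Method;"; return this; }
    public void With(object arg) { this.Trace += "With;"; }
}
```

Holding the object through an IOnStep reference, only `On(...).Method(...).With(...)` is writable; the compiler rejects any other ordering.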

Implementing a Simulation Framework

Now that we have defined the rules of the game for our modeling language and we have a simple modeling language with which to work, the question becomes: "How do we put the implementation aspects back in?" There exist many modern software-design approaches and technologies that can come to our aid, such as inversion of control and reflection.

Class Instantiation Problem

The first implementation challenge is how to instantiate implementations for the variable declarations that might be present in the modeled solution. Instantiating the implementation of a variable of a given type is basically the problem of class instantiation. The software-design pattern called Inversion of Control (IoC) is a perfect fit for this problem.

IoC is a design pattern that enables the decoupling between types. A type instance that is designed according to IoC does not instantiate objects that fulfill its dependency needs. Instead, it delegates the instantiation responsibility to an execution environment, and consumes the dependency through an interface contract that the dependency instance implements. The execution environment—through the use of a defined dependency-need declaration between it and the requesting type or type instance—locates an implementation for each required dependency and instantiates it. After the implementation has been instantiated, the environment gives the requester access to it through another defined convention.
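A minimal sketch of the pattern, using constructor injection; the names IRepository, InMemoryRepository, Service, and Container are illustrative and do not come from any particular container library.

```csharp
// The consumer declares its dependency through an interface contract
// and never instantiates the implementation itself.
public interface IRepository { string Load(int id); }

public class InMemoryRepository : IRepository
{
    public string Load(int id) { return "record-" + id; }
}

public class Service
{
    private readonly IRepository repository;

    // The dependency is handed in: control of instantiation is inverted.
    public Service(IRepository repository) { this.repository = repository; }

    public string Describe(int id) { return "found " + this.repository.Load(id); }
}

// A trivial stand-in for the execution environment, which locates an
// implementation for the declared dependency and wires it in.
public static class Container
{
    public static Service Resolve() { return new Service(new InMemoryRepository()); }
}
```

Calling `Container.Resolve().Describe(7)` yields "found record-7" without Service ever naming InMemoryRepository.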

Figure 4 represents a typical UML diagram that depicts the players in an IoC scenario.


Figure 4. Inversion of Control

Inversion of Control with Reflection

The mainstream techniques that are used to implement IoC—such as constructor injection, setter injection, and so forth—are usually satisfactory in the context of business applications, but they are not transparent enough to be used in the context of a modeling framework. It would be necessary to "pollute" the modeling framework with IoC implementation mechanisms that have nothing to do with modeling with the DSL. An example of this "pollution" would be the use of superfluous setters when using setter injection; the setters would add value only to the IoC implementation. Instead, it is possible to use model analysis through reflection to play the role of a container. By using reflection to traverse the declaration hierarchy of a model recursively, it is possible to detect variable declarations, and then set those variables with the necessary implementations.
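A hedged sketch of that idea: reflection walks the field declarations of a model object and fills the null ones, so the model itself needs no setters or other IoC plumbing. ReflectiveInjector and the ImplementationFactory delegate are illustrative names, and a real container would also recurse into the injected instances to cover the whole declaration hierarchy.

```csharp
using System;
using System.Reflection;

// Supplies an implementation instance for a declared field type.
public delegate object ImplementationFactory(Type declaredType);

public static class ReflectiveInjector
{
    // For every null instance field, ask the factory for an
    // implementation of the field's declared type and assign it.
    public static void Inject(object model, ImplementationFactory factory)
    {
        BindingFlags flags = BindingFlags.Instance |
                             BindingFlags.Public |
                             BindingFlags.NonPublic;
        foreach (FieldInfo field in model.GetType().GetFields(flags))
        {
            if (field.GetValue(model) == null)
            {
                field.SetValue(model, factory(field.FieldType));
            }
        }
    }
}
```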

The .NET Framework 2.0 offers very flexible reflective capabilities for manipulating generic types—capabilities that are not present in Java or in earlier versions of the .NET Framework. Because .NET generics are preserved at run time, it is possible, through reflection, to determine whether an object is an instance of a generic type, as well as the types to which the generic instance is bound. It is also possible to bind a generic type dynamically and create an instance of that binding. These capabilities ease the creation of modeling and simulation frameworks.
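For example, both capabilities are available directly on System.Type (the helper class below is only a thin wrapper for illustration):

```csharp
using System;
using System.Collections.Generic;

public static class GenericReflection
{
    // Reports whether an object is an instance of a generic type,
    // and if so, which type it is bound to.
    public static string Describe(object instance)
    {
        Type type = instance.GetType();
        if (!type.IsGenericType) { return type.Name; }
        Type[] bound = type.GetGenericArguments();
        return type.GetGenericTypeDefinition().Name + " bound to " + bound[0].Name;
    }

    // Binds an open generic type dynamically and instantiates the binding.
    public static object BindAndCreate(Type openGeneric, Type argument)
    {
        Type closed = openGeneric.MakeGenericType(argument);
        return Activator.CreateInstance(closed);
    }
}
```

`Describe(new List<int>())` reports the open List`1 definition bound to Int32, and `BindAndCreate(typeof(List<>), typeof(string))` yields a live List<string>.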

Replacing Constructors

By not allowing a programmer to instantiate an instance variable, the problem of variable initialization arises. This can be solved simply by defining an abstract method in a base modeling class that will serve as an initialization method. The programmer inherits from the base class and provides an implementation of this method; within the method, the properties of the instance variables can be set.

public abstract class Base{
 public abstract void Init();
}

public abstract class aClass : Base{
 AType instance;

 public override void Init(){
  instance.AProperty = aValue;
 }
}
 

The following section will address the problem of instantiating the instance variable, so I won't discuss it here. However, for the preceding approach to work, it is necessary for the instantiation mechanism to instantiate all variables before recursively calling the Init methods of the declaration hierarchy in a top-down manner.
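The ordering just described can be sketched as follows, assuming an illustrative ModelBase/Initializer pair: once every variable has been instantiated, Init is invoked on each node before recursing into the nodes that it declares.

```csharp
using System.Reflection;

// Illustrative base modeling class with the initialization hook.
public abstract class ModelBase
{
    public abstract void Init();
}

public static class Initializer
{
    // Top-down pass: call Init on the node first, then recurse into
    // the ModelBase fields that the node declares.
    public static void InitTopDown(ModelBase node)
    {
        node.Init();
        BindingFlags flags = BindingFlags.Instance |
                             BindingFlags.Public |
                             BindingFlags.NonPublic;
        foreach (FieldInfo field in node.GetType().GetFields(flags))
        {
            ModelBase child = field.GetValue(node) as ModelBase;
            if (child != null) { Initializer.InitTopDown(child); }
        }
    }
}
```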

Abstract-Method Implementation Problem

The problem of implementing an abstract method or class without the user being aware is a problem that arises often in the context of distributed applications. In traditional distributed applications, a designer uses an object that impersonates a remote object. The responsibility of the impersonating object is to offer a simple interface to the user and marshal the calls to the remote object. The same basic technique can be used for the abstract-method implementation problem; it is based on the Proxy design pattern.

Combining IoC and Proxy Design Pattern

To solve the problem, we can use a combination of IoC and the Proxy design pattern. By generating a class that inherits from a class containing abstract methods, it is possible to implement those methods and, because of polymorphism, assign instances of our proxy to variables declared with the original class type. The same approach is often used in aspect-oriented containers to intercept method calls and inject aspects.

We can achieve the creation of the proxy classes and instances by using the DynamicProxy.NET Framework that is distributed by the Castle Project. The framework supports the creation of proxy types for generic types. The DynamicProxy.NET Framework works by using reflection to analyze an abstract class or interface that is to be "proxied." The framework then emits the necessary IL code in order to create a proxy. All of the methods of the proxy are implemented by delegating the calls to an interceptor, which is provided by the code that requests the proxy. The implementation of the interceptor mechanism that is used by the framework resembles the one that is used by the CLR to intercept calls on context-bound objects.
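To make the interception mechanism concrete, here is a hand-rolled version of what such a generated proxy boils down to; DynamicProxy.NET emits the equivalent code for you at run time. The names ICallInterceptor, IExpectApi, and ExpectProxy are illustrative, not part of the Castle API.

```csharp
// The interception contract: every proxied call is funneled here.
public interface ICallInterceptor
{
    object Intercept(string methodName, object[] arguments);
}

// An abstract member of the modeling API that needs an implementation.
public interface IExpectApi
{
    int CallCount();
}

// What a generated proxy amounts to: each member forwards to the
// interceptor instead of carrying an implementation of its own.
public class ExpectProxy : IExpectApi
{
    private readonly ICallInterceptor interceptor;

    public ExpectProxy(ICallInterceptor interceptor)
    {
        this.interceptor = interceptor;
    }

    public int CallCount()
    {
        return (int)this.interceptor.Intercept("CallCount", new object[0]);
    }
}
```

The interceptor supplies the operational semantics, so the modeling API itself stays free of implementation.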

Putting Everything Together

By using the techniques that we have just discussed, we can create an implementation for our NMock3 language by creating a Mocking engine class that is capable of:

· Taking a subclass of ExpectContext.

· Creating a dynamic subclass of the original subclass by using a DynamicProxy.NET-style tool.

· Intercepting the method calls to Expect, to implement the necessary operational semantics.

· Implementing all of the interface definitions of the API.

Our engine is basically an interpreter for the language; instead of consuming source code, it consumes the compiled form of our program (or model) through the program's Type, which is our subclass of ExpectContext. To create an executable representation of a model created with our NMock3 language, we must write a third program that instantiates our engine with that Type: MyMockingScope.
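Sketched end to end, that third program might look like the following. MockingEngine and its Run method are hypothetical, and stub classes stand in for the modeling API defined earlier, so the block is self-contained.

```csharp
using System;

// Stand-ins for the modeling-framework base class and a user model.
public abstract class ExpectContext { }
public abstract class MyMockingScope : ExpectContext { }

// Hypothetical engine: an interpreter that consumes the compiled form
// of the model through its Type.
public class MockingEngine
{
    public string Run(Type modelType)
    {
        // A real engine would emit a dynamic subclass of modelType,
        // intercept its calls, and simulate the modeled behavior.
        return "simulating " + modelType.Name;
    }
}

public static class Driver
{
    public static void Main()
    {
        Console.WriteLine(new MockingEngine().Run(typeof(MyMockingScope)));
    }
}
```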

Benefits of the Approach

The modeling/simulation framework approach has many subtle but very important benefits.

Possible Interception for Verification

The utilization of design patterns such as IoC and Proxy enables a simulation framework to create chains of interceptors that can monitor different aspects of a model under simulation, without having to add the monitoring elements in the model itself. The combination of a flexible framework design and the reflective capabilities of the .NET Framework offer many possibilities for the creation of simulation framework enhancements.

Transparency to Alternative Simulation Implementations

Because all models that are created with the modeling framework are separated from the implementation of the simulation framework, it becomes possible to use a model with different simulators in order to take advantage of alternative implementations. An alternative implementation scenario could be as simple as a new version of the same simulation framework. It would be nice to be able to take compiled NMock2 programs and use them with another, better implementation of the NMock2 framework without recompiling. This might sound fairly trivial, because of the versioning support of the .NET Framework; but other platforms and languages, such as Java and Ruby, do not have this support.

Conclusion

So, what was this article about? My first objective was to give a simple but clear definition of the concept of domain-specific languages, and then demonstrate that we have been using them for a long time. My second objective was to propose a new perspective on the relationship between programming with a domain-specific language and modeling. My third objective was to make the case for the separation of modeling and execution aspects in current internal domain-specific languages based on object-oriented frameworks. My final objective was to demonstrate an approach to the design and implementation of modeling and execution frameworks based on separation of concerns.

Acknowledgements

I would like to thank Christian Dubé and Gia-Hao Banh of PSP Investments for their invaluable help and feedback on the material in this article.

References

Fowler, Martin. "Language Workbenches: The Killer-App for Domain Specific Languages?" MartinFowler.com, June, 2005.

Fowler, Martin. "Inversion of Control Containers and the Dependency Injection Pattern." MartinFowler.com, January, 2004.

Freeman, Steve, and Nat Pryce. "Evolving an Embedded Domain-Specific Language in Java." International Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), Portland, Oregon (U.S.), October 22-26, 2006.

Gamma, Erich, Richard Helm, Ralph Johnson, and John Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Reading, MA: Addison-Wesley, 1995.

Lapalme, James, El Mostapha Aboulhamid, Gabriela Nicolescu, and F. Rousseau. "Separating Modeling and Simulation Aspects in Hardware/Software System Design." In Proceedings of the 18th IEEE International Conference on Microelectronics (ICM '06), Dhahran, Saudi Arabia, December 16-19, 2006. (6 pages)

Wikipedia, the Free Encyclopedia

About the author

James Lapalme currently works as a solution (Microsoft technologies) and enterprise-architecture consultant for CGI. He possesses a recognized expertise in the fields of system modeling and design. James has published a number of articles in international IEEE/ACM peer-reviewed conferences, and has been invited to present his research during international conferences. He has extensive knowledge of OO technologies and software engineering. Currently, James is a PhD candidate at l'Université de Montréal.