5. Evaluation Approach
Whenever anyone says, "theoretically",
they really mean, "not really".
Dave Parnas

Because of the versatility of design patterns and the extensive human interaction required to apply them, there is no straightforward way to benchmark the correlation between patterns and their implementations using test frameworks, simulations, or other automated tools. Ideally, applying design patterns requires human interaction in all phases of the software development life cycle, including the final evaluation of the developed system. The evaluation approach defined here, however, focuses on the practical application of the "Gang of Four" patterns using a given language catalyst. This chapter defines and explains a simple evaluation approach that is independent of any given language. It can thus be used in similar evaluations with different language catalysts and perhaps different pattern systems, provided they are described in the "Gang of Four" format. The goal of the evaluation is simply to implement all representative pattern functionality described in the Implementation and Sample Code elements for each pattern, if possible, using a single language. The evaluation outcome is then reported using a subset of the familiar "Gang of Four" format. With Java 6 as the catalyst, this allows us to perform a reasonably structured evaluation of the entire "Gang of Four" pattern system, because the individual implementations must also be juxtaposed to identify common traits. We start by establishing the focus of the evaluation approach before we outline the approach itself. The approach requires both individual and collective evaluations of the "Gang of Four" patterns. Once the approach is defined, we use it to state the goals for this evaluation with Java 6 as the language catalyst, and we determine the language features that will be investigated.

5.1.  Focus
Design patterns are not an exact science. There is no mathematical way to deduce whether a pattern is correct, since patterns are based on empirical knowledge and experience, though several formalisation techniques have emerged within the last few years (see for example [Baroni03; Eden04; Taibi07]). The concept of patterns cannot exist without human interaction, as patterns are described and interpreted by humans. The idea of a pattern must be captured and described by its author ("what does it do?"); based on that description, pattern behaviour and applicability may be inferred by the user ("how is it done?"), but the interpretation will depend on the user's point of view. Neither part can be excluded. It is hard to say which part is easier to evaluate. Evaluating well-written pattern descriptions and/or implementations could be easier than evaluating pattern abstractions, because well-written descriptions could be more tangible than the concept they describe. The reverse could also be true. The evaluation performed here does not evaluate the validity of the abstractions, merely practical issues encountered during application from our point of view. How a user views a pattern will affect its application, and only through implementation and testing in the given scenario can the desired behaviour be confirmed. Because of the human factor and the versatility of patterns, there is no straightforward way to benchmark patterns using test frameworks, simulations, or other benchmarking tools. To evaluate patterns is to implement them from a specific point of view, which is what the evaluation approach conveys. This implies that any evaluation of patterns will be subjective and that its conclusions must adhere to the initial point of view and interpretation. Hence, the goals of the evaluation can only make sense if the viewpoints used are established and explained.

Example 5.1 To illustrate how different points of view can affect an evaluation, consider evaluating a car, say an Aston Martin, for some magazine from the point of view of a mechanic and from the point of view of the owner. The mechanic may approach the evaluation in a technical fashion, focusing on the design of the engine, e.g. engine performance. The evaluation could investigate different parts of the engine in turn, i.e. specific criteria, and comment on issues deemed relevant by the mechanic, as well as state a conclusion on the overall performance. The conclusion might be that the engine is inferior for a car in its price range. Furthermore, certain issues could be independent of the specific engine (and car) and relate to the general design of a combustion engine. The owner of the car could instead evaluate the car based on its physical design compared to other cars, perhaps focusing on the front, rear, interior, etc. The subjective conclusion could be that the car is the most beautiful one. The result is two evaluations of the same car with completely different results, one negative and one positive. For others to make meaningful use of the evaluations, the premise, i.e. the point of view, and the specific criteria used must be known. The point of view alone is not enough, because different criteria could be used for the same point of view. For example, a vaguely formulated criterion such as "How durable is it?" means engine or car design depending on the viewpoint; it could yield a positive evaluation for the design of a combustion engine, whereas car designs traditionally have a much shorter lifespan, i.e. are less durable.

The general idea is that the evaluation and the pattern implementations as a whole must try to express the Gamma et al. themes and concepts described in section 2.1. This makes sense because the individual patterns by definition must express the themes and concepts regardless of the language used. Determining if this is indeed the case is not easy. However, if we assume that the individual patterns as described by Gamma et al. express the desired properties, then their implementations should as well. By trying to implement all functionality described in the Implementation and Sample Code pattern elements, the pattern implementations attempt to express the largest possible set of desirable pattern qualities. These pattern elements are chosen because they explicitly focus on practical application in the context of specific languages and features. The information they contain can rather easily be compared across languages. The focus is on the practical use of the programming language to implement the design patterns, not on how the language features are constructed internally.

The focus of the evaluation is practical and applied, from the perspective of a practising designer and/or developer. The "Gang of Four" patterns should be used in a realistic, varied, and practical manner. This requires an "application" of some size and complexity. In our view, this will produce much more realistic pattern applications than merely isolating individual pattern implementations in trivial shell-like implementations; such implementations are plentiful on the Internet. Our evaluation contains no enervating "Dogs and Cats" examples; this is a Master's Thesis, not a petting zoo :-). As such, the evaluation merits rather advanced and complex implementations.

5.2.  Description
The approach demands that all implementation issues related to pattern functionality described in the Implementation and Sample Code elements of the "Gang of Four" patterns must be addressed and, if possible, solved in the language catalyst. It is sufficient to refer to similar solutions in other patterns, but the features used must in any case be established. As both the implementation and the selection of features used are determined by the evaluator, the evaluation and its conclusions will be subjective. The detailed evaluation of the solutions in the given language must be expressed using the Name, Intent, Structure, Participants, and Implementation elements from the "Gang of Four" pattern format. This includes at least a UML class diagram in the Structure element and identification of the pattern participants expressed in the solutions. While familiar pattern elements are used to describe the evaluation outcome, their contents are much more detailed and specific compared to the "Gang of Four" pattern descriptions. The comparative evaluation must identify common traits in the pattern implementations and establish where the various features are used and what their purposes are. Common traits include both pattern and language behaviour. The format of the comparative evaluation is not defined here, since it is completely dependent on the language and features investigated. It must be defined by the evaluation in question.

5.3.  Evaluation Goals
The purpose of the evaluation is to investigate how the use of language features indigenous to Java 6 can affect the application of the "Gang of Four" patterns, individually and collectively. As the whole concept of pattern correctness and behaviour is so elusive, the evaluation and its conclusions will be subjective. Hence, the objective is not to provide a definitive conclusion, as this would go against the very idea of design patterns. Instead, the objective is to provide a realistic, but subjective, evaluation, which may be useful in disclosing how the "Gang of Four" patterns and Java 6 can cooperate. The goal is not to establish that a given pattern should be implemented using a set of specific features, but to illustrate that a given set of features may be useful in the application of the pattern.

In order to perform a reasonably structured evaluation of the entire "Gang of Four" pattern system using Java 6, we use the defined approach to implement all representative pattern functionality described in the Implementation and Sample Code pattern elements (in compliance with sub-goal II from the introduction). For each pattern, the outcome of the detailed evaluation will thus be (sub-goals III and IV):

The outcome of the comparative evaluation will be (sub-goal IV):

The comparative evaluation is presented in chapter 7, while the detailed evaluations are presented in chapter 8. Furthermore, based on all evaluation results, overall evaluation conclusions will be made in chapter 9.

5.3.1.  Features
The last thing we need to do before conducting the evaluation is to select the set of features to investigate. A fixed set is a necessity to keep the evaluation focused, but it must be realistic; excluding interfaces, for example, is not an option. At least the following core features will be investigated: type usage (classes, enumerations, interfaces, abstract classes, and exceptions), implementation and inheritance, generics and generic methods, inner and anonymous classes (closures), covariant return types, and varargs. Many of these features have similar constructs in C++, such as classes, generics, and covariant return types (for virtual functions [Stroustrup91, p.647]), while others, such as generic methods and anonymous classes, do not. Many of these features are a given, as it would otherwise not be possible to write any form of Java code. These features also encompass many of the Eiffel features used in the study by Meyer and Arnout from section 4.3.4.
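
To give a concrete impression of how these core features may appear in the pattern implementations, the following sketch combines generics, a generic method, varargs, an anonymous class, and a covariant return type in a Factory Method flavoured setting. The sketch is purely illustrative and is not taken from the implementations in later chapters; the Command and CommandFactory names are hypothetical.

    import java.util.ArrayList;
    import java.util.List;

    interface Command {                       // hypothetical participant
        void execute();
    }

    abstract class CommandFactory {
        // Factory Method: subclasses decide which Command to create.
        abstract Command create(String name);

        // Generic method combined with varargs: builds a typed list from any elements.
        static <T> List<T> listOf(T... elements) {
            List<T> list = new ArrayList<T>();
            for (T element : elements) {
                list.add(element);
            }
            return list;
        }
    }

    class LoggingCommand implements Command {
        private final String name;
        LoggingCommand(String name) { this.name = name; }
        public void execute() { System.out.println("executing " + name); }
    }

    class LoggingCommandFactory extends CommandFactory {
        // Covariant return type: LoggingCommand is returned instead of Command.
        @Override
        LoggingCommand create(String name) {
            return new LoggingCommand(name);
        }
    }

    public class CoreFeaturesDemo {
        public static void main(String[] args) {
            CommandFactory factory = new LoggingCommandFactory();
            // Anonymous class used as a lightweight, closure-like Command.
            Command anonymous = new Command() {
                public void execute() { System.out.println("anonymous command"); }
            };
            for (Command command : CommandFactory.listOf(factory.create("stored"), anonymous)) {
                command.execute();
            }
        }
    }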

As the related work examined in section 4.2 all concluded that dynamic runtime features aid in the application of the "Gang of Four" patterns, it is natural to examine the use of Java's reflective capabilities in this evaluation. Reflective usage of class literals, constructors, and methods is examined, as well as dynamic proxies, which allow a type to implement a given interface at runtime using reflective methods for dispatching. The use of annotations is also examined, especially when used reflectively at runtime. These features cannot be matched by C++, but Smalltalk possesses several similar features. Numerous "Gang of Four" descriptions illustrate or discuss pattern functionality relying on runtime features that cannot be directly implemented in C++, for example using classes to create objects in Abstract Factory [Gamma95, p.90-91] and Factory Method [Gamma95, p.112], or changing the class of an object at runtime for State behaviour [Gamma95, p.309].
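
The following sketch illustrates the reflective capabilities in question: a class literal used for reflective instantiation, a dynamic proxy dispatching calls via reflective methods, and a runtime-visible annotation inspected during dispatch. It is an illustrative example only; the Service and Traced names are hypothetical and do not stem from the actual pattern implementations.

    import java.lang.annotation.Retention;
    import java.lang.annotation.RetentionPolicy;
    import java.lang.reflect.InvocationHandler;
    import java.lang.reflect.Method;
    import java.lang.reflect.Proxy;

    // Runtime-visible annotation, readable via reflection.
    @Retention(RetentionPolicy.RUNTIME)
    @interface Traced { }

    interface Service {                       // hypothetical participant
        @Traced
        String greet(String who);
    }

    class SimpleService implements Service {
        public String greet(String who) { return "Hello, " + who; }
    }

    public class ReflectionDemo {
        public static void main(String[] args) throws Exception {
            // Class literal and reflective instantiation, as a factory-style pattern might use.
            final Service target = SimpleService.class.newInstance();

            // Dynamic proxy implementing Service; every call is dispatched reflectively.
            Service proxy = (Service) Proxy.newProxyInstance(
                    Service.class.getClassLoader(),
                    new Class<?>[] { Service.class },
                    new InvocationHandler() {
                        public Object invoke(Object p, Method method, Object[] arguments)
                                throws Throwable {
                            // Annotations inspected reflectively at runtime.
                            if (method.isAnnotationPresent(Traced.class)) {
                                System.out.println("tracing call to " + method.getName());
                            }
                            return method.invoke(target, arguments);
                        }
                    });

            System.out.println(target.greet("direct"));
            System.out.println(proxy.greet("proxied"));
        }
    }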

Java's built-in mechanisms for synchronisation, serialization, and cloning are also examined. C++ cannot match these mechanisms either.
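
As a brief illustration of these mechanisms, the following sketch shows synchronised lazy initialisation of a shared instance, a serialization round trip, and cloning via Cloneable, roughly as Singleton and Prototype implementations might use them. The Registry class is hypothetical and the sketch is not part of the actual implementations.

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.ObjectInputStream;
    import java.io.ObjectOutputStream;
    import java.io.Serializable;

    class Registry implements Serializable, Cloneable {
        private static final long serialVersionUID = 1L;
        private static Registry instance;

        private String state = "initial";

        // Synchronised lazy initialisation of a shared instance (one common Singleton variant).
        static synchronized Registry getInstance() {
            if (instance == null) {
                instance = new Registry();
            }
            return instance;
        }

        void setState(String state) { this.state = state; }
        String getState() { return state; }

        // Cloneable-based copying, as a Prototype might use; note the covariant return type.
        @Override
        public Registry clone() throws CloneNotSupportedException {
            return (Registry) super.clone();
        }
    }

    public class MechanismsDemo {
        public static void main(String[] args) throws Exception {
            Registry original = Registry.getInstance();
            original.setState("configured");

            // Serialization round trip using the built-in object streams.
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bytes);
            out.writeObject(original);
            out.close();
            // Deserialization yields a new instance; preserving a Singleton would require readResolve.
            Registry restored = (Registry) new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())).readObject();

            // Cloning via Object.clone().
            Registry copy = original.clone();
            System.out.println(restored.getState() + " / " + copy.getState());
        }
    }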

The comparative evaluation will provide short descriptions of the relevant features where deemed necessary.

5.4.  Summary
Below, we summarise the most important points related to the defined evaluation approach and its practical use in this thesis:

The evaluation tries to express the themes and concepts described by Gamma et al. as realistically as possible. The pattern implementations will be non-trivial, and all relate to a few common model classes to convey the sense of a stand-alone "application". This requires more effort on the part of the reader. On the other hand, we will strive to produce better and fully documented program code. The implementations in Java 6 will try to express "Best Practices" as described by Bloch [Bloch01].

The objective of the evaluation is to provide a subjective investigation, not a definitive conclusion, as this would go against the very idea of design patterns. The evaluation may help identify how the "Gang of Four" design patterns and Java 6 can cooperate by illustrating how a given set of features may be useful in the application of a pattern. Three categories of features will be examined: core language features, reflective capabilities, and special language mechanisms. Core language features include types, generics, closures, covariant return types, and varargs. Reflective capabilities include class literals, methods, dynamic proxies, and annotations. Special language mechanisms include synchronisation, serialization, and cloning.