1 Introduction

Automatic code generation is a common industrial need, since it helps avoid delays in project delivery. The challenge lies in converting a model to text and extracting the relevant properties of the model. UML is the de facto standard for modeling and designing software systems [1] in terms of both structure and behavior, and UML diagrams are accordingly classified into structural diagrams and behavioral diagrams. These diagrams are high-level abstractions of the system.

Structural diagrams focus on static aspects such as the entities of the system and their relations. Class diagrams, component diagrams, and deployment diagrams fall into the structural category. Behavioral diagrams depict the dynamic nature of the system. Sequence diagrams, use-case diagrams, collaboration diagrams, statechart diagrams, and activity diagrams fall into the behavioral category. A sequence diagram is an interaction diagram that describes the sequence of messages exchanged between objects. An activity diagram is one of the most significant behavioral diagrams, and it is the UML diagram that represents the control flow (workflow) of a business process. JC_Gen uses a sequence diagram to generate the skeletal code and then an activity diagram to incorporate the business logic into that skeletal code.

To obtain complete code, the activity diagram must be associated with the sequence diagram, since together they represent object interactions and their behavior. Activity and sequence diagrams are both behavioral models, but they represent two different perspectives of the same system. Combining these perspectives eases code generation in JC_Gen.

BoUML is a modeling tool that supports drawing large-scale models and runs on several platforms. In this work, the XMI schemas of the UML models are largely produced with this tool.

The rest of this article is organized as follows. Section 2 reviews related work on the various aspects of the proposed system. Section 3 details the methodology and algorithms of the proposed system. Section 4 presents a complete case study with step-by-step output. Finally, Section 5 concludes and outlines possible future extensions.

2 Review of related publications

The literature survey covers several dimensions: software modeling, model transformation, code generation, and XMI tools. This section consolidates the findings along these dimensions. Note that code generation is a subcategory of model transformation.

UML is a standardized approach to representing models in software engineering. It provides a set of graphical notations for creating visual models of object-oriented, software-intensive systems. A variety of applications that use models as their backbone have been published in recent years. A model-based aspect-oriented framework [2] has been proposed for building intrusion-aware software systems. The work in [3] proposes aspect-oriented modeling (AOM) for incorporating security mechanisms into an application. Kong [4] proposes a graph grammar to summarize the hierarchy of states. In this grammar, the execution of a set of non-conflicting state transitions is predicted by a sequence of graph transformations. A group of experiments [5] investigates whether the use of stereotypes improves the comprehension of UML sequence diagrams. Babenko [6] describes a concept of reusable information support. In [7], a methodology is presented in which the AOM technique is used to customize the primary model by integrating different business requirements. The system proposed here also uses UML models, in this case to produce code.

In the MDA approach, model transformation treats a model as the starting point and automatically generates other models or program code from it. The process model of the SDLC governs project management and determines the schedule, cost, time, and resources. Model transformation eases the design phase, saving effort and reducing errors by automatically building other models as needed. It also supports change management, since changes made in a single model can be propagated to the other models automatically. Transformation from a CIM (computation independent model, i.e., the business model) to a PIM (platform independent model, i.e., the behavioral model) is a mandatory step that converts the business view of a model into an information view [8]. This requires business-logic information to be present in the model representations. There are a variety of algorithms [9] and prototypes [10] for automatic model transformation. Model transformations are classified as unidirectional, bidirectional, declarative, imperative, and rule-based [11], and many references demonstrate how model transformation is achieved through different approaches. The article [12] describes a transformation from a class model to a relational model using model-driven engineering (MDE) principles. A generalized set of mapping guidelines from a CIM (high-level business model) to a PIM (low-level, platform independent behavioral model) is defined in [8]. A matching algorithm proposed in [13] converts a particular structure of a source model into a destination model. Models can be transformed with different cardinalities, such as one-to-many or many-to-one. The work in [10] suggests transforming models in the ATLAS model language into different modeling languages.

UML statechart diagrams are used to model a system's dynamic behavior. The authors of [14] describe an object-oriented approach for generating compact and efficient Java code from statechart diagrams. States are represented as objects, and all behaviors associated with the states are retained as another set of objects. The State design pattern is extended with object composition and delegation. JCode follows this approach to generate Java code after reading the specification of a UML statechart diagram.

In earlier research [15], the same authors described a methodology in which each state of the statechart has a class that encapsulates all transitions and actions of that state. This yields readable, compact, and efficient code without control statements such as if and case. In [16], they also generated executable, legible, efficient, and compact code from state diagrams in an object-oriented language such as Java. States are represented as objects, and this representation is extended to hierarchical states through object composition and delegation.
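The state-as-object idea from [14]-[16] can be illustrated with a minimal Java sketch. It assumes a two-state machine whose class names (Engine, EngineState, Off, Running) are hypothetical and not taken from those works. Each state object handles its own transitions, and the context class only delegates to it, so no if or case dispatch over states is needed.

```java
// Minimal sketch of the State pattern: each state is an object that owns its
// transitions, so Engine delegates events instead of switching on a state flag.
interface EngineState {
    EngineState start();   // returns the next state
    EngineState stop();
    String name();
}

final class Off implements EngineState {
    public EngineState start() { return new Running(); }
    public EngineState stop()  { return this; }          // already off
    public String name()       { return "Off"; }
}

final class Running implements EngineState {
    public EngineState start() { return this; }          // already running
    public EngineState stop()  { return new Off(); }
    public String name()       { return "Running"; }
}

// The context object composes the current state and delegates every event to it.
final class Engine {
    private EngineState state = new Off();
    void start()   { state = state.start(); }
    void stop()    { state = state.stop(); }
    String state() { return state.name(); }
}
```

Adding a state means adding a class rather than editing a switch statement. This is the main benefit of the pattern.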

The gap between modeling languages and high-level programming languages is an obstacle to producing satisfactory solutions. The automation tool proposed in [17] addresses this issue by mapping UML notations to Java. It can directly generate high-level Java code from multiple UML statecharts. The work also suggests a requirements engineering process that composes UML scenarios to obtain a comprehensive description of a given service system. The derived services are then transformed into source code. Four operators are suggested for composing a set of scenarios that describe the use cases of a given system: sequential, competing, conditional, and iteration.

The study in [18] investigates the viability of automatically generating code from current system designs. It experiments with several approaches, both short-term and futuristic.

Usman and Nadeem [19] extended their work with a tool called UJECTOR for the automatic generation of executable Java code from UML diagrams. The tool takes three UML diagrams as input, namely a class diagram, a sequence diagram, and an activity diagram, and automatically generates fully executable Java code.

Parada et al. [20] presented work that automatically generates structural and behavioral code from UML class and sequence diagrams. In [21], Engels et al. concentrated on collaboration diagrams. Their main objective is to automatically generate Java code fragments that build a substantial part of the system's functionality, while avoiding the loss of important information during transformation.

A comparative study [22] examines the code generated by Rhapsody and by OPCAT, a CASE tool for the object-process methodology (OPM). It concludes that UML's consistency problem and its distributed representation of system behavior are reflected in the generated code. OPM models capture the static and dynamic aspects of a system in a single view. As a result, they enable the generation of potentially complete application logic rather than just skeleton code. The study also explains the architecture and functionality of OPCAT's OPM-GCG (Generic Code Generator).

Research in this area has mainly focused on bridging the gap between software design and implementation. As described in [23], system-based components are used in the software architecture at the modeling/design level. Coordination-paradigm components are used at the implementation level.

Singh [24] proposed using a UML class diagram to generate an XML (Extensible Markup Language) schema, which is then used for code generation. JiBX (Binding XML to Java code) is a Java-based open-source tool used for the code generation, developed by IBM.

The Gene-Auto ITEA European project [25] aims at building a qualified C code generator from mathematical models built in MATLAB-Simulink and Scilab-Scicos. The first version of the Gene-Auto code generator has been released. It has entered a validation phase on real-life case studies defined by each project partner.

Automating model generation improves reusability in the software development process. Such tools support designing a pictorial model of the information and exporting it as a schema. Some of them also assist in converting the system's input and output into the desired formats. ArgoUML is an object-oriented CASE tool that facilitates the creation of source models. It offers multiple functionalities for UML model generation and exports the XMI of a model [26].

A new approach to automatic code generation is proposed in [27]. It uses the Rete algorithm to achieve rapid code generation with a uniform coding style.

BoUML supports large-scale models, makes models easy to draw, and runs on several platforms. The XML schema of UML models can largely be produced with it [28]. Such exported XML schemas have been taken as the source for developing applications in Java [29].

Decisions about the functionality and structure of a software system are critical and are made in the design phase. The design of the XML schemas used in development follows from those decisions. The authors of [30] proposed a transformation algorithm that converts a UML profile into an XML schema. This model-driven approach frees designers from low-level implementation issues through fully automatic mechanisms that transform UML models into XML schemas.

The work in [31] addresses a learning system that analyzes metric values from an existing system. It uses principal feature analysis to assess complexity, cohesion, and coupling.

PlantUML is a flowchart and UML model generator that takes rendered text as input and generates specific models according to the given text annotations [31]. Rendering is the process of combining all the elements and their relationships into a standard textual format, so that the tool can regenerate images from text [32].

Graphviz, described in [33], is an open-source tool for generating graphical representations. In [34], design patterns and the dot tool of the Graphviz package are used for graphical representation. An Object Relation Diagram (ORD) is a directed graph whose nodes are classes and whose edges represent the relationships between classes. An advantage of this approach is that it reduces the cost of stub creation.
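An ORD of this kind can be held as an adjacency map and rendered by Graphviz's dot tool. The following Java sketch is only an illustration and is not the implementation of [34]. The class name Ord and the relationship labels are hypothetical. The sketch emits DOT text that could be passed to a command such as `dot -Tpng`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Object Relation Diagram: nodes are classes, directed edges are relationships.
final class Ord {
    // A directed edge to another class, labelled with the relationship kind.
    record Edge(String to, String kind) {}

    private final Map<String, List<Edge>> adjacency = new LinkedHashMap<>();

    void addClass(String name) { adjacency.putIfAbsent(name, new ArrayList<>()); }

    void relate(String from, String to, String kind) {
        addClass(from);
        addClass(to);
        adjacency.get(from).add(new Edge(to, kind));
    }

    // Renders the graph in Graphviz DOT syntax.
    String toDot() {
        StringBuilder sb = new StringBuilder("digraph ORD {\n");
        for (String node : adjacency.keySet()) {
            sb.append("  \"").append(node).append("\";\n");
        }
        for (Map.Entry<String, List<Edge>> e : adjacency.entrySet()) {
            for (Edge edge : e.getValue()) {
                sb.append("  \"").append(e.getKey()).append("\" -> \"").append(edge.to())
                  .append("\" [label=\"").append(edge.kind()).append("\"];\n");
            }
        }
        return sb.append("}\n").toString();
    }
}
```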

3 Java-specific code generator (JC_Gen)

The proposed tool JC_Gen uses sequence diagrams and activity diagrams to generate code. Its parsers segment data from the XMI into categories such as action, activity, model, state, and transition. Java APIs such as DOM, SAX, dom4j, and XOM are used to check and validate the XML against a DTD or XML Schema. A DTD (document type definition) defines the structure, legal elements, and attributes of an XML document.
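The sketch below illustrates segmenting XMI data into categories. It uses the standard JAXP DOM API to group element names by their xmi:type attribute. XmiCategorizer is a hypothetical helper and is not part of JC_Gen as published. The sketch assumes the UML 2 XMI convention of typing elements with xmi:type. The factory is left non-namespace-aware, so the attribute can be read by its literal qualified name.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Groups XMI element names by xmi:type (e.g. "uml:Class", "uml:Property").
final class XmiCategorizer {
    static Map<String, List<String>> categorize(String xmi) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance(); // not namespace-aware
            Document doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(xmi.getBytes(StandardCharsets.UTF_8)));
            Map<String, List<String>> byType = new LinkedHashMap<>();
            NodeList all = doc.getElementsByTagName("*");          // every element, document order
            for (int i = 0; i < all.getLength(); i++) {
                Element el = (Element) all.item(i);
                String type = el.getAttribute("xmi:type");          // "" when absent
                if (!type.isEmpty()) {
                    byType.computeIfAbsent(type, k -> new ArrayList<>()).add(el.getAttribute("name"));
                }
            }
            return byType;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Cannot parse XMI", e);
        }
    }
}
```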

Kraft et al. [35] compare various parsers in terms of parsing time and throughput. The comparison covers stream-based parser APIs such as SAX (Simple API for XML), StAX, and XMLPull. It also covers tree-based APIs such as the document object model (DOM), JDOM, ElectricXML, and DOM4j. A further detailed discussion of the Java APIs for XML parsing is given in [36].
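A stream-based StAX reader (javax.xml.stream) contrasts with the tree-based DOM style. It visits tags one event at a time without building a document tree. The sketch below counts start tags by local name in a single pass. TagCounter is a hypothetical helper and is not a benchmark from [35].

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class TagCounter {
    // Single pass over the document: only START_ELEMENT events are inspected,
    // and no tree is kept in memory.
    static Map<String, Integer> countStartTags(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            Map<String, Integer> counts = new TreeMap<>();
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    counts.merge(reader.getLocalName(), 1, Integer::sum);
                }
            }
            reader.close();
            return counts;
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
    }
}
```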

A sequence diagram shows object interactions arranged in time order. It depicts the objects involved in a scenario and the sequence of messages exchanged between them to carry out the functionality of the scenario.

In the Unified Modeling Language (UML), an activity diagram captures a major task of object-oriented development. It defines an activity as a sequence of actions that make up a process.

The architecture of JC_Gen, shown in Fig. 1, outlines the flow of processes performed to obtain the Java code. Together, the two models provide the main information needed for coding. The sequence diagram identifies the classes, data members, calling functions, and called functions. The activity model fills in the logic of each declared function. These models are therefore well suited to automatic code generation.

Fig. 1 Architecture of JC_Gen

The following steps are involved in code generation.

  1. XMI generation
  2. SD parsing (sequence diagram parsing)
  3. AD parsing (activity diagram parsing)
  4. Structural code generation
  5. Behavioral code generation
     a. Mapping between sequence and class artifacts
     b. Activity interpretation

The XMI schemas of the sequence diagram and the activity diagram form the inputs to the respective parsers. The SD parser retrieves structural information, such as class definitions and member declarations, from the SD XMI schema. The structural code generator uses this output to produce the partial code of the software. The AD parser retrieves the logic of each member as an activity made up of actions and flows. The behavioral code generator takes the structural code and the elements extracted from the activity diagram. It then incorporates the business logic into the generated code.

3.1 XMI generation

The XMI schema generated by the modeling tool represents the model elements and their behavior as tags and tag attributes. These elements are organized according to their structure and actions. The sequence diagram XMI is extracted to gather the structural information of the code. The activity XMI describes each activity and its sub-actions in depth and is used to derive the behavioral aspects of the code. Dependency issues may arise from XMI versioning, since the schema is updated between versions. So far, the experiments have been carried out with carefully chosen compatible schemas. An alert message is issued when the developed system does not support the XMI version.
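One way to raise such an alert is sketched below. It assumes that the exporter writes an xmi:version attribute on the root element, as XMI 2.1 files do. The supported-version set and the message strings are hypothetical placeholders. The regular expression scans only the raw text of the root tag rather than parsing the whole document.

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class XmiVersionCheck {
    // Hypothetical set of versions the generator has been tested against.
    static final Set<String> SUPPORTED = Set.of("2.1");

    // Looks for xmi:version="..." in the raw text of the root tag.
    private static final Pattern VERSION = Pattern.compile("xmi:version\\s*=\\s*\"([^\"]+)\"");

    static String check(String rootTag) {
        Matcher m = VERSION.matcher(rootTag);
        if (!m.find()) {
            return "ALERT: no xmi:version attribute found";
        }
        String version = m.group(1);
        return SUPPORTED.contains(version)
                ? "OK: XMI " + version
                : "ALERT: XMI version " + version + " is not supported";
    }
}
```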

In the sequence diagram, each lifeline represents an object, and its interactions are ordered in time. Messages are the means of communication between objects, and the messages passed between them determine the flow of the program. Each message reflects either the invocation of a method or the sending and receiving of a signal. A message can therefore be a synchronous call, an asynchronous call, or an asynchronous signal [37]. Each message specification has a sender and a receiver: the sender calls an operation, or passes a signal, that resides on the receiver side. In a sequence diagram, each message can be considered part of the sequence flow from one object to another along the timeline.
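The message information described above can be held in a small data structure. In this sketch, the record and enum names are hypothetical. The enum covers only the three message kinds named in the text. An explicit order field stands in for the vertical position of a message on the timeline.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Message kinds named in the text; UML defines further sorts (e.g. reply).
enum MessageSort { SYNCH_CALL, ASYNCH_CALL, ASYNCH_SIGNAL }

// One message: the sender invokes an operation (or sends a signal) that
// resides on the receiver; 'order' is its position on the timeline.
record Message(int order, String sender, String receiver, String operation, MessageSort sort) {}

final class Timeline {
    // Returns a copy of the messages sorted into time order (the sequence flow).
    static List<Message> inOrder(List<Message> messages) {
        List<Message> sorted = new ArrayList<>(messages);
        sorted.sort(Comparator.comparingInt(Message::order));
        return sorted;
    }
}
```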

The activity diagram describes the procedure of each method through its actions and sub-actions. The start and stop nodes of an activity delimit the scope of the method it specifies. Each action state holds pseudo-code for the procedure rather than Java code. These actions are mapped to the sequence diagram messages to produce the method definitions of the corresponding members.

3.2 SD parser

The SD parser extracts the necessary elements from the XMI document and segregates them by type, such as class, property, operation, and message. The relationships between the elements are retained for code generation. A parenthesis balancing algorithm is used to extract the elements and their relationships efficiently [38].

XMI tags are the foundation of XMI and define the scope of each element. They can also carry comments, settings required by the parsing environment, and special instructions. XMI tags form a tree structure. Each tag may contain relevant sub-tags and stores the properties of its element as attributes. For example, the tag of a class element encloses the sub-tags of its properties and operations.

Storing the extracted elements, together with their members and relationships, in a tree structure was found to be efficient. The output of the SD stage 1 parser is therefore a tree whose nodes represent the XMI tags. Each node retains its name, type, and xmi:id. The parser handles only open and close tags. The algorithm used to create the class structure with its data members and member functions is given in Algorithm 1.

Algorithm 1 SD stage 1 parser

The SD stage 1 parser generates the tree structure using a parenthesis balancing algorithm. This algorithm is adapted in this work to identify each matching pair of open and close tags in the XMI schema. From these pairs, the stage 1 SD parser forms the tree.

The tree determines the hierarchy of the classes and their data members, as shown in Fig. 2. The algorithm ignores combined tags, which describe collaborations and model details rather than the core components. The SD stage 2 parser then uses this tree, together with the XMI schema, to produce mtd_Array and var_Array.
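The parenthesis balancing idea behind the stage 1 parser can be sketched with a stack. An open tag pushes a new node under the current top of the stack. A close tag must match the top before it is popped. This is a simplified illustration and not Algorithm 1 itself. It scans tags with regular expressions instead of a full XML parser. It also skips self-closing tags, following the statement that only open and close tags are parsed. The names TagTree and Node are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stack-based ("parenthesis balancing") tree builder over XMI tags.
final class TagTree {
    static final class Node {
        final String tag, type, id;                 // tag name, xmi:type, xmi:id
        final List<Node> children = new ArrayList<>();
        Node(String tag, String type, String id) { this.tag = tag; this.type = type; this.id = id; }
    }

    // Group 1 is "/" for close tags; group 4 is "/" for self-closing tags.
    private static final Pattern TAG = Pattern.compile("<(/?)([\\w:.-]+)([^>]*?)(/?)>");
    private static final Pattern ATTR = Pattern.compile("([\\w:.-]+)\\s*=\\s*\"([^\"]*)\"");

    static Node build(String xmi) {
        Node root = new Node("#root", "", "");
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        Matcher m = TAG.matcher(xmi);               // <?xml ...?> and comments never match
        while (m.find()) {
            String name = m.group(2);
            if (!m.group(4).isEmpty()) continue;    // self-closing tag: skipped
            if (!m.group(1).isEmpty()) {            // close tag must balance the top open tag
                if (stack.size() == 1 || !stack.peek().tag.equals(name)) {
                    throw new IllegalArgumentException("Unbalanced tag: </" + name + ">");
                }
                stack.pop();
            } else {                                // open tag: new child of the current top
                String type = "", id = "";
                Matcher a = ATTR.matcher(m.group(3));
                while (a.find()) {
                    if (a.group(1).equals("xmi:type")) type = a.group(2);
                    if (a.group(1).equals("xmi:id")) id = a.group(2);
                }
                Node node = new Node(name, type, id);
                stack.peek().children.add(node);
                stack.push(node);
            }
        }
        if (stack.size() != 1) throw new IllegalArgumentException("Unclosed tag: <" + stack.peek().tag + ">");
        return root;
    }
}
```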

Fig. 2 Tree structure generated by SD stage 1 parser

The SD stage 2 parser uses Algorithm 2 to obtain the data type and initial value of each data member, and the prototype of each method with its return type. The algorithm takes the SD stage 1 tree and the XMI document as input. It processes the “operation” and “property” nodes of tree ‘T’ in turn to find these types. Its output is stored in sequence structures for further processing. var_Array holds the data members with their data types, and mtd_Array holds the member functions with their return types.

Algorithm 2 SD stage 2 parser
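A possible shape for the stage 2 traversal is sketched below. The node record TNode and the attribute keys "returnType", "type", and "default" are hypothetical simplifications. In standard UML XMI, the return type is usually carried by an owned parameter with direction="return". The initial value is carried by a defaultValue element. A real parser would have to resolve both.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical tree node: xmi:type, name, simplified attributes, children.
record TNode(String type, String name, Map<String, String> attrs, List<TNode> children) {}

// Hypothetical entries of mtd_Array and var_Array.
record MethodEntry(String owner, String name, String returnType) {}
record VarEntry(String owner, String name, String dataType, String initialValue) {}

final class SdStage2 {
    final List<MethodEntry> mtdArray = new ArrayList<>();
    final List<VarEntry> varArray = new ArrayList<>();

    // Depth-first walk of T; 'owner' is the nearest enclosing class name.
    // The defaults "void" and "Object" are assumptions for missing types.
    void visit(TNode node, String owner) {
        switch (node.type()) {
            case "uml:Class" -> owner = node.name();
            case "uml:Operation" -> mtdArray.add(new MethodEntry(owner, node.name(),
                    node.attrs().getOrDefault("returnType", "void")));
            case "uml:Property" -> varArray.add(new VarEntry(owner, node.name(),
                    node.attrs().getOrDefault("type", "Object"),
                    node.attrs().get("default")));  // null when no initial value
            default -> { }                          // other node types are skipped
        }
        for (TNode child : node.children()) {
            visit(child, owner);
        }
    }
}
```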

The SD stage 3 parser uses Algorithm 3 to extract the sequence of object flow among the classes. Algorithm 3 takes the same inputs as the stage 2 parser. Its output is used to frame the function calls inside the main function, according to the source sequence model.

Algorithm 3 SD stage 3 parser
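The sketch below shows one way to turn objFlow_array entries into the statements of a generated main function. It assumes that method calls happen only in main, and that main is placed in the class that sends the first message. The record ObjFlow and the lower-camel-case variable naming are hypothetical choices.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// One objFlow_array entry: calling class, owning class, called operation.
record ObjFlow(String callerClass, String ownerClass, String operation) {}

final class SdStage3 {
    // main() is placed in the class that sends the first message.
    static String mainClass(List<ObjFlow> flows) {
        return flows.get(0).callerClass();
    }

    // Body of main(): one object per owning class (in first-use order),
    // then one call per message, in message order.
    static List<String> mainBody(List<ObjFlow> flows) {
        Set<String> owners = new LinkedHashSet<>();
        for (ObjFlow f : flows) owners.add(f.ownerClass());
        List<String> lines = new ArrayList<>();
        for (String cls : owners) {
            lines.add(cls + " " + varName(cls) + " = new " + cls + "();");
        }
        for (ObjFlow f : flows) {
            lines.add(varName(f.ownerClass()) + "." + f.operation() + "();");
        }
        return lines;
    }

    private static String varName(String cls) {
        return Character.toLowerCase(cls.charAt(0)) + cls.substring(1);
    }
}
```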

The three stages of the SD parser generate the data structures T, mtd_Array, var_Array, and objFlow_array. Together, these hold the structural information needed to generate the code.

3.3 AD parser

The AD parser extracts the behavioral aspects of the software under development. It takes as input the XMI schema of the activity model that corresponds to the sequence model. Multiple disconnected activity models are used to represent the tasks performed inside the methods.

Each activity is separated from the others by its start and stop nodes. The nodes in between depict the logic of the operation. The output of the AD parser is stored in an array in which each element is the head of a singly linked list. Each list represents the execution sequence of one operation. These definitions are later incorporated at the appropriate places in the code generated from the SD parser output. Algorithm 4 represents the AD parser. Each element of mtd_def_Array, shown in Fig. 6, represents the logic of one method in execution order.

Algorithm 4 AD parser
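The array-of-linked-lists structure described above can be sketched as follows. The node kinds and type names are hypothetical simplifications of the XMI content. A START node carries the method name and opens a list. ACTION nodes append steps. A STOP node closes the list and records its head.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified node kinds of an activity model.
enum ActivityKind { START, ACTION, STOP }

record ActivityNode(ActivityKind kind, String text) {}

// Cell of a singly linked list holding one step of a method's execution sequence.
final class Step {
    final String text;
    Step next;
    Step(String text) { this.text = text; }
}

final class AdParser {
    // Splits a stream of nodes into one linked list per activity and returns
    // the list heads in order (a sketch of mtd_def_Array). Nodes outside a
    // START...STOP pair are ignored; a START inside an open activity restarts it.
    static Step[] parse(List<ActivityNode> nodes) {
        List<Step> heads = new ArrayList<>();
        Step head = null;
        Step tail = null;
        for (ActivityNode n : nodes) {
            Step s = new Step(n.text());
            if (n.kind() == ActivityKind.START) {
                head = s;                         // the head carries the method name
                tail = s;
            } else if (head != null) {
                tail.next = s;
                tail = s;
                if (n.kind() == ActivityKind.STOP) {
                    heads.add(head);              // activity complete
                    head = null;
                    tail = null;
                }
            }
        }
        return heads.toArray(new Step[0]);
    }
}
```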

3.3.1 Structural code generator

The outputs of the three SD parser stages are the input to the structural code generator. The SD stage 1 parser defines the classes and their data members. Its tree structure ‘T’ records the relations between the different types of elements. The SD stage 2 parser identifies the data types and return types of members, and the SD stage 3 parser identifies the function calls. Code generation is then a direct mapping of the parser-extracted elements into code statements. Each element of ‘T’ is placed in the code structure according to its position and type. For example, each class-type element becomes a class, and its children in ‘T’ become its members. These members are identified as data members or member functions by their type. mtd_Array then supplies the return types, and var_Array supplies the data types and initial values.

objFlow_array indicates the class in which each function is called and the class to which the function belongs. Using this data, the function-calling statements are added to the code structure. The template code generator [39] uses the information supplied by the parsers to produce the Java code. The structural code generator produces each class definition, including variable and method declarations. It also produces the function calls that represent the object flow from one class to another. Combining the array with the output of the structural code generator yields compilable code for the given input models.
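A template step of this kind might look like the following minimal sketch. The hypothetical records FieldSpec and MethodSpec stand in for var_Array and mtd_Array entries. The sketch emits a class skeleton whose method bodies hold only a placeholder comment, to be replaced by the behavioral code generator. A non-void stub still needs a return statement before it compiles.

```java
import java.util.List;

// Hypothetical stand-ins for var_Array and mtd_Array entries.
record FieldSpec(String type, String name, String initialValue) {}
record MethodSpec(String returnType, String name) {}

final class StructuralTemplate {
    // Renders one class skeleton: fields (with initial values when known),
    // then method stubs whose bodies are filled in later.
    static String render(String className, List<FieldSpec> fields, List<MethodSpec> methods) {
        StringBuilder sb = new StringBuilder();
        sb.append("public class ").append(className).append(" {\n");
        for (FieldSpec f : fields) {
            sb.append("    ").append(f.type()).append(' ').append(f.name());
            if (f.initialValue() != null) {
                sb.append(" = ").append(f.initialValue());
            }
            sb.append(";\n");
        }
        for (MethodSpec m : methods) {
            sb.append("    public ").append(m.returnType()).append(' ').append(m.name())
              .append("() {\n")
              .append("        // body supplied by the behavioral code generator\n")
              .append("    }\n");
        }
        return sb.append("}\n").toString();
    }
}
```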

3.3.2 Behavioral code generator

This process updates the generated code with the method definitions. A method definition is the procedure by which a task is accomplished in the software. This behavior is extracted from the activity diagram by the AD parser and stored in mtd_def_Array (Fig. 6). Each element stores one method's definition as a linked list. The following subtasks are performed while the definitions are incorporated.

3.3.2.1 Mapping between sequence and class artifacts

This mapping identifies the appropriate place to incorporate each method definition. Each function name in the generated code is matched against the element names of mtd_def_Array (Fig. 6). If a match is found, the entire linked list with that element name is inserted inside the block below the matched node of the sequence tree ‘T’. The resulting method definitions are pseudo-code, which the activity interpreter then converts.
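The name matching can be sketched as a map lookup. The sketch assumes that mtd_def_Array is keyed by method name and that each linked list is flattened to a list of pseudo-statements. DefinitionMapper and Mapping are hypothetical names. Declarations without a match are collected separately; the assumptions listed below treat these as abstract methods.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DefinitionMapper {
    // bodies: method name -> pseudo-code body; abstractMethods: declarations
    // for which no activity with the same name was found.
    record Mapping(Map<String, List<String>> bodies, List<String> abstractMethods) {}

    static Mapping map(List<String> declaredMethods, Map<String, List<String>> mtdDef) {
        Map<String, List<String>> bodies = new LinkedHashMap<>();
        List<String> abstractMethods = new ArrayList<>();
        for (String name : declaredMethods) {
            List<String> body = mtdDef.get(name);      // match on the element name
            if (body != null) {
                bodies.put(name, body);
            } else {
                abstractMethods.add(name);
            }
        }
        return new Mapping(bodies, abstractMethods);
    }
}
```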

Activity interpreter. The activity interpreter converts the pseudo-statements of each method definition into Java code. It replaces the start and final statements with ‘{’ and ‘}’, which determines the scope of the method. The leading word of each pseudo-statement is treated as a keyword, and the statement is rewritten according to that keyword. For example, “add a, b” in the pseudo-code is replaced by “a = a + b;”, whereas “calculate c = a + b;” is replaced by “c = a + b;”. The output of this process is pure Java code.
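The keyword-driven replacement can be sketched with regular expressions. Only the two rules quoted above and the scope markers are implemented. The exact keyword spellings "start" and "final", and the error raised for unknown statements, are assumptions of this sketch and not documented behavior of JC_Gen.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Keyword-driven translation of pseudo-statements into Java statements.
final class ActivityInterpreter {
    // "add x, y"  ->  "x = x + y;"
    private static final Pattern ADD = Pattern.compile("add\\s+(\\w+)\\s*,\\s*(\\w+)");
    // "calculate <expr>[;]"  ->  "<expr>;"
    private static final Pattern CALCULATE = Pattern.compile("calculate\\s+(.+?);?");

    static String translate(String pseudo) {
        String s = pseudo.trim();
        if (s.equals("start")) return "{";   // start node opens the method scope
        if (s.equals("final")) return "}";   // final node closes it
        Matcher m = ADD.matcher(s);
        if (m.matches()) {
            return m.group(1) + " = " + m.group(1) + " + " + m.group(2) + ";";
        }
        m = CALCULATE.matcher(s);
        if (m.matches()) {
            return m.group(1).trim() + ";";
        }
        throw new IllegalArgumentException("No rule for pseudo-statement: " + pseudo);
    }

    static List<String> translateAll(List<String> pseudo) {
        List<String> out = new ArrayList<>();
        for (String p : pseudo) out.add(translate(p));
        return out;
    }
}
```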

Assumptions

The following assumptions are applied once code generation is complete.

  • If a method declaration does not find a match in mtd_def_Array, it is treated as an abstract method.

  • If a class has one or more abstract functions, then the class is also considered abstract.

  • If no method definitions are found in a class, then it is an interface.

  • Methods are called only from the main method, unless a call is explicitly specified in a method definition.
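The three class-level assumptions can be expressed as a small classification function. This is a hypothetical sketch. It follows the listed rules literally, so a class with no method definitions at all is classified as an interface. The header helper shows how the classification could select the generated declaration.

```java
import java.util.List;
import java.util.Set;

enum ClassKind { CONCRETE, ABSTRACT, INTERFACE }

final class ClassKindRule {
    // declared: method names declared in the class; defined: names that found
    // a definition in mtd_def_Array.
    static ClassKind classify(List<String> declared, Set<String> defined) {
        long definedCount = declared.stream().filter(defined::contains).count();
        if (definedCount == 0) return ClassKind.INTERFACE;              // no definitions at all
        if (definedCount < declared.size()) return ClassKind.ABSTRACT;  // some methods undefined
        return ClassKind.CONCRETE;
    }

    static String header(String className, ClassKind kind) {
        return switch (kind) {
            case INTERFACE -> "public interface " + className;
            case ABSTRACT -> "public abstract class " + className;
            case CONCRETE -> "public class " + className;
        };
    }
}
```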

4 Results and discussion

A car-driving case study is modeled as a sequence diagram together with a set of activity diagrams. The car, driver, and engine are identified as the call actions, and the flow between these actions is captured by the corresponding objects. The input models are drawn with BoUML, and the XMI schema of each source model is exported.

For the sequence diagram in Fig. 3, the required object types (i.e., the corresponding classes) and their members are retrieved. Fig. 4 shows a partial tree of the source model that represents the call action “car” of the sequence model. The XMI schema exported from the modeling tool is processed by the successive parser stages. It contains various tags with multiple attributes from which the diagram can be reproduced. The SD stage 1 parser extracts the elements from the XMI schema and builds a tree using the parenthesis balancing algorithm. This tree is referred to as ‘T’ in the subsequent processing. Each of its nodes stores its type, its id, and links to its children. The SD stage 1 parser retrieves node types such as class, operation, and property. It also determines the hierarchy of the elements, as shown in Table 1. At this stage, however, the members still lack their return types and data types. The driving model consists of three classes: driver, car, and engine.

Fig. 3 Driving sequence diagram as source model

Fig. 4 ‘T’ structure of Car class and members

Table 1 Mtd_Array of driving model

To retrieve the missing information, the SD stage 2 parser takes ‘T’ as input and traverses the entire tree. If a node's type is “operation,” the return type of the function is identified and stored in mtd_Array. If the node's type is “property,” the data type and initial value of the member are identified and stored in var_Array. The two arrays produced by the SD stage 2 parser for the source sequence model are shown in Tables 1 and 2.

Table 2 Var_Array of driving model

The next parser stage finds the object flow from one class to another using the message flow. The SD stage 3 parser determines where each function is called and creates the objects used to call the functions. The function calls of each class are identified and inserted into the main function. Objects of the identified owner classes are created to invoke them. Even when there is more than one class, only one main function is created, and it is placed in the initial class of the message flow in the model. The output of the SD stage 3 parser is stored in objFlow_Array, as shown in Table 3.

Table 3 Objflow_Array in driving model

After the SD parser stages, the structural code generator converts the data stored in mtd_Array, var_Array, and objFlow_array into code. Each class is stored in a separate file named after the class. Table 4 shows the code obtained at each SD parser stage.

Table 4 Code obtainment from SD stage parsers

The behavioral aspects of each operation are represented by a second input model, the activity diagram shown in Fig. 5. The activity diagram represents the procedure of each method as a set of states. The activities of each method are grouped under a distinct name. The AD parser extracts the elements from the XMI schema and stores each activity in a separate list. The head of each list is stored in an array.

Fig. 5 Driving activity model as source model

Fig. 6 shows the array for the methods of the car class. Each partition of the activity model is matched against the mtd_Array produced by the SD stage 2 parser. When a match occurs, the activity list is inserted as the definition of the member function.

Fig. 6 Mtd_def_Array with an activity list of Car class

After the mapping, the behavioral code is still pseudo-code. The activity interpreter therefore converts the pseudo-code into Java code, turning the model into a platform-specific form. A sample is shown in Table 5. This reduces the designer's burden in code generation.

Table 5 Sample interpreter table

Finally, complete Java code representing both the structural and behavioral aspects of the models is produced. In this case study, three classes are generated from the source models: driver, car, and engine. Table 6 shows the complete, compilable Java code generated from the input models. Each class is stored in a separate file, and Table 6 also shows the output of the main class. A similar code generation effort using different diagrams was reported earlier [40].

Table 6 The output of the AD parser

LPMT [13] is a logical prediction model transformation system that takes an activity diagram as the source and produces class, use-case, and sequence diagrams. It extracts information from an XMI schema and applies rules to transform the source model into the destination models.

Table 7 compares JC_Gen with the LPMT model proposed in [13]. Since both are forms of model transformation, the table relates their features. Software metrics [41] do not reveal how well code is generated from the models. However, the code generation approach has been shown to reduce code generation time, with only small further contributions expected from the developer.

Table 7 Comparison between JC_Gen and LPMT

5 Conclusion

This initiative is a stepping stone toward developing Java applications from models. The procedures followed here can also be extended to develop GUI programs in Java without knowledge of the syntax. The final code produced by this work is 95% compilable. The work reduces the gap between design models and code and strengthens the connection between the phases of the SDLC. Compared with flowchart interpreters such as Raptor, Flowgorithm, and VisiRule, it supports large-scale models. It also concentrates more on business aspects than CASE tools such as StarUML, BoUML, and ArgoUML. Addressing the behavioral aspects of the software through activity and sequence diagrams is an added advantage of this system. It can produce code containing business logic that, when compiled, yields production-ready executables.

In future work, the system can be extended to other UML models and to higher-level concepts such as dynamic method dispatch, reusable design patterns, and the inclusion of library functions.