Data Design for Flexible Workflow Processing

Problem Statement

How can we entice the developers of applications involved in a workflow process to implement them in a flexible fashion?

In particular, how can we design the data that flows between the workflow applications in such a way that developers must implement the applications in a flexible fashion?

Example - Flight Scheduling

An airline schedules future flights by assigning the following data:

  1. Route - the time and waypoints of the flight
  2. Plane - model and tail number of the plane that will be used for the flight
  3. Crew - the pilot, navigator, stewards and stewardesses who will staff the flight

The airline has three corresponding applications for computing the data:

  1. Route scheduler
  2. Plane scheduler
  3. Crew scheduler

In the past, business requirements dictated that flight scheduling occur in a certain order: the route was scheduled first, then the plane was assigned, and lastly the plane was staffed with crew:

Figure: The traditional workflow schedules the route first, then the plane, then the crew.

An XML document would flow from one application to the next. Each application was responsible for "filling in" its part of the flight data. For example, the crew scheduler would examine the XML document it received from the plane scheduler to see which plane had been assigned and, given that information, would query its database to determine which crew members have experience with that airplane.

Data Design - Old

To support the business requirements, the data is structured in a nested fashion. Here's how the XML document looks after all three applications have inserted their data:

<?xml version="1.0" encoding="UTF-8"?>
<Flight_Schedule>
    <Route>
        <!-- data about the route that will be flown -->
        <Plane>
            <!-- data about the plane that will fly the route -->
            <Crew>
                <!-- data about the crew that will staff the plane -->
            </Crew>
        </Plane>
    </Route>
</Flight_Schedule>

The crew data is nested inside the plane data, which is nested inside the route data.

When the plane scheduler receives the XML document, it fills in its data inside the <Route> element. Likewise, when the crew scheduler receives the XML document, it fills in its data inside the <Plane> element.
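To make the old workflow concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The three scheduler functions and the child elements <Waypoints>, <Model>, and <Pilot> (with their placeholder values) are hypothetical; the point is only that each application nests its data inside the element written by the application before it.

```python
import xml.etree.ElementTree as ET

# Hypothetical schedulers for the nested design. Each one "fills in" its
# part of <Flight_Schedule> inside the element written by its predecessor.

def route_scheduler(doc: ET.Element) -> None:
    # <Route> is the outermost block, so it goes directly under the root.
    route = ET.SubElement(doc, "Route")
    ET.SubElement(route, "Waypoints").text = "JFK LAX"  # placeholder data

def plane_scheduler(doc: ET.Element) -> None:
    # <Plane> must be nested inside <Route>, so the route must exist already.
    route = doc.find("Route")
    plane = ET.SubElement(route, "Plane")
    ET.SubElement(plane, "Model").text = "B737"  # placeholder data

def crew_scheduler(doc: ET.Element) -> None:
    # <Crew> must be nested inside <Plane>, so route and plane must exist.
    plane = doc.find("Route/Plane")
    crew = ET.SubElement(plane, "Crew")
    ET.SubElement(crew, "Pilot").text = "A. Smith"  # placeholder data

# The old workflow: a fixed sequence of applications.
doc = ET.Element("Flight_Schedule")
for scheduler in (route_scheduler, plane_scheduler, crew_scheduler):
    scheduler(doc)

print(ET.tostring(doc, encoding="unicode"))
```

Running the functions in this fixed order yields the nested Route/Plane/Crew structure shown above; note that each function locates its parent by path, which already assumes the earlier applications have run.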

Disadvantage of a Nested Data Design: Doesn't Support Decision-Making with Partial Information

Suppose that as the XML document is flowing through the applications we intercept it at some arbitrary point. Given the way the data is designed, we can be certain that:

  1. If the document contains crew data, then it also contains plane data and route data.
  2. If the document contains plane data, then it also contains route data.

Thus, the nested data design imposes dependencies on the data and on the order in which the data is filled in: the crew data is filled in after the plane data, which is filled in after the route data.

The data design is imposing an order on application processing!

Data dependencies can result in application dependencies!
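The following sketch shows how the data design dictates processing order. It assumes a hypothetical crew_scheduler written against the nested design: when it runs before the plane scheduler, there is simply no <Plane> element to put the crew data into, so the application cannot proceed.

```python
import xml.etree.ElementTree as ET

def crew_scheduler(doc: ET.Element) -> None:
    """Hypothetical crew scheduler written against the nested design."""
    plane = doc.find("Route/Plane")
    if plane is None:
        # The nested design gives crew data no legal home until both
        # <Route> and <Plane> exist, so this application has to wait.
        raise ValueError("cannot add <Crew>: no <Route>/<Plane> yet")
    ET.SubElement(plane, "Crew")

# Run the crew scheduler before the plane scheduler has done its work.
doc = ET.fromstring("<Flight_Schedule><Route/></Flight_Schedule>")
try:
    crew_scheduler(doc)
except ValueError as err:
    print(err)
```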

If I am the application developer for the crew scheduler, then I may assume that the input document will always contain route data and plane data. So I hardcode my application to assume that all of this data (route and plane) is available when I decide on crew staffing. The possibility of deciding on crew staffing from partial information, such as knowing the route but not the plane, is never considered.
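A crew scheduler written under that assumption might look like the following minimal sketch. The CREW_BY_MODEL table stands in for the crew scheduler's database, and the <Model> element, the crew names, and rigid_crew_scheduler itself are hypothetical. Given a document with a route but no plane, the hardcoded path lookup fails instead of making any decision from the route alone.

```python
import xml.etree.ElementTree as ET

# Hypothetical crew database: plane model -> crew experienced with it.
CREW_BY_MODEL = {"B737": ["Pilot Adams", "Navigator Baker"],
                 "A320": ["Pilot Chen", "Navigator Diaz"]}

def rigid_crew_scheduler(doc: ET.Element) -> list[str]:
    # Hardcoded assumption: route and plane data are always present.
    model = doc.find("Route/Plane/Model").text
    return CREW_BY_MODEL[model]

full = ET.fromstring(
    "<Flight_Schedule><Route><Plane><Model>B737</Model></Plane></Route>"
    "</Flight_Schedule>")
print(rigid_crew_scheduler(full))  # ['Pilot Adams', 'Navigator Baker']

partial = ET.fromstring("<Flight_Schedule><Route/></Flight_Schedule>")
try:
    rigid_crew_scheduler(partial)
except AttributeError:
    # find() returned None, so .text fails: the route alone is never used.
    print("no decision possible without plane data")
```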

Nested data designs can lead to inflexible workflow applications.

Flexible Workflow Applications

New business requirements have led the airline to require its applications be more flexible. For example:

Applications must be able to make decisions based on imperfect or incomplete information.

Rather than a rigid sequence of applications, the airline desires that application processing occur in any order:

Figure: The new workflow allows processing to occur in any order.

Data Design to Support Flexible Workflow Applications

To avoid data dependencies, and thus application dependencies, keep the data flat:

<?xml version="1.0" encoding="UTF-8"?>
<Flight_Schedule>
    <Route>
        <!-- data about the route that will be flown -->
    </Route>
    <Plane>
        <!-- data about the plane that will fly the route -->
    </Plane>
    <Crew>
        <!-- data about the crew that will staff the plane -->
    </Crew>
</Flight_Schedule>

Make each element (<Route>, <Plane>, and <Crew>) optional. In doing so, no application can assume what its input will contain, so each must necessarily be designed in a flexible fashion.
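With the flat design, the crew scheduler can treat every input element as optional and narrow its choice with whatever data happens to be present. The following is a minimal sketch, not a prescribed implementation; the lookup tables, the <Waypoints> and <Model> children, and flexible_crew_scheduler are hypothetical. (In a W3C XML Schema, this optionality would typically be declared with minOccurs="0" on each child of <Flight_Schedule>.)

```python
import xml.etree.ElementTree as ET

# Hypothetical crew database; the names and keys are illustrative only.
ALL_CREW = ["Adams", "Baker", "Chen", "Diaz"]
CREW_BY_MODEL = {"B737": ["Adams", "Baker"], "A320": ["Chen", "Diaz"]}
CREW_BY_ROUTE = {"JFK LAX": ["Adams", "Chen"]}

def flexible_crew_scheduler(doc: ET.Element) -> list[str]:
    """Narrow the crew candidates using whatever data is present.

    <Route> and <Plane> are both optional: findtext() returns None when
    an element is missing, and that piece of information is simply skipped.
    """
    candidates = set(ALL_CREW)
    model = doc.findtext("Plane/Model")
    if model is not None:
        candidates &= set(CREW_BY_MODEL.get(model, []))
    waypoints = doc.findtext("Route/Waypoints")
    if waypoints is not None:
        candidates &= set(CREW_BY_ROUTE.get(waypoints, []))
    return sorted(candidates)

examples = {
    "nothing yet": "<Flight_Schedule/>",
    "route only": "<Flight_Schedule><Route><Waypoints>JFK LAX</Waypoints>"
                  "</Route></Flight_Schedule>",
    "route and plane": "<Flight_Schedule><Route><Waypoints>JFK LAX</Waypoints>"
                       "</Route><Plane><Model>B737</Model></Plane>"
                       "</Flight_Schedule>",
}
for label, xml in examples.items():
    print(label, flexible_crew_scheduler(ET.fromstring(xml)))
```

With no data the scheduler proposes every crew member; with the route alone it narrows to Adams and Chen; with both route and plane it narrows to Adams. Each additional piece of data refines the decision, but none is required.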

Remove the dependencies in the data, and you increase the flexibility of the applications!
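As a final sketch of the same idea, the hypothetical schedulers below each write only their own top-level element of the flat document, so none of them depends on another having run first. Because of that, the three applications can be run in every possible order (itertools.permutations yields all six) without any of them failing.

```python
import itertools
import xml.etree.ElementTree as ET

# Hypothetical schedulers for the flat design: each adds its own top-level
# element and never needs another application's element to exist.
def route_scheduler(doc: ET.Element) -> None:
    ET.SubElement(doc, "Route").text = "JFK LAX"

def plane_scheduler(doc: ET.Element) -> None:
    ET.SubElement(doc, "Plane").text = "B737"

def crew_scheduler(doc: ET.Element) -> None:
    ET.SubElement(doc, "Crew").text = "Adams"

def run(order) -> list[tuple[str, str]]:
    doc = ET.Element("Flight_Schedule")
    for scheduler in order:
        scheduler(doc)
    # Sort children by tag so documents built in different orders compare equal.
    return sorted((child.tag, child.text) for child in doc)

schedulers = (route_scheduler, plane_scheduler, crew_scheduler)
results = {tuple(run(order)) for order in itertools.permutations(schedulers)}
print(len(results))  # 1: all six orderings yield the same flight data
```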

Conclusions

How you design your data can have a big impact on the flexibility of the workflow applications that process the data.

Data with dependencies will likely produce inflexible workflow applications.

Hierarchically nested data designs impose dependencies on the data. Avoid nested data designs for workflow processing.

Data without dependencies will likely produce flexible applications.

Flat data designs impose no dependencies on the data. Use flat data designs for workflow processing.


Last Updated: February 11, 2008