Dynamic Distribution for the Application Designer

Charles Weir
14 October 1998

Introduction

This paper examines, from the point of view of an application designer, how to model applications that use dynamic distribution and agent technologies.

In this paper, we'll assume a framework that already solves the problems of distribution, load balancing, and object migration. This may use CORBA, DCOM, or an agent-based framework such as Voyager or AgentTcl. Given this framework, how do we design practical applications? What kinds of 'business objects' get distributed; what do their interfaces look like; and how do we write down our resulting designs?

A simple example

Throughout this paper, we'll consider a simple example of a distributed system as follows. A news agency collects the results of football matches on a central computer. Each match has a journalist, who enters a new score in a local computer system (perhaps a Personal Organiser with web access). The local computer then uses distribution facilities to update the central computer.

What do the Distributed Objects Look Like?

A naïve design for a non-distributed system might contain classes for 'domain objects' such a Match, Team and Stadium. There would also be suitable User Interface components to modify the Match instances (and enter Team and Stadium information, and display current scores), and perhaps additional classes to support persistence.

Suppose we want to add distribution to such a naïve model, but to distribute only the two Use Cases (see [Jacobsen]) that involve Match information: entering a goal and updating the team list. How would we design the distributed objects - the objects visible on one host from another? And how do the choices we make here influence the following two aspects of dynamic distribution?

Load Distribution

We may want to distribute similar processing over several different hosts to improve the effective host processing power and also to add fault-tolerence.

Dynamic Distribution

This load balancing may be dynamic, so that the choice of server host can vary with time according to the loadings and availability of the individual hosts.

The following sections examine several possible approaches to designing distributed objects, and identify the benefits and the drawbacks. For clarity, benefits and drawbacks are separated with a boldface 'However'.

The 'CORBA' model.

Although CORBA potentially supports many possible models, some of the services (particularly the persistence service) encourage designers to think of domain objects as distributable. So in our example the only distributed class would be Match, with distributed functions AddGoal, GetScore, GetTeamList, UpdateTeamList, etc. Each server Match instance contains the state associated with that match. It must be persistent, existing as long as the system needs to retain information about that particular match.

This approach is easy to design. The interface corresponds closely to the domain model. The client can locate match instances using name-server services or the like, and use each instance as though it were a local object. However: it's difficult to provide load balancing, since each logical Match instance can only have one physical location. We could put different Match instances on different servers, but any processing involving several Match instances would then become difficult. We also get multithreading problems when two clients update a single Match simultaneously; for example one journalist might update the team list while another enters a goal. Also, accessing each separate data item requires a separate client-server call - which proves slow when these take a long time. Certainly this approach doesn't lend itself very readily to dynamic distribution; moving a Match instance from one host to another would mean notifying all it's clients of the move.

The Façade Model

Here the server provides just one Façade object (see [Gamma+]) which provides single functions to do each operation. So in the example we might provide a FootballServer, with operation getMatch and operation changeMatch, with parameters a match I.D., and all of the data required to make the changes.

This approach is also easy to design. It is particularly popular for wrapping legacy systems, which don't have a concept of underlying software objects - typically no objects appear in the interface; all are represented using some kind of string identifier. The model permits dynamic distribution, for the FootballServer instances can be duplicated on many hosts; each client can use a load-balancing service to decide which instance to use for each operation. However: this approach doesn't work well for operations that require complicated data structures. It requires multi-threading at the server, which is complicated to implement. It doesn't preserve the object-oriented design between client and server.

Variant: Session Object

One way to remove the problems of multi-threading is to have a separate instance of the Façade Object for each client. Thus the server object can be single threaded. Typically there would be a single Factory object (see [Gamma+]) on each host to create such a session object.

This removes the problems of multi-threading. Also, the session object can easily implement security facilities, since it knows its client. However: The Session Object approach retains all the other problems of the Façade approach above. In many environments it can be difficult for the server session object to detect that it's client is no longer present, so garbage collection of session objects can become a problem. This approach also militates against dynamic distribution, since creating a new session object for each transaction is likely to be more expensive than simply choosing a different Façade Object.

The Microsoft Model

The model used by Microsoft's Transaction Server (MTS) is that each distributed object represents a change to the system. Although, confusingly, the names of MTS objects are typically the names of underlying concept domain objects (Match for example), each MTS object actually represents a change to the system that is associated particularly with that object. So the lifetime of such an object is very short - typically just a single transaction. Note that this distributed object now serves two purposes in different ways. It can provide read-only information as though it were a CORBA-style domain object. However when it is used to change the database, it becomes a 'transaction object', and doesn't complete its work until the commit function.

This approach allows each server object to be single-threaded, which makes design simple. It integrates well with legacy systems, since the access and change mechanisms can map well to function calls or database access statements. It supports dynamic distribution easily - each transaction creates a new object, which can map to a different host. However: it may require many client-server calls to set up the data for a single database change - which is slow. A single object may represent one of a number of different possible changes to the database; although one might use the State pattern (see [Gamma+]) to implement this cleanly, it is still awkward for the server programmer.

Which approach is best?

A logical question to ask at this point would be 'which approach should we use, then?' In practice no one model is suitable for all purposes. Instead we should choose the best model for each specific network interaction. The approachs, in fact, are 'patterns' for us to use whenever appropriate (see [Gamma+]).

Introducing Migrating Objects

An interesting feature of many Agent systems is that distributed objects can migrate easily between hosts. What is the benefit of this? CORBA, DCOM and other RPC-based systems already provide broker facilities to allow load balancing and simple object migration. What does dynamic migration add for the application developer?

We can identify two immediate benefits:

A familiar example of a task that needs to migrate is a complex database update. Database servers can be very fast, but typically updates require some intelligence (e.g. optimistic locking: "read from the database, check if the data's changed since the last read and only write if it hasn't"). If we implement these updates using Remote Procedure Calls (RPCs), this can require database elements to be locked for the duration of several client server calls; that can make it impossible to get decent throughput (see [Waldo+94]). A better approach is to migrate the upgrade task onto the database host.

What sort of object suits migration?

Of the three models for distributed objects we examined earlier in the paper, clearly the one most suited to migration is the Microsoft model. Since each object represents single task, migrating it to another system is easy and appropriate. However there's another alternative, which may work better in specific cases:

The Transaction Object

Here each particular operation on the underlying data store appears as an object of a different class. For example, adding a goal to a match has a different class to updating a team list for that match. So in our football example, the distributed objects might be MatchScoresGetter, NewGoal, TeamDetailsChange, and TeamDetailsGetter.

This makes it very easy to distribute processing dynamically. The client can set up each object locally using local rather than client-server calls, and the object can then migrate to carry out its task (the 'doIt' function). The 'server' code to implement each task can be single-threaded. Dynamic distribution is relatively trivial - each object can dynamically choose a host to migrate to. However: this approach creates a proliferation of classes - one for each operation. This costs in both coding time and code size. Invoking each operation is difficult for the programmer, since the results of each operation can't be known immediately (unless the framework provides mechanisms to hide this problem from the programmer).

Migrating Objects and Shared State

Assuming migrating objects, how do you share the state associated with each transaction between client and server hosts, so the choice of host may be unconstrained? What are the options?

  1. The client can send out all the state and context with each request. This approach is suitable for heavy processing algorithms like stochastic algorithms or cryptography.
  2. All candidate remote hosts can share the same environment. Options include using a distributed database; multicast data such as Reuters Triarch; hardware and operating system support such as Tandem/DEC clusters or Sun's SparcWorks shared NFS environment; and duplicated static data.
  3. The remote host can access the client's state through remote objects located on the client (callbacks).
  4. The remote host can access shared state through remote objects on a further remote host with the given state. This may be a single host, or one of a cluster as in (b) above.

So how do we document these mechanisms using standard notations? The standard UML notation for Interaction diagrams can show examples of [c] and [d] - although there's no standard way to show anything more than particular scenarios.

UML notation may be more useful when it comes to describing the structure of migrating objects. Although UML is vague on its definition of aggregation, the concept is useful in the context of distribution. We can say that object B is-part-of object A if object B is migrated whenever object A is migrated.

Thus suppose, for example, that the team details comprise an array of Player objects. We can describe it on a UML diagram as shown here. The UML aggregation diamond on the association shows that for this case between one and eleven Player objects migrate along with each TeamDetailsChange object.

Summary

This paper has explored five different possible approaches to creating the interfaces to distributed objects, and examined the effects of each on dynamic distribution, as summarised in the following below:

The CORBA Model

Difficult to achieve dynamic distribution.

The Façade Model

Possible, given a broker to chose a different façade for each transaction.

Session Object

Difficult to migrate sessions. Suitable only if there are many clients and only short sessions.

The Microsoft Model

Easily suited to dynamic distribution.

Transaction Object

Easily suited to both dynamic distribution and migrating objects.

Finally this paper examined the benefits of migrating objects and the problem of migrating state information along with the objects. It proposes using UML aggregation notation to indicate composite objects to migrate.

References

[Gamma+] Design Patterns, Gamma, Helm, Johnson and Vlissides, Addison-Wesley, ISBN 0-201-63361-2

[Jacobsen] Object-Oriented Software Engineering, A Use Case Driven Approach, Ivar Jacobson, Addison-Wesley, ISBN 0-201-54435-0

[Orfali+] Essential Distributed Objects Survival Guide, Orfali, Harkey and Edwards, John Wiley and Sons Ltd., ISBN 0-471-12993-3

[Waldo+94] A Note on Distributed Computing, Waldo, Wyant, Wollrath and Kendall, Sun Microsystems Laboratories, http://www.sunlabs.com/smli/technical-reports/1994/smli_tr-94-29.ps