I'm in the process of putting together a Phase I architecture, as I
mentioned in an earlier posting and as is mentioned on the web site.
The first question that comes to mind is: what are the Phase I goals?

        Service/resource failover
        Service groups or dependencies
        Resource diagnostics
        /proc (or /ha) interface for status and control
        2-n node cluster size
        Redundant NIC failover
        Shared/mirrored filesystem takeovers (a la Poor Man's
                data replication)

Towards this end I've drawn a few pictures and written a few words of
text.  The picture is related to Tom Vogt's picture, but has several
differences.  It is very similar to the proposed project framework on
the HA web site.  Each component in this picture is replicated on each
node in the cluster.

                          manual requests
                                 |
                                 v
   diagnostics          +----------------+
   (scheduled           |                |
   and manual) -------->| Configuration  +------> Application
                        |  Management    |        Notification
   monitoring/          |                |            API
   heartbeat ---------->|                |
                        +--------+-------+
                                 |   ((Re)configuration
                                 |        Modules)
                   +-------------+---------------+----------+
                   |             |               |          |
                   v             v               v          v
              IP Takeover    Filesystem     Application    etc.
                              Takeover     Start/Restart

The discussion below talks about the system in terms of objects,
although the current code is written in C (not C++), and I expect the
new framework to be implemented in the same way.

There is a presumption in this design that each node has a copy of the
whole cluster configuration.  Otherwise, when the cluster is reforming
itself, it doesn't know which resources provided by missing nodes
should be instantiated in failover mode on a "replacement" node.

The most common and most important kind of object in this model is the
resource.  Resources are things like IP addresses, NICs, filesystems,
disks, applications, etc.  The following methods exist for every
resource:

    name()          The name of this resource (ASCII string)

    provided_by()   Returns the node providing this resource

    type()          Returns the "type" of a given resource.
                    See the "resource_type" object below.

    service_state() Returns IN_SERVICE, OOS or PENDING_OOS
                    (REMOVED or PENDING_REMOVE?)

    in_service()    Brings the resource into service

    oo_service()    Takes the resource out of service, gracefully

    force_oos()     Takes the resource out of service, immediately

    mark_oos()      Marks a failed resource as out of service;
                    takes no action to actually take the resource
                    out of service

There is a (static member) function which locates a resource:

    find_resource() Returns the resource that corresponds to the
                    given name

There is also an object called a resource-list-item.  It has the
following methods:

    next()          The next resource-list-item in the list

    resource()      The resource associated with this list item

Resources also have the following member functions, each of which
returns a resource list:

    dependson_list()    Returns the list of resources on which this
                        resource directly depends

    dependents_list()   Returns the list of resources which depend
                        directly on this resource

These dependency lists can be manipulated by functions like this one:

    dependson(r1, r2)   Marks resource r1 as dependent on resource r2

There is also a fundamental object called a resource_type.  It has
member functions like these:

    instantiate()   Creates a resource of the given type

    typename()      Returns an ASCII string naming the type
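Since the current code is C and the framework will stay in C, these
"objects" would presumably be structs carrying function pointers.
Here is a minimal sketch of what the resource, resource-list-item and
resource_type interfaces might look like.  The method names come from
the lists above; everything else (struct layouts, typedef names) is an
illustrative guess, not existing code:

    /* Sketch only: method names are from the lists above;
     * struct layouts and typedefs are guesses.
     */
    typedef enum {
            IN_SERVICE,
            OOS,
            PENDING_OOS     /* REMOVED / PENDING_REMOVE undecided */
    } service_state_t;

    struct node;            /* a cluster node; defined elsewhere */
    struct resource;
    struct resource_type;

    /* resource-list-item: next() and resource() as plain fields */
    struct resource_list_item {
            struct resource_list_item *next;
            struct resource           *resource;
    };

    struct resource {
            const char           *(*name)(struct resource *self);
            struct node          *(*provided_by)(struct resource *self);
            struct resource_type *(*type)(struct resource *self);
            service_state_t       (*service_state)(struct resource *self);

            int  (*in_service)(struct resource *self); /* bring up */
            int  (*oo_service)(struct resource *self); /* graceful down */
            int  (*force_oos)(struct resource *self);  /* immediate down */
            void (*mark_oos)(struct resource *self);   /* bookkeeping only */

            struct resource_list_item *(*dependson_list)(struct resource *self);
            struct resource_list_item *(*dependents_list)(struct resource *self);
    };

    /* the "static member" lookup function */
    struct resource *find_resource(const char *name);

    /* mark r1 as directly dependent on r2 */
    int dependson(struct resource *r1, struct resource *r2);

    struct resource_type {
            struct resource *(*instantiate)(struct resource_type *self,
                                            const char *name);
            const char      *(*typename)(struct resource_type *self);
    };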
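One consequence of the dependency lists: bringing a resource into
service only makes sense once everything it depends on is already in
service.  A hypothetical helper (building on the sketch above, and
ignoring dependency cycles and error cleanup for brevity) might walk
the list recursively:

    /* Bring r into service, dependencies first.  Illustrative only. */
    static int bring_into_service(struct resource *r)
    {
            struct resource_list_item *li;

            if (r->service_state(r) == IN_SERVICE)
                    return 0;               /* already up */

            /* everything r depends on must come up first */
            for (li = r->dependson_list(r); li != NULL; li = li->next)
                    if (bring_into_service(li->resource) != 0)
                            return -1;      /* a dependency failed */

            return r->in_service(r);
    }

Taking a resource out of service gracefully would presumably walk
dependents_list() the same way, shutting dependents down first.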
.................................
.................................

Things which are not yet defined, but which I know I need:

    Diagnostics objects

    Application notification objects (so an application can register
    that it wants to be notified about cluster transitions, etc.)

    Message: A set of {name,value} pairs sent from a node to all
    nodes, or to a single node.  This is what I now implement in
    "heartbeat".  (This version is awaiting testing by SuSE users
    before release.)  A sketch of what such a message might look
    like appears at the end of this note.

Some of the things I haven't defined I don't know I need, and some of
them are just lower-level details that I haven't gotten to.

-----------------------------------------------------------------------
Continuing on...
-----------------------------------------------------------------------

Inside the Configuration Management subsystem there exists what I call
a configuration strategy module.  At this point I assume that this is
a plug-in module, which can be replaced with any one of a number of
strategy modules, according to the needs of the system or the whims of
the administrator.

My initial thoughts on a first-cut node transition strategy module go
like this.  Timeouts and new transitions cause a restart from step 1.

    Step 1: Declare a cluster transition

    Step 2: (once everyone has ACKed the cluster transition)
            Declare a "transition master" node (somehow - lowest
            node name?)

    Step 3: (once everyone has ACKed the transition master)
            The master requests a config report (what resources
            each node has)

    Step 4: (once every node has reported)
            For each resource group, the transition master requests
            the cost of providing it from each node (infinite if the
            node cannot provide it)

    Step 5: The transition master requests ...

And then I finish this document later...
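To make the step sequence concrete, here is one hypothetical skeleton
of such a strategy module's state machine in C.  All identifiers are
invented for illustration; only the step ordering and the
restart-on-timeout rule come from the description above:

    /* Hypothetical node-transition state machine; identifiers are
     * invented, only the step sequence comes from the text above.
     */
    #include <stddef.h>

    enum transition_state {
            T_IDLE,            /* no transition in progress              */
            T_DECLARED,        /* step 1: cluster transition declared    */
            T_MASTER_CHOSEN,   /* step 2: transition master elected      */
            T_REPORTS_IN,      /* step 3: every node reported its config */
            T_COSTS_IN         /* step 4: per-group costs collected      */
    };

    struct transition {
            enum transition_state state;
            const char *master;        /* e.g. the lowest node name */
    };

    /* Timeouts and newly declared transitions restart from step 1. */
    static void transition_restart(struct transition *t)
    {
            t->state  = T_DECLARED;
            t->master = NULL;
    }

    /* Advance one step once every node has ACKed (or answered)
     * the current one. */
    static void transition_advance(struct transition *t,
                                   const char *lowest_node_name)
    {
            switch (t->state) {
            case T_DECLARED:         /* everyone ACKed the transition */
                    t->master = lowest_node_name;
                    t->state  = T_MASTER_CHOSEN;
                    /* master now asks each node for a config report */
                    break;
            case T_MASTER_CHOSEN:    /* every node has now reported */
                    t->state = T_REPORTS_IN;
                    /* master asks each node the cost of providing
                     * each resource group; infinite == cannot provide */
                    break;
            case T_REPORTS_IN:
                    t->state = T_COSTS_IN;
                    /* step 5 is where the document leaves off */
                    break;
            default:
                    break;
            }
    }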
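Going back to the Message object mentioned earlier: a set of
{name,value} pairs addressed to one node or broadcast to all could be
as simple as a linked list.  This is a guess at the shape, not the
actual "heartbeat" implementation or wire format:

    /* Sketch of a cluster message as {name,value} pairs; a guess
     * at the shape, not the "heartbeat" implementation.
     */
    #include <stdlib.h>
    #include <string.h>

    struct nvpair {
            char *name;
            char *value;
            struct nvpair *next;
    };

    struct message {
            const char *dest;      /* node name, or NULL for broadcast */
            struct nvpair *pairs;
    };

    /* Add a {name,value} pair; returns 0 on success, -1 on failure. */
    static int msg_add(struct message *m, const char *name,
                       const char *value)
    {
            struct nvpair *p = malloc(sizeof(*p));

            if (p == NULL)
                    return -1;
            p->name  = strdup(name);
            p->value = strdup(value);
            if (p->name == NULL || p->value == NULL) {
                    free(p->name);
                    free(p->value);
                    free(p);
                    return -1;
            }
            p->next  = m->pairs;   /* prepend; order not significant */
            m->pairs = p;
            return 0;
    }

    /* Look a value up by name; NULL if the message has no such pair. */
    static const char *msg_value(const struct message *m, const char *name)
    {
            const struct nvpair *p;

            for (p = m->pairs; p != NULL; p = p->next)
                    if (strcmp(p->name, name) == 0)
                            return p->value;
            return NULL;
    }

The transition steps above would then presumably be carried as a
handful of such messages (declare, ACK, config report, cost) between
the strategy modules on each node.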