two-phase initialization
Date: 2010-07-28 Source: wqfhenanxc
When is it used?
I. An introduction to two-phase initialization and an analysis of why it is needed
Excerpted from http://blogs.microsoft.co.il/blogs/sasha/archive/2008/08/19/two-phase-initialization.aspx
Two-phase initialization is an architectural pattern for artificially breaking and managing coupling between strongly coupled components. The motivation and implementation of this pattern are not always obvious, so I will give a couple of examples to demonstrate.
Let’s take an operating system as an example. Some of the components involved in the initialization of the operating system are the I/O manager, the memory manager, the object manager and many others. At runtime, the strong coupling between the various components is obvious and beneficial – they tend to use each other, all the time.
However, during system startup, these dependencies (especially if startup is performed synchronously) can lead to a dead end. For example:
- The memory manager initializes. It needs to create shared memory objects (section objects) to represent binary images being loaded during system startup. This requires a trip to the object manager.
- The object manager initializes. It needs to allocate memory for the system handle table and for the actual resources being created. This requires a trip to the memory manager.
Another example can be taken from an ESB infrastructure I have been implementing lately. The infrastructure services include a configuration service, a publish/subscribe service and a “DNS”-style service. These services are typically used by other system components, but they also need each other:
- The configuration service initializes. It needs to register itself in the “DNS” service to be accessible by other system components.
- The “DNS” service initializes. It needs to obtain its configuration and use the pub/sub service to register for configuration change notifications.
- The pub/sub service initializes. It needs to obtain its configuration and register itself in the “DNS” service to be accessible by other system components.
Disentangling these dependencies can be done in various ways. For example, we could say that the infrastructure services are not allowed to use each other – the pub/sub service will use local configuration, the “DNS” service will have a predefined list of registered endpoints, etc.
However, in an operating system we can’t resort to a solution in which the object manager manages its own memory, and the memory manager manages its own objects.
The only feasible alternative is two-phase initialization.
When using two-phase initialization, infrastructure components initialize in two phases. In the first phase, they do not rely on any other components to reach a stable state in which they are able to provide basic services to the rest of the system. In the second phase, they transition to a fully-functional state in which they rely on other components (which have not necessarily reached the second phase yet).
Using this model in our example, the “DNS” service can start with a predefined list of endpoints that will be used to communicate with the infrastructure services while they are in the first phase. In the second phase, these predefined endpoints will be replaced by the actual endpoints for the actual services. The pub/sub service can start with a local configuration during the first phase, and retrieve its configuration when the configuration service becomes available (enters the first phase), and so on.
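The ESB example can be sketched in C++ under stated assumptions. The class names NameService, ConfigService, PubSubService and Infrastructure, the endpoint strings and the configuration key "pubsub.db" are all hypothetical. The "DNS" service's own subscription to configuration changes is left out for brevity. In this sketch each InitPhase1 touches nothing but its own object, and no InitPhase2 runs until every component has finished phase 1.

```cpp
#include <map>
#include <string>

// Hypothetical "DNS"-style name service.
class NameService {
public:
    // Phase 1: start from a predefined (hypothetical) endpoint table; no dependencies.
    void InitPhase1() {
        endpoints_ = {{"config", "bootstrap://config"},
                      {"pubsub", "bootstrap://pubsub"}};
    }
    // Called by other services in their phase 2 to replace a predefined endpoint.
    void Register(const std::string& name, const std::string& endpoint) {
        endpoints_[name] = endpoint;
    }
    std::string Resolve(const std::string& name) const {
        auto it = endpoints_.find(name);
        return it == endpoints_.end() ? std::string() : it->second;
    }
private:
    std::map<std::string, std::string> endpoints_;
};

// Hypothetical configuration service.
class ConfigService {
public:
    // Phase 1: load settings from a local source only (hard-coded here).
    void InitPhase1() { settings_["pubsub.db"] = "db://primary"; }
    // Phase 2: become reachable through the name service.
    void InitPhase2(NameService& dns) { dns.Register("config", "tcp://config:9000"); }
    std::string Get(const std::string& key) const {
        auto it = settings_.find(key);
        return it == settings_.end() ? std::string() : it->second;
    }
private:
    std::map<std::string, std::string> settings_;
};

// Hypothetical pub/sub service.
class PubSubService {
public:
    // Phase 1: run on a local default configuration.
    void InitPhase1() { dbConnection_ = "db://local-default"; }
    // Phase 2: switch to the central configuration and register in the name service.
    void InitPhase2(const ConfigService& config, NameService& dns) {
        std::string remote = config.Get("pubsub.db");
        if (!remote.empty()) dbConnection_ = remote;
        dns.Register("pubsub", "tcp://pubsub:9100");
    }
    const std::string& DbConnection() const { return dbConnection_; }
private:
    std::string dbConnection_;
};

// Startup driver: every component finishes phase 1 before any enters phase 2.
struct Infrastructure {
    NameService dns;
    ConfigService config;
    PubSubService pubsub;

    void Start() {
        dns.InitPhase1();
        config.InitPhase1();
        pubsub.InitPhase1();
        config.InitPhase2(dns);
        pubsub.InitPhase2(config, dns);
    }
};
```

Between the two phases, PubSubService runs on its local default connection string. The switch to the central value in phase 2 is the kind of transition cost discussed further on.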
Providing a generic implementation for all infrastructure and non-infrastructure services to account for two-phase initialization is exceptionally difficult, but achievable if the proper metadata is in place. Components must provide metadata regarding their explicit dependencies and ways to make forward progress while these dependent components are not yet available.
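A generic version needs exactly that metadata. Here is a minimal sketch, assuming a hypothetical Component interface (Name, Dependencies, InitPhase1, InitPhase2) and a hypothetical Bootstrapper driver. Components declare their dependencies. The driver rejects any dependency on a component that was never registered, then runs phase 1 for every component before phase 2 for any of them. How a component makes forward progress without its dependencies is left to its own InitPhase1 (local defaults, for example). The single-threaded loop deliberately sidesteps the asynchrony and race-condition issues listed next.

```cpp
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical component interface carrying the dependency metadata.
class Component {
public:
    virtual ~Component() = default;
    virtual std::string Name() const = 0;
    // Metadata: the components this one talks to in phase 2.
    virtual std::vector<std::string> Dependencies() const = 0;
    // Phase 1: reach a stable, minimal state without calling other components.
    virtual void InitPhase1() = 0;
    // Phase 2: peers have finished phase 1, but not necessarily phase 2.
    virtual void InitPhase2(const std::map<std::string, Component*>& peers) = 0;
};

// Hypothetical startup driver.
class Bootstrapper {
public:
    void Add(std::unique_ptr<Component> c) { components_.push_back(std::move(c)); }

    void Start() {
        std::map<std::string, Component*> peers;
        for (auto& c : components_) peers[c->Name()] = c.get();
        // Use the metadata to fail fast on a dependency that was never registered.
        for (auto& c : components_)
            for (const auto& dep : c->Dependencies())
                if (peers.count(dep) == 0)
                    throw std::runtime_error(c->Name() + " depends on unknown component " + dep);
        for (auto& c : components_) c->InitPhase1();
        for (auto& c : components_) c->InitPhase2(peers);
    }

private:
    std::vector<std::unique_ptr<Component>> components_;
};

// Example component that only records when each phase runs.
class Recorder : public Component {
public:
    Recorder(std::string name, std::vector<std::string> deps, std::vector<std::string>& log)
        : name_(std::move(name)), deps_(std::move(deps)), log_(log) {}
    std::string Name() const override { return name_; }
    std::vector<std::string> Dependencies() const override { return deps_; }
    void InitPhase1() override { log_.push_back(name_ + ":1"); }
    void InitPhase2(const std::map<std::string, Component*>&) override { log_.push_back(name_ + ":2"); }

private:
    std::string name_;
    std::vector<std::string> deps_;
    std::vector<std::string>& log_;
};
```

The circular dependencies of the ESB example (config, "DNS" and pub/sub all needing each other) are accepted here. No component calls a peer before every peer has reached phase 1.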
This sounds simple, but in reality it really isn’t. Multiple issues plague the two-phase initialization pattern, but do not undermine its principal validity:
- Transitioning between initialization phases might require a significant amount of work. For example, the pub/sub service might use a database to store the subscription information, and when transitioning to the second phase (by talking to the configuration service) the connection string to the database might have changed.
- Deadlocks can be introduced into the startup sequence if initialization is not carefully asynchronous.
- Terrible race conditions can be introduced into the startup sequence if it is not carefully synchronized for multiple threads of execution.
- Lots of noise is generated in the system while it’s restarting or when some components are being reinitialized.
The two-phase initialization approach is used by Windows. In the first phase (called phase 0), initialization proceeds in a single thread and brings up only the minimal services required for the second phase. In the second phase (called phase 1), system components can rely on other components being present to start transitioning into their fully-functional state.
To summarize, two-phase initialization is difficult to manage and implement, but in the real world where components circularly depend on each other there is rarely a better alternative.
II. Two-phase construction as an optimization
Excerpted from http://www.tantalon.com/pete/cppopt/design.htm
"C++ Optimization Strategies and Techniques", Pete Isensee
An object with one-phase construction is fully "built" with the constructor. An object with two-phase construction is minimally initialized in the constructor and fully "built" using a class method. Frequently copied objects with expensive constructors and destructors can be serious bottlenecks and are great candidates for two-phase construction. Designing your classes to support two-phase construction, even if internally they use one-phase, will make future optimizations easy.
The following code shows two different objects, OnePhase and TwoPhase, based on a Bitmap class. They both have the same external interface. Their internals are quite different. The OnePhase object is fully initialized in the constructor. The code for OnePhase is very simple. The code for TwoPhase, on the other hand, is more complicated. The TwoPhase constructor simply initializes a pointer. The TwoPhase methods have to check the pointer and allocate the Bitmap object if necessary.
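```cpp
#include <utility> // std::swap

// Bitmap is assumed to exist, with Create(int, int), GetWidth() and a copy constructor.
class OnePhase
{
private:
    Bitmap m_bMap; // Bitmap is a "one-phase" constructed object
public:
    bool Create(int nWidth, int nHeight)
    {
        return m_bMap.Create(nWidth, nHeight);
    }
    int GetWidth() const
    {
        return m_bMap.GetWidth();
    }
};

class TwoPhase
{
private:
    Bitmap* m_pbMap; // Ptr lends itself to two-phase construction
public:
    TwoPhase() : m_pbMap(nullptr)
    {
    }
    // Without a user-defined copy constructor the raw pointer would be shared and
    // deleted twice. Copying an "empty" TwoPhase does not allocate anything.
    TwoPhase(const TwoPhase& other)
        : m_pbMap(other.m_pbMap ? new Bitmap(*other.m_pbMap) : nullptr)
    {
    }
    TwoPhase& operator=(TwoPhase other) // copy-and-swap
    {
        std::swap(m_pbMap, other.m_pbMap);
        return *this;
    }
    ~TwoPhase()
    {
        delete m_pbMap;
    }
    bool Create(int nWidth, int nHeight)
    {
        if (m_pbMap == nullptr)
            m_pbMap = new Bitmap; // second phase: allocate on first use
        return m_pbMap->Create(nWidth, nHeight);
    }
    int GetWidth() const
    {
        return m_pbMap == nullptr ? 0 : m_pbMap->GetWidth();
    }
};
```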
What kind of savings can you expect? It depends. If you copy many objects, especially "empty" objects, the savings can be significant. If you don't do a lot of copying, two-phase construction can have a negative impact, because it adds a new level of indirection.
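Where the savings come from can be measured with a small self-contained experiment. Bitmap here is a hypothetical stand-in that counts how often its copy constructor runs, because the excerpt does not show the real Bitmap class. TwoPhase copies its Bitmap only when the second phase has already allocated one.

```cpp
#include <utility>

// Hypothetical stand-in for Bitmap that counts its (expensive) copies.
struct Bitmap {
    static int copies;
    int width = 0;
    int height = 0;

    Bitmap() = default;
    Bitmap(const Bitmap& other) : width(other.width), height(other.height) { ++copies; }
    bool Create(int w, int h) { width = w; height = h; return true; }
    int GetWidth() const { return width; }
};
int Bitmap::copies = 0;

class OnePhase {
public:
    bool Create(int w, int h) { return m_bMap.Create(w, h); }
    int GetWidth() const { return m_bMap.GetWidth(); }
private:
    Bitmap m_bMap;  // always copied along with the object
};

class TwoPhase {
public:
    TwoPhase() = default;
    // Copy the Bitmap only if the second phase has already allocated one.
    TwoPhase(const TwoPhase& other)
        : m_pbMap(other.m_pbMap ? new Bitmap(*other.m_pbMap) : nullptr) {}
    TwoPhase& operator=(TwoPhase other) {  // copy-and-swap
        std::swap(m_pbMap, other.m_pbMap);
        return *this;
    }
    ~TwoPhase() { delete m_pbMap; }
    bool Create(int w, int h) {
        if (m_pbMap == nullptr) m_pbMap = new Bitmap;
        return m_pbMap->Create(w, h);
    }
    int GetWidth() const { return m_pbMap == nullptr ? 0 : m_pbMap->GetWidth(); }
private:
    Bitmap* m_pbMap = nullptr;
};

// Copies n default-constructed ("empty") objects and returns how many Bitmap copies that cost.
template <class T>
int CountBitmapCopies(int n) {
    Bitmap::copies = 0;
    T original;
    for (int i = 0; i < n; ++i) {
        T copy(original);
        (void)copy;
    }
    return Bitmap::copies;
}
```

With these stubs, copying 1,000 empty OnePhase objects performs 1,000 Bitmap copies, while copying 1,000 empty TwoPhase objects performs none. The price is the extra null check and pointer dereference in every TwoPhase method.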
III. Using two-phase construction to solve the problem of calling virtual functions during initialization
References:
1. http://en.wikibooks.org/wiki/More_C%2B%2B_Idioms/Calling_Virtuals_During_Initialization
2. http://www.parashift.com/c++-faq-lite/strange-inheritance.html#faq-23.6
The idiom discussed here is the Dynamic Binding During Initialization idiom (AKA Calling Virtuals During Initialization).
To clarify, we're talking about this situation:
```cpp
class Base {
public:
    Base();
    ...
    virtual void foo(int n) const; // often pure virtual
    virtual double bar() const;    // often pure virtual
    // if you don't want outsiders calling these, make them protected
};

Base::Base()
{
    ... foo(42) ... bar() ...
    // these will not use dynamic binding
    // goal: simulate dynamic binding in those calls
}

class Derived : public Base {
public:
    ...
    virtual void foo(int n) const;
    virtual double bar() const;
};
```
This FAQ shows some ways to simulate dynamic binding as if the calls made in Base's constructor were dynamically bound to the derived class of the this object. The ways we'll show have tradeoffs, so choose the one that best fits your needs, or make up another.
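The problem can be shown with a minimal runnable sketch. A hypothetical Name() virtual stands in for foo() and bar(). While Base's constructor runs, the object's dynamic type is still Base, so the call resolves to Base::Name() even when a Derived is being built. Name() deliberately has a body, because calling a pure virtual function from a constructor is undefined behavior.

```cpp
#include <string>

class Base {
public:
    Base() { constructedAs_ = Name(); }  // virtual call during construction
    virtual ~Base() = default;
    // Not pure: calling a pure virtual from a constructor is undefined behavior.
    virtual std::string Name() const { return "Base"; }
    const std::string& ConstructedAs() const { return constructedAs_; }
private:
    std::string constructedAs_;
};

class Derived : public Base {
public:
    std::string Name() const override { return "Derived"; }
};
```

After construction, the same call made through a Base reference does dispatch to Derived::Name().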
The first approach is a two-phase initialization. In Phase I, someone calls the actual constructor; in Phase II, someone calls an "init" method on the object. Dynamic binding on the this object works fine during Phase II, and Phase II is conceptually part of construction, so we simply move some code from the original Base::Base() into Base::init().
```cpp
class Base {
public:
    void init(); // may or may not be virtual
    ...
    virtual void foo(int n) const; // often pure virtual
    virtual double bar() const;    // often pure virtual
};

void Base::init()
{
    ... foo(42) ... bar() ...
    // most of this is copied from the original Base::Base()
}

class Derived : public Base {
public:
    ...
    virtual void foo(int n) const;
    virtual double bar() const;
};
```
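Here is the hypothetical Name() example rewritten with the init() approach, as a sketch. By the time Phase II runs, Derived's constructor has finished, so the call inside init() dispatches to Derived even though Name() is pure virtual in Base.

```cpp
#include <string>

class Base {
public:
    virtual ~Base() = default;
    // Phase II: the whole object already exists, so virtual calls dispatch normally.
    void init() { initializedAs_ = Name(); }
    const std::string& InitializedAs() const { return initializedAs_; }
protected:
    virtual std::string Name() const = 0;  // often pure virtual
private:
    std::string initializedAs_;
};

class Derived : public Base {
protected:
    std::string Name() const override { return "Derived"; }
};
```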
The only remaining issues are determining where to call Phase I and where to call Phase II. There are many variations on where these calls can live; we will consider two.
The first variation is simplest initially, though the code that actually wants to create objects requires a tiny bit of programmer self-discipline, which in practice means you're doomed. Seriously, if there are only one or two places that actually create objects of this hierarchy, the programmer self-discipline is quite localized and shouldn't cause problems.
In this variation, the code that is creating the object explicitly executes both phases. When executing Phase I, the code creating the object either knows the object's exact class (e.g., new Derived() or perhaps a local Derived object), or doesn't know the object's exact class (e.g., the virtual constructor idiom or some other factory). The "doesn't know" case is strongly preferred when you want to make it easy to plug in new derived classes.
Note: Phase I often, but not always, allocates the object from the heap. When it does, you should store the pointer in some sort of managed pointer, such as a std::unique_ptr (std::auto_ptr in pre-C++11 code; auto_ptr was removed in C++17), a reference-counted pointer, or some other object whose destructor deletes the allocation. This is the best way to prevent memory leaks when Phase II might throw exceptions. The following example assumes Phase I allocates the object from the heap.
```cpp
#include <memory>

void joe_user()
{
    std::unique_ptr<Base> p(/*...somehow create a Derived object via new...*/); // Phase I
    p->init();                                                                 // Phase II
    ...
}
```
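A complete sketch of this first variation, using std::unique_ptr as the managed pointer. The Validate() hook, the fail flag and the static destroyed counter are hypothetical. They were added only to show that an exception thrown from Phase II does not leak the object allocated in Phase I.

```cpp
#include <memory>
#include <stdexcept>

class Base {
public:
    virtual ~Base() { ++destroyed; }
    void init() { Validate(); }  // Phase II: dynamic binding works here
    static int destroyed;
protected:
    virtual void Validate() = 0;
};
int Base::destroyed = 0;

// Hypothetical derived class whose Phase II can fail.
class Derived : public Base {
public:
    explicit Derived(bool fail) : fail_(fail) {}
protected:
    void Validate() override {
        if (fail_) throw std::runtime_error("init failed");
    }
private:
    bool fail_;
};

// Runs both phases; returns true if the object was fully initialized.
bool joe_user(bool fail) {
    try {
        std::unique_ptr<Base> p(new Derived(fail));  // Phase I
        p->init();                                   // Phase II; may throw
        return true;
    } catch (const std::runtime_error&) {
        return false;  // p was destroyed during stack unwinding, so nothing leaked
    }
}
```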
The second variation is to combine the first two lines of the joe_user function into some create function. That's almost always the right thing to do when there are lots of joe_user-like functions. For example, if you're using some kind of factory, such as a registry and the virtual constructor idiom, you could move those two lines into a static method called Base::Create():
```cpp
#include <memory>

class Base {
public:
    ...
    typedef std::unique_ptr<Base> Ptr; // typedefs simplify the code

    template <class D, class Parameter>
    static Ptr Create(Parameter p)
    {
        Ptr ptr(new D(p)); // Phase I
        ptr->init();       // Phase II
        return ptr;
    }
};
```
This simplifies all the joe_user-like functions (a little), but more importantly, it reduces the chance that any of them will create a Derived object without also calling init() on it.
```cpp
void joe_user()
{
    Base::Ptr b = Base::Create<Derived>("para");
}
```
If you're sufficiently clever and motivated, you can even eliminate the chance that someone could create a Derived object without also calling init() on it. An important step in achieving that goal is to make Derived's constructors, including its copy constructor, protected or private.