| Session of low-level optimization of memory usage in the C++ programs with the total exposure |
|
|
|
| Software Development Articles | ||||||||||
| Tuesday, 16 June 2009 16:43 | ||||||||||
|
In this article, we will try to make our algorithms working faster using the methods of low-level optimization of memory allocation in C++. It should be clear that all methods described in this article should be used very carefully and just in the exceptional cases: usually we have to pay for all low-level optimization elements that we used by flexibility, portability, clearness or scalability of the resulted application. But if you have exactly that specific case and have no way back – then you’re welcome.
Author: Table of Content
“In computing, optimization is the process of modifying a system to make some aspect of it work more efficiently or use fewer resources. The system may be a single computer program, a collection of computers or even an entire network such as the Internet. (с) wiki The optimization of the “Concrete Factories”I think that a lot of you repeatedly met the classic ”Factory” pattern – or “Concrete Factory” in GoF. terminology: struct IObject
{
virtual ~IObject(){}
virtual void DoIt()=0;
};
class CFactory
{
public:
void CreateSmth(int objType, std::auto_ptr * pResult)
{
if (iValuereset(new CObject1);
}
else
{
pResult->reset(new CObject2);
}
}
};
This pattern is great for avoiding endless if’s and switch’es through over the code, but it has one unpleasant disadvantage. It is concerned with excessive usage of dynamic memory that sometimes affects badly on the C++ program performance. We will try to cure this patter of it – with certain reservations of course. Let’s follow the factory usage process again:
To optimize the described life cycle of the production we can use the following statements:
char buffer[1024]; new(buffer)CObject(a,b,c); It is very useful taking into account the fact that we can allocate the buffer more effectively than the standard new implementation does. But actually the using of placement new also adds some difficulties to the developer’s life:
//--------------------------------
// typelist basics
//--------------------------------
template
struct Node
{
};
struct NullNode
{
};
We can develop the recursive compile-time function to calculate the maximal size of the object from the types in the list: template
struct GetMaxSize
{
};
template
struct GetMaxSize >
{
static const size_t TailSize = GetMaxSize::Result;
static const size_t Result = (TailSize > sizeof(Head) )
? TailSize : sizeof(Head);
};
template
struct GetMaxSize
{
static const size_t Result = 0;
};
Then we can create the list of all possible production types for the factory CFactory: typedef utils::Node,
utils::Node,
utils::NullNode
>
> ObjectList;
static const size_t MaxObjectSize = utils::GetMaxSize::Result;
As the result now we can be 100% sure that each produced object will be placed in the buffer of MaxObjectSize (if we’ve developed the type list correctly, of course): char buffer[ MaxObjectSize ]; It can be easily allocated in the stack.
struct IManageable
{
virtual ~IManageable(){}
virtual void DestroyObject(void * pObject)=0;
virtual void CreateAndSwap(void * pObject, int iMaxSize)=0;
virtual void CreateAndCopy(void * pObject, int iMaxSize)=0;
};
I.e. an object that wants to live in our container should be able to:
The structure of the container can be represented in the following scheme: I.e. our container includes row buffer and two pointers to the different virtual bases of the object placed in the row buffer:
As far as we don’t want to spend efforts on adding the support of the IManageable interface to the each production class it makes sense to develop the pattern manageable that will do it automatically: // manageable control flags (iFlags field)
const int allow_std_swap = 1;
const int allow_copy = 2;
const int allow_all = 3;
// class manageable: wrapper, provides binary interface to manage object's life
template
class manageable:public IManageable, public ImplType
{
typedef manageable ThisType;
virtual void DestroyObject(void * pObject)
{
((ThisType*)pObject)->~ThisType();
}
// CreateAndSwap
template
void CreateAndSwapImpl(void * /*pObject*/, int /*iMaxSize*/)
{
throw std::runtime_error("Swap method is not supported");
}
template
void CreateAndSwapImpl(void * pObject, int /*iMaxSize*/)
{
ThisType * pNewObject = new(pObject)ThisType();
pNewObject->swap(*this);
}
virtual void CreateAndSwap(void * pObject, int iMaxSize)
{
if (sizeof(ThisType)>iMaxSize)
throw std::runtime_error("Object too large: swap method failed");
CreateAndSwapImpl(pObject, iMaxSize);
}
// CreateAndCopy
template
void CreateAndCopyImpl(void * /*pObject*/, int /*iMaxSize*/)
{
throw std::runtime_error("Copy method is not supported");
}
template
void CreateAndCopyImpl(void * pObject, int /*iMaxSize*/)
{
new(pObject)ThisType(*this);
}
virtual void CreateAndCopy(void * pObject, int iMaxSize)
{
if (sizeof(ThisType)>iMaxSize)
throw std::runtime_error("Object too large: copy method failed");
CreateAndCopyImpl(pObject, iMaxSize);
}
public:
manageable()
{
}
template
manageable(Type0 param0)
: ImplType(param0)
{
}
};
The pattern is parameterized by object type and flags that define what methods should be supported. For example, if we specify the allow_copy flag then the compiler will require the constructor of coping from the object for the CreateAndCopy method implementation; similarly, if we specify the allow_swap flag then the CreateAndSwap function will be generated – it will be based on the method of the swap object, that we should develop ourselves. class CFastConcreteFactory
{
public:
typedef utils::Node,
utils::Node,
utils::NullNode
>
> ObjectList;
static const size_t MaxObjectSize = utils::GetMaxSize::Result;
typedef utils::CFastObject AnyObject;
void CreateSmth(int iValue, AnyObject * pResult)
{
if (iValueCreateByCopy(utils::manageable());
}
else
{
pResult->CreateByCopy(utils::manageable());
}
}
};
And it is as easy to use as the original one: // usage sample CFastConcreteFactory factory; CFastConcreteFactory::AnyObject product; factory.CreateSmth(-1, &product); product->DoIt(); // copy sample CFastConcreteFactory::AnyObject another_product; product.Copy( &another_product ); But the functioning of our creation will be much faster (see fast_object_sample in the attachments). All source, examples, performance and unit tests can be found in the lib See: cmnFastObjects.h, fast_object_sample The optimization by «Arena»The second optimization method proposed is much simpler but the scope of its application is a little bit smaller. Let’s imagine that we have some iterative algorithm that repeats some action N times: int mega_algorithm(int N)
{
int Count =0;
for(int i =0 ; i
Let’s suppose that two conditions are met:
In this case we can use the “Arena” pattern for its optimization. The essence of the pattern is rather simple. For its implementation we:
It’s very simple and effective. It’s also very dangerous pattern because it’s rather hard to guarantee the first condition fulfillment in the production code. But this pattern is good for calculation concerned tasks (for example in the game development). When using STL containers the concrete instance of the arena can be referred to the container in such a way that the definition and usage of the container will be almost the same to those of the original one, for example: typedef utils::arena_custom_map >::result Map1_type;
Map1_type map1;
for(int i = 0;i
In this example all memory for the object map1 is allocated in the extendable buffer CGrowingArena, allocated memory will be disposed in the destructor when destroying the object. All source, examples, performance and unit tests can be found in the lib MakeItFaster in the attachments. There are also projects for VC++ 7.1 and VC++ 8.0. See: cmnArena.h/ win32_arena_tests, win32_arena_sample Download attachments. |






