In this article, we will try to make our programs work faster using C++ algorithm optimization tips, in particular low-level C++ optimization tips. We will consider a template-based strategy for optimizing memory usage in C++ software. Note that all methods described in this article should be used very carefully and only in exceptional cases: we usually pay for every low-level optimization with the flexibility, portability, clarity, or scalability of the resulting application.
But if you have exactly such a case and there is no way back, you’re welcome.
Team Leader of Network Security Team
Table of Contents
- The optimization of the “Concrete Factories” (variations on the Pimpl idiom)
- The optimization by “Arena”
“In computing, optimization is the process of modifying a system to make some aspect of it work more efficiently or use fewer resources. The system may be a single computer program, a collection of computers or even an entire network such as the Internet.
Although the word "optimization" shares the same root as "optimal," it is rare for the process of optimization to produce a truly optimal system. Often there is no “one size fits all” design which works well in all cases, so engineers make trade-offs to optimize the attributes of greatest interest.
Donald Knuth made the following statement on optimization: “We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.” An alternative approach is to design first, code from the design and then profile/benchmark the resulting code to see which parts should be optimized."
The optimization of the “Concrete Factories”
I think many of you have repeatedly come across the classic “Factory” pattern, or “Concrete Factory” in GoF terminology.
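For reference, here is a minimal sketch of such a classic factory in modern C++. IObject and CFactory are names the article uses later; CObjectA, CObjectB, ObjectType and the Name() method are hypothetical, and std::unique_ptr stands in for the container that holds the product. Note that every Create call allocates on the heap.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// User interface of every product (IObject is the name used later in the article).
struct IObject {
    virtual ~IObject() = default;
    virtual std::string Name() const = 0;
};

// Two hypothetical concrete products.
struct CObjectA : IObject {
    std::string Name() const override { return "A"; }
};
struct CObjectB : IObject {
    std::string Name() const override { return "B"; }
};

// Classic concrete factory: the only switch over product types lives here,
// but every product is allocated on the heap.
class CFactory {
public:
    enum ObjectType { TypeA, TypeB };

    std::unique_ptr<IObject> Create(ObjectType type) const {
        switch (type) {
        case TypeA: return std::make_unique<CObjectA>();
        case TypeB: return std::make_unique<CObjectB>();
        }
        return nullptr;  // unknown type
    }
};
```

Here std::unique_ptr<IObject> plays the role of the container: the caller creates the product through the factory, uses it through IObject, and can move ownership elsewhere with std::move.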
This pattern is great for avoiding endless ifs and switches throughout the code, but it has one unpleasant disadvantage: excessive use of dynamic memory, which sometimes hurts the performance of a C++ program. We will try to cure the pattern of this, with certain reservations, of course.
Let’s follow the factory usage process again:
- We define a container for the product, the place where the created object will be stored.
- We create an object in this container by means of the factory.
- We use the object via the defined interface,
- or pass ownership of the container to somebody else.
To optimize memory usage in C++ software with the described product life cycle, we can rely on the following observations:
- The special form of the new operator, placement new, lets us create objects in a custom “raw” buffer.
This is very useful, because we can often allocate the buffer more efficiently than the standard new implementation does. But placement new also adds some difficulties to the developer’s life:
- The raw buffer must be aligned according to platform-dependent requirements;
- The destructor of the created object must be called manually.
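A minimal sketch of placement new that respects both points above: the raw buffer is declared with alignas, and the destructor is invoked explicitly. CWidget and UsePlacementNew are hypothetical names. The buffer here is a local array, but it could just as well come from a custom allocator.

```cpp
#include <cassert>
#include <new>      // placement new
#include <string>
#include <utility>

// A hypothetical product type with a non-trivial destructor (it owns a std::string).
struct CWidget {
    std::string name;
    explicit CWidget(std::string n) : name(std::move(n)) {}
};

// Builds a CWidget in a raw buffer, reads it, and destroys it; the CWidget
// object itself never touches the heap.
std::string UsePlacementNew() {
    // 1. The raw buffer must be aligned for the type that will live in it.
    alignas(CWidget) unsigned char buffer[sizeof(CWidget)];

    // 2. Placement new only runs the constructor at the given address.
    CWidget* widget = new (buffer) CWidget("placed");
    std::string result = widget->name;

    // 3. There is no matching delete: the destructor must be called explicitly.
    widget->~CWidget();
    return result;
}
```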
- The stack is a great alternative to the heap. It would be tempting to use a buffer on the stack as the container, i.e. to construct the product in a local buffer instead of the original heap-allocated object held by a pointer.
The main problem with creating the container on the stack is that standard C++ cannot allocate a stack object whose size is unknown at compile time.
- But since our factory is a concrete one (not an abstract one) and we know all of its product types, we can determine the maximal size of a product at compile time. For example, we can use type lists.
We can write a recursive compile-time function that calculates the maximal size of the types in the list.
Then we can create the list of all possible product types of the factory CFactory.
As a result, we can be 100% sure that each produced object fits into a buffer of MaxObjectSize bytes (if we have built the type list correctly, of course).
Such a buffer can easily be allocated on the stack.
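The steps above can be sketched as follows, assuming a Loki-style type list. NullType, TypeList, MaxSize, MaxAlign, CFactoryProducts, CObjectBuffer and the two products are hypothetical names; only CFactory’s role and MaxObjectSize come from the text. The sketch also computes the maximal alignment, because the raw buffer has to be aligned as well as large enough.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

// Loki-style type list: a head type followed by the rest of the list.
struct NullType {};
template <class Head, class Tail>
struct TypeList {};

// Recursive compile-time function: the largest sizeof() among the listed types.
template <class List> struct MaxSize;
template <> struct MaxSize<NullType> { static constexpr std::size_t value = 0; };
template <class Head, class Tail>
struct MaxSize<TypeList<Head, Tail>> {
    static constexpr std::size_t rest = MaxSize<Tail>::value;
    static constexpr std::size_t value = sizeof(Head) > rest ? sizeof(Head) : rest;
};

// The same recursion for alignment.
template <class List> struct MaxAlign;
template <> struct MaxAlign<NullType> { static constexpr std::size_t value = 1; };
template <class Head, class Tail>
struct MaxAlign<TypeList<Head, Tail>> {
    static constexpr std::size_t rest = MaxAlign<Tail>::value;
    static constexpr std::size_t value = alignof(Head) > rest ? alignof(Head) : rest;
};

// Hypothetical products of CFactory.
struct IObject {
    virtual ~IObject() = default;
    virtual std::string Name() const = 0;
};
struct CObjectA : IObject { std::string Name() const override { return "A"; } };
struct CObjectB : IObject {
    double payload[8] = {};
    std::string Name() const override { return "B"; }
};

// The list of all possible product types of CFactory.
typedef TypeList<CObjectA, TypeList<CObjectB, NullType>> CFactoryProducts;

constexpr std::size_t MaxObjectSize = MaxSize<CFactoryProducts>::value;
constexpr std::size_t MaxObjectAlign = MaxAlign<CFactoryProducts>::value;

// A buffer that can hold any CFactory product and can live on the stack.
struct CObjectBuffer {
    alignas(MaxObjectAlign) unsigned char data[MaxObjectSize];
};
```

If a new product is added to the factory but forgotten in CFactoryProducts, nothing here will catch it; a static_assert at the construction site (sizeof(T) <= MaxObjectSize) is a cheap safeguard.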
- Since our container should be able to store objects of different types, we have a right to expect some help from them in the form of support for a corresponding management interface.
That is, an object that wants to live in our container should be able to:
- Destroy an object of its type at a given address;
- Use the Create And Swap technique to pass object ownership (optional);
- Use the Create And Copy technique to create a copy of the object (optional).
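A minimal sketch of such an interface, IManageable (the name the article uses below), together with one product that implements it by hand. The method signatures are assumptions of this sketch, as is the choice that Destroy() acts on the object it is called on (so the “given address” is simply this). CMessage is hypothetical.

```cpp
#include <cassert>
#include <new>
#include <string>

// Management interface that every object living in the container implements.
// Operation names follow the article; the signatures are assumptions.
struct IManageable {
    // Ends the lifetime of this object in place; the memory belongs to the container.
    virtual void Destroy() = 0;
    // Optional: constructs a new object at 'place' and hands it this object's state.
    virtual IManageable* CreateAndSwap(void* place) = 0;
    // Optional: copy-constructs a new object at 'place'.
    virtual IManageable* CreateAndCopy(void* place) const = 0;

protected:
    ~IManageable() = default;  // never deleted through this interface
};

// A hypothetical product that implements IManageable by hand.
class CMessage final : public IManageable {
public:
    std::string text;

    void Destroy() override { this->~CMessage(); }

    IManageable* CreateAndSwap(void* place) override {
        CMessage* fresh = new (place) CMessage();
        fresh->text.swap(text);  // the state now lives at 'place'
        return fresh;
    }

    IManageable* CreateAndCopy(void* place) const override {
        return new (place) CMessage(*this);
    }
};
```

Writing these three methods for every product class is exactly the repetitive work a helper template can take over.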
The structure of the container is as follows.
Our container includes a raw buffer and two pointers to different virtual bases of the object placed in that buffer:
- A pointer to IManageable, used to manage the object life cycle;
- A pointer to the user interface IObject, whose methods the factory user actually wants to call.
Since we don’t want to spend effort on adding IManageable support to each product class, it makes sense to write a template, manageable, that adds it automatically.
The template is parameterized by the object type and by flags that define which methods should be supported. For example, if we specify the allow_copy flag, the compiler will require the object’s copy constructor for the CreateAndCopy implementation; similarly, if we specify the allow_swap flag, the CreateAndSwap function will be generated, based on the object’s swap method, which we have to write ourselves.
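One possible shape of such a template, written with C++17 if constexpr so that a flag that is not set never instantiates the code requiring the copy constructor or swap. The flag values, the IManageable signatures, the default-constructibility requirement and the CDocument product are assumptions of this sketch; only the names manageable, allow_copy, allow_swap, CreateAndCopy and CreateAndSwap come from the text.

```cpp
#include <cassert>
#include <new>
#include <string>

// Management interface (signatures assumed for this sketch).
struct IManageable {
    virtual void Destroy() = 0;
    virtual IManageable* CreateAndSwap(void* place) = 0;
    virtual IManageable* CreateAndCopy(void* place) const = 0;
protected:
    ~IManageable() = default;
};

// Flags selecting the optional operations (names from the article, values assumed).
enum ManageFlags : unsigned { allow_copy = 1u, allow_swap = 2u };

// Adds IManageable support to any default-constructible T.
template <class T, unsigned Flags = 0>
class manageable : public T, public IManageable {
public:
    void Destroy() override { this->~manageable(); }

    IManageable* CreateAndSwap(void* place) override {
        if constexpr ((Flags & allow_swap) != 0) {
            manageable* fresh = new (place) manageable();
            static_cast<T&>(*fresh).swap(*this);  // T::swap(T&) is written by the user
            return fresh;
        } else {
            assert(false && "CreateAndSwap requires allow_swap");
            return nullptr;
        }
    }

    IManageable* CreateAndCopy(void* place) const override {
        if constexpr ((Flags & allow_copy) != 0) {
            return new (place) manageable(*this);  // needs T's copy constructor
        } else {
            assert(false && "CreateAndCopy requires allow_copy");
            return nullptr;
        }
    }
};

// A hypothetical product: it only has to provide swap() for allow_swap.
struct CDocument {
    std::string text;
    void swap(CDocument& other) { text.swap(other.text); }
};
```

Calling a method whose flag is not set fails at run time (an assert) rather than at compile time. Sticking to if constexpr keeps the sketch short; a compile-time error would need a different design.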
With these pieces in place, the optimized factory is as easy to use as the original one, but it works much faster, because its products are constructed in the container’s buffer instead of on the heap (see fast_object_sample in the attachments).
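A compact end-to-end sketch of the optimized factory, assuming a reduced manageable that only supports Destroy. CObjectHolder, ProductList, CObjectA and CObjectB are hypothetical names, and a C++17 variadic template replaces the recursive type list for brevity. The holder follows the layout described above: a raw buffer plus pointers to IManageable and IObject.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <new>
#include <string>

struct IObject {
    virtual ~IObject() = default;
    virtual std::string Name() const = 0;
};

// Reduced management interface: only the mandatory Destroy operation.
struct IManageable {
    virtual void Destroy() = 0;
protected:
    ~IManageable() = default;
};

// Reduced manageable<T>: adds IManageable support to a product type.
template <class T>
struct manageable : T, IManageable {
    void Destroy() override { this->~manageable(); }
};

// Hypothetical products.
struct CObjectA : IObject { std::string Name() const override { return "A"; } };
struct CObjectB : IObject {
    double payload[8] = {};
    std::string Name() const override { return "B"; }
};

// Size and alignment of the largest product, computed at compile time.
template <class... Ts>
struct ProductList {
    static constexpr std::size_t max_size = std::max({sizeof(Ts)...});
    static constexpr std::size_t max_align = std::max({alignof(Ts)...});
};
typedef ProductList<manageable<CObjectA>, manageable<CObjectB>> CFactoryProducts;

// Container: raw buffer + pointer to IManageable + pointer to IObject.
class CObjectHolder {
public:
    CObjectHolder() = default;
    CObjectHolder(const CObjectHolder&) = delete;
    CObjectHolder& operator=(const CObjectHolder&) = delete;
    ~CObjectHolder() { Reset(); }

    template <class T>
    void Emplace() {
        typedef manageable<T> Managed;
        static_assert(sizeof(Managed) <= sizeof(m_buffer), "add T to CFactoryProducts");
        static_assert(alignof(Managed) <= CFactoryProducts::max_align, "add T to CFactoryProducts");
        Reset();                                     // destroy the previous product, if any
        Managed* object = new (m_buffer) Managed();
        m_manage = object;                           // life-cycle management
        m_object = object;                           // user interface
    }

    void Reset() {
        if (m_manage != nullptr) {
            m_manage->Destroy();
            m_manage = nullptr;
            m_object = nullptr;
        }
    }

    IObject* operator->() const { return m_object; }
    IObject* Get() const { return m_object; }

private:
    alignas(CFactoryProducts::max_align) unsigned char m_buffer[CFactoryProducts::max_size];
    IManageable* m_manage = nullptr;
    IObject* m_object = nullptr;
};

// Optimized factory: same switch as before, but the product is built in the holder.
class CFactory {
public:
    enum ObjectType { TypeA, TypeB };

    void Create(ObjectType type, CObjectHolder& holder) const {
        switch (type) {
        case TypeA: holder.Emplace<CObjectA>(); break;
        case TypeB: holder.Emplace<CObjectB>(); break;
        }
    }
};
```

Usage mirrors the classic version: declare a CObjectHolder on the stack, call factory.Create(CFactory::TypeB, holder), and then call holder->Name(). The product object itself never needs a heap allocation.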
All source code, examples, performance tests, and unit tests can be found in the MakeItFaster library in the attachments. There are also projects for VC++ 7.1 and VC++ 8.0.
See: cmnFastObjects.h, fast_object_sample
The optimization by “Arena”
The second memory usage optimization method proposed here is much simpler, but its scope of application is somewhat narrower.
Let’s suppose that two conditions are met:
- The algorithm doesn’t have any side effects;
- We can predict the maximal size of dynamic memory allocated in one iteration (let’s call it MaxHeapUsage).
In this case, we can use the “Arena” pattern to optimize the algorithm. The essence of the pattern is rather simple. To implement it, we:
- replace the standard new/delete with our own ones;
- register some buffer-arena pBuffer of MaxHeapUsage size before the algorithm starts and set the index of the start of free space, FreeIndex, to 0;
- in the new handler, allocate memory directly in the buffer by advancing FreeIndex by the allocation size. Naturally, we return ((char *) pBuffer + oldFreeIndex);
- reset FreeIndex to 0 after each iteration and thus dispose of all memory that the iteration has allocated for its needs;
- unregister our buffer-arena after the algorithm finishes.
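The steps above can be sketched as a replacement of the global operator new/delete. The arena namespace, Register/Reset/Unregister, Owns and SumOfSquares are hypothetical names; pBuffer, MaxHeapUsage and FreeIndex follow the text. Two assumptions are worth stating loudly: the sketch is single-threaded, and alignof(std::max_align_t) is taken to be sufficient alignment for plain new on the target platform. Delete becomes a no-op for arena memory, which is released in bulk by Reset().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <new>
#include <vector>

// Arena state. Single-threaded sketch: nothing here is synchronized.
namespace arena {
char* pBuffer = nullptr;       // registered buffer-arena (nullptr: arena is off)
std::size_t MaxHeapUsage = 0;  // size of pBuffer in bytes
std::size_t FreeIndex = 0;     // offset of the start of free space

void Register(char* buffer, std::size_t size) { pBuffer = buffer; MaxHeapUsage = size; FreeIndex = 0; }
void Reset() { FreeIndex = 0; }  // disposes of everything the last iteration allocated
void Unregister() { pBuffer = nullptr; MaxHeapUsage = 0; FreeIndex = 0; }

bool Owns(const void* p) {
    std::less<const void*> before;  // total order, valid for unrelated pointers
    return pBuffer != nullptr && !before(p, pBuffer) && before(p, pBuffer + MaxHeapUsage);
}
}  // namespace arena

// Replacement of the global allocation function.
void* operator new(std::size_t size) {
    if (size == 0) size = 1;  // distinct allocations need distinct addresses
    if (arena::pBuffer != nullptr) {
        const std::size_t align = alignof(std::max_align_t);
        const std::size_t start = (arena::FreeIndex + align - 1) / align * align;
        if (start > arena::MaxHeapUsage || size > arena::MaxHeapUsage - start) throw std::bad_alloc();
        arena::FreeIndex = start + size;
        return arena::pBuffer + start;  // pBuffer + old FreeIndex, plus alignment padding
    }
    if (void* p = std::malloc(size)) return p;
    throw std::bad_alloc();
}

// Arena memory is never freed piece by piece; everything else goes back to free().
// Objects allocated in the arena must not outlive Unregister().
void operator delete(void* p) noexcept {
    if (arena::Owns(p)) return;
    std::free(p);
}
void operator delete(void* p, std::size_t) noexcept { ::operator delete(p); }

// One iteration of a side-effect-free calculation (hypothetical example).
long SumOfSquares(int n) {
    std::vector<long> values;
    for (int i = 1; i <= n; ++i) values.push_back(static_cast<long>(i) * i);
    long sum = 0;
    for (long v : values) sum += v;
    return sum;  // the vector's delete calls are no-ops inside the arena
}
```

A driver would register an aligned buffer, run SumOfSquares once per iteration with arena::Reset() in between, and call arena::Unregister() at the end.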
It’s very simple and effective. It’s also a very dangerous pattern, because it’s rather hard to guarantee the first condition in production code. Still, the pattern is good for computation-heavy tasks (for example, in game development).
When using STL containers, a concrete arena instance can be attached to the container in such a way that the definition and usage of the container stay almost the same as those of the original one.
For example, if all memory for an object map1 is allocated in the extendable buffer CGrowingArena, that memory is disposed of in the arena’s destructor, when the arena object is destroyed.
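A minimal sketch of attaching an arena to std::map through a custom allocator. CGrowingArena is the article’s name, but this implementation, ArenaAllocator and ArenaMap are hypothetical. Only the map’s nodes come from the arena; deallocate is a no-op, and all blocks are freed when the arena is destroyed. C++17’s std::pmr::monotonic_buffer_resource with std::pmr::map is a standard-library alternative built on the same idea.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <utility>
#include <vector>

// Growing arena: serves allocations from large blocks and releases all blocks
// at once in its destructor. Not thread-safe.
class CGrowingArena {
public:
    explicit CGrowingArena(std::size_t blockSize = 4096) : m_blockSize(blockSize) {}
    CGrowingArena(const CGrowingArena&) = delete;
    CGrowingArena& operator=(const CGrowingArena&) = delete;

    void* Allocate(std::size_t size, std::size_t align) {
        if (m_cursor == nullptr || std::align(align, size, m_cursor, m_space) == nullptr) {
            // The current block cannot hold the request: add a block big enough for it.
            std::size_t blockSize = std::max(m_blockSize, size + align);
            m_blocks.push_back(std::make_unique<char[]>(blockSize));
            m_cursor = m_blocks.back().get();
            m_space = blockSize;
            (void)std::align(align, size, m_cursor, m_space);  // always succeeds here
        }
        void* result = m_cursor;
        m_cursor = static_cast<char*>(m_cursor) + size;
        m_space -= size;
        return result;
    }

private:
    std::size_t m_blockSize;
    std::vector<std::unique_ptr<char[]>> m_blocks;  // freed by ~CGrowingArena
    void* m_cursor = nullptr;                       // start of free space
    std::size_t m_space = 0;                        // bytes left in the current block
};

// Minimal C++11 allocator that forwards to a CGrowingArena.
template <class T>
struct ArenaAllocator {
    typedef T value_type;

    explicit ArenaAllocator(CGrowingArena& a) noexcept : arena(&a) {}
    template <class U>
    ArenaAllocator(const ArenaAllocator<U>& other) noexcept : arena(other.arena) {}

    T* allocate(std::size_t n) {
        return static_cast<T*>(arena->Allocate(n * sizeof(T), alignof(T)));
    }
    void deallocate(T*, std::size_t) noexcept {}  // memory is returned with the arena

    CGrowingArena* arena;
};

template <class T, class U>
bool operator==(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) noexcept {
    return a.arena == b.arena;
}
template <class T, class U>
bool operator!=(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) noexcept {
    return !(a == b);
}

// Apart from the allocator argument, this is an ordinary std::map.
typedef std::map<int, int, std::less<int>, ArenaAllocator<std::pair<const int, int>>> ArenaMap;
```

The arena must be declared before the map so that the map’s nodes are destroyed while their memory is still valid.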
All source code, examples, performance tests, and unit tests can be found in the MakeItFaster library in the attachments. There are also projects for VC++ 7.1 and VC++ 8.0.
See: cmnArena.h/ win32_arena_tests, win32_arena_sample