Home Company Development Blog An Implementation of P-way external merge algorithm under Linux
An Implementation of P-way external merge algorithm under Linux PDF Print E-mail
Thursday, 30 June 2011 16:47

This article demonstrates one possible method of merging N sorted large text files using Forecasting algorithm in the style suggested by D.Knuth. The demo app "for_merge" will merge them faster than standard Unix "sort" utility by starting test suite. This code also could be useful for implementation of second part of Merge-Sort algorithm (where merging takes place). Note, that Input files are generated before test suite starts.

Written by:

Oleg Kalashnikov,
Software Developer of Driver Development Team

Table of Contents
1. Introduction
2. Implementation details
2.1 Working environment
2.2 Source code overview
3. Launching
3.1 Command line parameters
3.2 Examples
4. Testing
4.1 Generating input data files
4.2 Launching test suite
5. Results of testing
5.1 SMALL data set
5.2 BIG data set
5.3 Conclusion
6. TODO

1. Introduction

External merging is a term for a class of merging algorithms that can handle massive amounts of data. External merging is required when the data being merged do not fit into the main memory of a computing device (usually RAM) and instead they must reside in the slower external memory (usually a hard drive).

P-way merge is a more general algorithm that allows to merge N pre-sorted lists into one.

Forecasting merge algorithm (P-way merge using 2P + 2 buffers) was suggested by D.Knuth in "The Art of Computer Programming" volume 3, exercise 5.4.6F. It keeps track of the buffer that will be emptied first and uses an extra buffer to read the appropriate next part from the disk, while the contents of the remaining buffers are processed.

The complete article text is available only for the registered users. Please Log In or Register.