RMalloc

RMalloc is a malloc implementation which means, a dynamic memory allocator .

One reason for this version of malloc was a problem I encountered with one of my other programs when allocating several gigabytes. Because of the used algorithms I deallocated and allocated most of this memory (which was used by relativly small chunks). This behaviour confused all other malloc implementations I tested and led to unacceptable program-performance. So I decided to do it by myself and wrote this lib which performed MUCH better.

The first version was a simple segregated storage allocator without splitting and coalescing. Allthough it was much faster and scaled very well wrt. the number of processors, it suffered from severe fragmentation (as is known for such kind of allocators) when used with a special variant of one of my other projects (see libHmatrix). But it performs very well if the application only uses a limited number of different sizes when allocating blocks.

To reduce this fragmentation I completly rewrote the allocator and added splitting and coalescing. It now works similar to LKmalloc by using private heaps with ownership for each threads. Thereby each memory-chunk is put back into the heap, it was allocated from.

The main differences between rmalloc and LKmalloc are the handling of small blocks, the allocation of memory from the operating system and the mapping of heaps to threads.

For performance reasons small chunk are only allocated from containers, which hold several equal sized chunks. Especially if a lot of small blocks are allocated, this improves the overall performance of the allocator. This also avoids splitting and coalescing these small blocks (which might reduce fragmentation).

Memory from the operating system is not allocated in equal sized stripes (as in LKmalloc) but in dynamically growing chunks via mmap (or sbrk). This reduces the number of calls to these system-functions. The size of each chunk is adjusted according to the memory in use.

Heaps are associated to threads by using thread-private data (via PThread-functions). Thereby the number of heaps is automatically adjusted to the number of concurrent threads (LKmalloc need some initialisation for this).

Quelltext

RMalloc is free software where free is defined by the GNU Lesser General Public License. So share and enjoy ;).

rmalloc.c
rmalloc-1.1.c
rmalloc-1.0.3.c
rmalloc-1.0.2.c
rmalloc-1.0.1.c
rmalloc-1.0.c
rmalloc-1.0-pre5.c
rmalloc-1.0-pre4.c
rmalloc-1.0-pre3.c
rmalloc-1.0-pre2.c
rmalloc-1.0-pre1.c
rmalloc-0.98.3.c
rmalloc-0.98.2.c
rmalloc-0.98.1.c
rmalloc-0.98.c
rmalloc-0.97.c
rmalloc-0.96.c
rmalloc-0.95.c
rmalloc-0.94.c
rmalloc-0.93.c
rmalloc-0.92.c
rmalloc-0.91.c
rmalloc-0.90.c
rmalloc.h
rmalloc-0.95.h
rmalloc-0.94.h
rmalloc-0.93.h
rmalloc-0.92.h
rmalloc-0.91.h

ChangeLog

v1.1

v1.0.3

v1.0.2

v1.0.1

v1.0

v1.0-pre5

v1.0-pre4

v1.0-pre3

v1.0-pre2

v1.0-pre1

v0.99

v0.98.3

v0.98.2

v0.98.1

v0.98

v0.97

v0.96

v0.95

v0.94

v0.93

v0.92

v0.91

v0.90

Benchmarks

These benchmarks are similar to the benchmarks used to show the properties of hoard.

ThreadBench-Eq

This benchmark allocates a lot of small, equal sized chunks in each thread and tests the parallel efficiency.

The following table shows the time for the different allocators when running sequentially.

Sun-malloc MT-malloc Hoard PTmalloc Rmalloc
132.2 sec 219.7 sec 131.7 sec 119.8 sec 115.7 sec

The speedup wrt. the number of processors is shown in the following figure. speedup in thrbench-eq

Larson

This benchmarks simulates the allocation behaviour of a typical server application with multiple threads.

The number of allocations per time on one processor is shown in the following table.

Sun-malloc MT-malloc Hoard PTmalloc Rmalloc
579345 580463 622875 577604 753322

The speedup of the different allocators is shown in the next figure. speedup in larson