Wine bug only; works OK on real Windows.[1][2] (bugs.winehq.org seems to be down)
#include <stdlib.h>  /* malloc, realloc, free */

void *sane_realloc(void *ptr, size_t size)
{
    if (ptr == 0) {
        return malloc(size);
    } else if (size == 0) {
        free(ptr);
        return 0;
    } else {
        return realloc(ptr, size);
    }
}
ISO C realloc has brain-damaged corner cases. Some implementations behave like the above, in which case you can just have #define sane_realloc realloc on those targets.

With the above you can initialize a vector to null, with size zero, and use nothing but realloc for the entire lifetime management: growing it from zero to a nonzero size allocates it, shrinking it down to zero frees it.
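A quick sketch of that idiom (names are illustrative):

    int *vec = 0;                 /* empty vector: null pointer, zero size */

    int *tmp = sane_realloc(vec, 10 * sizeof *tmp);   /* grow: allocates */
    if (tmp != 0)
        vec = tmp;

    vec = sane_realloc(vec, 0);   /* shrink to zero: frees, back to null */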
malloc(0) doesn't necessarily return null; it can return some non-null pointer that can be passed to free. We can get such a thing if we call sane_realloc(0, 0), and can avoid that if we change the malloc line to:
    return size ? malloc(size) : 0;
Simpler still: since realloc(NULL, size) already behaves like malloc(size), the null check can go away entirely:

void *sane_realloc(void *ptr, size_t size)
{
    if (size == 0) {
        free(ptr);
        return 0;
    } else {
        return realloc(ptr, size);
    }
}
Unfortunately, when shrinking an array down to zero, you run into a complication: detecting allocation failure now requires checking both size > 0 and whether sane_realloc returned 0. To simplify this further, just always allocate a non-zero size:

void *saner_realloc(void *ptr, size_t size)
{
    if (size == 0) {
        size = 1;
    }
    return realloc(ptr, size);
}
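With this version, a null return always means failure, regardless of size. A sketch, reusing the vector idiom from above:

    int *tmp = saner_realloc(vec, n * sizeof *tmp);
    if (tmp == 0) {
        /* allocation failure, even when n == 0; vec is still valid */
    } else {
        vec = tmp;
    }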
According to ISO C, realloc with size zero is allowed to behave like this:

    free(old);
    return malloc(0);
and if malloc(0) allocates something, we have not achieved freeing.

There are ways to implement malloc(0) such that it returns unique pointers without allocating memory, or at least not very much memory. For instance, we can use the 64-bit address space to reserve some range of (unmapped) virtual addresses where we "allocate" bytes, and use a compact bitmask (actually allocated somewhere) to keep track of them.
Such a scheme was described by Tim Rentsch in the Usenet newsgroup comp.lang.c.
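A minimal sketch of that idea (the base address, slot count, and all names here are assumptions for illustration, not Rentsch's actual scheme):

    #include <stddef.h>
    #include <stdint.h>

    #define ZERO_BASE  ((uintptr_t)0x7f0000000000)  /* assumed unmapped range */
    #define ZERO_SLOTS 4096

    static uint8_t zero_bitmap[ZERO_SLOTS / 8];     /* 1 bit per live pointer */

    static void *zero_alloc(void)        /* a malloc(0) result */
    {
        for (size_t i = 0; i < ZERO_SLOTS; i++) {
            if (!(zero_bitmap[i / 8] & (1u << (i % 8)))) {
                zero_bitmap[i / 8] |= (uint8_t)(1u << (i % 8));
                return (void *)(ZERO_BASE + i);  /* unique, never dereferenced */
            }
        }
        return 0;                        /* out of zero-size slots */
    }

    static int zero_free(void *p)        /* returns 1 if p was one of ours */
    {
        uintptr_t u = (uintptr_t)p - ZERO_BASE;
        if (u >= ZERO_SLOTS)
            return 0;
        zero_bitmap[u / 8] &= (uint8_t)~(1u << (u % 8));
        return 1;
    }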
If an implementation does such a thing, adjusting the size to 1 will defeat it; allocations of size 1 need real memory.
(I can't fathom why we would need malloc(0) to be a source of unique values, or why someone would implement that as efficiently as possible, when it's implementation-defined behavior that portable programs cannot rely on. Why wouldn't you use some library module for unique, space-efficient pointers?)
I would never rely on malloc(0) to obtain unique pointers at all, let alone pray that it is efficient for that purpose.
I'd be happy with a malloc(0) which returns, for instance, ((void *) -1) which can be hidden behind some #define symbol.
sane_realloc isn't realloc; it is our API, and we can make it do this:
#define SANE_REALLOC_EMPTY ((void *) -1)

void *sane_realloc(void *ptr, size_t size)
{
    if (ptr == 0 || ptr == SANE_REALLOC_EMPTY) {
        return size ? malloc(size) : SANE_REALLOC_EMPTY;
    } else if (size == 0) {
        free(ptr);
        return SANE_REALLOC_EMPTY;
    } else {
        return realloc(ptr, size);
    }
}
Now a null return always means failure. The shrink-to-zero and allocate-zero cases give us SANE_REALLOC_EMPTY, which tests unequal to null, and we accept that value for growing or freeing.

The caller can also pass in something returned by malloc(0) that is not equal to null or SANE_REALLOC_EMPTY; it is freed or reallocated like any other pointer.
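Usage then looks like this (a sketch; names are illustrative):

    void *buf = SANE_REALLOC_EMPTY;      /* empty, nothing allocated */

    void *tmp = sane_realloc(buf, 100);  /* grow from empty: allocates */
    if (tmp == 0) {
        /* failure; buf unchanged */
    } else {
        buf = tmp;
    }

    buf = sane_realloc(buf, 0);          /* frees; buf == SANE_REALLOC_EMPTY */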
I think my sane_realloc, which never frees anything for size zero, has much simpler behavior. As much as I hate the needless waste of 1 byte, if my code allocates thousands of zero-sized objects, I'd rather fix that than add complexity to my sane_realloc.

Still, since yours solves the 1-byte problem, it interests me. We can simplify your code slightly:
#define SANE_REALLOC_EMPTY ((void *) -1)

void *sane_realloc(void *ptr, size_t size)
{
    if (ptr == SANE_REALLOC_EMPTY) {
        ptr = 0;
    }
    if (size == 0) {
        free(ptr);
        return SANE_REALLOC_EMPTY;
    } else {
        return realloc(ptr, size);
    }
}
Of course, C++ being C++, the language-level position is "it's all UB" either way (except for implicit-lifetime types), and even the proposals for trivial relocation make you go through a special function [0].
[0] https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p27...
Or something more complicated (if you anticipate a lot of allocation work with widely varying buffer sizes), e.g. slab allocators. memcached is an example of where this is used; a couple of pictures explain the gist.[2]
[1]: note: this can be even simpler of course, but here's a quick example of the structs used:
```
typedef struct {
    char *memory;
    size_t size;
    size_t used;
} memory_arena_t;

typedef struct {
    memory_arena_t *arenas;
    size_t arena_count;
    size_t max_arenas;
    size_t arena_size;
    size_t total_size;
    size_t total_used;
} memory_allocator_t;
```

[2]: https://siemens.blog/posts/memcached-memory-model/ - I'm sure that if the heap were visualised, it would show how this helps keep fragmentation at bay as well (which also helps waste fewer memory pages).
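A minimal sketch of how an allocation from one of these arenas might look (arena_alloc is a hypothetical helper, not from memcached; alignment handling omitted):
```
/* Bump-allocate n bytes from one arena; returns NULL when it's full,
   in which case the allocator moves on to (or creates) the next arena. */
void *arena_alloc(memory_arena_t *arena, size_t n)
{
    if (arena->used + n > arena->size)
        return NULL;
    void *p = arena->memory + arena->used;
    arena->used += n;
    return p;
}
```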
Enter myself and a buddy of mine. First thing we discovered was that they were using regular java.lang.Strings for all the string manipulation, and it'd garbage collect for between 30 and 50 seconds every minute once the process got rolling. It used a positively criminal number of threads as well in our predecessor's desperate attempt to make it go faster. SO much time was spent swapping threads on CPUs and garbage collecting that almost no real work got done.
Enter the StringBuffer rotation scheme. John and I decided to use the backup GS-160 as a hub to read source data and distribute it among 16 of our floor's desktop machines as an experiment. The hub was written in C++ and did very little other than read a series of fixed-length records from a number of source files and package them up into payloads to ship over socket to the readers.
The readers gut-rehabbed the Java code and swapped out String for StringBuffer (and io for nio) to take the majority of garbage collection out of the picture.
The trick we employed was to pre-allocate a hoard of StringBuffers with a minimum storage size and put them in a checkin/checkout "repository" where the process could ask for N buffers (generally one per string column) and it'd get a bunch of randomly selected ones from the repo. They'd get used and checked back in dirty. Any buffer that was over a "terminal length" when it was checked in would be discarded and a new buffer would be added in its place.
We poked and prodded and when we were finally happy with it, we were down to one garbage collection every 10 minutes on each server. The final build was cut from 30 days to 2.8 and we got allocated a permanent "beowulf cluster" to run our database build.
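The checkin/checkout repository idea, sketched in C for concreteness (the original was Java; the names, sizes, and the stack-based pool here are illustrative, and the random selection from the repo is omitted):

    #include <stdlib.h>

    #define POOL_SIZE    64
    #define INITIAL_CAP  4096     /* minimum storage size */
    #define TERMINAL_CAP 65536    /* "terminal length": discard past this */

    typedef struct {
        char  *mem;
        size_t cap;
    } pool_buf_t;

    static pool_buf_t pool[POOL_SIZE];
    static size_t     pool_top;

    /* Check a buffer out of the repository; allocate if the pool is empty. */
    static pool_buf_t checkout(void)
    {
        if (pool_top > 0)
            return pool[--pool_top];
        pool_buf_t b = { malloc(INITIAL_CAP), INITIAL_CAP };
        return b;
    }

    /* Check a (dirty) buffer back in; any buffer over the terminal
       length is discarded and replaced with a fresh minimum-size one. */
    static void checkin(pool_buf_t b)
    {
        if (b.cap > TERMINAL_CAP) {
            free(b.mem);
            b.mem = malloc(INITIAL_CAP);
            b.cap = INITIAL_CAP;
        }
        if (pool_top < POOL_SIZE)
            pool[pool_top++] = b;
        else
            free(b.mem);
    }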
However, system calls also have overhead (thanks to the Meltdown/Spectre mitigations), and you might not come out ahead by avoiding memory copies.
Seems like you could also use shm_open + mmap.
On Linux, you could probably also do some juggling with mremap + MREMAP_DONTUNMAP, but I don't know why you'd prefer this over memfd_create.
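For reference, the memfd_create route looks roughly like this on Linux with a recent glibc (a sketch; error handling omitted):

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        /* Anonymous, fd-backed memory: resizable without copying pages. */
        int fd = memfd_create("scratch", 0);
        ftruncate(fd, 1 << 20);
        char *p = mmap(NULL, 1 << 20, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
        strcpy(p, "hello");

        /* "Grow" by extending the file and mapping a larger view; the
           data is still there because both mappings see the same pages. */
        ftruncate(fd, 2 << 20);
        char *q = mmap(NULL, 2 << 20, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
        munmap(p, 1 << 20);
        puts(q);                          /* prints "hello" */
        return 0;
    }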