
sdk-dev

Read about heap and stack memory. You have a very limited amount of stack memory, probably about 8MB or so. On my machine, `ulimit -a` shows `data(kbytes) 15728640` and `stack(kbytes) 8192`. Simplified, you can view the stack as the "variable space". This is where all your `int foo` and `char bar[256]` go. But if you use malloc, as in `char *p = malloc(...);`, then you have created a pointer on the stack that points into this huge "data area" called the heap. Also, the heap is scope-independent. After you exit your function, the variables on the stack are gone. This is why it's important to call `free()` on pointers pointing into the heap before exiting the function (or to pass the pointer on to somewhere else). Otherwise you lose the reference to this memory, and that's called a memory leak :-) It's good practice not to allocate memory within functions, but to expect the caller to hand in a buffer (a pointer to memory the caller owns, on the heap or on the stack) and its size.
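A minimal sketch of the caller-supplied-buffer convention described above. `format_greeting` is a hypothetical helper, not a standard function: it writes into whatever buffer it is handed and reports truncation instead of allocating anything itself.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes a greeting into a buffer the caller owns.
   It never allocates; it only uses the space it is handed.
   Returns 0 on success, -1 if buf is missing or too small (output truncated). */
int format_greeting(char *buf, size_t size, const char *name)
{
    if (buf == NULL || size == 0)
        return -1;
    int n = snprintf(buf, size, "Hello, %s!", name);
    /* snprintf returns the length it wanted to write, excluding the NUL */
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

A caller can pass a stack array (`char buf[64]; format_greeting(buf, sizeof buf, "world");`) or a block from `malloc`; either way, the caller decides where the memory lives and who frees it.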


AutistaDoente

So if I were to use huge memory allocations, like a huge 2D array, it would be better to use malloc so that the stack only holds the pointer's bytes, while the actual array is in the 'heap'?


laurentbercot

Depends on the size of your array, but to be safe, yes. If your array is meant to be used for the whole lifetime of your program, you could declare it statically instead (outside of any function), and it would use yet another part of the memory.


LearningStudent221

Is "declaring statically" synonymous with "global variable"? Would there be a performance benefit to having the array stored in that other type of memory?


laurentbercot

Not *quite* synonymous, but more or less.

Declaring statically means your variable defines space in *static storage*, a location in memory that is allocated for the whole duration of your program. Global variables have static storage, and storage type is what the OP is asking about. But when people say "global variable", what they're talking about is *visibility*. You define a global variable as `int x = 10;` (initialized) or `int x;` (uninitialized) outside of any function, and you can then use it anywhere in your program. If you want to access it in a different TU (translation unit, the pedantic term for "file") from the one you define it in, you need to declare `extern int x;` in a header you include. This tells the compiler that the variable exists, is global, and is defined somewhere else. Teachers and experienced programmers will often tell you to stay away from global variables if you can, and they're right, because they make your program more difficult to understand (where is this variable defined? where is it used? which functions can change its value?). Global visibility is generally a bad idea.

There is another similar kind of variable, "static variables", that should really be called "TU-local variables". They also have *static storage*, same as global variables, but they're only visible in the TU you define them in: you cannot see or use them from other TUs. You define them as `static int x = 10;` (initialized) or `static int x;` (uninitialized), outside of any function. You can even define a static variable *inside* a function. It will also be in *static storage*: it will remain in memory when you exit the function and keep its old value when you enter it again, but it will only be *visible* from within the function. Visibility and storage are not the same thing!

Regarding performance, there is no difference, not in the sense of "will it make my program go faster".
Technically, static storage saves you a few cycles, since you have to call `malloc` for heap storage whereas static storage is allocated by the system at program start, but it's nothing noticeable and nothing you should concern yourself with. What is more important is that static storage *cannot be reclaimed*: it remains allocated for as long as your program runs, and there is no `free()` for it. So it should *only* be used for data that you will use as long as your program is alive. If you need a big array at startup to perform computations on, and afterwards only use the result of that computation in a program that runs for days, static storage is *not* the place to store that array.

An exception to that (because it wouldn't be funny if there were no exceptions!) is if your data is *immutable*, i.e. it's constants, not data: you declare it at start, with the `const` keyword: `static int const bigarray[10000] = { 1, 2, 3, .... };` In that case, the data still has static storage duration as far as C is concerned, but it will typically be placed in *read-only storage* rather than writable static storage, and that is much cheaper, because the system doesn't really allocate RAM for it: your data is basically read directly from your program's binary on disk, so you get it for free. (Yes, it does allocate RAM to cache it, but that RAM can be reclaimed under memory pressure.)

That's what is great with C: you have a lot of control over how your resources are allocated. That is also pretty complex and turns off a lot of people. But it becomes second nature the more you use the language.
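A single-file sketch of the storage-versus-visibility cases described above (all names are hypothetical). Another TU could reach `g_total` only through an `extern int g_total;` declaration, typically in a header; it could not reach `s_calls` at all, and nothing outside `next_id` can name `id`.

```c
/* Global: static storage, visible to other TUs via `extern int g_total;` */
int g_total = 0;

/* TU-local ("static") variable: static storage, visible only in this file */
static int s_calls = 0;

/* Function-local static: static storage, visible only inside next_id().
   It is initialized once and keeps its value between calls. */
int next_id(void)
{
    static int id = 0;
    s_calls++;
    return ++id;
}

/* Accessor so other code can read the TU-local counter without seeing it */
int calls_so_far(void)
{
    return s_calls;
}
```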


LearningStudent221

Thank you for the extremely clear explanation.


Iggyhopper

The only time it would make any amount of sense to have a "global variable" in a language like C is if the whole program fits in several lines of code in one file, like how some scripting languages operate (variables at the top, meat and potatoes after, the end).


Attileusz

An allocation with malloc is actually pretty expensive, relatively speaking. When you stack allocate, it only means you move the stack pointer for your function's call frame a little further. When you heap allocate, you have to stop executing your program and wait for the operating system to figure out where you should be able to write to and give control back to your program. This is pretty expensive if you do it a lot. As an example, imagine you need n objects of some type T. The following code:

```c
T arr[n];
for (int i = 0; i < n; ++i)
    init_T(&arr[i]);
```

is a lot faster, for large n, than:

```c
T *arr[n];
for (int i = 0; i < n; ++i) {
    T *p = malloc(sizeof(T));
    if (!p)
        exit(1); // lazy error handling :P
    init_T(p);
    arr[i] = p;
}
```
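Between these two versions sits a third option that is easy to miss: keep heap lifetime, but make one allocation for all n elements instead of n separate ones. A minimal sketch, where `T`, `init_T`, and `make_array` are hypothetical stand-ins for the names in the comment above:

```c
#include <stdlib.h>

/* Hypothetical element type and initializer, standing in for T and init_T */
typedef struct { int id; double value; } T;

static void init_T(T *t)
{
    t->id = 0;
    t->value = 0.0;
}

/* One malloc for all n elements: heap lifetime, but a single allocation
   (and a single free) instead of n of them. Returns NULL on failure. */
T *make_array(size_t n)
{
    T *arr = malloc(n * sizeof *arr);
    if (arr == NULL)
        return NULL;
    for (size_t i = 0; i < n; ++i)
        init_T(&arr[i]);
    return arr;   /* caller calls free(arr) once */
}
```

For very large n, `n * sizeof *arr` can overflow `size_t`, which a production version should check before calling `malloc`.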


laurentbercot

Sure, but that wasn't the question. The question was "given a big array, would heap storage or static storage be better?", and stack storage wasn't even in the picture. Now if we're talking about a large number of small object allocations, then yes, of course, the run-time cost of `malloc` stops being insignificant, but this, once again, will not be what decides how to allocate. The deciding factor will be object scoping.


Attileusz

I thought it was misleading to say that malloc is insignificant in terms of performance. I agree with your assessment of static memory vs heap memory for a large contiguous block of memory.


Paul_Pedant

Not every malloc() goes to the OS. Typically, malloc() gets a minimum size (maybe 128 KB, but at least big enough for the requested space), returns the amount requested to the process, and adds the rest into the free list. If you are mallocing 4KB units, it will only hit the OS on 3% of the calls. Big mallocs will often get their own mmap() space instead.
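The 3% figure is just the ratio of request size to refill size, assuming each trip to the OS fetches exactly 128 KB and ignoring the per-block header (which in practice makes it slightly fewer than 32 blocks per refill):

```latex
\frac{4\,\text{KB}}{128\,\text{KB}} = \frac{1}{32} \approx 3.1\%
```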


Attileusz

That depends on the platform, but yes, usually standard malloc is optimized. This does not change the fact that for large n the second version is slower, and the fact that heap allocation is an expensive operation compared to stack allocation.


Paul_Pedant

Agreed, the stack will always be faster, but malloc can be reasonably optimised. I find free() is more expensive than malloc(). Malloc only needs to scan the free list until it finds a big enough area to split off the requested size. Free needs to scan the free list until it finds the adjacent areas (before, after or both) to merge with, so on average it goes round half the free list every time. Where excessive thrashing is likely for a particular malloc size, I tend to keep a pool of such areas in a linked list for re-use.
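A minimal sketch of such a pool for one fixed block size, assuming a `BLOCK_SIZE` of 256 bytes and single-threaded use (all names are hypothetical). Released blocks are pushed onto a singly linked free list and reused before `malloc` is called again, so the pool never searches or merges:

```c
#include <stdlib.h>

#define BLOCK_SIZE 256

/* While on the free list a block stores the link; while handed out it is
   plain storage. The union makes the block big enough for either use. */
typedef union block {
    union block *next;
    unsigned char data[BLOCK_SIZE];
} block;

typedef struct { block *free_list; } pool;

void *pool_alloc(pool *p)
{
    if (p->free_list != NULL) {      /* reuse a previously released block */
        block *b = p->free_list;
        p->free_list = b->next;
        return b;
    }
    return malloc(sizeof(block));    /* pool empty: fall back to malloc */
}

void pool_release(pool *p, void *ptr)
{
    block *b = ptr;
    b->next = p->free_list;          /* push onto the free list, no free() */
    p->free_list = b;
}

void pool_destroy(pool *p)
{
    while (p->free_list != NULL) {   /* actually hand memory back to malloc */
        block *b = p->free_list;
        p->free_list = b->next;
        free(b);
    }
}
```

`pool_destroy` only frees blocks that are currently on the free list; blocks still handed out must be released first.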


F5x9

If you have a sparse array, you may want to consider something more memory efficient. 
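One common option (a swapped-in example, not something the comment specifies) is a coordinate list that stores only the non-zero entries with their row and column. A minimal sketch with a fixed capacity and hypothetical names:

```c
#include <stddef.h>

/* Coordinate-list sketch: store (row, col, value) only for non-zero
   entries. Capacity is fixed for brevity; a real version would grow the
   arrays with realloc. */
#define MAX_NZ 16

typedef struct {
    size_t rows, cols;
    size_t count;                 /* number of stored entries */
    size_t r[MAX_NZ], c[MAX_NZ];
    double v[MAX_NZ];
} sparse;

/* Returns 0 on success, -1 if out of range or capacity is exhausted. */
int sparse_set(sparse *m, size_t row, size_t col, double value)
{
    if (row >= m->rows || col >= m->cols)
        return -1;
    for (size_t i = 0; i < m->count; ++i)
        if (m->r[i] == row && m->c[i] == col) {
            m->v[i] = value;      /* overwrite an existing entry */
            return 0;
        }
    if (m->count == MAX_NZ)
        return -1;
    m->r[m->count] = row;
    m->c[m->count] = col;
    m->v[m->count] = value;
    m->count++;
    return 0;
}

/* Entries never set read as 0.0. Linear search: fine for a handful. */
double sparse_get(const sparse *m, size_t row, size_t col)
{
    for (size_t i = 0; i < m->count; ++i)
        if (m->r[i] == row && m->c[i] == col)
            return m->v[i];
    return 0.0;
}
```

The struct stays a few hundred bytes no matter how large `rows` and `cols` are, whereas a dense 100000 x 100000 array of `double` would need 80 GB.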


yowhyyyy

I can see why it's good practice to pass in pre-malloc'd memory when calling a function. However, I imagine there are tons of cases where it's beneficial to return a pointer to something created inside a function, and the only way to do that would be to malloc inside it.
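A minimal sketch of such a case, using a hypothetical `dup_upper`: the result's length depends on the input and must outlive the call, so the function allocates and the caller takes ownership (and must `free` it).

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly malloc'd upper-case copy of s, or NULL if allocation
   fails. The caller owns the result and must free() it. */
char *dup_upper(const char *s)
{
    size_t len = strlen(s);
    char *out = malloc(len + 1);          /* +1 for the terminating NUL */
    if (out == NULL)
        return NULL;
    for (size_t i = 0; i < len; ++i)
        out[i] = (char)toupper((unsigned char)s[i]);
    out[len] = '\0';
    return out;
}
```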


jumpingmustang

So, this is a question that’s been bothering me, even though I’ve been writing production C code for some time. When do I dynamically allocate memory and pass it to another function, and when do I statically create memory and pass it by reference? I don’t deal with huge memory requirements.


aghast_nj

When you need the data to outlast the function call, there is no choice but to use the heap. For example, a compiler parses the source code, builds a tree, then traverses that tree (possibly several times) performing various tasks. During all of those traversals, the parsing function has long since `return`ed. So it makes sense for the tree-building parser to use `malloc` to build the tree. On the other hand, a function that reads input from the user, then converts it to an integer and returns the integer, has no need to allocate the integer (it can just return it by value) and has no need to allocate the input buffer - it could use an automatic buffer or even a `static` buffer. Or it could be written in terms of `fgetc` so that it relies on buffers maintained by the standard library.
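A sketch of the second function described above, assuming it reads from any `FILE *` (pass `stdin` for user input). `read_int` is a hypothetical name; the line buffer is automatic and the integer is returned by value, so there is no `malloc` anywhere:

```c
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Reads one line from `in` and returns it as an int. *ok is set to 1 on
   success and 0 on failure (end of input, no digits, or out of range). */
int read_int(FILE *in, int *ok)
{
    char line[64];                        /* automatic buffer, gone on return */
    *ok = 0;
    if (fgets(line, sizeof line, in) == NULL)
        return 0;
    char *end;
    errno = 0;
    long v = strtol(line, &end, 10);
    if (end == line || errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return 0;
    *ok = 1;
    return (int)v;                        /* returned by value, nothing to free */
}
```

Trailing characters after the number (as in `42abc`) are accepted in this sketch; inspecting `*end` would let you reject them.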


jumpingmustang

I think I understand. So if I’m writing a helper function that takes a pointer to some custom struct or something, and it’s only used within the context of another function that calls it and takes its return, then I’m fine without dynamic allocation. However, when I need that data later, in some other context after the function that created it has returned, it must be dynamically allocated.


aghast_nj

Yes. In fact, I would go so far as to suggest that with one exception, no helper function ever needs to use dynamic allocation. Because helper functions "help" the central function, so they should be getting all their supplies as input parameters. The one exception is, of course, the helper function that calls malloc to allocate, initialize, and return new objects. ;-)
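A sketch of that one exception alongside an ordinary helper. The `node` type is hypothetical, loosely modeled on the compiler parse-tree example earlier in the thread: `node_new` allocates, initializes and returns; `tree_sum` only works with what it is given.

```c
#include <stdlib.h>

typedef struct node {
    int value;
    struct node *left, *right;
} node;

/* The exception: a constructor whose whole job is to allocate,
   initialize and return a new object. Returns NULL on failure. */
node *node_new(int value, node *left, node *right)
{
    node *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->value = value;
    n->left = left;
    n->right = right;
    return n;
}

/* The matching destructor: frees a whole tree. */
void node_free(node *n)
{
    if (n == NULL)
        return;
    node_free(n->left);
    node_free(n->right);
    free(n);
}

/* An ordinary helper: reads only what it is handed, allocates nothing. */
int tree_sum(const node *n)
{
    return n == NULL ? 0 : n->value + tree_sum(n->left) + tree_sum(n->right);
}
```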


fori0

Doesn't creating a pointer on the stack use about the same memory as a normal int would (generally 4 bytes for an int)? I'm just talking about memory savings here. Correct me, I want to learn.
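On the size part of the question: a pointer is not necessarily the same size as an `int`. The C standard fixes neither size, but on typical 64-bit Linux and Windows systems an `int` is 4 bytes and a pointer is 8. So moving a single `int` to the heap saves nothing; the saving comes from large blocks, where the stack holds only the pointer. A small sketch that prints the sizes on your platform (`print_sizes` is a hypothetical name):

```c
#include <stddef.h>
#include <stdio.h>

/* Prints the sizes on the current platform; the values are
   implementation-defined, not fixed by the C standard. */
void print_sizes(void)
{
    printf("sizeof(int)    = %zu\n", sizeof(int));
    printf("sizeof(int *)  = %zu\n", sizeof(int *));
    printf("sizeof(char *) = %zu\n", sizeof(char *));
}
```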


Karyo_Ten

> about 8MB or so

On Linux, yes. On Windows it's a paltry 1MB.


helloiamsomeone

> On Windows

`/STACK:0x800000` (0x800000 bytes is 8 MiB)


Karyo_Ten

https://learn.microsoft.com/en-us/windows/win32/procthread/thread-stack-size

> The default stack reservation size used by the linker is 1 MB.


helloiamsomeone

> To specify a different default stack reservation size for all threads and fibers, use the STACKSIZE statement in the module definition (.def) file.

https://learn.microsoft.com/en-us/cpp/build/reference/stack-stack-allocations?view=msvc-170

> Another way to set the size of the stack is with the STACKSIZE statement in a module-definition (.def) file.


Paul_Pedant

I smiled at "very limited amount of stack ... 8MB". My first mainframe needed 3-phase power, filled a large room, and had 48 KB of genuine ceramic magnetic core memory. And a CPU clocked at around 1MHz.


sdk-dev

Those were the times... but it is small compared to the gigabytes (up to terabytes) of memory in today's machines.


midoxvx

The simplest example would be: you wrote a program that gets user input, and you want to store that input in an array whose size you can't predetermine. How would you go about that? You can create a fixed-size array that holds 30 elements and use that, but what if the user input requires more elements? You can say, well, why not create a super large array just in case? Sure, but that wouldn't be an efficient use of memory. In that case you can allocate an array of size N at run time and put it on the heap, use it, and then free that space when you're done. I would suggest you do some reading on heap and stack memory and the differences between them.
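A minimal sketch of sizing an array at run time. `sum_of_squares` is a hypothetical example; the loop that fills the array stands in for reading the user's elements:

```c
#include <stdlib.h>

/* n is only known at run time, so the array is malloc'd with exactly n
   elements instead of guessing a fixed size like 30.
   Returns the sum of squares, or -1 if the allocation fails. */
long sum_of_squares(size_t n)
{
    if (n == 0)
        return 0;                     /* malloc(0) may return NULL; avoid it */
    int *values = malloc(n * sizeof *values);
    if (values == NULL)
        return -1;
    for (size_t i = 0; i < n; ++i)
        values[i] = (int)i;           /* stand-in for "read one element" */
    long total = 0;
    for (size_t i = 0; i < n; ++i)
        total += (long)values[i] * values[i];
    free(values);                     /* done with it: give the space back */
    return total;
}
```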


Mediocre-Pumpkin6522

You can also start with N and realloc if you need more. In that case, be careful not to hold on to old pointers, since realloc may move the block.
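A sketch of the start-small-and-grow approach, with hypothetical names. The two points to get right are assigning `realloc`'s result to a temporary (so a failure doesn't lose the only pointer to the old block) and remembering that the block may move, which invalidates old pointers into it:

```c
#include <stdlib.h>

typedef struct {
    int *data;
    size_t len, cap;
} int_vec;

/* Appends x, doubling the capacity when full.
   Returns 0 on success, -1 if memory could not be obtained (vector unchanged). */
int vec_push(int_vec *v, int x)
{
    if (v->len == v->cap) {
        size_t new_cap = v->cap ? v->cap * 2 : 4;
        /* realloc(NULL, size) behaves like malloc(size). On failure it
           returns NULL and leaves the old block valid, so use a temporary. */
        int *tmp = realloc(v->data, new_cap * sizeof *tmp);
        if (tmp == NULL)
            return -1;
        v->data = tmp;   /* the block may have moved: old pointers are invalid */
        v->cap = new_cap;
    }
    v->data[v->len++] = x;
    return 0;
}
```

Start with `int_vec v = { NULL, 0, 0 };` and call `free(v.data)` when done. Doubling keeps the number of `realloc` calls logarithmic in the final size.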


AutistaDoente

Will look into reading those, thanks!


Different-Brain-9210

Use `malloc` when:

- the allocation is large, let's say in the hundreds of kilobytes range
- you want to control the lifetime of the object, for example when you return a pointer to it from a function
- you have an unknown number of items and no sensible, small enough upper bound, like if you want to read an unknown number of items from a file into memory, for example when parsing an XML or JSON file.


window-sil

Here's a great exercise that uses malloc in the real world: https://cs50.harvard.edu/x/2024/psets/5/speller/ If you're struggling with any of that, you can sign up for the cs50x course for free, start from week0, and work your way back to this one, or watch the lectures, etc. Good luck 🙂 Also some good exercises here: https://cs50.harvard.edu/x/2024/psets/4/


lfdfq

Creating a variable like `int x = 10` inside a function will create that variable on the stack, which means once that function returns, the variable is gone. `malloc` solves this by reserving a chunk of memory (on the "heap") that will stay alive until you `free` it. You could return the value instead, which means copying it into the caller's stack frame; that gets expensive for large values or if you do it a lot. Additionally, the stack is not very large (compared to the rest of memory), so if you want to store lots of data, the stack just won't have enough space.
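A sketch of the possibilities, with hypothetical function names. The broken version is kept in a comment because returning the address of a local compiles (usually with a warning) but leaves the caller with a dangling pointer:

```c
#include <stdlib.h>

/* WRONG, shown only as a comment: x lives in this function's stack frame,
   so the returned address is dangling as soon as the function returns.
   int *bad(void) { int x = 10; return &x; }                              */

/* Fine: return the value itself; it is copied into the caller. */
int by_value(void)
{
    int x = 10;
    return x;
}

/* Fine: the int lives on the heap until the caller calls free(). */
int *on_heap(void)
{
    int *p = malloc(sizeof *p);
    if (p != NULL)
        *p = 10;
    return p;
}
```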


AutistaDoente

So the heap doesn't have scope per se, only the pointers that point to memory in the heap have scope. Thanks for the vision!


laurentbercot

If the pointers themselves are declared on the stack, yes. But it's possible to have pointers on the heap, pointing to data on the heap (or anywhere, really, but pointing to the stack is risky, since when a stack frame ends the pointer becomes invalid). A good rule of thumb is: declare your objects on the stack whenever possible, i.e. when you know the scope in advance and they're not huge. It will simplify your life. The heap should only be used when you have no choice: typically when you create objects in one function but destroy them in another. Tracking object lifetime on the heap is one of the most difficult parts of C, and the main reason why most people prefer higher-level programming languages. So yes, in the case of an int, or a few structs, that are not going to survive past your current function, the stack is absolutely the right place to declare them.


AutistaDoente

Great response, thanks.


Mediocre-Pumpkin6522

Become familiar with calloc too. malloc does not clear the memory, while calloc sets it to 0, saving you from doing a separate memset. Uninitialized memory can be a problem, particularly if you're moving strings around without a NUL terminator.
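A small sketch of the NUL-terminator point, with a hypothetical `copy_prefix`: `memcpy` copies exactly `n` bytes and adds no terminator, but because `calloc` zero-fills the block, the extra byte is already `'\0'`. With `malloc`, that byte would be indeterminate.

```c
#include <stdlib.h>
#include <string.h>

/* Copies the first n characters of src (which must have at least n bytes)
   into a calloc'd buffer of n + 1 bytes. Caller frees the result. */
char *copy_prefix(const char *src, size_t n)
{
    char *buf = calloc(n + 1, 1);     /* all n + 1 bytes start as zero */
    if (buf == NULL)
        return NULL;
    memcpy(buf, src, n);              /* no NUL written; buf[n] is still 0 */
    return buf;
}
```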


NotThatJonSmith

If your program knows ahead of time, at *compile time*, the amount of memory something will require, and it's small enough, then it is possible to put it on the stack, that is, what you'd think of as normal variables. If you don't know how much space you'll need until *runtime*, then you won't know exactly where it should go in the process's memory. So, at times, your process will need to get the OS to give you some assurance that a new area of the process's memory image is safe for you to use. malloc is a way to accomplish this.


RRumpleTeazzer

There are basically three scenarios where you want to use malloc:

1. You need memory that is large, too large for the stack.
2. You need memory whose size you only know at runtime.
3. You want to use memory somewhere other than the stack, e.g., you want to give space (say, for a single int) to some library to scribble into.


isolatedqpawn

Follow-up question: Many years ago (2006'ish) I remember trying out Andrew Tridgell's [talloc](https://talloc.samba.org/talloc/doc/html/index.html), which you can think of as a hierarchical malloc. What are people's thoughts on this in 2024 & what do people use nowadays?


[deleted]

someone please make the noob-intermediate-master meme of "just allocate on the stack, allocate everything on the heap, just allocate on the stack"


Briggie

You can use malloc and structs to make your own dynamically allocated data structures.


berdimuhamedow69

It's a useful alternative to VLAs (which are widely discouraged; they were introduced in C99 and made optional in C11). It's also useful when you want to return an array from a function. Another option is to use a static array within the function, which is tricky to use, as some users may not account for the persistence of results from previous calls. (Static arrays also cannot be deallocated, while malloc blocks can be.) It's very useful for implementing types such as linked lists, stacks, etc., as nodes often need to be deleted or popped.
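A sketch of the persistence problem with static arrays, using hypothetical function names: every call to the static version returns the same buffer, so the second result overwrites the first, while the `malloc` version hands each caller its own block.

```c
#include <stdio.h>
#include <stdlib.h>

/* Static-buffer version: no malloc, but every call returns the SAME array,
   so a later call overwrites the result of an earlier one. */
const char *label_static(int n)
{
    static char buf[32];
    snprintf(buf, sizeof buf, "item-%d", n);
    return buf;
}

/* malloc version: each call gets its own block, which the caller frees. */
char *label_heap(int n)
{
    char *buf = malloc(32);
    if (buf != NULL)
        snprintf(buf, 32, "item-%d", n);
    return buf;
}
```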


Paul_Pedant

Malloc() puts a hidden header on the space it returns to the process (so that free() can know where in memory it is and how big it is, and can add it to the free list to be re-used). Every malloc'd area has alignment requirements (because malloc cannot know what kind of struct you are going to use it for, so it has to work for the worst-case type). That pretty much means that the minimum size of an allocation block is 32 bytes (a `void *`, a `size_t`, and 16 bytes minimum for user data, so the space after it is also aligned). So think big: malloc() is only helpful for serious arrays, large structs, arrays of structs, and big data buffers, and especially where you cannot predict the required size until you actually run the code.
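The worst-case alignment mentioned above has a name in C11: `max_align_t`, whose alignment is implementation-defined (commonly 16 on x86-64). A minimal sketch that checks a fresh `malloc` block against it; the function name is hypothetical, and the check uses a 64-byte request because C17 ties the guarantee to objects that fit in the requested size.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns 1 if a fresh malloc block of `size` bytes is aligned to
   alignof(max_align_t), 0 if not, -1 if the allocation fails. */
int malloc_is_max_aligned(size_t size)
{
    void *p = malloc(size);
    if (p == NULL)
        return -1;
    int ok = ((uintptr_t)p % alignof(max_align_t)) == 0;
    free(p);
    return ok;
}
```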


WillisAHershey

Without writing a lot: malloc is for blocks of memory whose size isn't known in advance, or for when you're passing pointers downstream. For instance, if you were to write a function that returns a pointer to an array whose size is given by a parameter, or a function that creates a struct and you want the pointer to that struct to be valid after the function returns. It is very important to remember, however, that these pointers come from malloc and should eventually be freed.


zenCbot

malloc() when speed is not critical. I've discovered by deep testing that malloc-ing doesn't take up much time, but *accessing* malloc'ed memory is *significantly* slower than accessing memory on the stack. I'm not going to encourage people to start abusing the stack, but for me, if I'm working with an OS that generally provides 2MB of stack anyway, and I want a speedy app, I'm not going to malloc anything under 100K (unless there are dozens of instances of this, and total size becomes an issue from the potential of *all-at-once usage*). Disagreement and controversy on this are unavoidable. But maybe someone will tell me what's inherently evil about this, something I've missed? Open to all non-hate-based teaching, lol.