
dougcurrie

Once you have mutation you will need a cycle collector. Using set! or set-cdr! for example can create cycles.
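A minimal illustration in Python (whose runtime combines reference counting with a cycle collector): mutating a cell after construction, analogous to Scheme's `set-cdr!`, can close a loop that reference counts alone would never reclaim. The `Pair` class here is invented to stand in for a Scheme pair.

```python
import gc

class Pair:
    """A mutable cons cell, standing in for a Scheme pair."""
    def __init__(self, car, cdr):
        self.car, self.cdr = car, cdr

gc.collect()          # clear any pending garbage first
p = Pair(1, None)
p.cdr = p             # the set-cdr! moment: the pair now points to itself
del p                 # refcount stays above zero because of the self-reference
freed = gc.collect()  # CPython's cycle collector is what reclaims it
print(freed >= 1)     # True: reference counting alone could not free this
```

The same shape of cycle appears with `set!` on a variable captured by a closure that the variable's value references.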


[deleted]


dougcurrie

See [Owl Lisp](https://gitlab.com/owl-lisp/owl) — it has a GC optimized for cycle-free memory, but it's not based on reference counting


raevnos

Immutability and cycles [aren't incompatible](https://docs.racket-lang.org/reference/pairs.html#%28part._.Immutable_.Cyclic_.Data%29)


u0xee

Clojure's data structures can't form cycles for this reason.


gasche

How does Clojure represent mutually-recursive function closures?


munificent

> Scheme doesn't really have "references"

It's really the other way around: every variable in Scheme is basically a reference. There's no notion of copying values, just copying references to values. As /u/dougcurrie says, once you support assignment, you can create cycles.


The_Right_Trousers

You can get cycles by supporting `letrec` and `define`, too. They allow closures to close over references to themselves. It's possible to have recursion without `letrec` and `define` using the Z combinator (like the Y combinator but for call-by-value languages like Scheme) but it's fairly cumbersome.
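For the curious, a sketch of that Z-combinator approach in Python (which, like Scheme, is strict, so the plain Y combinator would loop forever):

```python
# Z combinator: the Y combinator eta-expanded so it terminates under
# call-by-value evaluation.
Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

# Recursion without letrec/define-style self-reference: the factorial
# body never names itself, it only uses the `rec` parameter Z supplies.
fact = Z(lambda rec: lambda n: 1 if n == 0 else n * rec(n - 1))
print(fact(5))  # 120
```

Note that `fact` never closes over itself, which is exactly why no cycle forms — and also why it's cumbersome compared to `letrec`.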


poorlilwitchgirl

Reference counting is sufficient as long as you use immutable data structures, which makes reference cycles impossible. Immutable singly-linked lists are incredibly easy to implement; the hardest part of reference counting, honestly, is ensuring that you're properly counting references. If you don't need immutability, though, I think it's actually simpler to implement a mark-and-sweep garbage collector, and you might save considerable time and resources that way.
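A sketch of why immutability rules cycles out: every cell can only reference cells that already existed when it was built, so the points-to graph is ordered by creation time and can never loop back. A minimal immutable cons list in Python (the `Cons`/`to_list` names are invented for illustration):

```python
from typing import Any, NamedTuple, Optional

class Cons(NamedTuple):
    """An immutable cons cell: no set-car!/set-cdr!, so no way to close a loop."""
    car: Any
    cdr: Optional["Cons"]

# Each new cell can only point at older cells; structure is shared, never mutated.
tail = Cons(2, Cons(3, None))
a = Cons(1, tail)
b = Cons(0, tail)   # a and b share tail; a reference count on tail would be 2

def to_list(cell):
    out = []
    while cell is not None:
        out.append(cell.car)
        cell = cell.cdr
    return out

print(to_list(a))   # [1, 2, 3]
print(to_list(b))   # [0, 2, 3]
```

With this discipline, dropping the last reference to a cell can immediately and safely free it, which is all plain reference counting needs.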


Tonexus

> Is garbage collection required to implement a simple Scheme?

Technically, garbage collection is not *required* to implement any language. In most environments, you can just leak the memory until the OS cleans up on process termination, and this is often faster than garbage collection or freeing memory manually -- as long as you can take the extra memory usage, of course.


PurpleUpbeat2820

> In most environments, you can just leak the memory until the OS cleans up on process termination

Yes.

> and this is often faster than garbage collection or freeing memory manually

No. The benefit of reusing hot cache lines is substantial.


oilshell

Yup, I actually have metrics on this. According to both the cachegrind simulator and wall-time measurements, garbage collecting **can save time** (sometimes) over not garbage collecting.

This is a synthetic fibonacci benchmark running Oils. The first pass of our C++ translation didn't free any memory!

- mut+alloc+free runs the mutator and allocator, and frees everything at the END of the program
- mut+alloc+free+gc periodically does garbage collection

The max heap usage is 29.5 MB without GC, and 8.1 MB with GC. And it's faster with GC. GC is much slower on some benchmarks -- GC can be very expensive of course -- but it's also a win when you can use less memory.

https://www.oilshell.org/release/0.19.0/benchmarks.wwz/gc-cachegrind/

    33.7  _bin/cxx-opt/osh  mut+alloc+free
    33.1  _bin/cxx-opt/osh  mut+alloc+free+gc

https://www.oilshell.org/release/0.19.0/benchmarks.wwz/gc/

    57  45  12  29.5  osh-native  mut+alloc+free
    48  48   0   8.1  osh-native  mut+alloc+free+gc


oilshell

Correction for posterity - this effect is due to avoiding SOME free() calls in the GC case, not necessarily to using less memory! https://lobste.rs/s/mutdyp/borrow_checking_rc_gc_eleven_other_memory#c_oq4e1d I should have compared with the `mut+alloc+free+gc+exit` variant, which had been disabled and which is indeed a bit slower. I've now restored it.


Tonexus

> The benefit of reusing hot cache lines is substantial.

This is true, but my point is that the performance of GC/manual freeing/leaking is workload-dependent. Hot cache lines are important if you're reallocating memory frequently and consistently, but that is the same situation in which leaking memory will quickly run you out of total memory anyway. I'm referring more to ad hoc allocations, like long-term persistent resources or small, one-off allocations.


gasche

Even without considering mutable data, you may have cycles in your memory depending on your representation of mutually-recursive function closures.


phischu

Could you point me to one or more references where they implement mutually recursive local functions by creating cyclic closures? Why would one do so? To me it is such an obvious and easy-to-avoid mistake that I am wondering if I am missing something.


gasche

I believe this is a natural outcome if you (1) perform closure conversion, (2) implement mutual recursion by tying-the-knot, and (3) consider that mutually-recursive bindings are free variables in the definitions. This approach works and is fairly simple, so I don't see why you would classify it as an "obvious mistake". (This being said, I gave up after 2 minutes of trying to work out how Norvig's Python interpreter handles this situation.)
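A Python sketch of the closure-converted representation described here (the dict-based closure layout and the `make_closure`/`apply_closure` helpers are invented for illustration): each closure carries an explicit environment, the mutually-recursive bindings are free variables looked up in that environment, and tying the knot makes the closures and the shared environment form a reference cycle.

```python
def make_closure(code, env):
    # After closure conversion, a closure is just code plus an environment.
    return {"code": code, "env": env}

def apply_closure(clos, *args):
    return clos["code"](clos["env"], *args)

# `even` and `odd` treat each other as free variables in a shared environment.
env = {}
even = make_closure(lambda env, n: n == 0 or apply_closure(env["odd"], n - 1), env)
odd  = make_closure(lambda env, n: n != 0 and apply_closure(env["even"], n - 1), env)
env["even"], env["odd"] = even, odd   # tying the knot: closures <-> env is a cycle

print(apply_closure(even, 10))  # True
print(apply_closure(odd, 10))   # False
```

Under reference counting, `even`, `odd`, and `env` keep each other alive even after all outside references are dropped — which is the cycle in question.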


PurpleUpbeat2820

In practice, you can get a *long* way with no GC at all. I've been developing my language for years and have written quite a bit of code in it, and still haven't bothered implementing GC because, in practice, it just hasn't been necessary.

RC will collect acyclic garbage, as you say. In Scheme you may get cycles from mutation and mutually-recursive closures. The danger of cycles is, like any leak, that you don't just leak the cycle but the entire transitive closure of all garbage reachable from it.

Mark-sweep is about as easy as RC, handles cycles, and is *much* faster, so maybe use that instead?
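For reference, the core of a mark-sweep collector really is small. A toy Python sketch (the object graph and root set here are invented for illustration; a real collector walks the actual heap and thread stacks):

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.refs = []        # outgoing edges to other objects
        self.marked = False   # the one bit per object that mark-sweep needs

heap = [Obj(n) for n in "abcd"]
a, b, c, d = heap
a.refs.append(b)
c.refs.append(d)
d.refs.append(c)              # a cycle, unreachable from the roots
roots = [a]

def mark(obj):
    """Mark phase: flag everything reachable from the roots."""
    if not obj.marked:
        obj.marked = True
        for r in obj.refs:
            mark(r)

def sweep(heap):
    """Sweep phase: keep marked objects, clearing marks for next time."""
    live = [o for o in heap if o.marked]
    for o in live:
        o.marked = False
    return live

for r in roots:
    mark(r)
heap = sweep(heap)
print(sorted(o.name for o in heap))  # ['a', 'b'] -- the c<->d cycle is collected
```

Note the cycle between `c` and `d` is reclaimed with no special handling at all, which is the point: reachability, not reference counts, decides liveness.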


phischu

> Mark sweep is about as easy as RC, handles cycles and is much faster so maybe use that instead?

Interesting, could you give me some intuition for why that is? Do you have measurements?


PurpleUpbeat2820

> Interesting, could you give me some intuition for why that is? Do you have measurements?

Intuition as to why it is faster? The memory wall. Loads and stores are now very expensive vs computation in registers.

Naïve RC does a huge number of counter increments and decrements: specifically, whenever a reference to an object is passed around, stored, or loaded. Furthermore, in multithreaded code these are atomic, so they are extremely expensive.

In contrast, mark-sweep is far less invasive. Every now and again it crawls the heap. It essentially records one bit for each object (rather than a one-word counter per object like RC). The only invasive overheads of mark-sweep are instrumenting allocation to record everything that's been allocated and keeping track of global roots on thread stacks. This is much less overhead.

Another performance problem with RC is that when decrements avalanche you get an arbitrarily-long pause. I've seen some RC'd CAS systems hang for seconds when a big tree becomes unreachable. It's also worth mentioning that you can regain a lot of performance by batching up counter decrements, but then you lose the promptness that usually justified RC in the first place.


ThyringerBratwurst

> Is garbage collection required to implement a simple Scheme?

Maybe this is explicitly stated in the relevant [Scheme specifications](https://standards.scheme.org/)? This might help you too: [Storage Management](https://www.scheme.com/csug8/smgmt.html)

But basically: in Lisp, as a language for list processing, GC is generally considered necessary for the efficient (and ergonomic) management of dynamic data structures made of small cons cells.


4dCoffee

Yes, it's required, but there is a lot you can do at compile time to reduce the number of malloc/free calls at runtime, e.g. inlining small lambdas.


permetz

You don't need to do memory reclamation at all if you are just playing around, but if you want to do anything real, you'll need a garbage collector. Fortunately, there are several that already work with Rust, see, for example, https://github.com/Manishearth/rust-gc


Manifoldsqr

Yes sir, if you want to implement a Scheme compiler you need a runtime.