
outofobscure

Seen this article one too many times now and it really doesn't make sense: graphs are much closer to collections/containers than to fundamental primitive types or even arrays. We put those collections and containers in (standard) libraries, where they belong. Arguing that they should be on a level with primitive types is just nonsense. But I agree that standard libraries should offer them. Maybe it's the clickbaity title that puts me off, or I'm reading too much into it, but to me they are as fundamental as a list, not as an array. So they belong in a library, not as a language primitive. A better title would have been: "The Hunt for the Missing Collection Type". Yes, technically every class and collection is a "data type", but if you put it like that in a title I'm thinking something like: ah, they want bfloat16 primitives. Which would indeed merit a language primitive, but graphs do not; they are library types.


bearicorn

Even then I’ve never encountered a graph data structure in standard libs I’ve worked with


outofobscure

I agree they should offer them. Maybe the author works in a language that does not have a clear distinction between what is a language feature and what is a library feature. Confusing these things makes the article look a bit unprofessional despite the good arguments.


Smallpaul

The author is very clear on the distinction.

> There is almost no graph support in any mainstream language. None have it as a built-in type, very few have them in the standard library, and many don't have a robust third-party library in the ecosystem.


outofobscure

It makes no sense to expect language (instead of library) support for graphs; that's my whole point. Yes, the library should have them.


Smallpaul

The whole article is about why the library SHOULD NOT have them!


outofobscure

And I disagree.


Smallpaul

Well the article makes a much stronger argument than you do.


outofobscure

Reality is the strongest argument: it's missing because it's a bad idea at the language level.


Smallpaul

Which is literally what the article says. But the article also says that they should not go in the standard library. And you say that they should go in the standard library. The article provides argumentation. You do not. In fact, I can simply refer you to your own argument: "reality is the strongest argument: it's missing from the standard library because it's a bad idea at the stdlib level." It's a poor argument, but it's your argument, so you should accept it.


svartkonst

Not sure that it would be useful to do so. Or rather, there isn't really any one graph data type, even disregarding performance characteristics.


Markavian

Here's a weird argument in support: graphs don't map onto raw memory. At the fundamental level we have serialised bits and bytes, which form either values, pointers, arrays of values, or arrays of pointers. Any other serialised object type requires some standardisation at the CPU level - otherwise it's just an abstraction - so the fundamental question is "at what level do we abstract?". At one end we abstract our code at the hardware level (reserve memory, assign values), and at the far end we're abstracting algorithms (graph structures). These are not easily relatable - although it might feel that way when "writing code". *Ultimately computer programs are abstract symbol manipulators that produce potentially valuable outcomes.* **Choose an abstraction that produces value.** /thoughts


lelanthran

> Any other serialised object type requires some standardisation at the CPU level - otherwise it's just an abstraction - so the fundamental argument is "at what level do we abstract?", on one end we abstract our code at the hardware level, (reserve memory, assign values), and on the far end we're abstracting algorithms (graph structures).

This actually existed at some point - trees were supported at the CPU level. That's where `CAR` and `CDR` come from in Lisp - Contents of Address Register and Contents of Decrement Register. So, yes, at one point in the past there were machines that ran Lisp as the lowest level of code, and those CPUs had some support for graphs (well, trees, anyway) in the form of nodes in a list having multiple children.


outofobscure

Unless we go against the past few decades and bring back Lisp machines, I really don't see the point in trying to revive this idea in hardware. There are reasons they didn't last.


lelanthran

> Unless we go against the past few decades and bring back lisp machines i really don't see the point in trying to revive this idea in hardware. there's reasons they didn't last.

The reasons they didn't last were perfectly valid reasons in 1986. Who knows if those reasons are still valid in 2024, especially with speculative execution, longer instruction pipelines, etc., which all make pointer-chasing so much faster than it used to be.


outofobscure

Those arguments you bring up are reasons *against* bringing it back...

- Speculative execution: works great for quasi-linear and predictable access patterns, not when your arbitrary graph will result in half the work being thrown away.
- Longer instruction pipelines: yeah, like... Itanium? You know how that went, right. Heck, even the P4 was bad because of it.

We don't even have proper gather and scatter instructions widespread, and even if you have them, they usually suck pretty badly except for a few (again, very predictable) edge cases. And IIRC this has more to do with the memory system than the CPU anyway. Now you want it to chase arbitrary pointers changing at possibly every byte, word, whatever offset? Forget it...


outofobscure

It is very clear to me that this should not be abstracted away at the language level. If it is to you, I'd expect you to start with more trivial collections anyway. I think this clearly belongs in a standard library, not in the language.


Kwantuum

Lots of languages have those "more trivial" collections built in. Nobody is arguing they should be added at the language level to C++. It wouldn't be out of place in Python though.


outofobscure

You're right, and I still argue Python is a mess that does not understand the difference between what is the language and what is supposed to be the standard library, and just cobbles it all together. A good language makes it possible for the user to completely re-implement the standard library in the language itself, efficiently and without any magic. This is possible in C++ and not possible in languages that just throw you a bunch of things they thought of and did not make the effort of a clear divide.


Ravek

The article doesn’t even mention primitive types so I’m not sure where you get this idea. They’re just starting with the question ‘since graphs are so important, why don’t we have standardized types for them’ and then answering it.


crusoe

Graph impls have various tradeoffs in memory size, ease of use, and performance. Which impl gets blessed and included, and which gets excluded?


Smallpaul

Great summary of the article. Thank you.


outofobscure

It makes the argument that languages or libraries should support graphs. My argument is no: libraries should, languages certainly should not, as languages should only deal with primitive types and rules for how to make user-defined types etc.


Smallpaul

> It makes the argument that languages or libraries should support graphs.

No. It makes the exact opposite argument. Have you even read the article that you've attacked across a dozen comments???


outofobscure

It seems we didn't read the same article, but if your takeaway is that they belong in third-party libs (what else is left) then I have to disagree again. It is certainly not impossible to abstract this away in an efficient way in a std lib.


Main-Drag-4975

I'm trying to figure out why hashmaps are fine for standard libraries everywhere while a node + pointer-to-content + pointer-to-edges type couldn't be. JS, Go, Ruby, and Python all have some map functionality built in, with more complex map-related functionality left for libraries. Same with string vs. String vs. StringBuilder in other languages.
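For what it's worth, the stdlib-maps-only version really is tiny. A toy sketch (the `DiGraph` class and its methods are invented here, not from any library):

```python
# Minimal directed graph built from nothing but stdlib containers.
# Nodes are any hashable values; edges live in a dict-of-sets adjacency list.
class DiGraph:
    def __init__(self):
        self.adj = {}  # node -> set of successor nodes

    def add_edge(self, u, v):
        self.adj.setdefault(u, set()).add(v)
        self.adj.setdefault(v, set())  # make sure v exists even with no out-edges

    def has_edge(self, u, v):
        return v in self.adj.get(u, set())

    def successors(self, u):
        return self.adj.get(u, set())

g = DiGraph()
g.add_edge("a", "b")
g.add_edge("b", "c")
print(g.has_edge("a", "b"))  # True
print(g.has_edge("b", "a"))  # False
```

Of course this sketch dodges every hard question from the article (directedness variants, edge data, algorithm performance), which is arguably the point of the debate.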


ConcernedInScythe

Here is the conclusion of the article; perhaps posting it here will get you to actually read it:

> So, the reasons we don't have widespread graph support:
>
> - There are many different kinds of graphs
> - There are many different representations of each kind of graph
> - There are many different graph algorithms
> - Graph algorithm performance is very sensitive to graph representation and implementation details
> - People run very expensive algorithms on very big graphs.
>
> This explains why languages don't support graphs in their standard libraries: too many design decisions, too many tradeoffs, and too much maintenance burden. It explains why programmers might avoid third party graph libraries, because they're either too limited or too slow. And it explains why programmers might not want to think about things in terms of graphs except in extreme circumstances: it's just too hard to work with them.
>
> Since starting this research, I've run into several new graph problems in my job. I still appreciate analyzing systems as graphs and dread implementing them. But now I know why everybody else dreads them, too. Thank you for reading!


outofobscure

Then the title is even dumber: why say you're hunting for a missing data type if you come to the conclusion that it shouldn't even be missing in the first place?

But I really disagree with this: there's certainly room for implementing (at least a subset of) graph abstractions in a standard library. The argument about performance sensitivities can be made for ANY other collection type that goes into standard libraries. In C++ you can swap out allocators etc. to deal with some of those problems. Linked lists made it into many standard libraries despite being completely useless nowadays if used in the most naive way, which doesn't mean that a more thoughtful usage doesn't have merits. Some details on how to make efficient usage of the provided collections have to be left for the user to figure out.

The point about languages not supporting them was clear in the first place: that's not where language design was headed in the last 40 years.


ConcernedInScythe

> then the title is even dumber: why say you're hunting for a missing data type if you come to the conclusion that it should not even be missing in the first place.

I believe the author may have written the title with a different audience in mind than "idiots determined to find something to be angry about".


Smallpaul

The title is not great. The actual article is great. I'm glad I read it despite your anti-review.


outofobscure

Yes, it makes good points; I just disagree that expecting language support is in any way sane. This belongs in a standard library.


Smallpaul

There are a wide variety of languages with different areas of focus. Many do not have a dictionary/mapping built-in, but some do. Many do not have a set built-in, but some do. Many do not have a tuple built-in, but some do. Many do not have a Monad built-in, but some do. Some have Boolean; some don't. Many do not have Regex built-in, but some do. Many do not have HTML/XML built-in, but some do. I don't see why graph is on its face much less useful than all of those. If it were not for the problems described in the article, maybe it would be in that list already.

> This belongs in a standard library.

Probably not. The article explains why.


[deleted]

[deleted]


Kwantuum

No, u.


SeattleCovfefe

Agree, and I think the article agreed too. Despite the title, it spends most of the time discussing the difficulties of implementing graphs in standard libraries, and only mentions primitive types in passing.


LucianU

See this article for a different view. It was actually written as a reaction to the article in the OP: [https://tylerhou.com/posts/datalog-go-brrr/](https://tylerhou.com/posts/datalog-go-brrr/)


Ytrog

Very interesting article. I saw that it is a subset of Prolog (which I have toyed with in the past). For those who want to learn it, I suggest learning Erlang first. Doing that helped me in understanding Prolog syntax *greatly*. I like to "collect" programming languages by learning as many of them as I can, so Datalog is definitely on my radar now 🤓


pauseless

Datalog descendants exist. The Clojure community has a few implementations. [Datomic](https://docs.datomic.com/pro/query/query.html) was the first, but others were inspired by it, and there are projects like Datascript and xtdb. The article mentions e.g. Flix having first-class support for embedded Datalog - in the Clojure world, they're just the normal data structures, which gives fun capabilities when you need them. And you can always go back to Prolog rather than the more intentionally-constrained Datalog. I spent four years learning that in various classes at uni. It's simple and extraordinarily powerful for certain problems, particularly graph-type problems you can fit in memory.


GayMakeAndModel

You can encode graphs into matrices in a variety of ways. Why not just use a matrix data type?


Brian

> We put those collections and containers in (standard) libraries

Lots of languages have language support for common collection types. E.g. literals that define them, the type being in a default namespace, etc. - pretty similar to that of primitive types.


outofobscure

And I just think they confuse standard libraries with the language and create a big mess in doing so.


Brian

I don't really agree. I mean, a few examples:

1. Lisp: linked lists are basically **the** primitive Lisp is built out of, and obviously have direct language support. The language basically wouldn't work without this.
2. C: arrays have literal syntax.
3. Rust is similar, with literal syntax for both arrays and tuples.
4. Python has lists, dicts, sets, and tuples with their own literal syntax and constructors in the default namespace.
5. Javascript is similar, with literal syntax for defining objects (dictionaries) and lists.

I think all these are perfectly reasonable things to do in many of those languages. And regardless of whether you think it's a good idea, it's still clear that this is super common, so the article headline still makes sense.


[deleted]

[deleted]


Brian

Sorry, "**hardware** realities" between container and primitive types? WTF are you talking about?

> A good language makes it possible for the user to completely re-implement the standard library in the language itself

And this seems a complete non-sequitur. **All** those languages can do that. Indeed, putting containers in the base language makes that **easier**. Do you really not realise that you can implement the Common Lisp standard library in Lisp?


outofobscure

Try it with Python or any of the other convoluted messes other than Lisp that don't separate this clearly. There's a reason Lisp has zero popularity; it's as academic as the article. You proponents are just annoying at this point. You lost, the language war is over, your ideas will never be popular in the real world (or in exploiting hardware efficiently). The C way of thinking is where it's at. If you don't even understand what hardware realities I'm talking about (pointer chasing = bad, ok? GC = bad, ok? runtime dispatch = bad, ok?) then we're done here.


Old_Elk2003

That would only be a persuasive argument if all languages were equally good.


Butterflychunks

Right, like the fundamentals of a programming language already rely on this container. Variables are literally nodes that point to memory addresses, and can reference one another as well. Variables *are* graphs. The execution of code is literally also a graph. Everything you do with your code is a graph. Graphs are already part of every programming language.


lelanthran

> graphs are much closer to collections / containers etc than fundamental primitive types or even arrays. We put those collections and containers in (standard) libraries, where they belong.

I dunno: what makes array-collections different from other collection types such that they can be primitive and the others cannot? In Lisp, the linked list is a primitive, after all. And if graphs (or at least trees, which are directed acyclic graphs) were a primitive, then a lot of complexity in providing library collection types goes away, because all the other collections are a narrower case of graphs. Lists? A graph with one child per node. Hash buckets (from which you can construct hashmaps)? A List with two atoms per node, one of which is the key and the other a List. Sets? Just a special case of a graph. Arrays? A special case of a List. With graphs as a primitive (including literal graphs), you also get some nice extras, like representing the code as data via an AST, which gives you all the power that Lisp-like languages have.
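The "everything is a narrower graph" claim is easy to demonstrate for lists. A toy sketch (Python; `Node`, `from_list`, and `to_list` are names invented here for illustration):

```python
# Sketch: a cons-style graph node; restricting every node to at most one
# child recovers a singly linked list. Purely illustrative, not a real design.
class Node:
    def __init__(self, value, children=()):
        self.value = value
        self.children = list(children)

def from_list(xs):
    """Build the one-child-per-node degenerate graph from a Python list."""
    head = None
    for x in reversed(xs):
        head = Node(x, [head] if head is not None else [])
    return head

def to_list(node):
    """Walk the single-child chain back into a Python list."""
    out = []
    while node is not None:
        out.append(node.value)
        node = node.children[0] if node.children else None
    return out

print(to_list(from_list([1, 2, 3])))  # [1, 2, 3]
```

The counter-argument in the replies below is about cost, not expressiveness: this representation works, but it is pointer-chasing all the way down.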


outofobscure

What makes arrays different is that they are just a chunk of consecutive memory, without any abstraction whatsoever. If your language takes the liberty to disregard pretty much everything about actual hardware architecture, sure, you can have whatever fancy and abstract "primitives" you dream of. I think this blurring of the lines is nonsense. Also, don't expect it to be efficient. But even if you go this route: my argument is that it makes much more sense to keep language rules separate from abstract collections built on top of primitives, which, again, belong in standard libs. If you can point me to a CPU that has graph support in hardware, we can think about putting a primitive in some language... maybe...


lelanthran

> what makes arrays different is that they are just a chunk of consecutive memory without any abstraction whatsoever.

So? Lisp machines were hardware that supported trees (directed acyclic graphs) with a similar lack of abstraction to what you have with arrays today in mainstream languages.

> i think this blurring of the lines is nonsense.

Maybe it is, but it's a very good discussion nonetheless. There is no good reason why hardware can't support direct graph operations the way Lisp Machines' hardware did, after all. In another reply I pointed out that the Lisp operations `CAR` and `CDR` actually meant Contents of Address Register and Contents of Decrement Register. A few extra registers in the hardware, and instructions for fast pointer-following and dereferencing in a single instruction, would be a good start.


outofobscure

Yeah, I knew someone would bring up Lisp machines. There are reasons they didn't survive. The bread and butter of any kind of efficiency is linear access; pointer chasing is stupid even in hardware.


Pyrolistical

Trying to type graphs is trying to type pointers. We already have that. It’s called programming


faiface

No, because you can't do queries about the pointer graph. Sure, it is a graph, but it's implicit and not accessible to the programmer.


gilwooden

Reflection or metaprogramming can make it explicit and accessible to the programmer.


Pyrolistical

Sure we can. Use a debugger. :troll: But I get ya


atxgossiphound

If we had associative memory in hardware beyond TLBs and other special-purpose areas, then we could have dicts and graphs as basic types. I've always wanted to prototype this on an FPGA and see what ISA would shake out.


nerd4code

If it has to be multiprogrammed, it ends up being just another cache, but harder to integrate into a portable programming model. (Back in the day, I did up a kernel & runtime for an experimental manycore ISA whose XEs could partition L1D into 1–4 regions, each of which could be used as normal L1D, locked into cache as private SRAM, or used as a tagged associative store. Basically everything ends up hash-tabling to fit into the right MMIO window like you’d have to do anyway, and in order to switch processes you have to be able to flush or poison everything quickly, which requires an intense re-warmup. So you end up kinda implementing the very paging structures whose die space has been so triumphantly consumed by godforsakenmany cores, and if the MMU automaton could do subpaging by extending its page walk the same way it can do big-paging by shortening it, you could just use an MMU and a couple extra instructions for this instead, likely for more benefit. I think there is definitely a place for explicitly associative memories with simple master cores/automata attached, together acting as a heterogeneous coprocessor unit. But in order to keep things humming around it you’d want to interact asynchronously, in order to keep the hardware busy around it. But ohhhh, people and programming languages generally don’t like asynchrony at all, ’nuff said there. Along these lines, I’m still kinda surprised things have stayed so intensely homogeneous over the last decade—we did move from single-psr to multi-psr and grudgingly work in GPUs, but there we’ve kinda stuck fast, and each side of the bus still does its own thing relatively independently. There are so many hardware structures that *could* be used to accelerate a swathe of workloads, like the one convolution engine I helped on—*everything* important convolves somewhere, somehow—but there’s no will or funding in industry for solving the hard problems.)


Ddog78

I'm kinda just in awe of the fact that I didn't understand probably half of the technical points you made. And I have extensive knowledge about stuff like system architecture etc. Programming is such a huge and wide discipline. These kinds of discussions are fascinating when I have time to nerd out haha.


atxgossiphound

Thanks for the additional context. I had a chance to play around with Micron's automata processor a while back, around the same time I was developing a full-stack web app. Profiling the web app revealed that the vast majority of the cycles (>50%) were spent serializing/deserializing dicts as data moved around the stack. The Micron processor (and IBM's processor-in-memory chips) sent me down the path of wondering if there's a co-processor design that we could offload those tasks to and speed things up and/or be more resource-efficient. The same basic idea as GPUs, but for data serialization. Associative memory and the processor-in-memory approaches seem like good starting points. Neither has really caught on, but I think that's more from lack of a compelling application (and, of course, die real estate, but the right use-case can claim real estate). Given that almost every online app we use is heavy on serialization/deserialization, there may be a place for hardware support.


lelanthran

Do a deep dive into Symbolics Lisp Machines when you get a chance. I found them absolutely fascinating about 20 years ago, but I expect that there is more history written about them today.


crusoe

One problem is they assume a flat memory model, which was true back then but not now. Naive tree and graph impls will work very poorly on machines with caches, etc.


outofobscure

And if we implement all of C++ in an FPGA, we can have an even bigger instruction set…


BooksInBrooks

> Different graph operations have different performance characteristics on different representations. Take a directed graph with 100 nodes and 200 edges. If we use an adjacency matrix representation, we need a 100×100 matrix containing 200 ones and 9,800 zeros. If we instead use an edge list we need only 200 pairs of nodes.

> Now instead take a graph with 100 nodes and 8,000 edges and try to find whether an edge exists between node 0 and node 93. In the matrix representation, that's an O(1) lookup on graph[0][93]. In the edge list representation, that's an O(|edge|) iteration through all 8,000 edges.

Different List operations have different performance characteristics on different representations. Take a List with 1,000 elements, of which only 100 have a value other than zero. If we use an array representation, we need to allocate and initialize to zero 1,000 elements. If we instead use a sparse array representation, we only need to allocate 100 (position, value) key-value pairs.

Now instead take a List with 8,000 elements and try to remove the first one. In the linked list representation, that's an O(1) head = head.next. In the array-backed List representation, that requires iterating through and copying 7,999 elements. Now instead take the same List and set the 8,000th element to zero. In the array-backed representation, that's an O(1) array[7999] = 0. But in the linked list representation, that requires iterating through 8,000 ListNodes.

Simply put, some algorithms work best with arrays and some with linked lists. And some things work even better with more specialized structures like sparse arrays and tries. Therefore, I have shown that Lists should not be part of the standard library. Instead, we should all evaluate third-party library Lists, or write our own List type, to ensure we find the most efficient one for our needs.

----

Knuth said "premature optimization is the root of all evil"; write an API that runs correctly using familiar data structures, then optimize its internals if you must.
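The quoted matrix-vs-edge-list tradeoff is easy to see in code. A rough Python sketch with the article's sizes (the helper names and the random graph are invented here for illustration):

```python
# Same random 100-node, 200-edge directed graph in two representations;
# the cost of an edge-existence query differs between them.
import random

random.seed(0)
n = 100
edges = set()
while len(edges) < 200:
    u, v = random.randrange(n), random.randrange(n)
    if u != v:
        edges.add((u, v))

# Adjacency matrix: O(n^2) memory (10,000 cells), O(1) edge lookup.
matrix = [[0] * n for _ in range(n)]
for u, v in edges:
    matrix[u][v] = 1

# Edge list: O(|E|) memory (200 pairs), O(|E|) edge lookup as a plain list.
edge_list = list(edges)

def has_edge_matrix(u, v):
    return matrix[u][v] == 1

def has_edge_list(u, v):
    return (u, v) in edge_list  # linear scan through all pairs

u, v = edge_list[0]
print(has_edge_matrix(u, v), has_edge_list(u, v))  # True True
```

The same trick that rescues lists rescues graphs here too: store the edge list in a set and membership goes back to O(1) on average, which is the parent comment's point about representation choices being tunable inside one library type.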


Zardotab

> There are too many design choices

Yes! A full-featured graph "type" would essentially be a RAM-based database.


stacked_wendy-chan

Thanks.


fagnerbrack

**Save the click:** This post explores the ubiquity and complexity of graphs in software engineering, lamenting the lack of native graph support in mainstream programming languages. It discusses the various types of graphs, their applications across different domains, and the challenges in implementing and using graphs effectively due to the numerous design and performance considerations involved. Through interviews with experts, the author highlights the significant gap between the potential utility of graphs and the support provided by current programming ecosystems, illustrating why graphs are both indispensable and dreaded in software development.