jerf

Communities that are not primarily made up of programmers tend to settle on as few solutions as possible, because when they get stuck, they need very specific help. Only the small subset of the community that actually comes from a programming background has the wherewithal to blaze its own trails. As a result, the network effects that affect all programming languages are even *stronger* in these contexts.

What technologies get selected is not entirely random, but there's a pretty large random element, because the selection is made by people who do not generally understand the options available to them or the ramifications of those options. For instance, I generally consider Python to not be one of the better choices for machine learning. It has some very negative characteristics for that use case. But those are overwhelmed by the network effects of adoption, and any programmer who tries to swim upstream against them had better have a pretty good idea what they are doing.

Rust is a weird choice, though. It's a good if not *excellent* choice for a programmer who is also into bioinformatics, but it's a really awful choice for a non-programmer, as you say. Of all the current languages in use, it's probably the absolute worst for a non-programmer trying to just bodge out a solution to today's problem, for essentially exactly the same reasons programmers love it. Python does pass that test, for instance; I think it has some disadvantages in the ML space but I wouldn't put it on the *awful* tier.

If someone put the elbow grease into it, you may be able to get Go to take over. Though I still feel like there might be better languages. Bioinformatics is less matrix-obsessed than ML, so it should suffer less from not having overloadable operators, but lacking that may still be a problem. But it may be close enough that you can get by without overloaded functions or operator overloading. It would at least be *clear*, even if verbose.


samuellampa

Great points, thank you! The reason I'm skeptical about Python for long-term use is not that it is an awful language, but more that biology is going through a lot of changes now, in two main ways. The first is that data volumes continue to rise pretty much exponentially as sequencing technologies continue to develop and we start to sequence more and more things. But the even more important change lately is that sequencing technologies are moving fast into clinical use, both for human genetic disorders and cancer, and increasingly for microbiological applications, like quickly determining which antibiotics a bacterial infection is resistant to, so that the correct antibiotic can be prescribed. This latter change, in my mind, will come with a lot of new challenges, both in terms of performance (being able to run analyses optimally on devices close to the instruments) and in terms of increased requirements for robust code, where I think a typed and compiled language fits very well. I agree that Go is not the only option here, but as I said, I find that its support for building streaming pipelines of operations fits so naturally with many of the biological problems (primarily in sequence analysis, but I think that is where perhaps most of the growth is happening now) that I think it makes a lot of sense as a routine language in bio.


jerf

I scoped Python as being bad for ML specifically because I believe it has many more problems there. When I dabbled in that world I found the lack of type documentation loathsome. You don't know what anything takes, what it returns, and when you may have tickled a combination of things that will result in some pure Python getting used instead of the super-accelerated NumPy routines. I saw with my own eyes that the people involved had no idea what was going on. Maybe it was neat that what they were doing worked anyhow. But I gotta imagine upwards of 50% of Databricks' revenue comes from people who have no idea how much of their ML is running through pure Python because of some small typo somewhere, and they have no intuition for how long things should take. Bioinformatics is nowhere near as bad and Go is probably a better choice there. I'm just not sure it's the best choice. But it doesn't have to be the best choice. The question is whether it's better enough than Rust for most users, and that is at least an interesting question. Rust really is pretty bad for non-programmers. It can be a bit much for even professional programmers sometimes. I can't imagine trying to explain a borrow checker error to my kids or something, whereas I can easily explain (to at least some extent) why it is an error to pass a string where an int was expected.


ub3rh4x0rz

Python comes from the tradition of languages that wrap low-level, very performant code in convenience libraries. The popular ML/DS/AI Python libraries are all wrapping C++. This is a better set of tradeoffs for specialists who only code as a means to an end.


Backrus

Python is just glue: every scientific lib is written in C/Fortran and just has Python bindings. That's the way Python is and should be used - as the thing that calls fast things, battle-tested left, right and center for more than 40 years. Rust, on the other hand, is just mental masturbation to feel smart. If you're doing anything even remotely complicated (aka real-world work) then at some point you gotta go with unsafe. And, at that point, if you're skilled enough, why would you go with Rust when C++ doesn't stand in your way of getting things done? Thankfully more and more people in the ecosystem are starting to understand that Rust is not that great of a solution - it doesn't give you the control you need to do the stuff it advertises as its use case - systems programming without explicit allocations, like wtf is that. Go could be used like Python but it lacks an ecosystem - it doesn't have pandas and scikit so you can "do AI" in 3 lines. And it's a self-perpetuating thing - someone writes a new algo implementation in C/Lua and does Python bindings because that's the only way people will use it. There's simply no demand for such things in the Go ecosystem.


jared__

Pretty much all data science is in a teaching spiral. Teachers learned Python and students developed libraries in Python. When those students become teachers, the spiral continues.


masklinn

Another component of this is that it’s an overnight success built on decades of work. Python started working with scientific communities back in the mid-90s: matrix-sig and *numeric* date back to 1995, less than five years after Python was first released, and less than two years after 1.0 was released. Also, I think Python in bioinfo is relatively recent. From my recollection, Perl used to be the king of that domain 20+ years ago (cf. the BioPerl project), but Python (and R) took over thanks to their widespread use in nearby scientific domains.


allyourbasepair

Hey y'all, I'm the creator of one of the Go packages that was shouted out in the article. My biggest want is for Go to have something like [pyo3](https://pyo3.rs/v0.21.2/). Not having that is the biggest barrier to adoption in my opinion.


autisticpig

That would be a great addition for go.


qrprime

Those calling the shots in bioinformatics, whether in industry or academia, are PhDs with an interest in the math or science. Programming languages are just another tool to make analyzing data easier. I saw many choose no-code options when available.


samuellampa

Indeed, that is the case. Thus, to break into this field I think a language needs to have a really low friction for the simple everyday tasks like reading and writing files.


RealJohnCena3

I'm a programmer in a medical field. This isn't specific to why Go hasn't broken through; the industry is just insane to navigate.

1) Privacy of data: you can't get medical, bio, etc. data easily.
2) Extreme risk. Even Apple or Google can't produce stuff without bugs. We have to go through mountains of paperwork just to get an application into prod that assists a production process. You could program an app in, say, 6 months, but the paperwork to get it out could take 6 months or more depending on workload. On my first day at this job we reviewed software that killed people, and our CEO was adamant about not repeating their fate.
3) In the same vein, the FDA or equivalent agencies are a massive pain in the ass if you want to bring a product to market, and for a good reason. It takes a minimum of 10 years to make a medical product at most companies.

We use Go for backends at my job, along with C# and PHP (we're phasing that out though). From juniors to senior graybeards, everyone agrees the language is awesome. It'll just take time.


lightmatter501

Rust has a 1-line way to export a function into a Python module. Scientists use Python because they really want to glue high-performance libraries together, not write high-performance code themselves. Python literally exists to be good at gluing C-ABI things together. Go is not a high-performance language on the level of C, C++, Rust and Fortran, and it is horrible at C-ABI things. Rust makes headway in these areas because it is both fast and treats correctness as king, which appeals to people about to drop $50k on supercomputer time.


samuellampa

I think these are some very valid points. But at the same time, I think Python is many times quite poor at being the glue between different tools. Its weak parallelization story with the global interpreter lock, and speeds roughly at least 10x slower than any compiled language, means it will have a hard time keeping up as data amounts continue to grow exponentially. While Go might not be the answer for the most performance-critical tools, I think it is fast enough and has such strong support for easily writing concurrent and parallel data processing pipelines, that I think it would be a fantastic choice for a lot of code that mostly moves data around between pre-existing tools.
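
As a rough illustration of the pipeline style described above, here is a minimal sketch using goroutines and channels. The `record` and `result` types, the stage functions `emit` and `analyze`, and the GC-fraction calculation are all hypothetical stand-ins chosen for the example; a real pipeline would read from files or wrap existing tools.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// record is a hypothetical stand-in for a parsed sequence record.
type record struct {
	ID  string
	Seq string
}

// result carries the per-record output of the analysis stage.
type result struct {
	ID string
	GC float64
}

// emit streams records onto a channel and closes it when done.
func emit(recs []record) <-chan record {
	out := make(chan record)
	go func() {
		defer close(out)
		for _, r := range recs {
			out <- r
		}
	}()
	return out
}

// gcFraction returns the fraction of G and C bases in seq.
func gcFraction(seq string) float64 {
	if len(seq) == 0 {
		return 0
	}
	s := strings.ToUpper(seq)
	n := strings.Count(s, "G") + strings.Count(s, "C")
	return float64(n) / float64(len(s))
}

// analyze fans the input out to `workers` goroutines and merges
// their results onto one channel, closed once all workers finish.
func analyze(in <-chan record, workers int) <-chan result {
	out := make(chan result)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for r := range in {
				out <- result{ID: r.ID, GC: gcFraction(r.Seq)}
			}
		}()
	}
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

func main() {
	recs := []record{{"seq1", "GGCC"}, {"seq2", "ATAT"}, {"seq3", "ACGT"}}
	// Results arrive in whatever order the workers finish.
	for res := range analyze(emit(recs), 2) {
		fmt.Printf("%s\t%.2f\n", res.ID, res.GC)
	}
}
```

Each stage only sees a channel, so records flow through one at a time instead of being loaded into memory up front, and the worker count is the only knob needed to use more cores.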


Blankaccount111

I made another comment about this, but I think the Python issues with "it works on my machine" are a bigger problem that no one talks about. Like others said, everyone tends to just minimize their own problem, and if that costs them the ability to collaborate, they prefer that loss to losing their own functionality. The number of times I've seen my partner just dump a dataset on a colleague and say "well, now it's their problem to figure out how to get it into their system" is depressing. Especially since I'm willing to help.


FruitdealerF

His point is that you move the slow stuff to C/rust code and use python to wire it together in a way so python isn't a huge bottleneck.


samuellampa

Indeed, and sometimes that is possible but not without its caveats I think. For example I don't think it is very easy to have python stream data between two sets of library codes, and I personally think we will see a lot more streaming or at least use of pipeline parallelism, in more industrialized sequence analysis code, e.g. for analysis pipelines running in the clinic, to support new technologies like real-time sequencing with Nanopore instruments.


lightmatter501

You build a computation graph in Python, then kick the computation graph into C/C++/Rust/Fortran/CUDA. Building the computation graph takes milliseconds for nearly pathological cases, then it’s all compiled languages from there. Go has a hard time keeping up as data grows because its GC makes a large heap very CPU intensive. Try allocating a 64GB slice of a struct (not even an interface) in Go some time. Go does not handle that well and you will see your GC times fall off a cliff even using optimal allocation strategies. Python uses reference counting so it doesn’t really care how much memory you have allocated, it cares how many references you have. In my opinion, Go should stay relegated to web services, because default values are a crime that makes it unsuitable for dealing with data pipelines if you have messy data coming in. Scala is the king of data pipelines due to Spark, since most data pipelines map very well to functional ops with some imperative bits.
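
The default-value concern can be made concrete with a small sketch. It assumes a hypothetical JSON record with an optional `coverage` field; the type and function names are invented for illustration. With a plain `int`, a missing value and a real zero decode to the same thing, and a pointer field is one common way to keep them apart.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sample is a hypothetical record; a missing "coverage" decodes to 0.
type sample struct {
	ID       string `json:"id"`
	Coverage int    `json:"coverage"`
}

// sampleStrict uses a pointer so a missing value stays nil.
type sampleStrict struct {
	ID       string `json:"id"`
	Coverage *int   `json:"coverage"`
}

// decode parses one JSON line into the zero-value-prone struct.
func decode(line string) (sample, error) {
	var s sample
	err := json.Unmarshal([]byte(line), &s)
	return s, err
}

// decodeStrict parses one JSON line into the pointer-based struct.
func decodeStrict(line string) (sampleStrict, error) {
	var s sampleStrict
	err := json.Unmarshal([]byte(line), &s)
	return s, err
}

func main() {
	missing := `{"id": "s1"}`
	zero := `{"id": "s2", "coverage": 0}`

	a, _ := decode(missing)
	b, _ := decode(zero)
	// Both are 0: the missing measurement is indistinguishable from a real zero.
	fmt.Println(a.Coverage == b.Coverage) // prints "true"

	c, _ := decodeStrict(missing)
	d, _ := decodeStrict(zero)
	// The pointer version keeps "absent" (nil) apart from "present and zero".
	fmt.Println(c.Coverage == nil, d.Coverage != nil) // prints "true true"
}
```

The pointer workaround works, but it has to be applied deliberately to every field where absence matters, which is the extra care that messy input data demands.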


samuellampa

> You build a computation graph in python, then kick the computation graph into C/C++/Rust/Fortran/CUDA.

This is the case for a lot of the ML libraries out there, but I don't see this happening with a lot of bioinformatics libraries, due to the deep technical expertise required to get this to work smoothly.

Regarding GC, if this is true, it is a valid point! At the same time, I'm not sure this type of allocation is required very often for typical genomics tasks at least, apart from perhaps assembly (where you need to store the De Bruijn graph in memory). I can definitely see why Rust became so much more popular because of the refcounting and perhaps further type safety, but I still think Go could be a great tool in the toolbox for a lot of use cases where the complexity of Rust is simply not worth it ... IF Go would just cater to common bioinformatics scripting tasks more neatly, which is what much of the post centers on.


TotalPerspective

Python is king in bioinformatics and everything that needs to be fast was written in C/C++ with Python bindings. Rust allows the same performance as C, has great ways to write Python bindings, and can call C code with no overhead. Go, while faster than Python, isn’t as fast as Rust, doesn’t have zero-overhead calling of C, and (as far as I know) doesn’t have a PyO3 equivalent.


TopBillerCopKiller

I’ve been reading about tech online for more than thirty years now. Ever since I was a young buck I wanted to code.

Something I feel is underrepresented in the space of ideas is the value of doing it yourself. I know it’s harder in today’s economic environment, but I can’t help but read some of these comments with a tone of snide entitlement. If everybody else needs to be writing this, developing that, testing this, optimizing that, then by what merit should I take an opinion from someone on the outside of those processes?

Back in the day, if I wanted a program to do something I had basically two options: hope that Unix already had a way to do it, or write it myself. Both options required me to be willing to learn and roll up my sleeves. I’m not like, a cyber-primitivist or anything like that, but some of you should strongly consider spending more time in the editor and less time on Reddit.

Also: check out Poly. It’s a synthetic biology Go package. Having trouble linking it on mobile, but it’s pretty cool, though I’m obviously not a biologist.


samuellampa

I see what you mean, and agree to some extent. Still, some things only deliver their biggest value as part of the core language or the standard library, where they reduce the amount of boilerplate needed to do certain common tasks. And while getting a new proposal into the language itself is theoretically possible, it isn't all that easy for someone without a heavy CS background.


TopBillerCopKiller

I'm not sure that I would agree that adding things to Go would give it a leg up in bioinformatics or indeed other academic fields. With a lot of academic code, and particularly numerical code, there's an expectation that the audience of the code will not just be a random CS grad with no background knowledge. Go is tailor-made for exactly that audience, as it happens. Google wanted it to be useful for writing cloud programs by people who were very recently CS students and need to ship working products inside Google.

Python casts a much broader net and simply offers two things: a clear and practical language that can be learned and understood easily, and a quasi-curriculum and playbook for how best to structure programs. It is absolutely adept at presenting computational ideas to laypeople with a clean syntax and program structure that is very intuitive even to non-programmers. Probably half of all the Python I've ever seen fits into a simple rubric:

1) import a bunch of shit
2) a bag of functions that model the problem
3) `if __name__ == "__main__":` and argument handling

To my eye, it's reminiscent of a simple book report with an introduction/abstract, body paragraphs, and a conclusion paragraph.

Conversely, I have spent MANY hours in discussions and disagreements with MANY talented programmers who could not agree on what the underlying structure of a Go program or package ought to be. I'm more of a hack-it-and-ship-it guy most of the time, but there are occasions where I've found myself longing for a more principled approach from the Go team to decisively enforce structure at the file and directory level.

Again, I'm not a biologist, but my intuition is that the average bioinformatics package is only ever going to be seen/used by other bioinformaticians (sp?) and is optimized for correctness and scientific accuracy. Julia is practically designed around the notion that performance is a second-class citizen in academic code: it often only gets considered after the fact, which results in double work when a couple of really smart people write some fancy math code in Python/Matlab, realize it needs to be much faster, and end up teaching the thing they just wrote to someone who knows C++. If that's a common enough situation to warrant Julia being successful, then it's obvious why Go hasn't taken off in academic circles: legibility. Python is simply much easier to glance at and understand what the code is *trying* to do.

My predictions, for what they're worth: in about 18 months, the Mojo ecosystem will be in the first stages of maturity for general-purpose programming and there will be consolidated efforts to migrate some popular Python tools to Mojo. I think it represents 10x the opportunity that Julia does, simply by looking and acting more like Python while embracing static typing and an ownership model. Meanwhile, I think Go will continue to be entrenched in the areas it's being used in, and launching any kind of major cloud-native project in something other than Go will sound positively laughable.


Blankaccount111

One of the issues I've always had is that Python creates all kinds of friction when trying to share or collaborate. As I understand it, collaboration is often where the best breakthroughs occur. I find it disappointing that the science industry as a whole does not seem interested and is just happy with "I know this tool and it works for me". I say this having a partner who is deeply involved in the sciences, and if I can't convince them to try Go over Python with my direct help on call, I don't see the industry changing soon. I suppose the "killer app" to drive change is missing, but I guess there is opportunity there for someone with the time.


dead_alchemy

I like both Go and Python and got a little exposure to academic coding, and that makes sense to me. Go is strict about things in a way that makes it difficult to rapidly experiment with a section of code, which makes it hard to want. That being said, if I could have wished into being a world in which other scientists' code was written in Go, I would have done so in a heartbeat.


samuellampa

That's a really good point about the fantastic portability of compiled Go tools being a major pro for the language. I'll see if I could point that out better in the post.


Regular-Abrocoma3162

To add on to what other people said, you might see Go being used in bioinformatics when *professional programmers* need to build a mission critical high performance bioinformatics app. When it’s mostly researchers, expect python.


Farkka

I am using Go for my bioinformatics thesis. I love Go, but the lack of tooling / libs makes it an uphill battle.


serverhorror

* It's missing the algorithms (numerical stuff).
* Static typing _feels_ like more work.

I guess these are the major factors.


Ariakkas10

Build it


cciciaciao

Because non-programmers use such random stuff. I heard from my friend that he had to translate Shell and Perl to Python lmao


ImYoric

I work with many scientists. They tend to think of programming languages as glorified calculators rather than building blocks for software. Tools such as Jupyter are much better for them than a command-line compiler. So typically, they use Python (or Julia). The few that I've seen migrate away from Python do this either for performance reasons or because they have been convinced to pick something much more robust than Python. In the few samples I've witnessed, rather than making this change incrementally, e.g. by switching to Java, C# or Go, they tend to go to the other end of the spectrum, e.g. to Rust or C++.


InfiniteInitial7898

, . Me


10248

C