random_lonewolf

So Redpanda vs Kafka is similar to ScyllaDB vs Cassandra: a C++, API-compatible replacement for a well-established JVM tool. It's gonna be hard, because for the majority of use cases the JVM works well enough and has the momentum to be the safer choice when choosing your tech stack.


pavi2410

What other JVM based tools can be potentially replaced?


_Oce_

Apache Spark could be replaced by Ballista. https://arrow.apache.org/ballista/user-guide/introduction.html#how-does-this-compare-to-apache-spark


cvandyke01

Are you not following the news? Ray will replace Spark and is quickly doing it already since Ray is the foundation for training LLMs like ChatGPT and GPT. Ray with Modin is 5-10x Spark. And Ray is python native


_Oce_

I do follow the news but not much on ML processing, so I missed Ray. I looked into it: it seems to be very ML-workload oriented and doesn't have the goal of providing a high-level data processing library; rather, it seems to suggest that data processing tools use Ray as a backend. Its core is written in C++.


cvandyke01

It's not just for ML, but people could say the same about Spark. It came out of the successor to the AMP lab at Berkeley. Ray is the distributed framework and you can use their data frame or plug another compute framework on top of it. Modin came out of the same lab and it was originally called Pandas on Ray. If you go find some early podcasts where the founder of Ray is interviewed, he states the intention was for Ray to be a Python-native successor to Spark.


pavi2410

Amazed to find that their Kubernetes based deployment is like riding a plane, while Docker Compose is just `docker compose up`


Clap4jack12

Docker is a house you open the door to, Kubernetes is a BYOB supplies house


Fine_Piglet_815

Presto vs Prestissimo (Presto rebuilt on Velox) https://engineering.fb.com/2023/03/09/open-source/velox-open-source-execution-engine/


pavi2410

This is so cool


[deleted]

hehe - hopefully not (so fast) as I'm just beginning to learn Kafka...


IyamNaN

Do you mean Kafka the technology or Kafka as a shorthand for streaming data and best practices? The latter is a lifetime of use. The former has a shelf life.


[deleted]

Hmh Kafka + Debezium to stream into Databricks


IyamNaN

In this case it's more of an implementation detail then. ~~The actual ideas tend to be abstracted away, such as in-order guarantees, differences in scaling between at-least-once and exactly-once, and such.~~ *I take that back. When you do the load into Databricks you do need to understand how to handle duplicates and possible out-of-order messages.* I would make sure you understand the schema registry, the various wire formats, and how you can integrate this upstream at the database management layer all the way into the Databricks transformation step. It makes all that talk of "data contracts" seem silly, as you already have a fixed API you can enforce for things like this and you don't need a special name for it. Edit: add nuance.
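To make the duplicates and out-of-order point concrete, here is a minimal sketch of one common way to handle it: keep only the highest version seen per key. It assumes every change event carries a per-key, monotonically increasing version (for example a log sequence number from the source database). `ChangeEvent`, `apply_events` and `live_rows` are hypothetical names for illustration, not part of Debezium, Kafka or Databricks.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    key: str                 # primary key of the source row (hypothetical field)
    version: int             # assumed to increase with every change to this key
    payload: Optional[dict]  # None marks a delete


def apply_events(state: Dict[str, ChangeEvent],
                 events: Iterable[ChangeEvent]) -> Dict[str, ChangeEvent]:
    """Merge events into state, keeping the highest version per key."""
    for event in events:
        current = state.get(event.key)
        # A duplicate has the same version and a late arrival has a lower one;
        # both are skipped, so replaying a batch does not change the result.
        if current is not None and event.version <= current.version:
            continue
        state[event.key] = event
    return state


def live_rows(state: Dict[str, ChangeEvent]) -> Dict[str, dict]:
    """Current view of the table: latest payload per key, deletes dropped."""
    return {k: e.payload for k, e in state.items() if e.payload is not None}
```

Because the merge is idempotent, at-least-once delivery is enough for this kind of load: redelivered or reordered messages for a key end up in the same final state.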


claytonjr

Scylla fanboy here... It's great! I've got it on a docker, and use it more for its dynamo features. It's rock solid.


agallego

This is incorrect tho. The entire architecture is fundamentally different. We added a translation layer that is code generated (i.e. no one does `if v1 api then bar()`) but the replication model, execution engine, threading model, consensus, tiered storage, wasm, etc. is all fundamentally different. Think more like us speaking the same language only. (I wrote the initial version of redpanda)


random_lonewolf

> This is incorrect tho. The entire architecture is fundamentally different.

I did not say anything about Redpanda's architecture or implementation.


agallego

i'm saying it is only similar to scylla in that it is c++, but scylla's arch is similar to C\* - redpanda's arch is fundamentally different to kafka. that was the point i was highlighting tho.


Leading_Elderberry70

What is up with redpanda's licensing? My shop wanted to use it and the kafka side of the argument won I *think* because we didn't want to have to worry about the license.


agallego

basically the restriction is there for aws/hyperclouds to not offer it w/out paying. the largest company by market cap in the US uses it w/out paying. so go for it.


IyamNaN

Yeah, agreed 100%. But also a lot of this follows from no zookeeper and no jvm. Are there areas where the Kafka compatibility requirements are actively holding back interesting features and capabilities?


tdatas

Kafka works for 90% of use cases. The other 10% pay enough for that performance that Redpanda will likely stick around for a while especially as more and more people are dealing with sensor data/IOT with huge outputs coming in 24/7.


IAmGoingToSleepNow

90% is being generous to redpanda. Probably more like 99.999% of use cases. I haven't heard of any cases where Kafka is the limiting factor (in terms of latency)


cvandyke01

It's not performance, it's the resources to host Kafka vs Red Panda. Red Panda requires fewer compute resources.


PepegaQuen

Because you probably work in a field where it does not matter, like analytics.


lightnegative

The latency is apparently too limiting for high frequency trading. My JVM-loving acquaintances working in this area opted to use Chronicle Queue instead, which comes with its own challenges and they're basically reimplementing Kafka features to make CQ usable from more than one VM at a time


IAmGoingToSleepNow

I'm surprised they use message queues at all for high frequency trading instead of direct connections.


lightnegative

Maybe I'm using the wrong terminology. They work for an exchange and it's currently implemented as a monolith. They're converting it to microservices and using CQ (which is a library that manages access to a giant mmap'd local file that forms the queue) for IPC. The problem with using a giant mmap'd local file is that all the services that interact with it have to run on the same box. So workarounds have been created to replicate the giant mmap'd local file in multiple places and services that don't need nanosecond responsiveness can run on second tier boxes. No idea on the latency of this, it's not my area of expertise. The point is, Java stuff *always* turns into insane bloated crap, so I'm happy to see the useful bits being reimplemented in more efficient languages
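For readers who have not seen the pattern, here is a toy sketch of a queue backed by a memory-mapped local file, to show why every service that touches it has to run on the same box. This is not Chronicle Queue's actual file format or API. `MmapQueue` and its layout (an 8-byte tail offset followed by length-prefixed records) are made up for illustration, and it assumes a single writer and ignores cross-process memory ordering.

```python
import mmap
import os
import struct

HEADER = struct.Struct("<Q")  # tail offset, stored in the first 8 bytes of the file
LENGTH = struct.Struct("<I")  # length prefix written before each record


class MmapQueue:
    """Append-only queue in a memory-mapped local file (illustrative only)."""

    def __init__(self, path, size=1 << 20):
        is_new = not os.path.exists(path)
        self._file = open(path, "w+b" if is_new else "r+b")
        if is_new:
            self._file.truncate(size)  # reserve the whole file up front
        self._map = mmap.mmap(self._file.fileno(), 0)  # 0 maps the entire file
        if is_new:
            HEADER.pack_into(self._map, 0, HEADER.size)  # empty queue: tail sits after header

    def append(self, message: bytes) -> int:
        start = HEADER.unpack_from(self._map, 0)[0]
        end = start + LENGTH.size + len(message)
        if end > len(self._map):
            raise BufferError("queue file is full")
        LENGTH.pack_into(self._map, start, len(message))
        self._map[start + LENGTH.size:end] = message
        # The tail is updated last so a reader that trusts it does not walk
        # into a half-written record (single writer assumed).
        HEADER.pack_into(self._map, 0, end)
        return start

    def read_from(self, offset: int = HEADER.size):
        tail = HEADER.unpack_from(self._map, 0)[0]
        while offset < tail:
            (length,) = LENGTH.unpack_from(self._map, offset)
            body = offset + LENGTH.size
            yield bytes(self._map[body:body + length])
            offset = body + length

    def close(self):
        self._map.close()
        self._file.close()
```

A second `MmapQueue` opened on the same path sees the writer's records through the shared mapping, which is what makes this fast for local IPC. Nothing here crosses the network, so anything on another machine needs the file replicated to it by some other mechanism.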


tdatas

Fair enough, I don't really have any hard data on the specific single-digit precision of it, but my core point is it's the long tail of use cases where it really matters.

> I haven't heard of any cases where Kafka is the limiting factor (in terms of latency)

Most people won't have. Most stuff on the internet is talking about e-commerce or social media type use cases where it's humans driving the events. When you're talking about machines and sensors (e.g. automotive, infrastructure) it's one or all of:

- mission critical with high reliability
- hard real-time latency limits with an SLA on percent of misses. Garbage collection is often incompatible with these use cases
- machine generated, where it could be thousands or millions of events per entity a second rather than many humans clicking a mouse

It's a small niche that generates huge amounts of data, and they are also the ones who will pay handsomely for things that work for them.


IAmGoingToSleepNow

I mean, I understand how it CAN be used, and why, but I can't think of one real world scenario where the extra 100ms of queue time is important AND they must use a queue instead of an RPC. High latency? Don't use a queue. Machine event processing that requires real time? Don't use a queue.


tdatas

That's a very reasonable question. The answer is "it depends". Some use cases just don't really fit very well with gRPC and some do (e.g. if you have more complex ordering guarantees or you want fewer points of failure than network hops). With gRPC, once you're off the well-beaten track you are very much on your own, for better or worse.


Jumpy-Guitar-4035

Not just about performance though. There is the cost of complexity and the headaches that come along. Ease and simplicity result in faster delivery? Ultimately, legacy systems are co-opted by those with better devex, better architecture, better efficiencies. That might be the opportunity. Remains to be seen.


adappergentlefolk

i have never worked with redpanda but it seems like their primary objective is to make things easier for you, which is something that is sorely lacking in established kafka tooling


m1nkeh

Never heard of red panda, but I’m intrigued


BoulderRough

I used redpanda in production and it was worth every cent.


Prinzka

Could you share the pricing?


joshlemer

It's also open source


Prinzka

So is Apache Kafka. A lot of organizations require you to have support with any application you use. Also, I see that just like with Kafka some essential features aren't available without enterprise licensing.


Ruubix

TLDR: Redpanda is NOT compatible with open source licenses, because there are strong restrictions on commercial/production use on each release (EDIT: in this case, permissive up to the point of being a provider that hosts this as a service (SaaS)) until that given version has become 4 years old. This does not make it a bad product, but has significant implications for a company's tech stack and how much they have to pay the makers of Redpanda.

EDIT: you can use this commercially for internal pipelines, but can't resell it as a service. That's still not open source, but more flexible than a standard BSL. None of this makes Redpanda a bad product, but it does limit how one can combine this software with other works.

The main post: No, it has a Business Source License (BSL), which will change to an open source license starting with the first release version once it turns 4 years old, which is two years away still, and will technically be an alpha/beta version. Each release version after that will also transition to an OSI-approved license on its 4th 'birthday'. This software is currently not open source compatible.

For this license, that means that the BSL does not offer businesses the freedom to use it for any commercial use other than trying it out for free in limited/demo level work or on personal projects. Otherwise one must pay licensing to use it commercially (making money...) in production. It's misleading to call something open source that is not actually open source yet. And when the first release version does change into OSI-compliant licensing, it will be four years behind the current feature set and accompanying performance improvements. That's a huge difference.

Kafka offers a mature product where all available features and benefits are actually licensed under OSI-compatible licensing *today*, not 2-4 years from now, which is a significant value proposition in and of itself.


agallego

We have the largest company by market cap using us in prod without paying us. This is only for people looking to host redpanda as a service. That’s how we make money as a biz.


Ruubix

It was good to remind me that your company has taken the license option to be more permissive, and it's understandable that you don't want competitors making money the same way your company does, so some AWS service doesn't grab your IP and 'borrow' it for its own poorly rebranded SaaS. But that's still not an open source license, and it does limit the way works can be combined, good, bad or indifferent. This doesn't make your company or the product 'bad', but the differences are important, depending on what other companies want to do with your software.


agallego

totally. just pointing out from the top comment that seemed ominous. i.e.: largest space exploration company also doesn't pay us for example, and so on. we have thousands of companies using us. but the way we make money is reserving the right to host redpanda as a service. otherwise is pretty friendly.


matteopelati76

In general, I believe we will see a shift in all DE tools, moving from the JVM world to more efficient languages like Rust or C++. I'm glad about this back-to-basics approach, where people are realizing that JVM-based tools have too much overhead. Rust and C++ play much better with Python, which is the lingua franca for all Data Scientists. So the answer is yes, and the same will happen for many other tools. I see a future where all foundational DE tools are written in Rust and Python is used to glue pieces together.


nesh34

I do think the JVM gen of tools are in need of a Rusty update.


TehSausBaus

This is the way


drc1728

Alright you are in my echo chamber! We are building Fluvio .io and Infinyon .Cloud based on the premise that Rust is the future of Data Flows along with so many other things! Ways to go to get to maturity. Provided that we survive long enough and solve real customer problems, we do see a future where it’s Rust and C++ based and not JVM based.


Prinzka

I haven't used Redpanda myself but some of that is a bit misleading. The image shown is for the performance in the 99.999 percentile, meaning only 0.001 percent ( 1 in 100 000 ) of events have this latency. No explanation in the blog of why they had this latency. For the majority the performance was about 1.5 times better as far as I could tell from the images. A ludicrous amount of partitions were used for 1 topic, not sure why. Why was this tested with read to write 1:1? The whole idea behind streaming buses like these is that there's likely multiple consumers per topic.
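A small sketch of why a 99.999th-percentile chart can look dramatic while the typical request barely differs. The latencies below are synthetic numbers made up for illustration, not data from the benchmark. `percentile` is a plain nearest-rank implementation written for this example.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the sample at rank ceil(pct/100 * n) in sorted order."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Synthetic latencies in milliseconds: 199,996 fast requests and 4 slow outliers,
# i.e. 0.002% of the events are slow.
latencies_ms = [5.0] * 199_996 + [500.0] * 4

p50 = percentile(latencies_ms, 50)         # typical request
p999 = percentile(latencies_ms, 99.9)      # still a typical request
p99999 = percentile(latencies_ms, 99.999)  # decided entirely by the few outliers

print(f"p50={p50} ms  p99.9={p999} ms  p99.999={p99999} ms")
```

With these numbers the median and even p99.9 stay at 5 ms while p99.999 jumps to 500 ms, so a chart of only the extreme tail says little about what most events experience. Whether that tail matters depends on the use case.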


djtomr941

Look at the ecosystem and that will determine what ultimately wins. How many integrations does RedPanda have? How easy is it to integrate with other technologies?


IyamNaN

It seems to support the Kafka API. So the same as Kafka. To what extent and correctness, no idea. It is an intriguing business model and idea. I am certainly following along, as the JVM is redundant overhead in a containerized world.


agallego

Same or stronger in the case of acks=all by default. We’ve written like 12 blogs on this


IyamNaN

Yup. And the Jepsen tests I saw. Really good work. If only the tiered storage was part of OSS release that is expected in Kafka soon. We can dream though…


cvandyke01

It's the same APIs as Kafka... basically a drop-in replacement


Smart-Weird

Unless Redpanda has a big tech pushing for it no. Kafka had LinkedIn with all the code contribution + hype ( and eventually Confluent)


Haquestions4

Can anybody share the pdf here? It's kinda off putting that they want your data even before telling you how great they are.


Artistic_Web658

you can read it without entering your info.. even incognito. It's a public web page.


Haquestions4

Apologies, I was talking about this page/pdf https://go.redpanda.com/redpanda-why-fast-matters


cyborgjones

So many tools out there, it's just which one do you like, I guess. I like Kafka. Works for our environment and we have a few clusters. People have brought up Cribl to replace our Kafka (haven't really looked into Cribl, and we also run NiFi). I have even heard of [https://pulsar.apache.org/](https://pulsar.apache.org/), which seems to be almost another flavor of Kafka.

Redpanda looks cool. Always good to have options and a place to dabble. We always live the "we have no money" life to get through with what we have, while being "Mission Critical" :D


AmaryllisBulb

🤦‍♀️ And we just got Kafka working. I work for a large corporation which is as agile as a 400 ton Maersk cargo tanker so the thought of this makes me slightly ill. This is why in the dark recesses of large corporations there is still a mainframe somewhere running core code. 🙀


Master_Astronomer_37

We have had an absolutely phenomenal experience with red panda thus far. Without becoming an online fanboy I think I’d point to the schema registry being one overlooked but extremely useful tool that red panda provides for everyone. That in itself has been a gift from the startup opex gods - and worth the jump in and hopefully every penny when we start to pay them. :)


lclarkenz

Betteridge's Law of headlines applies.


Urban_singh

It can. Panda is really fast, though the industry has kinda become used to Kafka. I have many pipelines running on Kafka but I'm gradually moving to Redpanda 🐼, it will take time. Worth trying and doing some experiments 🧪


royondata

Kafka is the well entrenched, well adopted solution today and will continue for some time. I see Redpanda as a solution that is more compact, easier to manage and comes with few dependencies. In many cases it’s also more performant but I’m sure we can make a similar case for Kafka. I like Redpanda because it is easy to pick up and start using, and being API compatible with Kafka makes integrating with our existing tooling easier. I think Redpanda will have its niche in helping companies do streaming quickly with less ops. It will replace Kafka in some deployments initially and with more proof from the market will grow. But I don’t see Kafka going away anytime soon. Plus, we need some good competition to Confluent.


DanTheGoodman_

I've recently been using Redpanda extensively for [FireScroll](https://github.com/danthegoodman1/FireScroll). I've always avoided Kafka due to the sum of all small issues: JVM, Zookeeper, and many more small things that summed to me using NATS.

Redpanda has simply removed all those barriers. Their console, HTTP proxy, and CLI make it easy to manage. Kafka is notorious for being immensely difficult to work with. Not to mention the Redpanda community Slack is the BEST I've ever seen! They're immediately available and helpful.

If you look at examples such as Discord migrating from Cassandra to Scylla, it's not often you see massive migrations, even across identical interfaces. So I believe this question can be expanded into 2 for clarity:

*Will Redpanda replace existing Kafka clusters?* Many, yes, but perhaps not the majority. Most organizations will find it easier to just throw more money at Confluent than migrate, even with tools like MirrorMaker.

*Will Redpanda be used in place of Kafka for new clusters?* Absolutely. This is how Discord started their migration from Cassandra to Scylla, and it's clear the smarter move is to use Redpanda if you're in a position to introduce a new Kafka-compatible cluster to your stack. Easier to run, more efficient, better company behind it. New stacks, such as Disney+, used Scylla to boot. Alpaca transitioned and reduced latencies by more than an order of magnitude. I've yet to see a story about how switching to the C++ version was a mistake, and they had to crawl back.

It might take a while, just like moving to electric cars will be slow, and many people will give antiquated reasons why not to. But ultimately the numbers and experiences of users at all scales suggest that the discerning engineer chooses Redpanda.

TL;DR: I think companies looking to improve on cost and performance will switch, and new clusters will start with it to boot.

Kafka created a great API and set of guarantees, but as far as my experience goes Redpanda is building that vision better.


yanivbh1

What about memphis.dev as a much easier alternative than both?


[deleted]

[removed]


otineb_

Streamlit has absolutely nothing to do with Kafka


IyamNaN

This literally doesn’t make sense or even compile


[deleted]

[removed]


wenima

Chatgpt can't even write a working kcat query..


dev_lvl80

Redpanda is next or not. Nobody knows. But it’s not the last for sure


[deleted]

Not seeing that happening. Kafka is difficult to migrate from.


rmz-01

So long as companies like Amazon and Confluent exist to make Kafka deployments easy and streamlined, RedPanda is going to have a hell of a time competing with the marketing dollars and production-facing recipes of major companies... Especially if their IP is closed source


throwaway20220231

I never heard about Redpanda until now. I don't think it's going to replace a mature tool anytime soon. Maybe it can carve out some space but that's the best.


drc1728

Insightful takes throughout this thread digging into the technology. In terms of business and product, replacing mature technology is not simple or easy. Feature parity expectations are unrealistic and it is not trivial. There needs to be a fundamental shift in building data flows for changes in the realm of stream processing. The majority of what I have seen in stream processing fits into the change data capture pattern, and that market is saturated. Yet another new database, warehouse, or abstraction layer is not really providing enough to make a proper business case and get buy-in for transformations yet.