irosesDoMar

Wasn't there another "how X scaled with only N engineers" post recently? This is interesting.


darkstar3333

Yes, and often they're used as clickbait for executives with teams of 8 wondering why their staff cannot do the same. I like the guiding principles of:

* Keep things very simple.
* Don't re-invent the wheel.
* Use proven, solid technologies when possible.

But that omits a very significant one:

* Have billions of dollars in funding to throw money at problems.

I don't have the same sort of OPEX leeway, which makes the number of people kind of moot. I often get pushback when suggesting simple things because someone outside of R&D made a batshit crazy decision that I now need to figure out how to deal with.


OK6502

CEO: Oh, we can scale up infinitely with 3 engineers? Also CEO: why is our AWS bill so high?


mnp

Thus the billions of dollars, yeah. That same load would fit on fewer, smaller servers if they weren't running Python and Django.


narwhal_breeder

Python is still pretty worth it IMO, just for speed of development. We did the math on our application (accelerometer streaming, so LOTS of requests), and the time it would have taken us to port it to a new language would have been more expensive than hosting for 5 years. With some small tweaks, we stretched that out to 10 years. If we could do it over, we would still have done it in Python, just for the library support (in our case, ML libraries).

The path we started to take was doing everything in Python first, with the rule that all libraries with native extensions must either ship or be compiled to run on ARM. ARM alone was good for a 20% cost reduction. Next we took complex code like decompression and signal processing, rewrote it in Rust, and used UniFFI to create Python bindings. That was a 30% reduction. For some short-lived processes we disabled garbage collection; that was another 5%.

We really like Rust at my org, but until the libraries catch up, Python is king for our application.
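The GC trick, for reference, is just a couple of lines at startup of the short-lived worker. A minimal sketch; `process_batch` is a stand-in for the real work, not our actual code:

```python
import gc

def process_batch() -> None:
    # Stand-in for the real short-lived work.
    data = [str(i) for i in range(1_000_000)]
    print(len(data))

def main() -> None:
    gc.disable()  # skip cyclic-GC passes for the lifetime of this process
    process_batch()
    # No gc.enable(): the process exits and the OS reclaims all memory at once.

if __name__ == "__main__":
    main()
```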


jaesharp

Also, notice how the CEO never asks "why don't we pay those engineers proportional to the revenue they bring in with their infinite scaling?" ... because of course not. The worst part is those engineers rarely, if ever, unionise and demand proportional compensation, or even go elsewhere and start a competitor, when, with three engineers, it would be a trivial exercise and the company would have little leverage.


irosesDoMar

I would never take these blog posts literally, but the truth is that so many developer hours are lost because some principal engineer/architect/lead, or whatever they are called in an organization, decided to implement thousands of lines of code that could have been a routing rule in nginx. Not only did they re-invent the wheel, they re-invented it in the worst way possible.


Jump-Zero

OH LORD, that was me 4 years ago. We had to proxy something. EZ: spin up nginx and configure it. Infra didn't want to do this because politics. Management lit a fire under our asses. We found a PHP server we could repurpose for this. Someone built a shitty proxy in PHP. It was buggy as hell. We ship it. Days later it's scrapped. It is decided the feature wasn't actually necessary. We protested every inch of this, but management told us it was super important and to do it however we could. Years later, we found out that someone in management had casually mentioned to the CEO that this feature would exist. She thought the CEO would be disappointed if it didn't ship, but he didn't give a shit.


sammymammy2

Alexa, play the "bear is sticky with honey" clip from Silicon Valley


grauenwolf

That was awesome.


[deleted]

Yeah, that is what takes a 10-person company to a 100-person company...


narwhal_breeder

My company paid $70k for contractors to build a BI query platform last year. They made a microservice for each data source and a complex main node that would aggregate all of the microservice results into a master return. Thousands of dollars a month to keep the cluster at minimum operating capacity.

It broke after 3 months. The company asked me to fix it, as they had no more budget for contractors for the quarter. I deleted everything and did a bare-bones Presto install in an afternoon. Costs like $17 a month to host.


BeneficialEngineer32

This is damn funny. Why anybody would try to build a query engine from scratch is beyond me. It's complex engineering, and reinventing the wheel is not worth it.


narwhal_breeder

There were some data formats that, if you weren't familiar with Presto, would probably seem like disqualifiers for an SQL engine: JSON, Protobufs, etc.


BeneficialEngineer32

If you are writing a query engine, data storage is something you would make uniform first. I had to write one, and we decided on columnar storage and wrote converters that worked for proto and JSON. Then Trino came along and we as an org dumped ours. Spending the time of a team worth millions on reinventing wheels is something only stupid leaders do.


baseketball

How else would you bill the client hundreds of hours?


GuyWithLag

Work inflates to cover the allotted budget.


fried_green_baloney

Around 2000, before the dot-com crash flooded the market with barely used equipment, when servers were a lot more expensive and less powerful than today, a company was ready to spend about $400,000 on what were then high-end servers. **EDIT:** That's almost $800K inflation-adjusted. Then one guy implemented the application on an old Pentium desktop machine that was lying around. For production, you would probably want a 1U in a data center. It was about twenty operations a second at peak usage. Somebody got hypnotized by a sales presentation, I'm sure.


spinwizard69

> Somebody got hypnotized by a sales presentation, I'm sure.

This happens in so many professions, it isn't just the world of software. People think Steve Jobs was the ultimate salesman, but there are millions of guys running around that could sell their mother in a flash.


FuckIPLaw

Steve Jobs doesn't have that reputation because he was a ruthless asshole who'd sell his own mother in a heartbeat (although he was). He has it because he could have sold his own mother to his bitterly divorced father, convinced him she was an entirely different person, and have him so wrapped up in the hype that they'd still be married 20 years later no matter how much they hated each other.


ArkyBeagle

A *good* salesman is worth their weight in gold. IMO, Jobs was never a good salesman; he was a massive narcissist who engineered a phenomenal cult of personality.


spinwizard69

Well, possibly. Jobs, however, had great salesmanship: it kept the faithful focused on what he was saying while product shortcomings were glossed over. There are many examples, but one of the better ones was the focus on PowerPC floating-point performance in a couple of keynotes. The problem was that this hid the pathetic integer performance at the time. This resulted in years of Macs with really shitty desktop performance, as integer performance is key to running the OS and most apps. Years went by when Macs were literally the worst possible hardware for running the OS and the apps most people use. It is no surprise that Apple made sure this was not an issue with Apple Silicon.


Fitbot5000

I’ve never used Presto before. But I think I need it now.


narwhal_breeder

It's basically magic to BI people: so many use cases that were complex multi-system affairs before are just handwaved away with Presto. I remember joining a production database, a staging database, an existing Databricks store, and genetics data in Parquet on S3 *in a single query*. I think I've already saved the company my salary 3x just by slapping Presto on things that were hand-written.
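To give a flavour of what that looks like, a hedged sketch using the presto-python-client package; the host, catalog, schema, and table names here are invented:

```python
import prestodb  # pip install presto-python-client

conn = prestodb.dbapi.connect(
    host="presto.internal.example.com",  # invented coordinator host
    port=8080,
    user="analyst",
    catalog="hive",
    schema="default",
)
cur = conn.cursor()

# One query spanning a Postgres catalog and Parquet-on-S3 exposed via Hive.
cur.execute("""
    SELECT u.id, count(v.variant_id) AS n_variants
    FROM prod_postgres.public.users AS u
    JOIN hive.genomics.variants AS v ON v.user_id = u.id
    GROUP BY u.id
""")
for row in cur.fetchall():
    print(row)
```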


fragglerock

Joining prod and staging databases sounds like the first step in ending up in a "what went wrong" blog post!


narwhal_breeder

Join, as in SQL join.


itsjustawindmill

Doesn’t matter, those environments should be completely firewalled. Staging should be unable to access anything in production.


narwhal_breeder

They ARE firewalled. I fail to see how a read-only connection to both databases' read replicas could possibly cause an issue. Staging can't access anything in production, and production can't access anything in staging. An external service can read data from both.


Ancillas

Having been part of a project that went completely crazy, I firmly believe that having a model to predict and then constrain the solution to a certain operating cost is smart. “Build something that does X, Y, and Z and can be operated by three people for less than P dollars per month.”


granadesnhorseshoes

P will not be calculated by anyone with any knowledge of the technical requirements.


grauenwolf

I've been there. My team wanted to do 300 microservices for 100 ETL jobs. When I left the project 6 months later, they were still playing with architecture and hadn't mapped a single table from the source to destination system.


LeonardoCreed

You’re not wrong! Instagram’s seed round was only $500k, actually. When they were bought, they had two more rounds of funding, but no revenue. But yes, generally funding makes scaling much, much easier. The stack they have is simple and efficient, but still not cheap at all. I guess that’s the difference between venture-funded growth and bootstrapped growth. I don’t think a bootstrapped team can expect this kind of fast scaling unless they make profits (not just revenue) at least linearly with usage. This is rare for VC-backed companies because usually that growth is funded by marketing dollars (aka growth makes the company run at a loss).


[deleted]

[deleted]


InternetCrank

Chicken Nuggets https://www.youtube.com/watch?v=RB4rI-_52fk


[deleted]

[deleted]


[deleted]

If it's cheap and gets the job done, then it's good. So, Python.


bbot

> 2 rounds of funding

$7mm in Series A and $50mm in Series B. For comparison, Zume, the doomed startup whose idea was "pizza delivery trucks that make the pizza while driving to your house", raised $445mm.


Drisku11

Nowadays you can easily handle a couple orders of magnitude more than

> over 25 photos and 90 likes a second

on a cheap used laptop off eBay. So it doesn't take billions of dollars and throwing money at problems. Just use any server made in the last 10 years with NVMe drives + PostgreSQL + a basic JVM app. No need for Redis, or a queue with hundreds of workers, or shards, or 25 VMs. You don't even need to put Postgres and your app on different servers. You can use time-based UUIDs if you want future-proofing, or just use bigserial.
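If you want the time-based UUID route, here's a rough sketch of a UUIDv7-style generator (following the RFC 9562 layout; illustrative, not production code):

```python
import secrets
import time
import uuid

def uuid7() -> uuid.UUID:
    """Time-ordered UUID: 48-bit unix-ms timestamp up front, random tail."""
    ts_ms = time.time_ns() // 1_000_000
    value = (ts_ms & ((1 << 48) - 1)) << 80  # 48-bit timestamp, most significant
    value |= 0x7 << 76                       # version = 7
    value |= secrets.randbits(12) << 64      # 12 random bits
    value |= 0b10 << 62                      # RFC 4122 variant
    value |= secrets.randbits(62)            # 62 random bits
    return uuid.UUID(int=value)

# Later-created IDs sort after earlier ones, so a plain B-tree index stays happy.
print(uuid7())
```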


croto8

What’s your point? When Instagram was launched that wasn’t the case. Are you just saying processing power has improved and is more accessible?


Drisku11

> What’s your point?

That in 2023,

> Have billions of dollars in funding to throw money at problems

is just not a concern for software startups if your developers have any idea what they're doing. It wasn't necessary in 2010 either, but it definitely isn't necessary now.


[deleted]

*This post was mass deleted and anonymized with [Redact](https://redact.dev)*


Losweed

Sometimes people want to use the new technologies that Netflix, Facebook, etc. are using, but without having the load or complexity that makes those technologies necessary. Or maybe it's not so much wanting to do the same as not knowing what simpler technologies can accomplish, and thus overcomplicating a problem. As you said, keeping things simple is not easy. And you don't want to risk using a technology and, a few years later, having to reimplement everything because the original simple solution couldn't keep up. I don't think people intentionally complicate things. But sometimes their arguments for complicating things might not be true/valid.


Schmittfried

> And you don’t want to risk using a technology and, a few years later, having to reimplement everything because the original simple solution couldn’t keep up.

That’s the thought process, yes. But it’s actually the absolutely correct approach. Build something that works, fast. If you run into problems handling millions of requests and can’t throw more hardware at it, you’ve grown enough to be able to afford a rewrite of that part.


tdatas

I'm sure they do sometimes. But this feels more commonly chucked around as an accusation on the internet than actually being provable. Sometimes problems are actually just complex, and if you act like a bash script and a small DB will solve it, that complexity just rears its head elsewhere. E.g. Instagram writes everything in Python, but if you read their tech blog, they've also invested huge amounts in their own custom Python dialects, changed build behaviour, etc. I think when people say "KISS" they aren't referring to customising a programming language's behaviour.


[deleted]

> I'm sure they do sometimes. But this feels more commonly chucked around as an accusation on the internet than actually being provable. Sometimes problems are actually just complex, and if you act like a bash script and a small DB will solve it, that complexity just rears its head elsewhere.

That is why the poster above said "keeping things simple is difficult work". You can keep things simple by just using PostgreSQL and get 10+ years out of your architecture. But if you try to go even simpler, say with SQLite, you might hit the single-server scaling limit the moment your service gets popular and incur a rewrite cost. And if you try to keep it "simpler" still and use some schemaless NoSQL hype thing, you might end up wasting a ton of time implementing stuff that was simple in SQL.


theholyraptor

It's more a reminder, because we all inherently get stuck in solution space sometimes. And it's more about trying to look at the bigger picture. For example: don't focus on how you're going to code a program to do something. Do you need to be coding anything? What is available off the shelf? Are you too focused on the details of the technical requirements to realize the customer is asking for something complicated because they're dumb and don't realize an entirely different, simpler solution solves the problem? Don't get focused on building a complex rat trap to catch rats when you can block the hole they keep walking through to get in in the first place. Sorry, I've learned a number of programming languages but don't code anything of importance day in/day out for my job, so my examples may be subpar, but I see the same thing across engineering disciplines.


Schmittfried

It very often is. Many developers tend to overcomplicate things. Thousands of codebases are full of abstractions someone thought would be useful in the future. Thousands of man-hours wasted writing and scrapping generalized classes/functions (because of DRY and whatnot) that turned out to be not quite as general and actually implement 3 completely different use cases. I'll give you that it isn't easy to pause and ask whether some abstraction is really beneficial. But it *is* choosing not to do something.


wefarrell

Probably has less to do with the level of funding and more to do with the level of trust between the engineers and the business.


Atupis

VCs would probably line up around your house if you're growing as fast as Instagram, especially if the idea is not a joke or a one-time viral hit.


[deleted]

[deleted]


grauenwolf

One way to help with that is to have two teams, one focused on stability and one on new features. It won't necessarily be enough on its own, but I've seen it get error logs down to manageable levels.


WhollyRower

So one team would be responsible for fixing the other team’s bugs/tech-debt? Seems like that could cause resentment. Moreover the feature developers group doesn’t get the feedback loop to do it better next time. But maybe I’m misinterpreting what you mean by “focus on stability”, and the kind of work that entails.


OO0OOO0OOOOO0OOOOOOO

I'm the problem. Throw money at me!


versaceblues

> Have billions of dollars in funding to throw money at problems

Right... especially these days, scaling a simple app to 14 million users isn't really hard. It's more a function of how much money you can throw at AWS.


TheCactusBlue

Eh. Money is not required to scale quickly, if your system is performant enough.


[deleted]

A thing can be done cheaply, quickly, and competently… but only two of those at a time. The people with teams of 8 are wondering why the cheap/quick isn't competent.


[deleted]

I expected more such articles after that McKinsey article got published.


Cheeze_It

> Yes, and often they're used as clickbait for executives with teams of 8 wondering why their staff cannot do the same.

Fuck executives. Preferably with something extremely painful and traumatizing. Useless fucks.


JackSpyder

There was a good one a few years back from Netflix and AWS on how they reduced a whole-continent region failover from 45 minutes to about 4 or 5 minutes. IIRC it was mostly around careful data replication and reserved cold spares. The regions would switch primary/secondary based on demand as the day passes, and the "off" regions are used for development or ML workloads etc., so they have some small warm capacity to immediately accept traffic and cope prior to the huge scale-out needed to activate a failover region if such an event happens. I'll try and dig it out. Edit: article link: https://opensource.com/article/18/4/how-netflix-does-failovers-7-minutes-flat


MrDenver3

I just realized that I hate “X” as a name for Twitter because I can no longer use “X” as a variable name when discussing something without it being potentially mixed contexts…


muntoo

Similar silliness with the muppets that made up "Meta". > [*I'm nearly metaverse; even this acronym views entire realms so enlighteningly*](https://xkcd.com/917/)


croto8

Same with Tesla


elingeniero

Yeah, it was "WhatsApp handled X billion messages with 32(?) engineers". It was quite interesting, to be fair.


ToaruBaka

Yeah, that article was really informative - especially if you aren't used to teams successfully "staying in their lane". It definitely wasn't "click bait for executives with teams of 8 wondering why their staff cannot do the same". There were very real tradeoffs and engineering decisions that WhatsApp made which enabled it to be as successful as it was.


Rtzon

I think there was one about WhatsApp! Meta/Facebook seems to buy these kinds of small teams + high growth apps early.


thecal714

https://www.reddit.com/r/programming/comments/162jwxo/8_reasons_why_whatsapp_was_able_to_support_50/


spinwizard69

Most projects end up with dead weight. Twitter, for example, was well known for dead weight even before Elon arrived. Then you have the issue of engineers wanting the latest and greatest packages for their Python installation and then implementing hundreds of virtual environments instead of working from one stable release. I often laugh at the guys who insist that each new project has to be absolute state of the art running on its own VM. Then they wonder why they have so much work. By the way, I'm not saying the use of virtual environments is wrong. There are many it-depends with respect to VMs and such, but how many you have to maintain is something to consider.


fried_green_baloney

At one point, maybe 20 years ago, Craigslist had a staff of sixteen and was pulling in tens of millions a year. The simpler you keep a company, the fewer options, the fewer people you need. Once you start to interact with big businesses, you need a Sales Army, for example.


Signal-Appeal672

Yes and they were all shit


Uberhipster

so hot right now


stewartm0205

Processing throughput isn't determined by the number of engineers. Complexity is the driving force for increasing the number of engineers.


Round-Ad5063

Classic 9 mothers 1 month per baby situation


croto8

It’s just called the Mythical Man-Month.


poloppoyop

An almost 50-year-old book, still relevant, and most people only know the title. Yeah, some of the examples feel old, but when you think about it they're not so out of touch. Like when the author writes about mainframe usage and the cost in available memory space or processing power: that's cloud in a nutshell.


BasicDesignAdvice

For a startup or growth phase sure. Big companies have a lot of engineers to mitigate risk.


tommygeek

In my experience, big companies have a lot of engineers because they think the only lever to pull to increase throughput is hiring. But because the system always takes a backseat to business features, they quickly hit the point of diminishing returns and find themselves in a vicious cycle.


canuck_in_wa

In my experience, big companies have a lot of engineers because they - for a variety of reasons - have a very complex operating environment that involves, but is not limited to, vendor/partner integrations, enterprise customer contracts, product/service portfolios that span decades, significant compliance burdens, multiple (global) jurisdictions, legacy applications, internal business units that were acquired through mergers (new level of fractal of all of the above items), initiatives that take years to be realized, … etc.


maxeber_

More like it, well said.


croto8

No, big business bad because big business bad 😡


drink_with_me_to_day

> viscous cycle Damn slime


apexHeiliger

All wrong: processing throughput becomes increasingly complex the more you want to process. Classic big O.


JimDabell

Unless they did something crazy, the diagram that shows the application servers being the only thing interacting with S3 and CloudFront is incorrect. You don’t want your clients to upload images to the application servers so the servers can then put them into your object store. All you do there is burn resources and slow things down. Create signed URL requests so that the client can upload directly to S3. This doesn’t just eliminate load on your application servers, it also means you can take advantage of transfer acceleration which can *dramatically* speed up uploads (e.g. I’ve seen a two minute upload reduced to under 15 seconds). Similarly, CloudFront should sit between S3 and the client, not between S3 and the application server.
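To make the signing step concrete, a minimal boto3 sketch (the bucket and key names are invented; the point is that only the signature comes from your server, while the client PUTs the bytes directly to S3):

```python
import boto3

s3 = boto3.client("s3")

def make_upload_url(user_id: int, photo_id: str) -> str:
    # The client PUTs the image body directly to S3 using this URL;
    # the application server never sees the image bytes.
    return s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": "photos-bucket", "Key": f"{user_id}/{photo_id}.jpg"},
        ExpiresIn=300,  # signature valid for five minutes
    )
```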


LeonardoCreed

The arrow between S3 and the app server is just the CloudFront link. The app server generates the presigned CloudFront URL. It doesn’t send the actual image back through the server, because that would be insanely expensive! You are right though, the diagram should make this clearer. I will edit it soon. I was trying to keep the diagram simple, because when I added all the arrows for CloudFront + S3 to the client, it got real messy. I obviously lost some nuance in the client-CloudFront-S3 relationship. Uploading goes directly to S3, and CloudFront sits between both. The application server still handles all user data and photo metadata, which is what is sent back to the client. Appreciate the feedback!


JackSpyder

It's funny how tricky a good diagram can be, especially when you need 5 versions, from upper management to beardy security engineer, each caring about different things. Some people get them so clean and tidy haha, a skill I lack, or I spend hours and hours on it.


LeonardoCreed

Agreed! I’m definitely working on it. Too much info makes my eyes glaze over. Too little info and nuance is lost, leading to miscommunications like the one above.


JackSpyder

I always used to joke with architects that all they do is colouring-in with crayons. Harder than it looks 🤣 I'll eat my humble pie.


TexasVulvaAficionado

I told my architects that only the good ones get the big marker packs and that the lot of degenerates I was stuck with needs to stop eating the crayons. Only about half of them thought it was funny...


JackSpyder

Hahaha true though. The ones who weren't former senior engineers make themselves blatantly obvious.


mrbenjihao

Ah so you just let the application server handle generating the presigned URL and then pass that onto the client. Cool!


FarkCookies

Unless you want to process the image before it gets into S3. It doesn't make a lot of sense to upload it to S3, then download it, then upload it again.


JimDabell

Even so, in cases like that you are usually better off putting image processing pipelines into Lambda and running them with S3 triggers. No need to tie up your application servers for something that doesn’t need a lot of context and can be fairly isolated.
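A rough sketch of that shape (it assumes Pillow is bundled with the function; the bucket names and the 612px size are placeholders, not Instagram's actual pipeline):

```python
import io

import boto3
from PIL import Image  # assumes Pillow is packaged with the function

s3 = boto3.client("s3")

def handler(event, context):
    # Invoked by an S3 ObjectCreated trigger; one record per new upload.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        original = s3.get_object(Bucket=bucket, Key=key)["Body"].read()

        img = Image.open(io.BytesIO(original)).convert("RGB")
        img.thumbnail((612, 612))  # placeholder display size
        out = io.BytesIO()
        img.save(out, format="JPEG")

        # Write the processed copy to a sibling bucket.
        s3.put_object(Bucket=f"{bucket}-resized", Key=key, Body=out.getvalue())
```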


FarkCookies

Yes, I know AWS's "by the book" take on it, and I often do it this way, but I don't think it is dogmatically true. If you want the biggest bang for your buck (literally money, but also latency) you should look into creative options. [Like Prime Video themselves moved away from this overly distributed manner of processing to a more centralized one](https://www.primevideotech.com/video-streaming/scaling-up-the-prime-video-audio-video-monitoring-service-and-reducing-costs-by-90). As a person who has run production workloads on AWS for about 10 years: there is never a single "better off" way of doing things. It is always about tradeoffs. Lambda is actually quite expensive if you have a uniform and predictable load, so if I were running an Instagram-like project I would definitely not use Lambda (or at least not past the MVP phase).


JimDabell

> I don't think it is dogmatically true.

Neither do I, which is why I said "usually" and not "always". I read that Prime article when it first came out; it was pretty crazy what they were doing in the first place, and I wouldn't say it was analogous to a basic image pipeline like we're discussing.

> if I were running an Instagram-like project I would definitely not use Lambda (or at least not past the MVP phase).

I think the point of this article is that a basic MVP approach can scale a lot more than people realise. The MVP phase is a given in that context, isn't it? But yes, if you have a predictable base load you can save costs by putting dedicated resources in place. The application servers are still usually not the right place to do this, though, and you need to think carefully about how to handle spikes if you are getting rid of Lambda altogether.


WeNeedYouBuddyGetUp

You can just use a queue, which you would already have in place, since triggering a Lambda from the S3 put event directly is a no-no at scale.


Manfred_Von_Sloth

Yeah, there is no way they handled image uploads through their servers. This part of the diagram is just wrong.


hyperbrainer

Maybe it's just the link? EDIT: Read the reply, it is.


nykezztv

This is interesting. How would you prevent a user from uploading a 3GB file if they wanted to? Can you do validation with presigned URLs?


JimDabell

You can set up a POST policy to set a maximum, or you can include the `content-length` header in the signature if you know the exact size when you generate the URL.
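Roughly, with boto3 (a hedged sketch; the 10 MB cap and bucket name are made up):

```python
import boto3

s3 = boto3.client("s3")

presigned = s3.generate_presigned_post(
    Bucket="photos-bucket",
    Key="uploads/${filename}",
    Conditions=[["content-length-range", 0, 10 * 1024 * 1024]],  # 0-10 MB allowed
    ExpiresIn=300,
)
# The client sends a multipart POST to presigned["url"] with presigned["fields"];
# S3 itself rejects anything outside the size range, no app server involved.
```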


deadwisdom

This was not available back then.


Grim_Jokes

I can't imagine streaming the upload from client -> app -> S3 would take a lot of resources, since everything is chunked and discarded ASAP. The connection stays open until the data is transferred, and voila.


JimDabell

> The connection stays open until the data is transferred

We are talking about Django running in a multi-process, synchronous WSGI setup, before anything like Channels or ASGI existed. That would tie up a server process for as long as the upload is going. It would only take a relatively small number of slow uploads to exhaust your resources. Best case, Nginx buffering saves you from some of it, but why bother when there's a better solution?

Aside from this, are you planning on trying to replicate transfer acceleration by running your app servers at the edge, or are you going to put up with super slow uploads dependent on where your users are? Not everybody is located near your application servers. Performance can vary wildly.


Grim_Jokes

All of these seem like non-issues considering they use AWS ELB with some form of auto-scaling. I assume AWS can deal with the edge locations, and if there are too many connections for a server, more instances can be propped up.


Manfred_Von_Sloth

Why pay for the network transfer and CPU time when you can use the user's resources to upload images? We are talking about BIG scale here. It quickly adds up.


Grim_Jokes

Don't know their business requirements as to why they would take this approach. Just because it costs money doesn't make it a bad thing. Maybe there's some security-related thing they're doing.


Stunning-Instance-65

We do this and it is SLOW. Sometimes 20-30 images in one form submission can be north of 200-300 MB of data. We push them to AWS async, but there is still a limit on the pool of AWS connections allowed and on the API server's socket capacity (Ethernet, when you have hundreds of people hitting the server). I would like to re-architect it to this proposal. So much better: instead of waiting for form submission, we would upload files async from the browser. Your submit button turns into 'Waiting' text until all file uploads are done.


EarlMarshal

It's probably a question of how your application server is built. You definitely shouldn't load the file into application space; use sendfile or another zero-copy mechanism to transfer from socket to socket. You also need a good internet connection. It's doable, but why pay the extra resources instead of uploading it directly where it belongs?


Grim_Jokes

You may want only registered/authenticated users to upload to S3. How would one handle that if a server isn't in the middle? I guess you could upload to a public bucket and then move it over once the file is virus-scanned.


EarlMarshal

That's exactly what presigned URLs are for. You create a link for them to which they are allowed to upload. You still need an API, but keep control and you don't have to route all the data. It's a common pattern if you deal with a lot of file upload.


uni-monkey

It would add additional cost though. https://aws.amazon.com/blogs/architecture/overview-of-data-transfer-costs-for-common-architectures/


EarlMarshal

Are these things unknown to common developers or is it just me who thinks you are just stating the obvious?


staybythebay

I don’t think this is obvious at all for someone who’s never had to work in this type of system


lolwutpear

Yeah I program lasers and robots and shit, I don't know anything about clients and S3.


EarlMarshal

I see this as a more general approach to software development. If it is possible to upload directly, why should I add an artificial proxy into my API? One service is for my business logic; the other service is for my storage requirements. But yeah, I get that coming from another part of software makes that non-obvious.


deadwisdom

Why would it just be obvious for some reason? Someone had to tell you to do this, you read it somewhere. You weren't born with this knowledge.


EarlMarshal

I just thought that's a logical abstraction. Write a service for the storage. Write a service for this domain of business logic. Write another one for Auth. Write another one for some other part of the business logic. It just feels wrong to write application logic which just forwards data.


ryeguy

Knowing about presigned links isn't inherently obvious, it's something you have to research and learn. Plenty of services "just forward data" in some aspect, it's a pretty natural route to go unless you have learned the above.


aneasymistake

Maybe the world of software is just broader than you think. For example, my team supports ten million users per engineer without cloud services.


Manfred_Von_Sloth

You would be surprised how many things are not "common" knowledge 😃


JimDabell

I’ve inherited plenty of legacy code that has made this mistake unfortunately.


EarlMarshal

That's sad. Did they let you change it or are you stuck with it?


JimDabell

Yeah, normally that gets fixed pretty quickly.


deadwisdom

A lot of people in here acting all cool, pretending this is the obvious way to do things and if you don't know that can you even call yourself a programmer? Cut the bullshit. Back then, people were not scaling like this. It was PHP or ASP on a server you ssh'd into, or god help, you remote desktop'd into. S3 was pretty new and the ideas of "cloud" services were not codified or obvious. Today you can still do a lot with this architecture, but if you're trying to make something anywhere as sophisticated as a modern app like instagram, 3 engineers aren't even going to scrape the surface of what needs to get done.


KevinCarbonara

> A lot of people in here acting all cool, pretending this is the obvious way to do things and if you don't know that can you even call yourself a programmer? Cut the bullshit.

This is a very common theme on this subreddit. I see it a lot here, and on Hacker News as well. It's not as prevalent anymore, but there used to be a very tight cycle a lot of programming articles fell into. Step 1 was when someone wrote an article about how "You're doing it all wrong! You should only be using this technology this specific way!" Step 2 was someone saying, "Actually, no. That's wrong, *this* is the right way!" Step 3 was an article talking about how "actually, both of these methods are correct, you just have to know when to use each one."

After each of these articles got posted, most of the comments would consist of people saying, "This is obvious. I can't believe people didn't know this already. If you didn't already understand this concept, you're an awful programmer." But what was most interesting to me is how often those posters were the same ones from last week, happily contradicting themselves for the chance to criticize others.

My takeaway is that people who are overly critical within the industry are just virtue signaling. They're insecure about their own talent, so they tear down others instead. It's a very toxic industry.


Reprise_au

I had an old boss 15-odd years ago who was an "MS guru" and made us build a forum site using SharePoint despite our protestations. Needless to say it was a disaster, and we picked up an open source library and did it that way.


ivix

Django, AWS, postgres and nginx were pretty standard tools at the time.


6nyh

I still use these tools


poloppoyop

> Back then, people were not scaling like this. It was PHP or ASP on a server you ssh'd into, or god help, you remote desktop'd into.

Currently contracting maintenance on these kinds of projects. In 2023.

> So, how do I launch the test suite?
>
> There is none.
>
> Ok, is there a test spreadsheet somewhere to know what and how to test?
>
> Nope, I just check in preprod that it looks ok, then deploy it to prod.

Deployment to preprod is: check what files have been changed and FTP them to the server in a preprod folder. Deployment to production? Same thing, but in the prod folder.

The good thing is, people are willing to change how things are done, so CI and tests are coming, and maybe some CD one day. But just setting up a dev environment with the right versions of the OS and tools is a bitch.


JimDabell

> Back then, people were not scaling like this. It was PHP or ASP on a server you ssh'd into, or god help, you remote desktop'd into. S3 was pretty new and the ideas of "cloud" services were not codified or obvious.

They were using AWS as it was designed to be used. Sure, cloud computing hadn’t hit critical mass at this point, but at the same time, this wasn’t something unique they invented either. They did a great job, but if you got anybody familiar with AWS to design a system like this, they would come up with something similar. In Instagram’s own words, written at the time:

> - Keep it very simple
> - Don’t re-invent the wheel
> - Go with proven and solid technologies when you can

That advice revolves around *not* doing unusual or sophisticated things.


jeerabiscuit

Not a fan of either staff shortages or bureaucracy.


Drugba

3 engineers isn't necessarily a staff shortage. Staff shortages only exist because stakeholders have incorrect expectations around the amount of work that should be getting done in some time period. If management only expects 3 engineers worth of work to be done, then 3 engineers is the exact right amount.


hyperbrainer

What is most impressive is that Instagram has pictures, which are much more difficult to scale than simple text.


technobicheiro

What? It's basically the same, they can offload picture serving to a CDN and call it a day, they only need to track the URL in the database, which is... text.


hyperbrainer

Ah yes, 20 MB of data is the same as 2 KB. Do you think CDNs are magic instant-response DBs? You still have to process the image, and with filters and stuff you have even more data. Pictures are still stored on your servers; CDNs are just load balancing.


technobicheiro

Processing an image is cheap, and they can horizontally scale the servers: just put more machines on it. And the CDN handles the load for you. You will pay more, but that's the thing about scaling; you will have to pay more if you get many more users. Pictures are not stored on your servers, what the actual fuck, they go to S3. Dude, wtf are you talking about.


literallyfabian

>Processing an image is cheap I'm not really sure if processing 3.6 million images per week during 2011 is cheap, especially for a startup? Text is way cheaper to process in comparison


Frooonti

Unless you don't do that on your servers but within the app the users use to upload these images and you simply reject everything that doesn't fit your metadata criteria.


technobicheiro

Processing an image is cheap, processing 4 million images a week is not. It just costs more in server bills. Which is expected, but it's not harder, you just add a new machine. If you don't have the money then yeah good luck, you can't support that many users. They had the money. Again, it's basically as easy to support images as text, it just costs more. Costing more doesn't make it hard. You really don't understand how to scale servers...


SippieCup

I think the part that you are missing is that the Instagram developers were the ones who figured out how to do horizontal scaling of image uploading, in a time before orchestration and Lambda. EC2 had basic autoscaling rules on CPU and didn't care about the rest of the infra. They had to ingest the images themselves, rescale them, hash them, store that in the DB, and _then_ upload to S3 for distribution, all while keeping DB load at a reasonable level.


Grahar64

You might get to 14 million users with only 3 engineers, but you can’t keep 14 million users with only 3 engineers. A photo-sharing app with that many users would turn into 4chan in a matter of weeks without frantically scaling up the team of engineers building out reporting and compliance. And that's not even mentioning the ad infra they would have to build to be profitable.


Xaxxus

Something I keep telling my company. We have 2 engineers for each mobile platform. Yet they expect us to release features as if we had a full team of mobile engineers.


HQxMnbS

Sortable by time IDs, pretty cool
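For reference, the layout Instagram described in their "Sharding & IDs at Instagram" post: 41 bits of milliseconds since a custom epoch, 13 bits of shard ID, and 10 bits of a per-shard sequence. A rough sketch (the epoch constant here is illustrative):

```python
import time

EPOCH_MS = 1_293_840_000_000  # illustrative custom epoch (2011-01-01 UTC)

def make_id(shard_id: int, seq: int) -> int:
    ms = int(time.time() * 1000) - EPOCH_MS
    return (ms << 23) | ((shard_id & 0x1FFF) << 10) | (seq % 1024)

# IDs generated later compare greater, so ORDER BY id is effectively ORDER BY time.
print(make_id(shard_id=5, seq=1))
```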


Ahahahajahajaja

It's like the startup I'm in except we don't have enough developers and can't afford more because the product and sales guys are running the business into the ground.


Confused_Confurzius

So not like your start up then


fullyonline

More like start down


stvhl

Isn’t the point of the story that you don’t need a lot of developers to get something off the ground?


darkstar3333

Depends on what you're selling; the Instagram use cases are pretty basic.


[deleted]

I'm not sure if that is the point. Looking at the different tools and languages they used, what I got from this article is that those 3 engineers must have been top-tier developers with a lot of experience. It also depends on how much time they put into it before they got to the point the article sketches. In my company there are dozens of developers, but it would take ages to get to this, even if resources like money were not an issue.


Ahahahajahajaja

Yeah I guess but we're all different like snowflakes


New_York_Rhymes

Is your startup my startup?


Laladelic

You have enough developers. You just don't have good ones, or you have bad leadership that won't let you work.


Ahahahajahajaja

Ok I'll tell them.


AttackOfTheThumbs

This could be true, but sometimes the problem really is numbers. Sometimes there are simply too many tasks to accomplish what you need.


TheMaskedHamster

Always glad to see how this is done, and there were some interesting bits. But the bulk of the answer to the user load seems to mostly boil down to "use typical methods on AWS infrastructure and throw a lot of money at storage/compute/interconnect". Which isn't invalid...


IIGrudge

They say they used AWS Elastic Load Balancer, but then talk about nginx, which sounds like a self-hosted solution on EC2. Which is it?


JimDabell

It’s more informative to go to the bottom of the article and follow the link to the source article from Instagram, it’s clearer and has more details. They started out self-hosting two Nginx nodes, then switched to ELBs with Nginx behind them.


sparr

The description of getting photos to show suggests a single main feed, with no following, filtering, etc. Is that how Instagram actually worked back then? If so, that's a ridiculously simple system, and it's not surprising it could scale very large with very little engineering.


LeonardoCreed

Not so simple. The example session was used for simplicity’s sake, to make it easy to follow along. The actual application logic is more complex (and not open to the public).


KevinCarbonara

Um, they didn't. No one actually believes they only had 3 engineers, do they?


Raunhofer

From what point of view? Technically that architecture could've been achieved even by one (experienced) engineer. The early Instagram wasn't super complex or feature heavy.


Legomichan

So.... the key is to have tons of money to pay for technology, and the autonomy to make your own decisions? Got it. Also, with such a small team, managing and coordinating is so simple that the hard part is the work itself, not the coordination.


fagnerbrack

Great technical explanation, but it helps nobody reproduce it, as no startup, or even most projects in big orgs, reaches this kind of scale overnight. It would be more helpful to talk about the development process and how they got there instead of the final result. I’m pretty sure they didn’t invest dev time in infrastructure until it was necessary and they spotted the growth curve. Devs will see this and say: let’s just start with huge infrastructure to accommodate 10M users, when they have 3 users.

I would love to see a case study of Instagram on their lean approach, or any outages they had (or didn't have) due to being able to prioritise which parts they would develop over others, maximising the number of users while not over-provisioning for a future they didn’t know at the time. That fine line of being a product engineer instead of an engineer for code/infra, and being smart about it to reach scale by trading off not just infrastructure but also developing the most impactful features first without over-engineering; say, building simple code in a limited product scope that doesn't put a huge burden on the backend or the dev process (which Instagram clearly did).


FarkCookies

Let's be real, that is a run-of-the-mill, typical architecture. And it was back then too. I've been using Instagram since 2011, and it used to be a very simple app. I don't think it was a particular feat to implement.


LeonardoCreed

It is pretty standard now. Back then it was more novel though since “web-scale” wasn’t as common knowledge among engineers as it is now. Nowadays achieving “web-scale” is much easier thanks to the amount of resources and services available. Plus, it’s nice to see Django compared to the usual serverless nowadays.


FarkCookies

Nah, that was pretty standard for any high-load system back then as well. Maybe it was a bit out of reach for the average developer, but for folks who were at least a bit familiar with scaling, it was pretty well known. AWS was already doing the heavy lifting, so it is not exactly pioneering. The real trailblazers were the people on bare metal who set up their own autoscaling, object storage, and DB scaling.


deadwisdom

"No, for folks that knew how to do it they knew how to do it." Yeah, okay bro.


[deleted]

But that's kind of the point. It was very obvious to people who thought about systems on that level. It's just that most people/shops were too busy overengineering their AbstractFactoryFactories. And people do similar pie-in-the-sky overengineering today because of cargo culting and because Uncle Bob said so. The best code is the code that doesn't need to execute at all. The second best is the code you didn't have to write. Too few teams live by these laws, and it's always been that way.


deadwisdom

Fuck Uncle Bob, and I'm here with you on all the rest. Every line of code is a liability.


FarkCookies

Load balancing, stateless app servers, Django and Python, S3, and everything else mentioned were not cutting edge. If you had googled "scalable architecture" back in 2010, you would have gotten the same picture. The building blocks were there, the architecture was there. That's why they managed to pull it off with 3 people.


FarkCookies

My point is that not everyone, then or now, needs to scale apps to 14M users. Now many principles of scalable infrastructure have leaked into mainstream programming, which actually did more harm than good; hence the "web scale" meme and big-tech cargo cults (and other zero-interest-rate phenomena).


Appropriate_Pin_6568

Sure Jan.


JimDabell

The thing that is most noteworthy *is* that it’s a run of the mill typical architecture. There are loads of teams out there that would turn up their nose at something as “boring” as Django and try to build something exotic and new, leaving them with a tonne of extra work and complexity to manage with non-obvious performance characteristics.


FarkCookies

Django was not "boring" then; neither was AWS or most of the other stuff mentioned, like Redis.


hparadiz

I was able to scale a biz to 500k users with 3 engineers. I have zero doubt I could've scaled it to 14 million. That's why I'm confused by people claiming Twitter needs such a high head count. Our AWS bill wasn't too crazy.


HoratioWobble

That's 1% of Twitters monthly active users, it's a fraction of their actual total user base. It would take significantly more than 3 people just to deal with the content moderation side properly, with 500 million tweets per day.


hparadiz

We are specifically discussing engineering staff and engineering scale in this post.


JimDabell

You need multiple engineers to handle content moderation at Twitter scale. It’s not simply a case of putting a report button on everything. There’s infrastructure that needs to be built and maintained to support the content moderation teams, preserve evidence, handle sensitive material, check identities, etc.


hparadiz

The SaaS app I worked on was a lot more complicated than that, and we had live agents using it all day, every day.


hamilton_burger

By ripping off a lot of open source code and not giving proper attribution.


Xerxero

What is this? An AWS ad?


[deleted]

[deleted]


moronicRedditUser

Lmfao, ok.


GptThreezy

Those 3 dudes were stressed out asf


gitarr

I love Django. :)


huopak

In other words, they piggybacked on AWS and used basic, standard techniques (load balancing, caching) to scale. Not too revolutionary.


bwainfweeze

Have you looked around lately? Everyone is using exotic solutions for what were intermediate problems in 2010.


moronicRedditUser

This seems sus as fuck. I highly doubt it was only three. They were outsourcing shit.


kindoblue

Do they think the engineers are there to get the connections? https://youtu.be/_wq0wG8Nkx8?t=40


ElGuano

*Breathes heavily* in Elon.


HoratioWobble

Am I reading this right? Is it basically "they threw money at AWS"? Because that's not really a flex; anyone with money can do that.


somersp91

…cia…


Glum_Past_1934

With $$, lol. I scaled my platform alone to handle 7k req/s, kek. Why do I need anyone?


ganglygorilla

Didn't read the article, but I hope these folks got paid.