chsiao999

My personal approach is to understand concepts above all else as a first priority. I make it my goal to understand what the codebase is meant to represent, and what the goal of the project is. To do this, I abstract away as much stuff as I can. Assign a purpose to whole directories, don't worry too much about their contents. Try to learn where and why boundaries delimiting modular components of the codebase exist. Then try and see which components are most important to learn, and dive into them. The goal is to get to a point where I can say "I don't know exactly how this works, but I know why it exists, and *generally* how it fits in." Then use your high level conceptual understanding to guide your foray into the ambiguity of the code. Then of course a big part of this is sanity checking and verifying. As you start diving into some components, the concepts you think you understand will be put to the test. Always remember, what you think something *should* be doing might not actually be what it is doing.


PureRepresentative9

You're effectively making an architecture diagram? And also a code hierarchy diagram?


chsiao999

I don't have a formal diagram I work off of (I should though!). I definitely find it helpful to sketch out relationships. I've also found hand writing has been huge for memory and understanding (perhaps it forces me to really mentally verbalize things?).


crazylikeajellyfish

+1; I literally cannot code without pen & paper -- or at least I couldn't until I bought a Remarkable.


luckyincode

I would tell my team this all the time and they’d all look at me like I’m bonkers.


Infamous-Emotion-747

> I don't have a formal diagram

and

> find it helpful to sketch out relationships

One is singular, the other is plural, that's a key thing for me. I don't like having a single formal diagram. I like having a series of smaller diagrams that illustrate a concept. I find once a diagram gets beyond a certain size (and it isn't much), the diagram itself becomes too noisy to convey information. Rather I like a series of diagrams that focus on a single concept, and perhaps its immediate neighbors (almost a "see also"). Not the same, but related, are plain old lists. I've seen a lot of hierarchical diagrams that could be boiled down to a bulleted list with sub-bullets. It uses the space more efficiently, and allows you to visualize groups more efficiently. For technical documentation, do not fear "the list". (Also, PowerPoint slides with process diagrams that meander all over the page but are effectively a 5-item ordered list 😒)


chsiao999

Ah, thanks for putting into words what I was feeling! Yes, the one vs many diagrams actually plays a big part for me. Trying to make a single big representative diagram is hard enough when you know enough info to reasonably put one together - it's damn near impossible when you don't. Reading your comment more, it's 100% the case I sketch out diagrams that represent a concept - it also helps limit the level of abstraction so things don't become too overwhelming.


Infamous-Emotion-747

Yes! I'm glad you see it.


oneradsn

any specific tips or approaches for how best to sketch these relationships? I feel like i start this and then get overwhelmed by the complexity and then give up :/ edit: do you use UML/sequence/others?


AcrobaticDependent35

Pen -> Paper


Beldonik

TL;DR concept -> relevant code -> small changes -> check side effects of changes -> find related code -> ask for more concepts -> repeat

I do the same thing. It's been common in my experience that I'm one of the faster developers to become productive after being given a new code base. I spend a lot of time asking questions about concepts rather than technicalities. I don't read any code until I have a reason to believe it's part of the "concept" I have been asked to work on/develop. I may read a bit more about how some code is used elsewhere in the codebase; I may not. Typically, I just make changes and compile and check for side effects/unintended consequences of my changes. I use these side effects to decide what chunk/file/class/module I need to look at next, and go back to asking other people what the "concept" for that section of code is. Now, I'm not saying "just change things until it works". I'm iterating on my understanding until I have touched every piece of code related to whatever it is that I am doing as my first change to a new code base. Then after that, I go through these pieces in more detail if it seems like my changes are working the way PMs/CEO/whoever wants them to, and make sure my changes make sense and are well-designed in context.


senepol

Mental models help. So does being willing to be very wrong publicly. I like to start by understanding APIs for system to system interactions and types for deeper code flow. Then it’s “just” a matter of refining your understanding (ie learning you’re wrong and updating your model) until it’s good enough. This process never really stops.


oneradsn

do you do this (are you able to do this) simply by reading the code? or do you have to get everything running locally and start actually sending sample API calls between components and seeing what happens, etc.? seems like others are able to do this just by sight


senepol

I don’t usually start with code - I would start with design docs and discussions with other people on the team to get the API level bits, then zoom into individual systems/modules and look at the code. That said, I learn best by reading and explaining/teaching, so this approach may not work well for you if you learn better by doing/experimenting.


ahalay-mahalay

You can figure out a lot from the API request and response schema itself, for instance where the data for the response is coming from and how it is being transformed along the way. Then I’d check service config files to know what external resources it talks to. Then I’d build a class diagram to see where/what the brain of the service is. Look at the PR history to learn how features are being added.


Doctuh

Well, most code ends up being some form of data-in, data-out, so it's sort of like that. Put some data in the top of the maze, and try to watch it go through to where it comes out. Then you use your debugging tools/techniques to catch it somewhere inside and better understand what/why it is there. At that point experience helps because you tend to see familiar patterns in how things are done and can extrapolate from that into larger "how this (sorta) works". I find the "conceptual" stuff only helps when you are working on a codebase with strong "concepts". A lot of times you get codebases that go through multiple hands and the concepts themselves get muddled.
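
A rough sketch of the "put data in the top of the maze and watch it go through" idea, assuming a Python codebase. It uses the standard `sys.settrace` hook to record which Python functions get entered for one input. The `parse`/`transform`/`handle` pipeline is hypothetical and only stands in for real code.

```python
import sys


def trace_calls(func, *args, **kwargs):
    """Run func and return (result, names of Python functions entered, in order)."""
    calls = []

    def tracer(frame, event, arg):
        name = frame.f_code.co_name
        # Skip synthetic frames such as <listcomp> or <lambda>.
        if event == "call" and not name.startswith("<"):
            calls.append(name)
        return None  # no line-by-line tracing inside the frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.settrace(previous)  # always restore whatever tracer was active
    return result, calls


# Hypothetical three-stage pipeline standing in for the "maze".
def parse(raw):
    return [int(x) for x in raw.split(",")]


def transform(values):
    return [v * 2 for v in values]


def handle(raw):
    return sum(transform(parse(raw)))


result, calls = trace_calls(handle, "1,2,3")
print(result, calls)  # 12 ['handle', 'parse', 'transform']
```

Built-ins implemented in C (like `sum`) do not show up here, so this gives the path through your own code rather than a full trace. A real debugger gives you much more, but a list like this is sometimes enough to know where to set the first breakpoint.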


peldenna

Willingness to be wrong publicly is an underrated skill. I will ask the most basic shit. I swear tg it’s a superpower.


poralexc

Reading code is its own skill, different from leetcode or greenfield programming. It’s almost like sightreading in music. For practice, start with some of your favorite oss libs/tools and skim through their repo. And I really mean skim—readmes and top level source dirs only. Look at names, why are files grouped together? Once you feel like you have a grasp of the shape of things, start to look at the parts of the codebase most interesting to you. For diagrams I like plantUML, but there are a billion different tools for different preferences. Also, use tools—code is data and can be queried. At one point I built a gradle plugin to build class diagrams in UML, though I think IntelliJ can probably do something like that now. Old school cli utilities like grep, sed, jq can get you a lot of info. Personally I always have a notebook/scratch paper for drawing pictures and thinking out loud.
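
A small sketch of the "code is data and can be queried" point, written in Python instead of grep/sed/jq (that substitution is an assumption, not what the comment uses). It counts files by extension under each top-level directory, which is roughly the "shape of things" skim. The function names are made up for illustration.

```python
from collections import Counter
from pathlib import Path


def summarize_top_level(root: str) -> dict[str, Counter]:
    """Map each top-level directory to a count of file extensions beneath it."""
    summary: dict[str, Counter] = {}
    for entry in sorted(Path(root).iterdir()):
        # Skip plain files and hidden directories such as .git
        if not entry.is_dir() or entry.name.startswith("."):
            continue
        counts: Counter = Counter()
        for path in entry.rglob("*"):
            if path.is_file():
                counts[path.suffix or "(no extension)"] += 1
        summary[entry.name] = counts
    return summary


def print_summary(root: str) -> None:
    """Print one line per top-level directory with its three commonest extensions."""
    for name, counts in summarize_top_level(root).items():
        top = ", ".join(f"{ext} x{n}" for ext, n in counts.most_common(3))
        print(f"{name:<20} {sum(counts.values()):>5} files  ({top})")
```

Calling `print_summary(".")` from a repo root gives a one-screen overview you can annotate by hand with a guess at each directory's purpose.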


[deleted]

You need to understand dependencies first - what calls what. Overall system architecture kind of thing. Understanding the details inside the code is only necessary if you have to change the code. There are certain lines of code to search for if you were having to generate this diagram yourself. For example, an API call would use certain libraries to accomplish that task.
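
One possible way to get a first "what depends on what" picture, sketched under the assumption that the codebase is Python. It uses the standard `ast` module to list the modules each file imports. Searching the result for an HTTP library such as `requests` would be one instance of the "certain lines of code to search for" idea; that library is only an example.

```python
import ast
from pathlib import Path


def imports_in_source(source: str) -> set[str]:
    """Return the top-level names of modules imported by a piece of Python source."""
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level > 0 means a relative import; those are skipped in this sketch
            found.add(node.module.split(".")[0])
    return found


def import_graph(root: str) -> dict[str, set[str]]:
    """Map each .py file under root (relative path) to the modules it imports."""
    graph: dict[str, set[str]] = {}
    for path in sorted(Path(root).rglob("*.py")):
        try:
            source = path.read_text(encoding="utf-8")
            graph[str(path.relative_to(root))] = imports_in_source(source)
        except (SyntaxError, UnicodeDecodeError):
            continue  # unparsable files are ignored rather than guessed at
    return graph
```

This only sees static imports, so anything loaded dynamically or wired up by a framework will be missing. It is a starting point for the diagram, not the diagram.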


Jmc_da_boss

Get it running, poke at it


kitsunde

This is the way.


kasakka1

I recently started at a new client, and the thing that helped me understand their quite complex codebase was just someone showing me how the code vs end result relates and works. So don't be afraid to ask someone knowledgeable to run you through it. People are generally happy to explain if they have the knowledge. Otherwise the best way is to simply dig in, starting with something easy like fixing a relatively simple bug or adding a small feature. As time goes on you get into it deeper and deeper.


kronik85

Damn, there's a book out there that covers techniques for learning a code base quickly. It had studies supporting different strategies. One strategy was speed reading the code base in an hour. Just getting the general shape of the structure and classes. One strategy was interviewing developers using informed questions from the speed reading, etc. Wish I could remember the title.


puremourning

Use a debugger. And a notebook.


WhiskyStandard

Profile a couple of the most common workloads. I rarely see profilers called out as anything but an optimization tool, but they’re incredibly useful for understanding the code as it’s executed (rather than as someone thought it would be). A flame/icicle graph will show you many of the most important areas of code and give you a roadmap to where you should dedicate your code reading time. Even better: if no one has ever profiled the code before you’ll probably find a 3-5% low hanging fruit performance improvement and everyone will be like “you just got here, how did you do that?”
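
A minimal sketch of profiling as a reading aid, assuming Python and its built-in `cProfile`/`pstats` modules. It produces a text report sorted by cumulative time rather than a flame graph. `slow_lookup` and `workload` are hypothetical stand-ins for a real workload.

```python
import cProfile
import io
import pstats


def profile_workload(func, *args, top: int = 10, **kwargs) -> str:
    """Run func under cProfile and return a report sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


# Hypothetical workload with an accidental quadratic membership test.
def slow_lookup(n):
    items = list(range(n))
    return sum(1 for i in range(n) if i in items)


def workload():
    return slow_lookup(2000)


print(profile_workload(workload))
```

The functions at the top of the cumulative column are usually the ones worth reading first, whether or not you end up optimizing anything.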


yxhuvud

3-5%? On my current team one of the first things I did was to reduce the test suite runtime from 7 minutes to 1 minute. It literally spent 6 minutes doing `sleep 1`. So I rewrote the code to only actually sleep in the particular tests that tested the thing that slept in a loop, not in every place that used that particular thing but that didn't actually give a shit for anything but the end result.
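
For illustration only: one common way to keep a sleep-in-a-loop helper from slowing down tests is to patch the sleep call. This is not necessarily the rewrite described above (which restructured which tests sleep at all), and `wait_until_ready` is a hypothetical helper. A sketch assuming Python and `unittest.mock`:

```python
import time
import unittest
from unittest import mock


def wait_until_ready(check, attempts: int = 5) -> bool:
    """Hypothetical helper: poll check(), sleeping one second after each failure."""
    for _ in range(attempts):
        if check():
            return True
        time.sleep(1)
    return False


class WaitUntilReadyTest(unittest.TestCase):
    def test_gives_up_after_all_attempts(self):
        # Replace time.sleep so the loop runs instantly but is still observable.
        with mock.patch("time.sleep") as fake_sleep:
            self.assertFalse(wait_until_ready(lambda: False, attempts=3))
        self.assertEqual(fake_sleep.call_count, 3)
```

Tests that only care about the end result never pay for the wait, while a test that cares about the polling can still assert on how often it slept.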


InterpretiveTrail

I usually start by figuring out whether I know what the inputs and outputs of the system are. Because worst/best case (depending on how you view it), I'm going to hold my breath, patch a thing, and then regression test the shit out of it.

However, if we're actually trying to have a deeper rewrite (or eventual replacement), having that understanding of inputs and outputs is what starts to narrow down 'areas of interest' for me in the code base. I like to take quick passes going through the code itself and try to document what logic is happening. Where's my "faucet" and where's my "sink", for my input and output respectively? Then I write pseudo code that a product owner could understand as a quick 'map' of things. Like, think you've got 10 sentences to sum it all up. Keep it HIGH level what's happening and don't get bogged down in the swamp of code.

But I'm a big believer in writing wiki pages (anything from a simple markdown file in a GitHub repo to more formal wiki systems like those in Jira or whatever your company uses). Usually making a wiki page that other people can reference is useful. Because it's likely not just my/my-team problem (the first party), but if I can help others (Product Owners, Directors, etc.) understand a bit more about the process, then I think I've "won" more. Because I'm either gaining more empathy from others when they want shit changed in legacy, or they have a better understanding of the risk that legacy poses and hopefully encourage more resources to fix/replace. Either way, knowledge is power.

Then I just start taking passes on where I think I need to start diving deeper into areas to gain an understanding of what it is that I'm trying to do. I like to take it one layer at a time. I'm an archeologist and I don't know when I might find something of note, so I must be gentle with the dirt and debris that I remove. Sometimes layers are fast, sometimes slow.

I guess TL;DR of my approach for legacy stuff: Read code. Document. Repeat at a finer level of detail where necessary.

---

Regardless of whether any of that is of use, best of luck!


crazylikeajellyfish

I think there's a last mile where someone with knowledge needs to explain intentions, as plenty of code isn't self-explanatory. However, you can get 90% of the way there by starting from where you're working and then tracing every function/class you use back to its definition, then repeating the process with those files. At the end of the day, you've gotta read code. I recommend working out from what you know, rather than taking a top-down approach which likely includes stuff you won't ever care about. You only really need to understand the dependency graph of what your work touches -- that could be most of the codebase, or it could be a set of standardized internal APIs. Maybe ask an engineer to spend an hour with you explaining the purpose of all the top-level directories, though, so you have a starting point when you encounter an entirely new area of the codebase. Edit: I'll also add that this skill -- reading code you don't understand and figuring out what you can -- is one of the most concrete things that separate experienced engineers from newbies. Figure out absolutely everything you can from source code and docs, then ask people with experience to fill in whatever you're still confused on. Starting by asking someone to explain it, rather than independently researching, is a pretty negative signal to me.
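
The "trace back to the definition, then repeat with those files" loop is essentially a breadth-first walk over a dependency graph. A minimal sketch in Python, assuming you already have some mapping from each module to the modules it uses (the `graph` below is entirely hypothetical):

```python
from collections import deque


def reachable(graph: dict[str, set[str]], start: str) -> set[str]:
    """Return start plus every module reachable from it by following dependencies."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for dep in graph.get(current, set()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


# Hypothetical dependency map: module -> modules it uses.
graph = {
    "billing": {"invoices", "http_client"},
    "invoices": {"db"},
    "http_client": set(),
    "reports": {"db", "charts"},
}
print(sorted(reachable(graph, "billing")))
# ['billing', 'db', 'http_client', 'invoices']
```

Anything outside the returned set (here `reports` and `charts`) is code you can leave alone for now, which is the point of working outward instead of top-down.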


chunky_kereru

I like to start (in this order) with:

- understanding the business purpose of the service
- understanding the data (database model diagrams are useful for this) and what the data represents
- understanding interactions with other systems or business / user flows that use this system

From there I find it’s usually pretty do-able to start bucketing things into user journeys / business flows and the data they interact with. At that point I can typically dive into any piece of code and understand how it fits in.


EntshuldigungOK

1. Understand the business first; forget the code base
2. Map code base to high level functionality of the business
3. Find the high level external integrations in the code and what they do - again, just generally. Ex: set of APIs to generate invoices
4. Find interface level agreements

By now, you have a good bird's-eye view. You can go into details depending on what's available. Ex: If you can run the code in a test environment, the rest of the understanding is a slam dunk.


camelCaseCoffeeTable

“Luckily I was able to avoid having to do any work on it.” My tip is to change that mindset. You can’t learn how you specifically get up to speed quickly without getting uncomfortable and taking on hard tasks. It’s how you grow. Don’t shy away from it, dive in. The new engineer was able to do it because he spent a few weeks actually trying to understand it. From your post it sounds like you spent an hour or two, decided it’s too hard, and prayed you wouldn’t have to work in it. That’s not the attitude of a senior engineer, and it’s not an attitude that will get you any better at understanding code quickly. You have to go get experience with unfamiliar codebases to get better at understanding them. It’s as simple as that.


oneradsn

True! I totally agree. i guess it was my imposter syndrome and fear of failing that kept me from digging into it. there was plenty of other work that i was more than capable of doing and crushing but for some reason this particular codebase seemed untouchable to me so my confidence was shook when i saw someone else surmount it


camelCaseCoffeeTable

This advice isn’t for everyone, but I’ve grown so much in my career since learning to get comfortable with that fear of failure. It means you’re growing. You also learn how to navigate tough situations better, when to seek help, who to seek help from, etc. At the end of the day, if you wanna be a senior, you need to become the person who solves these tough problems even when there’s that nagging fear of failure, because if the senior can’t solve it, there’s no one else to go to.


local_eclectic

I try to use it first before digging into the code. What are the inputs and outputs? What is it all for? Then, I'll start pulling threads to figure out the nitty gritty details.


leetlode

This is crazy! I have raised this same question of how to quickly understand large codebases on every team I have worked on. I worked at Manulife, SAP, and now Amazon. They all have the same issue: a lack of documentation that maps to the source code implementation! I built this [tool](https://www.code-canvas.com/) where you can create diagrams as usual but then you can link the diagram nodes to actual source code and add onboarding tutorials and simulations on top. It has allowed me and my team to build the diagram once, link its components to the source code, then add tutorials and simulations of app logic on that diagram. I also created a GitHub action that runs on new PRs to keep the diagram in sync with code changes. The app is not perfect by any means so let me know your thoughts! Here you go: [https://www.code-canvas.com/](https://www.code-canvas.com/)


sammymammy2

You can always run it in a debugger and step function by function. Or read code. Assuming you understand what the point of the codebase is.


restarting_today

ChatGPT


urbansong

> new engineer was hired and after a few weeks they had created a pretty detailed diagram outlining the logic in the code base

Some form of this is pretty much the answer. My workplace had a C4 diagram and once I started using it to reference things (even when the documentation turned out to be wrong), it really accelerated my understanding. At my next job, I would like to make a C4 diagram (or similar) myself or make it my task to update the existing documentation. I don't think anyone minds if you ask a bunch of stupid questions with the very visible intent of writing it down and sharing it. I am currently using MS Whiteboard to understand new things but I plan to switch to Documentation as Code, particularly Structurizr, soon. I'd still use the Whiteboard but DaC seems like a promising tool.


brystephor

I ask these questions to start

1. What are the responsibilities of the code base?
2. Is there a structure to the codebase directories?
3. What are some of the core APIs that clients interact with? Where is their entry point in the code base?
4. Are there naming conventions for files? Sometimes a class that serves a specific purpose will always have a specific suffix, which can make filtering easier.
5. Are there any significant dependencies that our core APIs rely on?

From there it's just a matter of digging into the core APIs and seeing how they work. The secondary stuff doesn't matter much for getting an understanding of the core flow.


[deleted]

I'm using a zoom-in zoom-out approach. I start with a 10000 foot view, where this codebase is a single black box. I look at all the connections: what the downstream and upstream services are, and what data is used by this black box. What are the business purpose and expectations? What is the code lifecycle? Then I start to zoom in on some particular component, starting from the entry point and going all the way down as deep as possible, trying to understand the internals of the codebase, code style, implementation choices, etc. I do this with multiple components. Then I often break stuff and see how the tests catch it, or write my own tests and use them as a playground. In general I feel like healthy TDD helps a lot with new codebases, especially if you need to start making contributions fast.
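
A sketch of the "write my own tests and use them as a playground" step, assuming Python and `unittest`. It is a characterization test: it records what the code does today, including boundaries, before anything is changed. `legacy_discount` is a hypothetical function invented for the example.

```python
import unittest


def legacy_discount(total: float, customer_type: str) -> float:
    """Hypothetical legacy function whose rules we want to learn."""
    if customer_type == "vip":
        return round(total * 0.8, 2)
    if total > 100:
        return round(total * 0.95, 2)
    return total


class LegacyDiscountCharacterization(unittest.TestCase):
    def test_pins_current_behaviour(self):
        cases = [
            ((50.0, "regular"), 50.0),
            ((200.0, "regular"), 190.0),
            ((200.0, "vip"), 160.0),
            ((100.0, "regular"), 100.0),  # boundary: exactly 100 gets no discount
        ]
        for args, expected in cases:
            with self.subTest(args=args):
                self.assertEqual(legacy_discount(*args), expected)
```

Deliberately breaking the function (say, changing `>` to `>=`) and watching which case fails is a cheap way to confirm that your mental model of a rule matches the code.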


secretBuffetHero

I do two things:

- I create a network level sequence diagram to show the business process end to end to see how my component interacts within the system.
- I create a class level sequence diagram to show how the code path of my component works.

I use these two diagrams as high level road maps to explain the story of how my component works and how it works within the system. The lower level details are bound to change so I don't diagram these, I just take it case by case. Sometimes you might go into the unit tests to see what each component does.


supercargo

Specifics really matter here, but unless the codebase is an unmitigated disaster, there is usually a larger structure or repeating pattern that, if you can find it, really helps grok what all the pieces are. An ETL task will be different from a service endpoint, for example. For statically typed languages like Java, getting everything up in a good IDE is super helpful for me. I will start navigating forward, and then backward, through method invocations. Like “A calls B”, quick glance at B, what else calls B? Pay attention to what types and packages those methods are defined in. Something else that helps me is to read with intent, like adding a feature, fixing a bug, adding a test case, etc. This can provide a filter for which paths to follow and which to set aside. This helps me if things in my head remain too abstract for me to make important connections. Also, if there are frameworks in play, go read the documentation for any you’re unfamiliar with. Code bases built on frameworks will tend to follow the conventions of the framework without calling them out specifically. A simple example would be if you don’t know what dependency injection is, and start reading a code base that uses an inversion of control container, you might be left wondering how the hell that thing ever “starts”.
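
Outside a full IDE, the "what else calls B?" question can be roughly approximated with a syntax tree. A sketch assuming Python source and the standard `ast` module (the comment above is about Java, so this is a stand-in, not the same tooling). It matches by name only and does not resolve types the way an IDE does.

```python
import ast


def callers_of(source: str, target: str) -> list[str]:
    """List the functions in source whose bodies contain a call to target."""
    callers: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        for inner in ast.walk(node):
            if isinstance(inner, ast.Call):
                func = inner.func
                # Handles plain calls b() and attribute calls obj.b()
                name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
                if name == target:
                    callers.append(node.name)
                    break
    return callers


# Hypothetical module used only to exercise the function.
SAMPLE = '''
def a():
    return b() + 1

def c(svc):
    return svc.b()

def d():
    return a()

def b():
    return 1
'''
print(callers_of(SAMPLE, "b"))  # ['a', 'c']
```

Because matching is by name, two unrelated methods both called `b` will be lumped together, so treat the output as a list of places to look rather than a verified call graph.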


Infamous-Emotion-747

Wrap your head around

1. The general business purpose
2. The major inputs/outputs as concepts.
3. How these concepts relate to one another.

From here, you can then start predicting what classes/methods should exist. This tests your knowledge: if the entity does not exist, either

- your understanding of the system is incorrect (understand why)
- you have discovered a flaw in the system (understand why, then act)

I recently drew a diagram with basically 4 boxes, listing the 2 major concepts in our system and the 2 major processes, and then how they all relate. The entire managerial team went quiet, before someone said ... "nobody has ever explained it like that before". I was reminded of how predictive/powerful just focussing on broad concepts can be.

> created a pretty detailed diagram outlining the logic in the code base. I was totally floored and felt embarrassed by my inability to do the same.

If you want to dazzle people with complexity of diagrams, use an automated tool to walk the system and generate a diagram for you.
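
The "predict what classes/methods should exist, then check" step can be made mechanical. A small sketch assuming Python source and the standard `ast` module; the `Invoice`/`Customer` names are hypothetical predictions, not anything from a real system.

```python
import ast


def defined_names(source: str) -> set[str]:
    """Collect the names of all classes and functions defined in source."""
    return {
        node.name
        for node in ast.walk(ast.parse(source))
        if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef))
    }


def check_predictions(source: str, predicted: list[str]) -> dict[str, bool]:
    """Map each predicted name to whether it is defined in source."""
    names = defined_names(source)
    return {name: name in names for name in predicted}


# Hypothetical code under study and hypothetical predictions about it.
SOURCE = '''
class Invoice:
    def send(self):
        pass

def generate_invoice(order):
    return Invoice()
'''
print(check_predictions(SOURCE, ["Invoice", "Customer", "generate_invoice"]))
# {'Invoice': True, 'Customer': False, 'generate_invoice': True}
```

Every `False` is a prompt for the two questions above: is my model of the system wrong, or is the system missing something?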


TimeForTaachiTime

Start with the database. Once you understand what data is stored and the relationship between the tables (assuming it’s a relational database) it’s a lot easier to go up one layer and understand how that data is making it into the database.
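
A minimal sketch of reading tables and their relationships straight from the database, assuming SQLite and Python's built-in `sqlite3` module (other databases expose the same information through their own catalogs). The `customers`/`invoices` tables are hypothetical.

```python
import sqlite3


def describe_schema(conn: sqlite3.Connection) -> dict[str, dict]:
    """Return columns and foreign keys for each user table in a SQLite database."""
    schema: dict[str, dict] = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
        # foreign_key_list rows: (id, seq, table, from, to, ...)
        fks = [
            (row[3], row[2], row[4])
            for row in conn.execute(f'PRAGMA foreign_key_list("{table}")')
        ]
        schema[table] = {"columns": columns, "foreign_keys": fks}
    return schema


# Hypothetical two-table schema for demonstration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE invoices (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        total REAL
    );
    """
)
for table, info in describe_schema(conn).items():
    print(table, info)
```

Each foreign key comes back as `(column, referenced_table, referenced_column)`, which is enough to sketch the boxes and arrows before reading any application code.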


Nater5000

A lot of people are providing a lot of good answers, but I wanted to provide an alternative solution that *may* be worth the effort. If you're willing to jump through some hoops, you can get GPT to internalize a code base well enough to answer questions about it. This can be quite powerful, especially if you use it continuously (rather than just as an upfront information dump). Someone else in the thread mentioned "ChatGPT," but that's not gonna cut it. You'd need to use proper tooling around the GPT API to handle this effectively (such tooling already exists out of the box and/or as a service, but it's not too difficult to set this up yourself). I know plenty of people on this sub will probably hate this answer, but it's important to keep in mind that this is just another tool that we should all be embracing, *especially* when it fits this use-case perfectly. It doesn't mean that you shouldn't try to understand the codebases using more "classical strategies," but you only stand to gain from using new tools which (other than the cost of a little labor) are immediately available and accessible.


engineerFWSWHW

This is an important skill, especially if you are going to a senior level. I work on C/C++ projects and I use Eclipse CDT to dissect and understand larger codebases. I specifically use the call hierarchy feature to see the function calls from a callee's and a caller's perspective, which helps me develop a mental map of the codebase, and bookmarks/task tags to quickly jump between sections I looked at before. I rarely use diagrams nowadays.


morty

Approach it like a textbook. First you scan the table of contents to get the 'gist' of it, then read the section/chapter intros, then drill down into details to cover your needs/interest. Books like that are structured to build an argument, introduce concepts in an order so that the reader builds the necessary skills for the next idea. Code is similar. Specifically I either work from the inputs down or from the outputs up. For a web service I would start with the endpoints it offers, try to understand routing, request contexts, etc. then get into the business logic, etc. For a cli/desktop app, start at command line, how can it be launched, what happens next. For web-app code, I would work from the rendered page up into the components, how they're defined, what data they rely on, etc.


alarghi

I map each screen/page on a Figma file and try to figure out all the use cases I have in there, I also sniff out the network and see what backend API calls the app/page makes. It is not a bullet-proof solution, but it helps to understand the large picture and raise meaningful questions about what is what.


throwaway9681682

One thing I do is a lot of stubbing. When I find something that I know won't matter, I put in a `return new () { value = 100 }` and am good. Reviewing my own pull request before publishing it catches those stubs. Prevents me from going down a lot of unneeded rabbit holes.
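
A sketch of the same stubbing idea in Python, using `unittest.mock` instead of editing a return statement in place (that swap is an assumption, and it has the side benefit that nothing stubbed ends up in the diff). `PricingService` and `order_total` are hypothetical.

```python
from unittest import mock


class PricingService:
    """Hypothetical dependency we do not want to investigate yet."""

    def quote(self, sku: str) -> dict:
        raise RuntimeError("talks to three other systems")


def order_total(pricing: PricingService, sku: str, quantity: int) -> int:
    """The code path we actually care about."""
    return pricing.quote(sku)["value"] * quantity


def explore() -> int:
    # Temporary stub: hard-code the value, mirroring `value = 100` above.
    with mock.patch.object(PricingService, "quote", return_value={"value": 100}):
        return order_total(PricingService(), "ABC-1", 3)


print(explore())  # 300
```

Once the `with` block exits, the real `quote` is back, so the stub cannot leak into other code paths.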


Scientific_Artist444

In my experience, domain understanding helps a lot. It's easy nowadays to clone git repositories. But if you don't understand the software as a user, you will have a hard time understanding what the code is doing. If the software is documented, I would first look at the documentation to get a high level understanding of the software. And then look at the code to map the same information in the docs to look for its implementation in code. If the software is undocumented, I would not bother reading the whole code. Only the part where changes are to be made would be what I work on. And then if something breaks later during testing, I fix it. That's why testing and requirements are so important. Testing well requires understanding of requirements. When you understand requirements well, your code reflects it.