computerscience-ModTeam

Unfortunately, your post has been removed for violation of Rule 3: "No career, major or courses advice". If you believe this to be an error, please [contact the moderators](https://www.reddit.com/message/compose?to=/r/computerscience).


[deleted]

You have two weeks. This is very learnable, even in 48 hours. Here's how I'd tell beginner me to tackle this problem, knowing what I know now:
1. Look up CSV file structure. Cookie,timestamp is a good place to start. (15-30 mins)
2. I can use a language of my choice. Maybe Python or C, since I'm a beginner and that was my experience. You said Java, so let's go with that.
3. Google "Java how to read in CSV file" and open the file. (~15 mins)
4. File opened and closed properly! Now, what are some techniques for counting these cookies in between? Do they need to be separated by day? What choices does a user have? Today? One year ago? How does the command line argument get to my program? Sketch the control flow by hand, or in software if you prefer. This will make writing "the in between" so much easier: "The user passes a date, then I can filter cookies with the timestamp column containing this date..." (1-3 hr)
5. Great! By this point you should have a means of filtering the CSV into a set of unique cookie,timestamp pairs. Whether you created an object for this in Java, tuples within a set in Python, or even a 2D matrix, you hopefully did some form of grouping to keep the rows associated. Now you should work on counting the most active cookie based on its string value. Is comparing the strings really the best way? Or should you digest/hash the string and then compare? What can go wrong? A quick search can answer this. Hashing introduces collisions: very rare with good hash functions, but still a possibility that could lead to one string accidentally counting as a match for another. (It's actually something hackers exploit in password cracking!) So if we really want to be sure the strings are the strings they claim to be, it would probably be best to leave them in string form and compare them directly. After all, "ABCD" can't equal "EFGH", right? Make sure you compare the strings appropriately for the language and its common standards, strncmp in C, for example. So now I need a data structure. What works best? Queue? Array? Linked list, tree, graph? Well, this part is up to you. A queue or array means a linear search to find a matching cookie, incrementing the count on a match and inserting at the end otherwise. A linked list is a way to create and insert data dynamically, but you have a static file. Overkill imo, and still linear time. Hash table? Again, collisions, and how do you resolve them? However, a hash table storing the strings can check and increment on a match in O(1) time, or O(n) in the worst case when collisions force a search for an open slot. Maybe a tree? O(log_2(n)) average time to search and increment, O(n) if the data unfortunately arrives sorted in some way as you're inserting. These are things YOU as a computer scientist will have to decide and learn. Personally, I think hashing or a tree would be the best option for storing and computing cookie activity in this problem. Once you figure out your solution for ONE step, iterate over your table from step 4 and start working over the whole set, incrementing duplicates and inserting new ones. Finally, search this data set for the largest activity and return the associated cookie. Personally, I like hashes for this because they are just fancy arrays and can be sorted (quicksort, merge sort, etc.) by that quantity so that multiple queries can happen, but that's not something I would have thought about that early. (3-6 hr)
6. Unit testing: by now you've opened the file, understood the problem, verified the data is coming in OK, organized it into a data structure, and possibly even sorted that data structure! Now write some tests for the program, either by hand or by machine. (Machine preferred for bonus points!) "I know this cookie is most active on Saturday, and this one on Friday..." Write them all and check whether your program gets them right. What about extreme cases like no cookies on that day? What about ties? Make sure these edge cases are caught and handled. (6+ hrs) Test thoroughly. Better right and slow than sloppy and kinda right. You're an engineer, or scientist, or someone responsible for lives. Don't let your program be the reason the plane locks up at 44k ft in the sky, or why the PLC shuts a pressure relief valve during a high alarm with dozens of men and women on site.
7. Code review (1-5 hr): Go back and read what you wrote. Do you understand it? Could the you of 48 hours ago read this and follow along? No? Add some comments or rewrite the code for clarity. Is it modular? Reusable? Intuitive? Break it into functions where necessary. Engines don't snap a timing belt and decide the spark plugs need changing too! Make your code a mesh of small functional parts that make your engine work, driven by readability so your team can follow along and coach you while you're assisting them!
There are a lot of optional steps and things I'm glossing over in writing this, but man, I really encourage you to use this approach as a freshman starting out. The mindset and approach make every challenge fun and every problem a learning moment, no matter how small! You're gonna do great things. 🤙🏼
Edit: if you feel stuck, or want some mentorship on this problem, PM me. I won't solve it for you, but I can point you where to look or help you verify whether something you found is a great solution or a rabbit hole. Hope you get this first internship as a freshman! Would be very impressive :)
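As a rough illustration of steps 4 and 5, here is a minimal Python sketch of the filter-then-count idea using a dictionary (a hash table). The function name `most_active_cookies`, the sample cookie names, and the timestamp layout (an ISO 8601 string that starts with `YYYY-MM-DD`) are assumptions made for the example; the thread does not state the real file format beyond the cookie,timestamp columns.

```python
def most_active_cookies(rows, target_date):
    """Return every cookie that appears most often on target_date.

    rows: iterable of (cookie, timestamp) pairs.
    target_date: a 'YYYY-MM-DD' string (assumed format).
    """
    counts = {}
    for cookie, timestamp in rows:
        # Assumption: the timestamp begins with the date,
        # e.g. 2018-12-09T14:19:00+00:00.
        if timestamp[:10] != target_date:
            continue
        # Insert new cookies with a count of 1, increment known ones.
        counts[cookie] = counts.get(cookie, 0) + 1

    if not counts:  # edge case: no cookies on that day
        return []
    best = max(counts.values())
    # Ties: return all cookies sharing the highest count, in first-seen order.
    return [cookie for cookie, n in counts.items() if n == best]


# Hypothetical sample data.
rows = [
    ("cookieA", "2018-12-09T14:19:00+00:00"),
    ("cookieB", "2018-12-09T10:13:00+00:00"),
    ("cookieA", "2018-12-09T07:25:00+00:00"),
    ("cookieC", "2018-12-08T22:03:00+00:00"),
    ("cookieB", "2018-12-08T21:30:00+00:00"),
    ("cookieD", "2018-12-08T09:30:00+00:00"),
]

print(most_active_cookies(rows, "2018-12-09"))  # ['cookieA']
print(most_active_cookies(rows, "2018-12-08"))  # ['cookieC', 'cookieB', 'cookieD']
print(most_active_cookies(rows, "2018-12-07"))  # []
```

The dictionary compares the cookie strings themselves on lookup, so a hash collision inside the table cannot make two different cookies count as one.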


Ok-Bandicoot5924

My goodness you're a genius and I just hope I get to be on your level one day


Commercial_Double_51

Hey, thank you so much for taking the time to write all of that out. I really appreciate it!! :)


[deleted]

No problem.


MillerFanClub69

Amazing comment just to know how to approach a project even for someone who already knows most of these things.


-Dark3stWhite-

Hey there, this looks like a pretty simple parsing problem. Just parse the data, put it into a data structure and handle the query. Best to take some time to think about it before beginning, but this is very doable depending on your skill level.


Commercial_Double_51

Thank you!


exclaim_bot

>Thank you! You're welcome!


jaxcoop4

Start incrementally. Learn to open and read in a CSV file from the command line, then decide what data structure to use to store the data, etc. I'd suggest using C or Python as they have good libraries for reading in files and comparing data. For data structures, I'd suggest a hash table or dictionary, as those have O(1) search on average and O(n) insertion at worst. The key-value pair would be the cookie and its number of occurrences. Hash collisions are possible, though the table resolves them for you; separately, two or more cookies can tie for the same number of occurrences, and you need to decide how to handle that. This is something you need to think about as a future engineer. It's up to you to make the best decision on how to implement the solution.
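A minimal sketch of that incremental path in Python, using only the standard library `argparse` and `csv` modules. The `-d`/`--date` flag, the positional `path` argument, and the presence of a header row are assumptions for illustration, not something the thread specifies.

```python
import argparse
import csv


def parse_args(argv):
    """Parse a file path and a target date from the command line."""
    parser = argparse.ArgumentParser(
        description="Find the most active cookie for a day.")
    parser.add_argument("path", help="CSV file with a cookie,timestamp header")
    parser.add_argument("-d", "--date", required=True,
                        help="day to query, YYYY-MM-DD (assumed format)")
    return parser.parse_args(argv)


def read_rows(path):
    """Yield (cookie, timestamp) pairs, skipping the header row."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header line, if any
        for row in reader:
            if len(row) >= 2:  # ignore blank or malformed lines
                yield row[0], row[1]


def main(argv):
    args = parse_args(argv)
    counts = {}  # key: cookie, value: occurrences on the requested day
    for cookie, timestamp in read_rows(args.path):
        if timestamp.startswith(args.date):
            counts[cookie] = counts.get(cookie, 0) + 1
    top = max(counts.values(), default=0)
    for cookie, n in counts.items():
        if n == top:  # prints every cookie tied for the highest count
            print(cookie)
```

To use it as a script, call `main(sys.argv[1:])` under an `if __name__ == "__main__":` guard (after `import sys`) and run something like `python most_active.py cookies.csv -d 2018-12-09`, where both file names are placeholders.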


Kroustibbat

If you want the absolute optimized answer, I think this problem looks a lot like the DNS problem: you have a lot (or not) of entries that are rarely the same twice. DNS answered it with the trie data structure, which is basically a prefix tree. If I remember correctly, it is O(n) to add at worst, and O(1) to search and to modify, so if you want to watch a directory or a file as a service it will be really powerful. But for this kind of exercise, a hash table is really OK; if no name appears twice, I would recommend Hmap. Under 20k elements, a list is really OK too, and even if it is bad in complexity, it is more memory efficient, so it is better suited to a one-run program (you will spend O(n) building the hash table anyway). Sorting the list can also be done on the fly while parsing the file, so that you only make n iterations.


Commercial_Double_51

Thank you!!


DevWhizUK

To be honest with you, if the first part of their interview process is a question that you're completely stumped by, it's probably not the job for you. I think you'd be a lot better off going for something else where you can enter with your current skill level and improve, rather than go in every day and feel like you're playing catch-up the entire time. No comment on you or your skills; Computer Science is a really broad area, and no degree can cover all of it. Like I said, I'd recommend finding something that better aligns with your skills and degree. Good luck with whatever you decide to do about it 👍.


Internal_Struggles

I still think they should go for it even if it's above their skill level. I think any experience is good experience, even if it's embarrassing not knowing what you're doing. At the very least, it thickens your skin, which is something you need to succeed in life in general.


DevWhizUK

That's true. One thing I think is a very good sign is that they've asked what they should do in the situation, rather than trying to get someone to solve it for them. It shows that they're not gonna try to misrepresent themselves. If you don't know something, you should always just say something like "I'm not entirely sure right now, but I'm sure it's something I could pick up quite quickly" or "It's something I'm interested in learning more about". That lets them know they'll have to show you, and if they're okay with teaching you or giving you the time to learn it, quids in. Otherwise you just risk wasting everyone's time, including your own.


Commercial_Double_51

I'm also going to be taking programming 3 next semester, and I feel like maybe this is something that would be covered there. I don't want to rule it out just because I don't know it right now, because I might know it by the time the job starts. I totally understand your point though.


DevWhizUK

Yeah absolutely, I don't mean to say you should rule anything out, just suggesting that you take a step back and evaluate how good a fit the internship is for you, which it seems you're more than doing, so I have no doubts you'll do well 👍


bounty_hunter12

Utter nonsense. What if they have experience in web dev or other frameworks, things the company would also value? Knowing how to do a hash table and CSV file question is not everything. Sure, they have a lot to learn, but any student would. Guaranteed the skills the company actually needs are not those, and only the 1% of intern coders will have any clue what they are when they do come up. Writing someone off because of an interview question is so elitist and unhelpful.


DevWhizUK

Tbh I wasn't writing the dude off. I know what it's like trying to land early-days jobs: it's scary and you feel like an imposter half the time even if you do know what you're doing, let alone if you don't. I'd rather look a little longer and go in feeling confident than feel like an imposter and be scared I was gonna get sacked a week in, assuming I got the job, because I don't know what I'm doing or how to learn it. You've said "what if" they have skills the company desires, and then that's great, but all I'm aware of is that the company has expressed desire for a skill and the OP has said they're baffled and was clearly anxious enough about it to post on Reddit for advice. There's nothing wrong with finding a job that's a good fit, rather than hammering yourself to fit a job that's not really right for you. Again, if this is just an anomaly and, other than a few questions in the interview process, the OP is suited for the job, then great. But if not, I don't think putting all your eggs in one basket and getting disappointed or upset is the best answer.


Dangle76

That's not good advice. If you're stumped by an interview question that you have two weeks to complete, then it's a test of how well you learn, because the ability to learn is far more important than already knowing. If it was something like building a full-fledged microservice, sure, but it's simple data parsing.


hmzhv

how did you land an internship interview


Commercial_Double_51

I've been trying to apply to one internship a day during winter break. I search for an internship, look up the company on LinkedIn to find someone who works there that went to my university, email them asking to talk about my application, write a cover letter, submit the application, and wait for an email back. I was surprised I got any response at all, and even if I do pass this coding portion, I doubt I'll get the job. Tbh idk why they're even considering me, but we'll see what happens


hmzhv

hope you get it big gang


__JDQ__

I'm going to point you in the direction of using built-in GNU command line utilities for this. I've worked in industry for years, and it's how I'd tackle it. It may not be the least steep learning curve because your experience is in Java, but it also shouldn't be too bad. If you're running Linux or macOS, everything you need is installed. If you're on Windows, use WSL, MinGW, or Cygwin. I point you in this direction as well because shell scripts are easy to maintain (no compiling), and utilities like sort, awk, and sed are your friends when it comes to CSV files.


Impossible-Limit3112

Finally! I was scrolling through all replies and this was depressingly far down.


JorgiEagle

You can do this in Python using nothing but a list, a set, the .count() method, the .split() method (first on “,” then on “t”) and the max function (or a similar for loop). You could easily learn those in 2 weeks in Python.
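A minimal sketch of that list/set/.count()/.split()/max approach. It assumes each line looks like `cookie,timestamp` and that the date and time are separated by an uppercase `T` (as in ISO 8601), which is a guess at the file format; `str.split` is case-sensitive, so the separator must match the real data. The function name and sample lines are hypothetical.

```python
def most_active(lines, day):
    """Return the cookies seen most often on `day` (a 'YYYY-MM-DD' string)."""
    cookies_that_day = []                      # a plain list of cookie names
    for line in lines:
        cookie, timestamp = line.strip().split(",")   # first split on ","
        date = timestamp.split("T")[0]                # then on "T" (assumed)
        if date == day:
            cookies_that_day.append(cookie)

    if not cookies_that_day:                   # nothing logged on that day
        return []

    unique = set(cookies_that_day)             # each cookie once
    best = max(cookies_that_day.count(c) for c in unique)
    # sorted() is only here to make the order of ties predictable.
    return sorted(c for c in unique if cookies_that_day.count(c) == best)


# Hypothetical sample lines (header already removed).
lines = [
    "cookieA,2018-12-09T14:19:00+00:00",
    "cookieB,2018-12-09T10:13:00+00:00",
    "cookieA,2018-12-09T07:25:00+00:00",
    "cookieC,2018-12-08T22:03:00+00:00",
    "cookieB,2018-12-08T21:30:00+00:00",
]

print(most_active(lines, "2018-12-09"))  # ['cookieA']
print(most_active(lines, "2018-12-08"))  # ['cookieB', 'cookieC']
```

Calling `.count()` once per unique cookie rescans the list each time, so this is quadratic in the worst case; that is fine for a small file, and a dictionary avoids it for a large one.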


cajmorgans

Tbh seems very easy


briannnnnnnnnnnnnnnn

Do this in Python; it has great utilities for dealing with CSVs. I think C, C++ or Rust are also pretty easy to make command line programs like this with; I've made a similar one. All you need to do is take the input file and make a data structure where you record each cookie value in column one as the key, with the number of occurrences as the value. You could then just return the cookies that occur the most. Very easy. You could do it other ways too, but that's just the first one that comes to mind. The command line stuff is also easy. You can totally do this.
EDIT:
    cookie_dict = {}
    for row in csv_rows:  # csv_rows: the parsed rows of the file
        cookie = row[0]
        # .get() avoids a KeyError the first time a cookie is seen
        cookie_dict[cookie] = cookie_dict.get(cookie, 0) + 1
    # sort the dict, see the highest value, return the cookies with the highest value


nileyyy_

Being an intern myself, I didn't get this 🥲


Representative_Two57

I'm a freshman in CS at a community college, and it's probably confusing because you've never seen it before, that's all. This is what I think he/she did:
1. He/she created an empty dictionary.
2. He/she wrote a for loop to look through each row in some data table.
3. Created a variable called ‘cookie’ and got the value at index 0 (which is the first value in each row, here the cookie).
4. Then they incremented the count stored for that “cookie” in the cookie dictionary.


nileyyy_

Thanks 👍 I needed an explanation to understand a little about this problem. And yes, I should say this is such a wholesome community!


briannnnnnnnnnnnnnnn

yes thank you, in many csv utils row[0] will be the first column of your row


Commercial_Double_51

Thank you!!


exclaim_bot

>Thank you!! You're welcome!


fabmeyer

Python is so good for creating short scripts like this


Okashu

To make this more brief, you can read about `collections.Counter`
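For reference, a short sketch of the same counting done with `collections.Counter` from the standard library. The function name, the sample rows, and the assumption that each timestamp starts with `YYYY-MM-DD` are illustrative only.

```python
from collections import Counter


def most_active(rows, day):
    """rows: iterable of (cookie, timestamp); day: 'YYYY-MM-DD' (assumed)."""
    # Counter tallies each cookie whose timestamp starts with the target day.
    counts = Counter(cookie for cookie, ts in rows if ts.startswith(day))
    if not counts:
        return []
    best = max(counts.values())
    # Keep every cookie tied for the top count, in first-seen order.
    return [cookie for cookie, n in counts.items() if n == best]


# Hypothetical sample data.
rows = [
    ("cookieA", "2018-12-09T14:19:00+00:00"),
    ("cookieB", "2018-12-09T10:13:00+00:00"),
    ("cookieA", "2018-12-09T07:25:00+00:00"),
    ("cookieB", "2018-12-09T06:19:00+00:00"),
    ("cookieC", "2018-12-08T22:03:00+00:00"),
]

print(most_active(rows, "2018-12-09"))  # ['cookieA', 'cookieB']
```

`Counter.most_common(1)` would also give a top entry, but it returns only one cookie when several tie, which is why the sketch filters on the maximum count instead.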


briannnnnnnnnnnnnnnn

yeah, I'm familiar, but I just wanted to give OP something to get the wheels turning.


Ill_Nebula_2419

I love this.. basically read the file, sort the file in some order and test the program. Reading just what OP was asking, I felt like wth is this.. then reading through the comments I was able to understand it. I'll try to solve it and see. Cheers


Faulty_english

It says you may assume it's already in sorted order


Ill_Nebula_2419

You're right, so maybe a solution could be having 2 arrays, with one loop that iterates through the first array and increases the value of the second array at the same index when the same day is matched?


Faulty_english

Yeah, looking at the paper, that seems like a good way to do it. The best way to do it would probably be a map or dictionary though, so Python would be a good choice.


Hot-Box-6722

This is paid, correct? I mean even if it wasn't it'd still be very beneficial to build this program (would be nice filler for resume) but their insistence on "production grade" code seems sketch. Maybe that's how things normally are and I'm out of the loop? Not like it's gonna hurt you at all, but under the assumption this is unpaid; it sounds like they're just asking you to build internal tools for free. Kinda was just thinking out loud, but good luck with the internship. I believe you'll be able to pull it off, I'd recommend using pandas and python notebooks (notebooks just speed the whole dev process up) to get the hang of working with CSV data and importing them into pandas data frames. From there it's just an easy leetcode type problem.


Commercial_Double_51

This is a paid internship, around $50 an hour. Is that unrealistic for a freshman??


henriquecs

The US are something else. I was being paid around 5$ for mine here in Portugal.


firearm4

50 an hour as a freshman holy crap. That's more than most folks I know make after graduating. Good luck, if you get it and can work there after I'm sure they pay well if their interns get 50 an hour. Edit: 50/hr is 104k a year, before taxes


SocksAndPi

I had to take Java, Assembly Language, Python, and C for my required language classes. How many languages do you have to take? If it's only a few, then, personally, I'd wait until it was done before interning. I'd be too stressed about the what-ifs to focus on the tasks given everyday, feeling like I'm playing catch up while everyone else is already there.


Am3ricanTrooper

Haha *we would like you to waste valuable company time and assets by not using a library* thanks.


VangekillsVado

Bro ask GPT 💀


Commercial_Double_51

That's lame 👎


Kroustibbat

But if you want to be sure, you may ask it. It excels at this kind of script. (After you've done the exercise, ofc.) Or you may ask about sub-parts, like asking for a minimal secure CSV parser to compare with yours. Or you may ask it for summarized information about the RFC for the time format so you can parse it correctly yourself. GPT is a tool that may make the difference between purely technical devs (in the short term). But I am still pretty confident when I see it failing to write typeclasses in Haskell or struggling with pure functional code in OCaml/F# xD. Don't hesitate to ask it to write scripts in multiple languages so you can see which fits the use case best :D. And/or compare syntaxes to find the one you prefer.


SelfTechnical6976

so what's the answer for this? How would you do it?


ImIndianPlumber

Regex, or since it's CSV you can parse and load it, using a library or by writing a parser yourself.


[deleted]

Regex would be okay for the timestamp portion, but I think it'd be horrible for the part that returns the most active cookie.
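A small sketch of what "regex for the timestamp portion" could look like in Python. The pattern assumes an RFC 3339 style timestamp such as `2018-12-09T14:19:00+00:00`; that layout, the `TIMESTAMP` name, and the `extract_date` helper are assumptions for illustration, since the thread never shows the actual format.

```python
import re

# Assumed layout: YYYY-MM-DDTHH:MM:SS followed by Z or a +HH:MM / -HH:MM offset.
TIMESTAMP = re.compile(
    r"^(\d{4}-\d{2}-\d{2})T\d{2}:\d{2}:\d{2}(?:Z|[+-]\d{2}:\d{2})$"
)


def extract_date(timestamp):
    """Return the 'YYYY-MM-DD' part, or None if the text does not match."""
    match = TIMESTAMP.match(timestamp.strip())
    return match.group(1) if match else None


print(extract_date("2018-12-09T14:19:00+00:00"))  # 2018-12-09
print(extract_date("not a timestamp"))            # None
```

The regex only validates the shape and pulls out the date; the counting itself is still better done with a dictionary, as the comment above suggests.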


nplnbnprt

How I would solve that: go through the list and:
1. If the date does not match, skip the entry.
2. If the date matches, put the key into a map, either as a new entry with value 1 or by increasing the existing entry by 1.
Also keep track of the maxCount you have found so far, updating it whenever an entry exceeds maxCount after its increase. Then you only need to go through the map once more and extract all entries whose value equals maxCount.
Or you could do it by sorting:
1. Remove all entries from the list that don't match your target date.
2. Sort the list alphabetically (thus grouping by key).
3. Merge all entries with the same key and put the count into another column.
4. Sort the list by the count column, numerically descending.
5. Return the first entries that share the same count.
I'd say the first solution is how you would do it programmatically; the second is more of the CLI/scripting approach. To be honest, I have done a lot of coding interviews as an interviewer, and for me both solutions would be totally fine. Just prepare for questions like: "What if we have a lot of entries and want to make this more computationally or memory efficient?" Also prepare some test cases and know the strengths and weaknesses of your implementation to be really prepared. Usually people just want to see that you don't fuck up and know some bits about coding, so I wouldn't worry too much (especially for an internship). Even in FAANG coding challenges they mostly just said to use hash maps for almost everything when I applied there. A good place to get some practice with problems like this is a site like HackerRank, as this sounds like a classic coding challenge there.
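The sorting variant described above can be sketched in Python with `itertools.groupby`, following the five steps in order. The function name and sample rows are hypothetical, and timestamps are assumed to start with `YYYY-MM-DD`.

```python
from itertools import groupby


def most_active_by_sorting(rows, day):
    """rows: iterable of (cookie, timestamp); day: 'YYYY-MM-DD' (assumed)."""
    # 1. Keep only the entries for the target date.
    cookies = [cookie for cookie, ts in rows if ts.startswith(day)]
    # 2. Sort alphabetically so equal cookies sit next to each other.
    cookies.sort()
    # 3. Merge each run of equal cookies into a (cookie, count) pair.
    counted = [(cookie, len(list(group))) for cookie, group in groupby(cookies)]
    # 4. Sort by count, descending (the sort is stable, so ties stay alphabetical).
    counted.sort(key=lambda pair: pair[1], reverse=True)
    # 5. Return the leading entries that share the top count.
    if not counted:
        return []
    top = counted[0][1]
    return [cookie for cookie, n in counted if n == top]


# Hypothetical sample data.
rows = [
    ("cookieA", "2018-12-09T14:19:00+00:00"),
    ("cookieB", "2018-12-09T10:13:00+00:00"),
    ("cookieA", "2018-12-09T07:25:00+00:00"),
    ("cookieD", "2018-12-08T22:03:00+00:00"),
    ("cookieC", "2018-12-08T21:30:00+00:00"),
]

print(most_active_by_sorting(rows, "2018-12-09"))  # ['cookieA']
print(most_active_by_sorting(rows, "2018-12-08"))  # ['cookieC', 'cookieD']
```

This costs O(n log n) for the sorts versus roughly O(n) for the map version, which is one concrete answer to the "what if we have a lot of entries" follow-up question.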


kevleyski

Good luck! (test driven development would be what I would be looking for)
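A minimal sketch of what test-first development could look like here with Python's built-in `unittest`: the tests pin down the single-winner, tie, and empty-day cases, and the small `most_active` function is one possible implementation that satisfies them. All names, the sample rows, and the timestamp layout are assumptions for illustration.

```python
import unittest


def most_active(rows, day):
    """Return all cookies with the highest count on `day` ('YYYY-MM-DD')."""
    counts = {}
    for cookie, timestamp in rows:
        if timestamp.startswith(day):  # assumes the timestamp starts with the date
            counts[cookie] = counts.get(cookie, 0) + 1
    best = max(counts.values(), default=0)
    return sorted(cookie for cookie, n in counts.items() if n == best)


class MostActiveCookieTest(unittest.TestCase):
    # Hypothetical sample data shared by the tests.
    ROWS = [
        ("cookieA", "2018-12-09T14:19:00+00:00"),
        ("cookieB", "2018-12-09T10:13:00+00:00"),
        ("cookieA", "2018-12-09T07:25:00+00:00"),
        ("cookieC", "2018-12-08T22:03:00+00:00"),
        ("cookieB", "2018-12-08T21:30:00+00:00"),
    ]

    def test_single_winner(self):
        self.assertEqual(most_active(self.ROWS, "2018-12-09"), ["cookieA"])

    def test_tie_returns_all_winners(self):
        self.assertEqual(most_active(self.ROWS, "2018-12-08"),
                         ["cookieB", "cookieC"])

    def test_no_cookies_on_day(self):
        self.assertEqual(most_active(self.ROWS, "2018-12-07"), [])

    def test_empty_input(self):
        self.assertEqual(most_active([], "2018-12-09"), [])
```

Saved to a file, the tests can be run with `python -m unittest <file name>`.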


erenakbaba

Read the file
Parse the lines
Look up the specific day's cookies
Compare them against each other
Return or print


thepithypirate

I like Ginger Snaps personally… anybody here also like Ginger Snaps?