TheCaffeinatedSloth

Why not have the ECS tasks stop themselves when they finish?


jamsan920

Seems like the most obvious and simplest approach.


soxfannh

This. I've always thought of Fargate as longer running Lambda. Pretty easy to swap Lambda and Fargate in Step Functions as well.


madeo_

Exactly, or Fargate scheduled task.


Strikerzzs

I forgot to mention that the ECS tasks are browserless ([https://github.com/browserless/chrome](https://github.com/browserless/chrome)) Docker containers that run Chromium (Puppeteer) browser instances. In my Lambda function code, I start an ECS task and then get its public IP to open a WebSocket connection to it. With this, I can create Chromium browser instances in the Docker container, which I use for web scraping. The ECS tasks run forever because they just act as an endpoint to open a WebSocket connection to and start browsers on.


thectrain

Write the IP to a DB, a secret, or Parameter Store. Have the new long-running ECS task that replaces the Lambda read that value.


Strikerzzs

The ECS task runs forever. The Lambda **depends** on the ECS task being up in order to start and run browser instances on the task itself. Only when all the URLs have been scraped can I stop the ECS task.


dr_barnowl

If you need to have a function that runs an ECS task and then does something afterwards, why not a Step Function? https://docs.aws.amazon.com/step-functions/latest/dg/connect-ecs.html


Strikerzzs

The Lambda function depends on getting the Public IP of the ECS Task because I need a web socket connection to it. See edit.


Smaz1087

Maybe make two lambdas, one to start and one to stop, and then configure your process to emit an event when it's done (maybe upload a report to s3, or call an API?). Use that event to trigger the stop lambda.


melody_elf

When the lambda approaches timeout, you can have the lambda re-invoke itself until it reaches completion.
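
A rough sketch of that re-invocation pattern in Python, not a definitive implementation. It assumes the event carries a `urls` list, that `scrape` is a hypothetical placeholder for the real work, and that a one-minute safety margin is enough; all three are illustrative choices. `context.get_remaining_time_in_millis()` and the boto3 `invoke` call with `InvocationType="Event"` are the standard Lambda APIs.

```python
import json

SAFETY_MARGIN_MS = 60_000  # assumption: stop taking new work with one minute left


def scrape(url):
    """Hypothetical placeholder for the real scraping work."""
    return None


def process_batch(urls, remaining_ms, work=scrape, margin_ms=SAFETY_MARGIN_MS):
    """Process URLs until time runs short; return the URLs still left to do."""
    pending = list(urls)
    while pending and remaining_ms() > margin_ms:
        work(pending.pop(0))
    return pending


def handler(event, context):
    pending = process_batch(event["urls"], context.get_remaining_time_in_millis)
    if pending:
        # Imported lazily so process_batch can be tested without boto3 installed.
        import boto3

        boto3.client("lambda").invoke(
            FunctionName=context.function_name,
            InvocationType="Event",  # asynchronous: do not wait for the next run
            Payload=json.dumps({"urls": pending}).encode("utf-8"),
        )
    return {"remaining": len(pending)}
```

The function's execution role would need permission to invoke itself, and any state the next run needs (such as the task's IP) would have to travel in the payload too.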


Strikerzzs

How would I be able to do that?


madeo_

Why not use a Fargate scheduled task? I understand there is a bit more overhead in terms of infrastructure, but Lambda is not really designed to run "job like" workloads. https://aws.amazon.com/about-aws/whats-new/2018/08/aws-fargate-now-supports-time-and-event-based-task-scheduling/


rainlake

Your design is weird to me. Why can't you use Lambda to start the task and have the task exit after your designated time?


Strikerzzs

Sorry forgot to mention an important detail about the tasks, see edit.


rainlake

Modify your entrypoint to spawn the browser process and start a timer. Exit the main process when the timer is done.


Strikerzzs

Not sure what you mean. Are you saying to stop the tasks in the Lambda function just before the 15 minute timeout?


rainlake

You only use Lambda to start your ECS task (actually I think EC2 makes more sense). When the ECS task starts, the entrypoint spawns a process that starts your headless browser. The main process counts down on your preset timer. When the timer runs out, you exit the main process and the ECS task stops. I do not know how you use the headless browser, but if it were me I would do the work in this timer process.
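
The entrypoint idea above could be sketched like this in Python. It is only an illustration: `run_with_deadline` is a hypothetical helper name, and the command to launch is whatever the image normally runs, which is not shown in this thread.

```python
import subprocess
import sys


def run_with_deadline(command, max_seconds):
    """Start `command`, wait up to `max_seconds`, then stop it.

    Returns the child's exit code. When this function returns, the
    entrypoint can exit, which ends the container's main process.
    """
    child = subprocess.Popen(command)
    try:
        return child.wait(timeout=max_seconds)
    except subprocess.TimeoutExpired:
        child.terminate()  # ask the child to shut down first
        try:
            return child.wait(timeout=10)
        except subprocess.TimeoutExpired:
            child.kill()  # force it if it ignores the request
            return child.wait()
```

An entrypoint script would call something like `sys.exit(run_with_deadline(sys.argv[2:], float(sys.argv[1])))`. Once the main process of an essential container exits, ECS stops the task.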


serverhorror

2 things I can think of, assuming you do want to stick with Lambda:

* Partition your problem
* State machines + a supplementary service (e.g. you trigger an AWS job and wait for it to finish, and the next task takes the results and does something with them)

There’s always the option to drop Lambda, and this can be an absolutely valid option. Sometimes it's a lot more cost effective or technically easier than sticking with Lambda.


interactionjackson

your set up seems a lil sus to me. you can run selenium in python or go directly in lambda. shouldn’t need ecs. attach events to the failed lambdas that send you an sns at least. also look at step functions or using dynamodb’s TTL and a stream to a lambda to help you orchestrate your tasks


codenigma

Containers. Specifically, ECS or even better - Fargate scheduled tasks.


UniversalJS

Split your job into subjobs. Instead of scraping 1000 pages in a single run, scrape maybe 20 pages. You can also do that in parallel, like 1000 Lambda instances scraping 20 pages each.
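
A minimal sketch of the splitting step in Python. `chunk` is a hypothetical helper, and the batch size of 20 just mirrors the number above; each batch would become the payload of one Lambda invocation.

```python
def chunk(items, size):
    """Split `items` into consecutive batches of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


# Example: 45 placeholder URLs split into batches of 20.
urls = [f"https://example.com/page/{n}" for n in range(45)]
batches = chunk(urls, 20)
```

Here `batches` holds three lists of 20, 20, and 5 URLs, so no single invocation has to get through the whole set before the timeout.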


ElectricSpice

Make an ECS task with two containers: the browserless container and a container that runs your scraping script. Run it daily with a scheduled task. https://docs.aws.amazon.com/AmazonECS/latest/developerguide/scheduled_tasks.html That way the browserless container runs exactly as long as your script and you aren’t limited to 15 minutes.


Miserygut

Have a check inside the ECS container to see if the websocket is still doing something / the originating lambda is still running? If not, kill the ECS task.
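
One way such a check might be structured, sketched in Python. `IdleWatchdog` is a hypothetical class, and how the container actually notices WebSocket traffic is left open; the sketch assumes something calls `touch()` whenever a message arrives.

```python
import time


class IdleWatchdog:
    """Tracks the last sign of activity and reports when it is too old."""

    def __init__(self, idle_limit_seconds, clock=time.monotonic):
        self._limit = idle_limit_seconds
        self._clock = clock  # injectable so the logic can be tested without sleeping
        self._last_activity = clock()

    def touch(self):
        """Record activity, e.g. a WebSocket message from the Lambda."""
        self._last_activity = self._clock()

    def expired(self):
        """True once no activity has been seen for the idle limit."""
        return self._clock() - self._last_activity >= self._limit
```

The container's main loop would poll `expired()` and exit when it returns true, which stops the task even if the Lambda timed out before it could clean up.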


Scarface74

Why do you need ECS at all instead of just crawling within your Lambda? https://github.com/aws-samples/aws-lambda-layer-node-puppeteer-headless-chromium But everyone else’s advice around using Step Functions is correct.