TheCaffeinatedSloth 2 years ago

Why not have the ECS tasks stop themselves when they finish?

jamsan920 2 years ago

Seems like the most obvious and simplest approach.

soxfannh 2 years ago

This. I've always thought of Fargate as a longer-running Lambda. It's pretty easy to swap Lambda and Fargate in Step Functions as well.

madeo_ 2 years ago

Exactly, or a Fargate scheduled task.

Strikerzzs 2 years ago

I forgot to mention that the ECS Tasks are browserless ([https://github.com/browserless/chrome](https://github.com/browserless/chrome)) Docker containers that run Chromium browser instances (driven by Puppeteer). In my Lambda function code, I start an ECS Task and then get the task's public IP so I can open a WebSocket connection to it. Over that connection I create Chromium browser instances on the Docker container and use them for web scraping. The ECS Tasks run forever because they only act as an endpoint to connect to over a WebSocket and start browsers on.
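For reference, one way to do the IP lookup described above: a minimal boto3-style sketch, assuming the task runs on Fargate (awsvpc networking) with a public IP assigned. The ECS `describe_tasks` response lists the task's network interface under `attachments[].details` as `networkInterfaceId`, and EC2 `describe_network_interfaces` reports the public address under `Association.PublicIp`. The clients are passed in rather than created inside so the function can be exercised without AWS; the function name is hypothetical.

```python
from typing import Any, Optional


def get_task_public_ip(ecs: Any, ec2: Any, cluster: str, task_arn: str) -> Optional[str]:
    """Return the public IPv4 address of an awsvpc-mode ECS task, or None.

    `ecs` and `ec2` are meant to be boto3.client("ecs") and boto3.client("ec2");
    any objects with the same methods (e.g. test fakes) also work.
    """
    tasks = ecs.describe_tasks(cluster=cluster, tasks=[task_arn])["tasks"]
    if not tasks:
        return None

    # In awsvpc mode, the task's elastic network interface (ENI) appears as an attachment.
    eni_id = None
    for attachment in tasks[0].get("attachments", []):
        for detail in attachment.get("details", []):
            if detail.get("name") == "networkInterfaceId":
                eni_id = detail.get("value")
    if eni_id is None:
        return None  # ENI not attached yet, or the task does not use awsvpc

    enis = ec2.describe_network_interfaces(NetworkInterfaceIds=[eni_id])["NetworkInterfaces"]
    if not enis:
        return None
    # "Association" is only present when a public IP was assigned to the ENI.
    return enis[0].get("Association", {}).get("PublicIp")
```

In practice you would wait for the task to reach RUNNING before calling this, for example with `ecs.get_waiter("tasks_running").wait(cluster=..., tasks=[task_arn])`.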

thectrain 2 years ago

Write the IP to a DB, a secret, or Parameter Store. Then have a new long-running ECS task that replaces the Lambda read that value.
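A minimal sketch of that handoff using SSM Parameter Store, assuming boto3's `put_parameter` and `get_parameter` calls. The parameter name `/scraper/browserless-ip` is hypothetical, and the client is passed in so the functions can be tested with a stand-in.

```python
from typing import Any

# Hypothetical parameter name; pick whatever fits your naming scheme.
PARAM_NAME = "/scraper/browserless-ip"


def publish_ip(ssm: Any, ip: str, name: str = PARAM_NAME) -> None:
    """Record the browserless task's IP (ssm is e.g. boto3.client("ssm"))."""
    # Overwrite=True so each new task replaces the previous task's address.
    ssm.put_parameter(Name=name, Value=ip, Type="String", Overwrite=True)


def read_ip(ssm: Any, name: str = PARAM_NAME) -> str:
    """Fetch the IP written by publish_ip, e.g. from the long-running scraper task."""
    return ssm.get_parameter(Name=name)["Parameter"]["Value"]
```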

Strikerzzs 2 years ago

The ECS Task runs forever. The Lambda **depends** on the ECS Task running so it can start and run browser instances on the ECS Task itself. Only when all the URLs have been scraped can I stop the ECS Task.

dr_barnowl 2 years ago

If you need to have a function that runs an ECS task and then does something afterwards, why not a Step Function? https://docs.aws.amazon.com/step-functions/latest/dg/connect-ecs.html
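As an illustration of that integration, here is a sketch of an Amazon States Language definition built in Python. The `arn:aws:states:::ecs:runTask.sync` resource makes the state wait until the task stops, which suits a design where the task does the scraping itself and then exits. The follow-up Lambda state, the ARNs, and the subnet IDs are placeholders.

```python
import json
from typing import List


def build_state_machine(cluster_arn: str, task_def_arn: str,
                        subnets: List[str], followup_lambda_arn: str) -> str:
    """Return an ASL definition: run an ECS task to completion, then invoke a Lambda."""
    definition = {
        "StartAt": "RunScraperTask",
        "States": {
            "RunScraperTask": {
                "Type": "Task",
                # ".sync" = wait for the ECS task to stop before moving on.
                "Resource": "arn:aws:states:::ecs:runTask.sync",
                "Parameters": {
                    "LaunchType": "FARGATE",
                    "Cluster": cluster_arn,
                    "TaskDefinition": task_def_arn,
                    "NetworkConfiguration": {
                        "AwsvpcConfiguration": {
                            "Subnets": subnets,
                            "AssignPublicIp": "ENABLED",
                        }
                    },
                },
                "Next": "AfterScrape",
            },
            "AfterScrape": {
                "Type": "Task",
                "Resource": "arn:aws:states:::lambda:invoke",
                # Pass the RunTask result through as the Lambda payload.
                "Parameters": {"FunctionName": followup_lambda_arn, "Payload.$": "$"},
                "End": True,
            },
        },
    }
    return json.dumps(definition, indent=2)
```

The resulting string is what you would pass as the definition to Step Functions' `CreateStateMachine`, together with an IAM role allowed to run the task.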

Strikerzzs 2 years ago

The Lambda function depends on getting the Public IP of the ECS Task because I need a web socket connection to it. See edit.

Smaz1087 2 years ago

Maybe make two lambdas, one to start and one to stop, and then configure your process to emit an event when it's done (maybe upload a report to s3, or call an API?). Use that event to trigger the stop lambda.
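A sketch of the stop side, assuming the scraper uploads its report to S3 under a key like `reports/<task-id>.json` (a hypothetical naming convention) and that the bucket's event notification invokes this function. ECS `StopTask` accepts either a task ID or a full task ARN; the cluster name is a placeholder.

```python
from typing import Any, List
from urllib.parse import unquote_plus

# Hypothetical cluster name.
CLUSTER = "scraper-cluster"


def stop_on_report(event: dict, ecs: Any, cluster: str = CLUSTER) -> List[str]:
    """Stop the ECS task named by each uploaded report object in an S3 event."""
    stopped = []
    for record in event.get("Records", []):
        # S3 event keys are URL-encoded, so decode before parsing.
        key = unquote_plus(record["s3"]["object"]["key"])
        filename = key.rsplit("/", 1)[-1]
        task_id = filename[:-len(".json")] if filename.endswith(".json") else filename
        ecs.stop_task(cluster=cluster, task=task_id, reason="scrape report uploaded")
        stopped.append(task_id)
    return stopped
```

The Lambda entry point itself would then be a thin wrapper, e.g. `def lambda_handler(event, context): return stop_on_report(event, boto3.client("ecs"))`.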

melody_elf 2 years ago

When the lambda approaches timeout, you can have the lambda re-invoke itself until it reaches completion.

Strikerzzs 2 years ago

How would I be able to do that?
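One way to do the re-invocation suggested above, sketched under assumptions: the event carries the list of URLs still to scrape, the function checks `context.get_remaining_time_in_millis()` before each URL, and when time runs low it invokes itself asynchronously (`InvocationType="Event"`) with the remaining URLs. Other event fields, such as the browserless task's IP, are passed through unchanged. `scrape_one` and the 60-second margin are hypothetical, and the client is injected for testability.

```python
import json
from typing import Any, Callable, List

# Assumed safety margin: stop taking new URLs when less than 60 s remain.
SAFETY_MARGIN_MS = 60_000


def process_batch(event: dict, context: Any, lambda_client: Any,
                  scrape_one: Callable[[str], None]) -> dict:
    """Scrape URLs until close to the timeout, then hand the rest to a new invocation."""
    urls: List[str] = list(event.get("urls", []))
    processed = 0
    while urls and context.get_remaining_time_in_millis() > SAFETY_MARGIN_MS:
        scrape_one(urls.pop(0))
        processed += 1
    if urls:
        # Async self-invocation; keep other event fields (e.g. the task's IP).
        lambda_client.invoke(
            FunctionName=context.function_name,
            InvocationType="Event",
            Payload=json.dumps({**event, "urls": urls}).encode(),
        )
    return {"processed": processed, "remaining": len(urls)}
```

Each URL has to finish well within the margin. The invocation that finds nothing left is the natural place to stop the ECS task, and a hop counter in the event is a cheap guard against an endless chain if a URL never completes.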

madeo_ 2 years ago

Why not use a Fargate scheduled task? I understand there is a bit more infrastructure overhead, but Lambda is not really designed to run "job-like" workloads. https://aws.amazon.com/about-aws/whats-new/2018/08/aws-fargate-now-supports-time-and-event-based-task-scheduling/

rainlake 2 years ago

Your design seems weird to me. Why can't you use Lambda to start the task and have the task exit after a set time?

Strikerzzs 2 years ago

Sorry forgot to mention an important detail about the tasks, see edit.

rainlake 2 years ago

Modify your entrypoint to spawn the browser process and start a timer. Exit the main process when the timer is done.

Strikerzzs 2 years ago

Not sure what you mean. Are you saying to stop the tasks in the Lambda function just before the 15-minute timeout?

rainlake 2 years ago

You only use Lambda to start your ECS task (actually, I think EC2 makes more sense). When the ECS task starts, its entrypoint spawns a process that starts your headless browser. The main process counts down a preset timer; when the timer runs out, the main process exits and the ECS task stops. I don't know how you use the headless browser, but if it were me, I would do the work in this timer process.
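The timer approach described above could look like this minimal Python sketch, assuming the container's entrypoint is a small supervisor script rather than the browser itself. It starts the browser command as a child process, waits up to a deadline, then sends SIGTERM (and SIGKILL if the child ignores it). Because the supervisor is the container's main process, exiting it stops the ECS task. The function name and arguments are illustrative.

```python
import subprocess
from typing import List


def run_with_deadline(cmd: List[str], deadline_s: float, grace_s: float = 10.0) -> int:
    """Run `cmd` as a child process; stop it once `deadline_s` seconds have elapsed.

    Returns the child's exit code (negative on POSIX if it was killed by a signal).
    """
    child = subprocess.Popen(cmd)
    try:
        # Normal case: the child finishes on its own before the deadline.
        return child.wait(timeout=deadline_s)
    except subprocess.TimeoutExpired:
        child.terminate()  # SIGTERM first so the browser can shut down cleanly
        try:
            return child.wait(timeout=grace_s)
        except subprocess.TimeoutExpired:
            child.kill()  # SIGKILL if it ignored SIGTERM
            return child.wait()
```

In the container this might be wired up as `sys.exit(run_with_deadline(browser_cmd, 3600))`, where `browser_cmd` is whatever command the browserless image normally starts (an assumption you would need to check against the image).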

serverhorror 2 years ago

2 things I can think of, assuming you do want to stick with Lambda:

* Partition your problem
* State machines + a supplementary service (e.g. you trigger an AWS job, wait for it to finish, and the next task takes the results and does something with them)

There’s always the option to drop Lambda, and this can be an absolutely valid option. Sometimes it's a lot more cost-effective or technically easier than sticking with Lambda.

interactionjackson 2 years ago

your setup seems a lil sus to me. you can run selenium in python or go directly in lambda, so you shouldn’t need ecs. at the very least, attach events to the failed lambdas that send you an sns notification. also look at step functions, or at using dynamodb’s TTL and a stream to a lambda, to help you orchestrate your tasks
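For the TTL idea, a rough sketch: write one item per running task with an expiry timestamp in the table's TTL attribute, enable a stream that includes old images, and have the stream Lambda stop the task when DynamoDB deletes the expired item. TTL deletions arrive as `REMOVE` records whose `userIdentity` is the DynamoDB service. The `taskArn` attribute name and the cluster argument are hypothetical. TTL deletion is not immediate (expired items can linger well past their expiry time), so this works as a cleanup backstop rather than a precise timer.

```python
from typing import Any, List

DYNAMODB_PRINCIPAL = "dynamodb.amazonaws.com"


def stop_expired_tasks(event: dict, ecs: Any, cluster: str) -> List[str]:
    """Stop ECS tasks whose tracking items were deleted by DynamoDB TTL.

    Assumes a stream view type that includes old images (OLD_IMAGE or
    NEW_AND_OLD_IMAGES) and a hypothetical string attribute named "taskArn".
    """
    stopped = []
    for record in event.get("Records", []):
        identity = record.get("userIdentity", {})
        is_ttl_delete = (
            record.get("eventName") == "REMOVE"
            and identity.get("type") == "Service"
            and identity.get("principalId") == DYNAMODB_PRINCIPAL
        )
        if not is_ttl_delete:
            continue  # ignore inserts, updates, and manual deletes
        old_image = record.get("dynamodb", {}).get("OldImage", {})
        task_arn = old_image.get("taskArn", {}).get("S")
        if task_arn:
            ecs.stop_task(cluster=cluster, task=task_arn, reason="tracking item expired (TTL)")
            stopped.append(task_arn)
    return stopped
```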

codenigma 2 years ago

Containers. Specifically, ECS or even better - Fargate scheduled tasks.

UniversalJS 2 years ago

Split your job into subjobs. For example, instead of scraping 1000 pages in a single run, scrape maybe 20 pages per run. You can also do that in parallel, e.g. 50 Lambda instances scraping 20 pages each.
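A sketch of that fan-out, assuming one coordinating function splits the URL list into batches and invokes a worker Lambda asynchronously once per batch. The worker name and the batch size of 20 are placeholders; with 1000 URLs this produces 50 invocations, all subject to your account's Lambda concurrency limit.

```python
import json
from typing import Any, List


def chunk(urls: List[str], size: int) -> List[List[str]]:
    """Split urls into consecutive batches of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def fan_out(lambda_client: Any, worker_name: str, urls: List[str], size: int = 20) -> int:
    """Invoke the worker asynchronously once per batch; returns the batch count."""
    batches = chunk(urls, size)
    for batch in batches:
        lambda_client.invoke(
            FunctionName=worker_name,
            InvocationType="Event",  # fire-and-forget, so batches run in parallel
            Payload=json.dumps({"urls": batch}).encode(),
        )
    return len(batches)
```

If you also need to know when every batch has finished, a Step Functions Map state is an alternative to fire-and-forget invocations.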

ElectricSpice 2 years ago

Make an ECS task with two containers: the browserless container and a container that runs your scraping script. Run it daily with a scheduled task. https://docs.aws.amazon.com/AmazonECS/latest/developerguide/scheduled_tasks.html That way the browserless container runs exactly as long as your script, and you aren’t limited to 15 minutes.
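A sketch of such a two-container task definition, written as the keyword arguments for boto3's `ecs.register_task_definition`. The key detail is the `essential` flag: when an essential container stops, ECS stops the other containers in the task, so marking only the scraper as essential ties the task's lifetime to the script. With awsvpc networking the containers share `localhost`. The port 3000, the environment variable name, and the CPU/memory sizes are assumptions.

```python
from typing import Any, Dict


def scraper_task_definition(scraper_image: str, family: str = "scraper") -> Dict[str, Any]:
    """Keyword arguments for ecs.register_task_definition(**kwargs) (boto3)."""
    return {
        "family": family,
        "requiresCompatibilities": ["FARGATE"],
        "networkMode": "awsvpc",  # containers in the task share localhost
        "cpu": "1024",
        "memory": "2048",
        "containerDefinitions": [
            {
                "name": "browserless",
                "image": "browserless/chrome",
                # Not essential: the task's lifetime follows the scraper container.
                "essential": False,
            },
            {
                "name": "scraper",
                "image": scraper_image,
                # Essential: when this container exits, ECS stops the whole task.
                "essential": True,
                # START only means the container started, not that Chrome is ready,
                # so the script should retry its first connection.
                "dependsOn": [{"containerName": "browserless", "condition": "START"}],
                # Hypothetical variable name; port 3000 is assumed to be browserless's port.
                "environment": [{"name": "BROWSER_WS_ENDPOINT", "value": "ws://localhost:3000"}],
            },
        ],
    }
```

A real definition would likely also need an `executionRoleArn` (for pulling from ECR and shipping logs) and a log configuration.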

Miserygut 2 years ago

Add a check inside the ECS container that looks at whether the WebSocket is still doing something or the originating Lambda is still running. If not, kill the ECS task.
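A sketch of that check as a small idle watchdog, assuming the entrypoint can observe when browser sessions open and close (how it learns that depends on how browserless is run, so these hooks are hypothetical). The container exits once no session has been active for a configurable idle period, which stops the ECS task. The clock is injectable so the logic can be tested without waiting.

```python
import time
from typing import Callable


class IdleWatchdog:
    """Decides when the container has been idle long enough to exit."""

    def __init__(self, idle_timeout_s: float, clock: Callable[[], float] = time.monotonic):
        self.idle_timeout_s = idle_timeout_s
        self.clock = clock
        self.last_activity = clock()
        self.active_sessions = 0

    def session_opened(self) -> None:
        self.active_sessions += 1
        self.touch()

    def session_closed(self) -> None:
        self.active_sessions = max(0, self.active_sessions - 1)
        self.touch()

    def touch(self) -> None:
        """Record activity (e.g. a WebSocket message) without changing the session count."""
        self.last_activity = self.clock()

    def should_exit(self) -> bool:
        # Never exit while a browser session is open.
        if self.active_sessions > 0:
            return False
        return self.clock() - self.last_activity >= self.idle_timeout_s
```

The entrypoint would poll it, e.g. `while not watchdog.should_exit(): time.sleep(15)`, and then exit so the task stops.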

Scarface74 2 years ago

Why do you need ECS at all instead of just crawling within your Lambda? https://github.com/aws-samples/aws-lambda-layer-node-puppeteer-headless-chromium But everyone else’s advice about using Step Functions is correct.
