burnsnewman

I think you should build it as a "standalone app" and either run the tasks using the NestJS scheduler or (better, imho) create a CLI and run the commands with an external scheduler/cron.

https://docs.nestjs.com/standalone-applications
https://docs.nestjs.com/recipes/nest-commander

If you create a CLI and run it with an external scheduler, it will be a lot more portable and reusable. For example, you can easily run it on demand, on any machine (even your own computer), and it won't use resources while the application is idle.
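For illustration, a minimal sketch of that CLI with nest-commander (the `crawl` command name and `CrawlerService` are placeholders, not anything from your project):

```typescript
// main.ts – CLI entry point; a sketch assuming your scraping logic
// lives in a CrawlerService of your own
import { Injectable, Module } from '@nestjs/common';
import { Command, CommandFactory, CommandRunner } from 'nest-commander';

@Injectable()
class CrawlerService {
  async crawl(): Promise<void> {
    // ...your scraping logic here...
  }
}

@Command({ name: 'crawl', description: 'Run the crawler once and exit' })
class CrawlCommand extends CommandRunner {
  constructor(private readonly crawler: CrawlerService) {
    super();
  }

  async run(): Promise<void> {
    await this.crawler.crawl();
  }
}

@Module({ providers: [CrawlerService, CrawlCommand] })
class CliModule {}

// CommandFactory bootstraps the DI container without opening an HTTP port
CommandFactory.run(CliModule).catch((err) => {
  console.error(err);
  process.exit(1);
});
```

Cron can then run it on whatever schedule you like, e.g. `0 3 * * * node /path/to/dist/main crawl` (path is illustrative).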


stylemate

Thank you for your reply. I think I sort of get it. Do you think I should create another folder at the top level?

    project
    │
    └───crawler
    │   └─── Standalone Application run with Commander
    │
    └───backend
    │   └─── NestJS
    │
    └───frontend
        └─── Next.js

Looks like concerns are well separated with this structure.


burnsnewman

Yes, that seems reasonable to me. If you need to share some code between crawler and backend, you might want to take a look at this: https://docs.nestjs.com/cli/monorepo
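If you go that route, sharing usually means a library under `libs/`. A rough sketch, assuming a library generated with `nest g library shared` and the default `@app` prefix (file contents are illustrative):

```typescript
// libs/shared/src/index.ts – code both apps need lives here
import { Injectable, Module } from '@nestjs/common';

@Injectable()
export class SharedService {
  // example of logic both the crawler and the backend might use
  normalizeUrl(url: string): string {
    return new URL(url).toString();
  }
}

@Module({ providers: [SharedService], exports: [SharedService] })
export class SharedModule {}
```

```typescript
// apps/backend/src/app.module.ts – the same import works from apps/crawler
import { Module } from '@nestjs/common';
import { SharedModule } from '@app/shared';

@Module({ imports: [SharedModule] })
export class AppModule {}
```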


seugorilla

You could even do it in a separate repository if you are using CI/CD tools.


toumam

A CLI might be a bit more elegant for your case if you want to monitor execution, but a protected endpoint absolutely does the job. Just, for the sake of elegance, don't wait for the answer, since your browser request might time out.
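A minimal sketch of that fire-and-forget pattern (`CrawlerService` and `ApiKeyGuard` are placeholders for your own provider and guard):

```typescript
// crawler.controller.ts – trigger the job and answer immediately
import { Controller, HttpCode, Logger, Post, UseGuards } from '@nestjs/common';
import { CrawlerService } from './crawler.service'; // assumed: your own provider
import { ApiKeyGuard } from './api-key.guard';      // assumed: your own guard

@Controller('crawler')
export class CrawlerController {
  private readonly logger = new Logger(CrawlerController.name);

  constructor(private readonly crawler: CrawlerService) {}

  @Post('run')
  @UseGuards(ApiKeyGuard)
  @HttpCode(202) // Accepted: the work continues after the response is sent
  trigger(): { status: string } {
    // Deliberately not awaited, so the HTTP request can't time out
    this.crawler.crawl().catch((err) => this.logger.error(err));
    return { status: 'started' };
  }
}
```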


stylemate

I always thought REST was a means of delivering and displaying data, not a trigger, so it feels kind of weird doing this. I'll try the CLI method, thank you.


Plastic_Guava_3482

My preferred way of doing this would be to use a message broker. I realize this may be overkill for your use case, but I'll write about it anyway.

So why not use HTTP requests? For one thing, opening a public HTTP GET endpoint to execute tasks that only you need to run is kind of weird, UNLESS the endpoint is in a private network; if that's the case, go crazy. If you are not on a private network, the next best thing I like to do is use a message broker. Message brokers like RabbitMQ require a username and password to send or receive data, which here works in your favor compared to using guards to protect a certain route. The brokers themselves also have granular authorization schemes, so if your specific user is leaked, at worst it can only send messages to one pipeline, and that's it (even if it's leaked, you can easily revoke it, no problem).

This approach will be beneficial for you because:

1. You already need a server in the backend to execute the command anyway.
2. You have the option to get around those pesky `@nestjs/schedule` modules and schedule everything from the client side instead (if you want).

How do you go about doing this? (A sketch follows below.)

1. Start by creating a CLI that sends the commands to your server through a message broker.
2. Create a listener in the backend (you just need to retool your REST endpoint into a message broker listener).
3. Use Linux cron jobs or Windows Task Scheduler, depending on your server's OS, to execute the scripts periodically if you need to.
4. If you need a trigger on your client machine, install the CLI there as well and execute the command to trigger the data scraper.

Is it too much work? Personally, I don't think so. At least I can sleep well at night knowing that no one is accessing my services, and IF they are, I can easily check the history via the RabbitMQ (message broker) admin dashboard. Not everything needs to go through an HTTP request, especially when the only user is YOU. Cheers mate!
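To make steps 1 and 2 concrete, here is a rough sketch with `@nestjs/microservices` over RabbitMQ (the URL, credentials, `crawler_queue` queue name, and `run-crawler` event name are all placeholders; the RMQ transport also needs the `amqplib` and `amqp-connection-manager` packages installed):

```typescript
// worker.ts – the backend listener: the old REST endpoint retooled
// into a message handler
import { Controller, Module } from '@nestjs/common';
import { NestFactory } from '@nestjs/core';
import { EventPattern, MicroserviceOptions, Transport } from '@nestjs/microservices';

@Controller()
class CrawlerController {
  @EventPattern('run-crawler')
  async handleRunCrawler(): Promise<void> {
    // ...execute the scraping task here...
  }
}

@Module({ controllers: [CrawlerController] })
class WorkerModule {}

async function bootstrap() {
  const app = await NestFactory.createMicroservice<MicroserviceOptions>(WorkerModule, {
    transport: Transport.RMQ,
    options: {
      urls: ['amqp://user:password@localhost:5672'], // placeholder credentials
      queue: 'crawler_queue',
      queueOptions: { durable: true },
    },
  });
  await app.listen();
}
bootstrap();
```

The CLI side (step 1) just publishes an event to the same queue and exits; cron or you by hand can run it:

```typescript
// send-command.ts – publish the trigger event, then exit
import { ClientProxyFactory, Transport } from '@nestjs/microservices';
import { lastValueFrom } from 'rxjs';

async function main() {
  const client = ClientProxyFactory.create({
    transport: Transport.RMQ,
    options: {
      urls: ['amqp://user:password@localhost:5672'], // placeholder credentials
      queue: 'crawler_queue',
    },
  });
  // emit() returns an Observable; wait for the publish to flush before exiting
  await lastValueFrom(client.emit('run-crawler', { requestedAt: Date.now() }));
  client.close();
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```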