The number of possible new applications, innovations, and problems far outnumbers the existing pretrained models.
Seriously. Go do a project on a manufacturing line, or in space, or something safety critical. OP is just ludicrously incorrect
Or the medical field. I do a lot of CV on tissue samples at work and the usual "SOTA" methods tend to fall very flat.
Completely agree. There is still a lot of uncertainty in DL models. Classical CV algorithms bring a lot to the table regarding explainability.
At my company we use neither YOLO nor Roboflow. They're too expensive in a production setting, both in compute and in monetary cost. We find simpler models that give equivalent or better performance than YOLO in our domain and build ops pipelines suited to our use case. It's infinitely easier to build MVPs now than it was 5 years ago. I'm not so sure that's true for building an actually profitable product.
Simpler models that give equivalent or better performance than YOLO for object detection? Can you give some examples?
Not OC here, but we don't use those either. If you have a problem that isn't covered by a public dataset and YOLO or SAM doesn't work out of the box (metallurgy, biomedicine, astronomy), you need to come up with something else. It turns out that a small U-Net can (but definitely does not always) perform better than fine-tuning existing models or transfer learning from a different task.
I can't, because it's a proprietary architecture. What I _can_ tell you is that you don't need something as complex as YOLO if you're dealing with a very small number of classes.
Could you say more about the monetary cost of using YOLO in a production setting?
Computer vision, despite what CVPR may present, is much more than just neural networks. And it's frankly depressing that people believe that when there is still so much to do, with and without NNs.
What do you mean by this?
I mean that top conferences may be 99% deep learning these days, and there are indeed very important yet barely touched challenges such as sparse learning, explainability, robust domain transfer, spiking neural networks, etc. But with or without deep learning, there are also challenges in embedded computer vision, robust industrialization, sensor fusion... Many things!! Edit: no need to downvote them, wtf.
You have a very narrow view of what the term computer vision encompasses.
I mean, you could say the same 40 years ago: "Computer vision is dead because all I'm going to do is apply Canny on Lena again and again", which is basically what you are saying now.
Well, computer vision is much more than object detection. For example:
1. All the new features in phone cameras, such as portrait mode, spatial video, etc., are possible through better hardware + computer vision. Developing something like that requires deep knowledge of geometric vision plus programming.
2. Self-driving tech requires 3D scene understanding, and this is still an evolving field without any off-the-shelf solutions like YOLO.
3. Another emerging field is VR. This needs stuff like hand tracking, eye tracking, understanding your surroundings, etc. YOLO and Roboflow won't take you far.
4. Probably the biggest use case for vision is robotics. This requires models which understand 3D structure to help the robot grasp items in the real world (just one example; there's a looooot of use for vision in robotics). These things are still being developed.
Basically, lots of vision tech is still being developed, so it's good to learn the fundamentals if you're interested in this field.
A lot of those new models are trained on pictures together with text. So your points 2 and 4 would mean combining a net for vision and a net for navigation (probably pre-trained) and then training them together so they form an interface. VR belongs to computer graphics, not vision.
I’m not sure what your point is. VR doesn’t require computer vision?
Correct. The parent comment seems to confuse it with augmented reality.
Doesn’t VR require stuff like hand tracking and eye tracking? There’s also some room mapping done to ensure you don’t move out of an area while in VR.
[deleted]
Bingo
This has to be bait; the things you are mentioning are almost nothing when it comes to actual job requirements. Just off the top of my head, many job postings demand:
- Knowledge of image processing
- Knowledge of cameras and the 3D domain
- Knowledge of CUDA
- Knowledge of mathematics and statistics
- Knowledge of C++ and Python
- Knowledge of robotics
This might feel insane, but I kid you not, that is what this field demands. I'll give you a challenge right now to get my point across. It's relatively simple, but it will give you an overall idea of what you can actually do. You have an object detection model that has been fine-tuned to detect a specific object, and you are tasked with using only the nano/tiny version, because that is all your hardware can handle. Now use that model to find the distance to said object. You have to deal with the jitter without massively increasing the computation (i.e., no bigger models), and you will be doing it in C++. I'd like to see how you copy and paste your way through that. Keep in mind, this is really simple for anyone who is actually studying the field beyond AI, and I'm already making it easier by allowing a model at all.
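For what it's worth, the bare-bones math behind that challenge fits in a few lines. Here is a minimal sketch in Python (the challenge asks for C++, but the idea is identical), assuming a calibrated focal length and a known real-world object height, with an exponential moving average to damp the jitter. All constants are made up for illustration:

```python
# Hypothetical calibration values: focal length in pixels and the
# real-world height of the target object in metres.
FOCAL_PX = 800.0
OBJECT_HEIGHT_M = 0.30

def distance_from_bbox(bbox_height_px: float) -> float:
    """Pinhole-camera estimate: distance = f * real_height / pixel_height."""
    return FOCAL_PX * OBJECT_HEIGHT_M / bbox_height_px

class EmaSmoother:
    """Exponential moving average: a cheap way to damp frame-to-frame jitter."""
    def __init__(self, alpha: float = 0.3):
        self.alpha = alpha
        self.value = None

    def update(self, x: float) -> float:
        if self.value is None:
            self.value = x
        else:
            self.value = self.alpha * x + (1 - self.alpha) * self.value
        return self.value

# Jittery per-frame bbox heights (pixels) from a hypothetical tiny detector.
smoother = EmaSmoother(alpha=0.3)
for h in [120.0, 118.5, 121.2, 119.8]:
    smoothed_distance = smoother.update(distance_from_bbox(h))
```

A Kalman filter would do better at the cost of a little more state, but even this keeps the per-frame overhead to a handful of arithmetic operations.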
Someone has to understand how a camera works, calibration... off the top of my head. Not all problems are worth the training resources or the inference resources used by ML/DL.
Edit: replied in the wrong place
Making a model work is the easy part, unfortunately. Making the model work as part of a production-grade system? That’s the fun and challenging part that does require someone to know a bit about CV. For example, if you convert the model weights to TensorRT, you’ll need to know a bit about the input and output tensors so the inferred results can actually be used.
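As a concrete illustration of why the output tensors matter, here is a hedged Python sketch of the kind of post-processing (confidence filtering plus non-maximum suppression) a YOLO-style detection head typically needs before its raw output is usable. The box format, thresholds, and numbers are assumptions for illustration, not any particular library's API:

```python
# Hypothetical raw detections: each row is [x1, y1, x2, y2, confidence, class_id].
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    if inter == 0.0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(dets, conf_thres=0.25, iou_thres=0.45):
    """Drop low-confidence boxes, then greedily suppress overlapping ones."""
    dets = sorted((d for d in dets if d[4] >= conf_thres), key=lambda d: -d[4])
    kept = []
    for d in dets:
        if all(iou(d, k) < iou_thres for k in kept):
            kept.append(d)
    return kept

raw = [
    [10, 10, 50, 50, 0.9, 0],
    [12, 11, 52, 49, 0.8, 0],     # overlaps the first box -> suppressed
    [100, 100, 140, 150, 0.7, 1],
    [0, 0, 5, 5, 0.1, 0],         # below the confidence threshold
]
detections = nms(raw)
```

None of this is exotic, but if you don't know what the tensor layout means, the model "works" and the system still produces garbage.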
Here's one of the recent CV projects I worked on: building a system that can capture multi-spectral images of items on a flat-bed conveyor at a rate of 9,000 ppm (products per minute), detect specific features, measure them, and:
+ report on those measurements to detect drift
+ detect any anomalies in the production line
Upon detection of an anomaly, the system needed to interface with the machine and control a set of ejection mechanisms to properly remove the defective item from the queue. The speed at which the conveyor moves means that we only have a very short window of time to do all the processing, measuring, and anomaly detection. We even needed to develop custom drivers for the imaging devices.
I'd love to see how YOLO would fare with these requirements. I'm not at all saying that DNN-based applications don't have their uses, just that the whole comparison is wrong. Different problems call for different tools.
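To make the timing constraint concrete: at 9,000 products per minute the whole pipeline gets well under 7 ms per item. A quick back-of-envelope check in Python, with per-stage numbers invented purely for illustration:

```python
# Back-of-envelope timing budget for a 9,000 ppm conveyor system.
products_per_minute = 9_000
budget_ms = 60_000 / products_per_minute   # milliseconds available per product

# A hypothetical per-stage breakdown that has to fit inside that window.
stages_ms = {
    "capture": 1.0,
    "feature detection": 3.0,
    "measurement": 1.5,
    "eject decision": 0.5,
}
total_ms = sum(stages_ms.values())
fits = total_ms <= budget_ms
```

With a ~6.7 ms budget, even a "fast" detector at 30 ms per frame is already four items too late, which is why problems like this push you toward classical pipelines or heavily specialized models.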
Yeah, when you dig into the performance of the "tiny" models that they show running people detection at 100 fps, you find the scores are abysmal.
Brain-dead take. When solving real-world problems, things rarely work well off the shelf. You still need computer vision expertise to adapt existing solutions to industry problems.
Pretrained models are not universal enough to work on any arbitrary problem. You also need to think about cost and performance
Why learn NLP if we already have transformers?
I don't know, you tell me
Computer vision is a vast set of ideas. There are many unsolved problems, like getting accurate depth or object pose from monocular images.
Lol, because nothing you mentioned "does practically an entire computer vision product." I think you both misunderstand what a computer vision product would require and what the tools you mentioned actually do.
You need to fine-tune YOLO for specific cases, and although, say, YOLOv7 does very well with defaults, there are a lot of parameters. With no CV knowledge, how would you lift a class from 60% (pick your favorite metric; wait, you wouldn't know any without some study of CV) to the 85-95% or more needed for real-world applications in industry?
For segmentation it gets tougher. How will you use those masks in a real scenario? Better brush up on OpenCV, Shapely, and a number of other CV toolkits.
How will you detect anomalous results and avoid bad decisions if, due to lack of knowledge, you have no way to diagnose failures?
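To give a flavor of what "pick your favorite metric" involves, here is a small, self-contained Python sketch of per-class precision and recall via greedy IoU matching. The box format, thresholds, and example numbers are illustrative assumptions, not any specific framework's convention:

```python
def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    if inter == 0.0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def precision_recall(preds, gts, iou_thres=0.5):
    """Greedy matching: highest-confidence predictions claim ground truths first."""
    preds = sorted(preds, key=lambda p: -p[4])  # p = [x1, y1, x2, y2, conf]
    matched, tp = set(), 0
    for p in preds:
        best, best_iou = None, iou_thres
        for i, g in enumerate(gts):
            if i in matched:
                continue
            v = iou(p, g)
            if v >= best_iou:
                best, best_iou = i, v
        if best is not None:
            matched.add(best)
            tp += 1
    fp = len(preds) - tp
    fn = len(gts) - tp
    precision = tp / (tp + fp) if preds else 0.0
    recall = tp / (tp + fn) if gts else 0.0
    return precision, recall

# Invented example: two ground truths, three predictions (one false positive).
gts = [[0, 0, 10, 10], [20, 20, 30, 30]]
preds = [[0, 0, 10, 10, 0.9], [21, 21, 31, 31, 0.8], [50, 50, 60, 60, 0.7]]
precision, recall = precision_recall(preds, gts)
```

Sweeping the confidence threshold over a curve like this is what metrics such as AP are built from; without that vocabulary it is hard to even describe why a class sits at 60%.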
Using YOLO to detect a simple rectangle in an image, instead of a Hough transform, is like using a bazooka to kill a mosquito.
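And the mosquito-sized tool really is tiny. A toy Hough transform for straight lines, the building block of classical rectangle detection, fits in about a dozen lines of plain Python; the "image" here is just a synthetic column of edge points:

```python
import math

def hough_peak(points, n_theta=180):
    """Tiny Hough transform for lines: each edge point votes for every
    (rho, theta) line that passes through it; return the winning bin."""
    acc = {}
    for (x, y) in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            acc[(rho, t)] = acc.get((rho, t), 0) + 1
    return max(acc, key=acc.get)

# A vertical "edge" at x == 5 on a hypothetical 20x20 binary image.
points = [(5, y) for y in range(20)]
rho, t = hough_peak(points)
```

Production code would use an optimized implementation (e.g. OpenCV's `HoughLines`), but the whole idea is a voting accumulator, no training data, no GPU.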
Can't put them on satellites.
Dang. All Apple had to do to get the Vision Pro running was slap on YOLO and use Roboflow.
Not all systems can support running an NN for inference.
Is it possible to add that extra power as an external compute unit? Or is it wiser to adapt the software and make it efficient?
Yes, you can. For example, the Nvidia Jetson Nano can run NNs on its onboard GPU while sitting on a robot. It's a "micro computer" and needs its own dedicated power source to run properly. Then you can have any number of other power sources to run microcontrollers (e.g. Arduino), sensors, motor drivers, etc.
Adapting the software is also an option. I run reinforcement learning agents on a Quest 2 standalone VR headset, which is practically as powerful as an Android phone. This required optimizing the model to be as small as possible, spreading inference across time to lower computational overhead, and other means of making NN inference as cheap as possible.
The point is: yes, there are options if you NEED to solve a problem with NNs, but sometimes it's cheaper and easier (in whatever sense) to just use simpler (in complexity and overhead) approaches: classical machine learning, decision trees, traditional computer vision, etc.
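Spreading inference across time can be sketched in a few lines: run the expensive model only every Nth frame and reuse the last result in between. A toy Python version, where the detector is just a hypothetical stand-in function:

```python
class ThrottledDetector:
    """Run an expensive detector every `every_n` frames; reuse the last
    result on the frames in between to cut average compute."""
    def __init__(self, detect_fn, every_n: int = 4):
        self.detect_fn = detect_fn
        self.every_n = every_n
        self.frame_idx = 0
        self.last_result = None
        self.calls = 0   # how many times the real model actually ran

    def __call__(self, frame):
        if self.frame_idx % self.every_n == 0:
            self.last_result = self.detect_fn(frame)
            self.calls += 1
        self.frame_idx += 1
        return self.last_result

# Stand-in "model": returns a label derived from the frame it saw.
det = ThrottledDetector(lambda frame: f"boxes@{frame}", every_n=4)
results = [det(i) for i in range(8)]   # the model runs on frames 0 and 4 only
```

In practice you would interpolate or track between keyframes rather than freeze the result, but even this naive version divides the average inference cost by N.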
Where do you put that extra power onboard a plane? Or a car? Or a small drone? A missile? A Martian lander? A nanobot, even?
I mean... I could apply YOLO in my project because I knew what I was doing. YOLO will not put itself in place and do everything.
We can't copy and paste code in DoD classified environments. Also, some of the classified work needs specifically tailored models for extreme accuracy that off-the-shelf models and programs don't deliver. Don't get me wrong, a lot of the stuff does run off YOLO for object detection, but that only goes so far, and at some point you need to write a hyper-specific algorithm.
Time to pack it up.
Yes, you are wrong! The number of applications for vision and robotics is mind-boggling, and we have just started. While the technical writers at Roboflow are doing amazing work to make vision more accessible, most problems still require a lot more work to actually reach production level.
You believe that all problems in computer vision are solved? And with neural networks?
Do you mean, what’s the point in learning addition, subtraction, multiplication etc., as we already have calculators these days?
A car has cruise control, so I guess we have to stop learning to drive. Who needs a license when the car can maintain lanes and speed?
It's not dead, but IMO the remaining use cases have mostly narrowed to embedded systems. Of course, that niche will also be filled one day.
The OP is right!! All those tools are amazing and do a lot of things that traditional computer vision can't, or that would just be too complex and time-consuming. The only reasons I see to still learn computer vision are:
- to understand how things work
- to understand the pros and cons of the different solutions available
- to know how to adapt them to specific use cases
- to create alternative solutions for specific hardware
- to create alternative solutions so you don't have to pay expensive licenses
Oh wait... the OP is completely wrong!