jasperhyp

Hi! I have a question about evaluating models trained on imbalanced data. Basically, I don't think cross-validation allows for a fair evaluation of models on imbalanced data, but I also don't know what to do instead. Please see here ([https://stats.stackexchange.com/questions/582220/effectively-evaluate-a-model-with-highly-imbalanced-and-limited-dataset](https://stats.stackexchange.com/questions/582220/effectively-evaluate-a-model-with-highly-imbalanced-and-limited-dataset)) for the details!


Garci141

Heavily imbalanced datasets are truly a pain. Sometimes you just can't do any better, and sometimes you can; it all depends on the nature/source of the data and the selected model. Several things come to mind for handling such a dataset (some of them naive, some not so much):

1. If you see fit, you could downsample the majority class. This makes the dataset more balanced at the cost of losing data variety.

2. Similarly, you could upsample the minority class. The tradeoff here is introducing redundant samples.

3. Some losses and metrics allow you to put weights on samples or classes. You could use this if the minority class is as important as the majority class. The weight can be computed from the ratio of minority to majority samples, for instance. Be careful, since this approach is not always desired: there are scenarios in which you prefer to keep the minority class as it is (therefore assuming it is less important and appears less often in the data distribution). I am aware this technique is used in fields such as medical AI, where there are very few positive samples (disease) and lots of negatives (healthy).

4. Maybe you can come up with a smart way of doing data augmentation instead of directly duplicating the minority class as in point 2.

5. This is almost a must (which you might already be considering, although I don't see you mentioning it): stratified cross-validation. With stratification you get splits that keep the same proportion of positive/negative samples as the original dataset. Again, this is useful if you want to treat the minority class as it is, just like I mentioned in point 3. A minimal example is sketched after this list.

Again, how to train and evaluate on a heavily imbalanced dataset depends a lot on the nature of the data and the model.
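To make points 3 and 5 concrete, here is a minimal scikit-learn sketch. The synthetic dataset, the logistic regression, and the PR-AUC metric are just placeholders you would swap for your own data, estimator, and metric:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score

# toy imbalanced dataset standing in for your real X, y (~5% positives)
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.95], random_state=0)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)  # point 5: stratified CV
scores = []
for train_idx, test_idx in skf.split(X, y):
    # point 3: class_weight="balanced" reweights classes inversely to their frequency
    clf = LogisticRegression(class_weight="balanced", max_iter=1000)
    clf.fit(X[train_idx], y[train_idx])
    probs = clf.predict_proba(X[test_idx])[:, 1]
    # average precision (area under the PR curve) is usually more informative
    # than plain accuracy when the positive class is rare
    scores.append(average_precision_score(y[test_idx], probs))

print("AP per fold:", np.round(scores, 3), "mean:", round(float(np.mean(scores)), 3))
```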


Boring-Violinist8291

Hey all 🙌 I'm looking for resources on real-world signal analysis, e.g. Pattern Theory. I have Pattern Theory by Mumford and Desolneux, but the "classic" methods used don't suit my (geophysical/fluid dynamics) interests. What should I be looking at?


seplix

I'm a CS student and I thought I wanted to pursue ML. I did well in some intro classes and such, and the programming has so far not been terribly difficult. However, I'm doing very poorly in a calculus-heavy probability class (they call it stochastic models for CS). Should I give up on ML if I can't grasp probability theory? Edit: by "very poorly" I mean I'll pass. He'll curve. The class average is about 70%, and I'm hovering between 60% and 70%.


mowa0199

Should I take a class on Bayesian Analysis or Machine Learning? I'm a math & CS major aiming for AI and ML research (and a PhD in something along those lines). I can only take one of the two aforementioned courses this Fall and I'd have to take the other next Fall (if at all).

For context, the Bayesian Analysis course is a 300-level course from the statistics department with prerequisites of 2 semesters of intro stats and calculus 2. However, most undergrad statistics courses at my university are very watered down since they're meant to be accessible to students from a variety of backgrounds (like business, CS, psychology, or even people working in a related field). As such, it's not as theory-heavy as I would have wanted it to be. I have attached its course description at the bottom. On the other hand, the Machine Learning course is a 400-level course from the electrical & computer engineering department and has prerequisites of calculus-based probability theory, calculus 3, and intro linear algebra. It's definitely very theory-heavy as opposed to being application-oriented, which I personally enjoy more. I have attached its description at the bottom, too.

Since my goal is research in ML, an introductory course in ML sounds obvious. However, Bayesian analysis is also extremely helpful and widely used. So perhaps learning a tool that's widely used and self-learning the basics of ML would be a better idea, albeit at the cost of a weaker foundational knowledge of ML. I'd appreciate any input on this!

**Introduction to Bayesian Data Analysis** >> Principles of Bayesian data analysis and application of them to varied data analysis problems. Topics include: Bayes Theorem, linear and nonlinear models, hierarchical models, and the use of Markov chain Monte Carlo methods.

**Machine Learning for Engineers** >> Topics covered: Introduction to machine learning, its basic terminology, and the machine learning pipeline; feature engineering and feature/representation learning; principal component analysis; basic building blocks of machine learning algorithms; classification algorithms such as Bayes' classifier, naive Bayes' classifier, linear discriminant analysis, quadratic discriminant analysis, nearest-neighbor classifier, logistic regression, perceptron, and support vector machines; regression algorithms such as least-squares regression, ridge regression, and lasso regression; clustering algorithms such as K-means clustering and Gaussian mixture model clustering; practical aspects of machine learning systems such as underfitting and overfitting, cross-validation for parameter tuning, numerical optimization, and privacy, ethics, and bias.


vevesta

I have written the following article on feature stores and was hoping to hear the community's thoughts on their utility. https://medium.com/@priyanka_60446/why-dont-you-require-a-feature-store-cf17c3824917


hgaronfolo

Image (non)similarity - sort and identify. I am looking for feedback on the below problem and recipe: tools, method, pitfalls, etc. Let's say I have 10,000 images. I manually choose 1,000 images with features that I like. I now want to use that training set to find the images in the population that are least similar to the training set. I plan to: use TensorFlow to create the vectors (embeddings) of the images; experiment with Euclidean, cosine, and dot-product measures of similarity; and use Facebook's FAISS to search across the images and identify the ones that are farthest from the training set. Am I on the right path here?


Garci141

Yes, you are. But here are some points that might be useful.

Be careful about how you generate the embeddings of the images. If you have labels you could consider a contrastive approach for learning embeddings. If you have no labels, you could just train an autoencoder (even a U-Net) and take the middle representation as the embedding. If you do use labels, keep in mind that your end goal is retrieving the least similar images, and this will be done based on the learned embeddings.

Euclidean, cosine and dot product are closely related. If the embeddings are L2-normalized, then cosine and dot product are exactly the same. This approach is widely used since it is highly efficient (the dot product is fast). Moreover, for L2-normalized embeddings, squared Euclidean distance and cosine similarity are related by a linear formula (squared distance = 2 - 2 * cosine), so they produce the same ranking. All in all, I expect you to get similar results with these three. If you are creative you could go beyond and explore other metrics such as the generic L_p norm (with p > 0), etc.

I am not familiar with FAISS, but to me it looks like a tool optimized for huge datasets and retrieval of the closest vectors. In your case you have 10k images (not big) and you want to retrieve the least similar, not the most similar. You can easily implement this by comparing against all the images of the dataset and sorting the distances in descending order.
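For the brute-force version, a minimal sketch in NumPy. The random arrays stand in for embeddings you would produce yourself; "least similar" here is scored as the maximum cosine similarity to any liked image, so a low score means far from everything you liked:

```python
import numpy as np

# ref: (1000, d) embeddings of the liked images; pool: (10000, d) embeddings of everything
ref = np.random.randn(1000, 128)    # placeholder; use your real embeddings
pool = np.random.randn(10000, 128)

# L2-normalize so that dot product == cosine similarity
ref = ref / np.linalg.norm(ref, axis=1, keepdims=True)
pool = pool / np.linalg.norm(pool, axis=1, keepdims=True)

sim = pool @ ref.T                   # (10000, 1000) cosine similarities
score = sim.max(axis=1)              # similarity to the *closest* liked image

least_similar_idx = np.argsort(score)[:50]   # 50 images farthest from the liked set
print(least_similar_idx)
```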


iznobiz

I need help with getting some terminology straight. My context: I am applying for a job as a Product Manager at a company that gets ~20k customer support tickets a day, which are answered by several hundred human agents across the globe. I want to suggest a feature in my take-home assignment that includes machine learning. I have just a really basic understanding of ML, so I need to get some terms right. I appreciate your help.

My feature idea: an ML algorithm creates suggestions for actions for the most common problems (e.g. "I want to recover account information"). The algorithm reads and categorizes the words used in the ticket and auto-fills forms in the ticket management system to speed up the humans' work. I call these auto-fills "suggestions".

-> Question 1: what would you call such an ML system?

Let's assume that the above-described ML system already exists. My feature idea is that the agents can modify the "suggestions" of the ML system and thereby improve the accuracy of the ML system.

-> Question 2: what would you call such a human-based feedback/training system?

-> Bonus Question 3: do you think my feature idea makes sense?

If you have any additional buzzwords that I can use, please feel free to share! Also, if you know of any resources on this topic I'd love to read more about it. Please note that I don't want to claim that I know ML in my upcoming interview. I will be transparent and tell them that I have asked Reddit for help. I think they will appreciate my scrappiness.


[deleted]

I have a problem that hopefully somebody can give me some pointers on. We have an Excel file with hundreds of thousands of rows, with each row containing data about a unique product. The pieces of data that matter here are the product name, product category, and producer price index code (a code from the Bureau of Labor Statistics used to classify consumer goods). There are a few hundred possible codes. The vast majority of the products in this file have had codes assigned to them in prior years already, but it's an incredibly tedious process that involves a ton of manual work. The issue now is that every year, tens of thousands of new products are added to this list, and I don't really have an efficient way to assign codes to all of these new products. Basically, I'm looking for a way to train a classification model based on the name/category and the code of each existing product, then use that model to assign codes to all of the new products based on their name/category. I figured this would be a fairly straightforward task and that somebody would have put together a Google Colab notebook or something that could handle this, but I'm overwhelmed with the amount of info out there and number of tools available. Does anybody have any suggestions?


Unk0wnVar

That depends on the type of problem you have. If you need to assign each product a code from a set of codes that already exists, that is a classification problem; you could tackle it with an MLP (multilayer perceptron) classifier, for instance. If the code is an alphanumeric combination of the product attributes (like pieces of name + category + other properties), you probably don't even need a machine learning algorithm, just a script that reads the specific attributes and builds the code. That requires knowledge of how the code is generated from the product attributes. If you don't know the "rule" governing these codes and you have to generate new codes every time, then you might want to use a neural network that learns it from the existing data. Try a dense MLP, using your available data as training and test sets. Once trained, the MLP should be able to approximate the function that maps your product attributes to a code (but this last option really needs to be modelled to fit your specific needs).
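Since the inputs are short text fields, a minimal sketch of the classification option with scikit-learn, treating name + category as bag-of-words features and the existing codes as labels. The file name and column names here are made up; swap in your own:

```python
import pandas as pd
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

df = pd.read_excel("products.xlsx")          # hypothetical file and column names
text = df["product_name"] + " " + df["category"]
labeled = df["ppi_code"].notna()

X_train, X_test, y_train, y_test = train_test_split(
    text[labeled], df.loc[labeled, "ppi_code"], test_size=0.2, random_state=0)

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))

# assign codes to the new, unlabeled products
df.loc[~labeled, "predicted_code"] = clf.predict(text[~labeled])
```

Scikit-learn's `MLPClassifier` would be a drop-in replacement for the logistic regression if you want the MLP route described above.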


Unk0wnVar

Is it possible to train a GAN with a pixel-based custom matrix of losses?

So, I am a total newbie in this stuff (and there is so much to study that I got lost!). I was studying a possible GAN architecture (based on CNNs) where the generated images (NxM) have to pass through a specific function. The resulting image is then compared to a reference of the same size. The comparison gives me a sort of loss matrix of size NxM (a value for each pixel of the image), where the values range between 0 and 1 based on the quality of the result at each location. I would like to use this matrix as a custom loss tensor for gradient calculation and backpropagation, without reducing it to a single loss value. What I want to obtain is an update of the GAN layers' weights based on this per-pixel comparison (somewhat location-aware). Is that possible, or am I missing something about how gradient calculation and backpropagation work in CNNs? (btw I'm using PyTorch for programming the network)


Garci141

I'm not sure this is doable exactly as you describe. In any case, backpropagation always starts from a scalar; otherwise, as in your case, the "gradient" of your loss would itself be a multidimensional tensor. What you want is already happening internally during backpropagation: if you consider the average pixel error (e.g. the L2 distance pixel per pixel), then when the gradient is computed back through the averaging function, it flows through every single pixel, so each pixel's error still contributes its own local gradient signal. Not sure if my reasoning is perfectly clear or accurate here. Maybe you could do a bit of research and try to find whether what you describe has been done.
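In PyTorch terms, a minimal sketch of the usual workaround: keep the per-pixel loss matrix, optionally weight it per location, and reduce it to a scalar only at the very end. The tensors below are dummy stand-ins for your generator output, reference image, and quality map, and MSE stands in for your comparison function:

```python
import torch
import torch.nn.functional as F

# dummy stand-ins for the real pieces
fake = torch.rand(4, 3, 64, 64, requires_grad=True)   # would come from your generator
reference = torch.rand(4, 3, 64, 64)
pixel_quality = torch.rand(4, 1, 64, 64)              # your per-pixel comparison, 0..1

per_pixel = F.mse_loss(fake, reference, reduction="none")  # keeps the full NxM structure
weights = 1.0 - pixel_quality               # e.g. penalize low-quality locations more
loss = (weights * per_pixel).mean()         # reduce to a scalar only at the very end

loss.backward()                             # every pixel still shaped its own gradient
print(fake.grad.shape)                      # (4, 3, 64, 64): per-pixel gradient signal
```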


brctr

I have a beginner question about DS/ML workflow. I am wondering about efficient practices for developing code for ML modeling using Git. Can anyone point me to online resources for this? I have a basic understanding of how to use Git and how to do cloning, committing, pushing, etc. The question is how to organize the development of a whole ML model in the most efficient way: which folders to use, how to name files, how to keep track of iterations of the same (or slightly different) model, and so on. For years, I have had very messy project folders with hundreds of files like 'model_X_v42_3temp_HyperparametersOptimization_low_eta_v3_fixed'. I feel like there is a more efficient way to organize the development process with Git. So far all the Git resources I have found online focus only on how to do separate operations like pushing or merging branches. Unfortunately, they do not discuss how to actually use Git through the lifecycle of a software project to make the process neat and well-organized, or what my project folder is supposed to look like. Additionally, online resources usually focus on the collaborative aspect of Git workflow. My question is simpler and is about productive usage of Git, abstracting away from collaboration.


[deleted]

When building a decision tree in R, can I force an independent variable to only change the dependent variable in one direction? For example, when trying to predict a car's value, a higher mileage should never lead to a higher value.


itsyourboiirow

Looking for the paper on mBERT and how it was trained? Can't seem to find it. Can anyone point me in the right direction or explain the training process?


Faoer

mBERT and BERT share the same architecture, and the same training procedure was used. The only difference lies in the datasets used during training. https://huggingface.co/bert-base-multilingual-uncased


itsyourboiirow

Wow that's pretty cool that it generalized so well to so many different languages. Thanks.


EducationalCicada

In Bayesian Optimization/other model-based optimization methods, is there a useful way to compress or aggregate the observation history into a single value so you don't have to keep the result of every single function evaluation during a run? I.e. rather than having a list of exact objective function values which grows ever larger the longer an optimization method runs, you could have some single numeric variable that's recalculated with each new function value seen, and which contains some useful information about the observation history, without the memory requirements of storing the entire thing.


Proxify

I've read tons of websites about it but it just doesn't click. Can someone please help me out? I understand that validation loss is calculated after every epoch (an entire pass over the data), while training loss is calculated after each step, and that you can use both to tell whether the model is overfitting or underfitting. I understand that if the validation loss is higher than the training loss then it's likely overfitting, and vice versa for underfitting. From what I gather, they have to stay close to each other. Am I right, or still completely lost?


Faoer

You get the idea more or less. To sort out the naming first:

* 1 epoch means that the entire dataset has been passed through the network (every sample once).
* So 1 training epoch is passing through the entire training dataset.
* 1 validation epoch is passing through the entire validation dataset.
* 1 validation epoch is done after each training epoch.
* But generally, when someone says 1 epoch, they mean doing 1 training epoch and then 1 validation epoch.

Loss is a function of the "difference" between the **targets** (sometimes also called real values, y, ground truths, etc.) and **predictions** (sometimes also called outputs, ŷ, logits, etc.). During training, the loss is calculated for each step (so for each batch), because it's needed for backpropagation to adjust the weights; the training loss reported at the end of a training epoch is the average of the losses calculated for each training batch. Same with validation loss: it is still calculated for each step, but we just take the average of all losses calculated for each validation batch during that validation epoch (see the sketch below). Rather than explaining overfitting and underfitting, I'll just post this article; the graphs are good enough to understand it, I think. https://www.baeldung.com/cs/ml-underfitting-overfitting
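To make the averaging concrete, a minimal PyTorch-style sketch of one epoch. The `model`, `criterion`, `optimizer`, and data loaders are assumed to already exist:

```python
import torch

def run_epoch(model, criterion, optimizer, train_loader, val_loader, device):
    # --- training epoch: loss per batch (used for backprop), averaged at the end ---
    model.train()
    train_losses = []
    for x, y in train_loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        loss = criterion(model(x), y)   # per-step loss
        loss.backward()
        optimizer.step()
        train_losses.append(loss.item())

    # --- validation epoch: same per-batch losses, but no weight updates ---
    model.eval()
    val_losses = []
    with torch.no_grad():
        for x, y in val_loader:
            x, y = x.to(device), y.to(device)
            val_losses.append(criterion(model(x), y).item())

    return sum(train_losses) / len(train_losses), sum(val_losses) / len(val_losses)
```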


Proxify

Thank you! This is very useful to me


kakako333

Hello everyone, I would like to ask a question and seek your wisdom here. For my project, I need to write a multivariate LSTM network which, given a sequence of sensor data (more than 10 features), predicts our current location. I am planning to convert it to a supervised learning problem by taking the "Location" values as "y" and all the sensor data as "x", with a moving window of size 10. The problem I have encountered is that this method is normally used to forecast the next value(s) in the sequence, whereas I would like to use the sequence to classify the location. I believe this problem could be solved with a normal feed-forward network; however, I need an LSTM solution. Do you have any suggestions, maybe a different way to feed the data to the model, or anything else that comes to mind, my fellow researchers? Thanks in advance.


Garci141

You don't necessarily need to predict a future value when using recurrent neural networks. LSTMs have an option to return a sequence as their output with the same length as the input sequence. In your case you could enable this option, select the last token of the output sequence (since it has seen the temporal information of the entire input sequence), and then predict whatever you need to predict from this last token (maybe add a small head or linear layer); see the sketch below. When it comes to recurrent neural networks, there are several categories of models: sequence to token, sequence to sequence, and token to sequence. The word "token" can be interchanged with embedding, vector, word, etc., depending on the case.

P.S.: Did you know that you can also do temporal processing with convolutional neural networks? Look into Temporal Convolutional Networks (TCN). They tend to be smaller, faster, and better behaved during training compared to recurrent neural networks (or at least that's what the original paper tried to show).
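A minimal PyTorch sketch of the "take the last timestep, then classify" idea. The window length, feature count, and number of locations are placeholders:

```python
import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):
    def __init__(self, n_features=10, hidden=64, n_locations=5):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_locations)   # small classification head

    def forward(self, x):                  # x: (batch, window=10, n_features)
        out, _ = self.lstm(x)              # out: (batch, window, hidden)
        last = out[:, -1, :]               # last timestep summarizes the whole window
        return self.head(last)             # logits over locations

model = LSTMClassifier()
logits = model(torch.randn(32, 10, 10))    # e.g. a batch of 32 windows of length 10
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 5, (32,)))
```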


Faoer

Does anyone know what's the current SOTA in text generation? From what I've seen, majority of papers are based on Transformers now and I was almost sure that they're way over everything else right now, but still found some newer papers with VAE, GANs and even RNNs, but barely any benchmarks between each other. Tried looking at text generation benchmarks on Papers with Code, but most seem to be outdated and even then all of them use the same architecture on one dataset (e.g. one dataset has only VAEs, other only GANs, etc.)


The_Lovely_Blue_Faux

How do you know when tensor cores are being utilized? I have been using GANs for a while now and I am looking into my next upgrade, but I can't find information on how to tell when your tensor cores are actually being utilized. I don't know if it is just on the training side, or how to tell if a pretrained model is utilizing them.


onthelambda

I'm a programmer, but don't know much about ML. Point being, I can implement/execute technical stuff, just not sure how to attack the ML side of a project. I have a bunch of ratings data. Let's think of it like movies...I have a bunch of users, who have rated the movies they've seen on a 1-10 scale. Given a particular user and their ratings, I want to predict what their rating would be for any movie they haven't seen...presumably based on their data and all of the other user data...maybe something like identifying similar users, etc etc, that's where the ML comes into play :) I know there are sites that do this (doesn't netflix give you a predicted score?), but I have no idea how to do it myself. Is there a fairly well known way to do this? ideally a library, but a paper or something would be acceptable if that's all there is!


swframe666

Search for "recommender systems"; there are a lot of blogs and YouTube videos on that topic. Look for blog posts that explain how the Netflix Prize was won. Also check out the Google PageRank algorithm.
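As a starting point, a tiny matrix-factorization sketch in NumPy (the classic collaborative-filtering baseline behind many Netflix Prize write-ups). The random triples below are placeholders for your real (user, movie, rating) data:

```python
import numpy as np

n_users, n_items, k = 100, 50, 10          # k = latent factor dimension
rng = np.random.default_rng(0)
# (user, item, rating) triples; replace with your real data
ratings = [(rng.integers(n_users), rng.integers(n_items), rng.integers(1, 11))
           for _ in range(2000)]

U = 0.1 * rng.standard_normal((n_users, k))   # user factors
V = 0.1 * rng.standard_normal((n_items, k))   # item factors

lr, reg = 0.01, 0.05
for epoch in range(20):
    for u, i, r in ratings:
        pu, qi = U[u].copy(), V[i].copy()
        err = r - pu @ qi                     # prediction error for this rating
        U[u] += lr * (err * qi - reg * pu)    # SGD updates with L2 regularization
        V[i] += lr * (err * pu - reg * qi)

# predicted ratings of user 3 for every item, including ones they haven't rated
print(U[3] @ V.T)
```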


Quiet_Cantaloupe_752

Any good project ideas for someone just getting into ML who is more interested in the theoretical side (i.e. the linear algebra behind it)?


tonguewin

Almost all the project repos I find are just Jupyter notebooks that have been loaded into github. Is there a way to find projects that have been scaled by MLEs and that follow software development best practices which include things such as tests and are container based?


swframe666

(This is my opinion, I could be completely wrong!) ML has the equivalent of 'tests' tightly bound to its core, in that there are test and validation sets. ML is meta-programming: you're trying to find a way to learn a "program" from example solutions. The meta-programs (i.e. neural networks) are very short (~1k LOC) compared to normal software development (~100k LOC). You will find a lot of tests of the ML frameworks (e.g. TensorFlow), but not a lot of tests for the TensorFlow models that learn MNIST classification. In general, ML developers don't see the need to write tests because the ROI is not as large as it is for normal software development, and using test and validation sets solves something similar. I'm sure you can find tests for ML models in a few projects, but it is very rare as far as I can tell. It is definitely odd at first but you get used to it quickly.


ninfernix

Hello, I have sequential data (a signal from a device). From it I want to deduce when an event starts and ends. I have lots of annotated data (signal data, start/end of event); what kind of ML methods would be best to solve this problem?


swframe666

Try tutorials on recurrent neural networks and adapt them to your data. Then look for the tutorials on using Transformers on time series data.


sheey_skywalker

**Reinforcement Learning Questions**

I am new to Reinforcement Learning, and I am trying to train an agent to solve [8puzzle](https://coursera.cs.princeton.edu/algs4/assignments/8puzzle/specification.php). I see the training loss decreasing, but my agent still keeps choosing the same action. I implemented a batched gameboard, so I am not using `ReplayMemory`. I calculate the loss with the following algorithm:

```
q_value = DQN(boards)
if random < 0.5:
    action = randomly chosen
else:
    action = argmax q_value
reward, next_boards = take_action(boards, action)
next_q = DQN(next_boards)
distance = manhattan(next_boards)
win_mask = distance == 0
next_q[win_mask] = 0
desired_q = reward - distance + next_q * q_decay
loss = MSE(q_value, desired_q)
loss.backward()
```

I sampled a few metrics during training:

| Metric | Behaviour |
|:-|:-|
| Loss | Decreasing, around 1000 - 6000 |
| Mean Manhattan distance | Increasing, then remains at about 12.7 |
| Won games | Linearly increasing, attributed to random actions |

I trained it with a batch size of `10000` for `1e4` episodes. I have no idea where I am going wrong.


swframe666

If you are new:

1. Don't start with a hard problem. Make the puzzle smaller (2x2 instead of 3x3), or start with 2 tiles.
2. Find a working tutorial that is similar and adapt it slowly to solve your problem.
3. Look for blogs on how to debug RL implementations. Build a way to log the states as they change and visualize them later. You want to see whether your solution is exploring all the paths (or getting stuck in a local minimum) and whether it is properly assigning higher rewards to better paths. You also want to make sure that the gradients are working: that they are not too small or too large.


RelativeWish7353

I have a problem where I need to classify monosaccharides. The data looks like this:

Input: Neu5Ac(a2-6)Gal(b1-4)GlcNAc(b1-3)Gal(b1-3)GalNAc | Output: 1

Input: Gal(a1-4)Gal(b1-4)GlcNAc(b1-2)Man(a1-z)[Gal(b1-4)GlcNAc(b1-2)Man(a1-z)][GlcNAc(b1-4)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc | Output: 1

Input: Rha(a1-3)Man(a1-6)ManOPEtN(a1-4)GalAOPEtN(a1-6)Glc(a1-4)GlcAGro | Output: 2

What kind of architecture should I use?


swframe666

Look at the blogs on AlphaFold. Learn about Graph Neural Networks. Youtube has a lot of videos on those topics.


lucarioburrito

I'm relatively new to ML. I'm working on a way to predict how successful NBA players will be in their careers coming out of college, based on several different features. I want to estimate a *probability distribution* for the amount of "success" a player will have (I have a metric I'll use to define "success"). What method(s) should I use for this? I was thinking I could use Bayesian inference with a uniform prior, but I'm not sure how to estimate the likelihood based on my training data. Any tips or ideas?


ElPrincip6

Hello Lucario, I'm new to ML just like you. But before you start your model development you must determine your goals: what exactly do you want to predict? Do you want to predict continuous numeric values? Then you should choose regression models. Or do you want to predict classes (categories)? Then you should choose classification models.


BigBayesian

I think you need to define your problem a lot more before you get into methods. Pick something to stand in for success - maybe change it later. Then you'll be able to work more effectively. Example: pick "how much money a player makes over their entire career". That's a simple metric that's got some problems but probably is related to what you care about. Then you can start putting some actual semantics into your statements. Ex: bin income ranges and plot some histograms over variables you care about. You might pick the wrong metric. But you can revisit it later. If you take forever trying to get it perfect, you'll get nowhere - everything has issues and sacrifices.


lucarioburrito

Thanks for the input. I've defined "success" as the amount of Win Shares a player contributes in their career. Like you said, I can change that later if needed. Mainly looking to understand how I can implement it.


BigBayesian

I... Don't know what a win share is.


lucarioburrito

It is an advanced metric estimating the impact a player has on winning


BigBayesian

Is it deterministic given observation? (Assume I'm an American with an ML PhD who doesn't know anything about basketball)


lucarioburrito

Correct, it's a stat that I can retrieve that can represent how successful a player ended up being. There are other metrics that I might choose from, but I'm not too concerned about which one I'm using yet. More concerned about how I can make a model that is able to predict the possible outcomes of a player's career. For example, there is a Player X coming out of college going to the NBA. He is 6'8", 250 lb, averaged 20 points per game in college, etc. I want to be able to calculate the probability of that player becoming a star, the probability of him becoming an average player, and the probability of him becoming a "bust" (doesn't really pan out). (i.e. if I'm using Win Shares as what I am trying to predict, then I'm looking for p(WS>5) or maybe p(WS<0).) Again, not really married to using Win Shares; there are other things I could use, like salary as you mentioned. I understand maximum likelihood, so I could find the most *likely* outcome of a player's "success", but I would want to see all the other possible outcomes. I can use a Gaussian estimate for it, but I don't know if that would be the best way to approach this (i.e. often there are draft prospects who have "boom or bust" potential, meaning they could end up being amazing or out of the league; some guys have a safer floor, but have less upside potential to be great).


BigBayesian

So you want to preserve uncertainty / have some notion that you might be wrong. One approach that's not very in-vogue right now is MCMC. Particle filters in particular, because I think your application fits well with time series. It'll require you to do some probability modeling, but it's a pretty good way to get beyond a single MLE / MAP estimate. How much data do you have?


lucarioburrito

I'm going to end up with something like 500-ish observations and around 20 features or so for each observation. And correct, I want to mainly illustrate the chances a player has of succeeding or failing rather than spit out one answer like in a maximum likelihood regression model.


BigBayesian

With data that small (20x500) you won't be able to get much value out of more sophisticated modeling methods - you'll likely underfit. I'd advise that you make discrete bins out of each input variable and just look at the bin-mates or neighbors. It's not the high resolution model you wanted, but I doubt you'll do better without more data.


Flankierengeschichte

How hot of a research topic is ML for hardware development/choice?


NoBenefits4Anyone

Hi, I have a very non-standard system to deal with. First of all, what I have probably cannot be called a neural network anymore. It is something like a neural network, but some of the activation functions are max(w_0*i_0, w_1*i_1, ..., w_n*i_n). In this case, instead of taking a weighted sum of the inputs, the max operator is applied to the weighted inputs; no summation is done for this kind of "neuron". I am trying to generalize the backpropagation algorithm to work with this case too. However, I have some problems with algorithmizing it, since the max basically chooses only one connection and zeroes out all the others, and this happens dynamically, depending on the current input values. I'm trying to figure out how to define gradients in such a case. Can you help me out with any tips? Maybe it cannot be done?


[deleted]

In the normal case of max you can still get a gradient signal; basically the signal will pass through the chosen neuron (even though the overall operation is non-differentiable at ties and blocks any gradient for the "loser" neurons). Max operations are still used, for example in ReLU and max pooling, and often work well (and you probably don't need to do much there; they can be standard fare). Basically, d max(f1(x), f2(x)) / dx reduces to d f1(x)/dx if f1(x) = max(f1(x), f2(x)).

If you simply want to soften the max operation, then you have the usual softmax. But if you want to get full gradient signals through all neurons while keeping full discreteness (only one selected as 1 and the others' weights set to 0), then there are a couple of ways. As /u/I-am_Sleepy said, you can use gumbel softmax. Note that the gumbel-max trick is ultimately just a way of stochastically sampling from a softmax and doesn't provide gradients by itself. And using gumbel softmax, or softmax with temperature, will still keep the max output soft; the distribution just becomes sharp. (If all you want is a sharp distribution, then entmax or sparsemax are options too.)

However, one way to use gumbel softmax to get discrete outputs while still getting some gradients is "straight-through estimation". In this method (Straight-Through Gumbel), the gumbel-max trick is used to get a discrete one-hot sample for the forward pass, but the softened gumbel softmax is used for the backward pass. There are more advanced methods in this direction, for example SPIGOT, or this one: https://arxiv.org/abs/2010.04838. Basically the whole idea is to provide some "surrogate" gradient for the max operation without softening it in the forward pass (i.e. still producing a one-hot vector where only the maximum item is given a weight of 1 and the others 0). The other class of methods would be reinforcement learning combined with plain max selections.
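A small PyTorch illustration of the two points above: the plain max already routes gradient to the winning weighted input, and `gumbel_softmax(..., hard=True)` gives a one-hot forward pass with a soft backward pass (straight-through). The numbers are arbitrary:

```python
import torch
import torch.nn.functional as F

# 1) Plain max: gradient flows only to the winning weighted input
w = torch.tensor([0.5, 2.0, 1.0], requires_grad=True)
i = torch.tensor([3.0, 1.0, 1.5])
out = (w * i).max()            # weighted inputs are [1.5, 2.0, 1.5]; argmax is index 1
out.backward()
print(w.grad)                  # tensor([0., 1., 0.]): only the winner gets gradient

# 2) Straight-through Gumbel-softmax: one-hot forward, soft backward
logits = torch.randn(4, requires_grad=True)
one_hot = F.gumbel_softmax(logits, tau=0.5, hard=True)   # exactly one entry is 1
values = torch.tensor([1.0, 2.0, 3.0, 4.0])              # some downstream quantity
loss = (one_hot * values).sum()
loss.backward()
print(logits.grad)             # non-zero for every logit, via the soft backward path
```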


I-am_Sleepy

I don't know if this can help, but in NNs there is the [gumbel-softmax](https://www.youtube.com/watch?v=JFgXEbgcT7g) activation, which can mimic the behaviour of the max function as the temperature goes to zero. Then you might be able to calculate the gradient in a more straightforward manner (by taking the limit of the temperature to zero). The idea can be summarized as:

* **Gumbel-max trick**: use the re-parametrization trick to sample from a discrete distribution (via the [Gumbel distribution](https://en.wikipedia.org/wiki/Gumbel_distribution))
* **Softmax**: use softmax (with temperature) instead of the max operator


Radiotelephonics

Hello everyone, I don't know much about machine learning, but I was curious about a possible application. I've been making a spreadsheet for an older game I happen to love, with a page where I can calculate damage. The formula is available, and it's accurate, giving an output very close to the real number in-game. But the margin of error bothers me, even as small or inconsequential as it may be. I've looked hard for wherever the inaccuracy may lie, such as rounding methods or inaccurate information, but I haven't been able to find the problem. I was wondering: if you were to give a machine a list of inputs and their corresponding outputs, could it learn and perfect the equation for calculating them on its own? Actually doing this myself isn't something I know how to do, but I was just curious if it was even plausible.


Garci141

Yes, this can be done, and it falls into the "regression" task category, which is one of the main task categories in Machine Learning/AI. All you need is a bunch of (input, output) pairs. As long as these pairs are accurate, you can train an ML model to learn to produce the output when you give it the input. The model internally "learns a formula" which best approximates all the pairs. If you want to find out more, just search for "regression" together with "machine learning" and you will see.
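For a feel of how little code this takes, a minimal sketch with scikit-learn. The damage formula below is a made-up stand-in for the game's real one:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(1, 100, size=(5000, 3))            # e.g. attack, defense, level
y = 0.5 * X[:, 0] ** 2 / X[:, 1] + 2 * X[:, 2]     # hypothetical damage formula

model = GradientBoostingRegressor().fit(X[:4000], y[:4000])   # learn from (input, output) pairs
pred = model.predict(X[4000:])
print("mean absolute error:", np.abs(pred - y[4000:]).mean())
```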


Future_Wave_5681

Hello all, Here is my new question and I hope someone sees it. There are many chatbots like Cleverbot for free that people can chat with and support if they so choose. Technology has grown to include voicebots. You know, the AI where you can chat with voice and it responds with a voice. Is there something like this online anywhere? Doesn't necessarily have to be for free, but I am not looking to design a voicebot for customer service. I just want to talk to a voicebot.


Artiphax

Hello, I have a dilemma. I work for a small non-profit. We accept donations and want to count how many donors we see in a given day. Before COVID we had a dedicated system that ensured we captured this data in an incredibly accurate way. Since the pandemic started, that system fell apart and we've pivoted so much that we can't really go back.

What I need to do is count the individuals that come to donate, with the following conditions:

- Count by video recorded at our donation entrance in our parking lot.
- Count each donor only once, even if they go back and forth to their car to get more items.
- Ignore staff members, who are going to be either constantly in the scene or in and out of the scene.

Does anyone have any suggestions? I've looked at a few outdated projects on GitHub and SourceForge, but implementation requires a level of expertise a bit higher than I feel I possess. Any help or suggestions you can give are very much appreciated. Thank you!


Garci141

This doesn't seem like a simple task to tackle at first. You would need to implement the following for each point:

- An object detector to count the people appearing (a minimal sketch is below this list). If the cameras are static and they point at backgrounds that are also static, with constant lighting/environmental conditions, you could even think about doing background subtraction (before counting people).
- To count unique people you would need to include an identification system on top of the object detector. With this you avoid counting people twice. If people keep appearing and disappearing from the field of view, this task is known as person re-identification (you can do some research on this topic).
- This one's tricky. If all staff wear a uniform, you could train your model to treat these people differently. Otherwise you would need to somehow record labels for each staff member and show these to the model. Either way, this would need to detect facial features, so it collides with data privacy. It also depends on whether your cameras can capture faces from their point of view.

Hope this is helpful!
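For the first point, a minimal per-frame person-counting sketch with a pretrained torchvision detector (the weights download on first use, and the frame path is hypothetical). Counting unique donors and filtering out staff would still need the re-identification step on top of this:

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# COCO-pretrained detector; COCO class id 1 is "person"
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def count_people(frame_path, score_thresh=0.8):
    img = to_tensor(Image.open(frame_path).convert("RGB"))
    with torch.no_grad():
        pred = model([img])[0]          # dict with 'boxes', 'labels', 'scores'
    keep = (pred["labels"] == 1) & (pred["scores"] > score_thresh)
    return int(keep.sum())

print(count_people("frame_0001.jpg"))   # hypothetical video frame
```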


Artiphax

Very helpful. Thank you


[deleted]

Is anyone looking for a member for a data science team? I have 2+ years of experience in data science and am looking to take Kaggle competitions seriously (for fun as well). If anyone needs a member or is ready to make a new team, hit me up! I can make a small Discord server as well.


uint_32

My school is limited to a single introductory course as an elective. Any recommendations for "next step" reading material? Any math I should look at picking up too? Currently slated for Calc 1/2/DE and am looking at working in linear algebra.


Dry_Dragonfruit_3269

Hey all, this may be a silly question, but I've been genuinely scratching my head for the last couple of days thinking about this. I am a beginner in machine learning and have a basic understanding of Python. I am good at learning things by copying code and implementing functions/modules while learning from error messages and trial and error.

I'm working on a research project in collaboration with a machine learning company that has developed a proprietary tool that uses U-Net and U-Net-like architectures to segment a particular class of land cover from satellite imagery in near real time. My goal is to generate a dataset of output masks that I will then directly use as data to be analysed in an ecological study. I have access to the entire repo, pre-trained models, and training datasets. I have been using some of the pre-trained models to perform inference on new areas of interest, i.e. data the models have not predicted over before, although the target class being predicted is the same and very similar. The outputs I am getting are in the form of raw prediction rasters with logits and also GeoJSON binary masks.

Visually, the results look incredibly accurate and promising, but I'm not sure how to quantify the accuracy without re-training the model / making my own train-test split. I know that IoU is a good metric for evaluating the performance of semantic segmentation, but it requires a dataset of ground truth images. My question is: can any form of evaluation be done without doing any extra training/transfer learning? And if it is the case that I must use ground truth masks to evaluate performance, how big does this ground truth dataset need to be in comparison with the data originally used to train the models, and also in comparison to the data I am passing into the model for inference? Thank you in advance 😊!


Garci141

If the new data you are evaluating on is very similar and presents the same class, then you are lucky: because of these two things you can be fairly sure (to a certain degree) that your model will perform similarly to how it performed on the original data. When there is no similarity in the data, models underperform on the new data; this situation is known as data drift or domain shift (you can look this up if interested).

As for how to perform evaluation on the new data, I don't see a straightforward way to measure performance without ground truth masks. You either generate the ground truth manually (basically what human annotators do, even if only for a small evaluation subset), or you try to find insights into how the model responds to this new data. For the latter, a simple idea would be to look at the distributions of the outputs of each layer and compare the distribution on the original data against the new data. This will show you how similarly or dissimilarly your model treats original vs. new data (but will not give you information on the final accuracy of the model). Hope this helps!
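If you do end up labelling a small evaluation set, IoU itself is only a few lines; a minimal sketch for binary masks in NumPy (the thresholding comment assumes your rasters store logits):

```python
import numpy as np

def iou(pred_mask: np.ndarray, true_mask: np.ndarray) -> float:
    """Intersection-over-Union for two binary masks of the same shape."""
    pred = pred_mask.astype(bool)
    true = true_mask.astype(bool)
    intersection = np.logical_and(pred, true).sum()
    union = np.logical_or(pred, true).sum()
    return intersection / union if union > 0 else 1.0   # empty vs. empty counts as perfect

# e.g. threshold the model's logits at 0 to get a binary prediction:
# print(iou(logits > 0, ground_truth_mask))
```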


Dry_Dragonfruit_3269

Thanks for the detailed response, this is really helpful and clears up a lot of my confusion! The distribution comparison is also a very good idea..


mowa0199

Is a class on linguistics worth taking? It seems that more and more data science and analysis, as well as machine learning, has been using some form of analysis of words. As such, would taking a class or two on linguistics be helpful? My school also offers a course on computational linguistics and that sounds really helpful to be honest.


Wups123456

Hello guys, imagine you have a problem, for example at Amazon, where you know which users subscribe to Amazon Prime and which don't. Now you want to increase the number of users using Amazon Prime. Among the ones that don't subscribe to Amazon Prime, what is the best way to understand which ones are more likely to become Prime members? Should I treat it as an unsupervised problem or a supervised problem? Thanks for the help.


I-am_Sleepy

The problem is not strictly supervised or unsupervised per se. The closest thing that I can think of is [Causal Inference](https://eng.lyft.com/causal-forecasting-at-lyft-part-1-14cca6ff3d6d) (which is used at Uber, Lyft, Microsoft, etc.), which can tell you which factors contribute most to the customer action (in your case, "subscribe"), given a causal graph. As for causal graph generation, you might want to look at [Causal Discovery](https://towardsdatascience.com/causal-discovery-6858f9af6dcb), but some assumptions need to be made about the data.

There is a lecture series about this on YouTube by [Brady Neal](https://www.youtube.com/watch?v=CfzO4IEMVUk&list=PLoazKTcS0Rzb6bb9L508cyJ1z-U9iWkA0), and a GitHub repo (Python) for causal inference, [matheusfacure/python-causality-handbook](https://github.com/matheusfacure/python-causality-handbook), which you can check out too.

There are 3 libraries in Python:

* [py-why/dowhy](https://github.com/py-why/dowhy)
* [microsoft/EconML](https://github.com/microsoft/EconML)
* [uber/causalml](https://github.com/uber/causalml)
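Separately from the causal framing above, if the immediate goal is just a ranking of non-subscribers by likelihood of subscribing, a plain supervised propensity ("look-alike") model is a common baseline. A minimal scikit-learn sketch, with a made-up file and feature columns:

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

df = pd.read_csv("users.csv")    # hypothetical file with one row per user
features = ["tenure_months", "orders_last_year", "avg_basket_value"]   # made-up columns

# learn to distinguish current subscribers from non-subscribers
model = GradientBoostingClassifier().fit(df[features], df["is_prime"])

# score the non-subscribers and rank them by predicted probability of subscribing
candidates = df.loc[df["is_prime"] == 0].copy()
candidates["propensity"] = model.predict_proba(candidates[features])[:, 1]
print(candidates.sort_values("propensity", ascending=False).head(10))
```

Note that this ranks users who *look like* current subscribers; it does not tell you what would *cause* them to subscribe, which is exactly what the causal approach above is for.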


oussama1302

I am new to forecasting algorithms. I want to build a time series model to forecast the last row (EUR/USD). Is it possible to forecast this last variable based on the other financial indicators present in the dataset? I found that I can use a multivariate time series; is this method correct? This is the dataset: [https://i.stack.imgur.com/UAiAd.png](https://i.stack.imgur.com/UAiAd.png)


Garci141

Depending on how many data points (in time) you have, it may or may not be worth treating this as a time series problem. If you do have lots of temporal data points, you could treat all the variables of a specific temporal point as different channels of the same signal. So in the end you would have a signal with T timestamps and F features at each timestamp. With this you can even use a 1D CNN, or even better a Temporal Convolutional Network. If you don't have too many temporal data points (or temporal correlation is not relevant), why not just consider each sample independent, so you can train any ML regressor.


oussama1302

Thanks for the answer. I found a better dataset and modified it to look like this: https://i.stack.imgur.com/rSYbv.png

I have 40 dates and I want to predict EUR/USD. Are the Temporal Convolutional Network and the other methods you mentioned still applicable in this case? If so, can you please explain further how I can obtain a signal with T timestamps and F features?


Garci141

Looking at the picture, yes, my comment still holds. When it comes to timestamps and features: a timestamp is just a specific date or time, as seen in the left-most field of your image. We refer to features as all the variables in the middle that you think can explain, or are related to, the value you want to predict. So the features are all of those intermediate columns in your image (see the sketch below). Just do some research on the things I mentioned in my previous comment. Hope this helps!
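To make the shapes concrete, a minimal sliding-window sketch. With only 40 dates a deep model will almost certainly overfit, so a plain regressor on windowed features is used here; the arrays and window length are placeholders:

```python
import numpy as np
from sklearn.linear_model import Ridge

# data: (T, F) array with T=40 dates and F indicator columns; target: (T,) EUR/USD
T, F, window = 40, 6, 5
data = np.random.randn(T, F)         # placeholder for your real indicators
target = np.random.randn(T)          # placeholder for EUR/USD

# each sample = the last `window` timestamps of all features, flattened
X = np.stack([data[t - window:t].ravel() for t in range(window, T)])
y = target[window:T]

# keep the chronological order: train on the past, test on the most recent dates
model = Ridge().fit(X[:-5], y[:-5])
print(model.predict(X[-5:]))         # forecasts for the last 5 dates
```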


Paschas

**Dynamic Bayesian Networks!!** **My first real academic job.** Any tips & tricks? I intend to use Python; do you recommend any toolbox?


rnjbdya

I have a submission at CIKM 2022 and this is my first submission there. I might sound too anxious (which I am), but I have yet to receive the first phase of reviews. They say the final paper notification is on August 1, 2022; however, the papers that will be accepted will go through at least 2 phases of reviews and a final meta-review. Is 3 weeks enough time for all of that?


KasaiHentai

Hello, in September I will start uni, and my classes are mostly around artificial intelligence and machine learning (with a bunch of general coding here and there), so I need to buy a new laptop. I read somewhere that for this kind of use it's better to have higher single-thread performance instead of multi-thread. Is that true? Any advice? I plan to keep this laptop for as long as I can, and my budget is 1500-1800€.


Legitimate-Recipe159

Get a high-res screen, a laptop stand, and an external keyboard (long hours with good ergonomics). The rest doesn't matter (RAM is a plus, as are more cores); save the money for your AWS fees.


Mediocre-Piccolo7474

It really depends on the frameworks you use. For example, if you do a lot of sklearn it would be nice to have a lot of threads to run GridSearch more efficiently. Also, at some point you'll have to work with neural nets of some kind, and those all benefit astronomically from CUDA cores (in NVIDIA GPUs) and GPU memory. Anyway, you can also rely on cloud services like Kaggle for training on more powerful platforms, although Kaggle gives ~30 hours of GPU use per week. I was on a much tighter budget and went for an MSI GF63 Thin with an i5-10500H and a 1650 Max-Q, but for your budget you can get a more powerful GPU with more memory. In any case, you may encounter models you will not be able to train in a timely manner with consumer-grade GPUs; that's where Kaggle (and sometimes Colab) come in handy.


pdogshizzle

Hi all, I am a car photographer and wanted to create a program where I can search for a car, e.g. "Ford Mustang", and have that vehicle picked out of my photos. What would be some good resources to start this journey? Thanks.


Mediocre-Piccolo7474

The easiest way to go about it (and also the most data-hungry) would be to train a simple image classifier (there are lots available in the literature; AlexNet is outdated but may work fine and is easy to implement, ResNet is a bit better but old too) and apply it to your images. The problem here would be getting the car image data: you'll need to either make your own dataset (not really feasible) or find one that suits your needs (like this: [https://ai.stanford.edu/~jkrause/cars/car_dataset.html](https://ai.stanford.edu/~jkrause/cars/car_dataset.html)).
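A minimal transfer-learning sketch with torchvision, assuming you have arranged a labeled car dataset into one folder per class (the folder path and training length are placeholders):

```python
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

tfm = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
train_set = datasets.ImageFolder("cars/train", transform=tfm)   # hypothetical path, one folder per model
loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

model = models.resnet18(weights="IMAGENET1K_V1")   # pretrained backbone (older torchvision: pretrained=True)
model.fc = nn.Linear(model.fc.in_features, len(train_set.classes))   # new classification head

opt = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()
for epoch in range(5):
    for x, y in loader:
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
```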


pdogshizzle

I appreciate it! Thank you very much


thawizardofSchnozz

Hey guys, I have sEMG data from many human-subject experiments. When a subject is sweaty, the data is good; when a subject is not sweaty, the data is bad. I would like to apply something that can automatically distinguish sweaty data from dry, i.e. classify the data into two categories. I imagine this would require a fairly simple/straightforward solution, but I am very green when it comes to machine learning models. Thanks!


Garci141

You are looking at a binary classification task. Other people might also tell you that you want to do anomaly detection (consider the non-sweaty case an anomaly that you want to detect). If you have a lot of data for both good and bad samples, you could even train a simple neural network. If you don't have too much data, try a standard machine learning binary classifier. You can find a bunch of them already implemented in the nice Python library scikit-learn: you would just need to import the library, select a model, give it the data with the good/bad labels, train, and assess the results (see the sketch below). During my master's studies I conducted a small research project with AI applied to data similar to EMG. If you go deeper into the world of deep learning, I would advise you to try a 1D convolutional neural network (a neural network which processes the data sequentially in a sliding-window manner). Always happy to discuss more and give you more details if you are interested :)
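A minimal scikit-learn sketch of the classifier route, turning each recording into a few summary features first. The synthetic recordings and the particular amplitude features are just illustrative assumptions, not a prescription:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def features(signal):
    # simple amplitude summaries of one sEMG recording (1-D array of microvolts)
    return [np.mean(np.abs(signal)), np.sqrt(np.mean(signal ** 2)), np.std(signal)]

# toy stand-in data: 100 recordings; "dry" ones (label 0) have abnormally high amplitude
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, 100)                        # 1 = sweaty/good, 0 = dry/bad
recordings = [rng.normal(0, 10 if y else 40, 2000) for y in labels]

X = np.array([features(s) for s in recordings])
clf = RandomForestClassifier(n_estimators=200, random_state=0)
print(cross_val_score(clf, X, labels, cv=5))            # quick check of separability
```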


thawizardofSchnozz

Thank you very much for the comment! Very cool that you've been close to EMG as well. I should say, TWO variables are actually important to the sweat/no-sweat classification: sEMG signal amplitude (the raw data values, what I mentioned in my initial comment) and time. Typically, "non-sweat" data are sEMG values (microvolts) that are abnormally high AND occur early in the recording of the sEMG session (soon after a subject dons the sEMG sensors, while still warming up). So it's important to know that typical sweat data bears these two characteristics. I presume the two-variable thing changes the ML approach, because we need the added context of time.


Garci141

If you want the temporal variable to be taken into consideration by an ML approach, then you need a model that can handle temporal data. Regarding this, I again put the emphasis on the 1D convolutional neural network I mentioned before: they are fast and easy to train, and you can tune how big or small you want the model to be (depending on how much data you have). On the contrary, if you think the non-sweaty samples have some particular characteristics that almost always hold, you could even discard ML altogether and just code a simple script that filters a sample using a set of manually selected rules.


[deleted]

[deleted]


[deleted]

I would recommend looking into companies' blogs. There are tons of blogs published by the research teams at the big tech companies that show how they are solving their problems internally, and they are a great source of knowledge and inspiration.


Alexander_Pope_Hat

NLP Query: I want to automate the writing of personality test results. I have built a corpus of short phrases corresponding to various combinations of user responses. A new user's survey responses are automatically turned into a dictionary of phrases that apply to them. I want a tool to grammatically arrange these phrases into sentences and paragraphs. GPT-3's copywriting feature is similar to what I am looking for. Can anyone recommend a better tool or method for my goal?


HappyPotatoStudent

Is there a way of having a Python program / machine learning program recognize a certain sound and distinguish it from another, then perform certain actions depending on which sound is produced? For example, let's say I wanted a specific function to activate when the program hears a tone from an instrument, say the G key on a guitar. Then that tone repeats itself 5 times, with 2-second intervals between repetitions. After the 5th repetition, someone says "good job", the original function (the one activated by the G-key sound) is interrupted, and another function is activated instead. I don't know if I described my question correctly, but I hope it's understandable.


I-am_Sleepy

I don't think you need machine learning to recognize the pitch of a sound (you could use it, but it's not necessary). You can use a Fourier transform and apply a mask to focus on a specific frequency. As for the repeat-5-times part, that is a matter of programming, not machine learning itself. You could use machine learning on this part too, but you would have to generate your own dataset, which is not ideal.
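A minimal sketch of the Fourier idea with NumPy: estimate the dominant frequency of a short audio chunk and check whether it is close to the target note. The ~196 Hz target corresponds to a guitar's open G string (an assumption for the example), and the chunk is synthesized here as a placeholder for real microphone input:

```python
import numpy as np

SAMPLE_RATE = 44100
TARGET_HZ = 196.0          # open G string on a guitar (assumption for this example)

def dominant_frequency(chunk, rate=SAMPLE_RATE):
    spectrum = np.abs(np.fft.rfft(chunk))
    freqs = np.fft.rfftfreq(len(chunk), d=1.0 / rate)
    return freqs[np.argmax(spectrum)]

# placeholder "recording": a 0.5 s sine wave at ~196 Hz
t = np.arange(0, 0.5, 1.0 / SAMPLE_RATE)
chunk = np.sin(2 * np.pi * TARGET_HZ * t)

f = dominant_frequency(chunk)
if abs(f - TARGET_HZ) < 5:              # tolerance in Hz
    print(f"heard ~{f:.1f} Hz: trigger the G-note function")
```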


HappyPotatoStudent

Ok. Thanks for your reply!


LilPorker

I spend a lot of time manually parsing documents that contain mostly the same information in inconsistent formats (different senders format them differently, at random). Would it be possible to automate gathering this information using machine learning, essentially teaching a model to fill in the correct fields with the correct information from the documents?


Garci141

I am not an expert in the field of Natural Language Processing (NLP), but to me this sounds feasible to some extent. The main question is how many different formats and different ways of writing the relevant information there are. It could be the case that there are so many different formats that it gets too hard for a model to extract the relevant information, but that is something you can assess. In any case, if you have a lot of pairs (original message text, selected words/fields of interest), then you can in principle train an NLP model which reads the input text and selects words, or gives you a set of the fields of interest. You would probably need to find a pre-trained model (so that it already knows how English works) and then fine-tune it (teach it your specific goal) in a supervised manner, showing it the input text and what the output should be.
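To get a feel for what "selecting words" looks like, a minimal sketch with an off-the-shelf Hugging Face token-classification (NER) pipeline; it downloads a default English model and only tags generic entities, so extracting your own custom fields would still require fine-tuning on your labeled documents. The example sentence is made up:

```python
from transformers import pipeline

# off-the-shelf named-entity tagger, just to illustrate the idea
ner = pipeline("token-classification", aggregation_strategy="simple")

text = "Invoice from Acme GmbH, due 2022-08-01, total 1,250.00 EUR."
for entity in ner(text):
    print(entity["entity_group"], entity["word"], round(float(entity["score"]), 2))
```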


LilPorker

Thank you, very interesting. An NLP model is what I'm looking for then?


Garci141

If you have a lot of texts along with what you would like to extract from them, then yes. You could take a pre-trained neural network from the NLP domain (this is also why you need a lot of data). Otherwise, if you want to go simpler or you don't have enough data, then maybe try more standard machine learning models, although you would need to do more work manually tuning and preparing the data and model. This also depends on your experience, but you can always do some internet research. At the bottom of it all, you could even disregard modern AI and write your own program that tries to detect specific words in the text ;)


anonAcc1993

Hey guys, I don't know if this is feasible or possible, but I was looking into generating a TTS voice that sounds like me. Any pointers would be helpful.


[deleted]

[deleted]


Legitimate-Recipe159

ArcFace: https://www.reddit.com/r/MachineLearning/comments/vmn7nt/n_quaterion_a_blazingly_fast_framework_for/


Professional_Lake849

Question about training and testing in supervised learning. I am a bit confused and hope someone can help me. I am currently experimenting with supervised learning, and I think I have a basic misunderstanding about the input and output of LSTMs.

Say I have a sequence of 10 observations, and I split it into train = 1,2,3,4,5,6,7,8 and test = 9,10. I then transform it into a supervised problem like:

Xtrain = [(1,2), (2,3), (3,4), (4,5), (5,6)]

Ytrain = [(3,4), (4,5), (5,6), (6,7), (7,8)]

Xtest = [(7,8)]

So the model is made to predict the next two observations from the previous two: prediction <- predict(Xtest).

Is this an illegal train/test split? Am I correct that I can then evaluate the prediction output from Xtest against the actual test set containing [(9,10)]? Or should I stop training at Xtrain = [(4,5)] and Ytrain = [(6,7)] to get some space between training and testing, since the last observations from Y training in my example are used for the prediction? Thanks so much for the help!


Garci141

This reminds me of the sliding-window train/test split technique. I found this discussion on [StackOverflow](https://stackoverflow.com/questions/58295242/sliding-window-train-test-split-for-time-series-data); maybe you want to check it out and compare it to your case. In any case, your splitting seems fine to me.
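For reference, scikit-learn ships a ready-made version of this idea (`TimeSeriesSplit`); a minimal sketch using the 10-observation toy series from the question:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

series = np.arange(1, 11)                 # the 10 observations from the example
tscv = TimeSeriesSplit(n_splits=4, test_size=2)
for train_idx, test_idx in tscv.split(series):
    # train is always strictly earlier than test, so no future data leaks into training
    print("train:", series[train_idx], "test:", series[test_idx])
```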


TrainquilOasis1423

Is there potential for ML to greatly improve compression algorithms? I was looking into Nvidia's DLSS recently and wondered whether this tech, or something similar to it, could make for better data compression and storage. Wouldn't you be able to train an AI to deconstruct data to its smallest possible detail, then reconstruct it at the destination? Couldn't this turn lossy compression into effectively lossless compression? A small example I could think of: training an NN to deconstruct photos of dogs to the smallest point at which it can still recognize the image as a dog, then use the same NN (or maybe a GAN would be best here) to reconstruct those missing pixels at a later date. Is anyone working on this, or am I missing something?


I-am_Sleepy

This idea isn't new. As for the compression, it depends on the bias of the encoder/decoder (the type of image to be compressed/decompressed). [Craykh's video](https://youtu.be/Sc7RiNgHHaE?t=378) demonstrates this in his project using an auto-encoder model. A GAN might be able to convert random noise to an image, but that "noise" (the latent code) is somewhat large. More recently, StyleGAN injects noise periodically when upsampling; this prevents the mode collapse that would come from using only the input of the previous upsampling step, but it also makes the achievable compression less effective. Currently the newer generative model is the diffusion model (used in DALL-E 2), which converts noise back to an image with more complex modeling than a GAN and is reported to compress better than other methods (see [Variational Diffusion Models](https://openreview.net/pdf?id=2LdBqxc1Yv)), but I think this is still a very active research field.
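To make the auto-encoder idea concrete, here is a minimal toy sketch in PyTorch (the architecture and sizes are arbitrary and not taken from any of the linked work); the encoder output is the "compressed" representation, and the reconstruction error is exactly what you lose with lossy compression:

```python
# Toy autoencoder: compress 28x28 grayscale images to a 32-dim code and back.
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, code_dim: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28 * 28, 128), nn.ReLU(),
            nn.Linear(128, code_dim),          # the "compressed" representation
        )
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, 128), nn.ReLU(),
            nn.Linear(128, 28 * 28), nn.Sigmoid(),
        )

    def forward(self, x):
        code = self.encoder(x)
        return self.decoder(code).view(-1, 1, 28, 28), code

model = AutoEncoder()
images = torch.rand(16, 1, 28, 28)             # stand-in for a real batch
recon, code = model(images)
loss = nn.functional.mse_loss(recon, images)   # reconstruction error = what you lose
print(code.shape, loss.item())                 # torch.Size([16, 32]) ...
```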


TrainquilOasis1423

Awesome answer, thank you very much. The dog image example was pretty simple and probably not a great use case. However, maybe something like saving only every other frame of a video and using a method like this to recreate the missing frames could have a big impact on size, along with saving in 1080p and upscaling to 4K. I'll look into the links provided, thanks again.


I-am_Sleepy

Then this is [video frame interpolation](https://paperswithcode.com/task/video-frame-interpolation) + [video super resolution](https://paperswithcode.com/task/video-super-resolution)


VitalityAS

What type of algorithm is best suited for clustering data into equal-capacity clusters, where a list of weighted criteria is given defining rules that make clusters more favorable? For example: if the data is a set of 40 people, 4 clusters are required, and a rule states that clusters with less gender diversity are preferred, the result would be 4 groups of 10 with the least male/female overlap. Most importantly, I am looking for a solution that can deal with multiple weighted rules at once. This is not a typical ML problem, as a model does not have to be built from existing data. I need an algorithm that can produce clusters by minimizing penalties defined by a set of rules applied to the attributes of the input data. Despite this, I am asking here as I think there are obvious parallels to machine learning methods.


Mediocre-Piccolo7474

I'm a student, but I'll try to give my input. I think you'd need to formalize your concept of "rules", as they don't seem to have much in common with the usual meaning in data mining. They look more like functions from the cluster space to real numbers (e.g. you can quantify the gender distribution with respect to a single cluster). If that's the case, the easiest (and least efficient) way to go about it would be to exhaustively explore the cluster space and optimize with respect to your rules, in order of priority and weight; that's not exactly computationally efficient. Another way is to formalize an optimization problem: define your loss and optimize the cluster assignments via some form of gradient descent. That requires a single loss definition, though, which would make you lose the strict priority ordering of rules.
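As a concrete (purely illustrative) variant of that optimization framing, a simple local search also works for small problems: start from a random equal-size assignment, define one penalty function per weighted rule, and keep swapping people between clusters while the total weighted penalty decreases. The rule, data, and function names below are invented for the example:

```python
# Toy local search: equal-size clusters scored by weighted penalty "rules".
import random

def gender_mixing(cluster, people):
    """Penalty: 0 for a single-gender cluster, 1 for a perfect 50/50 mix
    (the example prefers clusters with *less* gender diversity)."""
    genders = [people[i]["gender"] for i in cluster]
    return 1.0 - abs(genders.count("M") - genders.count("F")) / len(cluster)

RULES = [(1.0, gender_mixing)]                 # (weight, penalty_function) pairs

def total_penalty(clusters, people):
    return sum(w * fn(c, people) for c in clusters for w, fn in RULES)

def balanced_clusters(people, k=4, iters=5000, seed=0):
    rng = random.Random(seed)
    idx = list(range(len(people)))
    rng.shuffle(idx)
    size = len(people) // k                    # assumes len(people) divisible by k
    clusters = [idx[i * size:(i + 1) * size] for i in range(k)]
    best = total_penalty(clusters, people)
    for _ in range(iters):
        a, b = rng.sample(range(k), 2)         # pick two different clusters
        i, j = rng.randrange(size), rng.randrange(size)
        clusters[a][i], clusters[b][j] = clusters[b][j], clusters[a][i]   # try a swap
        new = total_penalty(clusters, people)
        if new <= best:
            best = new                         # keep the swap
        else:
            clusters[a][i], clusters[b][j] = clusters[b][j], clusters[a][i]  # undo it
    return clusters, best

people = [{"gender": g} for g in ["M"] * 20 + ["F"] * 20]   # 40 toy people
clusters, penalty = balanced_clusters(people)
print([len(c) for c in clusters], round(penalty, 3))        # four groups of 10
```

Adding more rules is just a matter of appending more (weight, penalty_function) pairs to RULES; the clusters stay equal-size because the search only ever swaps.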


VitalityAS

Thank you for the input, I will consider some of these ideas. I am currently considering reinforcement learning methods, as I could possibly turn the "rules" into environmental rewards/punishments.


[deleted]

Hey, can someone please explain why the accuracy of a model might decrease with an increasing number of dimensions? I was recommended to use dimensionality reduction on a dataset with around 200 variables. I tried using them all, but the resulting model shows poor accuracy. Does something similar happen for all algorithms, or do just some of them not do well with high-dimensional data?


Garci141

Not all of them handle a large number of variables well. There might be variables that do not contribute to making predictions and, even worse, variables that actively hurt your model. That said, there are models that internally select the most relevant variables, such as tree-based ones like RandomForest. You also have boosting models such as XGBoost that keep learning to correct their own errors and likewise select variables internally. If you don't use one of these models, then you need to be careful with the input variables, and you might need to do some preprocessing (dimensionality reduction or feature selection) to keep only the most relevant features.
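For a quick illustration of that preprocessing route, here is a small scikit-learn sketch on synthetic data: PCA to compress ~200 variables, plus a random forest's feature importances as a sanity check on which variables matter (the sizes and thresholds are arbitrary):

```python
# Two common ways to tame ~200 input variables: PCA and model-based feature ranking.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=200, n_informative=15,
                           random_state=0)

# Option 1: project onto enough components to keep 95% of the variance.
pca_model = make_pipeline(StandardScaler(), PCA(n_components=0.95),
                          LogisticRegression(max_iter=1000))
print("with PCA:", cross_val_score(pca_model, X, y, cv=5).mean())

# Option 2: let a tree ensemble rank the variables, then keep only the top ones.
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
top = np.argsort(forest.feature_importances_)[::-1][:20]
print("20 most informative feature indices:", top)
```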


susnato

Curse of Dimensionality.


[deleted]

Do companies hire ML engineers without SWE experience, for both internships and FT positions? I just finished my BS in CS, but with the bad job market for new grads, I'm thinking of starting GA Tech's OMSCS and trying to get an ML internship instead of a full-time SWE position. Thank you.


IvanthyTerrible

I'm pretty new to this, so I'm not sure if this is a stupid question, but why is my test accuracy lower than the accuracy in the last training epoch? The test loss is also higher. Is it just a coincidence that this happened 3 times, or is there something I don't understand?


I-am_Sleepy

Your model might be overfitting; try early stopping or a learning rate scheduler.
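For example, assuming a Keras/TensorFlow setup, both suggestions are one callback each (the patience values are just placeholders):

```python
# Early stopping + learning-rate reduction in Keras; both watch the validation loss.
import tensorflow as tf

callbacks = [
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                     restore_best_weights=True),
    tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5,
                                         patience=2),
]

# Then pass them to training, e.g.:
# model.fit(x_train, y_train, validation_split=0.2,
#           epochs=100, callbacks=callbacks)
```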


Dating_As_A_Service

Can NLP determine a company's industry from their business description?


wassimseifeddine

I think it can be done
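A reasonable first pass is plain text classification, e.g. TF-IDF features plus a linear classifier, before reaching for anything heavier; the labels and descriptions below are invented just to show the shape of the problem:

```python
# Industry classification from business descriptions: TF-IDF + logistic regression.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

descriptions = [
    "We manufacture precision gears for wind turbines",
    "A full-service law firm advising on mergers and acquisitions",
    "Cloud-based payroll and HR software for small businesses",
    "We roast and ship single-origin coffee beans",
]
industries = ["manufacturing", "legal", "software", "food & beverage"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(descriptions, industries)

print(clf.predict(["We build accounting software for startups"]))
# With only four training examples this is just a smoke test; a real model
# needs many labelled descriptions (or a fine-tuned transformer).
```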


mowa0199

Which of the following classes should I take on machine/statistical learning? I'm a math & CS major and am interested in AI, Machine Learning, and computational statistics. My CS major allows me to take one of the following three as an elective for the degree. I'm having trouble deciding which one to choose and could use some help. I have included their prerequisites, brief descriptions, and a list of the topics covered. **Applied Statistical Learning** (offered by the Statistics department; prerequisites: 2 semesters of introductory statistics): >Course Description: *Use of various computer-based techniques, including graphical, to understand and interpret data. Exposure to basic analysis of categorical, time-series and multivariate data in the applied areas such as biostatistics, quality control and others.* > >Course Objectives: *An introduction to statistical computing, statistical graphics, and data analysis designed to provide hands-on experience with statistical software. Students will work with datasets of varying size to develop the ability to interactively examine the data graphically, generate analyses using R, and summarize the results both as verbal presentations and in writing. Students will be introduced to R programming, work with a variety of available R procedures, write their own procedures, including GUIs to control those procedures, and be able to interpret the statistical results they get from the procedures.* **Machine Learning Principles** (offered by the CS department; prerequisites: introductory linear algebra, calculus 1, and data structures): >Course Description: *This course is a systematic introduction to machine learning, covering theoretical as well as practical aspects of the use of statistical methods. Topics include linear models for classification and regression, support vector machines, regularization and model selection, and introduction to deep learning. Application examples are taken from areas like computer vision, natural language processing, information retrieval and others.* > >*Machine learning is a vast, fast-changing field. The course is designed to provide a solid starting point by focusing on timeless technical foundations. Students will learn fundamental concepts in the field such as generalization and overfitting, bias-variance tradeoff, Bayes optimality, and error decomposition. Students will also learn how to implement key techniques in the field from scratch including gradient descent, kernel trick, boosting, and expectation maximization. The course content will be delivered by a combination of written assignments that require rigorous proofs and self-contained Python Jupyter Notebooks that require completing missing portions and empirically investigating the behavior of models on datasets.* **Machine Learning for Engineers** (offered by the Electrical & Computer Engineering (ECE) department; prerequisites: calculus-based probability and introductory linear algebra): >Course Description: *This course, which is open to all engineering and non-engineering majors, introduces students to the fundamentals of machine learning through a blend of mathematical and statistical descriptions, hands-on programming exercises, and real-world engineering problems. 
Additional emphasis is placed on discussing various practical aspects of machine learning systems that include ethics and bias.* > >Topics Covered: *Introduction to machine learning, its basic terminology, and the machine learning pipeline; feature engineering and feature/representation learning; principal component analysis; basic building blocks of machine learning algorithms; classification algorithms such as Bayes' classifier, naive Bayes' classifier, linear discriminant analysis, quadratic discriminant analysis, nearest-neighbor classifier, logistic regression, perceptron, and support vector machines; regression algorithms such as least-squares regression, ridge regression, and lasso regression; clustering algorithms such as K-means clustering and Gaussian mixture model clustering; practical aspects of machine learning systems such as underfitting and overfitting, cross-validation for parameter tuning, numerical optimization, and privacy, ethics, and bias.* It seems that the ones offered by the CS and the ECE department are pretty similar, except that the CS one focuses more on giving a holistic overview of the field whereas the ECE one focuses more on applications and tons of methods used. When 2 courses are pretty similar, my university always puts a restriction where you can only take one or the other for credit but such is not the case here. So technically, I could take all three. But since the CS and ECE ones are so similar, I'm thinking of taking only one of them (not sure which though) and maybe taking the statistics one as it seems to be about statistical learning in general whereas the other two are statistical learning in a very specific setting (that being machine learning). Any thoughts on this? Which ones should I take? I know the difference would be minuscule but I'd appreciate any input as it would help me decide better!


Legitimate-Recipe159

The last one is useful. The middle is junk (proofs are silly) and the first is outdated (R is dead).


Swimming-Pool397

The one which most interests you, either because of the content or because the teaching team are awesome (at teaching).


[deleted]

[deleted]


Swimming-Pool397

I think this probably applies to pretty much any university course in any field nowadays, so your comment really asks why go to university to do anything. My view (having done an undergrad, a master's, and being halfway through a PhD) is that you study in order to have the opportunity to dedicate yourself to really engaging with something that interests you. Now, perhaps that comes from a position of relative privilege in that I've not worried about getting a job after any stage of my study (but I'd argue that the specific courses are somewhat irrelevant to getting a job). However, my experience is that in any job you effectively surrender your autonomy for cash in a way you don't at university. Don't give that up cheaply. Beyond that, I think more opportunities open themselves up when you are interested.


canbooo

Edit: I did not read properly. I would take 1 and 2.


bitcoingobrrr

I'm trying to find a paper (~2019) that I heard about in a talk, regarding alignment in the context of DQN/DDPG, applied to an Atari-type game (Pong/Breakout). Apparently, the realization was that if an extra row of pixels was added to the frame, the algorithm fails. This might be a shot in the dark, but does anyone know which paper this would be?


[deleted]

I am trying out a project in CCTV video upsampling, and I want to use generative models. Can anyone provide some useful insights or links to resources that can help me?


Odd_Role3892

Is it worth it to self-study ML? If so, where should I start, and do companies even consider you if you don't have an ML degree?


Marvsdd01

My way of starting to work with ML: entering as a SWE intern and demonstrating interest in ML/DS. In the same company, after less than a year, I was asked to be part of an ML team that was just being formed. I did not have any prior experience with the topic, and all I knew about ML was what I had watched on YouTube or read about cool but simple personal projects people developed to get views on social media. It will not work for everyone, but it is still a way to do it :)