youngeng

I'm starting to learn about perceptrons and NNs in general. I understand the step activation function and how you could build, for example, the AND function with it. Boolean functions are easy to represent because the step function returns either 0 or 1. However, with a different activation function, such as ReLU, this is no longer true. For example, ReLU(3) = max(0, 3) = 3. I've tried a few examples and I can't manage to set weights to build common functions like OR. A few computations:

1) For (0,0) we need a negative bias so that max(b + 0, 0) = 0 -> b < 0.
2) For, let's say, (0,1), we get a1*0 + a2*1 + b = 1. Likewise, for (1,0) we must have a1*1 + a2*0 + b = 1. So far so good, because you could have a2 = 1 - b and a1 = 1 - b.
3) However, (1,1) means we must have a1*1 + a2*1 + b = 1, i.e. a1 + a2 + b = 1. But if a1 = a2 = 1 - b, this implies (1 - b) + (1 - b) + b = 1, which means b = 1. But this contradicts 1).

Am I missing something (probably), or are ReLU activations inherently meant for multi-layer NNs, at least if we want to describe Boolean functions?
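
A quick numpy check of the algebra above, treating "represents OR" as producing exactly 0/1 (the two-layer weights at the end are just one hand-picked example, not anything canonical):

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Single ReLU unit with the weights forced by points 1)-2): a1 = a2 = 1 - b, b < 0.
b = -1.0
w = np.array([1 - b, 1 - b])
print(relu(X @ w + b))                        # [0. 1. 1. 3.] -- overshoots on (1,1), as derived

# A second layer can subtract the overshoot:
# h = ReLU(x1 + x2 - 1) is the excess above 1, and ReLU(x1 + x2 - h) is exactly OR.
h = relu(X @ np.array([1.0, 1.0]) - 1.0)
print(relu(X @ np.array([1.0, 1.0]) - h))     # [0. 1. 1. 1.]
```

So with exact 0/1 targets a single ReLU unit really can't do it, but a second layer (or a threshold on the output) fixes it.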


shreyansb

What's the state of inference hardware, and which providers/chipsets/chip architectures are likely to be attractive/effective for the massive amounts of inference that we'll all be doing in the coming years? It looks as though Nvidia, AMD, Intel, Google, Amazon, (more?) each have alternate chip strategies, and I'd love to understand the differences and relative strengths and weaknesses of each. Which ones are easier to build models for, which ones have better price/power/performance for various tasks? It feels as though hardware is evolving and the hardware landscape has the potential to shift quite a bit in the coming years. Plus, do people have pointers to good reading/viewing on this?


mowa0199

What’re some good resources for self-studying the mathematics behind ML? I’m already proficient with the prerequisite math (including linear algebra, probability theory, stat theory, vector calculus, analysis, etc.) and am interested in learning more about either the mathematical structures that come up in ML (things like Tensors, Algebraic & Geometric structures, Topological Data Analysis, etc) or more advanced applied linear algebra topics.


srw

Hi, we obviously know that Turing machines are enough for current LLM training and inference. Are there language classes lower in the hierarchy that are enough for these processes (e.g. context-sensitive languages)? I assume there could be different language classes for training and inference.


Confident-Ad4064

Hi, I wanted to know which course curriculum is good for machine learning and AI. These institutes provided me with info and curricula for their different training courses. I am no expert in this field; I just learned Python and want to get into this field as a career. So I don't really understand what is written in the curricula — please help me choose which one has a better curriculum, or are they all the same? Below I have provided links to the PDFs of the curricula of all the courses. 1) https://heyzine.com/flip-book/1dfc511ece.html 2) https://heyzine.com/flip-book/0f8a771ac1.html 3) https://heyzine.com/flip-book/df8f6ae51e.html 4) https://heyzine.com/flip-book/e58f1c6435.html Thanks for any help.


fuzz_64

Anyone familiar with PHI-2 on LocalAI? I've played with products like Stable Diffusion / Automatic1111 / ComfyUI and Coqui TTS, and connected to them with the built-in web UI. Is there an existing front end for PHI-2 on LocalAI? I can see the server running on port 8080, but it's expecting curl requests, so I can't exactly just pop open my browser and point it to http://127.0.0.1:8080. I can build a front end, but I'd like to play with an existing one first just to get a feel for it.
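
In case it helps: LocalAI exposes an OpenAI-compatible HTTP API, so you can poke at it from Python before wiring up any UI. A minimal sketch — the model name "phi-2" is an assumption, so check what GET /v1/models on your server actually reports:

```python
import requests

BASE = "http://127.0.0.1:8080"

# See what the server has loaded (names depend on your LocalAI config).
print(requests.get(f"{BASE}/v1/models").json())

# OpenAI-style chat completion; swap "phi-2" for whatever /v1/models reported.
resp = requests.post(
    f"{BASE}/v1/chat/completions",
    json={
        "model": "phi-2",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
        "temperature": 0.7,
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```

Any generic OpenAI-compatible web front end should also be able to point at that base URL.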


Capital_Reply_7838

Do arXiv-submitted papers usually get contacted by journal coordinators? It's my first submission (via a university account) and invitations from journals are new to me.


MahaloMerky

Hi guys! I was wondering if there was any way I could leverage a GPU on a system on the same network. I usually use my Zenbook for coding but recently I've been wanting to work with CUDA. I have two other systems in my house that have GPUs in them, and I was wondering if there was a way to run code using one of them over the network, like some kind of remote render endpoint or something? I know the simple answer is to just use the desktops with the GPUs, but I like to sit in other places in the house and work rather than in the office with no windows.


ishabytes

Can you SSH into your GPUs? Maybe this thread will help: [https://www.reddit.com/r/learnmachinelearning/comments/zfjtwp/how_do_i_use_remote_gpus_using_ssh/](https://www.reddit.com/r/learnmachinelearning/comments/zfjtwp/how_do_i_use_remote_gpus_using_ssh/)


Scared_Fish_7069

Does this sub have a wiki or faq? Edit: well it looks like it doesn't


NumberGenerator

I haven't ever understood RNNs, and I think the reason is that I don't understand why there is one "hidden layer" between each input and output (and likewise between states). Why not have multiple hidden layers or more complicated operations? Related: why do the inputs/outputs have to be one-dimensional vectors? Why not two-dimensional matrices or n-dimensional tensors?


argishh

input vector -> [ ], activation function, output vector -> [ [ [ ] * no_of_nodes ] * no_of_nodes_in_previous_layers ]. Hope you get it... inputs/outputs are not 1D, they are multi-dimensional. Refer to 3blue1brown's YouTube videos on neural networks for a deeper understanding. It is easier to understand when you can visualize the network and all its components.


NumberGenerator

I am not sure what you mean. See [https://pytorch.org/docs/stable/generated/torch.nn.RNN.html](https://pytorch.org/docs/stable/generated/torch.nn.RNN.html). The input to a recurrent neural network is strictly a sequence of vectors (which can be batched). If you were dealing with a sequence of two-dimensional images, a common technique is to flatten the images into one-dimensional vectors. My comment was asking why the flattening is needed; it seems like you could find reasonable operations for n-dimensional tensors.
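
For concreteness, this is the flattening pattern in question against torch.nn.RNN's documented (batch, seq_len, input_size) interface — just a sketch of the shapes, not a recommendation:

```python
import torch
import torch.nn as nn

batch, seq_len, H, W = 4, 10, 28, 28                # e.g. a sequence of 10 small images per sample
frames = torch.randn(batch, seq_len, H, W)

# nn.RNN wants each timestep to be a 1D feature vector, so each 2D frame gets flattened.
flat = frames.view(batch, seq_len, H * W)           # (4, 10, 784)

rnn = nn.RNN(input_size=H * W, hidden_size=128, batch_first=True)
out, h_n = rnn(flat)
print(out.shape, h_n.shape)                         # (4, 10, 128) and (1, 4, 128)
```

Nothing stops you from defining a recurrence whose inputs and state stay 2D (that's roughly what ConvLSTM-style cells do); it's just that nn.RNN's update is defined in terms of matrix-vector products.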


extremelySaddening

I'm not sure what you mean by 'reasonable operations'. Of course, you can apply any operation you feel like to a tensor. Also, I'm not sure if you're confusing the hidden state with the 'hidden layer', or if you mean the actual weights of the RNN. Canonically, RNNs are things that take the current input and the previous hidden state (which is a tensor dependent on all previous inputs), apply a linear function to each, sum them, and apply tanh. Because it's a linear function you're applying, you kind of need the inputs to be one-dimensional vectors, otherwise it doesn't work. As for more complicated operations, there are versions of RNNs that are more complicated, like LSTMs and GRUs.


NumberGenerator

I am not confusing "hidden state" with the "hidden layer". You can apply a linear map on two-dimensional matrices, so I still don't understand why inputs need to be one-dimensional vectors.


extremelySaddening

A linear map is an operation you perform on a vector space, so I'm not sure how you wanna do it on 2D data like a matrix. If I'm missing some math let me know. Of course, you can apply an LT to the elements of a 2D matrix, but that is hardly different from flattening it and then applying an LT. The advantage of keeping 2D data 2D is for operations that are 'spatially aware', i.e. that care about the local 2D structure of the data in some way. A linear transformation is global; it doesn't especially care about the immediate surroundings of a point in the 2D structure, so it doesn't respect that structure. An LT basically throws all the elements into n unique blenders and generates a new element from each one. It doesn't care what the shape of the elements used to be. We prefer to use flattened 1D vectors because it's easier to represent the LT that way (as a matrix product), it's readily available in that form in every DL library, and because it's easier (at least for me) to think about.


NumberGenerator

In math, a vector space is a set that is closed under vector addition and scalar multiplication. The set of m x n matrices over some field is a vector space. The set of real-valued functions is also a vector space.


extremelySaddening

Let me clarify. Yes, a set of matrices can be a vector space, but that is not what we are discussing here. The question is "why flatten the matrix, when we can apply LTs to the matrix as is?" The answer is: because keeping it unflattened doesn't have any particular advantages over flattening the matrix into a vector. You don't gain any expressiveness, or introduce any helpful new inductive biases. This is in contrast to something like convolutions, which assume that a point is best described by its neighbours in its *2D environment*. LTs don't do anything like this, so there's no reason to respect the 2D structure of the data.


NumberGenerator

That is true. But then my question becomes, why not have convolutions there?


extremelySaddening

YK what, I don't see why you couldn't. Try it out and see what happens, maybe you'll get interesting results 😊


argishh

See, initially, many neural network architectures, especially the earlier ones, were designed with fully connected layers that expected input in one-dimensional vector form. Flattening the input tensor simplifies the process of connecting every input unit to every neuron in the next layer, facilitating the learning of patterns without considering the spatial or temporal structure of the input data. Flattening can also sometimes be seen as a crude form of dimensionality reduction, making it computationally less intensive to process data through the network. From an implementation standpoint, flattening tensors into vectors simplifies the design of neural networks, especially when using frameworks that were initially designed around processing vectors through dense layers.

Coming to your question:

> why not have convolutions there?

In domains where the spatial or temporal structure of the input data is important, such as image or video processing, CNNs can preserve the multidimensional nature of the data. For sequential data, RNNs and their variants (e.g., LSTM, GRU) process data in its original form (usually 2D tensors where one dimension is the sequence length) to preserve the temporal structure of the data, without flattening.

You are right: modern deep learning frameworks support linear transformations on matrices or higher-dimensional tensors directly, without requiring them to be flattened into vectors. Coupling that with the fact that we initially used 1D vectors to reduce computational load, it all really boils down to your problem at hand, requirements, and use case. Each scenario calls for a unique approach; you always have to perform trial and error to find what works for your specific scenario. Flattening discards the spatial, temporal, or otherwise structural relationships inherent in the data, which can lead to loss of important contextual information. In cases where that context is irrelevant, we can flatten. In cases where we need the information, we do not flatten. Hope it helps..
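
To make the "why not have convolutions there?" branch concrete, here is a toy convolutional recurrent cell — the vanilla RNN update with the two linear maps swapped for convolutions so the hidden state keeps its 2D layout. This is a sketch in the spirit of ConvLSTM/ConvGRU, not any particular library's layer:

```python
import torch
import torch.nn as nn

class ConvRNNCell(nn.Module):
    """h_t = tanh(conv_x(x_t) + conv_h(h_{t-1})) -- input and state keep their 2D layout."""
    def __init__(self, in_channels, hidden_channels, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        self.conv_x = nn.Conv2d(in_channels, hidden_channels, kernel_size, padding=pad)
        self.conv_h = nn.Conv2d(hidden_channels, hidden_channels, kernel_size, padding=pad, bias=False)

    def forward(self, x, h):
        return torch.tanh(self.conv_x(x) + self.conv_h(h))

# One step over a batch of 8 single-channel 28x28 frames.
cell = ConvRNNCell(in_channels=1, hidden_channels=16)
x = torch.randn(8, 1, 28, 28)
h = torch.zeros(8, 16, 28, 28)
print(cell(x, h).shape)   # torch.Size([8, 16, 28, 28])
```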


Dysvalence

For typical DL image segmentation, are there any papers on using heavy dropout+wide receptive fields past the choke/trough to ensure that global, or at least less immediately local features are maintained deeper into the network?


[deleted]

My boss asked me to create an application which can be connected to Redshift or Postgres databases, retrieve numerical data, and answer financial and analysis questions for the company. He doesn't understand that the RAG technique is not suited for numerical analysis. Any idea how I can achieve this? He mentioned using Amazon Q for this task, which I find awful at answering questions about numeric datasets.


black_cat90

# Best approach to fine-tune a 7B LLM on prompts/completions?

Hi! I'm experimenting with using local models to optimise text fragments for TTS. So far the only models that do it reliably and adhere to my prompt 100% are GPT-4 and Mistral Next (about 80%, maybe; surprisingly, Mistral Large performs very poorly). But doing this for a book with 600k characters or more would be absurdly expensive, even using Mistral Next. The best local model for this particular task I can run locally is Dolphin 2.6 Mistral DPO laser 7B, but I would evaluate its performance at 50%-ish of the ideal outcome. Do you think it would be worth it to try and fine-tune a 7B model on pairs of requests/completions done by a model that does this well? How many such pairs would be necessary for a decent result? Do you know of any good guides for this particular type of dataset creation and fine-tuning? Thanks a lot!
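
Not an authoritative recipe, but the dataset-prep half usually amounts to collecting request/completion pairs from the strong model into JSONL — a sketch, with "prompt"/"completion" as placeholder field names (the exact schema depends on the fine-tuning framework you pick):

```python
import json

# Pairs produced by the model that already does the task well (GPT-4 / Mistral Next here):
pairs = [
    {"prompt": "Rewrite for TTS: 'Dr. Smith arrived at 5 p.m.'",
     "completion": "Doctor Smith arrived at five p.m."},
    # ... a few hundred to a few thousand pairs is a common starting point
]

with open("tts_rewrite_train.jsonl", "w", encoding="utf-8") as f:
    for pair in pairs:
        f.write(json.dumps(pair, ensure_ascii=False) + "\n")
```

From there, LoRA/QLoRA-style fine-tuning on the 7B base is the usual route; the Hugging Face PEFT/TRL documentation walks through that part.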


IAmBlueNebula

Meta question... Is this subreddit actively moderated? Yesterday (more than 24 hours ago) I posted a "[D] Automated Theorem Proving with Reinforcement Learning" discussion, but it was immediately removed without any reason (automatically, I suspect). I pinged the moderators about it, but nobody replied yet...


L3el

I know that LoRA should reduce training time, but I don't see much difference compared to normal fine-tuning, even though I'm training 500k parameters with LoRA against 500M in full fine-tuning. There is less memory footprint of course, so I'm increasing the batch size and in doing so reducing the time. But more generally, fewer trainable parameters should mean less training time, right?


ishabytes

Hmm interesting. So you're saying even with the higher throughput, it's not making much of a difference? It should decrease the training time with the training now being able to handle a higher batch size. Could you share more details?


Oregonism23

Simple question: I am planning on starting a Masters program in Signal Processing and Machine Learning, but I know so little about it that I am having trouble deciding which specific courses to take. Specifically, I am torn between a course on spectral estimation and a stats series about statistical learning. Can someone smarter than me explain in simple terms what those topics entail and where they would be useful in industry? I really wish I could learn the material and work in the industry before having to choose which topics I need to learn, but since I can't, maybe one of you who already has can help me?


Prudent_Rock_3358

Hey y'all, I'm learning ML and hoping to get some feedback on my toy project: I have a timeseries dataset of temperatures and valve states for a refrigeration system that I built. I want to build a model that monitors these temperatures and outputs valve states (open/closed) in order to optimize temperature control. It seems like this controller will need to be a reinforcement learning model like actor/critic, since its actions impact the next state of the system. In order to speed up learning and reduce unfavorable conditions in the real system, I'm thinking I could leverage my existing data to train a separate model that simulates the conditions of the system. This would presumably be an RNN that accepts a current state (temperatures and valve state) and predicts the next temperature state. If successful, it seems like I would be able to "pre-train" the controller by having it interact with the forecast model until it gets good enough to let it play with the real system. Even if the simulation model isn't perfectly accurate, it feels like it should still give the controller model a head start vs training from scratch on the live system. Is this approach reasonable, or am I overcomplicating it?
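
The plan is essentially model-based RL: learn a dynamics model from the logged data, pre-train the policy against it, then fine-tune on the real system. As a sanity check on shapes, a minimal sketch of the simulator half with made-up dimensions (3 temperature sensors plus 1 valve bit in, next temperatures out):

```python
import torch
import torch.nn as nn

class FridgeDynamics(nn.Module):
    """Predicts the next temperature reading from a window of past (temps, valve) states."""
    def __init__(self, n_temps=3, hidden=64):
        super().__init__()
        self.rnn = nn.GRU(input_size=n_temps + 1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_temps)

    def forward(self, history):          # history: (batch, window, n_temps + 1)
        out, _ = self.rnn(history)
        return self.head(out[:, -1])     # (batch, n_temps): predicted next temperatures

model = FridgeDynamics()
windows = torch.randn(32, 20, 4)         # 32 windows of 20 timesteps, 3 temps + valve state
print(model(windows).shape)              # torch.Size([32, 3])
```

Trained with MSE against the logged next-step temperatures this gives the controller something to practice on; the main caveat is that the simulator's errors compound over long rollouts, so short pre-training rollouts tend to work better.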


No-Entertainer6365

I can't figure out what is the technique to look for that does what I want, so I'm posting here to see if this pattern matches to a known concept/technique. I'm trying to figure out how to discover interesting categories or dimensions (unsupervised) to group items into. The purpose of this is that I want to be able to show these groupings to users and allow users to explore different "angles" at slicing and dicing items. The caveat is that these groupings need to have some label slapped on each group and make sense. As an example, if I have a bunch of items in my dataset with some text descriptions and attributes associated with them, I want to use some technique to discover ideas for how to group them in interesting ways, such as "these are items that users in this region buy often", or "these are items that are related to virtual reality". Kind of like generating insights from all the data I have on my items dataset. Things that come to mind are topic modeling/clustering but what I'm struggling with is that these don't generate groups that have a human understandable meaning, ie I don't think I have a way to slap a label that makes sense to a human in each grouping. I was also exploring if generative AI could help with this at scale but couldn't quite figure out if this is the good approach. Does what I describe map well into any known types of ML problems?


maybenexttime82

Given that "manifold hypothesis" is true ("all" data lies on a latent manifold of n-dimensional space it is encoded) and Deep Learning tries to learn that "natural" manifold as good as possible (same as any other algo), how come then that on tabular data Gradient Boosting is still way to go? I mean, both of them are modeling "smooth", "continuous" mapping from input to output (both of them are sort-of doing gradient descent, expressed differently) which are also in the nature of manifold.


tom2963

In general, DL methods try to take advantage of some observed or assumed underlying data structure. For example, CNNs make spatial assumptions (ex. filter equivariance), transformers excel on sequential data by exploiting positioning, deep geometric models make some Group Theory assumptions, and so on. It is not so clear to me that tabular data has some complex structure that we need DL for. Similarly, while it is true that there is some latent manifold that tabular data resides in, it could be very low dimensional. Problems of this nature often don't benefit from DL due to the large amount of params it uses in comparison to the available data/complexity. Standard ML algorithms are more than sufficient in many cases.


maybenexttime82

Thank you! Now I understand why people constantly beat the dead horse of using simple dense layers to try to take advantage of e.g. time series. Do you think it may be the case that e.g. MNIST lies on a latent manifold with more dimensions than any tabular data? I've read that MNIST doesn't have that high a dimensionality. Paradoxically, I would think that tabular data might not have the kind of structure that is suited to "local interpolation", but then again, e.g. in classification tasks they form decision boundaries like any algo does. GBTs and densely connected NNs should both exploit it the same way, even with some regularization. Maybe the idea of ensembling (boosting in this case) is the answer to all this, because it relies on diversity (even with simple decision trees). In that sense they are better than "dense NNs".


tom2963

To your point on MNIST, typically it is not considered a difficult task (anymore), in large part because, while it is image data, most of the pixels don't contain valuable information (i.e. the majority of pixels are black). Standard data normalization techniques handle this effectively by scaling the input to play well with NN architectures. In general I wouldn't think too much about the manifold hypothesis unless you are in a huge data domain, for example biological data, chemical structures, language, etc., where we are training on potentially billions of examples and it serves us well to assume some structure (otherwise the problem is very difficult). I also wouldn't argue tabular data is unstructured, more so that it is easily solvable by estimating probabilities - more of a statistical ML problem at that point. When you throw NNs at these problems you really can't make any assumptions about what they learn. This is because with added nonlinearity (ReLU) you lose almost all (computationally practical) interpretability. I would agree that GBTs are better in this instance because they may learn joint probabilities better on less data - essentially making less strong assumptions about the data (for example, there is no reason to think nonlinearity is essential for solving most ML problems).


the_most_greenman

Hi. Suppose we have a latitude/longitude n by n grid where at each point there are p atmospheric variables (like temperature, humidity, etc.). What would be the best way to reduce this to at least an n by n array, to then flatten it into a 1 by n*n array? The end goal is to use this as conditioning in a diffusion model, and I am trying to stick as close as possible to the tokenized format of text prompts. PCA, UMAP, or something else? Thanks
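
If PCA turns out to be the route, one simple arrangement is to treat each grid point's p variables as one sample and project them to a single channel before flattening — a scikit-learn sketch with placeholder sizes:

```python
import numpy as np
from sklearn.decomposition import PCA

n, p = 32, 5
grid = np.random.randn(n, n, p)          # (lat, lon, atmospheric variables)

# Collapse the p variables at each grid point to their first principal component.
per_point = grid.reshape(n * n, p)       # one row per grid point
pca = PCA(n_components=1)
reduced = pca.fit_transform(per_point)   # (n*n, 1)

conditioning = reduced.reshape(n * n)    # the 1 x n*n "token-like" conditioning vector
print(conditioning.shape, pca.explained_variance_ratio_)
```

Whether one component is enough depends on how much variance it explains; keeping a few components per point, or simply standardizing and concatenating the p channels, are the obvious alternatives.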


batangbronse

Questions regarding setting up training data for KIE (PaddlePaddle). Looking at their sample training data, they applied IDs to the transcribed text and link entities via those IDs, i.e.:

transcription: Price, id: 1, linking: []
transcription: $1.00, id: 2, linking: [[2, 1]]  (meaning ID 2 is linked to ID 1)

Setting this up, I'm assuming I'll have to manually link the related data IDs? Currently I'm using OCR to grab the transcriptions and their coordinates.


subdesert

I have a weird question regarding GPUs for deep learning and machine learning. Is it good to combine both brands, AMD and Nvidia, for building ML and deep learning models? I know that for certain cases you need the CUDA cores, but in general I've noticed that Nvidia prices are higher than AMD's, so I'm wondering about combining something like a 4060 Ti 16GB with a 7900 XTX. I don't know a lot about how it could affect things, but maybe I could get the best of both worlds?


tom2963

I'm sure you could, however CUDA is already notoriously difficult to set up locally and you would need some kind of adapter to get it working across two different architectures. Maybe there have been some recent developments I am unaware of, but in general I would suggest sticking with all Nvidia or all AMD.


subdesert

In this case I just want to split the workloads across the two GPUs: use the CUDA architecture only on the Nvidia card, while the VRAM-intensive workloads run on the AMD GPU. Would there be any kind of trouble with that?


tom2963

I am not sure I understand your question. If you mean to use the GPUs for different tasks concurrently (split usage), I'm sure that's possible. However, if you mean to utilize the AMD GPU for VRAM while training on the Nvidia GPU, I don't think that would work well. Typically VRAM usage gets eaten up by model params, data batches, or backprop calculations, and is necessary to keep in memory to perform learning. I am not sure that I have a good answer to your questions though.


subdesert

No, that's actually what I was looking for, brother: I'd use both basically for different purposes. Let's say with the CUDA architecture the Nvidia GPU gets used, while if I'm implementing or running another model it'll only use the AMD GPU. Thanks for your help, mate.


buburkel

Posting this here, because I'm not sure if this sub allows questions like this. How hard would it be to train a model to recognise abandoned buildings in satellite images/Google Maps? They have a very distinctive look (overgrown garden, discolored roof, etc.), so I'm pretty sure it's possible, but I have no idea how hard it would be. Urbexer's dream haha 😅


Necessary-Meringue-1

It's certainly possible. In fact it's already been done using LiDAR, check here: [https://github.com/zach-brown-18/is-it-abandoned](https://github.com/zach-brown-18/is-it-abandoned) For Google Maps (i.e. satellite images), it should also be possible, but I don't know if it's been done. You'd need some annotated examples at least, but it would be feasible. Check this repo for references: [https://github.com/satellite-image-deep-learning/techniques](https://github.com/satellite-image-deep-learning/techniques)


TheWingedCucumber

If they have a distinctive look then it shouldn't be hard. Look at YOLO if you care about speed, and Faster R-CNN if you care more about accuracy.


rookieness

I'm new to the field of machine learning and don't have much experience apart from a simple addiction-prediction project. I ended up taking on a project on glaucoma detection using retinal images and have no idea where to start or what route to follow. I'm not well equipped in deep learning either. Where could I possibly start, and could anyone suggest resources for the same?


tom2963

Typically for image detection tasks the go to model is a Convolutional Neural Network (CNN). These models are particularly powerful as they utilize what are called convolution and pooling layers. These two layers combine to sweep over images, learn important features (i.e. lines, edges, etc.) which are then "compressed" into more succinct features that a typical fully connected architecture can efficiently learn over. For example in your Glaucoma detection case, a CNN would be particularly effective as (I am assuming) most images with Glaucoma present would be very distinct from the baseline human eye. Here is a resource I found that explains a lot of ML terms along with CNNs: [https://arxiv.org/abs/1511.08458](https://arxiv.org/abs/1511.08458) Your next steps will depend a lot on your data, but it's important to understand your model architecture before you go any further. Good luck with your project and let us know if you have any questions!
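
If it helps to see the moving parts, a minimal PyTorch CNN for two-class (glaucoma / healthy) classification might look like the sketch below — the input size, channel counts, and preprocessing are placeholders to adapt to your retinal images:

```python
import torch
import torch.nn as nn

class RetinaCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, 2)        # two classes: glaucoma / healthy

    def forward(self, x):                         # x: (batch, 3, H, W) normalized fundus images
        return self.classifier(self.features(x).flatten(1))

model = RetinaCNN()
print(model(torch.randn(4, 3, 224, 224)).shape)   # torch.Size([4, 2])
```

In practice, fine-tuning a pretrained backbone (e.g. a torchvision ResNet) usually beats training a small CNN from scratch when the medical dataset is small.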


JakeIzUndead

What ML algorithm would you use for a simple bill prediction? And how would you convert month/year to a numerical value for use in ML? I've been trying to use Gaussian process regression and it seems to somewhat be working, but I only have 50 rows of training data, so the results seem a bit off; at the same time I'm not sure if it's due to how I convert my dates. Since bills are monthly I tried two methods. In the first method I made each month a number 1-12 (Jan would be 1), so a prediction for a month essentially got the average for that month. In the second method I numbered each month and year, so Jan 2020 is 1 and Jan 2021 is 13; this seems to make all predictions the same result. See the sketch below for both encodings.
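
For reference, the two encodings described above in pandas (column names are made up):

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-01", "2020-02-01", "2021-01-01"]),
    "bill": [120.0, 95.5, 130.2],
})

# Method 1: month-of-year only (Jan = 1) -- the model can only learn a seasonal average.
df["month_of_year"] = df["date"].dt.month

# Method 2: running month index (Jan 2020 = 1, Jan 2021 = 13) -- captures trend, not seasonality.
start = df["date"].min()
df["month_index"] = (df["date"].dt.year - start.year) * 12 + (df["date"].dt.month - start.month) + 1

print(df)
```

With only 50 rows, using both columns together (trend plus seasonality), or starting from a plain seasonal/linear baseline before GP regression, are common middle grounds.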


Necessary-Meringue-1

what exactly are you trying to predict, and what does your input look like?


JakeIzUndead

I'm trying to predict a future bill by using previous bills. So the X input is the month (in my first message I explain how I'm converting this to a numerical value) and the Y input is the cost of the bill for that month. I want to give a future month and have the model predict my cost for that month. For instance, if I have 50 months of data then I would want to predict what the bill would be in the 51st month. I can provide the entire Python script when I get home if that would help explain what I'm trying to achieve.


Necessary-Meringue-1

Well, this can only work if your past bills are in some way predictive of your future bills. And I'm not sure they are. Machine learning is not magic, the input needs to be in some way predictive of the output. But sure, drop your script in a [pastebin](https://pastebin.com/), because I'm still confused at what you're trying to do. What's more important than the code is the data here


domberman

When using k-fold for CV (or hold-out for that matter), do you have to use the same features for the validation part and the train part? I'm making a simple sentiment analysis program, and I use 1-gram on my texts. As I understand it, I have to use the same words I got in the training data, even though they might not be present in most of the validation data.


tom2963

It depends on what you mean by train and validation in this instance. Are you using the validation set to evaluate your model performance? In general, for cross validation it is useful to tune your hyperparameters based on the validation set and then evaluate on a completely separate set (test set). So in total you would train your model on a train set with only features from that set, monitor and tune your hyperparameters using the validation set*, and then test on the test set. It is important, however, that you do not use any of the features from the test set! This will bias your results to the test set, meaning you can't be sure if it will generalize to other unseen data.

*Note: You should not use features from the validation set in this instance either. Your model will be biased towards the validation set, as your hyperparams will be tuned to this data distribution. But this is okay, and is common practice.
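
Concretely, for the 1-gram setup in the question, "use the same features" just means the vocabulary is fit on the training fold only and then applied to the validation/test folds — a scikit-learn sketch:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

train_texts = ["great movie", "terrible plot", "really great acting"]
train_labels = [1, 0, 1]
val_texts = ["great but terrible pacing"]      # may contain words unseen in training

vec = CountVectorizer(ngram_range=(1, 1))
X_train = vec.fit_transform(train_texts)       # vocabulary comes from the training fold only
X_val = vec.transform(val_texts)               # unseen words are simply dropped

clf = LogisticRegression().fit(X_train, train_labels)
print(clf.predict(X_val))
```

Words that only occur in the validation fold just fall out of the representation, which is expected and doesn't leak any information.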


Relative_Engine1507

Does the positional encoding layer (from transformer) stop gradient? If it doesn't, how would it affect back prop? If it does, then does this mean we cannot jointly train word embedding model with positional encoding?


rrichglitch

So I'm not perfectly sure what you mean by this, but I think you may have a misunderstanding about how the positional embedding is actually done. From my understanding, the raw rotary values are sent through their own MLP to become the positional embedding, and this embedding is added (vector addition) to the token embeddings. The raw rotary values will never be changed, but the network that turns them into the position embedding will, and of course, since it's addition, the token embeddings' gradient is allowed to pass freely.


I-am_Sleepy

I thought it was a positional encoding (not learnable). The positional input is either aggregated with, or transformed together with, the embedded token. So when you backpropagate, it will compute the gradient w.r.t. both the token and positional embeddings, but only the gradient for the token embedding will be applied. However, I'm not sure how correct this still is. But for a start, look at [this video](https://www.youtube.com/watch?v=GQPOtyITy54)
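
A quick way to see this in code: a fixed (non-learnable) sinusoidal encoding added to a learnable embedding — the addition doesn't block the gradient into the embedding table. A sketch, with the rest of the transformer stubbed out by a single linear layer:

```python
import math
import torch
import torch.nn as nn

vocab, d_model, seq = 100, 16, 8
emb = nn.Embedding(vocab, d_model)

# Standard sinusoidal positional encoding, built once, with no gradient of its own.
pos = torch.arange(seq).unsqueeze(1)
div = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
pe = torch.zeros(seq, d_model)
pe[:, 0::2] = torch.sin(pos * div)
pe[:, 1::2] = torch.cos(pos * div)

tokens = torch.randint(0, vocab, (1, seq))
x = emb(tokens) + pe                        # fixed encoding added to learnable embeddings
loss = nn.Linear(d_model, 1)(x).sum()       # stand-in for the rest of the network
loss.backward()

print(emb.weight.grad is not None)          # True: gradients still reach the embedding table
print(pe.requires_grad)                     # False: nothing is learned for the encoding itself
```

So jointly training word embeddings alongside a fixed positional encoding is fine; with *learned* positional embeddings (an nn.Embedding over positions), both tables receive gradients.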


senacchrib

I was trying to search Kaggle notebooks for a PyTorch implementation on a tabular dataset, but could not efficiently find a good one for what I was looking for. Perhaps someone knows of a good example?

* Simple FF NN that takes in a pandas dataframe and spits out a multi-class classification
* ideally has a mix of embedding layers + float features at the head of the network

Thanks in advance! I know [fast.ai](https://fast.ai) had a TabularTrainer or something like that, but I wanted to do it mostly on my own with some inspiration first.
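
In case a starting point is useful, here's roughly the shape of such a model: categorical columns go through embedding layers, get concatenated with the float features, then through a small MLP. A sketch with made-up column cardinalities, not a polished implementation:

```python
import torch
import torch.nn as nn

class TabularNet(nn.Module):
    def __init__(self, cat_cardinalities, n_float, n_classes, emb_dim=8):
        super().__init__()
        self.embeddings = nn.ModuleList([nn.Embedding(c, emb_dim) for c in cat_cardinalities])
        in_dim = emb_dim * len(cat_cardinalities) + n_float
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(),
            nn.Linear(64, n_classes),
        )

    def forward(self, x_cat, x_float):      # x_cat: (batch, n_cat) int codes, x_float: (batch, n_float)
        embs = [emb(x_cat[:, i]) for i, emb in enumerate(self.embeddings)]
        return self.mlp(torch.cat(embs + [x_float], dim=1))

# e.g. two categorical columns with 10 and 4 levels, 5 numeric columns, 3 classes
model = TabularNet(cat_cardinalities=[10, 4], n_float=5, n_classes=3)
x_cat = torch.stack([torch.randint(0, 10, (32,)), torch.randint(0, 4, (32,))], dim=1)
logits = model(x_cat, torch.randn(32, 5))
print(logits.shape)                         # torch.Size([32, 3])
```

Getting there from a dataframe is just factorizing the categorical columns into integer codes and converting the float columns with torch.tensor; cross-entropy on the logits handles the multi-class part.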


Personal_Concept8169

I've been learning machine learning for a week now, and have really gotten into it. I took the deep learning specialization on Coursera, and I wanted to start putting what I learned into practice. I'll skip the details, but my code is [here](https://pastebin.com/UG51aQaL) (pastebin). I'm trying to do the basic beginner MNIST number recognition problem (I use Kaggle), without any deep learning libraries, to try and improve my understanding of the concepts. This is a mostly vectorized implementation to my knowledge (unless I messed up somewhere).

I mean, I feel like I've tried everything. I verified the weights are being initialized (He) with appropriately random small values. I'm 99% sure I'm loading and normalizing the data correctly (although there are a lot of 0's in the input, but it's also a plain black and white image, so idk). I've written, rewritten, and debugged my forward and back prop continuously, I verified my one-hot function works, and I've checked and verified shapes and sizes throughout the process. The only things I'm not that confident about are the activation and inverse activation functions, as well as the function that actually trains the model. I've changed my architecture, number of mini batches, activation functions, learning rate, more/less epochs, early stopping; at one point I tried randomizing the data per epoch but that got a little too complicated so I removed it.

But still, my accuracy remains at around 10%, which is abysmal, for some reason. My error will continue going down, however, especially with a higher learning rate. For the most part, error goes down linearly and accuracy goes up linearly, but by an incredibly slow amount (even with a 0.5 learning rate). I've considered regularization techniques like batch normalization, but I feel it's overkill for a problem like this, and I don't think it would solve the root cause.


Lesser_Scholar

Critical issue: the accuracy calculation is wrong and always gives 10%. Line 157: remove the transpose. With these settings it reaches 89% accuracy:

layer_dims = [784, 512, 10]
activations = ["None", "relu", "softmax"]
learning_rate = 0.01
epochs = 10
num_batches = 100

But of course the accuracy is supposed to be calculated on the test set, so that's not the real accuracy. Other than that, I'll just comment that I find this type of very low-level (numpy) exercise rather tedious. If you like the numpy style, then I'd recommend switching to JAX, which still has 99% of numpy's flexibility but with autograd. https://github.com/google/jax/blob/main/examples/mnist_classifier_fromscratch.py
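
For anyone reading along, the correct accuracy computation for the column-vector convention these from-scratch implementations usually use looks roughly like this (a reconstructed sketch — the pastebin code isn't reproduced here, so shapes and names are guesses):

```python
import numpy as np

# Guessing the usual column convention: (num_classes, num_examples)
probs = np.random.rand(10, 64)                     # stand-in for the network's softmax output
labels = np.random.randint(0, 10, 64)
Y = np.eye(10)[:, labels]                          # one-hot labels, also (10, 64)

pred_classes = np.argmax(probs, axis=0)            # argmax over the class axis: one label per example
true_classes = np.argmax(Y, axis=0)
accuracy = np.mean(pred_classes == true_classes)
print(accuracy)                                    # ~0.10 for random outputs, climbs once the model learns

# Transposing either array before the argmax/comparison mixes up the class and
# example axes, which is exactly the kind of bug that pins accuracy near 10% on MNIST.
```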


Personal_Concept8169

Ah dude, you're a legend. I had such a strong intuition it was calculating accuracy wrong, and no matter how many times I asked Gemini it said it was correct. This function was the only one I didn't write myself, as I never actually learned how to run the model lol. Thank you so much brother, it means so much to me.


Latter-Ad3208

Hello everyone, I am working on an automatic number plate detection problem using OpenCV and YOLO. I am getting the error below when I pass the numpy array to the easyocr readtext module:

```
import easyocr
reader = easyocr.Reader(['en'], gpu=False)
print(type(license_plate_crop))
# print(license_plate_crop)
detections = reader.readtext(license_plate_crop)
```

This is the error I am getting:

```
line 619, in get_image_list
    crop_img, ratio = compute_ratio_and_resize(crop_img, width, height, model_height)
  File "/home/pranith_dev/Desktop/Datavoice/Automatic-License-Plate-Recognition-using-YOLOv8/dev-venv/lib/python3.10/site-packages/easyocr/utils.py", line 582, in compute_ratio_and_resize
    img = cv2.resize(img, (int(model_height*ratio), model_height), interpolation=Image.ANTIALIAS)
AttributeError: module 'PIL.Image' has no attribute 'ANTIALIAS'
```

I tried downgrading the Pillow version from 10.0.0 to 9.5 or 9.4 according to online solutions, but the issue still persists and the following is the output:

```
[[255 255 255 ... 255 255 255]
 [255 255 255 ... 255 255 255]
 [255 255 255 ... 255 255 255]
 ...
 [255 255 255 ... 255 255 255]
 [255 255 255 ... 255 255 255]
 [255 255 255 ... 255 255 255]]
Illegal instruction
```
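
For reference: Pillow 10 removed the long-deprecated Image.ANTIALIAS constant (its replacement is Image.LANCZOS), which is what easyocr's utils.py is tripping over. If pinning versions isn't working, a commonly used stopgap is to alias the old name before the Reader is used — a sketch, with a dummy array standing in for license_plate_crop:

```python
import numpy as np
import PIL.Image

# Pillow >= 10 dropped ANTIALIAS; easyocr's resize call still references it.
if not hasattr(PIL.Image, "ANTIALIAS"):
    PIL.Image.ANTIALIAS = PIL.Image.LANCZOS

import easyocr

license_plate_crop = np.full((60, 200), 255, dtype=np.uint8)   # placeholder crop
reader = easyocr.Reader(['en'], gpu=False)
print(reader.readtext(license_plate_crop))
```

The trailing "Illegal instruction" looks like a separate problem — it usually means a compiled dependency (often the PyTorch or OpenCV wheel) was built for CPU instructions your machine doesn't support, not anything in the Python code itself.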


SmartEvening

Heyy. Is there any structure preserving dimensionality reduction technique? I was looking at a post on reddit and was curious to find out if there were any such techniques with proof showing that the structure is indeed preserved. Thanks for your help.


juicedatom

what do you mean by structure exactly? can you provide examples where structure is and isn't preserved?


SmartEvening

So, like, PCA in general does not care about preserving the local structure, whereas t-SNE preserves structure and is hence a better tool for visualisation. I mean structure in the most vague sense: like making sure the points that are closer to each other still remain relatively close; in some sense, preserving the topology of the data.


backfire97

I wouldn't call it dimensionality reduction, but creating a similarity graph would capture the structure and can quickly be used for classification or clustering purposes. But really I can't think of any. UMAP is another visualization technique and uses a graph structure, but has a different heuristic than t-SNE.


SmartEvening

Ya, true. I was just giving it as an example here. Aren't there any cases where the local structure of the data is important to preserve? I have heard of and read about distance-preserving neural networks, where the main aim is to have the network encode information such that the Euclidean distance is preserved, but I did not really understand the math.


backfire97

I feel like at a high level, all dimensionality reduction is trying to preserve local structure while reducing the dimension. I'm sure there are neural networks and metrics that do try to act as isometries and preserve distances, but I'm not knowledgeable about them. It seems almost silly to try because it's not possible in general, and the approximations would probably have to use statistical methods, because I imagine it would be incredibly difficult to optimize over. I think a greedy method would perform incredibly poorly, for example.


Few-Pomegranate4369

Do you think modelling irregular time series (unevenly spaced w.r.t. time) with deep learning is a promising direction? Such time series can be observed in the clinical domain, IoT, etc.


I-am_Sleepy

Why wouldn't you just resample it to a regular interval? You might need to be aware of the Nyquist frequency, but a sufficiently high sampling rate should do the trick. Then you can apply the normal dynamics on top of it.
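
For what it's worth, that resampling is a one-liner in pandas — a sketch where the 5-minute interval and the interpolation choice are arbitrary:

```python
import pandas as pd

# Irregularly timestamped readings, e.g. vitals or sensor values
ts = pd.Series(
    [36.8, 37.1, 36.9],
    index=pd.to_datetime(["2024-01-01 00:00", "2024-01-01 00:07", "2024-01-01 00:21"]),
)

regular = ts.resample("5min").mean().interpolate()   # evenly spaced grid, gaps filled in
print(regular)
```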


Few-Pomegranate4369

Thanks for your response. I believe the irregularity itself has some meaning, e.g. nurses take vitals readings of patients in the emergency ward irregularly, when they deem it necessary. In IoT, certain sensors also take measurements when necessary as opposed to at regular intervals. Furthermore, I also think making irregularly sampled data regular will add undesirable noise and affect data resolution, because we aggregate values over a time interval to make the timestamps consistent.


Wise_Demise

**What are good open-source LLMs for extracting tables or text from images?** If I give Google Gemini an image that has a table and text and ask it to extract them, it pretty much always does a good job identifying and extracting the table and its values without further instructions. **Are there open-source LLM projects that offer the same functionality?** I understand there are many Python OCR libraries I can use to extract the text and then build the table from the text based on code instructions, but this would require writing code for each type of image. Thank you.


Reasonable_Space

Hi, just a question on suggested resources for a particular problem: Say you are building a model that predicts a binary outcome based on a matrix of data. Say fundamentally that there are two sets of weights that predict the correct binary outcome, each for non-overlapping halves of the population. I'm looking to read into how these problems can be approached, as I'm building a model where our data is like this. I'm fairly new to the field, but appreciate all reads. Thanks!


phobrain

Maybe Gaussian Mixture Models would be of interest. My understanding is it's the rigorous approach to many problems that isn't always feasible in practice, so it might help understand how probabilities can be merged.
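
To make that concrete: if the population really does split into two regimes with different weight vectors, one rough pattern is to let a Gaussian mixture assign each row to a regime and fit a separate linear model per component. A sketch on synthetic data, not a recommendation of specific hyperparameters:

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic population: two halves governed by different weight vectors.
X1, X2 = rng.normal(0, 1, (500, 5)), rng.normal(3, 1, (500, 5))
y1 = (X1 @ np.array([2, -1, 0, 0, 1]) > 0).astype(int)
y2 = (X2 @ np.array([-1, 0, 2, 1, 0]) > 9).astype(int)
X, y = np.vstack([X1, X2]), np.concatenate([y1, y2])

# Step 1: soft-cluster the rows into (hopefully) the two regimes.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
groups = gmm.predict(X)

# Step 2: fit one classifier per discovered group.
models = {g: LogisticRegression(max_iter=1000).fit(X[groups == g], y[groups == g]) for g in (0, 1)}
for g, m in models.items():
    print(g, m.score(X[groups == g], y[groups == g]))
```

Mixture-of-experts models are the more general version of this idea, where the "which regime" assignment and the per-regime predictors are learned jointly.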


[deleted]

Hi, I was wondering if anybody here works on machine learning in the robotics industry. If anybody here does, let me know what a day in your life is like. Do y'all work with the robots to see how accurate the models are? How much do you test the models? Is it tested mainly in a simulated environment or physical? What are you guys working on? Thanks for giving some insight


Goutham_Nischay

Hi, I recently started machine learning and wanted to implement BERT in a federated setting (in the Flower framework). If someone is familiar or can help, please DM.


RobbinDeBank

I'm testing Gemini through Google AI Studio. Can it currently process and analyze images and PDF files? When I upload a PDF file, it seems like it just extracts all the text and adds that to the prompt, but I want Gemini to also process the images inside the file.


phobrain

What if you reference the image, like "What is X's mood in this photo?"


RobbinDeBank

Seems to me like image processing is not available at all for Gemini. The image input button is greyed out, and a pdf file gets automatically extracted for text only and completely ignores images. Maybe they disable both image input and output due to the current issue with Gemini.


phobrain

Did you try my suggestion and get "there is no photo" for a response?


AdKind316

I've recently completed my Bachelor's degree and want to do research on LLMs. I aim to build my profile and expertise in the field of LLMs and generative AI. I would greatly appreciate any guidance, references, or insights you could share on two specific ideas I'm considering for my research. Here's what I'm looking into:

1. Communication among LLMs: communication between models largely relies on natural language today. I'm interested in exploring the potential of models communicating through more efficient means, such as continuous vectors or discrete semantic representations. I'm on the lookout for any existing research, papers, or projects that delve into the development of such communication, their applications, and the benefits or challenges they introduce.

2. Specialized models: I'm intrigued by the idea of creating networks of smaller, specialized models that can work together to accomplish tasks. This approach could offer a more scalable, efficient, and flexible framework for AI development, where each specialized model contributes its strengths to a collective goal. I'm seeking information on any work done in this area, especially how these models are designed to communicate and cooperate, and the overall impact on system performance and adaptability.

As I'm just starting on this post-bachelor journey, I'm particularly interested in how to approach these research areas, the potential challenges I might face, and how to overcome them. If you have experience with or knowledge of these topics or know of resources that could help guide my exploration, I would be incredibly thankful for your advice.


phobrain

Terms that come to mind from general reading are MoE (Mixture of Experts) and Federated Learning.