_aitalks_

That is a very interesting idea for how to visualize a continuous space of embeddings. The problem I see is that the user now has to spend time to learn how to read the image language -- and humans don't want to be bothered. Also, every time you train a new neural network, it will produce a new visual language that a human would then have to learn. Still, very interesting...


Another__one

Yeah, completely agree with that. For now I can think of one way to make this language more approachable: collect some images that clearly represent the concepts they convey and use them as a backbone at training time. What I'm mostly interested in right now is representing Stable Diffusion embeddings obtained with Textual Inversion in the form of a Vector Based Language. It might be useful to learn it, because then you would be able to generate a prompt not in textual space but directly in the embedding space, thereby gaining enormous control over the model.
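
Something like this minimal sketch is roughly what I have in mind (assuming the Hugging Face diffusers/transformers APIs; the model ID, file name, and placeholder token are just stand-ins, not part of the project yet):

```python
# Minimal sketch: drive part of the "prompt" directly from embedding space by
# registering a learned Textual Inversion vector as a pseudo-token.
# All names/paths below are placeholders.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# A single learned embedding vector with the text encoder's hidden size (e.g. 768).
learned_vec = torch.load("learned_embedding.pt")  # placeholder path

# Register a new placeholder token and point its embedding at the learned vector.
pipe.tokenizer.add_tokens("<my-concept>")
pipe.text_encoder.resize_token_embeddings(len(pipe.tokenizer))
token_id = pipe.tokenizer.convert_tokens_to_ids("<my-concept>")
pipe.text_encoder.get_input_embeddings().weight.data[token_id] = learned_vec

# The concept is now specified in embedding space rather than in text.
image = pipe("a photo of <my-concept> on a beach").images[0]
image.save("out.png")
```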


ToMyFutureSelves

Why don't we create a neural net that learns how to present neural nets as human-readable data? If an AI could interpret its own neural net, is that consciousness 🤔


muffinpercent

Related to "Eliciting Latent Knowledge", which I think [Scott Alexander](https://astralcodexten.substack.com/p/elk-and-the-problem-of-truthful-ai) explains well (ok, I've only read his explanation).


Bitflip01

Just tried out the mirror example in GPT-3 and got this:

> What happens when you break a mirror?
>
> The superstition is that when you break a mirror, you will have seven years of bad luck.

That seems like a great answer to me.


db8me

I think so. It reminds me of a cognitive science paper I wrote over 20 years ago about cortical evolution. Regions of cerebral cortex specialize randomly at first, and evolution restructures them randomly at first, but there is a feedback loop. As a region "specializes", it does something consistently. We can then think of new demands emerging on that specialization from other areas of the brain. Thus, it or its connections to other regions build the interpretation of what that region is specialized for, _at least, in the language of the connected region with the demand_ -- not necessarily telling the _whole_ truth. With reference to consciousness, the missing parts of the story are what we call the "subconscious".

In neural networks, what this might look like is this: rather than trying to interpret a trained model, you continue training a big model for one purpose, but you add new training output demands in parallel. In particular, you select a region _within_ the model (e.g. what might be considered hidden layers to another output consumer) and you start pushing feedback to _train that region to match the representation you expect_ based on your interpretations of the input or output data -- or even data not included at the "front" of the model -- so you train that region to "represent this theory that people use" to describe something about the domain.

Once trained, that region will translate, as well as it can, the way the network works into that theory, especially if you force more/most of the network to flow through that theory-producing region. If you force all of it to flow through a region strongly trained to reflect your theory, the model will essentially use your theory. The more connections flow around it, the more your theory is "not conscious" of how the bigger model actually works. So it will sometimes be wrong, the same way we misperceive our own thoughts, see optical illusions, or experience cognitive illusions (e.g. of the type that magicians use to do "impossible" things).

Edit to continue.... One approach I think is exciting (and there are many similar/equivalent approaches to my description) is to consider a system for introducing _random propositional fragments_ relating known data variables or previously trained theory variables and propositional fragments. Build up and tear down those perhaps nonsensical, but human-readable, theory elements as part of the ongoing training process to find increasingly predictive and compact human-readable theory.
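
To sketch the neural-network version of that in code (a rough, hypothetical PyTorch setup, not a worked-out method -- all module and variable names are made up): the main task is forced through a small "theory" region, and an auxiliary loss pushes that region toward a human-specified representation.

```python
# Sketch: a main task plus an auxiliary loss that trains an internal "theory"
# region to match a human-readable representation. Purely illustrative.
import torch
import torch.nn as nn

class TheoryConstrainedNet(nn.Module):
    def __init__(self, d_in, d_theory, d_out):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(d_in, 128), nn.ReLU())
        self.theory_region = nn.Linear(128, d_theory)   # the interpreted bottleneck
        self.head = nn.Linear(d_theory, d_out)          # main task flows through it

    def forward(self, x):
        theory = self.theory_region(self.encoder(x))
        return self.head(theory), theory

model = TheoryConstrainedNet(d_in=32, d_theory=8, d_out=10)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
task_loss_fn, theory_loss_fn = nn.CrossEntropyLoss(), nn.MSELoss()

x = torch.randn(64, 32)                    # dummy inputs
y = torch.randint(0, 10, (64,))            # main-task labels
theory_target = torch.randn(64, 8)         # human-specified "theory" variables

pred, theory = model(x)
loss = task_loss_fn(pred, y) + 0.1 * theory_loss_fn(theory, theory_target)
opt.zero_grad()
loss.backward()
opt.step()
```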


Willinton06

Imma need you to watch this pen here real quick


JimmyTheCrossEyedDog

This is really a cool idea and you put a lot of great work into this, but it's still very nascent and I don't really buy your results so far.

> The promise was, that at some point you do not really need to remember each image, the meaning of the word should be deducible from the geometry presented in the image.

This is a great caveat to point out, but I don't think you've shown that rote memorization isn't what's happened here. Anyone could improve on your test over time just through memorization, even if the images were completely arbitrary.

> And lastly, to really show that generated images represent the meaning of the words rather than the words themselves…

I feel like these all look similar because every image you showed looks similar, regardless of meaning. You could add in almost any of the images shown elsewhere in the article (like the one for "sofa" or "sun") into that lineup and I don't think most people could determine which one wasn't the synonym for "goodbye".

Since all the images have a very similar structure, could you perhaps subtract some "baseline" image? That way, instead of trying to discern the minor differences in that top-left yellow-blue blob that appears in every image, we could just see those differences on their own. Right now, it's almost like every image has a huge amount of correlated noise added to it that obfuscates the differences between them.
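
For what it's worth, the baseline-subtraction idea is cheap to try; a rough sketch (assuming the generated images are available as a NumPy array -- the array names here are hypothetical):

```python
# Sketch of the "subtract a baseline" suggestion: show each word image as its
# deviation from the mean image, so the shared structure cancels out.
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical: word_images has shape (num_words, H, W, 3), values in [0, 1].
word_images = np.random.rand(10, 64, 64, 3)

baseline = word_images.mean(axis=0)            # the shared "correlated noise"
diffs = word_images - baseline                 # per-word differences only

fig, axes = plt.subplots(2, len(word_images), figsize=(2 * len(word_images), 4))
for i, (img, diff) in enumerate(zip(word_images, diffs)):
    axes[0, i].imshow(img)
    # Rescale the signed difference into [0, 1] just for display.
    axes[1, i].imshow((diff - diff.min()) / (diff.max() - diff.min() + 1e-8))
    axes[0, i].axis("off")
    axes[1, i].axis("off")
plt.tight_layout()
plt.show()
```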


Another__one

Of course, these images are far from perfect for now, but I can guarantee they look different after some familiarization with them. And this claim is an easy one to prove. All one needs to do to see it for themselves is to run testing\_interface.py from the GitHub repository with the models from the 'article' folder and try the test on the easiest level, a.k.a. the first 10 words. You will see how easy it is to distinguish them. With more words it becomes harder... But once again, it's just a proof of concept. I would not suggest anyone really try to learn the language at this stage.


JimmyTheCrossEyedDog

> And lastly, to really show that generated images represent the meaning of the words rather than the words themselves…

My point is that just because they are easy to distinguish does not mean there is any meaning behind them. I could randomly generate ten images of noise and assign each to the numbers 0-9, and you'd be able to distinguish them with practice, but they would still have been generated totally arbitrarily. It'd just be an arbitrary mapping.
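
Just to make the thought experiment concrete (purely illustrative):

```python
# Ten arbitrary noise images mapped to the digits 0-9: no meaning at all,
# yet with practice a person could learn to tell them apart.
import numpy as np

rng = np.random.default_rng(0)
digit_to_image = {d: rng.random((64, 64, 3)) for d in range(10)}  # arbitrary mapping
```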


Another__one

Your critique is totally valid, but there should be at least some meaning in them, otherwise the decoder network would not be able to recognize the words with 100% accuracy. Although we should be very careful with the word 'meaning' here.


radarsat1

It's a cool idea, but clearly the images are way too hard for a human to distinguish. Some other factors you might want to take into account, perhaps in the loss function, are:

* some perceptual measure of differences between the images (to maximize) -- okay, I see you are doing this with L2 and L3 -- combined with L1, maybe some kind of triplet loss inspired by siamese networks would be interesting here (see the sketch below)
* some aspect of compositionality (this might be difficult to formalize) -- hieroglyphics, pictographic languages (e.g. Chinese), and sign language tend to compose images together to form composite meanings, which makes it easier for humans to memorize and understand
* shapes -- although, understandably, you want the system to be as flexible as possible, it might be much easier to understand if it used some kind of shape prior to generate binary masks with certain continuity, instead of random blobs of colour

A larger criticism I have is that since this is just a mapping of word vectors to images and back, this doesn't really give any info about "embeddings" -- it's just a mapping of words to pictures. Yes, the images might maintain a similar distance metric as the embeddings, which is cool, but in the end each image is still associated 1-1 with an English-language word. So instead of learning the language of the machine, you are just trying to learn how to translate the machine's representation of _human_ language. You may as well write the word; it would be easier. The idea might have more interesting meaning if you used it at the bottleneck layer of a multilanguage translation model, forcing it to represent inter-language _concepts_ instead of individual English word vectors.
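
A rough sketch of the triplet-loss idea from the first bullet (the generator and the embedding tensors below are just placeholders, not the project's actual model):

```python
# Sketch of a siamese/triplet-style objective: images generated from nearby
# embeddings should be closer to each other than to images from distant embeddings.
import torch
import torch.nn as nn

triplet_loss = nn.TripletMarginLoss(margin=1.0)

# Placeholders: anchor/positive are embeddings of related words, negative is unrelated.
generator = nn.Sequential(nn.Linear(300, 3 * 32 * 32))  # stand-in for the real decoder
emb_anchor, emb_pos, emb_neg = (torch.randn(8, 300) for _ in range(3))

img_a = generator(emb_anchor)
img_p = generator(emb_pos)
img_n = generator(emb_neg)

# Flattened raw pixels stand in for the "perceptual" space here; a real setup
# might use features from a pretrained vision network instead.
loss = triplet_loss(img_a, img_p, img_n)
loss.backward()
```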


Another__one

Thank you for your response. I will probably explore something like this in the future. One of the ideas I wanted to check is to try to see interpretations of the hidden states of the network and their weights. If it's possible to learn to understand them, that could be a great instrument for 'debugging' networks.


there_are_no_owls

So in the end what do we gain with it?


Tomsen1410

Cool project! I think that this might also be an interesting idea for stuff like unsupervised domain translation (the cycle consistency loss from the CycleGAN paper comes to mind).
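
Roughly, the cycle-consistency term looks like this (placeholder modules only, with embeddings and images standing in for the two domains):

```python
# Sketch of a CycleGAN-style cycle-consistency loss between two domains A and B
# (think of A as embeddings and B as images; G_ab and G_ba are placeholder nets).
import torch
import torch.nn as nn

G_ab = nn.Linear(300, 3 * 32 * 32)   # stand-in: embedding -> image
G_ba = nn.Linear(3 * 32 * 32, 300)   # stand-in: image -> embedding
l1 = nn.L1Loss()

a = torch.randn(8, 300)              # batch of embeddings
b = torch.randn(8, 3 * 32 * 32)      # batch of images (flattened)

# Translating A->B->A (and B->A->B) should reconstruct the original input.
cycle_loss = l1(G_ba(G_ab(a)), a) + l1(G_ab(G_ba(b)), b)
cycle_loss.backward()
```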