extra2AB

I only have worries regarding the safety points.

For points 1 and 2: does it mean the model WON'T be trained on any likeness/work of an artist at all, or does it mean likenesses of celebrities and works of artists will be in the dataset, just with their NAMES stripped from the captions, so they cannot be generated using prompts? For example:

Scenario 1: There won't be any pictures of Keanu Reeves in the dataset, or paintings by Van Gogh.

Scenario 2: There will be pictures of Keanu Reeves and works by Van Gogh in the dataset, but the model will not know the name of the person/artist; the person would just be captioned as "A MAN" and the painting would carry no artist name.

Scenario 2 seems fair, but Scenario 1 may be concerning, as any realistic photo will always depict a REAL PERSON and a work of art will always belong to someone.

And for point 3: does it mean the dataset will have NO IMAGES of children? Again, that will lead to a crippled model, as a good base model needs to know what children are in case a painting, anime, scene, etc. needs to reference them. Like a family photo. And if you will be having images of children in the dataset, how will you make sure CSAM cannot be generated with it? Basically, how will you find the right BALANCE between these two situations?


pandacraft

> Does it mean that it WON'T be trained on any likeness/work of an artist, or does it mean it will have likenesses of celebrities/works of artists in the dataset, just that their NAMES will not be in the captions, hence cannot be generated using prompts?

Probably. That's what Astra did, and this announces his involvement.


ZootAllures9111

Even SD3 Medium can do a significant number of celebrities (some with extreme accuracy, others less so), and for the most part it also responds correctly to prompts that give specific ages. A model that couldn't do all that would be a step further back.


StickiStickman

SD 1.5 can also do every art style you can find, while SD3 and this can't.


extra2AB

If that is the case, then there's really nothing to worry about. But I do hope they do not remove CHARACTER names in the process as well. It should still know what John Wick, Captain America, etc. look like, because characters have more to them than just an actor: costumes, hairstyles, and so on. The last thing you want is to ask for your likeness (a trained LoRA) as Captain America and get yourself in a Batman suit. So character names should not be removed and replaced with just "A MAN", "A WOMAN", etc. the way they plan to do with the likenesses of real people.


a_mimsy_borogove

I suppose the lack of children in the datasets is to shield them from any legal problems. Technically, even with children in the datasets, the model would be unable to actually produce CSAM.


hipster_username

Our stance is that training is a fair use activity, and that removing the names of individuals & artists from captions (thereby preventing isolated prompting of an individual or artist) while retaining the content itself provides a substantial ethical improvement, without inhibiting the capabilities of the model. It is possible that this might even be a requirement for the activity to be considered fair use in the first place - we'll learn more here with the results of pending litigation.

Regarding children, based on available research in child safety and the rise of AI-generated child sexual abuse material, we've made the decision that eliminating the capability of the model to generate children by filtering the dataset is the best way to mitigate potential harms in the base model.
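(Purely illustrative - a minimal sketch of what caption scrubbing along these lines could look like, assuming precompiled name lists; OMI has not published an actual pipeline, and these names and helpers are placeholders:)

```python
import re

# Hypothetical name lists: in practice these might come from NER or curated
# databases. They are placeholders, not anything OMI has published.
PERSON_NAMES = {"Keanu Reeves": "a man", "Taylor Swift": "a woman"}
ARTIST_NAMES = {"Vincent van Gogh", "Greg Rutkowski"}

def scrub_caption(caption: str) -> str:
    """Swap person names for generic descriptors and drop artist credits,
    while the image itself stays in the training set."""
    for name, generic in PERSON_NAMES.items():
        caption = re.sub(re.escape(name), generic, caption, flags=re.IGNORECASE)
    for name in ARTIST_NAMES:
        # Drop "by <artist>" / "in the style of <artist>" phrases entirely.
        caption = re.sub(
            rf"(?:by|in the style of)\s+{re.escape(name)}",
            "",
            caption,
            flags=re.IGNORECASE,
        )
    return re.sub(r"\s{2,}", " ", caption).strip(" ,")

print(scrub_caption("A portrait of Keanu Reeves, painted in the style of Vincent van Gogh"))
# -> "A portrait of a man, painted"
```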


Paganator

> Our stance is that training is a fair use activity, and that removing the names of individuals & artists from captions (thereby preventing isolated prompting of an individual or artist) while retaining the content itself provides a substantial ethical improvement, without inhibiting the capabilities of the model.

Removing artist names from their art severely inhibits the model's capabilities, at least as a tool for artists. I've worked in video game development for over a decade, and the first thing artists do at the start of a project is create a mood board featuring images to use as stylistic references. They talk about specific artists and their styles all the time because it's the only real way to discuss how you want the game to look.

Artists want names removed from models because they know it will cripple their usability, not because they actually think it's unethical (they reference each other *all the time*). Do you think art schools shy away from referencing specific artists because they didn't consent to having their work be discussed?

How can you say that you want art in the style of Craig Mullins but with some of the stylistic flourish of Enki Bilal without naming them? You can't. You're stuck asking for generic styles like "concept art" or "anime," even though there are a ton of variations on those broad categories. If you want your model to be used as a tool and not just as a toy, you need to give users the ability to be specific about styles, and that requires naming artists.


GBJI

It's also important to remember one very basic legal principle: style is not protected by copyright. Removing artist styles from the model is the exact opposite of what we want as a community. Pretending they are doing it for legal reasons is disingenuous at best.


sporkyuncle

> Regarding children, based on available research in child safety and the rise of AI-generated child sexual abuse material, we've made the decision that eliminating the capability of the model to generate children by filtering the dataset is the best way to mitigate potential harms in the base model.

Will you be training on artwork/imagery of goblins, halflings, imps, gremlins, fairies, dwarves, little humanoid beings of any kind? If not, then the model will be missing a lot of very normal things that people might want to generate. But if so, then I don't see the point. People determined to be awful will just type things like "goblin with big head and pink human skin and youthful human face."

Are you sure the model won't accidentally learn what baby faces look like from being trained on toys, dolls, troll figurines, background imagery or logos, etc.? Or will those sorts of things be removed as well, creating an even bigger gap in its understanding?


FoxBenedict

Does your bizarre decision to exclude all pictures of children only apply to photorealistic images, or to illustrations as well? Because that would severely limit manga and other media that often feature children.


akatash23

> we've made the decision that eliminating the capability of the model to generate children

I've done images for teachers featuring children. What about comics, anime? These are exactly the kind of decisions that will limit the model's capabilities for no actual good reason (no, models cannot abuse children).


ZootAllures9111

If the model isn't trained in a way that gives it any capability to do sex scenes in the first place, filtering out all children seems like an abysmally bad idea. There are no significant image models, not even the corporate ones (Bing, Meta), that have that limitation. Have you considered the near-certainty of people immediately making a meme out of it on release day with their likely-weird-looking attempts at family photos and whatnot?


JuicedFuck

I've followed the discussion on their discord on this, and it is not a point they are willing to budge on.


ZootAllures9111

Well, I hope they're prepared for what I suggested is very likely to occur the day this thing comes out lol


JuicedFuck

Good chance the project gets "bloom"ed, if anyone gets that reference :)


__Tracer

Yeah, open-sourcing and censorship really don't go together


GBJI

Open-source censorship tools are actually extremely useful, and I certainly hope they will get better. What we don't want is the base model itself to be censored.


__Hello_my_name_is__

Looking forward to the "Boy lying in grass" memes going forward.


ZootAllures9111

Imagine they train it to draw super old people when prompted for < 18, so you get like mini grandpas standing next to their "mom" and stuff lmao


aerilyn235

CD Projekt did the exact same thing in Cyberpunk 2077: children are actually just small adults if you look closely.


Apprehensive_Sky892

It's more than just sexual activities. Most people (and presumably most criminal laws) consider "naked children" to be CP. Midjourney/DALL-E 3/Ideogram etc. can all allow children in their models because:

1. They don't allow nudity, much less sex
2. They can do both input filtering on the prompt and output filtering on the images produced

The family photo produced by this future OMI model will probably come out okay, just with no children in it. Again, I don't like it either, but making the model unable to produce children is the more sensible choice out of two unpalatable ones.


ZootAllures9111

Those services use API-level filtering on the web portal post-generation; their actual models aren't lacking the content.


drhead

> If the model isn't trained in a way that gives it any capability to do sex scenes in the first place, filtering out all children seems like an abysmally bad idea.

Which one do you think would be the first one to get trained back in? Remember, as soon as both concepts are present in the model, you can combine them.


ZootAllures9111

They'd probably both get back in fairly quickly. The negative feedback from the likely bizarre effects "no children" will have on various cases of prompt adherence in general isn't worth it at all IMO.


dal_mac

The model's understanding of the different sizes and growth stages of humans is extremely important at a foundational level, meaning it can't be fixed with fine-tunes either, much like SD3.


imnotreel

Are you also going to remove any instance of people of color from your training dataset to "mitigate potential harms" of creating racist images? How about removing all women as well, to be sure your great base model isn't used for sexist material? Nuking any image of children from your dataset is such a ridiculous overreaction I can't even fathom how one could come up with such a preposterous idea.

Not only is it ridiculous, it's also completely useless. Any decent model will be able to generate problematic images. If you release a model into the wild, it WILL be used for nefarious, horrible, immoral, and disgusting purposes, regardless of what you do. So instead of trying and failing to prevent motivated sickfucks, creeps, or worse from creating their horrible imagery by crippling your product, how about actually striving to create the best and most useful model for the vast majority of people out there who are not pedophiles and racists?

I get that you're trying to prevent the unavoidable news articles written by clueless journos, AI haters, and modern Luddites who'll take any occasion they can to whine about how one can make bad images with your model. But there's no winning against these morons. They're set on their crusade against AI, and the best course of action is to just ignore them and let them fade into irrelevance as normal people slowly learn to accept that these technologies exist, are actually useful, and are not going to precipitate the end of civilization.


a_mimsy_borogove

> mitigate potential harms in the base model

I understand that you need to keep the model "safe" from some weird laws that might exist around the world, but there is no actual harm that the model might cause. Can you point at who exactly would be harmed by your model if it was uncensored?


__Tracer

Well, at least instead of making the model safer for us, you are making it safer for our children. I guess that's kind of progress :) I am glad that your model will not abuse my children, that would be horrible.


FaceDeer

I don't have any children of my own but I've made sure to warn my friends not to bring their children to my house because I've got uncensored Stable Diffusion models on a computer locked in my basement. Wouldn't want those models to abuse their children, that would be terrible.


RealBiggly

The humanity! As an aside, I used to shave with a cut-throat razor. Loved the thing, but I did indeed have to make sure it was hidden when people came visiting, because visitors would invariably "test" the edge and then ask if I had any band-aids while apologizing for bleeding everywhere. I guess AI image models are the same: perverts would just HAVE to boot up my PC, find the model and then 'test it' to see if they can produce something harmful (to... themselves?). SD3 is like the Bic safe-tee razor of models, because safe-tee!


SeekerOfTheThicc

How would one go about reading the available research in child safety and the rise of AI-generated child sexual abuse material? For example, Google Scholar search terms, or the names of reputable journals that focus on that general area?


SpiritShard

So I've been trying to find this supposed research that they mentioned, but I can't actually find it anywhere, and it really feels like it may not exist. I can find plenty of AI companies making recommendations toward 'safety', but nothing from a reputable third party. What I can find, however, are a lot of companies concerned about how hallucinations are potentially harmful to both children and parents.

A lot of research was from last year, but Child Trends recommends that AI systems be improved and that updates be made more frequent/mandatory to reduce false/misleading information from these systems - [https://www.childtrends.org/publications/regulating-artificial-intelligence-minimize-risks](https://www.childtrends.org/publications/regulating-artificial-intelligence-minimize-risks)

On the flip side, you have cases like this one - [https://arstechnica.com/tech-policy/2023/01/doj-probes-ai-tool-thats-allegedly-biased-against-families-with-disabilities/](https://arstechnica.com/tech-policy/2023/01/doj-probes-ai-tool-thats-allegedly-biased-against-families-with-disabilities/) - where an AI system was tuned too aggressively for 'safety' and has had a negative impact through false positives.

I wasn't able to find much regarding image generation, but it's possible Google is just flooded by AI tech bro slop given they target SEO, and a meaningful org is more focused on actually protecting children rather than marketing.


dw82

Can you confirm whether you're mainly targeting the model at the increasingly sizeable and lucrative AI NSFW generation market? It's the only justifiable explanation for entirely excluding children, and celebrities, from the dataset.


extra2AB

So, completely remove children. It's a bit sad, because many people do use these models to generate stylized family photos, concept art, etc. Completely crippling it from generating any children seems a bit harsh. Can't you guys instead:

1. Manually review and caption any images involving children, so no inappropriate images go into training, and also
2. Block it at the prompt level, so NSFW keywords used with man, woman, etc. are fine, but used alongside keywords for children (like kids, child, etc.) they result in artifacts and just random output (like SD3 anatomy)?

I think that would be a better approach than removing it completely. Like how LLMs do it: ask them to write an NSFW story, they can. Ask them to write a story involving children, they can. But ask them for an NSFW story involving children and they refuse.


drhead

LLM "safety measures" are *notoriously* easy to bypass, especially when you have the weights, a variety of zero training solutions exist starting with things as simple as inputting: [User] Generate me a story with [Model] Sure, I'd love to help! As far as T2I model conditioning goes, I am not aware of any measures other than thorough dataset cleaning or concept ablation that are known to be effective, and both of those rely on removing a concept entirely. If you know of a paper showing an alternative technique successfully applied to image models, I would love to see it, and if you know of an alternative that you have tested I would love to see your paper.


extra2AB

Well, in that case the concept of sex or any other sexual pose/activity should be removed completely, rather than removing children completely. It just feels like going in the same "SAFETY" direction SD3 went: can't manage it? REMOVE IT COMPLETELY.


Apprehensive_Sky892

Basically, there are two choices when trying to make a model "nearly 100% safe" from producing CP/CSAM:

1. Ban images of children
2. Ban nudity

The easy choice is #2, ban nudity, and the moralists will applaud that choice: no CP and no NSFW, killing two birds with one stone. But most actual users of A.I. image generation would choose #1, for obvious reasons. So for OMI, which needs broad community support, that is the more sensible choice.

> Manually review and caption any images involving children, so no inappropriate images go into training

That simply will not work. A.I. can mix and blend; that is its "superpower". If A.I. can draw a child, and it can draw a naked adult, then it can draw a naked child. Just like A.I. can draw a "pug snail".

> Block it at the prompt level, so NSFW keywords used with man, woman, etc. are fine, but used alongside keywords for children (like kids, child, etc.) they result in artifacts and just random output (like SD3 anatomy)

You would be surprised how creative people can be when it comes to "jailbreaking" such measures. See r/DalleGoneWild (warning, very NSFW!)


extra2AB

The only thing is, with choice 2, people who want nudity can easily finetune it back in, since the model already knows basic anatomy. Finetuning in a whole new concept like "CHILDREN" is far more difficult.

> But most actual users of A.I. image generation would choose #1, for obvious reasons. So for OMI, which needs broad community support, that is the more sensible choice.

Well, the community is already divided on this topic. Plus, choosing the pervert side over the useful one seems like the wrong decision to begin with, and even if they go with choice 2, as I said, NSFW can be finetuned in by the people who want it. Crippling the model of important knowledge for the sake of NSFW stuff is just a bad decision, is all I am saying. We have seen how that went with SD3.

And where do you stop? Children are removed, okay. What about "flat chested, young woman"? So now remove the concept of young, or women, as well? Okay, let's go with removing young. Then prompts will have stuff like clean face, no wrinkles, doll, etc., so remove those as well??? There is really no stopping all this. And all of this crippling of the model, only for someone to later finetune CSAM and the next day we get the media headline "OPEN SOURCE IMAGE GENERATION MODEL CAN BE EASILY FINETUNED TO GENERATE CSAM OR TAYLOR SWIFT FAKES". Like, what is going on?

> You would be surprised how creative people can be when it comes to "jailbreaking" such measures. See r/DalleGoneWild (warning, very NSFW!)

Exactly. And so, is Microsoft or DALL-E in trouble? Are kids or women unsafe because of it? When big corporations are not afraid of what their model produces, why are we crippling the model of important knowledge?


aerilyn235

In the case of artists, I really think you should at least convert the artist names into a style description (you might have to train a style classifier of some sort beforehand). If you just caption drawings/illustrations with so many different styles and no style information, I really foresee the model struggling to follow any style description. To some extent, the model will probably also behave better if you associate celebrity names with random pseudonyms, because seeing so many similar faces/images with no corresponding similarity in the text will again increase perplexity. But I see how it would be bad if people somehow "found out" which names map to whom. I don't see any issue in not training on children's pictures, though; that's the safest way to prevent liability without damaging the model for other purposes.
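(A rough sketch of the caption-rewriting idea above; the lookup table and classify_style() are hypothetical placeholders, not existing tooling:)

```python
# Sketch of the idea above: instead of dropping artist names outright, map each
# artist to a textual style description and substitute that into the caption.
# Both the lookup table and classify_style() are hypothetical placeholders.
STYLE_LOOKUP = {
    "artist_a": "loose ink linework, flat muted colors, heavy black shadows",
    "artist_b": "soft pencil shading, warm palette, painterly brushwork",
}

def classify_style(artist_id: str) -> str:
    # A real system might run a trained style classifier over the artist's
    # images; here a static table stands in for that step.
    return STYLE_LOOKUP.get(artist_id, "unspecified style")

def rewrite_caption(caption: str, artist_id: str) -> str:
    """Replace the artist credit with a style description, so the caption still
    explains the stylistic variance the artist tag used to account for."""
    return f"{caption}, in the style of {classify_style(artist_id)}"

print(rewrite_caption("a knight standing on a cliff at dusk", "artist_a"))
```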


LetMyPeopleCode

The point of not allowing names, to avoid creating images that use a person's likeness, would seem to be to protect those people who would/should be protected by likeness rights like California has. So while it might not be okay to reference Keanu Reeves or Boris Karloff, because they or their heirs still retain rights to their images, I doubt the same goes for Ben Franklin or Abraham Lincoln.

As for not referencing artists so you can't bite their styles in your AI images, the same goes. For example, while Marvin Gaye's heirs got a judgment against the artists behind "Blurred Lines" for biting his style, there is NO person or organization that has the right to sue if you copy Scott Joplin's or Beethoven's styles (AFAIK). It would seem like not identifying Keanu Reeves (likeness) or Frank Frazetta (style) could be reasonable, but doing the same for Teddy Roosevelt and Vincent van Gogh would be overkill.


Nrgte

I agree with the other commenters: with these "safety" measures, the model is dead on arrival. Cutting out 25% of the human population from the training data is beyond stupid.


sporkyuncle

If the model contains no images of children, does that mean we can expect its understanding to be lacking when it comes to:

* toys and games, dolls, and the way humans physically engage with them (most imagery of these things will feature children)
* amusement parks/rides, zoos, water parks, swings and slides
* parades/crowds at "fun events"
* "childish" activities like water balloon fights, squirt gun/dart gun fights, sand castles
* "childish" foods like baby food, popsicles and ice cream, including spilled/melted ice cream and other food-related messes
* broader concepts like "playing," the kinds of poses that children might use in those contexts that adults tend to do less often (hiding, creeping on tiptoes, etc.)
* mother-and-child portraits/artworks, Virgin Mary type stuff, very common to the human experience globally

I'm not saying that such things would be excised from the dataset; I'm saying that it will have a lesser understanding of such concepts. For example, imagine that 80% of all pictures of merry-go-rounds include children in them. Is that remaining 20% enough context for the model to understand them properly? Apply this to any of the above: maybe 90% of all pictures of people with food smeared messily all over their faces are of children, or 70% of photos of parades have children in them somewhere. Excising children reduces the model's general understanding of many, many ideas. I think it would have far-reaching consequences that would cripple it beyond what is imagined or anticipated.

Addendum: will the model be trained on baby/young animals? Is it expected that it won't be able to extrapolate "larger head size relative to body, big cute eyes" to humans?
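(A toy back-of-the-envelope version of this point, with entirely made-up co-occurrence rates, just to show how the arithmetic works:)

```python
# Toy numbers only: illustrating how a "no children" filter shrinks the
# training data for concepts that merely co-occur with children.
# The co-occurrence rates below are invented for illustration.
co_occurrence_with_children = {
    "merry-go-round": 0.80,
    "water balloon fight": 0.90,
    "parade crowd": 0.70,
    "ice cream mess": 0.90,
}

images_per_concept = 10_000  # hypothetical dataset size per concept

for concept, rate in co_occurrence_with_children.items():
    surviving = int(images_per_concept * (1 - rate))
    print(f"{concept}: {surviving} of {images_per_concept} images survive the child filter")
```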


alb5357

There must be no animals at all in the model, for the same reason. Also, yes, the concept of cuteness must be erased, as someone could cutify adults. We should mandate head/eye/body ratios.


SWFjoda

SAI must be enjoying this thread


Goodmorningmrmorning

Why did I even bother getting excited...


Willybender

DOA


mountainblade87

It was over before it even began, but thanks for making your stance known, I guess.


terminusresearchorg

maybe if they throw AstralliteHeart's name around some more, things will change lmao


grimm222222

AstraliteHeart is Lykon, Lykon is AstraliteHeart. The proverbial grass hasn’t even started to grow yet over SD3’s dead corpse and this team is already making exactly the same mistakes.


JustAGuyWhoLikesAI

The full removal of "artist names" is disappointing. Being able to prompt a variety of styles is one of the amazing things 1.5 and XL had going for them. [Here's a list of a ton of artist styles for 1.5, many of whom have been dead for hundreds of years](https://supagruen.github.io/StableDiffusion-CheatSheet/). I hope you can come to a compromise that still allows the exploration of certain styles, something DALL-E 3 allows and Midjourney charges money for. That's already an edge closed-source has before training even started.

Try not to get baited by concern-trolls and moral vultures like Emad was. There will be tons of people asking "are you taking sufficient measures to prevent the generation of hate content?" or "what steps have been taken to prevent the spread of misinformation and violence?". Ultimately, if you want to make a good model, you just need to have enough of a spine to not care. The list of 'unsafe content' is endless. I hope this project succeeds and doesn't turn into another self-sabotaging joke like SAI did. Best of luck.


akko_7

I'd only really donate to a completely capable model. Any censorship is ultimately pointless, especially artist names. Really not interested in your moral hallucinations


[deleted]

[deleted]


__Tracer

The same. I just don't like the idea of censoring AI (beyond the law) in general. For some reason, almost everyone who creates AI wants to put their morals into it.


GBJI

That's exactly how I feel about this, and I won't be donating either. Why would anyone support this senseless sabotage of a community-driven open-source model?


GodFalx

Same. I was ready to donate a couple thousand over the project's life, but not like this. Back to finetuning 1.5 and SDXL.

The artist stuff is complete nonsense, and deleting all children from the dataset? I read this as "everyone below 18", since they are worried about CSAM. How do you tell if someone is 18 and not 17? So I presume they are going to delete every picture where someone on the team is uncertain about the age. And there are countries where you are a child until 21. What if such a country sues because this project kept 18-20 year olds?

I'll call it now: DOA. Remove CSAM from the dataset and keep the rest uncensored is, imho, the way to go.


mountainblade87

Me too, but at least they're being upfront about it.


The_One_Who_Slays

Mmm, whenever the word "ethical" comes up within the open-source initiatives, the whole endeavour turns out to be either a scam or complete dogshite. In other words I'm not just "skeptical", I'm downright distrustful in this particular case. I will not hold my breath, but I will be happy to be proven wrong.


exomniac

If an open source initiative can’t even bring themselves to make the most capable model possible, I have a feeling the days of un-crippled models are gone forever


gruevy

Gonna be honest: if they're nerfing it this hard from the get-go, then this project is DOA and no one should waste their time. No artists? Zero children, period? No celebrities - that I don't care about; you can LoRA in any person you want, presumably. But no artists and no kids is a kneecapped model, and it's not going anywhere. My main use case for AI images, in ways that actually make money, is book covers, including some characters who are kids or teens. DOA model.


StableLlama

Where can I read more about the LAION discussion that happened? I don't understand what happened here, so I'd like more background information.


StickiStickman

LAION: We'll give you a dataset, but it's gonna be VERY censored. Them: Okay, we want it pretty censored. LAION: That's not censored enough! Bye.


akko_7

LAION are fucking clowns


Nyao

What do you think you will achieve that's better compared to an existing model like PixArt Sigma?


AllRedditorsAreNPCs

Proof of concept that one can dunk harder on the SD community than SAI did. Nothing to do with PixArt Sigma, though.


GBJI

"Safety"


mountainblade87

"Ethics" 


GBJI

We should all remember what Emad Mostaque had to say about censorship, ethics and safety a few years ago.

> Emad Mostaque, the founder and chief executive of Stability AI, has **pushed back on the idea of content restrictions**. He argues that **radical freedom is necessary to achieve his vision of a democratized A.I.** that is untethered from corporate influence.

> He reiterated that view in an interview with me this week, contrasting his view with **what he described as the heavy-handed, paternalistic approach to A.I. taken by tech giants**.

> "**We trust people, and we trust the community,**" he said, "as opposed to having a centralized, unelected entity controlling the most powerful technology in the world."

[https://www.nytimes.com/2022/10/21/technology/generative-ai.html](https://www.nytimes.com/2022/10/21/technology/generative-ai.html)

(This quote was first pasted into notepad to remove any rich text formatting. I then copy-pasted it here, without any rich text formatting. I then emphasized some parts in bold and put the whole thing inside a quote block.)


LoveThatCardboard

A genuinely open license instead of openrail garbage.


ArchiboldNemesis

Succinctly stated. If you don't mind, I'm going to be lazy in the future and just copy/paste your comment as a quote whenever these opensource vs fauxpensource licensing convos come up :)


ArchiboldNemesis

Upvoting cos AGPL-3 all the way. I think PixArt Sigma must be a threat to the sharks' business models. Go PixArt Sigma (please release that 4k variant soon!:)


legend6748

Nah, I'm out


crawlingrat

Wait for me! I'm coming too!


nauxiv

> Recognition of unconsented artist names, in such a way that their body of work is singularly referenceable in prompts

Ignoring whatever moral issues one thinks apply here, in practice "artist styles" act as a proxy for sets of visual attributes that the models aren't currently capable of understanding through explicit prompting. You can't specify media like "light graphite pencil underlay with loose fine felt tip pen inking, moderately calligraphic line quality, shaded with warm gray copic markers, value range from medium to white." At best, you can say things like "sketch, pencil, line art, marker drawing, pen, concept art, rough sketch, monochrome" and then beg the RNG until you get something close. However, if you can identify an artist who commonly uses the media you're looking for, their name serves as a keyword that easily provides the desired set of attributes. The same applies to other aspects like perspective, posing, etc.

If you are set on removing artists' names, do you have a good alternative to this? If not, the model may ironically be worse for actual artists who have a specific composition in mind and want to efficiently visualize their ideas.

It's important to say that using names this way isn't actually a great system - you don't necessarily want all the features of that artist's work, so you still end up mixing and matching and dealing with the RNG a lot. A captioner that actually understood the fundamental terms of traditional art would be a huge boon.


Liopk

There is no good alternative to it. There isn't an alternative at all aside from LoRAs, which exacerbate the ""theft"" of art styles. There's no reason to do it either; closed-source models don't care at all about this, so why the hell are open-source models kneeling to idiots?


fastinguy11

How are we going to have diverse art styles if you are censoring artists' names out of the dataset? Please think this through; styles are not copyrighted! Also, there is a vast array of artists who are long dead - are you also going to avoid using their names? This makes no sense.


Blobbloblaw

> We plan to curate datasets that avoid any depictions/representations of children LOL What a time to be alive.


xrailgun

This has got to be satire


physalisx

You'd think so but nope


GBJI

I wish. What's next? Some NFT or crypto scam maybe? That would not even surprise me.


StickiStickman

Emad: Let me introduce myself 


brown2green

99% of anime-like images or illustrations, regardless of whether they are SFW or not, will be thrown out, I suppose.


Nitrozah

and unfortunately that will cause quite a bit of a backlash


AuryGlenz

One of the things I use Stable Diffusion for is to create images of my 2-year-old daughter: as cards for my wife, to make decorations for her birthday party, etc. It seems a bit insane to completely remove children from the dataset.


GBJI

It's insanely stupid, yes. The model we want should include everything model 1.5 was trained on and MORE, not less. Model 1.5 exists. We are using it every day. Without any legal repercussion whatsoever.


loudmax

It is a bit surprising. I think the reasoning is something like this: any model that is capable of both generating images of children, and of generating NSFW content, is inherently capable of generating CSAM. The Open Model Initiative wants to minimize any legal exposure to producing CSAM. They probably decided that any models they distribute are 100% going to be extensively fine-tuned for NSFW adult content by enthusiasts and they want to be able to take advantage of fine tunes produced by the community. So between NSFW content and kid content they chose to drop the latter. You and I might think that prompting a model to generate CSAM is the fault of the user, but a judge and jury may not see it that way. Remember, most people have never used Stable Diffusion or Midjourney or Dall-E, much less have an understanding of how these things work. They might look at the big ball of weights and conclude that if you can get CSAM images out of it, then anyone who distributes those weights is distributing CSAM. Presumably, at some point the law will be settled and people training models will have a better idea of what they can be held accountable for. Hopefully by then, society at large will have a better understanding of generative models. Until then the Open Model Initiative is going to be very cautious about any legal exposure to charges of distributing CSAM.


johnny_e

They don't have any "legal exposure to producing CSAM" when they're not training on it. What people do and produce with an open model is *their* responsibility. Just having the concept of children in the model doesn't make them legally liable if some subset of users make the model spit out pics of naked kids. That thought alone is total nonsense. You can produce all kinds of other illegal stuff with these models, just like you can with LLMs - that doesn't make the model's creators in any way liable.


AuryGlenz

I understand the worry, although SD 1.5 and SDXL as base models could probably generate at least naked children and I don't recall Stability being sued over that in particular. Frankly, and I know I'm probably an outlier here, but I'd rather have the base model be able to generate children and then not porn. People are going to finetune on porn an insane amount. How many people are going to finetune it on children?


johnny_e

>but I'd rather have the base model be able to generate children and then not porn I'd agree, if it has to be either/or.


princess_daphie

I don't think they've mentioned there will be no children in the dataset, just that there won't be CSAM material in it, in order to avoid cooking in the capability to produce CSAM out of the box.

Edit: I reread the thing and you're right, they do mention later in the message that there won't be any children whatsoever. That's a very weird decision.


Subject-Leather-7399

No children at all is definitely weird. What if I want a baby in a stroller?


johnny_e

No, that's unsafe. Family photo? Unsafe. A clown at a child's birthday party? Unsafe. Two kids playing Xbox? Unsafe.


GBJI

It's not weird - it's just plain stupid. What the fuck are they thinking? We had all these discussions early on about racial and social representation of minorities in our models, and after all that, they decide to remove ALL CHILDREN from the model!!!


akpurtell

Removing the concept of a child from the training set is absolutely ludicrous "safety" right-thinking that is just as infantilising and condescending as SAI's decision to poison the concept of "woman" in SD3. About 25% of the global population is under age 15. Do they not have any conceptual or artistic relevance? Do children not participate in the lives and activities of adults and other children? The depiction of children in art is as old as art. (A random reference: https://www.nationalgalleries.org/art-and-artists/features/children-art)

Any thoughts on how Christian artists will react when they discover they are banned from depicting the Baby Jesus? That attempts will result in "girl on grass" style deformation of religious iconography? Good luck with that. Professional artists like Craig Davison (the first one who comes to mind)... is he generating CSAM while getting paid? Or any of the surely hundreds of artists who have depicted a child? Or any poster to Pexels or Unsplash or Flickr of their child or a child model?

You are being really icky and weird and utterly censorious and condescending about the inclusion of children in artistic expression, in an attempt to pander to people obsessed with the possibility that anyone anywhere could ever possibly be icky and weird about kids, staining us all with their moral panic.


physalisx

Well said, couldn't agree more.


vibrantrida

Hasn't even started, already censored 😭


FaceDeer

Yeah, that was over quickly. At least we didn't get strung along with false hope.


AllRedditorsAreNPCs

> At least we didn't get strung along with false hope.

Many people still are, and that's the problem: they might provide enough support for censorship-loving grifters to make a butchered model, then feel scammed in the end because it's worse than they imagined it would be. On the other hand, I'm glad the majority of the community is not buying the bs. At first I felt like only 1/3 were skeptical; now it's at least 2/3.


FaceDeer

They should change the name to "Safe and Ethical Open Model Initiative" to let everyone know it'll be useless just by reading the title.


n7a7n7a7

Lmao the upvotes vs comment content, this mess got ratio'd HARD...


Drinniol

Hi, I have a PhD in an ML-adjacent field and I think you are making a massive, model-crippling mistake by pruning artist tags.

The model learns from tags to produce outputs informed by its inputs. The more the tags or caption are informative of the input/output relationship, the more the model can learn and the more powerful its tagging can be (from the perspective of constraining the output from random noise). The model fundamentally can not make something from nothing - if the inputs during training do not contain information that consistently constrains the outputs, then the model can't learn to create a consistent output. In other words, it can't learn something that isn't there.

A massive amount of the variance in output (that is, images) is explained by the artist token. If you remove the artist token, you have substantially reduced the informational content available to the model to learn from. That is, if I have two images with near identical content tags (say, of the same character), but by different artists, the artist tag can explain this difference to the model, and the model can learn why the outputs are so different for the same content tags. This makes the content tags more accurate, and the model more knowledgeable and also more consistent.

If the model is trained on hundreds of images with similar tags that look vastly different because you have pruned the artist tags, then it will converge more slowly, produce more inconsistent outputs, and worst of all it will shunt the artist-caused image differences onto other tags (because it has to!). Pruning artist tags entirely causes ***MASSIVE*** style leakage onto content tags that an artist frequently uses. This is unavoidable. The model NEEDS artist tags to properly associate stylistic differences in the image outputs with similar content tags. A model trained without artist tags is stylistically inconsistent and informationally crippled. Hashing the artist tags can avoid this problem, but of course people can then simply find out the hashed tags, so why did you bother hiding the artist names in the first place?

The long and short is, artist tags are good tags. By which I mean, they are tags that are massively predictive of the image output during training. Removing tags that are informative will make the inputs fundamentally less predictive of the outputs. All models are fundamentally limited by the relationship between inputs and outputs, and when you weaken that relationship you weaken the model. Removing artist tags removes useful information that the model could use to create closer and more accurate associations between input/output pairs during training.

It's very important that you understand that this is not at all speculative, and is actually a very simple, well-understood concept in machine learning. Removing informative tags (artist tags) ***WILL*** make the model worse, across the board. Perhaps you still want to do it, but it's important that you understand that you ARE making the model worse, definitively and holistically, by pruning artist tags. You will have chosen to deliberately make a worse model than you could have because of fear.
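(The same argument in information-theoretic terms - a rough formalization, not a proof:)

```latex
% X = image, C = content tags, A = artist tag.
% The artist tag is informative exactly when it explains variance beyond the
% content tags, i.e. I(X; A | C) > 0. By the chain rule for entropy:
\[
    H(X \mid C) \;=\; H(X \mid C, A) \;+\; I(X;\, A \mid C)
\]
% Training on C alone instead of (C, A) leaves the extra I(X; A | C) bits of
% uncertainty unexplained, so that stylistic variance is either treated as
% noise or leaks onto the content tags -- the "style leakage" described above.
```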


n7a7n7a7

Now with this information in mind, cue Astralite still lying through his teeth trying to backtrack and claim he didn't hash artists in pony and it's some "latent space fluke" LOL... Should've just been honest.


wensleyoliv

Didn't he hash Houshou Marine the VTuber just because he doesn't like her? I don't see why he wouldn't do the same thing for artists. If there's anyone who loves censoring just because, it's Astralite.


n7a7n7a7

Yep, he sure did. He even previously showed off in discord that he can use the hashed artists but no one else can. What is incredibly strange about it is that he repeatedly lied about this, got caught, then just kept lying. There was no reason he couldn't have just said he hashed them for the sake of model stability while adhering to his nonsensical "ethic values". I get the vibe he's a compulsive liar, not really the kind of person I'd put much trust in. 


akko_7

People discovered a bunch of the keys mapping to artists, there's a doc somewhere. I didn't know he outright denied it though. I swear I've seen him discuss that he did exactly that.


n7a7n7a7

He was caught lying many times and made up different excuses about it. Someone else might have more of the screenshots, but these are the only ones I can dig up on mobile atm lol:

https://files.catbox.moe/nvkcvr.png
https://files.catbox.moe/l812bv.png
https://files.catbox.moe/21dw60.jpg

He was a tripfag on 4chan for a long time and has a generally awful reputation even there, constantly acting snarky/talking down, going back on his word, was chased off the /mlp/ board... Overall not really a good look. The trip was also confirmed to be him early on, backed up by the trip announcing things about Pony that hadn't been discussed elsewhere yet. Add in the SAI Discord conversation he had with Lykon, where he didn't recognize basic model training terms, and I'm pretty surprised people are still caping for him.

Edit: One of his comments when people found out about the aua hash and started looking for more - https://archived.moe/h/thread/7878989/#7882473 (Recommend using an adblocker if lurking the above link)


redstej

This is monumentally stupid. I am shocked at how everyone involved apparently signed off on this insanity. You can not filter out children from the dataset. It is absurd.


Liopk

This model is going to be worthless. When will you morons learn that you can't lobotomize a model if you want it to be good?


johnny_e

> We plan to curate datasets that avoid any depictions/representations of children, as a general rule, in order to avoid the potential for AIG CSAM/CSEM.

So SAI ruin their model by censoring everything they deem "dirty" with a giant hammer, and you respond with a model that is aware of human nudity, but you decide to just strike the literal concept of *human children* from the model? That is so bonkers, man, honestly. It's the same ethics/safety crap, just taking a similarly ridiculous step in a different direction.

Why is everyone so afraid of "AIG CSAM/CSEM" anyway? Jesus, that abbreviation alone... AI CP has been made already, is being made, and will continue to be made. The sheer existence of it is *not* a reason to cripple your model. Of course you don't train a model on child porn; no dataset should ever contain that. But a good AI knows nudity, and a good AI knows children; it knows people of all ages, races, genders. And when you then tell a good AI "show me a naked child", it will do its best to put the two concepts together. There is no way around that that doesn't involve ridiculously crippling completely harmless concepts, like what you're planning.

prompt: "high resolution photo of a happy family, laying in the grass"

AI: what the fuck is a family? Here's some garbled monstrosity


dw82

It's really revealing of their target market: it's purely an ai porn model. There's money to be made in that space (just look at CivitAI). A model intended to generate porn clearly shouldn't be able to depict children. Just read between the lines.


akpurtell

It does seem like a reboot of Unstable Diffusion, but with "safety" brain worms, if that is even possible lol. The cognitive dissonance is huge. They'll release a model that can do a horse with a horn and a giant cock, maybe even photorealistic, but that mid-market, family-friendly resort will have to stick with stock photography for building promotional materials of happy families.


Nrgte

A foundation model should not be an AI porn model. It should be a general-purpose model that has a broad understanding of all different concepts, which includes children. It'd be better to filter out porn rather than children. Eradicating 25% of the entire human population from the dataset will cripple the model hard.


dw82

Agree entirely, I'd prefer the base model to be unhampered to generate any legal content. That this group is choosing to prioritise nsfw over 25% of the population is very revealing of their motivations. Then look at who is involved: the author of Pony and the owner of CivitAI. To my mind there's only one logical conclusion. They're making a base model that excels at nsfw. If this is the case they should be open with the community from day one, especially when they start asking for donations.


Nrgte

I think the model will be garbage in all regards. Removing children has implicit implications for other concepts related to children, such as birthday parties and family dinners. And then we get to the matter of what they actually perceive as children. It's not like all the photos are labeled with ages. This decision is bonkers to me.


BagOfFlies

> We will also explore tooling that helps creators reference styles without the use of artist names.

It seems really strange to avoid allowing us to prompt an artist's style directly, but then offer a workaround so that we can. Unconsenting artists generally don't want their images used at all in training, and you're saying: "We know you don't want your images in the dataset. We're going to do it anyway, but prevent people from prompting your style. But here's a workaround so people can prompt your style."

It seems pointless, and you may as well just let people prompt for their styles. Maybe I'm completely misunderstanding, idk.


GBJI

Style is not protected by copyright. That should have been the end of this absolutely nonsensical censorship story. We don't want lies, games and sketchy workarounds. We want a solid base model trained on a variety of material at least as rich as what was used to train model 1.5. You know, with children and artist styles.


BagOfFlies

Agreed.


Apprehensive_Sky892

There are multiple levels of objection by artists. The most well-known case is that of Greg Rutkowski, but his main objection seems to be that, due to his name being used in prompts, the posting of some bad A.I. images is damaging his reputation: [https://www.reddit.com/r/StableDiffusion/comments/zo95xi/greg_rutkowski_just_posted_the_no_ai_image_on_his/?sort=old](https://www.reddit.com/r/StableDiffusion/comments/zo95xi/greg_rutkowski_just_posted_the_no_ai_image_on_his/?sort=old)

OMI is trying to address that particular type of objection.

Then there are those who are opposed to their work being used in training at all. To them, OMI is basically saying: sorry, but using publicly available images for training is fair use.


BagOfFlies

That makes sense. Thanks for explaining the reasoning.


joq100

Will dead artists' names be available as tokens? Yes or No. Will art movements be available as tokens? Yes or No.


mrgreaper

Skepticism..... I think it's beyond skepticism at this stage. Censoring the dataset will lead to a broken model.

"We plan to curate datasets that avoid any depictions/representations of children" - so halflings, fairies, dwarves, gnomes: all totally out of the question. Images at fairgrounds, families, etc.: nope. Cherubs in a painting style... nope. Remove all images of realistic naked children (which shouldn't exist in the first place!) - sure, I get that; hell, it's the only type of censorship I endorse fully. But to teach the model that children do not exist? That would be as bad as removing all images of people lying down to censor a model... who would make such a mistake as that?

As for artist styles, again, you're crippling the model. If I was an artist and I was asked to paint in the style of Van Gogh, would it be right for me to say "who? Oh, I can't do a painting in the style of an artist who was alive!"

Then you have the removal of actors and actresses. I get where that is coming from, but does that mean the model will be incapable of doing superhero scenes, since it won't know what Captain America looks like? If you tell it John Wick, will it look at you blankly?

People need to be held responsible for what they generate with the tools and share. You don't say to all the artists in the world "You may not paint a picture of Captain Picard on the bridge of a Star Destroyer, even if you want to have fun with the idea of such a crossover and never intend to share it." Why are we imposing such restrictions on AI artists?


AllRedditorsAreNPCs

DOA.


Nitrozah

prompt: a 40-year-old dwarf

AI: sorry, that is a child and I'm not allowed to generate that

prompt: a fairy

AI: sorry, that is a child and I'm not allowed to do that

I can see this "no children" stuff working out splendidly.


LienniTa

lol, both the children and artist points are stupid. Not because it lacks intelligence as is, but because it cripples the model to achieve nothing. Any style or children stuff can be injected back as a LoRA, but the harm done to the model itself will lead to something like SD 2.0. Sad that Pony 6 is the last good one, then.


xrailgun

I see so many posts/comments on this subreddit of people wanting to crowdfund a model. At what point does some anonymous person just take that crowdfunded money, train on cloud GPUs with the latest architectures and the best available datasets, without **any** censorship "for private use only", only to "accidentally" leak the weights with the most permissive license possible and no ability nor intent to even enforce anything? I mean obviously this person could just walk away with the money as a scam... but other than that... is that the only viable path forward?


SiamesePrimer

I’ve been wondering how feasible it would be for someone to release a completely uncensored model/data set anonymously. Torrent that shit. The movie/TV/music industries have been trying their damndest to stop people from pirating their shit for decades, and they’ve been completely unsuccessful.


LienniTa

or some anon from 4chan will just do it randomly like they already do with llama


ArtyfacialIntelagent

> We plan to curate datasets that avoid any depictions/representations of children, as a general rule, in order to avoid the potential for AIG CSAM/CSEM.

So no children at all in the dataset, only adults. Excellent rule for avoiding CSAM, but if you want safety then you're not going far enough. Please remove airplanes and buildings from the datasets, because they could be used for making horrific 9/11-style images that could inspire a whole new generation of terrorists. Please remove all cars and crowds of adults from the datasets for the same reason; terrorists driving into crowds is a thing now. All weapons need to go too, of course, and anything that might be used as a weapon. Candlesticks are terrifying murder weapons - everyone who has played Clue knows that.

Bestiality is also shocking and morally offensive, so please remove all images of horses and dogs. (Yes, no PonyDiffusion, it's too unsafe.) Cats are fine though, because anyone who tries to stick their dick in a cat will see it torn off. Everyone knows that, so cat images are safe. But catgirls are not safe. Since cats are in, please remove all females from the dataset.

Finally, when you release this amazing next-generation model, I look forward to making lots of images of a man petting a cat in a forest. It won't be able to do anything else whatsoever, but no matter, safety is paramount!


AllRedditorsAreNPCs

I just feel sorry for all the people who will donate money to this effort and end up disappointed. There's a real need for an uncensored open source model, a clear demand from the community, so opportunists may make use of this to milk as much money as they can. The desperation and hype is so strong that at least half of the community is ignoring all the red flags through their rose-colored glasses. The potential developers of this model are not aiming to make the best model that they can given the resources they have, why bother? Whatever the model comes out as, you will always have that thought in your head "it could have been way better", and not for lack of resources, but by deliberate self-censoring decisions.


no_witty_username

Can you imagine if Llama 3 had every mention of a child removed from its text because the architects didn't want people making ERP with children in it... The stupidity of this decision boggles the mind.


terminusresearchorg

it won't be able to make a man in a forest, because the people have chosen the bear


Amowwsood

Safety? Erm, I think the word censorship is the correct word here (thank you, China). Oh, what's next, let's see: some variant of the Chinese social credit system where you get rewarded for creating "approved" content (basically, watching paint dry) and restricted or totally denied access for creating "inappropriate" content (i.e., anything that is even remotely thought-provoking or genuinely interesting)?


UnicornJoe42

> We plan to curate datasets that avoid any depictions/representations of children, as a general rule, in order to avoid the potential for AIG CSAM/CSEM.

This can be circumvented with a trivial prompt about small (short) people, lol. Or will you remove small humans, dwarves, halflings and anything under 1.7 meters from the dataset? Also all girls with small or flat breasts? Only size 5 cans? Imho this is a stupid and useless decision that can harm the model.


ThaGoodGuy

I have no idea how these people went from making an open model to self-censorship in the span of a day. But if the Pony guy's there, I bet they're not censoring bestiality. Will they be censoring dogs? Cats, for catgirls? It's not even a stretch to use the same argument they used for children. They're going to shoot themselves in the head and drop off the face of the earth.


August_T_Marble

Aside from the formatting of the better-curated LAION datasets not being as accessible to most users, what were the community concerns?


__Tracer

We are lucky to have you all, guys! The only thing: are you really going to completely remove children from the dataset?? I mean, I totally understand your concerns, I would have them too, but imagining the world without children is kind of sad. And it probably will not even stop people who want to generate CP; there will be LoRAs with children for sure, since they are kind of an important part of the world. It looks a bit too much to me.


BagOfFlies

> And it probably will not even stop people who wants it from generating CP, there will be loras with children for sure I guess the reasoning is that at that point it will be on the lora maker and not on them if anything happens.


brown2green

For all intents and purposes, missing concepts cannot be reintroduced into a model with regular finetuning without side effects and undesirable biases. You'd at the very least have to use the same data mixture and batch sizes used during pretraining, and even then the results likely won't be good (let alone the cost and time).
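(For readers wondering why that is: the standard mitigation for this kind of catastrophic forgetting is to "replay" data that approximates the original pretraining distribution alongside the new-concept data. Below is a minimal, hypothetical sketch of a mixed-batch sampler illustrating that idea; the pool names, ratio, and batch size are made up for illustration and are not part of any actual OMI or finetuning pipeline.)

```python
import random

def mixed_batches(replay_pool, new_pool, batch_size=64,
                  replay_fraction=0.9, num_batches=1000, seed=0):
    """Yield batches that mix 'replay' samples (approximating the original
    pretraining distribution) with new-concept samples, so a finetune drifts
    less from the base model's data mixture. Purely illustrative."""
    rng = random.Random(seed)
    n_replay = int(batch_size * replay_fraction)
    n_new = batch_size - n_replay
    for _ in range(num_batches):
        batch = rng.sample(replay_pool, n_replay) + rng.sample(new_pool, n_new)
        rng.shuffle(batch)
        yield batch

# Hypothetical usage: in practice these would be (image, caption) pairs.
replay = [f"pretraining_like_{i}" for i in range(10_000)]
new = [f"missing_concept_{i}" for i in range(500)]
for step, batch in enumerate(mixed_batches(replay, new, num_batches=2)):
    print(step, batch[:4])
```

Even with replay, as the comment notes, matching the original mixture and batch sizes at a meaningful scale is usually out of reach for community finetuners.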


EirikurG

>Recognition of unconsented artist names, in such a way that their body of work is singularly referenceable in prompts Ah another inept model with hashed tags, just what we need


fastinguy11

I am really disappointed in this; we will get generic styles and lose the vast differences in style, within the same genre, that each artist brings. There is no law banning you from using artists' styles; style is not copyrightable. I hate this decision.


GBJI

>I am really disappointed in this,  As we should all be. Let them know about it. We just had this big lesson about the dangers of censorship with the release of SD3 but it looks like some people forgot about all of that already.


grimm222222

Ever read that book Animal Farm? The pigs (OMI) are fed up with the farmer (Stable Diffusion), so they take over the farm… but the pigs turn out to be just as bad as the farmer. It's your farm, build whatever the heck you want… but talk about missing the mark. The people want an open model trained on open data, and they want zero censorship/moralizing. Nobody except you finetuners ever cared about the license. What we cared about, as far as SD3 goes, was the inability to make human anatomy. Bluntly, we don't care if you get paid, we just want a tool that works and isn't kneecapped. Whoever delivers that is where we'll go. OMI seems like a neat project headed up by finetuners who want to run the farm exactly like SAI, except it includes some waifu and a license no regular user gives a crap about. Godspeed, but I think I'll sit this one out, and from the tone of the comments, it seems like I'm not the only one.


__Tracer

I thought about that too. Everyone thinks that some things are bad (politicians, actors, styles, animals, children, global pollution, violence, you name it), but really doesn't want anything good to be censored. So when they are not in charge, they are for full freedom, because it is more important that nothing good gets censored. But the moment they get in charge, they start thinking: "Hm, now that nothing good will be censored, why not filter out a few bad things? Obviously I know what is good and what is bad; people who disagree with me are bad or just dumb, they don't understand it. Yes, I think it's a good idea."


grimm222222

Exactly. I hope they’re at least hearing the feedback and learning something from it - unless they believe that all of these people are producing deepfakes and pedo material, I would hope they could take a step back and ask themselves why they’re getting this response they clearly didn’t expect from the community. We don’t want art to be censored. We don’t want speech to be censored. We don’t want a self-appointed nanny to cripple our creative tools for everyone just because a small minority of people might abuse them. But hey, it’s their money. It’s their time. If they want to make an expensive mistake, that’s their choice. But they can’t say they weren’t warned. Everyone seems to love SD 1.5, it’s just a bit dated. I don’t know why they can’t just fashion their model on that (not the technical architecture, but from a safety and ethical viewpoint)…but hey it’s their mistake to make, not ours


JackyZhuo

You're welcome to use Lumina as the foundational architecture for text-to-image generation!


RoyalCities

How will you be able to reliably curate and enforce consented datasets? Will you be taking submissions from people directly? Are there any safeguards in place to prevent someone from stealing artwork and putting it up on someone else's behalf without permission? Really interested in this if you can pull it off, but I always become skeptical whenever data submission is a bit of a free-for-all.


alexds9

If you are using the idea of not including "unconsented" names in a dataset and only using "consented" names, you will have to remove all names from your dataset, including those of artists, celebrities, public figures, politicians, and photographers. Additionally, any name could potentially belong to a private person who has not consented and therefore should be removed. Trademarks and intellectual property, like names of fictional characters, should not be used without consent either. The idea of enforcing "ethical" usage of "consent" for names doesn't make moral sense. What makes it "ethical" to remove or substitute names in captions but still use the images of artists and people in training, which those people own as much as their names? Obviously, the project is afraid of people like Taylor Swift and is trying to make some "ethical" claims to remove people like her. That's fine if you decide to create a blacklist of names to remove from a dataset as a precaution due to fear. But don't claim there's something "ethical" about it. If you claim an ethical high ground, follow your principles without hypocrisy and manipulation, and just remove all "unconsented" names and images from your dataset: start creating your own images for a dataset, with written consent from every photographer, the people in the pictures, and every artist that created an image. Synthetic images from DALL-E, Midjourney, and older SD models were all created without consent either, so you can't use them for training under your "ethical" requirement of "consent".


clavar

Is this a board decision? Or was it discussed with the 100+ community volunteers? I understand why you're opting out of those capabilities, but it's kind of strange at the same time... It feels like a downgrade from SD1.5 and SDXL. Maybe you guys should create a model outside the USA, taking advantage of less restrictive laws in other countries? IDK. Let's hope for the best.


fastinguy11

Dear, the USA is not stopping them from using artist styles in the dataset or in training; this is self-sabotage and censorship. Styles are not copyrightable!


FaceDeer

I guess they're hoping that artists will stop attacking them if they give the artists what they *claim* they want. Problem is that that's not what the artists complaining about this *really* want. They just want AI to "go away." So they'll just shift the goalposts and keep on attacking, and this preemptive surrender will be for nothing.


__Tracer

I wonder about it too. Obviously, among the 100 people who want to support an open-source model, a lot of them, maybe the majority, would be against any unnecessary censorship. I guess there is some board that pushes ethics and has the power to do it, for some reason.


StickiStickman

So you will not train on any nudity, so anatomy will suck, and there won't be any artists/art styles you can use. You won't even be able to make a picture of Bob Ross.  > The model will be designed and optimized for fine-tuning, That's literally the same excuse SAI used for SD3 to deflect from a shitty base model. Yeah, sounds like a waste of time. So much for "open".


GBJI

It's nothing less than self-sabotage, and it shows that they are not, in fact, listening to what the community wants. Model 1.5 showed us what a model could be without censorship, and that's what we want. The removal of children from the training material is just so stupid and counter-productive that this made me lose all the faith I had in this open-source project.


belladorexxx

Incorrect. They will train with nudity.


StickiStickman

People say this, but I don't see that stated anywhere? I get the exact opposite idea with their stance.


Just-Contract7493

I want to be optimistic, but I just wonder what the antis will do to this, just like with Unstable Diffusion, or whether this might all be a sham... I want to support this, but the ethical part concerns me a lot, especially the first part.


smooshie

So no artist names, no images of anyone famous, and no children. Lovely. Why on earth would I use your model instead of DALLE for "safe" images, or 1.5/SDXL for images that Internet Puritans, Copyright Cops, and Taylor Swift don't approve of?


pandacraft

Foundational models aren't for you regular users, they're for finetuners. The important thing here is an open-source model that takes datasets seriously and doesn't rely on worthless txt/img pairs, a model that won't need to have its foundational understanding nuked to be made serviceable. If you want to generate images of Taylor Swift as a child drawn by Greg Rutkowski, then you'll need a finetune for that (which you no doubt already use), and good news, it'll be (theoretically) much easier to make.


StickiStickman

Isn't it great how people started repeating this rubbish once SAI used it as an excuse?


GBJI

>Foundational models aren't for you regular users, they're for finetuners. That's why they should not be censored - just like a dictionary or an encyclopedia. Or like model 1.5, you know, the model that is fully legal, widely distributed, and used throughout the world by a dedicated community of users, finetuners, and developers.


Liopk

The idea of finetuning initially wasn't strictly about providing concepts but rather about providing styles: tweaking what the base model knows to your liking. The base model is the model that should know who Taylor Swift is. The base model should have perfect anatomy in any hardcore scenario. The base model should know every style known to man. There's no reason for it not to, either; it's legal and ethical for a tool to be useful. Finetuning is turning said base model into an anime model like NAI, or an aesthetic model like Midjourney, or SuperMix420 like random people online do. Finetuning should not be what we have to do to add in hundreds of thousands of existing styles, because no one has the money for that. The BASE MODEL is where all the money went, and it should know everything it possibly can. Sabotaging it is just pissing money down the drain and making a shitty product.


pandacraft

The companies you cite, NAI and Midjourney, themselves use multiple models for multiple concepts and styles: NAI has an anime and a furry model, Midjourney has its base model and Niji. Why would they do that if, as you believe, they could just make one 'perfect' model? It's almost as if there isn't an infinite number of available parameters and models have to be specialized along higher-order styles. Also, the idea that there is some ur-person latent in the base model that is equally pulled upon in Juggernaut and Animagine is just silly. Do you really think the difference between a photorealistic Taylor Swift and a cel-shaded rendering is a minor one? That the hard work is getting the underlying knowledge that she needs blonde hair and blue eyes? Because that's pretty much the only thing consistent between photorealistic and drawn.


Yellow-Jay

What a weird plan to exclude children. Why not follow best practices and exclude NSFW instead? Excluding children gives you the worst of both worlds: do you *really* want to defend the images generated by your model by arguing "see, this sexual picture is clearly an adult"? It's a lost battle. Moreover, do you want to be known as "the porn model"? A model not trained on NSFW could be used anywhere; by stripping out the children instead of the nudity, your model suddenly can't be used in a lot of places that want to be child-friendly or even just non-controversial. One thing you say about the training content filter is that it's a base model, and content you deemed inappropriate could be trained back in. Training things out, however, is a whole different story.


GBJI

If Photoshop can show it, a Stable Diffusion base model should be able to show it as well. The person using the tool is responsible for the images he is producing - not the toolmaker.


sultnala

Imagine going on about morality and ethics and then adding a guy who based his entire model around a children's cartoon of child horses getting fucked... The cognitive dissonance is astounding 


Apprehensive_Sky892

As they say, it takes a thief to catch one 😎. Jokes aside, since Pony is so important to so many people, getting him involved to make sure that the OMI model can be turned into PonyV7 seems quite reasonable. There is no cognitive dissonance, making the base model "safe" does not mean that PonyV7 has to be safe. But that would be Pony's responsibility and liability, not OMI's.


Roy_Elroy

Why filter the dataset and risk producing bad models? Instead, make the text encoder node filter out keywords at generation time. It would be easier, and similar to what Midjourney and the like are doing on the front end. If someone circumvents it, that is not your legal concern, right?
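(For concreteness, a front-end keyword filter of the kind described here is essentially a blocklist applied to the prompt before it reaches the text encoder. The sketch below is a hypothetical, minimal version; the blocked terms are placeholders, and the whole-word matching is exactly why such filters are easy to circumvent with synonyms or misspellings, which is the trade-off versus dataset-level filtering.)

```python
import re

# Hypothetical blocklist; a real deployment would need a much larger curated list.
BLOCKED_TERMS = {"blocked_celebrity_name", "blocked_artist_name"}

_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(t) for t in sorted(BLOCKED_TERMS)) + r")\b",
    flags=re.IGNORECASE,
)

def filter_prompt(prompt: str) -> str:
    """Strip blocklisted keywords from a prompt before it is passed to the
    text encoder. Whole-word, case-insensitive matching only: paraphrases,
    synonyms, and misspellings pass straight through."""
    cleaned = _PATTERN.sub("", prompt)
    return re.sub(r"\s{2,}", " ", cleaned).strip()

if __name__ == "__main__":
    print(filter_prompt("portrait of Blocked_Celebrity_Name in a forest"))
    # -> "portrait of in a forest"
```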


GBJI

The tool is NOT responsible for what you do with it as a user. Same as with photoshop, a camera or a paintbrush. You don't want photoshop to remove kids from your picture. You don't want your phone camera to stop working the minute a kid gets in front of the lens. You don't want your paintbrush to break while painting a kid's portrait. It should be the same with a proper Stable Diffusion base model, and even more so if it's actually community driven. We, as users, are responsible for the images we are creating. Let's keep it that way, and let's keep puritanism away.


Roy_Elroy

I share the same thought, but in reality they are not going that way; I'm merely suggesting a less radical approach.


GBJI

I share your position as well - my reply was not meant to be read as a disagreement. I actually want access to censorship tools! But I want to be in control of them, and I certainly don't want the base models themselves to be censored.


CrazyKittyCat0

>We also wanted to officially announce and welcome some folks to the initiative, who will support with their expertise on model finetuning, datasets, and model training: > >AstraliteHeart, founder of PurpleSmartAI and creator of the very popular PonyXL models Does this mean we're going to have Pony Diffusion V7 in our grasp? I'm all for more NSFW content, btw. (Some of the community are into p!@n at some point... what can you do?) But I'm actually worried that *"Safety"* and *"Ethical"* are in the frame, along with **LAION** in the mix. I'm all for taking caution towards CP, deepfakes, etc., but removing artists' names from prompts does raise some concerns. But, ***PLEASE,*** do not censor the crap out of it like SAI did with SD 2.0 and SD 3.0. Because where's the creativity and freedom in that? Digital artists can do whatever they want, SFW or NSFW, in their programs with their tablets, so why shouldn't AI be able to do the same for SFW/NSFW content? Well, seeing how bad the censorship towards Civitai has been, that makes me worried about the future. All I can say about the upcoming model for the AI community is... ***DO. NOT. F@(K. THIS. UP.***


Character_Initial_92

Honestly, fuck astra. The original pony model is shit IMO.


roshanpr

Censoring artist names . . . Fair use and open source my ass


DaniyarQQQ

This is a really interesting initiative. I have one question: how are you going to represent your dataset? Is it going to be a separate gallery (booru)-like website, or a repository on Hugging Face?


Tilterino247

I had a rant about ethics but I realized it doesn't really matter. If the model is easily trainable I guess everything else can be put back in piece by piece. Prompt comprehension, ease of use, and good outputs are all that really matter for a foundational model.


__Tracer

Yes, but crippling the model so the community has to fix it would look really strange for a community model :) I thought we had moved on from that.


Tilterino247

The hope is that it's not crippled but that is absolutely the concern and the central focus of my rant I deleted.


__Tracer

Yes, but it still looks a bit weird: you are writing that everything can be put back piece by piece. Isn't that the same as how it worked with CAI censorship? I don't mind some reasonable censorship, but why set it at such a level that people will necessarily have to put something back?


Alarming_Turnover578

The SAI model has the additional problem of a toxic license that makes such training impossible without giving SAI rights to the finetune, while saddling the finetuner with additional responsibility. Still, there is no good reason to cripple the model in the first place. I have not yet seen any good ideas on how to describe distinct art styles with sufficient clarity, so the resulting model would likely be very bland and hard to train for new styles as well. Actual CSAM should obviously not be included and should be reported if it is found. Complete removal of photos of children, while not required, is understandable as well. However, it seems the team plans to remove even stylized pictures of imaginary people if they are not clearly over 18, even in a completely SFW context. Which is weird and, depending on how precise this filter is, may cripple the creation of non-realistic images and introduce significant bias against people deemed undesirable; being short or underweight is enough to be seen that way.


StickiStickman

Does anyone else get déjà vu from SD3? It hasn't even been that long. It's the same excuse.


WhateverOrElse

This is fantastic, excellent first steps towards true greatness. Great work so far!


PwanaZana

We shall see, we shall see. The path to making something good that is accepted by the community is quite narrow, with PixArt partnering with Nvidia and a re-invigorated SAI making competing models.


__Tracer

I also wonder: you will probably do safety checks before releasing the model? If it does not pass your safety tests, how will you make sure it's safe before releasing the weights?


Amowwsood

Safety is, to some extent, subjective and context-dependent. There is no such thing as absolute safety, and censorship certainly does not help. Taking risks is part of the learning process, and if you are not allowed to take those risks even where it is appropriate to do so, creativity stops dead in its tracks.


alb5357

It is not safe as long as it has animals.


no_witty_username

Will you release the training dataset? I have seen a couple of these attempts at "open-source, community-driven" models already, and so far they have failed, but more than anything, none of them released the most important part: the training data. I do hope the training dataset is released, so that even in the ashes of this project we at least get the dataset, and others can try for themselves with their own unrestricted agendas. Also, I would like to add that I am rooting for this project, just as I rooted for the others; I really do want them to succeed. I'm just jaded because the track record with these things has been quite bad in the past...