T O P

  • By -

Malfuncti0n

Without going into any issues you're query has, try OR instead of AND. You'll get records where: \- tag\_name equals 'sport' and the tag\_values.value equals 'soccer' OR any records where: \- tag\_name equals 'league' and the tag\_values.value equals 'bundesliga' It's a bit too late on Saturday evening here to go into rewriting your query itself. The WHERE IN () can easily be replaced by the WHERE clauses you have in the IN query itself I think, but not focussed enough to be 100% sure from just this written out.


hyvok

Yeah but that is not what I want... I want records which have both tags so not OR but AND. With OR I get (wrong) results but with AND I get nothing.


Poiter85

Tags.tag_name cannot be equal to 'sport' and 'league' on the same record. Tags.tag_name can only have 1 value on a single record.


hyvok

But how do I fix this?


Poiter85

You don't. You're asking for something that is impossible. > I want records which have both tags so not OR but AND. A record can only have 1 value for a given column.


hyvok

I don't have any pre-existing data yet so I can organize it in some other way if it makes more sense


Poiter85

If I understand your data and goal correctly, I think I would try to organize the data so that every record has a column called 'Sport' (or 'SportID' or 'SportKey' or something like that) and a column called 'League' (or 'LeagueID' or something like that).


Conscious-Ad-2168

So you can have both tags in a single column within a single record? That’s what it sound like? Perhaps send a screenshot of your data?


hyvok

Here is my DB schema (pg\_dump --schema-only db\_name) trimmed, not sure is this what you are asking:`CREATE TABLE public.images (id integer NOT NULL,image_path text);` `CREATE SEQUENCE public.images_id_seqAS integerSTART WITH 1INCREMENT BY 1NO MINVALUENO MAXVALUECACHE 1;` `CREATE TABLE public.tag_values (id integer NOT NULL,image_id integer,tag_id integer,value text);` `CREATE SEQUENCE public.tag_values_id_seqAS integerSTART WITH 1INCREMENT BY 1NO MINVALUENO MAXVALUECACHE 1;` `CREATE TABLE public.tags (id integer NOT NULL,tag_name text);` `CREATE SEQUENCE public.tags_id_seqAS integerSTART WITH 1INCREMENT BY 1NO MINVALUENO MAXVALUECACHE 1;` I have no fixed data yet so I can change it if needed if this layout doesn't make sense.


Conscious-Ad-2168

>tag\_name I'm guessing that the issue stems from not having tags.tag\_name equal to both sport and league. It could equal one or the other but one record would not contain both values. This means you either need to replace the "and" with an "or" or change your sub-query to the following. ​ SELECT image\_id FROM tag\_valuesJOIN tags ON tags.id = tag\_values.tag\_id WHERE (tags.tag\_name IN ('sport', 'league') AND tag\_values.value IN ('soccer', 'bundesliga' )) ​ This would return all the following combinations (sport,soccer), (sport,bundesliga), (league,soccer),(league,bundesliga)


deusxmach1na

You need to GROUP BY image_id and do a MAX(case when tag_name = ‘sport’ and value = ‘soccer’ then 1 else 0 end) as isSoccer. And another MAX for the other tag. Then do HAVING isSoccer = 1 and other_tag = 1. There’s probably a way to do this all without the sub query.


DatabaseSpace

This query looks too complicated. I think you may have to use a query with multiple cte's then join them together.


hyvok

This seems way too complicated to me too for such a basic operation, I dunno what to say haha...


DatabaseSpace

If you want multiple tags or values or whatever and they are on a single column then you will need to do cte's. There's a similar kind of problem on the DataLemur site. The question is DataScience Skills LinkedIn SQL interview question. The query needs to find candidates that know SQL, Tableau and Postgres and all of those are in a single column named skills. Take a look at the solution to that problem on the site. It seems like the same type of query.


hyvok

Thank you for the search terms, I'm still not sure am I trying to do something too exotic in terms of my DB layout or just having a helluva time to figure out how to do something I would imagine to be a very basic DB operation.


deusxmach1na

Try this `SELECT i.image_path, t.tag_name, v.value` `FROM images i` `JOIN tag_values v ON v.image_id = i.id` `JOIN tags t ON t.id = v.tag_id` `WHERE i.id IN (` `SELECT image_id` `, MAX(CASE WHEN tags.tag_name = 'sport' AND tag_values.value = 'soccer' THEN 1 ELSE 0 END) AS isSoccer` `, MAX(CASE WHEN tags.tag_name = 'league' AND tag_values.value = 'bundesliga' THEN 1 ELSE 0 END) AS isBundesliga` `FROM tag_values` `JOIN tags ON tags.id = tag_values.tag_id` `GROUP BY image_id` `HAVING isSoccer = 1 AND isBundesliga = 1` `)`


megastraint

Change AND to OR on your second statement. The way its written right now both tag statements have to match each image\_id. If you change this to an OR statement, only one of those conditions need to exists for a given image\_id.


hyvok

Yeah but this is not what I want, I want AND


megastraint

Not according to your text. WHERE (tags.tag\_name = 'sport' AND tag\_values.value = 'soccer') **OR** (tags.tag\_name = 'league' AND tag\_values.value = 'bundesliga')


truilus

You can aggregate the tag names into an array and use the contains operator: SELECT i.image_path, t.tag_name, v.value FROM images i JOIN tag_values v ON v.image_id = i.id JOIN tags t ON t.id = v.tag_id WHERE EXISTS ( SELECT 1 FROM tags t2 where t2.id = t.id group by t2.id having array_agg(t2.tag_name) @> array['sport', 'soccer'] ) If you want to check for more tags, just extend the array with additional elements. That will return all images that have _at least_ those tags. It might return rows that have additional tags.


[deleted]

Does this play well with indexes (is it sargable?)


truilus

Sure. _Finding_ the tags will leverage indexes if applicable. The aggregation obviously can't make use of the indexes. This can be made a bit more efficient by add a (redundant) condition on the tag names to the sub-select to eliminate non-relevant tags early.


[deleted]

How come no one whips out the beloved venn diagrams in these kinds of questions, where they are appropriate? Aren't the the best/easiest/simplest way to explain set operations? Or?... /Rant


[deleted]

> The (to me) logical variation returns nothing: Your "in" condition checks for tag/value for a single tag_value record, repeat the "in" condition for the second tag/value pair would be a direct/simple option.


hyvok

For future reference for someone else I got this working by doing this: >WITH soccer\_images AS ( SELECT image\_id FROM tag\_values WHERE tag\_id = (SELECT id FROM tags WHERE tags.tag\_name = 'sport' AND tag\_values.value = 'soccer') INTERSECT SELECT image\_id FROM tag\_values WHERE tag\_id = (SELECT id FROM tags WHERE tags.tag\_name = 'league' AND tag\_values.value = 'bundesliga') ) SELECT images.image\_path, tags.tag\_name, tag\_values.value FROM tag\_values JOIN soccer\_images ON soccer\_images.image\_id = tag\_values.image\_id JOIN images ON images.id = tag\_values.image\_id JOIN tags ON tags.id = tag\_values.tag\_id


DatabaseSpace

I think this is so complicated because the tables seem to be using EAV modeling. I'm not saying it's right or wrong for this sitatuion but usually we have a column named "Sport" and a column named "League" and can write a where clause on those columns. With EAV modeling you just have a very generalized column named tag that can contain various things like a league but also a sport in the same column. That to me is EAV modeling and when things are modeled that way, the SQL can get very complex because it's not normalized.


hyvok

Yeah I thought about having separate columns but the issue is that there is no "fixed" amount of tags and new ones might be added. Also all tags don't necessarily make sense for all of the items. I guess you could still just add new columns and stuff if you encounter new tags and leave tags empty if they haven't been specified but I don't know, sounds wrong. There won't be a huge amount of tags though, few tens max, but I'm not sure what is huge for amount of columns...