What is the most efficient way to store tags in a database?
I am implementing a tagging system on my website similar to one stackoverflow uses, my question is - what is the most effective way to store tags so that they may be searched and filtered?
My idea is this:
Table: Items
Columns: Item_ID, Title, Content
Table: Tags
Columns: Title, Item_ID
Is this too slow? Is there a better way?
Solution 1:
One item is going to have many tags. And one tag will belong to many items. This implies to me that you'll quite possibly need an intermediary table to overcome the many-to-many obstacle.
Something like:
Table: Items
Columns: Item_ID, Item_Title, Content
Table: Tags
Columns: Tag_ID, Tag_Title
Table: Items_Tags
Columns: Item_ID, Tag_ID
It might be that your web app is very very popular and need de-normalizing down the road, but it's pointless muddying the waters too early.
Solution 2:
Actually I believe de-normalising the tags table might be a better way forward, depending on scale.
This way, the tags table simply has tagid, itemid, tagname.
You'll get duplicate tagnames, but it makes adding/removing/editing tags for specific items MUCH more simple. You don't have to create a new tag, remove the allocation of the old one and re-allocate a new one, you just edit the tagname.
For displaying a list of tags, you simply use DISTINCT or GROUP BY, and of course you can count how many times a tag is used easily, too.
Solution 3:
If you don't mind using a bit of non-standard stuff, Postgres version 9.4 and up has an option of storing a record of type JSON text array.
Your schema would be:
Table: Items
Columns: Item_ID:int, Title:text, Content:text
Table: Tags
Columns: Item_ID:int, Tag_Title:text[]
For more info, see this excellent post by Josh Berkus: http://www.databasesoup.com/2015/01/tag-all-things.html
There are more various options compared thoroughly for performance and the one suggested above is the best overall.
Solution 4:
You can't really talk about slowness based on the data you provided in a question. And I don't think you should even worry too much about performance at this stage of developement. It's called premature optimization.
However, I'd suggest that you'd include Tag_ID column in the Tags table. It's usually a good practice that every table has an ID column.