How to store arrays in MySQL?

I have two tables in MySQL. Table Person has the following columns:

id name fruits

The fruits column may hold null or an array of strings like ('apple', 'orange', 'banana'), or ('strawberry'), etc. The second table is Table Fruit and has the following three columns:

fruit_name color price
apple red 2
orange orange 3
----------- -------- ------

So how should I design the fruits column in the first table so that it can hold array of strings that take values from the fruit_name column in the second table? Since there is no array data type in MySQL, how should I do it?


Solution 1:

The proper way to do this is to use multiple tables and JOIN them in your queries.

For example:

CREATE TABLE person (
`id` INT NOT NULL PRIMARY KEY,
`name` VARCHAR(50)
);

CREATE TABLE fruits (
`fruit_name` VARCHAR(20) NOT NULL PRIMARY KEY,
`color` VARCHAR(20),
`price` INT
);

CREATE TABLE person_fruit (
`person_id` INT NOT NULL,
`fruit_name` VARCHAR(20) NOT NULL,
PRIMARY KEY(`person_id`, `fruit_name`)
);

The person_fruit table contains one row for each fruit a person is associated with and effectively links the person and fruits tables together, I.E.

1 | "banana"
1 | "apple"
1 | "orange"
2 | "straberry"
2 | "banana"
2 | "apple"

When you want to retrieve a person and all of their fruit you can do something like this:

SELECT p.*, f.*
FROM person p
INNER JOIN person_fruit pf
ON pf.person_id = p.id
INNER JOIN fruits f
ON f.fruit_name = pf.fruit_name

Solution 2:

The reason that there are no arrays in SQL, is because most people don't really need it. Relational databases (SQL is exactly that) work using relations, and most of the time, it is best if you assign one row of a table to each "bit of information". For example, where you may think "I'd like a list of stuff here", instead make a new table, linking the row in one table with the row in another table.[1] That way, you can represent M:N relationships. Another advantage is that those links will not clutter the row containing the linked item. And the database can index those rows. Arrays typically aren't indexed.

If you don't need relational databases, you can use e.g. a key-value store.

Read about database normalization, please. The golden rule is "[Every] non-key [attribute] must provide a fact about the key, the whole key, and nothing but the key.". An array does too much. It has multiple facts and it stores the order (which is not related to the relation itself). And the performance is poor (see above).

Imagine that you have a person table and you have a table with phone calls by people. Now you could make each person row have a list of his phone calls. But every person has many other relationships to many other things. Does that mean my person table should contain an array for every single thing he is connected to? No, that is not an attribute of the person itself.

[1]: It is okay if the linking table only has two columns (the primary keys from each table)! If the relationship itself has additional attributes though, they should be represented in this table as columns.