How to store multiple options in a single table?

Please read up on Data Normalization, General Indexing concepts, and Foreign Key constraints to keep data clean with referential integrity. This will get you going.

Storing data in arrays may seem natural to you on paper, but to the db engine the performance with mostly be without index use. Moreover, you will find on Day 2 that getting to and maintaining your data will be a nightmare.

The following should get you going with a good start as you tinker. Joins too.

create table student
(   studentId int auto_increment primary key,
    fullName varchar(100) not null
    -- etc
);

create table dept
(   deptId int auto_increment primary key,
    deptName varchar(100) not null -- Economics
    -- etc
);

create table course
(   courseId int auto_increment primary key,
    deptId int not null,
    courseName varchar(100) not null,
    -- etc
    CONSTRAINT fk_crs_dept FOREIGN KEY (deptId) REFERENCES dept(deptId)
);

create table SCJunction
(   -- Student/Course Junction table (a.k.a Student is taking the course)
    -- also holds the attendance and grade
    id int auto_increment primary key,
    studentId int not null,
    courseId int not null,
    term int not null, -- term (I am using 100 in below examples for this term)
    attendance int not null, -- whatever you want, 100=always there, 0=he must have been partying,
    grade int not null, -- just an idea   
    -- See (Note Composite Index) at bottom concerning next two lines.
    unique key(studentId,courseId,term), -- no duplicates allowed for the combo (note student can re-take it next term)
    key (courseId,studentId),
    CONSTRAINT fk_sc_student FOREIGN KEY (studentId) REFERENCES student(studentId),
    CONSTRAINT fk_sc_courses FOREIGN KEY (courseId) REFERENCES course(courseId)
);

Create Test Data

insert student(fullName) values ('Henry Carthage'),('Kim Billings'),('Shy Guy'); -- id's 1,2,3
insert student(fullName) values ('Shy Guy');

insert dept(deptName) values ('History'),('Math'),('English'); -- id's 1,2,3

insert course(deptId,courseName) values (1,'Early Roman Empire'),(1,'Italian Nation States'); -- id's 1 and 2 (History dept)
insert course(deptId,courseName) values (2,'Calculus 1'),(2,'Linear Algebra A'); -- id's 3 and 4 (Math dept)
insert course(deptId,courseName) values (3,'World of Chaucer'); -- id 5 (English dept)

-- show why FK constraints are important based on data at the moment
insert course(deptId,courseName) values (66,'Fly Fishing 101'); -- will generate error 1452. That dept 66 does not exist
-- That error is a good error to have. Better than faulty data

-- Have Kim (studentId=2) enrolled in a few courses
insert SCJunction(studentId,courseId,term,attendance,grade) values (2,1,100,-1,-1); -- Early Roman Empire, term 100 (made up), unknown attendance/grade
insert SCJunction(studentId,courseId,term,attendance,grade) values (2,4,100,-1,-1); -- Linear Algebra A
insert SCJunction(studentId,courseId,term,attendance,grade) values (2,5,100,-1,-1); -- World of Chaucer

-- Have Shy Guy (studentId=3) enrolled in one course only. He is shy
insert SCJunction(studentId,courseId,term,attendance,grade) values (3,5,100,-1,-1); -- Early Roman Empire, term 100 (made up), unknow attendance/grade
-- note if you run that line again, the Error 1062 Duplicate entry happens. Can't take same course more than once per term

Some simple questions.

What course is in what department?

show all, uses table aliases (abbreviations) to make typing less, readability (sometimes) better

select c.courseId,c.courseName,d.deptId,d.deptName
from course c
join dept d
on c.deptId=d.deptId
order by d.deptName,c.courseName -- note the order
+----------+-----------------------+--------+----------+
| courseId | courseName            | deptId | deptName |
+----------+-----------------------+--------+----------+
|        5 | World of Chaucer      |      3 | English  |
|        1 | Early Roman Empire    |      1 | History  |
|        2 | Italian Nation States |      1 | History  |
|        3 | Calculus 1            |      2 | Math     |
|        4 | Linear Algebra A      |      2 | Math     |
+----------+-----------------------+--------+----------+

Who is taking the World of Chaucer course this term?

(knowing the courseId=5)

The below benefits from one of our composite indexes in SCJunction. A composite is an index on more than one column.

select s.StudentId,s.FullName
from SCJunction j
join student s
on j.studentId=s.studentId
where j.courseId=5 and j.term=100
+-----------+--------------+
| StudentId | FullName     |
+-----------+--------------+
|         2 | Kim Billings |
|         3 | Shy Guy      |
+-----------+--------------+

Kim Billings is enrolled in what this term?

select s.StudentId,s.FullName,c.courseId,c.courseName
from SCJunction j
join student s
on j.studentId=s.studentId
join course c
on j.courseId=c.courseId
where s.studentId=2 and j.term=100
order by c.courseId DESC -- descending, just for the fun of it
+-----------+--------------+----------+--------------------+
| StudentId | FullName     | courseId | courseName         |
+-----------+--------------+----------+--------------------+
|         2 | Kim Billings |        5 | World of Chaucer   |
|         2 | Kim Billings |        4 | Linear Algebra A   |
|         2 | Kim Billings |        1 | Early Roman Empire |
+-----------+--------------+----------+--------------------+

Kim is overwhelmed, so drop drop the math class

delete from SCJunction
where studentId=2 and courseId=4 and term=100

run that above select statement showing what Kim is taking:

+-----------+--------------+----------+--------------------+
| StudentId | FullName     | courseId | courseName         |
+-----------+--------------+----------+--------------------+
|         2 | Kim Billings |        5 | World of Chaucer   |
|         2 | Kim Billings |        1 | Early Roman Empire |
+-----------+--------------+----------+--------------------+

Ah, much easier term. Dad won't be happy though.

Note such things as SCJunction.term. Much can written about that, I will skip over it at the moment mostly, other than to say it should also be in an FK somewhere. You may want your term to look more like SPRING2015 and not an int.

And as far as id's go. This is the way I would do it. It is personal preference. It would require knowing id #'s, looking them up. Others could choose to have a courseId something like HIST101 and not 17. Those are highly more readable (but slower in the index (barely). So do what is best for you.

Note Composite Index

A Composite Index (INDEX means KEY, and vice-versa) is one that combines multiple columns for fast data retrieval. The orders are flipped for the two composites in the SCJunction table so that, depending on the universe of queries that go after your data, the db engine can choose which index to use for fastest retrieval based on the left-most column you are going after.

As for the unique key, #1, the comment next to it stating enforcing no duplicates (meaning junk data) is rather self-explanatory. For instance, student 1 course 1 term 1 cannot exist twice in that table.

A crucial concept to understand is the concept of left-most ordering of column names in an index.

For queries that go after studentId only, then the key that has studentId listed first (left-most) is used. In queries that go after courseId only, then the key that has courseId left-most is used. In queries that go after both studentId and courseId, the db engine can decide which composite key to use.

When I say "go after", I mean in the on clause or where clause condition.

Were one not to have those two composite keys (with the column 1 and 2 in them flipped), then in queries where the column sought is not left-most indexed, you would not benefit with key usage, and suffer a slow tablescan for data to return.

So, those two indexes combine the following 2 concepts

  • Fast data retrieval based on left-most or both (studentId and courseId columns)
  • Enforcing non-duplication of data in that table based on studentId, courseId, and term values

The Takeaway

The important takeaway is that Junction tables make for quick index retrieval, and sane management of data versus comma-delimited data (array mindset) crammed into a column, and all the misery of using such a construct.


For completeness sake, not in a matter that this is a general recommended solution:

MySQL as of version 5.7.8 provides the the JSON datatype, which allows to store and retrieve objects and arrays in the JSON format.

This way, you can store entire objects and arrays into a field, as an array would look like:

 ['subject_1', 'subject_2', 'subject_3']

Especially beginners don't know this, and they reinvent the wheel by yet another comma-separated string implementation or using language-dependent serialization/deserialization approaches.

At least JSON is very commonly used and easily parsed as a data exchange format.

There are valid use cases for using storing arrays and objects inside a MySQL field, e.g. for speed optimization or when you have unknown or dynamic properties that you still want to save in a DB.

Yet as a rule of thumb, if you rely on storing objects and arrays into MySQL, then either your database design is most likely broken or you should rather use a document-based database like MongoDB.