How to write a simple database engine [closed]

I am interested in learning how a database engine works (i.e. the internals of it). I know most of the basic data structures taught in CS (trees, hash tables, lists, etc.) as well as a pretty good understanding of compiler theory (and have implemented a very simple interpreter) but I don't understand how to go about writing a database engine. I have searched for tutorials on the subject and I couldn't find any, so I am hoping someone else can point me in the right direction. Basically, I would like information on the following:

  • How the data is stored internally (i.e. how tables are represented, etc.)
  • How the engine finds data that it needs (e.g. run a SELECT query)
  • How data is inserted in a way that is fast and efficient

And any other topics that may be relevant to this. It doesn't have to be an on-disk database - even an in-memory database is fine (if it is easier) because I just want to learn the principals behind it.

Many thanks for your help.


If you're good at reading code, studying SQLite will teach you a whole boatload about database design. It's small, so it's easier to wrap your head around. But it's also professionally written.

http://sqlite.org/


The answer to this question is a huge one. expect a PHD thesis to have it answered 100% ;) but we can think of the problems one by one:

  • How to store the data internally: you should have a data file containing your database objects and a caching mechanism to load the data in focus and some data around it into RAM assume you have a table, with some data, we would create a data format to convert this table into a binary file, by agreeing on the definition of a column delimiter and a row delimiter and make sure such pattern of delimiter is never used in your data itself. i.e. if you have selected <*> for example to separate columns, you should validate the data you are placing in this table not to contain this pattern. you could also use a row header and a column header by specifying size of row and some internal indexing number to speed up your search, and at the start of each column to have the length of this column like "Adam", 1, 11.1, "123 ABC Street POBox 456" you can have it like <&RowHeader, 1><&Col1,CHR, 4>Adam<&Col2, num,1,0>1<&Col3, Num,2,1>111<&Col4, CHR, 24>123 ABC Street POBox 456<&RowTrailer>

  • How to find items quickly try using hashing and indexing to point at data stored and cached based on different criteria taking same example above, you could sort the value of the first column and store it in a separate object pointing at row id of items sorted alphabetically, and so on

  • How to speed insert data I know from Oracle is that they insert data in a temporary place both in RAM and on disk and do housekeeping on periodic basis, the database engine is busy all the time optimizing its structure but in the same time we do not want to lose data in case of power failure of something like that. so try to keep data in this temporary place with no sorting, append your original storage, and later on when system is free resort your indexes and clear the temp area when done

good luck, great project.


There are books on the topic a good place to start would be Database Systems: The Complete Book by Garcia-Molina, Ullman, and Widom