Doing calculations in MySQL vs PHP
I'd play to the strengths of each system.
Aggregating, joining and filtering logic obviously belongs on the data layer. It's faster, not only because most DB engines have 10+ years of optimisation for doing just that, but you minimise the data shifted between your DB and web server.
On the other hand, most DB platforms i've used have very poor functionality for working with individual values. Things likes date formatting and string manipulation just suck in SQL, you're better doing that work in PHP.
Basically, use each system for what it's built to do.
In terms of maintainability, as long as the division between what happens where is clear, separating these to types of logic shouldn't cause much problem and certainly not enough to out way the benefits. In my opinion code clarity and maintainability are more about consistency than about putting all the logic in one place.
Re: specific examples...
I know this isn't what you're referring too but dates are almost a special case. You want to make sure that all dates generated by the system are created either on the web server OR the database. Doing otherwise will cause some insidious bugs if the db server and webserver are ever configured for different timezones (i've seen this happen). Imagine, for example, you've got a
createdDate
column with a default ofgetDate()
that is applied on insert by the DB. If you were to insert a record then, using a date generated in PHP (egdate("Y-m-d", time() - 3600)
, select records created in the last hour, you might not get what you expect. As for which layer you should do this on, i'd favour the DB for, as in the example, it lets you use column defaults.For most apps i'd do this in PHP. Combining first name and surname sounds simple until you realise you need salutations, titles and middle initials in there sometimes too. Plus you're almost definitely going to end up in a situation where you want a users first name, surname AND a combine salutation + firstname + surname. Concatenating them DB-side means you end up moving more data, although really, it's pretty minor.
Depends. As above, if you ever want to use them separately you're better off performance-wise pulling them out separately and concatenating when needed. That said, unless the datasets your dealing with are huge there are probably other factors (like, as you mention, maintainability) that have more bearing.
A few rules of thumb:
- Generating incremental ids should happen in the DB.
- Personally, i like my default applied by the DB.
- When selecting, anything that reduces the number of records should be done by the DB.
- Its usually good to do things that reduce the size of the dataset DB-side (like with the strings example above).
- And as you say; ordering, aggregation, sub-queries, joins, etc. should always be DB-side.
- Also, we haven't talked about them but triggers are usually bad/necessary.
There are a few core trade-offs your facing here and the balance really depends on you application.
Some things should definitely-everytime-always be done in SQL. Excluding some exceptions (like the dates thing) for lot of tasks SQL can be very clunky and can leave you with logic in out of the way places. When searching your codebase for references to a specific column (for example) it is easy to miss those contained in a view or stored procedure.
Performance is always a consideration but, depending on you app and the specific example, maybe not a big one. Your concerns about maintainability and probably very valid and some of the performance benefits i've mentioned are very slight so beware of premature optimisation.
Also, if other systems are accessing the DB directly (eg. for reporting, or imports/exports) you'll benefit from having more logic in the DB. For example, if you want to import users from another datasource directly, something like an email validation function would be reusable is implemented in SQL.
Short answer: it depends. :)
I don't like reinventing the wheel. I also like to use the best tool possible for the task needed to be done, so:
- When I can get the resultset straight from DB without further processing I do it - your case it's a simple query with a simple
WHERE
clause. Imagine what happens when you have 10 millions users and you get them to PHP, just to need 100 of them - you guessed it - it's very possible for your web server to crash - When you need to get data from 2 or more tables at once, again, MySQL is much better than PHP
- When you need to count records - the DB is great at it
- I tend to favor application level processing to FK constraints
- Also, I tend to avoid stored procedures, preferring to implement that business logic at application level (unless, of course we are talking about huge data sets).
In conclusion, I would say that your colleague is right in the case presented
If you put half your logic in the database and the other half in the php, then 6 months down the track when you come to make a change it will take you twice as long to figure out what is going on.
Having said that though, your database queries should have just enough logic so that they provide your php with exactly the data it needs. If you are finding yourself looping through thousands of mysql records in your php code, then you are doing something wrong. On the other end of the scale though, if you're running if / else statements in your mysql queries you are also doing something wrong (probably just need to rewrite your query).
I'd steer clear of stored procedures. While they are a great concept in theory you can usually accomplish the same result in the php with a much faster development time and you also have the added benefit of knowing where all the logic is.
MySQL will scale better as result sets increase. Frankly, treating a database as a "dumb data" repository is a waste of resources...
Maintainability tends to be tainted by familiarity. If you're not familiar with PHP, it wouldn't be your initial choice for maintainability -- would it?