How can I design a Java web application without an ORM and without embedded SQL [closed]
Solution 1:
Why, oh why, is the industry so wedded to this disaster of a concept? None of the previous answers adequately addresses @johnny's concerns. They are all half-baked hand-waving justifications that steer well clear of the concrete issues facing database programmers.
@TheTXI's answer is typical of the responses I get when asking the same question: separation of layers. What is that supposed to mean? Why do I want to separate my layers? Or, more to the point, how does it benefit me to create an additional layer that is different to the relational layer and yet is supposed to be a canonical mapping from that layer?
Furthermore, (@johhny's as-yet unanswered point) how does this protect us from change? If the database changes, the ORM layer will almost certainly have to follow suit. In fact, the default practice of not modeling join tables at the object level makes this even worse, because when the table inevitably grows some extra columns, you not only add some extra fields to the object model, but you also change its topology and force a bunch of code rewrite! This is a disaster for change management and is a classic example of how a misguided view of relations (thinking they represent objects) courts disaster. This simply wouldn't have occurred if you just mapped the relational database directly into congruent notions in the programming language (think LINQ-to-SQL, and not LINQ-to-EF).
The biggest unanswered question in this space — the elephant in the room — is: what problem is ORM supposed be solving? And don't say "object-relational impedance mismatch". That's just another hand-waving fob-off. Explain why there is an impedance mismatch, and why it should be the database that comes to the language rather than the language going to the database. My explanation is that most programming languages suck at expressing and working with relational data, but that this is a historical reality that is beginning to slip away (LINQ-to-SQL was the first baby-step in that direction), not a fundamental principle on which to base sound architecture.
There's a reason that ORMs have become so complex, with lazy-loading, caching, a bewildering array of persistence and ownership semantics, etc. And there's a reason that for all this extra pounding away at the keyboard, they still fail to efficiently solve basic problems like, "Which pairs of members share more than one group?" The relational model was conceived at a time when network and heirarchical models were buckling at the knees under the weight of such problems; it was a breath of fresh air. Now we all seem to yearn to go back to our old sandpit full of cat-pee, and we think we've invented something new (as long as we hold our noses).
(I fully expect to be liberally down-voted on this answer. But please leave a comment when you do. I don't mind being told I'm wrong, as long as I know why.)
EDIT: Thank you @Chris for taking the time to comment. It gives me some concrete points to address... (Note that while I frequently address @Chris below, I am not trying to take him to task specifically; his responses are typical of the kinds of comments I hear all the time when discussing the subject. So I hope he doesn't take my criticisms as a personal affront; they are not intended that way, and I do genuinely appreciate the time he took to respond.)
First off, let me clear up some misconceptions evident in @Chris's comments and answer.
- I do not advocate raw SQL in code, for all the obvious reasons, and some not so obvious ones (e.g., SQL is neither an algebra nor a calculus, which makes functional decomposition virtually impossible).
- I do not advocate monolithic application design. Layers are, in general, a good thing.
- I do not advocate polluting object models with lots of line-noise such as special fields, methods, and attributes. Frankly, however, this is a strawman, since domain/object models only exist in the ORM universe. Now, I know LINQ-to-SQL has all these classes with lots of noisy bits in them, but they are just behind-the-scenes plumbing; you don't edit that code, and you generally shouldn't even look at it.
Now some objections to the objections:
- The assertion that applications can be built independently of the database is unfounded. By and large, ORMs are just a canonical mapping onto the data layer (Tables Foo and Bar become classes Foo and Bar, and table FooBar becomes some kind of torrid affair between classes Foo and Bar). There isn't much wiggle room in this mapping, and so any change to the data model will almost certainly require a corresponding change to the object model. This is a good thing in my view, since an object that radically diverged from the corresponding database model would be nothing more than an additional maintenance headache for all concerned.
- Once the illusion that ORMs engender data-model-independence is discarded, all protestations about the evils of direct coupling to the data model become moot. But I'd like to pursue this a little further than to simply dismiss it. Coupling is an essential feature of system design. At some point, decisions and assumptions have to be made. You can't program everything using a single "Things" table. You have to decide that your domain contains certain specific concepts and then create schemas and code that respect those concepts, treat them as first-class citizens, hard-code them. The idea that applications ought to be independent of the database is misguided. The database is (or ought to be) the purest representation of a business's knowledge (I know that this isn't always the case, and I will address this later). Coupling to this representation ought to provide the strongest guarantee of resilience, since such a data model will only change when the business itself undergoes some intrinsic change. In short, coupling to a well-designed database schema is a very good thing.
- Layering is not an end in its own right. It is good because it achieves some specified goal. The preceding points show that layering between the database and the app in the way ORM does is neither effective nor necessary to achieve the real goal of resilience to change. This is achieved through good database design.
- @Chris asserts that letting the database dictate things stymies OO design. This is true enough, but it is only interesting if OO design is the best way to model knowledge. The nearly complete failure of OODBMSs in the marketplace hints that this is not the case. The relational model, with its predicate-logic foundation, possesses the same expressive power as OO design without incurring the graph-theoretic complexities of OO models.
- @Chris's objections to the relational model on the grounds that it doesn't solve today's problems (hence the NoSQL movement) is completely off the mark. NoSQL means "No SQL", not, "No relational model". Unfortunately even proponents of the NoSQL movement seem to be quite clueless in this regard. SQL has deep flaws, many of which can be traced to its radical departure from the relational model. To say that we should abandon the relational model because SQL sucks is a rather blatant case of throwing the baby out with the bathwater.
- Failure to use an ORM does not triple the effort of building an application. This is a ludicrous claim, and even @Chris seems to be holding the back door open on it with a backhanded compliment to the codegen alternative. Codegen tools such as LINQ-to-SQL's sqlmetal are a perfect solution for anyone who isn't wedded to the dogma that the application's data model absolutely has to be different to the database's data model.
My own experience with ORMs has been that they work great in tutorials and cause endless pain and frustration in the real world. With LINQ-to-SQL fixing many of the problems that motivated ORMs in the first place, I see no reason to put myself through that kind of torture.
One major problem remains: the current crop of SQL databases doesn't offer any meaningful degree of control over the separation of physical and logical layers. The mapping from a table to stuff on the disk is largely fixed and entirely under the control of the SQL DBMS. This was not part of the plan for the relational model, which explicitly separated the two, and allowed for the definition of a consistent logical representation of data that could be stored on disk in a completely different structure than was suggested by the logical model. For instance, a system (or dba) would be free to physically denormalise — for performance reasons — a highly normalised logical model. Because SQL engines don't allow this separation of concerns, it is common to denormalise or otherwise torture the logical model through sheer necessity. As a result, logical models can't always be exactly as they should, and so the ideal of using the database as the purest representation of knowledge cannot be fully realised. In practice, however, designers generally stick to a canonical mapping from database to domain model anyway, because anything else is just too painful to maintain.
Solution 2:
In my experience with NHibernate it's a bit like Visual Basic: it makes easy problems really easy, but it also makes anything harder than that really hard or even completely impossible.
The basic idea is that ORM avoids having to write persistence code. There is a lot of duplication in such code, so it's very tempting to make that code generic, rather than specific to a particular business layer, and re-use it across projects. So far so good. For simple object hierarchies and business requirements this actually works really well. If your database changes, yes, you do have to change the ORM mapping files, but that's usually quite simple and you only have to make the change in one place - much easier than changing code that accesses the database.
The problem is that as your database and requirements get more complex the ORM finds it increasingly hard to keep up. So the ORM gets more and more complex. It also takes shortcuts, doing some things inefficiently, because it just isn't smart enough to figure out how to do it efficiently in all cases. What's more, because the whole idea is that it works transparently, you often can't see these performance problems until they become bad enough to affect the user experience. A related problem is that bugs are much harder to find, because you just don't know what's going on inside the ORM unless you debug it. (Yes, I've had to step through NHibernate code and it's no picnic!)
So you start bypassing the ORM for some things and use SQL directly instead. Of course, you then have to make that code work with the code that does use the ORM, which is more work. You end up writing code to load and save some objects manually and to somehow work that into the ORM code. Eventually you start wondering if the ORM is creating more work for you than it's saving - not to mention the performance and bug hunting headaches.
So if you're writing a very simple application, the kind you find in a tutorial, ORM will do a good job. If it's more complex than that then I think it's not worth it. Of course, for a simple application the absolute amount of time saved will be small as well. My conclusion: just don't bother with ORM. ORM is the path that leads to the dark side.
Solution 3:
Prior to reading this I often wondered if any others really bothered to understand the impact an ORM tool has on a project. @Evgeny, @Marcelo Cantos, and @Jimmy B put this to rest.
In short, they are dead on with most of the issues that surround ORM tools. There is only a couple that they either haven't covered or haven't covered enough.
First, usage of an ORM does not mean less code. It might mean less SQL, but it certainly doesn't mean less code. Along these lines, tool generated code can be wrong. Worse, compensating for bad generated code is an exercise in frustration.
Second, ORM tools do NOT mean you don't have to understand SQL. To put this another way, you must know SQL to be an effective programmer (there are exceptions like the embedded guys). The queries (both type and number of) emitted by these tools just aren't good enough from a performance or resource perspective. If you already have to know SQL, why shackle yourself?
Third, ORM tools do NOT save time. You will spend as much (if not more) time beating your ORM tool of choice into submission to take care of any application of decent size. To add insult to injury, when you are done you will see that the quality of the queries are generally worse than if you had done it yourself. Point is ORM = more time spent on the project, worse outcome.
Fourth, DBMS independence is generally a waste of time. Most applications will use exactly one DBMS over it's life; whether it's Oracle, SQL server, MySql, whatever. The application will be much better off it is able to take advantage of the features the DBMS provides. Being DBMS agnostic means that you'll have to limit yourself to sub par queries.
Fifth, not everything has to be an object. This is an important thing to note. We are often asked to show a particular set of data on a page. More often than not this means joining data from two or more tables. Should your app have to create and instantiate all of those objects just to display some data? Or are you better off just executing a query and emitting the data straight to the screen/browser in the format of your choice?
ORM's add a LOT of overhead to even the simplest things. Most ORM's will generate multiple sql queries to perform even a simple UPDATE on a table.
Sixth, and the most important issue in my mind: ORM's decrease your security. Sure you can use ORM's with s'procs; but most people don't. Nearly every app I've seen that leveraged an ORM exposed the database in such a way that if the site is cracked the entire database can be easily killed or pilfered. ORM tools generate SQL on the fly. This means they need direct table access. This means that if your app is compromised, the hacker will have direct table access. This means that you have effectively removed at least one layer of security from your application.
Solution 4:
I realize this question was posted a while ago, but I've wondered the same thing as johnny. I've seen Hibernate used on several large projects by intelligent people, and in every case, it was an unmitigated disaster. In my experience, the single most complicated and performance impacting area of most enterprise applications is between business layer and the database layer. This is part of why we add a data access layer to try to encapsulate the database interactions into manageable chunks.
The biggest arguments I've seen to use ORM are that they "reduce hand written code" and provide an abstraction layer that separates business logic from the data access. I assert that neither of these are true in practice except in very simple cases.
To make Hibernate (and most other ORM tools) work, we either create a hibernate mapping file which documents all the database and object interactions, or we use annotations to document the same relationships. In one case we moved our code to an xml configuration file that is harder to test, yet no less complex. In the other, we distribute the logic of how to interact with the database all over the domain model. The point is, while we've written less actual "code", moving code to configuration files or annotations != "less code". It simply moves the complexity from the code, where we can control it directly and mitigate it, to a third party tool with far less control.
The ORM abstraction layer that is supposed to separate the business/domain layer from the database layer tends to have more subtle effects that counteract this "separation". How many projects have you seen where the ORM layer affects the design of the object model and/or the database in a way that would be considered less than ideal? To use a poor physics analogy, assume your business layer, ORM layer, and database layer all have mass. The mere existence of the ORM layer between the other to tends exert a force that changes and warps the other layers. Have you had to introduce a primary key object that doesn't quite fit in your business model because the ORM layer needed it? Have you had to adjust your database structure to accommodate a particularly complex object graph model because the ORM tool can't handle it otherwise? Taken to extremes, the presence of the ORM layer can warp the entire perspective of how database interactions should work. Instead of a service layer that handles persisting object graphs, it can devolve into creating individual data access layer objects for each domain objects, as if they live in isolation. I've seen all these scenarios in varying degrees. Perhaps it's inexperience with the ORM toolset. Perhaps it's a bad database structure. I doubt it. Everything I've seen points to the ORM solution being inadequate.
If you accept my assertion that the data access layer is one of complexity, highly prone to performance bottlenecks, why should I consider adding in a tool that doesn't achieve it's objectives of less code and separation, and at the same time negatively affecting the structure of the other layers of my application?
Solution 5:
I responded thoroughly to the answer by @Marcelo Cantos however I will summurize the main benefits of using an ORM.
Persistence Ignorance (PI) and Domain Drive Design (DDD)
A ORM lends itself perfectly to both of these overall design patterns. For correct object orientated design you will work with hierarchical and polymorphic objects for expressing how data flows inside your application. With a good design you will frequently strive for POCO objects (plain old C/C# objects, I've seen other definitions of this word) because if I have a Person that has a List I should have just that. I shouldn't have a DataTable of persons and a DataTable of addresses that I somehow force to work with my application. Along with that I shouldn't need to tie tons of database specific logic to my objects, why should the FirstName field of my person need to be have something like [ColumnName("First_Name")] [Length(255)] [DataType(Types.Nvarchar)]
or any other assortment of crazy attributes or code that defines how many database exists forced into my domain design?
Less Hand Written Code
There is a substantial reduction in the number of lines of code written when you remove the need to write an Select, Insert, Update and Delete statement for every single object in your application to save it to the database. This leaves you with only the need to write queries that mean something. Also many ORMs (like NHibernate) will include a language that is layered on top of SQL which is more defined to interact with and can be less syntactically filled of hoops.
The time this shows up fullest is consider an application that has a UserTable that is linked to every single object and for some reason you HAVE to change the primary key name or type. At this point you will potentially need to alter every single stored procedure in the database where with a correctly implemented ORM solution you will only need to alter a few configuration mappings (or possibly none) and it's done.