Join operation with NOSQL
I have gone through some articles regarding Bigtable and NOSQL. It is very interesting that they avoid JOIN operations.
As a basic example, let's take Employee and Department table and assume the data is spread across multiple tables / servers.
Just want to know, if data is spread across multiple servers, how do we do JOIN or UNION operations?
Solution 1:
When you have extremely large data, you probably want to avoid joins. This is because the overhead of an individual key lookup is relatively large (the service needs to figure out which node(s) to query, and query them in parallel and wait for responses). By overhead, I mean latency, not throughput limitation.
This makes joins suck really badly as you'd need to do a lot of foreign key lookups, which would end up going to many,many different nodes (in many cases). So you'd want to avoid this as a pattern.
If it doesn't happen very often, you could probably take the hit, but if you're going to want to do a lot of them, it may be worth "denormalising" the data.
The kind of stuff which gets stored in NoSQL stores is typically pretty "abnormal" in the first place. It is not uncommon to duplicate the same data in all sorts of different places to make lookups easier.
Additionally most nosql don't (really) support secondary indexes either, which means you have to duplicate stuff if you want to query by any other criterion.
If you're storing data such as employees and departments, you're really better off with a conventional database.
Solution 2:
You would have to do multiple selects, and join the data manually in your application. See this SO post for more information. From that post:
Bigtable datasets can be queried from services like AppEngine using a language called GQL ("gee-kwal") which is a based on a subset of SQL. Conspicuously missing from GQL is any sort of JOIN command. Because of the distributed nature of a Bigtable database, performing a join between two tables would be terribly inefficient. Instead, the programmer has to implement such logic in his application, or design his application so as to not need it.
Solution 3:
Kaleb's right. You write custom code with a NoSQL solution if your data doesn't fit well into a key-value store. Map-reduce/async processing and custom view caches are common. Brian Aker gave a very funny (and satirical and biased) presentation at the Nov 2009 OpenSQLCamp http://www.youtube.com/watch?v=LhnGarRsKnA. Skip in 40 seconds to hear about joins.
Solution 4:
Natively, unfortunately, is not possible to perform a Join into a NoSQL database. This is actually one of the biggest differences between SQL and NoSQL DBs.
As @kaleb said, you would have to do multiple selections and then join the needed information "manually".
Luckily, there are ORMs frameworks such as Prisma that will allow you to "fake" the native SQL join feature.
Note: you're still performing multiple db calls under the hood, increasing the read-ops, and everything that's concerned.
" A key feature of Prisma Client is the ability to query relations between two or more models. " -> https://www.prisma.io/
example:
const getUser = await prisma.user.findUnique({
where: {
id: 19,
},
select: {
name: true,
posts: {
select: {
title: true,
},
},
},
})
In this case, the posts are stored in a different table, but Prisma is able to fetch them and join them into the User object.