I was prowling Stack Overflow today and came across a question asking when you would use two databases. I’m amazed at the number of answers that amounted to replication and sharding (some under the guise of some other description). The only answer I consider correct is different applications or services (shared hosting probably falls under those two in a weird way). I’m going to explain why a replicated database and a sharded database are not multiple databases.
Firstly, we must understand what a database is and is not. A database is not a computer program that runs and stores data; that is a DBMS (Database Management System). A database is simply an organized collection of data. Many definitions of a database include computers, but not all do. In fact, Joe Celko argues that the first databases were precursors to written language in the Middle East (see Data and Databases, Chapter 1). It may be true that databases are organized for quick retrieval of information in a certain way; however, they are not necessarily designed for quick retrieval of all information in any way.
So what are some examples of a non-computer database? A phone book is a good one. It’s really fast to look someone up if you know their last name, but not if all you have is their phone number and you want their address. You’ll also note that it has various ways of searching if you’re looking for businesses. It is not a good example of a normalized database, but I’m not sure that’s possible with a paper database that allows you to find the same information in multiple ways. A phone book is simply a collection of names, addresses, phone numbers, and categories (tags).
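To make the "fast one way, slow the other" point concrete, here is a minimal sketch in Python. The entries, the field order, and the function names are all made up for illustration; the only claim is that a lookup by last name uses the book’s ordering (modeled here as a dictionary), while a lookup by phone number has to read every entry.

```python
# Hypothetical phone book entries: (last_name, first_name, address, phone).
ENTRIES = [
    ("Adams", "Steven", "12 Oak St", "555-0101"),
    ("Baker", "John", "34 Elm St", "555-0102"),
    ("Clark", "Maria", "56 Pine St", "555-0103"),
]


def build_name_index(entries):
    """Group entries by last name, the key the printed book is organized by."""
    index = {}
    for entry in entries:
        index.setdefault(entry[0], []).append(entry)
    return index


def find_by_last_name(index, last_name):
    """Fast path: a single dictionary lookup on the organizing key."""
    return index.get(last_name, [])


def find_by_phone(entries, phone):
    """Slow path: nothing is organized by phone, so check every entry in turn."""
    for entry in entries:
        if entry[3] == phone:
            return entry
    return None


NAME_INDEX = build_name_index(ENTRIES)
```

Calling `find_by_last_name(NAME_INDEX, "Baker")` goes straight to the entry, while `find_by_phone(ENTRIES, "555-0103")` walks the list from the top, which is exactly the experience of flipping through a paper phone book page by page.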
So wouldn’t two phone books be two databases? Well, if they are published by different vendors, or at different times, then yes. However, if you have two copies of the same phone book, then you have two instances of that database that have been ‘replicated’ from a ‘master’ copy. You can’t update one of these ‘slave’ copies and expect the change to replicate; you must update the ‘master’ and publish the updates. Unlike computer databases, phone books are extremely slow to push updates. Let’s say you want to call a friend, Steven, over for pizza, and your roommate, John, wants to try a new pizza place. John can use one copy of the phone book to look up the addresses and phone numbers of the new pizza places you haven’t tried, while you look up Steven’s number in your copy. This is exactly what happens with replication: different requesters doing lookups against different copies of the same database.
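Here is a rough sketch of the two-copies-of-one-phone-book idea in Python. The `Master` and `Replica` classes and their methods are hypothetical names I made up, not any real DBMS’s API, and "publishing" is modeled as simply copying the whole book, which is far cruder than what a real replication protocol does.

```python
class Replica:
    """A read-only copy of the phone book (one printed copy)."""

    def __init__(self):
        self._data = {}

    def lookup(self, name):
        return self._data.get(name)

    def _receive(self, snapshot):
        # Only the master calls this; readers never write to a replica.
        self._data = dict(snapshot)


class Master:
    """The single place where updates are made before being published."""

    def __init__(self):
        self._data = {}
        self._replicas = []

    def add_replica(self, replica):
        self._replicas.append(replica)

    def update(self, name, phone):
        self._data[name] = phone

    def publish(self):
        # Push the current state to every copy (a new print run).
        for replica in self._replicas:
            replica._receive(self._data)


master = Master()
your_copy, johns_copy = Replica(), Replica()
master.add_replica(your_copy)
master.add_replica(johns_copy)
master.update("Steven", "555-0101")
master.update("New Pizza Place", "555-0199")
master.publish()
```

After `publish()`, you can call `your_copy.lookup("Steven")` while John calls `johns_copy.lookup("New Pizza Place")`, and neither of you waits on the other. A later `master.update(...)` is invisible to both copies until the next `publish()`, which is the slow-update behavior of a printed phone book.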
Now let’s talk sharding. Phone books aren’t sharded, right? Wrong! Phone books are sharded by location. The entire phone book database for even one state would be too big to distribute, so the publisher splits it up into smaller regions. Only the phone book company has the entire list, which makes it kind of a bad example. Let’s talk about encyclopedias instead. They are a much better example, since you don’t generally buy one volume, you buy the whole set (which is also replicated, and you can think of its replication in the same way as the phone book’s). Each volume shows on its spine which letters it contains, so when you open it you know you’re in the right vicinity. This is how a shard works: your database got too big and unmanageable as a whole, so you split it up into ‘shards’ or ‘partitions’ to make it faster to search and easier to handle and store.
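The encyclopedia set can be sketched in a few lines of Python. The letter ranges, the `Encyclopedia` class, and the `volume_for` function are assumptions of mine for illustration; real systems choose shard keys and boundaries in many different ways.

```python
# Hypothetical volume boundaries: each shard covers an inclusive letter range,
# like the letters printed on the spine of each volume.
VOLUMES = [("A", "H"), ("I", "P"), ("Q", "Z")]


def volume_for(title):
    """Return the index of the volume (shard) that holds this title."""
    first = title[0].upper()
    for number, (low, high) in enumerate(VOLUMES):
        if low <= first <= high:
            return number
    raise ValueError(f"no volume covers titles starting with {first!r}")


class Encyclopedia:
    """One logical database whose articles are split across several volumes."""

    def __init__(self):
        self.shards = [{} for _ in VOLUMES]

    def put(self, title, text):
        self.shards[volume_for(title)][title] = text

    def get(self, title):
        # Only the one volume that can contain the title is opened.
        return self.shards[volume_for(title)].get(title)
```

The caller still talks to one `Encyclopedia` and never needs to know how many volumes there are; the split is a storage detail. That is the sense in which a sharded database is still one database.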
Now to clarify my answer to that original question. You wouldn’t put a phone number in an encyclopedia (the entry on phones itself not included), even though I’m sure Bill Clinton has one; you might list it in the phone book, though. So now you see when you would use a different database. Very clearly, you would use one when the type of data is different enough, when it’s for a different organization, or when it’s for some kind of shared service.
I hope these analogies to computer databases have helped explain the concepts.