On the 13th of April, Quentin Hocquet and I (formerly CTO and CEO at Infinit, respectively) flew to Austin, Texas to participate in Docker's yearly conference: DockerCon'17.
First and foremost, we wanted to share the slides and video of the presentation for those who couldn't attend (and who will hopefully enjoy them). In this talk, I explain why we designed Infinit's clustering technology the way we did, how it differs from other approaches and what benefits it brings to the table:
During the four or so days at DockerCon, I met with numerous people: partners, customers and users. These exchanges helped me gather feedback and better understand everyone's expectations when it comes to storage. What came out of those discussions is that most people want storage to be taken care of for them, it being the number one challenge in containerizing their applications.
However, it also appeared that most people don't really know what type of data they intend to store, and hence what storage solution they should use. This is especially true for enterprises that have thousands of legacy applications.
When it comes to cloud-native developers, the number one application in need of a resilient and scalable storage backend clearly seems to be the database. Over the years at Infinit, we've always taken extra care to explain what makes databases specific when it comes to storage and, as such, what should be used to back them.
What makes a database specific compared to other distributed applications is that its stored data is only accessed by a single node: a database frontend listens on a port, handles incoming requests and synchronizes its in-memory state on disk. Should the database be scaled out by spawning another frontend on another node, this new frontend would also synchronize its in-memory state on disk, but that disk would be completely isolated from the first node's.
Many other distributed applications need their state to be shared between multiple nodes. This is not the case for a database, whose state is node-specific.
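To make this pattern concrete, here is a minimal sketch in Python (with hypothetical names; not Infinit's or any real database's implementation) of a frontend that keeps its state in memory and synchronizes every update to a node-local append-only log. A second frontend, started with its own log, sees none of the first one's state:

```python
import json
import os
import tempfile

class Frontend:
    """A toy database frontend: in-memory state synced to a node-local log."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.state = {}
        # Recover the in-memory state from this node's own disk, if any.
        if os.path.exists(log_path):
            with open(log_path) as log:
                for line in log:
                    key, value = json.loads(line)
                    self.state[key] = value

    def put(self, key, value):
        self.state[key] = value
        # Synchronize the update to disk before acknowledging it.
        with open(self.log_path, "a") as log:
            log.write(json.dumps([key, value]) + "\n")
            log.flush()
            os.fsync(log.fileno())

    def get(self, key):
        return self.state.get(key)

workdir = tempfile.mkdtemp()

# Two frontends, two isolated disks: node B never sees node A's writes.
node_a = Frontend(os.path.join(workdir, "node_a.log"))
node_a.put("user:1", "alice")

node_b = Frontend(os.path.join(workdir, "node_b.log"))
print(node_b.get("user:1"))  # None: state is node-specific

# Restarting node A recovers its state from its own log.
restarted = Frontend(os.path.join(workdir, "node_a.log"))
print(restarted.get("user:1"))  # alice
```

Scaling out by adding a second frontend therefore adds capacity, but not redundancy: each node only ever persists and recovers its own state.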
This brings me to how such state should be managed. In theory, one should always rely on the database software itself to scale and handle redundancy. If your database, such as MongoDB, provides a replication feature, we advise relying on it rather than on the storage backend, because the database application has more context with which to replicate data and optimize placement, load balancing and so on.
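With MongoDB, for instance, this built-in replication takes the form of a replica set. A sketch of the relevant `mongod.conf` fragment follows (the replica set name `rs0` is an arbitrary choice):

```yaml
# mongod.conf -- enable MongoDB's built-in replication
replication:
  replSetName: "rs0"
```

Once each member is started with this configuration, the set is bootstrapped from a mongo shell with `rs.initiate()`, after which MongoDB itself takes care of replicating data between the members.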
We understand, however, that configuring and tuning a database is not everyone's hobby. Assuming throughput is not a problem and all you want is for your database's state to be persisted, you could instead use a resilient storage backend. Be aware, though, that the type of storage you decide to use will directly impact your database's performance.
Let us now come back to my previous, bold statement that "most people don't really know what type of data they intend to store". The Infinit storage platform has been designed to provide multiple policy-customizable storage logics on top of a scalable and resilient storage clustering mechanism. Using the Infinit storage platform, you can decide to use one or more storage logics, most likely an object store, a block device or a (distributed) file system.
In the context of a database, most people I've talked to were drawn to the idea of deploying a distributed file system to back their database, because of its compliance with the POSIX standard. A distributed file system is able to support concurrent accesses from multiple nodes and to handle the resulting conflicts. However, since such multi-node concurrent accesses never happen with a database (whose state is specific to a single node), using a distributed file system would be overkill: you would end up paying the price for high-end functionality that a database does not need.
A block device, on the other hand, is in theory (depending on the file system it is formatted with) only accessed from a single node. It therefore perfectly matches the requirements of a database application. In addition, and precisely because of this limitation, a block device will be more efficient than a distributed file system, doing away with locking, conflict resolution, cache invalidation and so on.
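To give a feel for the coordination overhead being avoided, here is a small illustrative sketch (Python with POSIX `fcntl.flock`, so Unix-only; the scenario is hypothetical). Whenever several writers may touch the same file, each write must pass through a locking step, and in a distributed file system that step involves network round-trips between nodes. On a block device mounted by a single node, no other writer exists, so this synchronization disappears entirely:

```python
import fcntl
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "shared.dat")

def locked_append(path, record):
    """Append a record under an exclusive lock, the kind of coordination
    a shared file system must perform when several nodes may write."""
    with open(path, "a") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # coordination point: blocks other writers
        try:
            f.write(record + "\n")
            f.flush()
            os.fsync(f.fileno())
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)

locked_append(path, "row-1")
locked_append(path, "row-2")
print(open(path).read())  # records appended without interleaving
```

With a single-node block device, the database can simply append; with a distributed file system, every such lock acquisition is a distributed operation, which is exactly the price a database never needed to pay.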
In summary, databases are likely to be the first and most deployed applications requiring persistent storage. But because people have different needs, I believe operators will want fine-grained control over their deployments and should therefore achieve persistence through the database's own redundancy capability. Developers, on the other hand, don't want to waste time configuring and tuning a database to replicate itself; they could instead rely on a block device storage logic, accessible through an interface such as NBD or iSCSI, to achieve persistent storage without much effort.
Get started with Infinit in less than 10 minutes! Test it!