Wednesday, May 27, 2020

The Engineering of a Ceph Cluster Solution: Change the Way You Store Data




We here at 45Drives are big fans of Ceph (as you may know). Ceph is the premier open-source clustering software. It works by linking multiple servers together into a single unit that can provide object, file, and block storage. Storage clustering enables important features that you may want access to but that are architecturally impossible with single-server solutions. Clusters are also becoming more accessible than ever. 45Drives takes a somewhat different approach to selling and supporting clusters than legacy vendors do. But before we talk about that, what makes clusters so appealing?



Scaling

With single-server solutions, an unexpected increase in data production, or simply not buying a big enough server, can leave you looking for a new server to solve your woes. No IT professional wants to tell their co-workers to change their workflow to accommodate not having enough storage. Adding a new individual server to the network each time you fill the last one also creates a complex web of non-integrated systems. Just as importantly, each new individual server becomes a new bottleneck, because the majority of new storage interactions will be with that server.

Ceph clusters scale horizontally, meaning you add more servers to your cluster alongside the others (you can also scale capacity by adding more drives if your servers have open bays). Ceph scales to the exabyte level, giving effectively unlimited scalability for almost every use case. It also means your storage remains a single solution, so you are never forced to take on the added complexity of managing multiple separate data-storage solutions.
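To make the two ways of growing capacity concrete, here is a minimal Python sketch of the arithmetic. The `Node` class, the bay counts, and the drive sizes are all hypothetical numbers chosen for illustration; they are not a 45Drives configuration or anything Ceph itself exposes, and the figures are raw capacity before any data protection overhead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """One storage server. All numbers used below are hypothetical."""
    drive_bays: int
    drives_installed: int
    drive_size_tb: float

    @property
    def raw_tb(self) -> float:
        # Raw capacity only: no replication or erasure coding overhead.
        return self.drives_installed * self.drive_size_tb


def cluster_raw_tb(nodes):
    """Total raw capacity of the cluster in TB."""
    return sum(node.raw_tb for node in nodes)


def fill_open_bays(node):
    """Return a copy of the node with every empty bay populated."""
    return Node(node.drive_bays, node.drive_bays, node.drive_size_tb)


# Three servers, each with 10 of 15 bays filled with 10 TB drives.
cluster = [Node(drive_bays=15, drives_installed=10, drive_size_tb=10.0)
           for _ in range(3)]
start_tb = cluster_raw_tb(cluster)

# Option 1: add drives to the open bays of the existing servers.
more_drives_tb = cluster_raw_tb([fill_open_bays(n) for n in cluster])

# Option 2: add a fourth server alongside the others.
more_servers_tb = cluster_raw_tb(cluster + [Node(15, 10, 10.0)])

print(start_tb, more_drives_tb, more_servers_tb)
```

Either option grows the same single pool of storage; adding a server also adds CPU, memory, and network capacity, which adding drives alone does not.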

Another great part of Ceph’s scaling is that performance increases with capacity as you add more servers, making a cluster difficult to outgrow as long as you can add more computing power and space as needed. A properly set up and maintained Ceph cluster is one of the soundest ways available today not only to store data for the long term, but also to protect it and keep it accessible in the future.

High-Availability

No one likes staying at work late to fix server issues, or getting multiple phone calls from management and colleagues about how they can’t work because servers are down.

The highly available nature of Ceph is what makes IT folks the happiest; believe me, we hear it from our own all the time. Ceph removes single points of failure, which makes it extremely resilient and difficult to take offline. Even while updates are being performed, your organization still has data access, cutting out those nights when your IT people need to come in at 1 AM to minimize the impact of downtime. Ceph is also self-balancing: when new nodes are added, data is redistributed so that it is stored efficiently across all servers (this can be throttled and scheduled for certain times to minimize the performance cost). It will likewise redistribute data in the event of drive failures to ensure no data is lost.
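For a feel of what "self-balancing" means when a node is added, the toy Python sketch below places objects on nodes with rendezvous (highest-random-weight) hashing. This is not Ceph's actual placement algorithm (Ceph uses CRUSH); it is a simpler stand-in that shares the relevant property, namely that adding a node moves only a share of the data, and only onto the new node. The node and object names are made up.

```python
import hashlib


def owner(obj_name, nodes):
    """Pick the node with the highest hash score for this object."""
    def score(node):
        digest = hashlib.sha256(f"{node}:{obj_name}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return max(nodes, key=score)


objects = [f"obj-{i}" for i in range(10_000)]
before_nodes = ["node1", "node2", "node3", "node4"]
after_nodes = before_nodes + ["node5"]  # one server added to the cluster

before = {o: owner(o, before_nodes) for o in objects}
after = {o: owner(o, after_nodes) for o in objects}

# Objects whose home changed because of the new node.
moved = [o for o in objects if before[o] != after[o]]
moved_fraction = len(moved) / len(objects)

print(f"{moved_fraction:.1%} of objects moved, all of them to node5")
```

With four nodes growing to five, roughly one fifth of the objects relocate and the rest stay put, which is why a cluster can keep serving data while it rebalances.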


Simplified Large Scale Management

I often wonder whether, when people think "open-source", an image pops into their heads of someone crouching over a computer, typing into a command line at supersonic speed well into the late hours of the night.

A lot of people are under the impression that open-source comes with a steep learning curve. It could be a historical thing, but some people have a habit of equating open-source with complexity. When it comes to using Ceph for multi-server storage deployments, the opposite is true.

Ceph has an amazing enterprise dashboard that provides a single centralized place for viewing and managing all your storage. The dashboard gives you a global view of your cluster, including system hardware details, real-time alerting, and detailed metrics. It lets you create NFS exports directly, replace hard drives via the OSD screen, edit your configuration, and more, all without ever accessing the command line.

The command line is still there for those who prefer it, though. The dashboard is developed as part of Ceph itself, so as new features are added to Ceph they also appear on the dashboard without any wait.

(You can check out our Tech Tip Walkthrough video of Ceph’s dashboard if you want to see the ins and outs of exactly what it would be like to use.)

Objections Against Open-Source

As open-source enthusiasts, we have heard from many customers over the years who have preconceptions about open-source software. Their concerns make perfect sense, and many of the decisions we’ve made as a company have been aimed at letting our customers leverage open-source despite the traditional issues of open-source deployments.

First, how do the other guys deploy storage hardware? Well, with legacy enterprise-storage vendors, the company develops a hardware design and proprietary software that is loaded onto it. They sell their servers bundled with very restrictive support contracts that are voided if you start poking your nose into where they think it shouldn’t be. The quality of the products is quite high, but so is the price tag to go along with it (they also charge software licensing fees).

Traditional open-source storage deployments have come with more risk and a steeper learning curve. Instead of getting a fully supported complete solution, users buy or build their own server and are responsible for any issues they have with it. Support for the software component is often available, as companies like Red Hat have high-quality support offerings for open-source software. The real issue with that sort of software-only support is the lack of a single point of accountability in case of mishaps. When something goes wrong, you can get bounced back and forth between people telling you it is a hardware issue and people telling you it is a software issue, leading to costly downtime that lasts longer than necessary.

45Drives aims to address these issues by offering open-source data storage solutions with support for the software and hardware. We provide customers a single point of responsibility so they know who to blame if anything breaks (though we do try to make sure nothing does). We are fully committed to having your storage up and running. We have the most flexible support in the industry. We will only be involved as much as you want us to be, but we won’t leave you on your own — even if you’re doing something cool and unique with your storage.

Clusters are Not Just for Huge Datasets

Something I don’t think a lot of our customers are aware of is just how accessible clusters are, even for organizations that do not have a large amount of data yet. The scalability, availability, and future-proofed nature of clusters come into play whether you have a large multi-petabyte deployment or an entry-level cluster of less than 150TB.

Cluster pricing competes with what legacy vendors charge for single servers, while offering features that are architecturally impossible for those legacy vendor systems. For US$20,000, you can get a cluster that will continually grow with your business and give you access to possibly the most well-rounded storage software ever designed.

Ceph is capable of all this cool stuff because of its “software-defined” nature. Effectively everything in Ceph is abstracted, which means nothing is locked in. That's why it can flexibly combine SSDs and HDDs; block, object, and file storage; and erasure coding and replication, all within the same cluster and tuned to the parameters you require. For example, you could replace every component in a node (server) except the drives, and get it back up and running. When you combine this with 45Drives’ open platform design, you get real peace of mind about the safety of your data.
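Replication and erasure coding trade usable space for protection in different ways, and a little arithmetic shows the difference. The Python sketch below is a simplified estimate under stated assumptions: 3 copies for replication, a 4+2 layout (4 data chunks, 2 coding chunks) for erasure coding, and 150TB of raw space to match the entry-level size mentioned earlier. The function names are hypothetical, and the math ignores the free-space headroom and metadata overhead a real cluster needs.

```python
def usable_replicated(raw_tb, copies):
    """Each object is stored `copies` times, so usable space is raw / copies."""
    return raw_tb / copies


def usable_erasure_coded(raw_tb, data_chunks, coding_chunks):
    """Each object is split into data chunks plus coding chunks;
    only the data-chunk share of raw space holds user data."""
    return raw_tb * data_chunks / (data_chunks + coding_chunks)


raw_tb = 150.0  # hypothetical raw capacity of a small cluster

replicated_tb = usable_replicated(raw_tb, copies=3)
erasure_coded_tb = usable_erasure_coded(raw_tb, data_chunks=4, coding_chunks=2)

print(f"3x replication:      {replicated_tb:.0f} TB usable")
print(f"4+2 erasure coding:  {erasure_coded_tb:.0f} TB usable")
```

Under these assumptions, both layouts can survive the loss of two drives holding pieces of the same object, but erasure coding leaves twice as much usable space; replication is generally the simpler and faster choice for small, latency-sensitive workloads.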

Our goal with our customers is to make sure storage is never the issue.
