Storage in an unstructured world – the role of object storage

Most of us already use object storage: if you watch Netflix, work on Google Docs, share files via Dropbox, view pictures on Instagram, or check Twitter, you’re an object storage user.

Beyond these familiar names, it’s used across the enterprise space for applications that utilise massive amounts of unstructured data, including media storage, enterprise data backup and archive, data analytics, and file distribution and sharing. But where does it fit in the storage industry landscape, and how does it differ from other mainstream technologies such as block and file storage?

Block and file basics

Block storage is the oldest and simplest form of data storage. Here, data is stored in fixed-size chunks called – you guessed it – ‘blocks.’ By itself, a block typically houses only a portion of the data.

The application makes SCSI calls to find the addresses of the correct blocks, then organises them to form the complete file. Because the data is stored piecemeal, the address is the only identifying attribute of a block.

There is no metadata associated with blocks. This structure delivers fast performance when the application and storage are local, but latency grows the further apart they are. The granular control offered by block storage makes it an ideal fit for applications that require high performance, such as transactional or database applications.
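To make that concrete, here is a minimal sketch of block-style access, where a plain file and a 4 KiB block size stand in for a real device and its SCSI commands (both are assumptions for illustration):

```python
BLOCK_SIZE = 4096  # fixed-size chunks; an address is a block's only identifier

def read_blocks(device_path, block_addresses):
    """Fetch blocks by address and reassemble them, in order, into a file."""
    chunks = []
    with open(device_path, "rb") as device:
        for address in block_addresses:
            device.seek(address * BLOCK_SIZE)  # jump straight to the block
            chunks.append(device.read(BLOCK_SIZE))
    return b"".join(chunks)  # the application pieces the file back together
```

Note that the blocks themselves carry no names or descriptions: everything hinges on the application knowing which addresses to ask for.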

File storage has been around for considerably longer than object storage, and is something most people are familiar with. You name your files, place them in folders, and nest those folders under others to form a set path.

In this way, files are organised into a hierarchy, with directories and sub-directories. Each file also has a limited set of metadata associated with it, such as the file name, the date it was created, and the date it was last modified.
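As a rough illustration, the sketch below (using a hypothetical path) shows that model at work: the hierarchical path is how the file is found, and the filesystem exposes only a fixed, limited metadata set:

```python
import os
from datetime import datetime, timezone

# Hypothetical path for illustration: directories and sub-directories
# form the hierarchy, and the full path is the file's identity.
path = os.path.join("projects", "reports", "q3-summary.txt")

info = os.stat(path)  # the filesystem's limited, fixed metadata set
print("name:         ", os.path.basename(path))
print("last modified:", datetime.fromtimestamp(info.st_mtime, tz=timezone.utc))
print("size (bytes): ", info.st_size)
```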

This works very well up to a point, but as capacity grows the file model becomes burdensome for two reasons. First, performance suffers beyond a certain capacity. The NAS system itself has limited processing power, making the processor a bottleneck. Performance also suffers under the massive databases – the file lookup tables – that accompany capacity growth.

Second, filer management becomes a challenge as capacity grows. Because filers do indeed get full, NAS proliferation is inevitable. When your first NAS runs out of capacity, you add a second and a third device, leading to the dreaded “storage silos.”

Management workload at least doubles or triples compared with where you started. Anyone who’s experienced this appreciates the old saying, “I loved my first NAS but hated my tenth.”

The object storage story

Object storage may be less familiar to some, but it’s similar to file storage in that an “object” can be a file. Unlike file storage, though, a file can also be broken up into multiple objects, which makes large file transfers faster and much more reliable.

An “object” includes the user data and a unique identifier, which works rather like a “valet parking ticket” for locating the data. Each object also carries a metadata tag describing its contents; this is user-defined and can include whatever descriptors are required.
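A toy sketch of that model might look like the following, with an in-memory dictionary standing in for a real object store and a UUID standing in for the system-generated identifier (both are simplifying assumptions):

```python
import uuid

store = {}  # stand-in for the flat address space of a real object store

def put_object(data: bytes, metadata: dict) -> str:
    object_id = str(uuid.uuid4())  # the unique identifier: the "valet ticket"
    store[object_id] = {"data": data, "metadata": metadata}
    return object_id  # keep the ticket; it is how the data is found again

def get_object(object_id: str) -> dict:
    return store[object_id]

# User-defined metadata can carry whatever descriptors are required.
ticket = put_object(b"<video bytes>", {"camera": "drone-7", "project": "survey"})
print(get_object(ticket)["metadata"])
```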

Objects are stored in a flat address space that eliminates the hierarchy (and the capacity limitations) of a traditional file system. That address space is distributed across multiple physical devices, or “nodes.”

Data protection is built in, using either parity-protected striping across multiple nodes (known as “erasure coding”) or old-fashioned replication. Either scheme can be configured to survive whatever level of outage your data demands, up to and including the loss of an entire data centre.
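As a simplified illustration of the striping idea, the sketch below uses a single XOR parity stripe, which can rebuild any one lost stripe; production erasure coding uses Reed–Solomon-style codes with a configurable number of data and parity fragments:

```python
def encode(stripes):
    """XOR equal-length stripes together to produce a parity stripe."""
    parity = bytearray(len(stripes[0]))
    for stripe in stripes:
        for i, byte in enumerate(stripe):
            parity[i] ^= byte
    return bytes(parity)

def recover(surviving, parity):
    """Rebuild the one missing stripe from the survivors plus parity."""
    return encode(list(surviving) + [parity])

data = [b"node", b"one!", b"two!"]  # stripes spread across three nodes
parity = encode(data)               # stored on a fourth node
assert recover(data[1:], parity) == data[0]  # node holding stripe 0 fails
```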

Performance grows with added nodes, thus eliminating the NAS bottleneck. In this “shared-nothing cluster,” the workload is divided across nodes.
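One common way to divide that workload, sketched below, is to hash each object’s identifier to a node, so added nodes automatically take a share of the objects (real systems typically use consistent hashing, simplified away here, to limit data movement when the cluster grows):

```python
import hashlib

def node_for(object_id: str, node_count: int) -> int:
    """Map an object identifier deterministically to one of the nodes."""
    digest = hashlib.sha256(object_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % node_count

object_ids = [f"object-{i}" for i in range(12)]
for nodes in (3, 4):  # grow the cluster and watch the load spread
    load = [0] * nodes
    for oid in object_ids:
        load[node_for(oid, nodes)] += 1
    print(f"{nodes} nodes -> objects per node: {load}")
```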

Object storage overcomes many of the limitations of file storage. A warehouse is a useful analogy. When you first put a box of files in your NAS warehouse, it seems like you have plenty of space.

But as your data grows, you’ll fill the NAS warehouse to capacity before you know it. Furthermore, you have a limited workforce to move those boxes around: the crew is the same size on day one as when the warehouse is full, so work inevitably slows over time.

Object storage, on the other hand, is like a warehouse with no roof. You can keep adding data infinitely – the sky’s the limit. And as data is added, new workers show up, too, to make sure the goods are always moving quickly.

For small data sets, NAS systems shine with performance. But as capacity requirements grow, object storage delivers limitless scalability, saves on management workload, guarantees data durability, and ensures continued performance.

With data growing over 50% annually, and with over 80% of that being unstructured, the momentum behind object storage is unstoppable. It is the only technology that accommodates your ever-increasing capacity needs while keeping a lid on the ultimate driver of technology choice: cost.

Sourced by Jon Toor, CMO, Cloudian
