Studies suggest that organizations that keep multiple copies of data buy, administer and use two to fifty times the amount of storage space they’d need with data deduplication. It’s no wonder that data redundancy is a major contributor to explosive storage growth.
At the outset, data deduplication reduced redundancy only in specific circumstances, such as full backups, VMware images and email attachments. Even so, duplicate data would still persist elsewhere, mainly because test and development data proliferates across an organization over time. Backup, archiving, and replication create numerous copies that can be found throughout an organization. Add to that the fact that users often copy data to other locations for their own convenience.
Organizations are now recognizing these facts and are treating data deduplication as a mandatory, integrated element of their overall IT strategy.
Essentially, there are two methods of reducing the cost of storage. First, you can use a lower-cost storage platform, but that introduces numerous additional problems that I won’t go into here. Second, you can adopt a sound deduplication strategy designed to reduce both the storage you require and the rate at which your data grows.
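To make the second option concrete, here is a minimal sketch of the idea behind deduplication: split data into chunks, identify each chunk by a hash, and store each unique chunk only once. The `DedupStore` class, its method names, the fixed 4 KiB chunk size and the use of SHA-256 are all assumptions chosen for illustration; real products differ in how they chunk, index and persist data.

```python
import hashlib


class DedupStore:
    """Toy in-memory store (hypothetical) that keeps each unique chunk once."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}  # SHA-256 digest -> chunk bytes, stored once
        self.files = {}   # file name -> ordered list of chunk digests

    def write(self, name, data):
        """Split data into fixed-size chunks and store only unseen chunks."""
        digests = []
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            digest = hashlib.sha256(chunk).digest()
            self.chunks.setdefault(digest, chunk)  # no-op if already stored
            digests.append(digest)
        self.files[name] = digests

    def read(self, name):
        """Reassemble a file from its list of chunk digests."""
        return b"".join(self.chunks[d] for d in self.files[name])

    def stored_bytes(self):
        """Bytes actually held, counting each unique chunk once."""
        return sum(len(c) for c in self.chunks.values())


# Four distinct 4 KiB chunks standing in for one backup image (made-up data).
image = b"".join(bytes([i]) * 4096 for i in range(4))

store = DedupStore()
store.write("backup_monday", image)
store.write("backup_tuesday", image)  # identical copy adds no new chunks

print("bytes written:", 2 * len(image))
print("bytes stored: ", store.stored_bytes())
```

In this toy run, 32,768 bytes are written but only 16,384 are stored, because the second copy contributes nothing but a list of references to chunks that already exist.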
Data deduplication can reduce your data storage costs by lowering the amount of disk space required to store data, whether that is backup data or primary production data. This article highlights five best practices to help you select and implement the best data deduplication solution for your environment.
- Consider the broad implications of deduplication. You’ll want to consider how a deduplication strategy fits within your entire data management and storage strategy, accounting for tradeoffs in computational time, accuracy, index size, the level of duplication detected, and the scalability of the solution.
- Learn what data does not dedupe well. Human-created data dedupes differently than data created by computers, so you’ll want to identify which types of data to exclude from your deduplication efforts.
- Don’t obsess over space reduction ratios. The length of time that data is retained affects your space reduction ratios. If you do want a better ratio, consider lengthening your backup retention period rather than increasing the number of full backups.
- Don’t use multiplexing if you’re backing up to a VTL. Multiplexing in a virtual tape library (VTL) wastes computing cycles.
- Pilot multiple systems before you select one. This helps ensure that the deduplication solution you choose integrates well with your IT environment and works with the data you currently have in-house.
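When piloting, it can help to have a rough, vendor-neutral feel for how well your own data might dedupe. The sketch below estimates a space reduction ratio using fixed-size chunks and SHA-256 hashes. The function name `dedup_ratio`, the 4 KiB chunk size and the sample data are assumptions for illustration only; commercial systems typically use their own chunking and indexing, so treat the numbers as indicative and not as a prediction of what any product will achieve.

```python
import hashlib
import os


def dedup_ratio(data, chunk_size=4096):
    """Return logical bytes divided by unique chunk bytes (fixed-size chunking)."""
    unique = {}  # chunk digest -> chunk length, one entry per unique chunk
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        unique.setdefault(hashlib.sha256(chunk).digest(), len(chunk))
    stored = sum(unique.values())
    return len(data) / stored if stored else 1.0


# Made-up sample data: random bytes stand in for real backup content.
full_backup = os.urandom(64 * 4096)      # one full backup
four_fulls = full_backup * 4             # four identical fulls retained
low_repeat = os.urandom(4 * 64 * 4096)   # same size, with no repeated chunks

print(f"four identical fulls: {dedup_ratio(four_fulls):.1f}:1")
print(f"non-repeating data:   {dedup_ratio(low_repeat):.1f}:1")
```

The two samples are the same size, yet the ratio depends entirely on how much repetition they contain. That is why a ratio quoted for one data set or retention policy says little about another, and why running candidates against your own data matters.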



