Businesses that maintain large amounts of information are in a continuous search for new and more efficient methods of data management. This is exactly where data virtualization software comes in handy. So what is this innovative technology that a lot of people are talking about and how does it help us manage data? Let’s find out.
With constantly increasing volumes of information, data delivery has become a challenge that data virtualization solutions can address. Surveys by data virtualization vendor Denodo show that only 11% of companies used data virtualization in 2011, but that rate increased to 25% by 2015. So what is the reason for this growing use of data virtualization? In this article, we’ll cover the main aspects of data virtualization technology and the causes of its growth.
What is data virtualization? It’s a process of data management including querying, retrieving, modifying, and manipulating information in other ways without needing to know technical details such as source or format. Data virtualization uses virtualization technology to abstract data from its existing storage (a data silo) and presentation and provide a holistic view of that data regardless of the source.
The key features of data virtualization are:
- Data abstraction
- Data federation (combining multiple datasets into one)
- Semantic integration (integrating data structures without losing meaning)
- Data services
- Data unification and security.
Data virtualization provides a view of requested data in a local database or web service, and its aim is to process large amounts of data. Data virtualization software usually supports nearly any type of data, including XML, flat files, SQL, web services, MDX, and unstructured data in NoSQL and Hadoop databases.
How does data virtualization work? When a user submits a query, data virtualization software determines the optimal way to retrieve the requested data, taking into account its location. Then the data virtualization software takes the requested data, performs transformations, and returns it to the user. It’s worth mentioning that these tools don’t overload users with information such as the absolute path to the requested data or actions applied to retrieve it.
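The flow described above can be sketched in a few lines. This is a deliberately minimal illustration, not a real product API: two hypothetical sources with different schemas sit behind one query function, and the caller never learns which source a record came from or what transformations were applied.

```python
# Minimal sketch of a virtual data layer. Both "sources" and all field
# names are made up for illustration: one mimics a SQL table, the other
# a document store with a different schema.

SQL_SOURCE = [
    {"id": 1, "name": "Alice", "region": "EU"},
    {"id": 2, "name": "Bob", "region": "US"},
]

DOC_SOURCE = [
    {"customer_id": 3, "full_name": "Carol", "region": "EU"},
]

def _normalize_doc(record):
    """Transform the document store's schema into the unified view."""
    return {
        "id": record["customer_id"],
        "name": record["full_name"],
        "region": record["region"],
    }

def query_customers(region):
    """Return a unified view of customers from all sources.

    The caller sees one schema and never learns where each record
    was stored or how it was transformed.
    """
    results = [r for r in SQL_SOURCE if r["region"] == region]
    results += [_normalize_doc(r) for r in DOC_SOURCE if r["region"] == region]
    return results
```

A real data virtualization engine would also choose the optimal retrieval plan per source; here that step is trivial because both sources are in-memory lists.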
Data virtualization is an effective solution, especially for organizations that require a tool to rapidly manipulate data and have a limited budget for third-party consultancy and infrastructure development. Thanks to data virtualization, companies can have simplified and standardized access to data that’s retrieved from its original source in real time.
Furthermore, the original data sources remain secure since they’re accessed solely through integrated data views. Data virtualization can be used to manage corporate resources in order to increase operational efficiency and response times.
Benefits of data virtualization for companies include:
- Faster access to data for decision-makers.
- Increased operational efficiency due to fast formation of data stores.
- Lower spending on data search and structuring solutions.
- Advanced analytics due to powerful data compilation.
- Reduced security risks with additional levels of access and permission management that separate original data silos from the user context.
The data virtualization market is constantly growing. Companies that use data virtualization technologies see benefits in cost savings on the data integration processes that connect shared data assets. Gartner predicts that 35% of enterprises worldwide will use this technology for their data integration processes by 2020. Let’s discuss the reasons for this increasing adoption.
Traditional data centers require focused data management, a stable network, and many system resources. All these components form a heavy system load and increase corporate expenses. Data virtualization allows companies to implement a simpler architecture in comparison with standard data warehouses. This approach leads to less data replication and, as a result, a smaller infrastructure workload.
Data virtualization is a more effective alternative to traditional data federation approaches that rely on extraction, transformation, and loading (ETL) tools. With a traditional approach, creating physical data warehouses is quite time-consuming and can take up to several months. Data virtualization tools, in contrast, use metadata extracted from original data sources and allow changes to be made to data quickly, ensuring fast data aggregation and structuring.
Data virtualization unifies data by abstracting it from its location or structure. No matter where data is stored (in the cloud or on-site) and no matter if it’s structured or unstructured, you can retrieve it in one unified form. This increases the possibilities for further data processing and analysis.
Data virtualization allows both applications and users to find, read, and query data using metadata. Metadata-based querying significantly speeds up data search through virtual data services and allows you to retrieve requested information much faster than with a traditional semantic matching approach.
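The idea of metadata-based querying can be illustrated with a small catalog. All dataset names, sources, and paths below are hypothetical: the point is that routing a query becomes a fast catalog lookup instead of a scan across every source.

```python
# Illustrative metadata catalog: logical dataset names map to physical
# location and format, so the engine can route a query without probing
# each source. All entries are made up for this sketch.

CATALOG = {
    "sales": {"source": "warehouse", "format": "sql", "path": "db.sales"},
    "clickstream": {"source": "hadoop", "format": "parquet", "path": "/logs/clicks"},
}

def resolve(dataset_name):
    """Return routing metadata for a logical dataset name."""
    try:
        return CATALOG[dataset_name]
    except KeyError:
        raise LookupError(f"No metadata registered for {dataset_name!r}")

# A consumer asks for "sales" and gets routing details, not raw paths
# it has to know in advance.
meta = resolve("sales")
```

In a production system the catalog itself would be harvested automatically from source schemas; the lookup pattern stays the same.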
Data unification leads to another significant advantage: efficient data sharing. As data volumes grow, it becomes difficult to process data in different formats and from different sources. Data virtualization allows applications to access any dataset regardless of its format or location.
In the first quarter of 2015, Forrester listed the nine biggest data virtualization vendors worldwide. Furthermore, the research agency evaluated them according to 60 different criteria including strategy, current offerings, and market presence.
Forrester’s list of the top enterprise data virtualization vendors for Q1 2015 includes:
- Cisco Systems
- SAS Institute
- Denodo Technologies
- Red Hat
Forrester stated in 2015 that data virtualization vendors had significantly increased their cloud capabilities, scalability, and cybersecurity since the agency’s previous evaluation in 2012.
In its 2017 Market Guide for Data Virtualization, Gartner listed 22 data virtualization vendors. These vendors’ solutions offer diverse capabilities, although all of them support data virtualization technology. Vendors mentioned in the guide include:
- Data Virtuality (the University of Leipzig, Germany)
- Information Builders
- OpenLink Software
- Red Hat
- Rocket Software
- Stone Bond Technologies
Let’s look at some representative data virtualization solutions and their general characteristics.
The data virtualization market is occupied by large software vendors such as Informatica, IBM, and Microsoft, as well as specialized vendors such as Denodo. The tools provided by large vendors cover nearly all possible tasks related to data virtualization. The software offered by smaller companies is mostly focused on advanced automation and improved integration of data sources.
JBoss Data Virtualization is a tool created by Red Hat. This solution is aimed at providing real-time access to data extracted from different sources, creating reusable data models and making them available for customers upon request. Red Hat’s solution is cluster-aware and provides numerous data security features such as SSL encryption and role-based access control.
The Denodo data virtualization platform offers improved dynamic query optimization and provides services that handle data in various formats. It supports advanced caching and enhanced data processing techniques. The platform also ensures a high level of security by providing features such as pass-through authentication and granular data masking.
Although Gartner didn’t include Delphix in its market guide, we’ve decided to briefly cover this solution and note how it differs from the top vendors. In 2015, the Delphix startup raised $75 million in its last funding round to further improve its tool. The Delphix data virtualization solution captures data from corporate applications, masks sensitive information to ensure cybersecurity compliance, manages user access, and generates data copies for users. Its specialty is creating 30-day backups that don’t exceed the size of the files on disk.
Data virtualization allows users to get a virtual view of data and access it in numerous formats with business intelligence (BI) tools or other applications. This is just a tiny part of what data virtualization solutions should be able to do, however. In this section, we’ll discuss what aspects technology vendors should consider before building data virtualization solutions.
Abstracting data from sources and publishing it to multiple data consumers in real-time allows businesses to collaborate and function iteratively, thereby considerably reducing turnaround time for data requests. However, a good data virtualization solution has to provide users with more capabilities than this. Let’s consider the most important ones.
Any data virtualization software contains a connectivity layer. This layer allows the solution to extract data across resources. The more data types, database management systems, and file systems your solution supports, the more useful it will be.
Components that ensure access to various data sources include:
- Adapters and connectors
- Cloud data stores
- Database infrastructures (such as Hadoop and NoSQL)
- Data warehouses and data marts
- Applications (BI tools, CRM, SaaS, and ERP).
You should implement various adapters in your software. For this purpose, you can create your own or license existing components.
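A common way to keep the connectivity layer extensible is to put every adapter behind one interface and register new ones without touching the query engine. The sketch below assumes this pattern; the class names and the toy flat-file source are invented for illustration.

```python
# Sketch of a pluggable connectivity layer: each source type implements
# a common adapter interface and is registered by name. Class names and
# data are hypothetical.

from abc import ABC, abstractmethod

class SourceAdapter(ABC):
    @abstractmethod
    def fetch(self, query: str) -> list:
        """Run a query against the underlying source and return rows."""

class FlatFileAdapter(SourceAdapter):
    """Toy adapter: its 'source' is a list of pre-parsed CSV lines."""

    def __init__(self, rows):
        self._rows = rows

    def fetch(self, query):
        # Naive substring match stands in for a real query engine.
        return [r for r in self._rows if query in r]

ADAPTERS = {}

def register_adapter(name, adapter):
    """Add a new source type without changing the query engine."""
    ADAPTERS[name] = adapter

register_adapter("flat_file", FlatFileAdapter(["alpha,1", "beta,2"]))
```

Licensing an existing connector then amounts to wrapping it in the same interface and registering it under a new name.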
The most effective tools use a single interface and look at metadata to provide users with data they request. Your solution should contain analytics systems in order to save your customers time on structuring and analyzing large amounts of information.
Safe data provisioning is a significant part of ensuring cybersecurity. Data provisioning is the process of making data available to users and applications. Data security includes user authentication and enforcing group and user privileges. Your solution should provide role-based and schema-level security so you can wisely manage access to data for geographically distributed users and data sources. Reliable data provisioning will protect your data from uncontrolled access and eliminate risks related to intellectual property or confidential data.
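Role-based, schema-level checks like those described above can be reduced to a small sketch. The roles, schema names, and placeholder result below are all invented for illustration; a real solution would integrate with the organization’s identity provider.

```python
# Minimal sketch of role-based, schema-level access control in front of
# virtual views. Roles and schema names are hypothetical.

ROLE_GRANTS = {
    "analyst": {"sales_schema"},
    "admin": {"sales_schema", "hr_schema"},
}

def can_access(role, schema):
    """Check whether a role is granted read access to a schema."""
    return schema in ROLE_GRANTS.get(role, set())

def query_view(role, schema, view):
    """Gate every data request on the role's schema-level grants."""
    if not can_access(role, schema):
        raise PermissionError(f"role {role!r} may not read {schema}.{view}")
    return f"rows from {schema}.{view}"  # placeholder for real data access
```

Because every request passes through one gate, original data silos stay separated from the user context, as described above.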
Although data virtualization offers numerous benefits, it comes with challenges too. According to a survey by Denodo, 46% of organizations that have implemented data virtualization solutions see their biggest challenge as adapting the software for departments besides IT. Of companies surveyed, 43% face particular issues with managing software performance. So what challenges can technology vendors face when they decide to build their own data virtualization solution?
Business owners can have varying, dynamic data needs, and you should take this into account. Fortunately, data virtualization is flexible enough to deliver data in multiple modes depending on how it has to be represented. For example, pricing analysts may need real-time sales and turnover information on certain holidays, when a one-day delay is not acceptable. Highly optimized semantic tier processes will make your software more effective, and query caching, distributed processing, and memory and processing grids will help ensure faster data delivery.
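Of the performance techniques mentioned above, query caching is the simplest to sketch. The following toy cache with a time-to-live skips a simulated slow source on repeated requests; the TTL value and the fetch function are illustrative placeholders.

```python
# Sketch of query-result caching with a time-to-live (TTL). The source
# fetch is simulated; TTL and all names are illustrative.

import time

_CACHE = {}        # query -> (timestamp, result)
TTL_SECONDS = 60   # hypothetical freshness window

CALLS = {"count": 0}  # counts how often the slow source is actually hit

def _fetch_from_source(query):
    """Stands in for an expensive federated query against real sources."""
    CALLS["count"] += 1
    return f"result for {query}"

def cached_query(query):
    """Serve from cache while fresh; otherwise fetch and remember."""
    now = time.time()
    hit = _CACHE.get(query)
    if hit and now - hit[0] < TTL_SECONDS:
        return hit[1]
    result = _fetch_from_source(query)
    _CACHE[query] = (now, result)
    return result
```

For truly real-time needs (like the holiday pricing example), the TTL would be set to zero or the query routed past the cache entirely.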
Whether data is internal or external to your organization, stored in the cloud, in a big data source, or on a social media platform, your data virtualization solution should be able to access it, structure it, and make it conform to existing patterns so it’s easy to use. When a company uses shared data resources, it’s quite a challenging task to create a solution that can effectively manage them. That’s why you should implement data governance capabilities to ensure efficient data analysis and error tracking, especially when data is being pulled from a variety of sources.
Data virtualization typically plays an instrumental role as an abstraction layer between old and new systems during migration of legacy systems. Therefore, your solution should contain tools for migrating from legacy systems. Users should be able to employ data virtualization for prototyping and integrate both kinds of systems when working with a parallel-run architecture.
Developing data virtualization software is time-consuming and requires deep expertise. Professionalism, qualifications, and long-term experience in general software development are necessary for creating enterprise-level solutions.
Furthermore, a deep knowledge and understanding of the needs of technology enterprises will allow you to build a useful tool to help organizations process data.
Data virtualization and cloud computing are among our specialties at Apriorit. We’ve helped various technology vendors develop advanced data processing solutions. Send us your request for proposal and we’ll get back to you and discuss what we can offer for your project.