Businesses that maintain large amounts of information are in a continuous search for new and more efficient methods of data management. This is exactly where data virtualization software comes in handy. So what is this innovative technology that a lot of people are talking about and how does it help us manage data? Let’s find out.

Contents:

  • Definition of Data Virtualization
  • How Data Virtualization Works
  • Advantages of Data Virtualization
  • Data Virtualization: Top Vendors
      • Existing Data Virtualization Tools
  • How to Create a Data Virtualization Tool
      • Necessary Features for Data Virtualization Solutions
  • Challenges in Data Virtualization Software Development
      • Ensuring Adequate Speed and Responsiveness of the System
      • Efficient Management of Shared Resources
      • Providing Tools for Migration from Legacy Systems
  • Conclusion

 

Written by:

Vasyl Tsyktor, Market Research Specialist

 

With the constantly increasing volume of information, data delivery has become a problem, and data virtualization solutions can solve it. Surveys by data virtualization vendor Denodo show that only 11% of companies used data virtualization in 2011, a figure that grew to 25% by 2015. So what is the reason for this growing adoption? In this article, we’ll cover the main aspects of data virtualization technology and the reasons behind its growth.

Definition of Data Virtualization

What is data virtualization? It’s an approach to data management that lets users query, retrieve, modify, and otherwise manipulate information without needing to know technical details such as its source or format. Data virtualization uses virtualization technology to abstract data from its existing storage (a data silo) and presentation, providing a holistic view of that data regardless of its source.

The key features of data virtualization are:

  • Data abstraction
  • Data federation (combining multiple datasets into one)
  • Semantic integration (integrating data structures without losing meaning)
  • Data services
  • Data unification and security.

Data virtualization presents the requested data as a view in a local database or web service, and it’s designed to process large amounts of data. Data virtualization software usually supports nearly any type of data, including XML, flat files, SQL, web services, MDX, and unstructured data in NoSQL and Hadoop databases.

How Data Virtualization Works

How does data virtualization work? When a user submits a query, data virtualization software determines the optimal way to retrieve the requested data, taking its location into account. The software then fetches the data, performs any necessary transformations, and returns the result to the user. Notably, these tools don’t burden users with details such as the physical path to the requested data or the steps taken to retrieve it.
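To make the idea concrete, here is a minimal Python sketch of such a virtual layer. The dataset names, file paths, and the SOURCES registry are illustrative assumptions rather than part of any real product.

```python
# A toy "virtual layer": the caller asks for a dataset by name and gets
# rows in one unified form, without knowing which backend holds the data.
import csv
import json
import sqlite3

def read_from_sqlite(path, table):
    # Table names come from the trusted registry below, not from user input.
    with sqlite3.connect(path) as conn:
        conn.row_factory = sqlite3.Row
        return [dict(row) for row in conn.execute(f"SELECT * FROM {table}")]

def read_from_csv(path):
    with open(path, newline="") as source_file:
        return list(csv.DictReader(source_file))

def read_from_json(path):
    with open(path) as source_file:
        return json.load(source_file)

# Registry mapping logical dataset names to physical sources (all invented).
SOURCES = {
    "orders":    lambda: read_from_sqlite("sales.db", "orders"),
    "customers": lambda: read_from_csv("crm_export.csv"),
    "clicks":    lambda: read_from_json("clickstream.json"),
}

def query(dataset):
    """Return rows as plain dicts, regardless of the underlying source."""
    return SOURCES[dataset]()
```

The caller simply writes query("orders") and never learns whether the rows came from a SQLite database, a CSV export, or a JSON feed.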

Advantages of Data Virtualization

Data virtualization is an effective solution, especially for organizations that require a tool to rapidly manipulate data and have a limited budget for third-party consultancy and infrastructure development. Thanks to data virtualization, companies can have simplified and standardized access to data that’s retrieved from its original source in real time.

Furthermore, the original data sources remain secure since they’re accessed solely through integrated data views. Data virtualization can be used to manage corporate resources in order to increase operational efficiency and improve response times.

Benefits of data virtualization for companies include:

  • Faster access to data for decision-makers.
  • Increased operational efficiency due to fast formation of data stores.
  • Lower spending on data search and structuring solutions.
  • Advanced analytics due to powerful data compilation.
  • Reduced security risks with additional levels of access and permission management that separate original data silos from the user context.

The data virtualization market is constantly growing. Companies that use data virtualization see the benefit in cost savings on data integration, since the technology lets them connect shared data assets. Gartner predicts that 35% of enterprises worldwide will use this technology for their data integration processes by 2020. Let’s discuss the reasons for this increasing adoption.

Reduced Infrastructure Workload

Traditional data centers require focused data management, a stable network, and many system resources. All of these components create a heavy system load and increase corporate expenses. Data virtualization allows companies to implement a simpler architecture than standard data warehouses, which leads to less data replication and, as a result, a smaller infrastructure workload.

Increased Speed of Data Access

Data virtualization is a more effective alternative to traditional data integration approaches that rely on extract, transform, and load (ETL) tools and physical data stores. Building those physical data stores is quite time-consuming and can take up to several months. Data virtualization tools, by contrast, work with metadata extracted from the original data sources and allow changes to be made quickly, which ensures fast data aggregation and structuring.

Data Unification

Data virtualization unifies data by abstracting it from its location or structure. No matter where data is stored (in the cloud or on-site) and no matter if it’s structured or unstructured, you can retrieve it in one unified form. This increases the possibilities for further data processing and analysis.

Simplified Discovery

Data virtualization allows both applications and users to find, read, and query data using metadata. Metadata-based querying significantly speeds up data search through virtual data services and allows you to retrieve requested information much faster than with a traditional semantic matching approach.
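As an illustration only (the catalog fields and dataset names below are invented), metadata-based discovery can be as simple as indexing descriptive attributes of each virtual view and filtering on them, without ever touching the underlying sources.

```python
# Toy metadata catalog: discovery queries read only these records,
# never the underlying data sources.
CATALOG = [
    {"name": "orders",    "source": "sales.db",         "tags": ["sales", "finance"]},
    {"name": "customers", "source": "crm_export.csv",   "tags": ["crm"]},
    {"name": "clicks",    "source": "clickstream.json", "tags": ["web", "analytics"]},
]

def find_datasets(tag):
    """Return the names of virtual views whose metadata carries the given tag."""
    return [entry["name"] for entry in CATALOG if tag in entry["tags"]]

print(find_datasets("sales"))  # ['orders']
```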

Simplified Collaboration

Data unification leads to another significant advantage: efficient data sharing. As data volumes grow, it becomes difficult to process data of different formats and from different sources. Data virtualization allows applications to access any dataset regardless of its format or location.

Data Virtualization: Top Vendors

In the first quarter of 2015, Forrester listed the nine biggest data virtualization vendors worldwide and evaluated them against roughly 60 criteria, including strategy, current offerings, and market presence.

Forrester’s list of the top enterprise data virtualization vendors for Q1 2015 includes:

  • Oracle
  • Cisco Systems
  • Microsoft
  • SAS Institute
  • IBM
  • Denodo Technologies
  • Red Hat
  • Informatica
  • SAP

Forrester stated in 2015 that data virtualization vendors had significantly increased their cloud capabilities, scalability, and cybersecurity since the agency’s previous evaluation in 2012.

In its 2017 Market Guide for Data Virtualization, Gartner listed 22 data virtualization vendors. These vendors’ solutions offer diverse capabilities, although all of them support data virtualization technology.

  • Capsenta
  • Cisco
  • Data Virtuality (the University of Leipzig, Germany)
  • Denodo
  • IBM
  • Informatica
  • Information Builders
  • K2View
  • OpenLink Software
  • Oracle
  • Progress
  • Red Hat
  • Rocket Software
  • SAP
  • SAS
  • Stone Bond Technologies
  • Attunity
  • Cirro
  • Microsoft
  • Palantir
  • Talend
  • VirtDB

Let’s look at some representative data virtualization solutions and their general characteristics.

Existing Data Virtualization Tools

The data virtualization market is occupied by large software vendors such as Informatica, IBM, and Microsoft, as well as specialized vendors such as Denodo. The tools provided by large vendors cover nearly all possible tasks related to data virtualization. The software offered by smaller companies is mostly focused on advanced automation and improved integration of data sources.

Red Hat

JBoss Data Virtualization is a tool created by Red Hat. This solution is aimed at providing real-time access to data extracted from different sources, creating reusable data models and making them available for customers upon request. Red Hat’s solution is cluster-aware and provides numerous data security features such as SSL encryption and role-based access control.

Denodo

The Denodo data virtualization platform offers improved dynamic query optimization and provides services that handle data in various formats. It supports advanced caching and enhanced data processing techniques. The platform also ensures a high level of security by providing features such as pass-through authentication and granular data masking.

Delphix

Despite Gartner not including Delphix in its market guide, we’ve still decided to briefly cover this solution and note its main differences from the top vendors. In 2015, the Delphix startup raised $75 million in its last funding round to further improve its tool. The Delphix data virtualization solution captures data from corporate applications, masks sensitive information to ensure cybersecurity compliance, manages user access, and generates data copies for users. Its specialty is creating 30-day backups that don't exceed the size of the files on disk.

How to Create a Data Virtualization Tool

Data virtualization allows users to get a virtual view of data and access it in numerous formats with business intelligence (BI) tools or other applications. This is just a tiny part of what data virtualization solutions should be able to do, however. In this section, we’ll discuss what aspects technology vendors should consider before building data virtualization solutions.

Necessary Features for Data Virtualization Solutions

Abstracting data from sources and publishing it to multiple data consumers in real-time allows businesses to collaborate and function iteratively, thereby considerably reducing turnaround time for data requests. However, a good data virtualization solution has to provide users with more capabilities than this. Let’s consider the most important ones.

Connectivity with Various Data Sources

Any data virtualization software contains a connectivity layer. This layer allows the solution to extract data from various resources. The more data types, database management systems, and file systems your solution supports, the more useful it will be.

Components that ensure access to various data sources include:

  • Adapters and connectors
  • Cloud data stores
  • Database infrastructures (such as Hadoop and NoSQL)
  • Mainframes
  • Data warehouses and data marts
  • Applications (BI tools, CRM, SaaS, and ERP). 

You should implement various adapters in your software. For this purpose, you can create your own or license existing components.
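One common way to structure the connectivity layer is a small adapter interface that every connector implements. The sketch below is an assumption about how such a layer could look (the SourceAdapter, SqlAdapter, and RestAdapter classes are illustrative, not any vendor's API).

```python
# Each adapter knows how to talk to one kind of source and returns rows
# in the same unified shape (a list of dicts).
import json
import urllib.request
from abc import ABC, abstractmethod

class SourceAdapter(ABC):
    @abstractmethod
    def fetch(self, query: str) -> list[dict]:
        ...

class SqlAdapter(SourceAdapter):
    """Wraps any DB-API connection (SQLite, PostgreSQL, and so on)."""
    def __init__(self, connection):
        self.connection = connection

    def fetch(self, query: str) -> list[dict]:
        cursor = self.connection.cursor()
        cursor.execute(query)
        columns = [col[0] for col in cursor.description]
        return [dict(zip(columns, row)) for row in cursor.fetchall()]

class RestAdapter(SourceAdapter):
    """Wraps a REST endpoint that returns JSON arrays."""
    def __init__(self, base_url):
        self.base_url = base_url

    def fetch(self, query: str) -> list[dict]:
        with urllib.request.urlopen(f"{self.base_url}/{query}") as response:
            return json.load(response)
```

Licensed third-party connectors can be wrapped in the same interface, so the rest of the platform never has to care where a particular connector came from.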

Semantic Data Analysis

The most effective tools use a single interface and rely on metadata to provide users with the data they request. Your solution should include analytics capabilities in order to save your customers time when structuring and analyzing large amounts of information.

Efficient Data Provisioning

Data provisioning is the process of making data available to users and applications, and doing it safely is a significant part of ensuring cybersecurity. Data security includes user authentication and the enforcement of group and user privileges. Your solution should provide role-based and schema-level security so you can wisely manage access to data for geographically distributed users and data sources. Reliable data provisioning will protect your data from uncontrolled access and reduce risks related to intellectual property and confidential data.
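For instance, a role-based check can sit in front of every virtual view. The snippet below is a minimal sketch with invented role names and datasets; the PERMISSIONS table and provision() helper are hypothetical, and fetch is assumed to be a callable like the query() function sketched earlier.

```python
# Toy role-based provisioning: the virtual layer consults a permission
# table before exposing a view to a user.
PERMISSIONS = {
    "analyst":  {"orders", "clicks"},
    "marketer": {"clicks"},
    "admin":    {"orders", "clicks", "customers"},
}

def provision(role, dataset, fetch):
    """Return data only if the role is allowed to see the dataset."""
    if dataset not in PERMISSIONS.get(role, set()):
        raise PermissionError(f"role '{role}' may not access '{dataset}'")
    return fetch(dataset)
```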

Challenges in Data Virtualization Software Development

Although data virtualization offers numerous benefits, it comes with challenges too. According to a survey by Denodo, 46% of organizations that have implemented data virtualization solutions see their biggest challenge as adapting the software for departments besides IT. Of companies surveyed, 43% face particular issues with managing software performance. So what challenges can technology vendors face when they decide to build their own data virtualization solution?

Ensuring Adequate Speed and Responsiveness of the System

Business owners can have varying and dynamic data needs, and you should take this into account. Fortunately, data virtualization is flexible enough to deliver data in multiple modes depending on how it has to be represented. For example, pricing analysts may need real-time sales and turnover information on certain holidays, when a one-day delay is not acceptable. A highly optimized semantic tier will make your software more effective, and query caching, distributed processing, and in-memory and processing grids will help ensure faster data delivery.
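Query caching, for example, can be as simple as keeping recent results in memory with a time-to-live. The cached_query() helper below is a hypothetical sketch, not part of any specific product, and assumes a fetch callable such as the query() function shown earlier.

```python
# Toy query cache with a time-to-live, so repeated requests for the same
# view are served from memory instead of hitting the sources again.
import time

_CACHE = {}  # dataset name -> (expires_at, rows)

def cached_query(dataset, fetch, ttl_seconds=60):
    now = time.time()
    hit = _CACHE.get(dataset)
    if hit and hit[0] > now:
        return hit[1]          # fresh enough: serve from memory
    rows = fetch(dataset)      # otherwise go to the sources
    _CACHE[dataset] = (now + ttl_seconds, rows)
    return rows
```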

Efficient Management of Shared Resources

Whether data is internal or external to your organization, stored in the cloud, in a big data source, or on a social media platform, your data virtualization solution should be able to access it, structure it, and make it conform to existing patterns so it’s easy to use. When a company uses shared data resources, it’s quite a challenging task to create a solution that can effectively manage them. That’s why you should implement data governance capabilities to ensure efficient data analysis and error tracking, especially when data is being pulled from a variety of sources.

Providing Tools for Migration from Legacy Systems

Data virtualization typically plays an instrumental role as an abstraction layer between old and new systems during migration from legacy systems. Therefore, your solution should include tools for migrating from legacy systems. Users should be able to employ data virtualization for prototyping and to integrate both systems when running them in parallel (a parallel-run architecture).
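During a parallel run, the virtual layer can also be used to check that the new system returns the same data as the legacy one. The parallel_run_check() helper below is a hypothetical sketch, assuming both systems are exposed through fetch callables that return rows keyed by an id field.

```python
# Toy parallel-run check: read the same logical view from the legacy and
# the new system through the virtual layer and report any mismatches.
def parallel_run_check(dataset, fetch_legacy, fetch_new, key="id"):
    legacy = {row[key]: row for row in fetch_legacy(dataset)}
    new = {row[key]: row for row in fetch_new(dataset)}
    missing = legacy.keys() - new.keys()
    changed = [k for k in legacy.keys() & new.keys() if legacy[k] != new[k]]
    return {"missing_in_new": sorted(missing), "changed": sorted(changed)}
```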

Conclusion

Developing data virtualization software is time-consuming and requires deep expertise. Professionalism, qualifications, and long-term experience in general software development are necessary for creating enterprise-level solutions.

Furthermore, a deep knowledge and understanding of the needs of technology enterprises will allow you to build a useful tool to help organizations process data.

Data virtualization and cloud computing are among our specialties at Apriorit. We’ve helped various technology vendors develop advanced data processing solutions. Send us your request for proposal and we’ll get back to you and discuss what we can offer for your project.


This white paper describes a code protection technology for Linux applications based on the so-called “nanomite” approach previously applied on Windows systems.

It is a modern anti-debugging method that can also be effectively applied to prevent process dumping.

Apriorit Code Protection for Linux is provided as a commercial SDK with various licensing options.

 

Project Description

The project targets 32-bit Linux applications, but the principles can easily be applied to other operating systems, so further development is planned.

First, we will look at creating a custom debugger for Linux; after that, we will move on to the implementation of nanomites. Binutils and Perl are used to build the project.

We apply the combination of two techniques: Nanomites and Debug Blocker.

Nanomites are code segments containing key application logic, marked with special markers in the source files. The protector cuts these segments out of the protected program during packing. During unpacking, they are obfuscated, written to allocated memory, and replaced with jumps in the original code. A table of conditional and unconditional jumps is built that contains not only nanomite jumps but also non-existent “trash” entries. This “completeness” makes the table much harder to recover.

Debug Blocker implements protection by a parent process. The protected program is started as a child process, and the protector (the parent process) attaches to it as a debugger. As a result, a third party can debug only the parent process. Combined with the nanomite technique, Debug Blocker creates reliable protection for an application, making debugging and reverse engineering very complicated and time-consuming.

Read more about Nanomite Technology in our white paper Nanomite and Debug Blocker Technologies: Scheme, Pros, and Cons

Both techniques were successfully used in commercial Windows protectors. Apriorit Code Protection is the first product to implement them for Linux application protection.

General Idea

(Figure: Apriorit Code Protection scheme)

Apriorit Code Protection includes two main components:

  1. Nanomites: a static library that contains the debugger process logic.
  2. Nanomites Debugger: a debugger executable compiled with the Nanomites library.

We also provide Nanomites Demo, a demo application protected by nanomites.

There’s also a script collection for adding the nanomites to an application and for creating nanomites tables.

Protected Application Creation Sequence

  1. The application is compiled with the -S flag to produce an assembler listing.
  2. The assembler listing is analyzed by a Perl script: all jump and call instructions (e.g., jmp, jz, jne, call) are replaced with instructionOffsetLabel(N): int 3.
  3. The user application, now consisting of the modified assembler listings, is compiled.
  4. A Perl script parses the compiled application and builds the table of nanomites.
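The actual scripts are written in Perl, but a hypothetical Python sketch of the same listing rewrite may make the idea clearer. The label format, the list of mnemonics, and the table layout below are illustrative assumptions, not the SDK’s real implementation.

```python
# Sketch of the listing rewrite: scan a GCC-generated assembler listing,
# replace jump/call instructions with labeled int3 breakpoints, and record
# the originals in a nanomite table.
import re
import sys

JUMP_RE = re.compile(r"^\s+(jmp|je|jne|jz|jnz|jg|jge|jl|jle|ja|jb|call)\s+(\S+)\s*$")

def rewrite_listing(src_path, dst_path, table_path):
    nanomite_table = []  # (label, mnemonic, target)
    out_lines = []
    with open(src_path) as src:
        for line in src:
            match = JUMP_RE.match(line)
            if match:
                label = "instructionOffsetLabel%d" % len(nanomite_table)
                nanomite_table.append((label, match.group(1), match.group(2)))
                # The multi-byte jump is replaced with a 1-byte breakpoint.
                out_lines.append("%s: int3\n" % label)
            else:
                out_lines.append(line)
    with open(dst_path, "w") as dst:
        dst.writelines(out_lines)
    with open(table_path, "w") as table:
        for label, mnemonic, target in nanomite_table:
            table.write("%s %s %s\n" % (label, mnemonic, target))

if __name__ == "__main__":
    rewrite_listing(sys.argv[1], sys.argv[2], sys.argv[3])
```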

Debugger Library Description

Our debugger is based on the ptrace (process trace) system call, which exists in many Unix-like systems (including Linux, FreeBSD, and Mac OS X). It allows tracing and debugging of a selected process. In effect, ptrace provides full control over a process: we can change the application’s execution flow and inspect or modify its memory and register state. Note that ptrace grants no additional permissions: possible actions are limited by the permissions of the started process. Moreover, when a program with the setuid bit is traced, the bit has no effect, as privileges are not escalated.

After the demo application has been processed by the scripts, it can no longer run on its own; if it is started without the debugger, it immediately crashes with a segmentation fault. From then on, the debugger starts the demo application: a child process is created in the debugger, and the parent process attaches to it. All debugging events from the child process, including all jump events, are handled in a loop; the parent process consults the nanomite table and the flag table to perform the correct action.
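The real debugger loop lives in the Nanomites static library, written in C, but the parent/child flow can be sketched in Python through ctypes for illustration. The binary path, the hard-coded ptrace constants, and the simplified signal handling are assumptions of this sketch.

```python
# Minimal Debug Blocker sketch for Linux: fork a child that asks to be
# traced, exec the protected binary in it, and handle its stops in a loop.
import ctypes
import os
import signal

PTRACE_TRACEME = 0   # Linux ptrace request numbers
PTRACE_CONT = 7

libc = ctypes.CDLL("libc.so.6", use_errno=True)

def run_protected(path):
    pid = os.fork()
    if pid == 0:
        # Child: ask to be traced by the parent, then exec the protected app.
        libc.ptrace(PTRACE_TRACEME, 0, None, None)
        os.execv(path, [path])
    # Parent: process every stop of the child.
    while True:
        _, status = os.waitpid(pid, 0)
        if os.WIFEXITED(status) or os.WIFSIGNALED(status):
            break
        if os.WIFSTOPPED(status) and os.WSTOPSIG(status) == signal.SIGTRAP:
            # A nanomite (int3) fired: a real debugger would read the child's
            # registers, look up the nanomite table, and redirect execution.
            pass
        libc.ptrace(PTRACE_CONT, pid, None, None)  # simplified: drops signals

if __name__ == "__main__":
    run_protected("./nanomites_demo")  # hypothetical protected binary
```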

The Advantages of the Apriorit Solution Compared to Armadillo

Armadillo (also known as SoftwarePassport) is a commercial protector developed for Windows application protection. It introduced the nanomite approach and also uses Debug Blocker technology (protection by a parent process).

Armadillo modifies the binary code. When a jump instruction 2 to 5 bytes long is replaced with the shorter 1-byte int 3 (0xCC) instruction, some free space remains. Correspondingly, a nanomite can be restored simply by writing the original jump instruction back over the int 3.

In our approach, we change the code at the source level, so a nanomite is only 1 byte long. Correspondingly, it cannot be restored by writing the original instruction over it, and the code cannot be extended in place of the nanomite, as all relative jumps would be broken. Still, there is a way to restore our nanomites, for example the following.

A Way to Recover Apriorit Nanomites

A hacker can create an additional section in the executable file, then find the nanomite and obtain its jump instruction and jump address.

Then the restoration goes as follows:

(Figure: nanomite recovery scheme)

Such a solution is complex to implement. First, a disassembler engine is required to automate it; second, the relocated instructions may themselves contain relative jumps, which will require correction.

Learn more about Linux Anti-debugging SDK!
