Created in 2010, Mozilla’s Rust programming language is rapidly gaining popularity. Compared to many other languages, Rust offers strong performance together with stronger guarantees of software security. We’d like to share our knowledge about practical applications of Rust features. This article is the second part of our Rust Programming Language Tutorial and is written for developers who want to use Rust for software programming.

This part of the Rust programming tutorial describes Rust features that guarantee memory safety. The Rust language achieves memory safety by using ownership and borrowing, mutability and aliasing, option types for pointers, and initialized variables.

Written by:

Alexey Lozovsky,

Software Designer

 

Contents:

Guaranteed Memory Safety

   Ownership

   Borrowing

   Mutability and Aliasing

   Option Types instead of Null Pointers

   No Uninitialized Variables

Conclusion

Guaranteed Memory Safety

Memory safety is the most prized and advertised feature of Rust. In short, Rust guarantees the absence (or at least the detectability) of various memory-related issues:

  • segmentation faults
  • use-after-free and double-free bugs
  • dangling pointers
  • null dereferences
  • unsafe concurrent modification
  • buffer overflows

These issues are classified as undefined behavior in C++, and programmers are largely on their own to avoid them. In Rust, on the other hand, memory-related issues are either reported as compile-time errors or, where compile-time checks aren’t possible, guarded against with runtime checks.

Ownership

The core innovation of Rust is ownership and borrowing, closely related to the notion of object lifetime. Every object has a lifetime: the time span in which the object is available for the program to use. There’s also an owner for each object: an entity which is responsible for ending the lifetime of the object. For example, local variables are owned by the function scope. The variable (and the object it owns) dies when execution leaves the scope.

1   fn f() {
2       let v = Foo::new(); // ----+ v's lifetime
3                           //     |
4       /* some code */     //     |
5   }                       // <---+

In this case, the object Foo is owned by the variable v and will die at line 5, when function f() returns.

Ownership can be transferred by moving the object (which is performed by default when the variable is assigned or used):

1   fn f() {
2       let v = Foo::new();    // ----+ v's lifetime
3       {                       //     |
4           let u = v;          // <---X---+ u's lifetime
5                               //         |
6           do_something(u);    // <-------X
7       }                       //
8   }                           //

Initially, the variable v would be alive for lines 2 through 7, but its lifetime ends at line 4 where v is assigned to u. At that point we can’t use v anymore (or a compiler error will occur). But the object Foo isn’t dead yet; it merely has a new owner u that is alive for lines 4 through 6. However, at line 6 the ownership of Foo is transferred to the function do_something(). That function will destroy Foo as soon as it returns.
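
As a minimal sketch (reusing the names from the example above), trying to touch v after the move is rejected at compile time:

let v = Foo::new();
let u = v;          // ownership of Foo moves to u
do_something(v);    // error[E0382]: use of moved value: `v`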

Borrowing

But what if you don’t want to transfer ownership to the function? Then you can pass a reference to the object instead:

1   fn f() {
2       let v = Foo::new();     // ---+ v's lifetime
3                               //    |
4       do_something(&v);       // :--|----.
5                               //    |     } v's borrowed
6       do_something_else(&v);  // :--|----'
7   }                           // <--+

In this case, the function is said to borrow the object Foo via references. It can access the object, but the function doesn’t own it (i.e. it can’t destroy it). References are objects themselves, with lifetimes of their own. In the example above, a separate reference to v is created for each function call, and that reference is transferred to and owned by the function call, similar to the variable u above.

It’s expected that a reference will be alive for at least as long as the object it refers to. This notion is implicit in C++ references, but Rust makes it an explicit part of the reference type:

fn do_something<'a>(v: &'a Foo) {
    // something with v
}

The argument v is in fact a reference to Foo with the lifetime 'a, where 'a is defined by the function do_something() as the duration of its call.

C++ can handle simple cases like this just as well. But what if we want to return a reference? What lifetime should the reference have? Obviously, not longer than the object it refers to. However, since lifetimes aren’t part of C++ reference types, the following code is syntactically correct for C++ and will compile just fine:

const Foo& some_call(const Foo& v)
{
    Foo w;
 
    /* 10 lines of complex code using v and w */
 
    return w; // accidentally returns w instead of v
}

Though this code is syntactically correct, it is semantically incorrect and exhibits undefined behavior if the caller of some_call() actually uses the returned reference. Such errors may be hard to spot in casual code review and generally require an external static code analyzer to detect.

Consider the equivalent code in Rust:

fn some_call(v: &Foo) -> &Foo {// ------------------+ expected
    let w = Foo::new();        // ---+ w's lifetime | lifetime
                               //    |              | of the
    return &w;                 // <--+              | returned
}                              //                   | value
                               // <-----------------+

The returned reference is expected to have the same lifetime as the argument v, which is expected to live longer than the function call. However, the variable w lives only for the duration of some_call(), so references to it can’t live longer than that. The borrow checker detects this conflict and complains with a compilation error instead of letting the issue go unnoticed.

error[E0597]: `w` does not live long enough
  --> src/main.rs:10:13
   |
10 |     return &w;
   |             ^ does not live long enough
11 | }
   | - borrowed value only lives until here
   |

The compiler is able to detect this error because it tracks lifetimes explicitly and thus knows exactly how long values must live for the references to still be valid and safe to use. It’s also worth noting that you don’t have to explicitly spell out all lifetimes for all references. In many cases, like in the example above, the compiler is able to automatically infer lifetimes, freeing the programmer from the burden of manual specification.
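
For completeness, here is a minimal sketch (not from the original article) of a version that does compile: returning the borrowed argument itself ties the output lifetime to the input, and the compiler infers this without explicit annotations:

fn some_call(v: &Foo) -> &Foo {
    /* 10 lines of complex code using v */
    v   // the returned reference borrows from the caller's object,
        // so it is guaranteed to outlive the call
}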

Mutability and Aliasing

Another feature of the Rust borrow checker is alias analysis, which prevents unsafe memory modification. Two pointers (or references) are said to alias if they point to the same object. Let’s start with the following C++ example:

Foo c;
Foo *a = &c;
const Foo *b = &c;

Here, pointers a and b are aliases of the Foo object owned by c. Modifications performed via a will be visible when b is dereferenced. Usually, aliasing doesn’t cause errors, but there are some cases where it might.

Consider the memcpy() function. It is widely used for copying data, but it’s known to be unsafe and can cause memory corruption when applied to overlapping regions:


char array[5] = { 1, 2, 3, 4, 5 };

char *a = &array[0];
const char *b = &array[2];

memcpy(a, b, 3);

In the sample above, the first three elements are now undefined because their values depend on the order in which memcpy() performs the copying:

{ 3, 4, 5, 4, 5 }    // if the elements are copied from left to right
{ 5, 4, 5, 4, 5 }    // if the elements are copied from right to left

The ultimate issue here is that the program contains two aliasing references to the same object (the array), one of which is non-constant. If such programs were rejected by the compiler, then memcpy() (and any other function taking pointer arguments) would always be safe to use.

Rust makes it possible by enforcing the following rules of borrowing:

  1. At any given time, you can have either of the following, but not both:
    • one mutable reference
    • any number of immutable references
  2. References must always be valid.

The second rule relates to ownership, which was discussed in the previous section. The first rule is the real novelty of Rust.

It’s obviously safe to have multiple aliasing pointers to the same object if none of them can be used to modify the object (i.e. they are constant references). If there are two mutable references, however, then modifications can conflict with each other. Also, if there is a const-reference A and a mutable reference B, then presumably the constant object as seen via A can in fact change if modifications are made via B. But it’s perfectly safe if only one mutable reference to the object is allowed to exist in the program. The Rust borrow checker enforces these rules during compilation, effectively making each reference act as a read-write lock for the object.

The following is the equivalent of memcpy() as shown above:

let mut array = [1, 2, 3, 4, 5];
let a = &mut array[0..2];
let b = &    array[2..4];
a.copy_from_slice(b);

This won’t compile in Rust; the compiler produces the following error:

error[E0502]: cannot borrow `array` as immutable because it is also borrowed as mutable
 --> src/main.rs:4:14
  |
3 |     let a = &mut array[0..2];
  |                  ----- mutable borrow occurs here
4 |     let b = &array[2..4];
  |              ^^^^^ immutable borrow occurs here
5 |     a.copy_from_slice(b);
6 | }
  | - mutable borrow ends here

This error signifies the restrictions imposed by the borrow checker. Multiple immutable references are fine. One mutable reference is fine. Different references to different objects are fine. However, you can’t simultaneously have a mutable and an immutable reference to the same object because this is possibly unsafe and can lead to memory corruption.

Not only does this restriction prevent possible human errors, but it in fact enables the compiler to perform some optimizations that are normally not possible in the presence of aliasing. The compiler is then free to use registers more aggressively, avoiding redundant memory access and leading to increased performance.
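
For the curious, here is a minimal sketch (not from the original article) of how the same copy can be expressed safely. split_at_mut() hands out two non-overlapping slices, so the borrowing rules are satisfied:

let mut array = [1, 2, 3, 4, 5];
let (dst, src) = array.split_at_mut(2);   // disjoint &mut [1, 2] and &mut [3, 4, 5]
dst.copy_from_slice(&src[..2]);           // equal lengths, regions can't overlap
// array is now [3, 4, 3, 4, 5]

Because the two slices are provably disjoint, the compiler accepts the mutation without any runtime aliasing checks.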

Option Types instead of Null Pointers

Another common issue related to pointers and references is null pointer dereferencing. Tony Hoare calls the invention of the null pointer value his billion-dollar mistake, and an increasing number of languages are including mechanisms to prevent it (for example, Nullable types in Java and C#, std::optional type since C++17).

Rust uses the same approach as C++ for references: they always point to an existing object, so there are no null references and hence no issues with them. However, smart pointers aren’t references and may not point to objects; there are also cases when you might want to pass either a reference to an object or no reference at all.

Instead of using nullable pointers, Rust has the Option type. This type has two constructors:

Some(value) – denotes the presence of a value

None – denotes the absence of a value

None is functionally equivalent to a null pointer (and in fact has the same representation), while Some carries a value (for example, a reference to some object).

The main advantage of Option over pointers is that it’s not possible to accidentally dereference None, so null pointer dereferencing errors are eliminated. To use the value stored in an Option, you need to use safe access patterns:

match option_value {
    Some(value) => {
        // use the contained value
    }
    None => {
        // handle absence of value
    }
}
 
if let Some(value) = option_value {
    // use the contained value
}
 
let value = option_value.unwrap(); // panics if option_value is None

Every use of Option acts as a clear marker that a value may be absent, and it forces you to handle both cases. Furthermore, the Option type has many utility methods that make it more convenient to use:

// Falling back to a default value:
let foo_enabled: bool = configuration.foo_enabled.unwrap_or(true);
 
// Applying a conversion if the Option contains a value
// or leaving it None if Option is None:
let maybe_length: Option<usize> = maybe_string.map(|s| s.len());
 
// Options can be compared for equality and ordering (given that
// the wrapped values can be compared).
let maybe_number_a = Some(1);
let maybe_number_b = Some(9);
let maybe_number_c = None;
assert_eq!(maybe_number_a < maybe_number_b, true);
assert_eq!(maybe_number_a < maybe_number_c, false); // None is less than Some
assert_eq!(maybe_number_c < maybe_number_b, true);
assert_eq!(maybe_number_a != maybe_number_c, true);
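
The standard library relies on Option throughout. As a quick illustration (a sketch, not from the original article), HashMap::get() returns Option<&V> instead of a possibly-null pointer:

use std::collections::HashMap;

let mut ports = HashMap::new();
ports.insert("https", 443);

// get() returns Option<&i32>, so the "not found" case can't be ignored:
match ports.get("https") {
    Some(port) => println!("https uses port {}", port),
    None => println!("unknown protocol"),
}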

No Uninitialized Variables

Another possible issue with so-called plain old types in C++ is usage of uninitialized variables. Rust requires variables to be initialized before they are used. Most commonly, this is done when variables are declared:

let array_of_ten_zeros = [0; 10];

But it’s also possible to first declare a variable and then initialize it later:

let mut x;
// x is left truly uninitialized here; the assembly
// will not contain any actual memory assignment
 
loop {
    if something_happens() {
        x = 1;
        println!("{}", x); // This is okay because x is initialized now
    }
 
    println!("{}", x);     // But this line will cause a compilation error
                           // because x may still not be initialized here
 
    if some_condition() {
        x = 2;
        break;
    }
    if another_condition() {
        x = 3;
        break;
    }
}
 
// The compiler knows that it is not possible to exit the loop
// without initializing x, so this is totally safe:
println!("{}", x);

In other words, with Rust you can’t forget to initialize a variable and then accidentally read a garbage value: the compiler will report an error instead. All structure fields must be initialized at construction time as well:

let foo = Foo {
    bar: 5,
    baz: 10,
};

If a field is added to a structure later, every existing place where the structure is constructed will produce a compilation error until it’s updated to initialize the new field.
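
To illustrate, a minimal sketch (the extra field is hypothetical): once qux is added to Foo, the old initializer above stops compiling:

struct Foo {
    bar: i32,
    baz: i32,
    qux: i32, // field added later
}

let foo = Foo {
    bar: 5,
    baz: 10,
}; // error[E0063]: missing field `qux` in initializer of `Foo`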

Conclusion

At Apriorit, we work with many popular software programming languages. Based on our experience with Rust, we have created this tutorial to help you get familiar with the basics of the language.

In this second part of our Rust programming language tutorial, we provided a detailed overview of Rust features that ensure memory safety, such as ownership and borrowing, mutability and aliasing, option types for pointers, and initialized variables. All these features allow programmers to avoid undefined behaviors that are typical in C++.

If you want to learn more about other features of the Rust language, then check out our next article.

The Internet of Things has been one of the fastest developing technology trends in recent years. However, increasing security concerns make many potential adopters refrain from using IoT devices. This article covers the challenges of Internet of Things security and possible ways to ensure the security of IoT systems. It’s written for developers of IoT products who want to ensure proper security.

 

Written by

Anna Bryk,

Market Research Specialist

 

Contents:

What is IoT?

The Importance of IoT Security

5 Most Common Cybersecurity Challenges with IoT Systems

    1. Firmware vulnerabilities

    2. Insecure communications

    3. Data leaks from IoT systems

    4. Susceptibility to malware and other abuses

    5. Potential for service disruption

7 Most Common Types of Attacks on IoT Devices

How to Ensure Security of IoT Systems

    1. Securing devices

    2. Securing networks

    3. Securing data

Conclusion

What is IoT?

The Internet of Things, or IoT, is a network of smart devices that connect to each other in order to exchange data via the internet without human intervention. Devices in IoT systems range from simple sensors to DNA analysis devices and refrigerators. IoT technology requires several smart devices that interact closely with each other. The architecture for IoT systems requires wireless networks and a cloud database for communication.

IoT devices can perform different functions, but their main purpose is collecting, storing, and processing data about the environment through sensors in order to pass along this data to other devices.

To ensure data processing, IoT systems include the following components:

  •  Smart devices with embedded processors
  •  Gateways with edge processors (smartphones, hubs, or servers)
  •  Cloud or data centers with remote servers that exchange data through wireless connections

Internet of Things technologies are used in nearly all spheres: in homes, manufacturing, the automotive industry, healthcare, energy, agriculture, and building automation.

The Importance of IoT Security

While IoT devices make our lives easier and more comfortable, they also face risks of cyber attacks. The security of IoT systems remains a blind spot in most enterprises and households, but it’s naive to underestimate the possible threats from these devices being compromised.

In 2014, Hewlett Packard sponsored a survey which revealed that seven of ten popular smart devices were vulnerable to potential attacks. Most of the security risks identified were connected with unencrypted data, collection of personal data, vulnerable user interfaces, and unsafe communications.

While the market for IoT devices is rapidly growing, the risks of possible attacks are also increasing. According to Gartner, the Internet of Things will include 26 billion connected devices by 2020, with IoT security spending reaching $840.5 million. At the same time, more than 25 percent of identified attacks on enterprises will involve IoT systems, pushing companies to increase their budgets for IoT security.

Meanwhile, the ways in which hackers can use IoT systems to compromise sensitive information are growing. For instance, some IoT baby monitors contain vulnerabilities allowing unauthorized monitoring, while smartwatches can inform hackers about your location, health data, and even what you’re typing.

Needless to say, since some IoT devices are used for healthcare or human protection, their security can be crucial for people’s lives.

5 Most Common Cybersecurity Challenges with IoT Systems

The main challenge with ensuring the security of IoT systems is that most traditional IoT devices are resource-constrained and have limited features, so they can’t run traditional security functions. Here’s a list of other common challenges in IoT security:

1. Firmware vulnerabilities

Many IoT devices become vulnerable to cyber attacks because their firmware isn’t updated. Even if firmware is attack-resistant when the device first goes online, vulnerabilities may be discovered over time. Thus, devices become less secure without constant firmware updates.

Automatic updates should be enabled by default, since even when new firmware updates are issued, not all consumers install them manually.

2. Insecure communications

Most existing security functions were initially designed for desktop computers and are difficult to implement on resource-constrained IoT devices. Thus, such security vulnerabilities as data leaks and unencrypted data are still common among IoT devices.

Hackers can easily perform man-in-the-middle attacks to compromise an update procedure and take control of your device if it doesn’t use encryption and authentication mechanisms. Attackers can even install malware or change a device’s functionality. Moreover, cleartext messages sent by your device can be captured by other devices and become available to attackers.

Additionally, connected devices that aren’t isolated are susceptible to attacks from each other. For instance, if attackers gain access to one device in your home network, they can carry out malicious activity to compromise other devices.

3. Data leaks from IoT systems

By capturing unencrypted messages from your product, hackers can get access to information about your location, bank accounts, health, and more. However, insecure communication isn’t the only way in which attackers can gather personal information about users. All data is transferred via the cloud, and cloud-hosted services can also experience external attacks. Thus, data leaks are possible both from devices themselves and in the cloud.

4. Susceptibility to malware and other abuse

IoT devices are vulnerable to malware that can be used by attackers to change functions, collect personal data, and launch other attacks. Moreover, devices can come infected with viruses out of the box if manufacturers don’t ensure adequate security of the supplied software. And by the time consumers begin using an IoT device, new vulnerabilities in the firmware may already have been discovered, requiring an update to make the device secure again. Vulnerabilities such as weak authentication or unused and unsecured ports that are open by default may also expose a device to abuse.

5. Potential for service disruption

One of the security challenges with IoT devices is the risk of service interruption caused either by physical damage or the loss of network connectivity or cloud support.

There’s a risk that a device may be stolen, compromised, or even physically damaged. Moreover, its connection to the network may be interrupted because of Wi-Fi or radio interference or a power outage. As for cloud connections, there are many reasons why your device may lose its link to the cloud. There can be errors in cloud software, loss of internet connection, or simply a decision by a user to stop using a cloud-based application.

Inoperable or damaged products may lead to improper operation of the local network, property damage, and intrusions.

7 Most Common Types of Attacks on IoT Devices

Because of their security vulnerabilities, IoT systems can be susceptible to various cyber attacks. Here’s a list of the most common types of attacks on IoT devices.

  1. Denial-of-service attacks. IoT devices have limited processing power, making them highly vulnerable to denial-of-service attacks. During a DoS attack, a device’s ability to respond to legitimate requests is compromised due to a flood of fake traffic.
  2. Denial-of-sleep attacks. Sensors connected to a wireless network should continuously monitor the environment, so they’re often powered by batteries that don’t require frequent charging. Capacity is preserved by keeping the device in sleep mode most of the time. Sleep and awake times are controlled according to the communication needs of different protocols, such as MAC. Attackers may exploit vulnerabilities of the MAC protocol to carry out a denial-of-sleep attack. This type of attack drains battery power and thus disables the sensor.
  3. Man-in-the-middle attacks. Unencrypted communications or poorly protected IoT networks can be exploited by attackers who insert traffic between devices and cloud-based applications.
  4. Device spoofing. This is possible when a device has improperly implemented digital signatures and encryption. For instance, a poor public key infrastructure (PKI) may be exploited by hackers to “spoof” a network device and disrupt IoT deployments.
  5. Malware. Given a lack of software updates, attackers can install malware on a device and use it to perform malicious activity. Such devices as broadband routers, point-of-sale systems, and health devices are susceptible to malicious software.
  6. Physical intrusion. Though most attacks are performed remotely, physical intrusion of a device is also possible if it’s stolen. Attackers can tamper with device components to make them operate in an unintended way.
  7. Application-based attacks. These types of attacks are possible when there are security vulnerabilities in device firmware or software used on embedded systems or weaknesses in cloud servers or backend applications.

How to Ensure Security of IoT Systems

The main problem with ensuring security of IoT systems is that traditional security technologies aren’t designed for power-constrained devices, low-bandwidth networks, and resource-limited platforms. Moreover, security functions may increase the cost and development time of IoT products, which hardly helps the business case.

IoT security best practices seek to increase the security of three main components of IoT systems: devices, networks, and data.

1. Securing devices

  • Tamper-resistant hardware. IoT devices may be stolen by attackers in order to tamper with them or access sensitive data. To prevent this, it’s necessary to make your product tamper-proof. You can ensure physical security by using port locks or camera covers as well as by applying strong boot-level passwords or taking other approaches that will disable the product in case of tampering.
  • Provide patches and updates. While manufacturers are compensated only at the moment of sale, ongoing maintenance of devices requires additional costs. However, the proper security of your products can be ensured only with constant updates and patches. It’s best to establish automatic and mandatory security updates that require no actions from consumers. Inform consumers about the timespan during which you’ll support the product and tell users what they should do after the end of this period.
  • Penetration testing and dynamic code analysis. Penetration testing is your main tool for finding vulnerabilities in firmware and software of IoT products and reducing the attack surface as much as possible. Static code analysis is initially used to find the most obvious flaws, but to dig up well-hidden vulnerabilities you’ll need to use dynamic testing. Dynamic testing is performed over compiled code that runs as it would during normal operation, allowing you to test code in situations close to real-life use cases. Since IoT devices extensively interact with their environment and communicate with other devices, the use of dynamic code analysis within penetration testing is extremely important.
  • Data protection. IoT devices should also ensure data safety both while the product is in use and after it has been retired. Make sure that cryptographic keys are stored in nonvolatile device memory. Additionally, you can offer to dispose of used products or provide a way to discard them without exposing sensitive data.
  • Performance requirements. The performance of processors and microcontrollers in IoT devices should meet certain requirements in order to ensure proper usability. For example, they should use little power but offer high processing capability. Moreover, devices should ensure authorization, data encryption, and wireless connections. Whenever possible, your IoT product should also be able to perform its functions even if its connection to the internet is temporarily disrupted.

2. Securing networks

  • Strong authentication. This can be achieved by using unique default credentials. When naming or addressing your products, use the latest protocols to ensure their functionality for a long time. If possible, provide your product with two-factor authentication – for instance, using a sophisticated password and a security code.
  • Encryption and secure protocols. Communication between devices also requires security protection. However, cryptographic algorithms should be adapted to the limited capacities of IoT devices. Transport Layer Security (TLS) or Lightweight Cryptography (LWC) can be applied for these purposes. An IoT architecture allows you to use wireless or wired technologies such as RFID, Bluetooth, Cellular, ZigBee, Z-Wave, Thread, and Ethernet. Moreover, you can ensure network security with optimized protocols such as IPsec and SSL.
  • Minimize device bandwidth. Limit network traffic to the amount necessary for functioning of the IoT device. If possible, program the device to limit hardware and kernel-level bandwidth and reveal suspicious traffic. This will protect your product from possible DoS attacks. The product should also be programmed to reboot and clear code in case malware is detected, since malware can be used to hijack the device and use it as part of a botnet to perform DDoS attacks.
  • Divide networks into segments. Implement next-generation firewall security by separating big networks into several smaller ones. For this purpose, use ranges of IP addresses or VLANs. For secure internet connections, implement a VPN in your IoT system.

3. Securing data

  • Protect sensitive information. Install unique default passwords for each product or require immediate password updates on first use of the device. Use authentication to ensure that only authorized users have access to data. Moreover, install a reset mechanism to allow clearing of sensitive data and configuration settings if the user decides to return or resell the product.
  • Collect only necessary data. Ensure that your IoT product collects only data necessary for its operation. This will reduce the risk of data leakage and protect consumers’ privacy.
  • Secure custom network communication. For better security, restrict your product’s communication. Don’t rely entirely on the network firewall, and ensure secure communication by making your product invisible via inbound connections by default. Moreover, use encryption methods and protocols optimized to the needs of IoT systems.

Conclusion

Developers of IoT devices should think about the security of their products starting from the development stage. However, it’s hard to find experienced professionals who can adapt security technologies to the needs of IoT devices. Developing secure IoT products requires the skills of hardware security engineers, engineers with a lot of experience designing secure software, and quality assurance specialists with a lot of experience in penetration testing. Our Apriorit team provides kernel and driver development for Linux as well as services to ensure digital security and software quality. We would be glad to become your long-term partner in IoT development.

Though it’s quite difficult to create a programming language better than C, C++, or Java, Mozilla has managed to develop a language that can ensure better security and privacy on the internet. Rust, which only appeared in 2010, has already become one of the most-loved languages by programmers. Thanks to its innovative features, the language allows novice developers to mitigate security vulnerabilities and benefit from faster software performance.

This Rust programming language tutorial based on our experience at Apriorit will provide you with a deep look into Rust features and their practical application. This four-article series will be useful for programmers who wish to know more about the options that the Rust language provides.

 

Written by:

Alexey Lozovsky,

Software Designer

 

Contents:

Introduction

Summary of Features

Rust Language Features

    Zero-Cost Abstractions

    Move semantics

Conclusion

Introduction

Rust is focused on safety, speed, and concurrency. Its design allows you to develop software with great performance by controlling a low-level language using the powerful abstractions of a high-level language. This makes Rust both a safer alternative to languages like C and C++ and a faster alternative to languages like Python and Ruby.

The majority of safety checks and memory management decisions are performed by the Rust compiler, so the program’s runtime performance isn’t slowed down by them. This makes Rust a great choice for use cases where safe but garbage-collected languages like Java aren’t a good fit:

  • Programs with predictable resource requirements
  • Embedded software
  • Low-level code like device drivers

Rust can be used for web applications as well as for backend operations due to the many libraries that are available through the Cargo package registry.

Summary of Features

Before describing the features of Rust, we’d like to mention some issues that the language successfully manages.

Issue: Preferring code duplication to abstraction due to the high cost of virtual method calls
Rust’s solution: Zero-cost abstraction mechanisms

Issue: Use-after-free and double-free bugs, dangling pointers
Rust’s solution: Smart pointers and references avoid these issues by design; compile-time restrictions on raw pointer usage

Issue: Null dereference errors
Rust’s solution: Optional types as a safe alternative to nullable pointers

Issue: Buffer overflow errors
Rust’s solution: Range checks performed at runtime; checks are avoided where the compiler can prove they’re unnecessary

Issue: Data races
Rust’s solution: Built-in static analysis detects and prevents possible data races at compilation time

Issue: Uninitialized variables
Rust’s solution: The compiler requires all variables to be initialized before first use; all types have defined default values

Issue: Legacy design of utility types heavily used by the standard library
Rust’s solution: Built-in, composable, structured types (tuples, structures, enumerations); pattern matching allows convenient use of structured types; the standard library fully embraces pattern matching to provide easy-to-use interfaces

Issue: Embedded and bare-metal programming place high restrictions on the runtime environment
Rust’s solution: Minimal runtime size (which can be reduced even further); no built-in garbage collector, thread scheduler, or virtual machine

Issue: Using existing libraries written in C and other languages
Rust’s solution: Only header declarations are needed to call C functions from Rust, or vice versa; no overhead in calling C functions from Rust or calling Rust functions from C

Now let’s look more closely at the features provided by the Rust programming language and see how they’re useful for developing system software.

Rust Language Features

In the first article of this Rust language programming tutorial, we’ll describe two key features: zero-cost abstractions and move semantics.

Zero-Cost Abstractions

Zero-cost (or zero-overhead) abstractions are one of the most important features explored by C++. Bjarne Stroustrup, the creator of C++, describes them as follows:

“What you don’t use, you don’t pay for.” And further: “What you do use, you couldn’t hand code any better.”

Abstraction is a great tool used by Rust developers to deal with complex code. Generally, abstraction comes with runtime costs because abstracted code is less efficient than specific code. However, with clever language design and compiler optimizations, some abstractions can be made to have effectively zero runtime cost. The usual sources of these optimizations are static polymorphism (templates) and aggressive inlining, both of which Rust embraces fully.

Iterators are an example of a commonly used (and thus heavily optimized) abstraction: they decouple algorithms for sequences of values from the concrete containers holding those values. Rust iterators provide many built-in combinators for manipulating data sequences, enabling concise expressions of a programmer’s intent. Consider the following code:

// Here we have two sequences of data. These could be stored in vectors
// or linked lists or whatever. Here we have _slices_ (references to arrays):
let data1 = &[3, 1, 4, 1, 5, 9, 2, 6];
let data2 = &[2, 7, 1, 8, 2, 8, 1, 8];
 
// Let’s compute some valuable results from them!
let numbers =
    // By iterating over the first array:
    data1.iter()            // {3,      1,      4,      ...}
    // Then zipping this iterator with an iterator over another array,
    // resulting in an iterator over pairs of numbers:
    .zip(data2.iter())      // {(3, 2), (1, 7), (4, 1), ...}
    // After that we map each pair into the product of its elements
    // via a lambda function and get an iterator over products:
    .map(|(a, b)| a * b)    // {6,      7,      4,      ...}
    // Given that, we filter some of the results with a predicate:
    .filter(|n| *n > 5)     // {6,      7,              ...}
    // And take no more than 4 of the entire sequence which is produced
    // by the iterator constructed to this point:
    .take(4)
    // Finally, we collect the results into a vector. This is
    // the point where the iteration is actually performed:
    .collect::<Vec<_>>();
 
// And here is what we can see if we print out the resulting vector:
println!("{:?}", numbers);  // ===> [6, 7, 8, 10]

Combinators use high-level concepts such as closures and lambda functions that would have significant costs if compiled naively. However, thanks to optimizations powered by LLVM, this code compiles as efficiently as the explicit hand-coded version shown here:

use std::cmp::min;
 
let mut numbers = Vec::new();
 
for i in 0..min(data1.len(), data2.len()) {
    let n = data1[i] * data2[i];
 
    if n > 5 {
        numbers.push(n);
    }
 
    if numbers.len() == 4 {
        break;
    }
}

While this version is more explicit in what it does, the code using combinators is easier to understand and maintain. Switching the type of container where values are collected requires changes in only one line with combinators versus three in the expanded version. Adding new conditions and transformations is also less error-prone.
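
As a quick illustration (a sketch, not from the original article), redirecting the same pipeline into a different container only changes the collection step:

use std::collections::HashSet;

let numbers = data1.iter()
    .zip(data2.iter())
    .map(|(a, b)| a * b)
    .filter(|n| *n > 5)
    .take(4)
    .collect::<HashSet<_>>();   // only this line changed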

Iterators are a Rust example of the “couldn’t hand code any better” promise. Smart pointers are an example of the “don’t pay for what you don’t use” approach in Rust.

The C++ standard library has a shared_ptr template class that’s used to express shared ownership of an object. Internally, it uses reference counting to keep track of an object’s lifetime. An object is destroyed when its last shared_ptr is destroyed and the count drops to zero.

Note that objects may be shared between threads, so we need to avoid data races in reference count updates. One thread must not destroy an object while it’s still in use by another thread. And two threads must not concurrently destroy the same object. Thread safety can be ensured by using atomic operations to update the reference counter.

However, some objects (e.g. tree nodes) may need shared ownership but may not need to be shared between threads. Atomic operations are unnecessary overhead in this case. It may be possible to implement some non_atomic_shared_ptr class, but accidentally sharing it between threads (for example, as part of some larger data structure) can lead to hard-to-track bugs. Therefore, the designers of the Standard Template Library chose not to provide a single-threaded option.

On the other hand, Rust is able to distinguish these use cases safely and provides two reference-counted wrappers: Rc for single-threaded use and Arc with an atomic counter. The cherry on top is the ability of the Rust compiler to ensure at compilation time that Rcs are never shared between threads (more on this later). Therefore, it’s not possible to accidentally share data that isn’t meant to be shared and we can be freed from the unnecessary overhead of atomic operations.
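
Here is a minimal sketch (not from the original article) of both wrappers in action. An Arc can cross a thread boundary; attempting the same with an Rc is a compile-time error:

use std::rc::Rc;
use std::sync::Arc;
use std::thread;

let local = Rc::new(5);       // non-atomic reference counting, single-threaded only
let shared = Arc::new(5);     // atomic reference counting, safe to share across threads

println!("{} {}", local, shared);   // both work within a single thread

let shared_clone = Arc::clone(&shared);
thread::spawn(move || println!("{}", shared_clone)).join().unwrap();

// The same with Rc does not compile:
// error: `Rc<i32>` cannot be sent between threads safely
// let local_clone = Rc::clone(&local);
// thread::spawn(move || println!("{}", local_clone));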

Move Semantics

C++11 has brought move semantics into the language. This is a source of countless optimizations and safety improvements in libraries and programs by avoiding unnecessary copying of temporary values, enabling safe storage of non-copyable objects like mutexes in containers, and more.

Rust recognizes the success of move semantics and embraces them by default. That is, all values are in fact moved when they’re assigned to a different variable:

let foo = Foo::new();
let bar = foo;          // the Foo is now in bar

The punchline here is that after the move, you generally can’t use the previous location of the value (foo in our case) because no value remains there. But C++ doesn’t make this an error. Instead, it declares foo to have an unspecified value (defined by the move constructor). In some cases, you can still safely use the variable (like with primitive types). In other cases, you shouldn’t (like with mutexes).

Some compilers may issue a diagnostic warning if you do something wrong. But the standard doesn’t require C++ compilers to do so, as use-after-move may be perfectly safe. Or it may not be and might instead lead to an undefined behavior. It’s the programmer’s responsibility to know when use-after-move breaks and to avoid writing programs that break.

On the other hand, Rust has a more advanced type system and it’s a compilation error to use a value after it has been moved, no matter how complex the control flow or data structure:

error[E0382]: use of moved value: `foo`
  --> src/main.rs:13:1
   |
11 | let bar = foo;
   |     --- value moved here
12 |
13 | foo.some_method();
   | ^^^ value used here after move
   |

Thus, use-after-move errors aren’t possible in Rust.

In fact, the Rust type system allows programmers to safely encode more use cases than they can with C++. Consider converting between various value representations. Let’s say you have a string in UTF-8 and you want to convert it to a corresponding vector of bytes for further processing. You don’t need the original string afterwards. In C++, the only safe option is to copy the whole string using the vector copy constructor:

std::string string = "Hello, world!";
std::vector<uint8_t> bytes(string.begin(), string.end());

However, Rust allows you to move the internal buffer of the string into a new vector, making the conversion efficient and disallowing use of the original string afterwards:

let string = String::from("Hello, world!");
let bytes = string.into_bytes();        // string may not be used now

Now, you may think that it’s dumb to move all values by default. For example, when doing arithmetic we expect that we can reuse the results of intermediate calculations and that an individual constant may be used more than once in the program. Rust makes it possible to copy a value implicitly when it’s assigned to a new variable, based on its type. Numbers are an example of such a copyable type, and any user-defined type can also be marked as copyable with the #[derive(Copy, Clone)] attribute.
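
A minimal sketch (the Point type is hypothetical) of a user-defined copyable type:

#[derive(Copy, Clone)]
struct Point {
    x: i32,
    y: i32,
}

let a = Point { x: 1, y: 2 };
let b = a;                        // a is copied, not moved
println!("{} {}", a.x, b.y);      // both variables are still usable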

Conclusion

Considering the increasing popularity of the Rust programming language, we wanted to share our experience in software development using Rust. In the second article of this series, we’ll describe the language’s memory safety features including ownership, borrowing, mutability, null pointer alternatives, and the absence of uninitialized variables.

This is the second part in our series File System Virtualization – The New Perspective.

In this part, we’ll demonstrate how to implement a virtual disk plugin for one of the most popular cloud storage services, Google Drive. This plugin will provide you with transparent access to cloud files and folders.

Note: This plugin requires our previously implemented virtual disk service.

 

Written by:

Anton Akhmetshyn

Software Designer

 

Contents:

Plugin API

Virtual Disk Interface

    Google Drive API

    Sample API Implementation

    Local File Caching

Plugin in Action

Conclusion

References

Plugin API

First of all, we’ll need a common cloud plugin interface that can be used by all future plugins that we decide to implement:

class ICloudWorker
{
public:
    virtual ~ICloudWorker(){}
 
    virtual std::wstring GetPluginName() = 0;
    virtual std::shared_ptr<ICloudItem> GetRoot() = 0;
    virtual void GetSpaceInformation(...) = 0;
};
 

Since a cloud file can represent either an actual file or a directory, we can work with such files using the generic ICloudItem interface:

class ICloudItem
{
public:
    virtual ~ICloudItem() {}
 
    virtual ItemType GetType() const = 0;
    virtual const std::string& GetId() const = 0;
    virtual __int64 GetVersion() const = 0;
    virtual const std::wstring& GetName() const = 0;
    virtual const std::string& GetCloudName() const = 0;
    virtual __int64 GetCreationTime() const = 0;
    virtual __int64 GetLastAccessTime() const = 0;
    virtual __int64 GetLastWriteTime() const = 0;
    virtual __int64 GetChangeTime() const = 0;
    virtual __int64 GetSize() const = 0;
    virtual void Rename(...) = 0;
    virtual void SetInfo(...) = 0;
    virtual void SetSize(...) = 0;
    virtual void Update(...) = 0;
};

Plugins take the form of a Windows DLL file. This allows you to load and unload them on demand through the virtual disk service. Here’s a possible wrapper for such a plugin:

class CProtocolLibHolder
{
public:
    explicit CProtocolLibHolder(LPCWSTR libraryFileName);
    virtual ~CProtocolLibHolder();
 
    IProtocolManager *GetProtocol(IServiceManager *pServiceManager);
 
private:
    void LoadProtocolLib();
    void FreeProtocolLib();
};

And here’s an example of LoadProtocolLib method implementation:

void CProtocolWrapper::CProtocolLibHolder::LoadProtocolLib()
{
    _ASSERTE(NULL == m_hModule);
    if (NULL == m_hModule)
    {
        // load library
        m_hModule = ::LoadLibrary(m_LibraryFileName);
        if (NULL == m_hModule)
        {
            throw cmn::WinException("Failed to load module");
        }
 
        try
        {
            // get pointers to functions
            (FARPROC&)m_pfnGetProtocol = ::GetProcAddress(m_hModule, GET_PROTOCOL_PROC_NAME);
            if (NULL == m_pfnGetProtocol)
            {
                throw cmn::WinException("Failed to get proc address");
            }
 
            InitFunc *m_pfnInit = NULL;
            (FARPROC&)m_pfnInit = ::GetProcAddress(m_hModule, INIT_PROC_NAME);
            if(m_pfnInit)
                m_pfnInit(m_hModule, CMfDiskService::GetLog());
        }
        catch(const cmn::WinException&)
        {
            this->FreeProtocolLib();
            throw;
        }
    }
}

The dll plugin needs to export only three functions, and it works as a wrapper for ICloudWorker:

namespace google
{
    class Protocol : public cmn::CommonProtocolManager
    {
    protected:
        virtual std::unique_ptr<cmn::ICloudWorker> CreateCloudWorker(MF_CONNECT_INFO* info);
    };
}
 
void InitProtocolLib(...)
{
    // Do module initialization
}
 
void UninitProtocolLib(...)
{
    // Do module cleanup
}
 
IProtocolManager* GetProtocol(IServiceManager*)
{
    try
    {
        return new google::Protocol();
    }
    catch (const std::exception&)
    {
        return NULL;
    }
}

Virtual Disk Interface

Once the plugin API is in place, you’ll need to implement a Google Drive protocol for your disk service. Using the ICloudWorker interface, you can create a single common protocol class without the need to create separate implementations for each cloud plugin you may add in the future.

Google Drive API

Google Drive provides a RESTful API for interacting with files in the cloud. Both API versions 2 and 3 are currently supported. You can learn more in Google’s API Reference. We’ll cover only part of this material for the sake of simplicity.

Bindings exist for some common programming languages such as Java, JavaScript, C#, Objective-C, PHP, and Python. Unfortunately, C++ isn’t among them so you’ll need to write some boilerplate code too. Tools such as curl or the more modern cpprestsdk can significantly reduce the amount of routine work.

Every request to the Drive API must be authorized using OAuth 2.0, which cpprestsdk supports. That’s beyond the scope of this article, though.

The Drive API supports the following file and folder requests:

  • create
  • update
  • copy
  • delete
  • list
  • get
  • emptyTrash
  • export
  • watch
  • generateIds

Your Virtual Disk plugin must implement at least the bare minimum of requests required for correct file I/O handling. These requests are create, update, delete, get, and list.

Sample API Implementation

Once you define the necessary API functions, you can start implementing them.

You can represent a Drive cloud item using either a file or folder class.

class FileItem : public BaseCloudItem<FileItem>
{
public:
    FileItem(const Metadata& metadata, Operations& operations);
 
    virtual void Download(...);
    virtual void Upload(...);
};
class FolderItem : public BaseCloudItem<FolderItem>
{
public:
    FolderItem(const Metadata& metadata, Operations& operations);
 
    virtual boost::shared_ptr<cmn::IChildrenList> GetChildrenList(...);
 
    virtual cmn::CloudItemPtr CreateItem(...);
    virtual void DeleteItem(...);
    virtual std::string RenameItem(...);
    virtual std::string MoveItem(...);
};

Every item also has associated metadata serialized as a JSON object.

class Metadata
{
public:
    Metadata(const Json::Value& metadata);
 
    bool IsFolder() const;
    bool IsDownloadable() const;
 
    const std::string& GetId() const;
    const std::string& GetTitleWindows() const;
    const std::string& GetTitleOriginal() const;
    unsigned __int64 GetSize() const;
 
    unsigned __int64 GetCreatedDate() const;
    unsigned __int64 GetLastViewedDate() const;
    unsigned __int64 GetModifiedDate() const;
 
    __int64 GetVersion() const;
    const std::string& GetDownloadUrl() const;
    bool IsTrashed() const;
 
private:
    ...
};

Finally, you can write the code for the actual Drive API requests. We’ll show the get operation here; all other operations can be implemented similarly. This example uses curl to handle the HTTP protocol:

void google::Operations::get(const Metadata& metadata, HANDLE file)
{
    const std::string& url = metadata.GetDownloadUrl();
    std::wstring fileName(utils::Utf8ToUtf16(metadata.GetTitleWindows()));
    cmn::Request request = m_requestCreator->Create();
    cmn::FileWriteResultHandler handler(file);
    request.Perform(url, &handler);
 
    if (!::SetEndOfFile(file))
    {
        throw cmn::WinException("Failed to set end of file");
    }
}
 
const std::string& google::Metadata::GetDownloadUrl() const
{
    return ParseString(m_metadata, "downloadUrl");
}
 
void cmn::Request::Perform(const char* url, IRequestResultHandler* resultHandler)
{
    RequestData data(m_curl->m_curl.get(), m_eventHandler.get(), &m_header, resultHandler, &m_errorHandler);
 
    CURL_SETOPT_CHECK(m_curl->m_curl.get(), CURLOPT_WRITEDATA, &data);
    CURL_SETOPT_CHECK(m_curl->m_curl.get(), CURLOPT_HEADERDATA, &data);
    CURL_SETOPT_CHECK(m_curl->m_curl.get(), CURLOPT_HTTPHEADER, m_curl->m_headers);
    CURL_SETOPT_CHECK(m_curl->m_curl.get(), CURLOPT_URL, url);
 
    PerformLoop();
}

Local File Caching

Since the Drive API doesn’t support partial file downloads, you should synchronize all files locally before making changes. That way all file system operations will be performed in the cache once the file download is complete.

class ICloudCache
{
public:
    virtual ~ICloudCache () {}
 
    virtual bool Exists() const;
    virtual void Create();
    virtual void Delete();
 
    virtual void Disconnect() = 0;
 
    virtual void CacheCreateFile(...) = 0;
    virtual void CacheCloseFile(...) = 0;
    virtual void CacheReleaseFile(...) = 0;
    virtual void CacheQueryFileInfo(...) = 0;
    virtual void CacheSetFileSize(...) = 0;
    virtual void CacheSetFileBasicInfo(...) = 0;
    virtual void CacheDeleteFile(...) = 0;
    virtual void CacheRenameFile(...) = 0;
    virtual void CacheQueryDirContents(...) = 0;
    virtual void CacheReadFile(...) = 0;
    virtual void CacheWriteFile(...) = 0;
    virtual void CacheQueryVolumeInfo(...) = 0;
    virtual void CacheSetVolumeInfo(...) = 0;
    virtual void CacheLockFile(...) = 0;
    virtual void CacheOnFileDeleted(...) = 0;
};

Here’s a possible implementation for CacheCreateFile:

void cache::FileFolderCache::CacheCreateFile(
    const std::wstring& relativePath,
    DWORD attributes,
    DWORD createDisposition,
    ACCESS_MASK access,
    WORD shareAccess,
    DWORD* createInfo)
{
    HANDLE handle = NULL;
    DWORD result = m_ntdll.NtCreateFile(
        GetFullPath(relativePath),
        attributes,
        createDisposition,
        access,
        shareAccess,
        &handle,
        createInfo);
 
    if (result != ERROR_SUCCESS)
    {
        throw cmn::WinException("Failed to create file", result);
    }
}

After applying changes to a file, you should synchronize it back to the cloud. You should do this asynchronously to ensure that multiple changes in the same file are consolidated and to reduce the number of network I/O operations.

void cache::FileFolderCache::CacheWriteFile(
    FILE_HANDLE hFileHandle,
    LARGE_INTEGER byteOffset,
    DWORD dwLength,
    PVOID pvBuffer,
    PDWORD pdwBytesWritten)
{
    // Write data into physical file
    cmn::ItemWrapperPtr item = m_fileWorker.WriteFile(
        hFileHandle, byteOffset, dwLength, pvBuffer, pdwBytesWritten);
 
    if (item == NULL)
    {
        return;
    }
 
    //
    // Queue file for further cloud synchronization
    //
    cmn::CloudItemInfo info(item->GetId(), item->GetType(), m_paths.GetPath(hFileHandle), item->GetVersion());
 
    m_contextHolder.AddItemAsChanged(info);
}

To reduce latency and improve the user experience, the cache also uses a readahead technique: the plugin downloads all cloud files in advance, starting from the drive root. By the time the system accesses a file, it’s already present in local storage.

Plugin in Action

When your plugin is complete, you can test it.

But first, you should enable the Drive API in the Google Developers Console. You can do this by following these simple steps:

  1. Use this wizard to create or select a project in the Google Developers Console and automatically enable the API. Click continue and then go to credentials.
  2. On the add credentials to your project page, click cancel.
  3. At the top of the page, select the OAuth consent screen tab. Select the email address option, enter a product name if not already set, and click save.
  4. Select the credentials tab, click the create credentials button, and select OAuth client ID.
  5. Select the other application type, enter the name “Virtual Disk plugin,” and click create.
  6. Click OK to dismiss the dialog box that opens.
  7. Save both the client ID and secret for use in the mounting tool.

Now you can mount and access the disk:

Virtual Disk plugin

Your cloud storage will now be visible as a local disk in Windows Explorer:

Cloud Storage in Windows Explorer

Cloud Storage in Windows Explorer 2

You can seamlessly access all disk files.

Disk files

Procmon.exe Properties

Conclusion

In this article, we’ve shown the steps required to implement a cloud storage plugin for a virtual disk using file system virtualization. This plugin is designed to work with Google Drive, but what we’ve created is a solid interface foundation allowing for quick integration with other cloud plugins as well: DropBox, Box, Adobe Creative Cloud, and more.

References

  1. https://www.apriorit.com/dev-blog/438-file-system-virtualization
  2. https://developers.google.com/drive/v3/reference/
  3. https://curl.haxx.se/docs/
  4. https://msdn.microsoft.com/en-us/

As SaaS solutions become more popular, companies need to pay more attention to data protection. Important corporate data stored in the cloud should be protected as reliably as data stored on-site. This article will be useful for developers who are working on their own cloud backup solution.

 

Written by:

Vasyl Tsyktor,

Market Research Specialist

 

Contents:

Importance of a Data Backup Service for SaaS

Why Cloud Services Need Backup

Overview of Existing Solutions

Building Your Own Cloud Backup Solution

Requirements for Cloud Backup Solutions

Conclusion

Importance of a Data Backup Service for SaaS

One of the most significant benefits of cloud storage is that a cloud service vendor is responsible for data management and is in control of your SaaS data security and backups. While this might be true in most cases, not all cloud service providers can help recover a company’s data if a single change made by an employee causes data corruption or even loss.

In a report published in February 2017, Gartner predicts that the public cloud market will grow by about $38 billion by the end of 2017 to reach an estimated $246.8 billion, and will hit $287.8 billion in 2018. Gartner suggests that cloud services, including Software as a Service (SaaS), which represents nearly 18.8% of the market (about $46.331 billion), are now in a period of stabilization.

With the growing popularity of SaaS solutions, companies that maintain data in the cloud are getting interested in cloud-to-cloud backup tools for the following reasons:

  • Cloud-to-cloud backup solutions are easy to implement. Cloud-based software doesn’t require large initial infrastructure investment and can be easily deployed with just an agent installation.
  • Cloud-to-cloud backup solutions have predictable costs. With no big upfront costs, companies can focus on maintaining current operational costs regardless of their chosen backup solution.
  • Cloud-to-cloud backup solutions are simple to manage. Since a service provider is responsible for data management, the only thing a company has to worry about is backing up its servers.

There are a few concerns about cloud-to-cloud backups, like the following:

  • Pricing and licensing schemes for cloud-to-cloud backup systems can be quite complex. If a company needs to back up a lot of services, it may find itself paying a premium for licenses, since basic offerings only include support for one (e.g. Microsoft Office 365 backup) or a handful of services.
  • No service level agreement. The absence of a service level agreement can lead to situations where the state of a cloud provider’s cybersecurity cannot be reliably verified.
  • Slow internet connections. Small companies often use slow internet connections, which makes downloading large amounts of data quite difficult.

Why Cloud Services Need Backup

Cloud-to-cloud backup ensures that data stored on distributed cloud-based platforms – such as Salesforce, Microsoft Office 365, and Google Apps – is safe. Cloud-to-cloud backup solutions allow you to easily recover data from any point in time. Google Apps, for instance, allows data restoration only within 25 days and according to an all-or-nothing principle. Therefore, there’s a need on the market for new cloud-to-cloud solutions.

Companies often mistakenly believe that their SaaS provider also has to be their cloud backup provider since they’ve created their data in the cloud, as their vendor presumably creates backups for its own purposes.
Salesforce, for example, provides a special service called Data Recovery that allows organizations to recover their data at any time. But to use Data Recovery, clients need to pay $10,000. Salesforce states that this high cost is based on the large amount of time (20 days or more) of manual work that is necessary to restore this data.

This is why a company can’t fully entrust critical cloud-based data backups to their cloud service provider. Cloud-to-cloud backup for enterprise ensures that a client will always have a copy of data that’s stored in the cloud in case something goes wrong. Unfortunately, businesses are used to learning lessons the hard way. Companies realize how much data recovery can cost when they partner with an unreliable SaaS provider, deploy a SaaS model they can’t fully control, or face a situation where an employee makes a single unneeded change in Salesforce. Alternatively, they could just have a copy of their own data ready and resolve the situation in minutes. The need for effective cloud-to-cloud backup tools is obvious.

Overview of Existing Solutions

Cloud-to-cloud backup solutions let you implement scalable, manageable, and dependable cloud-based data backups. When developing a particular SaaS system or striving to improve your SaaS backup strategy, you should follow the same principles you would when managing on-site deployments. When developing your cloud-to-cloud backup and data protection solution, consider the following:

  • Performance. Before offering backup software, first test its performance. You might need to develop a backup speed testing tool if the vendor doesn’t provide one; this tool should be able to perform upload and download speed tests as well as latency tests (a small sketch follows this list).
  • How saved backups work for actual recovery. Test to make sure that backups will work in an emergency.
  • SSAE 16 (Statement on Standards for Attestation Engagements) compliance. This is a mandatory standard for US service organizations for reporting on their system and security controls and is comparable to the international standard ISAE 3402.
  • Pricing. Evaluating the cost of a cloud backup service involves comparing total expenses, not just the starting price. Vendors usually charge for the average amount of stored data per year, though there may be exceptions.
  • The ability of backup and restore processes to meet recovery time objectives (RTO) and recovery point objectives (RPO) for your company’s customers, including RTO and RPO requirements for third-party SaaS applications.
  • Customer support. The ability of customer support provided by the cloud-to-cloud backup and recovery vendor to meet your corporate requirements.
  • Privileged users. Specify whether backup exceptions can be set for particular users.
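
As an illustration of the performance point above, here’s a minimal sketch (ours; the upload routine is assumed to be supplied by the caller) of measuring upload throughput:

#include <chrono>
#include <cstdint>
#include <functional>

// Times a caller-supplied upload routine and returns the throughput in Mbit/s.
double MeasureUploadMbps(const std::function<void()>& uploadFn, uint64_t bytesSent)
{
    const auto start = std::chrono::steady_clock::now();
    uploadFn();
    const double seconds =
        std::chrono::duration<double>(std::chrono::steady_clock::now() - start).count();
    return (bytesSent * 8.0) / (seconds * 1000000.0);
}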

Let’s consider some examples of SaaS backup solutions.

Backupify

Backupify supports Salesforce and Google Apps and provides SaaS developers, application providers, and other companies with capabilities to protect and restore cloud server files via its own custom API. Backupify can also back up data from Microsoft Office 365 and Box.

Spanning

Spanning is a SaaS data backup service that creates daily backups of data stored in Google Apps (G Suite) and Salesforce. With Spanning, however, users cannot change the time of automatic backups and can only create an additional copy of data manually. Spanning ensures that backed-up data remains accessible as long as an account is active.

Asigra

Asigra was the first vendor to integrate cloud-to-cloud backup functionality into a multi-functional backup app. Asigra offers its Asigra Cloud Backup software only through its partners.

Cloud-to-cloud backup was first added to the platform in 2012. At the time, Asigra could only back up Salesforce data; since then, the company has added the ability to restore data created in both Microsoft Office 365 and Google Apps. With Asigra’s Cloud to Cloud Backup, administrators can integrate and manage backup data from cloud services along with a client’s servers, virtualized environments, and endpoint devices. Companies can store backup data either on-site or in the cloud of any given partner. Admins can schedule automated backups at regular intervals and retain their cloud-based data to comply with company regulations.

OwnBackup

OwnBackup is another backup service for SaaS applications. OwnBackup allows administrators to take a snapshot of their database and compare it with another backup created at another point in time. Administrators can then decide which version to recover. The solution is designed to recover only corrupted data instead of all data, which ensures fast recovery of the complete database.

Other cloud-to-cloud backup solutions include Syscloud and CloudAlly. These tools allow users to create copies of data stored on cloud-based apps and transfer that data to another cloud storage provider. These copies usually contain audit logs and metadata that help admins easily find credentials.

Building Your Own Cloud Backup Solution

When you decide to create your own backup software for SaaS, you should first determine the set of features you need. The following features are in high demand and are part of the most competitive cloud-to-cloud backup solutions currently on the market:

  • Data encryption prior to transfer – Encrypting data before transferring prevents access by unauthorized users.
  • Deduplication – Data deduplication is a technique that eliminates redundant copies of data, allowing companies to optimize their storage resources and decrease bandwidth requirements (a minimal sketch follows this list).
  • Hybrid cloud backup – Cached backups stored on a company’s premises reduce the time needed to restore data.
  • Extracting and saving cloud-based data to physical devices – Storing cloud-based data on a physical disk on-site reduces time for both initial backup and data restoration.
  • Ongoing backups (incremental forever) – Perform one initial database backup and then save ongoing backups with active users in the database instead of backing up the whole database every time. This reduces the amount of data coming and going across a company’s network.
  • Sub-file-level backups – This feature reduces the volume of data that needs to be copied by only backing up changed parts within individual files and works best with large files.
  • Bandwidth options – Compressing data and scheduling backups to avoid impacting users on the corporate network.
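
To illustrate the deduplication item above, here’s a minimal sketch (ours, not taken from any particular product) of fixed-size, block-level deduplication: the client hashes chunks of a file and schedules for upload only the chunks whose hashes haven’t been seen before. A real implementation would use a cryptographic hash such as SHA-256 and usually content-defined chunking:

#include <cstdint>
#include <functional>
#include <string>
#include <unordered_set>
#include <vector>

// Returns only the chunks whose hashes are not yet known to the backup storage.
std::vector<std::string> SelectChunksToUpload(const std::string& fileData,
                                              std::unordered_set<uint64_t>& knownHashes,
                                              size_t chunkSize = 4 * 1024 * 1024)
{
    std::vector<std::string> newChunks;
    for (size_t offset = 0; offset < fileData.size(); offset += chunkSize)
    {
        const std::string chunk = fileData.substr(offset, chunkSize);
        const uint64_t hash = std::hash<std::string>{}(chunk);
        if (knownHashes.insert(hash).second) // insert() reports whether the hash is new
        {
            newChunks.push_back(chunk);
        }
    }
    return newChunks;
}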

Requirements for Cloud Backup Solutions

In addition to necessary features, your cloud-to-cloud backup solution should meet certain requirements to ensure the efficiency of data backups and restores. Let’s cover these requirements in detail.

Regulatory Compliance

Your cloud-to-cloud data backup and recovery solution should ensure compliance with regulations such as the Health Insurance Portability and Accountability Act (HIPAA). Even though your company may not currently work with health service providers, you may in the future. Therefore, you should consider implementing data security measures to ensure compliance.

Data Backup Frequency

Define how often users should back up their data. Should users be able to set a custom schedule or should they use a regular schedule? Or both? Your solution should also let users manually make backups at any time.

Effective Search

It’s hard to remember the name of each file stored in a database. Therefore, your cloud-to-cloud data backup solution should have a convenient search feature that will help your users quickly find files.

Conclusion

Generating backups is necessary to protect data created in SaaS services from corruption or loss. Since there are various SaaS backup services on the market, you should consider the pricing and features of existing services in order to develop a competitive solution. Our team at Apriorit can help you develop a cloud-to-cloud data backup tool that will meet the industry standards while fully reflecting your business requirements. We’ve been successfully delivering high-performance SaaS platforms for many years to technology vendors all over the world, and we have a lot of expertise in cloud computing. Send us your request for proposal and let’s talk.


SOCI is a free database access library that’s written in C++. The library itself is distributed via SourceForge and the documentation can be found both on the official website and SourceForge.

Initially designed as a C++ alternative to Oracle Call Interface, the scope of SOCI has greatly expanded over time. This library now serves as an easy way to access databases with C++ and supports different types of databases including MySQL, ODBC, Oracle, PostgreSQL, and SQLite3 with plans to add support for other popular database types as well.

As a company specializing in C++ development, we often use SOCI when we need to work with MySQL in C++, as it allows us to essentially create SQL queries within standard C++ code. In this article, we’ll show you a practical example of how to use SOCI with MySQL in order to create a simple SOCI MySQL backend.

 

Written by:

Sergey Stepanchuk,

Software Developer at System Programming Team

 

Contents:

Creating a DBTableBase Class

Creating a Table

Insert

Update

Delete

Transaction

Select

Conclusion

References

 

Creating a DBTableBase Class

One of the prerequisites to implementing a full-fledged application with MySQL and SOCI is having a class that contains Insert, Update, and Delete functions, among others. Let’s name this class DBTableBase.

template <typename RowType = soci::row>
class DBTableBase
{
public:
    explicit DBTableBase(soci::session& session, const std::string& tableName)
        : m_session(session)
        , m_tableName(tableName)
    {
    }

    /* Other methods */
};

Our DBTableBase class must contain a member holding the name of the table:

std::string m_tableName; 

It must also include the main interface for communicating with SOCI:

soci::session& m_session;

Now we need to derive a class from this template for each table in the database. In our example, it will look like this:

class Students : public DBTableBase<db::Student>
{
public:
    explicit Students(soci::session& session)
        : DBTableBase(session, g_studentsTableName)
    {
    }

    /* Other methods */
};

Creating a Table

As an example, let’s create a small table called Students that will contain information on current students at a university. Let’s add id, name, last_name, and faculty fields.

Table 1

id name last_name faculty
... ... ... ...

In order to create the table, first we need to define the structure of the fields within it. The best way to do this is with BOOST_FUSION_DEFINE_STRUCT, which defines and adapts the structure as a model of Random Access Sequence.

namespace db
{
    typedef boost::optional<int64_t> IdType;
}

BOOST_FUSION_DEFINE_STRUCT((db), Student,
    (db::IdType, id)
    (std::string, name)
    (std::string, last_name)
    (std::string, faculty)
)

The id field is declared with the help of boost::optional. This makes it possible to determine whether the field has been initialized: if the id field hasn’t been initialized, it will contain the value boost::none. This feature is very useful for fields that can contain a null value (keeping in mind that in MySQL, null and 0 are different).
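
As a quick illustration (ours, not from the article), the optional id behaves like this:

db::Student student;             // default-constructed: student.id is boost::none

if (!student.id)
{
    // No value yet - in MySQL terms this maps to NULL, not to 0.
}

student.id = 42;                 // e.g. the key assigned by AUTO_INCREMENT after an INSERT
int64_t storedId = *student.id;  // dereference only when a value is present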

Next, we’ll define several constants:

const char g_nameId[] = "id";
const char g_nameIdType[] = "BIGINT PRIMARY KEY AUTO_INCREMENT";
const char g_notNull[] = "NOT NULL";
const char g_studentsTableName[] = "Students_table";
const std::string g_nameType("VARCHAR(255)");

Then we’ll create a Students object:

Students students(session);

And then create the table:

students.CreateTable();

Next, with the help of SOCI, we can construct a query to create the Students table:

void Students::CreateTable() const
{
    m_session << BeginCreateTableQuery()
        << ", " << GetColName<1>() << " " << g_nameType << " " << g_notNull
        << ", " << GetColName<2>() << " " << g_nameType << " " << g_notNull
        << ", " << GetColName<3>() << " " << g_nameType << " " << g_notNull
        << ", " << DeleteCascadeQuery(GetColName<3>(), "Faculty_table")
        << EndCreateTableQuery();
}

The method DBTableBase::BeginCreateTableQuery can be implemented as follows:

std::string DBTableBase::BeginCreateTableQuery() const
{
    return std::string("CREATE TABLE ") + m_tableName + " (" + g_nameId + " " + g_nameIdType;
}
 

The GetColName template function is defined like this:

template<int col>
static std::string DBTableBase::GetColName()
{
    return boost::fusion::extension::struct_member_name<RowType, col>::call();
}
 

This construct will return the name of the specified column.

If you need to set a column for cascade deletion, you can use this:

std::string DBTableBase::DeleteCascadeQuery(const std::string& colName, const std::string& tableName) const
{
    return std::string("CONSTRAINT fk_") + m_tableName + "_"+ colName + " FOREIGN KEY(" + colName +") REFERENCES " + tableName + "(" + g_nameId + ") ON DELETE CASCADE";
} 
 

At the end of our query, we need to specify:

std::string DBTableBase::EndCreateTableQuery() const
{
    return ");";
}
 

We’ve now finished creating the Students_table.

Insert

Next, we need to fill this table with data. Let’s look at how we can insert a few rows into our Students table so it looks like this:

Table 2

id name last_name faculty
0 William Taylor FEIT
1 Mary Davies FEIT
2 Jack Smith FEIT

First, let’s create three objects in db::Student:

db::Student studentTaylor(db::IdType(), "William", "Taylor", "FEIT");
db::Student studentDavies(db::IdType(), "Mary", "Davies", "FEIT");
db::Student studentSmith(db::IdType(), "Jack", "Smith", "FEIT");

Then we’ll call the Insert function to add them to DB:

students.Insert(studentTaylor);
students.Insert(studentDavies);
students.Insert(studentSmith);

Here’s the implementation of the Insert function:

int64_t DBTableBase::Insert(const RowType& row) const
{       	 
    std::stringstream query;
    query << "INSERT INTO " << m_tableName << " (";
    std::vector<std::string> colNames = GetColNames();
    std::stringstream colsNames;
    std::stringstream colsValues;
    for (const auto& col : colNames)
    {
        if (!colsNames.str().empty())
        {
            colsNames << ", ";
            colsValues << ", ";
        }
        colsNames << col;
        colsValues << ":" << col;
    }
    query << colsNames.str() << ") VALUES (" << colsValues.str() << ")";
    DB_TRY
    {
        m_session << query.str(), soci::use(row);
        int64_t id = 0;
        m_session.get_last_insert_id(m_tableName, id);
        return id;
    }
    DB_CATCH
    return 0;
}
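
Note that Insert wraps the database call in DB_TRY and DB_CATCH. The article never defines these macros, so the following is only an assumed minimal definition: thin wrappers around try/catch that log the error:

#include <iostream>

// Assumed definitions (not shown in the article): wrap the guarded block
// in try/catch and log any database exception.
#define DB_TRY try
#define DB_CATCH \
    catch (const std::exception& ex) \
    { \
        std::cerr << "Database error: " << ex.what() << std::endl; \
    }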

Here’s the implementation of the GetColNames function:

static std::vector<std::string> DBTableBase::GetColNames()
{
    std::vector<std::string> res;
    ::GetColNames<RowType>::Call(res); // the helper struct defined below
    return res;
}

And here’s the implementation of the Call function:

template<typename RowType, int index = boost::fusion::result_of::size<RowType>::value - 1>
struct GetColNames
{
    static void Call(std::vector<std::string>& val)
    {
        GetColNames<RowType, index - 1>::Call(val);
        std::string name = boost::fusion::extension::struct_member_name<RowType, index>::call();
        val.push_back(name);
    }
};
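
As written, this recursion has no terminating case; the article doesn’t show one, so here is an assumed minimal specialization that stops at the first member:

// Assumed base case: handle member 0 without recursing further.
template<typename RowType>
struct GetColNames<RowType, 0>
{
    static void Call(std::vector<std::string>& val)
    {
        val.push_back(boost::fusion::extension::struct_member_name<RowType, 0>::call());
    }
};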
 

Update

Now let’s consider a situation when we’ve entered incorrect data into the table. For example, we’ve entered the first name of the student with the last name Davies as Mary instead of Emily. To correct this mistake, we need to update the entry for studentDavies.

studentDavies.name = "Emily";
students.Update(studentDavies);
 

Here’s the implementation of Update:

bool DBTableBase::Update(const RowType& row) const
{
    std::stringstream query;
    query << "UPDATE " << m_tableName << " SET ";
    std::vector<std::string> colNames = GetColNames();
    for (const auto& col : colNames)
    {
        if (col != colNames.front())
        {
            query << ", ";
        }
        query << col << " = :" << col;
    }
    query << " WHERE " << g_nameId << " = " << row.id;
    m_session << query.str(), soci::use(row);
    return true;
}

After this operation, we can see the change in the table:

Table 3

id name last_name faculty
0 William Taylor FEIT
1 Emily Davies FEIT
2 Jack Smith FEIT

Delete

We can use the following command to delete students’ information by id:

students.Delete(0);
students.Delete(1);
students.Delete(2);

where:

bool DBTableBase::Delete(int64_t id) const
{
    m_session << "DELETE FROM " << m_tableName << " WHERE " << g_nameId << " = :id", soci::use(id);
    return true;
}

After this operation, we’ll have an empty table.

Transaction

If you have a series of operations that need to be performed together where an error in one operation must cause the cancellation of all previous operations, then you need to use soci::transaction:

std::unique_ptr<soci::transaction> transaction(new soci::transaction(session));
try
{
    students.Insert(studentTaylor);
    students.Insert(studentDavies);
    students.Insert(studentSmith);
    transaction->commit();
}
catch(const std::exception& ex)
{
    std::cout << "Transaction error: " << ex.what() << std::endl;
    transaction->rollback();
}

If an error occurs when adding a student to the database, all insert operations will be cancelled.

Select

Let’s consider a situation when we need to get the records for a student with id 2. In order to do this with SOCI, we need to use:

soci::rowset<db::Student> student = students.SelectByID(2);

As a result, we’ll get the following table:

Table 4

id name last_name faculty
2 Jack Smith FEIT

We can implement the SelectByID function as follows:

soci::rowset<db::Student> DBTableBase::SelectByID(const int64_t id) const
{
    std::stringstream query;
    query << "SELECT id  FROM " << m_tableName << " WHERE id = " << id;
    return m_session.prepare << query.str();
}

Let’s try to make a more complex query now – for example, let’s say we need to select all students with the name William Taylor:

Table 5

id name last_name faculty
0 William Willson FEIT
1 William Taylor FEIT
2 Taylor Davies FEIT
3 Jack Taylor FEIT

To find this student, we can pass several search parameters as key-value pairs:

typedef std::map<std::string, std::stringstream> KeyValuePairs;
KeyValuePairs value;
value["name"] << "William";
value["last_name"] << "Taylor";
soci::rowset<db::Student> rows = students.SelectByValues(value);

By executing this query, we’ll get:

Table 6

id name last_name faculty
1 William Taylor FEIT

We can implement an easy way to get the desired rows using several search parameters with the help of the following function:

soci::rowset<RowType> DBTableBase::SelectByValues(const KeyValuePairs& pairs) const
{
    std::stringstream query;
    query << "SELECT * FROM " << m_tableName << " WHERE ";
    bool first = true;
    for (const auto& pair: pairs)
    {
        if (first)
        {
            first = false;
        }
        else
        {
            query << " AND ";
        }
        query <<  pair.first << " = \"" << pair.second.str() << "\"";
    }
    return m_session.prepare << query.str();
}
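
One caveat: concatenating values directly into the SQL string, as SelectByValues does, is open to SQL injection. A safer variant (our sketch, not part of the article’s class) binds the parameters through soci::use:

// Hypothetical injection-safe variant for the name + last_name search.
// The bound variables must stay alive while the rowset is being iterated.
soci::rowset<db::Student> SelectByName(soci::session& session,
                                       const std::string& tableName,
                                       const std::string& name,
                                       const std::string& lastName)
{
    return (session.prepare << "SELECT * FROM " << tableName
                            << " WHERE name = :name AND last_name = :last_name",
            soci::use(name, "name"), soci::use(lastName, "last_name"));
}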

Conclusion

SOCI is one of the most popular and refined database access libraries for C++. Our article shows only a few ways in which you can use SOCI to work with databases, but it’s a nice illustration of the basic methodology. We hope that this information will prove useful and will encourage you to look more into SOCI in the future.

References

  1. SOCI - The C++ Database Access Library
  2. Boost Fusion Define Struct
  3. Boost.Optional
  4. MySQL

Almost every company needs a solution for protecting its sensitive data and detecting suspicious activity in real time. Besides, when an incident occurs, companies want to be able to provide digital evidence in the courtroom. A security solution that just records and stores a host’s activity logs isn’t enough. Security vendors now have to worry about both security information and event management (SIEM) and digital forensics. This article covers the implementation of forensic features in SIEM solutions and the key requirements to ensure admissibility of data in court.

Written by:

Anton Kukoba,

Security Research Leader

 

Contents:

Digital Forensics and SIEM Solutions

10 Requirements for Forensic Features in SIEM Solutions

    1. No Intrusion

    2. Integrity

    3. Accuracy

    4. Justification

    5. No Assumptions

    6. High Performance

    7. Retention

    8. Data Relevance

    9. Timestamps

    10. Commensurable Results

Conclusion

Digital Forensics and SIEM Solutions

Digital forensics is the process of providing evidence from electronic devices in order to reconstruct past events. This process includes collecting, identifying, and validating digital data to ensure its integrity and admissibility in court.

The field of digital forensics is divided into a few main branches depending on the types of devices to which the forensics are applied:

  • Cloud forensics
  • Computer forensics
  • Mobile forensics
  • Network forensics

With the increase in the number of digital devices used for business purposes, nearly every company feels the need to be able to perform digital forensics. Digital forensics capabilities help companies determine what has happened within their networks and systems and better protect their sensitive data.

While forensic products are specifically developed for certified forensic investigators, capabilities of security information and event management systems can ensure that data collected by security staff is provided in a forensic format for further analysis.

SIEM systems allow companies to collect and analyze log data in a central location from all devices/appliances and hosts and get notified about abnormal events immediately. Modern SIEM products can also correlate events in internal systems, calculate risks, and generate reports showing patterns in chaotic log data. These systems can also store and archive log data as well as parse it into events and have a query mechanism for better log construction. All these features of an enterprise SIEM solution are crucial for investigating suspicious activity and finding data breaches.

The main reason companies choose to implement a SIEM solution at the enterprise level is that centralized logging and event management helps them comply with many security standards, such as the Health Insurance Portability and Accountability Act (HIPAA), Gramm–Leach–Bliley Act (GLBA), Payment Card Industry Data Security Standard (PCI DSS), and Sarbanes–Oxley Act (SOX).

Moreover, properly stored and protected logs can be useful for internal incident investigations and can even be admissible as evidence in court.

In 2006, the National Institute of Standards and Technology (NIST) issued its Special Publication 800-86, known as the Guide to Integrating Forensic Techniques into Incident Response, which contained recommendations for establishing forensic capabilities in security solutions. In addition, the NIST Guide to Computer Security Log Management (Special Publication 800-92) provides recommendations for effective log management.

SIEM systems record and store logs collected from various devices and hosts in a local network, and this information can contain the digital fingerprints of a cyber crime. Malicious activity leaves traces in event log data and syslog data that can be detected with SIEM technologies. Forensic analysis of log data allows security staff to figure out how and when a security breach occurred as well as to determine what systems and sensitive information were compromised and which users violated security protocols.

10 Requirements for Forensic Features in SIEM Solutions

Integrating forensic features into your security solution will make it more effective and valuable for your customers. Thus, in addition to traditional log management and event correlation, you should also collect real-time data for forensic analysis.

However, developing a SIEM solution with forensic capabilities is not an easy task, as digital evidence collected by your product should be admissible in court. Court admissibility can be achieved only if forensic features can ensure adequate protection of log data. Consequently, forensic features should also meet the requirements for forensic products used by certified forensic investigators. Here are 10 requirements for forensic features in SIEM solutions.

1. No Intrusion

The forensic features of your security solution must ensure that collected data is not tampered with in any way. Typically, this is achieved by storing a copy of unmodified log entries as well as normal events in a backend database. Moreover, a SIEM solution should also have built-in functions for periodic backups and restores.

Your system should also have intrusion prevention mechanisms that can block the actions of an attacker who’s attempting to corrupt logs. If it’s problematic to guarantee that data hasn’t been modified, then the system report should provide information for forensic purposes about changes made during data collection and export. Your SIEM solution should also limit access to stored data. For instance, it should support role-based access so that only authorized security staff can access certain data.

2. Integrity

Your SIEM functionality should also allow collecting data and storing it for further forensic analysis in a tamper-proof form. This is usually achieved by using integrity mechanisms, such as running hash checks on blocks of stored log data. Historical log data must be secured either with a checksum in the form of a popular hash – MD5, SHA1, SHA2, etc. – or with a digital signature.
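
For example, here’s a minimal sketch (ours, using OpenSSL) of producing a checksum for a block of log data that can later be re-verified:

#include <openssl/sha.h>

#include <iomanip>
#include <sstream>
#include <string>

// Computes a SHA-256 digest of a log block and returns it as a hex string.
// Storing the digest alongside the block lets you detect later modification.
std::string HashLogBlock(const std::string& logBlock)
{
    unsigned char digest[SHA256_DIGEST_LENGTH];
    SHA256(reinterpret_cast<const unsigned char*>(logBlock.data()),
           logBlock.size(), digest);

    std::ostringstream hex;
    for (unsigned char byte : digest)
    {
        hex << std::hex << std::setw(2) << std::setfill('0') << static_cast<int>(byte);
    }
    return hex.str();
}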

3. Accuracy

SIEM tools should collect all possible information about all activities that occur across the network and also provide information about failures while capturing and processing log data. Ensuring the stable operation of your security solution is crucial if you want to collect information that’s admissible in court and valid for proving compliance. If your SIEM system crashes, your log data will no longer be accurate and may be rejected as evidence. The accuracy of abnormal behavior detection is also important for incident response. All these considerations impose high requirements on the competence of your quality assurance team.

4. Justification

While justification of data for forensic products is achieved through information about a file’s physical path and the offset to the data in this file, your security solution should be able to justify the information it presents for forensic analysis. There must be a way to describe what’s being logged and why as well as how log data is captured, stored, and analyzed.

5. No Assumptions

Often, in pursuit of usability, development teams optimize the log data shown in their system reports. However, there may be challenges in linking events collected by different log sources. For example, different devices may generate logs for the same user that contain different content. Thus, a SIEM system may simultaneously receive the same data but recorded as different values. For instance, one source may record the IP address of the user while another source records the name of the user but not their IP address. But what if that person used someone else’s computer to perform malicious activity? Thus, the initial log data must be collected, no matter how inconvenient it may seem. Otherwise, invalid assumptions may lead incident investigations in the wrong direction. Assumptions may still be used to provide hints for security staff or forensic investigators, but the conclusions should be made by investigators themselves.

6. High Performance

The amount of information that a SIEM solution needs to process grows dramatically each year. Today, it’s terabytes of data every day. Your SIEM implementation should be able to process an increasing number of separate events per second, so these systems require complex algorithms to process data as fast as possible. This is especially important for forensic features because forensic analysis requires the collection of all possible information, as it all may be necessary for an investigation.

7. Retention

Long-term centralized storage of historical data is necessary to ensure the correlation of data over time and to retain data for forensic analysis. Consider also the amount of database storage, as some regulations may require data to be available for a particular length of time.

8. Data Relevance

Your security solution should include features that allow users to reduce the volume of data provided for forensic analysis by filtering system logs. Moreover, you should be able to narrow down data by keywords or times. While all information should be collected without any assumptions, filters are important for forensic analysis. Filters provide users with only relevant data that relates to the incident under investigation.

9. Timestamps

Timestamps are the most sensitive and valuable pieces of information extracted from log data. Timestamps are essential for linking the events recorded by your security solution to real-world facts. When analyzing logs from different sources, keep in mind that a host’s internal clock can be inaccurate; consequently, logs may have incorrect timestamps. However, it’s vital to have timestamps that are as precise as possible. Timestamps may be shown truncated to seconds in the UI, but if information is available with microsecond precision, it must be stored with that precision, and the original timestamp value should be available for forensic analysis.

The next important aspect is time zone. Information about the time zone that a timestamp is associated with must also be extracted and saved for forensic purposes. If there’s no time zone information available, the logs will be considered unreliable. Inaccurate time zones and timestamps prevent security staff or a forensic investigator from putting the facts on the timeline in the correct order and reconstructing the sequence of events.
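
For example, a timestamp record might be stored so that both the original precision and the time zone survive for forensic analysis (a sketch of ours, not a prescribed format):

#include <cstdint>

// Keeps full precision and the source's UTC offset instead of truncating
// to seconds in local time.
struct LogTimestamp
{
    int64_t microsecondsSinceEpochUtc;  // event time, normalized to UTC
    int32_t originalUtcOffsetMinutes;   // e.g. +120 for UTC+2, as reported by the source
};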

10. Commensurable Results

Court investigators are usually quite conservative about the forensic features and tools they use. Each new solution with forensic features will be compared against existing ones to make sure that it’s accurate enough and can be trusted. For this reason, the forensic analysis results provided by your security solution should be shown in a familiar way. For instance, if a de facto standard application with forensic features can export query results in PDF format, then your solution should also be able to export query results in PDF and, ideally, produce similar or even better forensic results. Otherwise, your innovation may be rejected because it can’t be compared with results from known tools. For this reason, forensic innovations should be introduced carefully, as alternative representations alongside the familiar way of displaying results.

Conclusion

Developing forensic features can greatly improve the value and effectiveness of your SIEM system, but forensic capabilities must meet certain requirements to ensure the admissibility of captured data in court. To successfully use forensic features, your SIEM solution should ensure proper handling of log data as well as crash resistance, enough space for data storage, and useful filtering of results.

Fortunately, one of Apriorit’s specialties is digital and enterprise security technologies. We would be glad to assist you in building your own solution by applying our experience in all levels of data encryption and system monitoring.

Qt is a popular cross-platform application framework developed by the Qt Company and distributed via both proprietary and open source licenses. Qt is designed for creating applications that can run on different platforms without the need to change the codebase for each platform. It’s most often used for creating multi-platform applications and graphical user interfaces (GUIs).

This tutorial will show you how to create a simple Qt application with the Qt Quick graphical user interface. We’ll use CMake as our project format and C++ as the language for our source code. Our example application is simple in design and consists of just one window.

An increasing number of unknown threats are forcing cybersecurity companies to find new approaches to data protection. Market experts are looking to the use of artificial intelligence and machine learning algorithms for cybersecurity as one of the ways to withstand modern cyber attacks. This article focuses on how artificial intelligence (AI) and machine learning (ML) can improve cybersecurity and explains some of the challenges that development teams may face when implementing AI and ML in their security solutions.

 

Written by:

Anna Bryk,

Market Research Specialist

 

Contents:

Why Do We Need AI and ML in Cybersecurity?

How Can AI and ML Improve Cybersecurity?

New Approach to Meeting Cybersecurity Challenges

5 Promising Cybersecurity Startups Based on AI and ML

4 Things You Need to Consider when Developing an AI-based Security Solution

Conclusion

Why Do We Need AI and ML in Cybersecurity?

Nowadays, companies use a great number of devices in their operations, and thus the amount of data they need to monitor is also massive. Hackers are looking for any security loopholes to exploit, and cybersecurity vendors are still losing this race.

Traditional security applications only inform about suspicious activity and known attacks that have already happened. Besides, these systems provide too many undifferentiated alerts that need to be interpreted by people. However, the increasing volume of data that needs to be analyzed makes it impossible for people to do all the monitoring.

Thus, there’s a need to computerize security analysis actions previously performed by humans. While signature-based methods are no longer effective, developers are trying to create more complicated security platforms with a new approach for cyber attack detection and response. Machine learning and artificial intelligence for cybersecurity promise to process high volumes of data to detect suspicious activity quickly.

How Can AI and ML Improve Cybersecurity?

Though sophisticated tools are used for cybersecurity protection, data breaches still happen. Moreover, there’s a risk that detection, investigation, and remediation of damage by security managers will take weeks or even months. Fortunately, cybersecurity solutions based on machine learning raise the prospect of detecting an attack in real time and reducing the time for remediation.

So what is artificial intelligence? AI is often defined as the science of making machines replicate human intelligence. AI involves a great variety of technologies, some of which already exist and others are still under development. Examples of AI applications include intelligent personal assistants like Apple’s Siri, game-playing programs like AlphaGo, and question-answering computer systems for business analytics, such as IBM Watson.

What is machine learning? ML is a subfield of AI that uses mathematical algorithms to find patterns in data and learn from those patterns just like people do. In cybersecurity, ML can detect anomalous behavior of users and systems as well as learn from existing threats and predict unknown threats. The main algorithms used in ML for cybersecurity are based on supervised and unsupervised learning.

Implementing AI and ML in cybersecurity can significantly help in developing more effective security solutions that are able to better protect companies against existing and unknown threats. AI-driven security applications can react to suspicious activity in real time and prevent attacks before they happen. Such solutions can quickly process huge unstructured and hybrid datasets. In addition, advanced technologies reduce the time for investigating attacks and produce fewer false positives. Thanks to ML algorithms, security systems can be self-learning and can augment human decision making.

New Approach to Meeting Cybersecurity Challenges

Developers of many security information and event management (SIEM) applications are trying to implement machine learning. SIEM solutions include event and log management, behavioral analysis, and real-time monitoring of databases and applications. In case of suspicious activity, SIEM applications alert security managers and block access.

Unfortunately, the increased amount of data that needs to be monitored is almost impossible for people to process. And analytics based on predefined rules doesn’t meet the needs of today’s market. Nowadays, it’s not enough just to detect typical anomalies based on simple rules; security solutions should also automatically analyze a huge number of events and provide security managers with few false positives. Advanced analytics can automate the analysis of huge datasets using machine learning algorithms.

According to Gartner’s definition, advanced analytics (AA) is the autonomous processing of data with the help of AI techniques and tools that find deeper correlations, make predictions, and provide recommendations. AA uses a next-generation ML technology called deep learning. Deep learning algorithms can process large volumes of data (or big data) using neural networks that simulate the activity of the human brain. In the case of cybersecurity, big data means a huge number of system objects and user activity indicators, all of which are processed with advanced analytics. Big data integration in cybersecurity is only possible with the implementation of AI and ML.

Advanced analytics for security are implemented in user and entity behavior analytics (UEBA) solutions as additional functionality for SIEM applications. There are also stand-alone UEBA applications on the market. UEBA security software supplies companies with advanced analytics for both user behavior data and data collected from networks, endpoints, and applications. UEBA is a type of ML model that is deployed to recognize and withstand sophisticated cyber attacks.

There are two categories of advanced analytics, each providing different outputs after implementing big data and requiring different levels of human involvement: predictive and prescriptive.

  • Predictive analytics predicts what will happen in the future. Using sophisticated statistics and machine learning techniques, predictive analytics analyzes historical and current data to predict what you should expect, though what actions you should take in response is up to you.

Undeniably, predictive analytics will make your security solution more effective at detecting attacks before data leakage or outside intrusion happens. However, predictive analytics will not provide you with an answer to how you should react if a client’s network is under attack.

  • Prescriptive analytics answers what you should do given particular expected outcomes. Prescriptive analytics not only predicts future events but also analyzes possible outcomes and suggests what actions you should take in order to achieve the best results.

Implementing prescriptive analytics in security solutions will help you detect an attack before it happens. Moreover, a security application based on prescriptive analytics will provide a client with detailed instructions on what they should do in each particular case. For instance, if a user tries to send sensitive data to an external server, the system will advise executing a firewall rule in order to break the connection.

Currently, there are few security solutions on the market that support predictive analytics. One that does is Bottomline Technologies. However, prescriptive analytics has not yet been deployed in a way that would bring AI into enterprise integration and automatically react to threats.

5 Promising Cybersecurity Startups Based on AI and ML

Gartner predicts the further integration of ML and AI in cybersecurity solutions within the next five years. By 2018, Gartner expects that 25% of cybersecurity solutions will involve some form of AI and ML for attack detection and response. Moreover, more than 50% of traditional security solutions will be supplied with UEBA functionality.

In recent years, the cybersecurity industry has been booming with fast-evolving startups based on AI and ML. Understanding modern threats, businesses are gladly investing in promising startups that take an intelligent approach to cybersecurity, like Darktrace, CrowdStrike, Hexadite, Cylance, and Amazon Macie (formerly Harvest.AI).

  1. AI cybersecurity startup Darktrace

    Darktrace has recently raised $75 million for developing technology that uses ML algorithms to detect and stop attacks written not only by people but by machines as well. Its Enterprise Immune System is based on AI and unsupervised learning that works similarly to the immune system in the human body. The Enterprise Immune System can identify abnormal activity on a network and inform security managers about the intrusion. Apart from this, the system is also trained to take instant measures for blocking or slowing down an attack.

  2. AI cybersecurity startup CrowdStrike

    CrowdStrike is another cybersecurity startup that already has a billion-dollar valuation. This startup positions its product as next-generation antivirus software that ensures endpoint protection along with endpoint detection and response, using supervised ML algorithms for malware detection. The company doesn’t rely solely on ML but also uses signatureless AI and indicator-of-attack analysis for preventing unknown attacks in real time.

  3. AI cybersecurity startup Hexadite

    This summer, Microsoft bought Hexadite, an AI-based startup, for $100 million. Hexadite’s technology implements AI and ML algorithms to separate false positives from real malware. The system uses advanced user behavioral analytics and security alerts from other vendors and security solutions to predict potential attacks. In case of malicious activity, the system automatically blocks it to limit the damage.

  4. AI cybersecurity startup Cylance

    CylancePROTECT is Cylance’s endpoint protection product that uses AI and ML algorithms to understand a hacker’s logic. Cylance has implemented patented machine learning techniques that are able to detect known malware along with zero-day attacks and block their execution.

  5. AI cybersecurity startup Amazon Macie

    Harvest.AI was recently acquired by Amazon and became the prototype for Amazon Macie, which inherits its application of ML and AI. Amazon Macie helps companies monitor the flow of sensitive data and protect against data leakage. It processes user behavior analytics and blocks ongoing attacks before data is lost.

4 Things You Need to Consider when Developing an AI-based Security Solution

If you want to integrate AI in your startup, you should consider the challenges you may face in developing a UEBA solution that relies on artificial intelligence and machine learning technologies.

  1. Data about known malware and attacks. AI-based security solutions use big data about malicious activity to train their models. Determine how you will acquire the necessary data and from what devices you will receive data for user and entity behavior analytics.
  2. Enterprise security rules. If you want to deploy prescriptive analytics, you should be aware of the security rules for each particular enterprise. A system can propose effective solutions only when it fully understands the operations of a company.
  3. Lack of computing power. AI and ML algorithms require extensive computing capabilities. Even in the case of promising offline results, online testing of your security system may face difficulties because of a lack of computing power.
  4. Lack of experienced data scientists and analysts. These two types of experts are extremely necessary for developing an AI-based security solution. While data scientists should be familiar with data analysis, computer science, and statistical modeling, data analysts should have deep knowledge of mathematics and experience using analytical tools to extract insights from big data. If you’re looking for an outside vendor, find out whether the development team has previously deployed advanced analytics for cybersecurity.

Conclusion

AI and ML solve cybersecurity problems by computerizing the analytical process. Though there’s been a boom of AI-based security startups, none of them realizes the full potential of AI and ML for cybersecurity yet. Implementing AI and ML requires deploying advanced technologies that are not so easy to develop. Apriorit has advanced skills in cybersecurity systems along with experience in data processing and software development. Our team can help you successfully overcome all the complexities of embedding artificial intelligence in your enterprise app.
