Created in 2010, Mozilla’s Rust programming language is fast increasing in popularity. Compared to other languages, Rust ensures better performance and improved software security. We’d like to share our knowledge about practical applications of Rust features. This article is the second part of our Rust Programming Language Tutorial, and is written for developers who want to use Rust for software programming.

This part of the Rust programming tutorial describes Rust features that guarantee memory safety. The Rust language achieves memory safety by using ownership and borrowing, mutability and aliasing, option types for pointers, and initialized variables.

Written by:

Alexey Lozovsky,

Software Designer in System Programming Team

 

Contents:

Guaranteed Memory Safety

   Ownership

   Borrowing

   Mutability and Aliasing

   Option Types instead of Null Pointers

   No Uninitialized Variables

Conclusion

Guaranteed Memory Safety

Memory safety is the most prized and advertised feature of Rust. In short, Rust guarantees the absence (or at least the detectability) of various memory-related issues:

  • segmentation faults
  • use-after-free and double-free bugs
  • dangling pointers
  • null dereferences
  • unsafe concurrent modification
  • buffer overflows

These issues are classified as undefined behavior in C++, and programmers are mostly on their own to avoid them. In Rust, by contrast, memory-related issues are either reported as compile-time errors or, where compile-time detection isn’t possible, prevented by runtime checks.
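Here is a small sketch of how the runtime side works for out-of-bounds access, the classic source of buffer overflows, in safe Rust. The helper names checked_get() and index_or_panic() are hypothetical and used only for this illustration.

```rust
// Checked access: an out-of-range index yields None instead of
// reading memory past the end of the buffer.
fn checked_get(data: &[i32], index: usize) -> Option<i32> {
    data.get(index).copied()
}

// Plain indexing is also bounds-checked at run time: an out-of-range
// index makes the program panic rather than overflow the buffer.
fn index_or_panic(data: &[i32], index: usize) -> i32 {
    data[index]
}

fn main() {
    let data = vec![10, 20, 30];
    println!("{:?}", checked_get(&data, 1)); // Some(20)
    println!("{:?}", checked_get(&data, 5)); // None

    // catch_unwind is used here only so the demo can observe the panic.
    let result = std::panic::catch_unwind(|| index_or_panic(&data, 5));
    println!("out-of-bounds indexing panicked: {}", result.is_err());
}
```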

Ownership

The core innovation of Rust is ownership and borrowing, closely related to the notion of object lifetime. Every object has a lifetime: the time span in which the object is available for the program to use. There’s also an owner for each object: an entity which is responsible for ending the lifetime of the object. For example, local variables are owned by the function scope. The variable (and the object it owns) dies when execution leaves the scope.

1   fn f() {
2       let v = Foo::new(); // ----+ v's lifetime
3                           //     |
4       /* some code */     //     |
5   }                       // <---+

In this case, the object Foo is owned by the variable v and will die at line 5, when function f() returns.
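To make the end of a lifetime visible, we can give a stand-in Foo a Drop implementation that records when it runs. This is a sketch. The shared log (an Rc<RefCell<Vec<...>>>) and the fields of Foo are assumptions made only for the demonstration.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Shared event log so we can see the order in which things happen.
type Log = Rc<RefCell<Vec<&'static str>>>;

// Hypothetical stand-in for the article's Foo.
struct Foo {
    log: Log,
}

impl Foo {
    fn new(log: &Log) -> Foo {
        log.borrow_mut().push("Foo created");
        Foo { log: Rc::clone(log) }
    }
}

// Drop::drop runs automatically when the owner's lifetime ends.
impl Drop for Foo {
    fn drop(&mut self) {
        self.log.borrow_mut().push("Foo dropped");
    }
}

fn f(log: &Log) {
    let v = Foo::new(log);
    v.log.borrow_mut().push("f uses v");
} // v goes out of scope here, so Foo is dropped

fn main() {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    f(&log);
    log.borrow_mut().push("f returned");
    println!("{:?}", log.borrow());
    // ["Foo created", "f uses v", "Foo dropped", "f returned"]
}
```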

Ownership can be transferred by moving the object. For types that don’t implement the Copy trait, this happens by default whenever the variable is assigned to another variable or passed to a function by value:

1   fn f() {
2       let v = Foo::new();     // ----+ v's lifetime
3       {                       //     |
4           let u = v;          // <---X---+ u's lifetime
5                               //         |
6           do_something(u);    // <-------X
7       }                       //
8   }                           //

Initially, the variable v would be alive for lines 2 through 8, but its lifetime ends at line 4, where v is assigned to u. After that point we can’t use v anymore (doing so is a compile-time error). But the Foo object isn’t dead yet. It merely has a new owner, u, which is alive for lines 4 through 6. At line 6, ownership of Foo is transferred again, this time to the function do_something(). Unless that function passes ownership on, it destroys Foo as soon as it returns.
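The same kind of stand-in Foo with a logging Drop implementation (again a hypothetical helper, not part of the original example) lets us watch the ownership transfers described above. Foo survives the move from v to u and is destroyed only when do_something() returns.

```rust
use std::cell::RefCell;
use std::rc::Rc;

type Log = Rc<RefCell<Vec<&'static str>>>;

// Hypothetical Foo that records its own destruction.
struct Foo {
    log: Log,
}

impl Drop for Foo {
    fn drop(&mut self) {
        self.log.borrow_mut().push("Foo dropped");
    }
}

// Takes Foo by value: the function becomes the owner,
// so Foo is dropped when the function returns.
fn do_something(u: Foo) {
    u.log.borrow_mut().push("do_something uses Foo");
}

fn f(log: &Log) {
    let v = Foo { log: Rc::clone(log) };
    {
        let u = v; // ownership moves from v to u; using v now is a compile error
        do_something(u); // ownership moves into do_something()
        log.borrow_mut().push("back in the inner block");
    }
    log.borrow_mut().push("end of f");
}

fn main() {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    f(&log);
    println!("{:?}", log.borrow());
    // ["do_something uses Foo", "Foo dropped", "back in the inner block", "end of f"]
}
```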

Borrowing

But what if you don’t want to transfer ownership to the function? Then you need to use references to pass a pointer to an object instead:

fn f() {
    let v = Foo::new();     // ---+ v's lifetime
                            //    |
    do_something(&v);       // :--|----.
                            //    |     } v's borrowed
    do_something_else(&v);  // :--|----'
}                           // <--+

In this case, each function is said to borrow the Foo object via a reference. It can access the object, but it doesn’t own it (i.e., it can’t destroy it). References are objects themselves, with lifetimes of their own. In the example above, a separate reference to v is created for each function call, and that reference is transferred to and owned by the function call, similar to the variable u above.
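Here is a runnable sketch of the borrowing example. It assumes a hypothetical Foo with a single value field and hypothetical bodies for do_something() and do_something_else(). Both functions only read through a shared reference, and f() still owns v after the calls.

```rust
// Hypothetical Foo used only for illustration.
struct Foo {
    value: i32,
}

// Both functions borrow Foo through a shared reference:
// they can read it, but they don't own it and can't destroy it.
fn do_something(v: &Foo) -> i32 {
    v.value * 2
}

fn do_something_else(v: &Foo) -> i32 {
    v.value + 1
}

fn f() -> (i32, i32, i32) {
    let v = Foo { value: 20 };
    let a = do_something(&v); // the first borrow ends when the call returns
    let b = do_something_else(&v); // a second, independent borrow
    (a, b, v.value) // v is still owned by f() and usable here
}

fn main() {
    println!("{:?}", f()); // (40, 21, 20)
}
```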

It’s expected that an object will stay alive for at least as long as any reference to it. In other words, a reference must never outlive the object it refers to. This notion is implicit in C++ references, but Rust makes it an explicit part of the reference type:

fn do_something<'a>(v: &'a Foo) {
    // something with v
}

The argument v is in fact a reference to Foo with the lifetime 'a. Here 'a is a lifetime parameter of do_something(). At each call, it stands for a span that covers at least the duration of that call.

C++ can handle simple cases like this just as well. But what if we want to return a reference? What lifetime should the reference have? Obviously, not longer than the object it refers to. However, since lifetimes aren’t part of C++ reference types, the following code is syntactically correct for C++ and will compile just fine:

const Foo& some_call(const Foo& v)
{
    Foo w;
 
    /* 10 lines of complex code using v and w */
 
    return w; // accidentally returns w instead of v
}

Though this code is syntactically correct, it is semantically incorrect. The object w is destroyed when some_call() returns, so the program has undefined behavior if the caller actually uses the returned reference. Compilers may warn about a case as simple as this one, but in real code such errors can be hard to spot in casual code review and generally require an external static code analyzer to detect.

Consider the equivalent code in Rust:

fn some_call(v: &Foo) -> &Foo {// ------------------+ expected
    let w = Foo::new();        // ---+ w's lifetime | lifetime
                               //    |              | of the
    return &w;                 // <--+              | returned
}                              //                   | value
                               // <-----------------+

The returned reference is expected to have the same lifetime as the argument v, which is expected to live longer than the function call. However, the variable w lives only for the duration of some_call(), so references to it can’t live longer than that. The borrow checker detects this conflict and complains with a compilation error instead of letting the issue go unnoticed.

error[E0597]: `w` does not live long enough
  --> src/main.rs:10:13
   |
10 |     return &w;
   |             ^ does not live long enough
11 | }
   | - borrowed value only lives until here
   |
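The right fix depends on what was intended. The sketch below uses a hypothetical Foo that carries an id, and shows two versions the borrow checker accepts. The first returns the borrowed argument, whose lifetime matches the elided lifetime of the result. The second returns an owned value, so ownership moves to the caller. The name make_foo() is hypothetical.

```rust
#[derive(Debug, PartialEq)]
struct Foo {
    id: u32,
}

impl Foo {
    fn new() -> Foo {
        Foo { id: 0 }
    }
}

// Option 1: return the reference we were given. The output lifetime
// is tied to v, so the caller's Foo keeps the result valid.
fn some_call(v: &Foo) -> &Foo {
    let _w = Foo::new(); // local scratch value, dropped when the function returns
    v
}

// Option 2: if the function creates a new Foo, return it by value.
fn make_foo(v: &Foo) -> Foo {
    Foo { id: v.id + 1 }
}

fn main() {
    let original = Foo { id: 7 };
    println!("{:?}", some_call(&original)); // Foo { id: 7 }
    println!("{:?}", make_foo(&original)); // Foo { id: 8 }
}
```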

The compiler is able to detect this error because it tracks lifetimes explicitly and thus knows exactly how long values must live for the references to still be valid and safe to use. It’s also worth noting that you don’t have to explicitly spell out all lifetimes for all references. In many cases, like in the example above, the compiler is able to automatically infer lifetimes, freeing the programmer from the burden of manual specification.
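This inference is governed by Rust’s lifetime elision rules. In the sketch below (all function names are hypothetical), the first two signatures are equivalent, because with a single reference parameter the output borrows from it. With two reference parameters, the compiler can’t tell which one the result borrows from, so the lifetime has to be written out.

```rust
// One reference parameter: elision ties the result to it,
// so these two signatures mean exactly the same thing.
fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}

fn first_word_explicit<'a>(s: &'a str) -> &'a str {
    s.split_whitespace().next().unwrap_or("")
}

// Two reference parameters: the compiler can't guess which one the
// result borrows from, so the lifetime has to be spelled out.
fn longer<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() >= y.len() {
        x
    } else {
        y
    }
}

fn main() {
    println!("{}", first_word("hello world")); // hello
    println!("{}", first_word_explicit("  memory safety")); // memory
    println!("{}", longer("borrow", "own")); // borrow
}
```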

Mutability and Aliasing

Another feature of the Rust borrow checker is alias analysis, which prevents unsafe memory modification. Two pointers (or references) are said to alias if they point to the same object. Let’s look at the following C++ example:

Foo c;
Foo *a = &c;
const Foo *b = &c;

Here, pointers a and b are aliases of the Foo object owned by c. Modifications performed via a will be visible when b is dereferenced. Usually, aliasing doesn’t cause errors, but there are some cases where it might.

Consider the memcpy() function. It can be, and often is, used for copying data, but it’s known to be unsafe: its behavior is undefined when the source and destination regions overlap, which can lead to memory corruption:


char array[5] = { 1, 2, 3, 4, 5 };
char *a = &array[0];        // destination
const char *b = &array[2];  // source
memcpy(a, b, 3);            // source and destination overlap at array[2]

In the sample above, the source (elements 2 through 4) and the destination (elements 0 through 2) overlap. As a result, the values of the first three elements are unpredictable: they depend on the order in which memcpy() copies the data:

{ 3, 4, 5, 4, 5 }    // if the elements are copied from left to right
{ 5, 4, 5, 4, 5 }    // if the elements are copied from right to left
  

The root issue here is that the program contains two aliasing pointers to the same object (the array), one of which is non-constant. If the compiler rejected such programs, then memcpy() (and any other function with pointer arguments) would always be safe to use.

Rust makes this possible by enforcing the following rules of borrowing:

  1. At any given time, you can have either but not both of:
    • one mutable reference
    • any number of immutable references
  2. References must always be valid.

The second rule relates to ownership and lifetimes, which were discussed in the previous sections. The first rule is the real novelty of Rust.

It’s obviously safe to have multiple aliasing pointers to the same object if none of them can be used to modify the object (i.e. they are constant references). If there are two mutable references, however, then modifications can conflict with each other. Also, if there is a const-reference A and a mutable reference B, then presumably the constant object as seen via A can in fact change if modifications are made via B. But it’s perfectly safe if only one mutable reference to the object is allowed to exist in the program. The Rust borrow checker enforces these rules during compilation, effectively making each reference act as a read-write lock for the object.
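Below is a small sketch of both halves of the first rule, using a plain Vec. The function name is hypothetical. In current compilers a borrow lasts only until its last use (non-lexical lifetimes), so a mutable borrow can begin as soon as the immutable ones are no longer needed.

```rust
fn borrow_rules_demo() -> Vec<i32> {
    let mut values = vec![1, 2, 3];

    // Any number of immutable references may coexist.
    let first = &values[0];
    let last = &values[2];
    let total = *first + *last;

    // Once they are no longer used, a single mutable reference is allowed.
    let exclusive = &mut values;
    exclusive.push(total);

    // Uncommenting the next line is rejected with error E0502, because
    // `first` would still be in use while `exclusive` is alive:
    // println!("{}", first);

    values
}

fn main() {
    println!("{:?}", borrow_rules_demo()); // [1, 2, 3, 4]
}
```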

The following is a Rust counterpart of the earlier memcpy() example:

let mut array = [1, 2, 3, 4, 5];
let a = &mut array[0..2];
let b = &array[2..4];
a.copy_from_slice(b);

This won’t compile in Rust, and will throw the following error:

error[E0502]: cannot borrow `array` as immutable because it is also borrowed as mutable
 --> src/main.rs:4:14
  |
3 |     let a = &mut array[0..2];
  |                  ----- mutable borrow occurs here
4 |     let b = &array[2..4];
  |              ^^^^^ immutable borrow occurs here
5 |     a.copy_from_slice(b);
6 | }
  | - mutable borrow ends here

This error shows the restrictions imposed by the borrow checker. Multiple immutable references are fine. One mutable reference is fine. Different references to different objects are fine. However, you can’t simultaneously have a mutable and an immutable reference to the same object, because this is potentially unsafe and can lead to memory corruption. Note that the borrow checker treats the whole array as a single object here. Even though the ranges 0..2 and 2..4 don’t overlap, indexing borrows all of array.
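The standard library offers safe ways to express what the rejected snippet was trying to do. In the sketch below (the function names are hypothetical), split_at_mut() splits one mutable borrow into two non-overlapping ones that the borrow checker accepts. copy_within() copies between possibly overlapping ranges with memmove-like semantics, reproducing the 3-element copy from the C example.

```rust
fn copy_with_split() -> [i32; 5] {
    let mut array = [1, 2, 3, 4, 5];
    // split_at_mut(2) yields two disjoint slices: array[0..2] and array[2..5].
    let (a, b) = array.split_at_mut(2);
    // Copy array[2..4] into array[0..2]; a and b cannot overlap.
    a.copy_from_slice(&b[0..2]);
    array
}

fn copy_overlapping() -> [i32; 5] {
    let mut array = [1, 2, 3, 4, 5];
    // Copy array[2..5] to index 0; overlapping ranges are handled correctly.
    array.copy_within(2..5, 0);
    array
}

fn main() {
    println!("{:?}", copy_with_split()); // [3, 4, 3, 4, 5]
    println!("{:?}", copy_overlapping()); // [3, 4, 5, 4, 5]
}
```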

Not only does this restriction prevent possible human errors, but it in fact enables the compiler to perform some optimizations that are normally not possible in the presence of aliasing. The compiler is then free to use registers more aggressively, avoiding redundant memory access and leading to increased performance.
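Here is a sketch of why this matters (add_twice() is a hypothetical function). Because a is a unique (&mut) reference and b is a shared one, the compiler may assume they never point to the same integer and load *b only once. An equivalent C function taking int* and const int* would have to reload *b after the first write, since the two pointers might alias. Whether the optimization actually happens depends on the compiler and the optimization level.

```rust
// `a` is unique and `b` is shared, so they cannot refer to the same i32.
// This allows the optimizer to keep *b in a register across both additions.
fn add_twice(a: &mut i32, b: &i32) {
    *a += *b;
    *a += *b;
}

fn main() {
    let mut x = 1;
    let y = 10;
    add_twice(&mut x, &y);
    println!("{}", x); // 21

    // The aliasing call is rejected at compile time (error E0502):
    // add_twice(&mut x, &x);
}
```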

Option Types instead of Null Pointers

Another common issue related to pointers and references is null pointer dereferencing. Tony Hoare calls the invention of the null reference his billion-dollar mistake, and an increasing number of languages include mechanisms to prevent it (for example, Optional in Java, nullable reference types in C#, and std::optional since C++17).

Rust uses the same approach as C++ for references: they always point to an existing object. There are no null references and hence no possible issues with them. However, pointers (including C++ smart pointers) can be null, and there are cases when you might want to pass either a reference to an object or no object at all.

Instead of using nullable pointers, Rust has the Option type. This type has two constructors:

Some(value) – holds a value

None – represents the absence of a value

None plays the role of a null pointer, while Some carries a value (for example, a reference to some object). When Option wraps a reference or a Box, None even has the same representation as a null pointer.
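For Option wrapping a reference or a Box, the standard library documentation guarantees this null-pointer representation, so the size checks below hold on any platform. A plain u64 has no spare bit pattern to use as None, so Option<u64> needs extra room for a tag. Because the exact size of Option<u64> is platform-dependent, the sketch only compares it instead of asserting a fixed number.

```rust
use std::mem::size_of;

fn main() {
    // None is stored as the null pointer, so the Option wrapper costs nothing.
    assert_eq!(size_of::<Option<&u64>>(), size_of::<&u64>());
    assert_eq!(size_of::<Option<Box<u64>>>(), size_of::<Box<u64>>());

    // u64 uses every bit pattern, so Option<u64> needs room for a tag.
    assert!(size_of::<Option<u64>>() > size_of::<u64>());
    println!("Option<&u64> is as small as a plain reference");
}
```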

The main advantage of Option over pointers is that it’s impossible to accidentally dereference None, so null pointer dereferencing errors are eliminated. To use the value stored in an Option, you need to use one of the safe access patterns:

match option_value {
    Some(value) => {
        // use the contained value
    }
    None => {
        // handle absence of value
    }
}
 
if let Some(value) = option_value {
    // use the contained value
}
 
let value = option_value.unwrap(); // panics if option_value is None
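The following sketch makes those patterns concrete. It uses a hypothetical find_port() lookup that may or may not find a value. Rust has no exceptions, and unwrap() on None panics instead. The demo observes that panic with std::panic::catch_unwind.

```rust
// Hypothetical lookup that may or may not find a value.
fn find_port(service: &str) -> Option<u16> {
    match service {
        "http" => Some(80),
        "https" => Some(443),
        _ => None,
    }
}

// `match` forces both cases to be handled.
fn describe(service: &str) -> String {
    match find_port(service) {
        Some(port) => format!("{} uses port {}", service, port),
        None => format!("{} is unknown", service),
    }
}

// `if let` handles only the Some case; None skips the block.
fn is_https_port(service: &str) -> bool {
    if let Some(port) = find_port(service) {
        return port == 443;
    }
    false
}

fn main() {
    println!("{}", describe("https")); // https uses port 443
    println!("{}", describe("gopher")); // gopher is unknown
    println!("{}", is_https_port("http")); // false

    // unwrap() panics on None; catch_unwind only lets this demo observe it.
    let result = std::panic::catch_unwind(|| find_port("gopher").unwrap());
    println!("unwrap on None panicked: {}", result.is_err());
}
```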

Every use of Option acts as a clear marker that the object may be absent, because both cases must be handled. Furthermore, the Option type has many utility methods that make it more convenient to use:

// Falling back to a default value:
let foo_enabled: bool = configuration.foo_enabled.unwrap_or(true);
 
// Applying a conversion if the Option contains a value
// or leaving it None if Option is None:
let maybe_length: Option<usize> = maybe_string.map(|s| s.len());
 
// Options can be compared for equality and ordering (given that
// the wrapped values can be compared).
let maybe_number_a = Some(1);
let maybe_number_b = Some(9);
let maybe_number_c = None;
assert_eq!(maybe_number_a < maybe_number_b, true);
assert_eq!(maybe_number_a < maybe_number_c, false); // None is less than Some
assert_eq!(maybe_number_c < maybe_number_b, true);
assert_eq!(maybe_number_a != maybe_number_c, true);

No Uninitialized Variables

Another possible issue with so-called plain old data (POD) types in C++ is the use of uninitialized variables. Rust requires variables to be initialized before they are used. Most commonly, this is done when variables are declared:

let array_of_ten_zeros = [0; 10];

But it’s also possible to first declare a variable and then initialize it later:

let mut x;
// x is left truly uninitialized here; the assembly
// will not contain any actual memory assignment
 
loop {
    if something_happens() {
        x = 1;
        println!("{}", x); // This is okay because x is initialized now
    }
 
    println!("{}", x);     // But this line will cause a compilation error
                           // because x may still not be initialized here
 
    if some_condition() {
        x = 2;
        break;
    }
    if another_condition() {
        x = 3;
        break;
    }
}
 
// The compiler knows that it is not possible to exit the loop
// without initializing x, so this is totally safe:
println!("{}", x);
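Deferred initialization isn’t limited to loops. In this sketch (classify() is a hypothetical function), label is declared without a value and then assigned in every branch. Since each path assigns it exactly once, it doesn’t even need to be declared mut.

```rust
// `label` is declared first and initialized later, in every branch.
fn classify(n: i32) -> &'static str {
    let label; // no value yet: reading it here would be a compile error
    if n < 0 {
        label = "negative";
    } else if n == 0 {
        label = "zero";
    } else {
        label = "positive";
    }
    // Every path assigns `label` exactly once, so it is safe to use.
    label
}

fn main() {
    for n in [-5, 0, 7] {
        println!("{} is {}", n, classify(n));
    }
}
```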

In Rust, if you forget to initialize a variable and then accidentally try to use its garbage value, you’ll get a compilation error. All structure fields must be initialized at construction time as well:

let foo = Foo {
    bar: 5,
    baz: 10,
};

If a field is later added to the structure, every existing place that constructs the structure without setting the new field produces a compilation error until it’s updated.
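Here is a sketch with a Foo that has the same fields as above. Deriving Default is an assumption made for the example. Leaving a field out of a struct literal is a compile error (E0063). Where sensible defaults exist, struct update syntax fills the remaining fields explicitly from Default. The trade-off is that literals written this way keep compiling when a new field is added, silently picking up its default value.

```rust
// Hypothetical Foo with the same fields as above; deriving Default is an assumption.
#[derive(Debug, Default, PartialEq)]
struct Foo {
    bar: i32,
    baz: i32,
}

// Every field must be given; omitting `baz` here fails with error E0063.
fn build() -> Foo {
    Foo { bar: 5, baz: 10 }
}

// Struct update syntax: fields not listed are taken from Foo::default(),
// so they are still explicitly initialized (here, baz becomes 0).
fn build_with_defaults() -> Foo {
    Foo {
        bar: 5,
        ..Default::default()
    }
}

fn main() {
    println!("{:?}", build()); // Foo { bar: 5, baz: 10 }
    println!("{:?}", build_with_defaults()); // Foo { bar: 5, baz: 0 }
}
```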

Conclusion

At Apriorit, we work with many popular software programming languages. Based on our experience with Rust, we have created this tutorial to help you get familiar with the basics of the language.

In this second part of our Rust programming language tutorial, we provided a detailed overview of Rust features that ensure memory safety, such as ownership and borrowing, mutability and aliasing, option types for pointers, and initialized variables. All these features allow programmers to avoid undefined behaviors that are typical in C++.

If you want to learn more about other features of the Rust language, then check out our next article.
