Created in 2010, Mozilla’s Rust programming language is rapidly growing in popularity. Rust aims to combine performance comparable to C and C++ with stronger safety guarantees. We’d like to share our knowledge about practical applications of Rust features. This article is the second part of our Rust Programming Language Tutorial, and is written for developers who want to use Rust for software development.
This part of the Rust programming tutorial describes Rust features that guarantee memory safety. The Rust language achieves memory safety by using ownership and borrowing, mutability and aliasing, option types for pointers, and initialized variables.
Software Designer in System Programming Team
Memory safety is the most prized and advertised feature of Rust. In short, Rust guarantees the absence (or at least the detectability) of various memory-related issues:
- segmentation faults
- use-after-free and double-free bugs
- dangling pointers
- null dereferences
- unsafe concurrent modification
- buffer overflows
In C++, these issues result in undefined behavior, and programmers are mostly on their own in avoiding them. In Rust, by contrast, memory-related issues are either reported immediately as compile-time errors or, where that is not possible, prevented by runtime checks.
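As a minimal sketch of the runtime-check side of this guarantee, the example below reads from a slice with an index that may be out of bounds. The helper name `checked_read` is hypothetical and chosen only for illustration; `slice::get` is the standard library call doing the actual work.

```rust
/// Returns the element at `index`, or `None` when the index is out of bounds.
/// (`checked_read` is a hypothetical helper used only for this illustration.)
fn checked_read(data: &[i32], index: usize) -> Option<i32> {
    // `get` performs the bounds check and never reads past the end of the slice.
    data.get(index).copied()
}

fn main() {
    let data = vec![10, 20, 30];

    println!("{:?}", checked_read(&data, 1)); // in bounds
    println!("{:?}", checked_read(&data, 7)); // out of bounds, reported as `None`

    // Plain indexing such as `data[7]` is bounds-checked at run time as well:
    // it panics instead of silently reading memory outside the vector.
}
```

Returning `Option` lets the caller decide how to handle the out-of-range case, whereas plain indexing aborts the current thread with a panic.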
The core innovation of Rust is ownership and borrowing, closely related to the notion of object lifetime. Every object has a lifetime: the time span in which the object is available for the program to use. There’s also an owner for each object: an entity which is responsible for ending the lifetime of the object. For example, local variables are owned by the function scope. The variable (and the object it owns) dies when execution leaves the scope.
In this case, the object Foo is owned by the variable v and will die at line 5, when the enclosing function returns.
Ownership can be transferred by moving the object (which is performed by default when the variable is assigned or used):
Initially, the variable v would be alive for lines 2 through 7, but its lifetime ends at line 4 where v is assigned to u. At that point we can’t use v anymore (or a compiler error will occur). But the object Foo isn’t dead yet; it merely has a new owner u that is alive for lines 4 through 6. However, at line 6 the ownership of Foo is transferred to the function do_something(). That function will destroy Foo as soon as it returns.
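A minimal sketch of these two moves, assuming a hypothetical `Foo` with a single field and a `do_something()` that takes its argument by value. The commented-out lines show the uses that the compiler would reject.

```rust
/// Hypothetical type used only for this illustration.
#[derive(Debug)]
struct Foo {
    id: u32,
}

/// Takes ownership of `foo`; the value is destroyed when this function returns.
fn do_something(foo: Foo) -> u32 {
    foo.id
}

fn main() {
    let v = Foo { id: 7 };

    let u = v; // ownership of the object moves from `v` to `u`
    // println!("{:?}", v); // would not compile: `v` has been moved

    let id = do_something(u); // ownership moves into the function
    // println!("{:?}", u); // would not compile: `u` has been moved

    println!("processed Foo with id {}", id);
}
```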
But what if you don’t want to transfer ownership to the function? Then you need to use references to pass a pointer to an object instead:
In this case, the function is said to borrow the object Foo via references. It can access the object, but the function doesn’t own it (i.e. it can’t destroy it). References are objects themselves, with lifetimes of their own. In the example above, a separate reference to v is created for each function call, and that reference is transferred to and owned by the function call, similar to the variable u above.
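Borrowing as described above might look like the sketch below, where a hypothetical `do_something()` receives `&Foo` instead of `Foo`. Because the function only borrows the object, the caller can pass it repeatedly and keep using it afterwards.

```rust
/// Hypothetical type used only for this illustration.
struct Foo {
    value: i32,
}

/// Borrows `foo` immutably: the function can read the object but does not own it.
fn do_something(foo: &Foo) -> i32 {
    foo.value * 2
}

fn main() {
    let v = Foo { value: 21 };

    // A separate reference to `v` is created for each call.
    let first = do_something(&v);
    let second = do_something(&v);

    // `v` still owns the object and remains usable after both calls.
    println!("{} {} {}", first, second, v.value);
}
```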
A reference must never outlive the object it refers to. This notion is implicit in C++ references, but Rust makes it an explicit part of the reference type:
The argument v is in fact a reference to Foo with the lifetime 'a, where 'a is defined by the function do_something() as the duration of its call.
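To make the lifetime parameter concrete, here is a small sketch that writes the same hypothetical function twice: once with the lifetime left for the compiler to fill in, and once with `'a` spelled out. The two signatures mean the same thing.

```rust
/// Hypothetical type used only for this illustration.
struct Foo {
    value: i32,
}

/// Elided form: the compiler assigns a lifetime to the reference automatically.
fn do_something(v: &Foo) -> i32 {
    v.value
}

/// Explicit form: `v` is a reference to `Foo` that is valid for the lifetime `'a`.
fn do_something_explicit<'a>(v: &'a Foo) -> i32 {
    v.value
}

fn main() {
    let foo = Foo { value: 3 };
    println!("{} {}", do_something(&foo), do_something_explicit(&foo));
}
```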
C++ can handle simple cases like this just as well. But what if we want to return a reference? What lifetime should the reference have? Obviously, not longer than the object it refers to. However, since lifetimes aren’t part of C++ reference types, the following code is syntactically correct for C++ and will compile just fine:
Although this code is syntactically correct, it is semantically incorrect and has undefined behavior if the caller of some_call() actually uses the returned reference. Such errors may be hard to spot in casual code review and generally require an external static code analyzer to detect.
Consider the equivalent code in Rust:
The returned reference is expected to have the same lifetime as the argument v, which is expected to live longer than the function call. However, the variable w lives only for the duration of some_call(), so references to it can’t live longer than that. The borrow checker detects this conflict and complains with a compilation error instead of letting the issue go unnoticed.
The compiler is able to detect this error because it tracks lifetimes explicitly and thus knows exactly how long values must live for the references to still be valid and safe to use. It’s also worth noting that you don’t have to explicitly spell out all lifetimes for all references. In many cases, like in the example above, the compiler is able to automatically infer lifetimes, freeing the programmer from the burden of manual specification.
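The sketch below illustrates both points under assumptions: `Foo` and the function names are hypothetical. The accepted versions return a reference derived from the argument, with the lifetime either inferred or written explicitly, while the rejected version (kept in a comment so the example compiles) tries to return a reference to a local variable.

```rust
/// Hypothetical type used only for this illustration.
struct Foo {
    value: i32,
}

/// Lifetime inferred: with one reference argument, the returned reference
/// automatically gets the same lifetime as `v`.
fn some_call(v: &Foo) -> &Foo {
    v
}

/// The same signature with the lifetime spelled out.
fn some_call_explicit<'a>(v: &'a Foo) -> &'a Foo {
    v
}

// Rejected by the borrow checker: `w` dies when the function returns,
// so a reference to it cannot have the lifetime of `v`.
//
// fn some_call_broken(v: &Foo) -> &Foo {
//     let w = Foo { value: v.value };
//     &w
// }

fn main() {
    let foo = Foo { value: 5 };
    println!("{}", some_call(&foo).value);
    println!("{}", some_call_explicit(&foo).value);
}
```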
Another feature of the Rust borrow checker is alias analysis, which prevents unsafe memory modification. Two pointers (or references) are said to alias if they point to the same object. Let’s look at the following Rust example:
Here, pointers a and b are aliases of the Foo object owned by c. Modifications performed via a will be visible when b is dereferenced. Usually, aliasing doesn’t cause errors, but there are some cases where it might.
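A minimal sketch of aliasing, assuming a hypothetical `Foo` type: two shared references taken from the same owner point at the same object, which can be confirmed with `std::ptr::eq`. The helper `are_aliases` is an illustrative name, not a standard function.

```rust
/// Hypothetical type used only for this illustration.
struct Foo {
    x: i32,
}

/// Returns true when both references point to the very same object.
fn are_aliases(a: &Foo, b: &Foo) -> bool {
    std::ptr::eq(a, b)
}

fn main() {
    let c = Foo { x: 1 };
    let d = Foo { x: 1 };

    let a = &c;
    let b = &c;

    // `a` and `b` alias the object owned by `c`.
    println!("a/b alias: {}", are_aliases(a, b));
    // `d` holds an equal value but is a different object, so this is not aliasing.
    println!("a/d alias: {}", are_aliases(a, &d));
    println!("values: {} {}", a.x, d.x);
}
```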
Consider the C memcpy() function. It can be, and is, used for copying data, but it’s known to be unsafe and can cause memory corruption when applied to overlapping regions:
In the sample above, the first three elements are now undefined because their values depend on the order in which memcpy() performs the copying:
The ultimate issue here is that the program contains two aliasing references to the same object (the array), one of which is non-constant. If such programs were syntactically incorrect, then memcpy() (and any other function with pointer arguments as well) would always be safe to use.
Rust makes this possible by enforcing the following rules of borrowing:
- At any given time, you can have either (but not both) of:
  - one mutable reference
  - any number of immutable references
- References must always be valid.
The second rule relates to ownership, which was discussed in the previous section. The first rule is the real novelty of Rust.
It’s obviously safe to have multiple aliasing pointers to the same object if none of them can be used to modify the object (i.e. they are constant references). If there are two mutable references, however, then modifications can conflict with each other. Also, if there is a const-reference A and a mutable reference B, then the supposedly constant object as seen via A can in fact change if modifications are made via B. But it’s perfectly safe if only one mutable reference to the object is allowed to exist in the program. The Rust borrow checker enforces these rules during compilation, effectively making each reference act as a read-write lock for the object.
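The read-write-lock behavior described above can be sketched as follows. The function name is hypothetical; the point is that several immutable references may coexist, and a mutable reference is accepted only once the immutable ones are no longer in use.

```rust
/// Reads through two shared references, then modifies through one mutable reference.
fn shared_then_mutable() -> Vec<i32> {
    let mut data = vec![1, 2, 3];

    // Any number of immutable references may exist at the same time.
    let a = &data;
    let b = &data;
    let total = a[0] + b[2];

    // `a` and `b` are not used past this point, so a single mutable
    // reference is now allowed.
    let m = &mut data;
    m.push(total);

    // Uncommenting the next line keeps `a` alive across the mutable borrow,
    // which the borrow checker rejects at compile time.
    // println!("{}", a[0]);

    data
}

fn main() {
    println!("{:?}", shared_then_mutable());
}
```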
The following is the equivalent of memcpy() as shown above:
This won’t compile in Rust; the compiler reports the following error:
This error signifies the restrictions imposed by the borrow checker. Multiple immutable references are fine. One mutable reference is fine. Different references to different objects are fine. However, you can’t simultaneously have a mutable and an immutable reference to the same object because this is possibly unsafe and can lead to memory corruption.
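For comparison, here is a sketch of how the same copies can be expressed in a form the borrow checker accepts. The helper names are hypothetical; `split_at_mut` and `copy_within` are standard slice methods. The first proves to the compiler that the two regions are disjoint, and the second handles overlapping regions inside a single mutable borrow.

```rust
/// Hypothetical memcpy-like helper: copies `src` into `dst`.
/// `copy_from_slice` panics if the two lengths differ.
fn copy(dst: &mut [u8], src: &[u8]) {
    dst.copy_from_slice(src);
}

/// Copies the last two elements over the first two using disjoint halves.
fn copy_disjoint(a: &mut [u8]) {
    // A call like `copy(&mut a[0..2], &a[3..5])` is rejected because `a` would be
    // borrowed mutably and immutably at once. Splitting yields two
    // non-overlapping slices that can be borrowed independently.
    let (left, right) = a.split_at_mut(3);
    copy(&mut left[..2], right);
}

/// Copies elements 2..5 to the front, even though source and destination overlap.
fn copy_overlapping(a: &mut [u8]) {
    // `copy_within` works on one mutable borrow and handles the overlap correctly.
    a.copy_within(2..5, 0);
}

fn main() {
    let mut first = [1u8, 2, 3, 4, 5];
    copy_disjoint(&mut first);
    println!("{:?}", first);

    let mut second = [1u8, 2, 3, 4, 5];
    copy_overlapping(&mut second);
    println!("{:?}", second);
}
```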
Not only does this restriction prevent possible human errors, but it in fact enables the compiler to perform some optimizations that are normally not possible in the presence of aliasing. The compiler is then free to use registers more aggressively, avoiding redundant memory access and leading to increased performance.
Another common issue related to pointers and references is null pointer dereferencing. Tony Hoare calls the invention of the null pointer value his billion-dollar mistake, and an increasing number of languages are including mechanisms to prevent it (for example, Optional in Java, nullable types in C#, and the std::optional type since C++17).
Rust uses the same approach as C++ for references: they always point to an existing object. There are no null references and hence no possible issues with them. However, smart pointers aren’t references and may not point to objects; there are also cases where you might want to pass either a reference to an object or no reference at all.
Instead of using nullable pointers, Rust has the Option type. This type has two constructors:
- Some(value): declares some value
- None: declares the absence of a value
None is functionally equivalent to a null pointer (and, for an Option that holds a reference, in fact has the same representation), while Some carries a value (for example, a reference to some object).
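A small sketch of both constructors and of the representation claim, assuming a hypothetical lookup function `find_even`. For references, `Option<&T>` occupies exactly as much memory as `&T`, because `None` is stored as the null address.

```rust
use std::mem::size_of;

/// Hypothetical lookup: returns a reference to the first even number, if there is one.
fn find_even(values: &[i32]) -> Option<&i32> {
    values.iter().find(|v| **v % 2 == 0)
}

fn main() {
    let numbers = [1, 3, 4, 7];
    let odd_only = [1, 3, 5];

    println!("{:?}", find_even(&numbers)); // a `Some` holding a reference
    println!("{:?}", find_even(&odd_only)); // `None`: nothing to point to

    // An optional reference costs no extra space compared to a plain reference.
    println!(
        "{} bytes vs {} bytes",
        size_of::<Option<&i32>>(),
        size_of::<&i32>()
    );
}
```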
The main advantage of Option over pointers is that it’s not possible to accidentally dereference None, and thus null pointer dereferencing errors are eliminated. To use the value stored in Option, you need to use safe access patterns:
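Two common safe access patterns are `match` and `if let`. The sketch below uses hypothetical helper functions; in both, the value inside the `Option` is reachable only in the branch where it is known to exist.

```rust
/// Handles both cases explicitly with `match`.
fn describe(value: Option<i32>) -> String {
    match value {
        Some(n) => format!("got {}", n),
        None => String::from("nothing"),
    }
}

/// Uses `if let` when only the `Some` case needs real work.
fn double_if_present(value: Option<i32>) -> i32 {
    if let Some(n) = value {
        n * 2
    } else {
        0
    }
}

fn main() {
    println!("{}", describe(Some(5)));
    println!("{}", describe(None));
    println!("{}", double_if_present(Some(21)));
    println!("{}", double_if_present(None));
}
```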
Every use of Option acts as a clear marker that an object may be absent, and it requires handling both cases. Furthermore, the Option type has many utility methods that make it more convenient to use:
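As an illustration of a few of these utility methods, the sketch below chains `map`, `and_then`, and `unwrap_or`. The function `parse_port` and the default value 8080 are assumptions made up for this example.

```rust
/// Hypothetical helper: parses an optional port string, falling back to 8080.
fn parse_port(input: Option<&str>) -> u16 {
    input
        // `map` transforms the value inside `Some` and leaves `None` untouched.
        .map(|s| s.trim())
        // `and_then` chains a step that may itself produce `None`.
        .and_then(|s| s.parse::<u16>().ok())
        // `unwrap_or` supplies a default when no value is present.
        .unwrap_or(8080)
}

fn main() {
    println!("{}", parse_port(Some(" 443 ")));
    println!("{}", parse_port(Some("not a number")));
    println!("{}", parse_port(None));
}
```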
Another possible issue with so-called plain old data types in C++ is the use of uninitialized variables. Rust requires variables to be initialized before they are used. Most commonly, this is done when variables are declared:
But it’s also possible to first declare a variable and then initialize it later:
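Both styles of initialization can be sketched in one hypothetical function: `threshold` is initialized at its declaration, while `label` is declared first and assigned on every path before it is read.

```rust
/// Hypothetical helper that labels a number relative to a fixed threshold.
fn classify(n: i32) -> &'static str {
    // Initialized at the point of declaration.
    let threshold = 10;

    // Declared now, initialized later.
    let label;

    // println!("{}", label); // would not compile: `label` is not initialized yet

    if n > threshold {
        label = "large";
    } else {
        label = "small";
    }

    // Every path above assigns `label`, so reading it here is accepted.
    label
}

fn main() {
    println!("{} {}", classify(11), classify(3));
}
```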
Either way, if you forget to initialize a variable and then try to use it, Rust reports a compilation error instead of letting you read a garbage value. All structure fields must be initialized at construction time as well:
If a field is later added to the structure, every existing constructor that doesn’t initialize it will produce a compilation error that must be fixed.
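A minimal sketch of structure initialization, using a hypothetical `Point` type with two constructors. Each struct expression has to name every field; the commented-out line shows the kind of expression the compiler rejects.

```rust
/// Hypothetical structure used only for this illustration.
struct Point {
    x: i32,
    y: i32,
}

impl Point {
    /// Every field must be given a value in the struct expression.
    fn new(x: i32, y: i32) -> Point {
        Point { x, y }
    }

    fn origin() -> Point {
        Point { x: 0, y: 0 }
    }
}

fn main() {
    let p = Point::new(3, 4);
    let o = Point::origin();

    // let broken = Point { x: 1 }; // would not compile: field `y` is missing

    println!("({}, {}) ({}, {})", p.x, p.y, o.x, o.y);
}
```

If a third field were added to `Point`, both `new` and `origin` would stop compiling until they initialized it.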
At Apriorit, we work with many popular software programming languages. Based on our experience with Rust, we have created this tutorial to help you get familiar with the basics of the language.
In this second part of our Rust programming language tutorial, we provided a detailed overview of Rust features that ensure memory safety, such as ownership and borrowing, mutability and aliasing, option types for pointers, and initialized variables. All these features allow programmers to avoid undefined behaviors that are typical in C++.
If you want to learn more about other features of the Rust language, then check out our next article.