Rust is becoming more widespread among developers who want to create fast and safe software. Apriorit works with the Rust programming language, and our experience is the basis for this tutorial. This article is the last part of our Rust programming language tutorial, which is useful for anyone who wants to get familiar with the basics of Rust.

In the first part of our tutorial, we describe such features as zero-cost abstractions and move semantics. The second part is dedicated to Rust features that guarantee memory safety, and the third part covers such features as threads without data races and trait-based generics.

In this last part, we’ll tell you about pattern matching, automatically deducing types using type inference, and ensuring minimal runtime in the Rust language. We’ll also explain how you can easily call C from Rust.  

 

Written by:

Alexey Lozovsky,

Software Designer in System Programming Team


 

Contents:

Pattern Matching

Type Inference

Minimal Runtime

Efficient C Bindings

    Calling C from Rust

    The Libc Crate and Unsafe Blocks

    Beyond Primitive Types

    Calling Rust from C

Conclusion

Pattern Matching

Similar to C++, Rust has enumeration types:

enum Month {
    January, February, March, April, May, June, July,
    August, September, October, November, December,
}

It also has a multiple-choice construction to operate on them:

match month {
    Month::December | Month::January | Month::February
        => println!("It's winter!"),
    Month::March | Month::April | Month::May
        => println!("It's spring!"),
    Month::June | Month::July | Month::August
        => println!("It's summer!"),
    Month::September | Month::October | Month::November
        => println!("It's autumn!"),
}

However, match has more features than a simple switch. The most crucial difference is that matching must be exhaustive: the match arms must handle all possible values of the expression being matched. This eliminates a typical error in which switch statements silently break when an enumeration is later extended with new values. Of course, there’s also a default catch-all option that matches any value:

match number {
    0..=9 => println!("small number"),
    10..=100 if number % 2 == 0 => {
        println!("big even number");
    }
    _ => println!("some other number"),
}
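To make the exhaustiveness rule concrete, here is a minimal sketch. The TrafficLight enum and the next_light function are hypothetical names used only for illustration. There is deliberately no catch-all arm, so adding a new variant would make the function fail to compile until the new case is handled.

```rust
// Hypothetical enum used only for this illustration.
#[derive(Debug, PartialEq, Clone, Copy)]
enum TrafficLight {
    Red,
    Yellow,
    Green,
}

// `match` is an expression, so the selected arm becomes the return value.
// No `_` arm is used here: if a variant such as `FlashingRed` were added
// to `TrafficLight`, the compiler would reject this function until the
// new variant gets its own arm.
fn next_light(light: TrafficLight) -> TrafficLight {
    match light {
        TrafficLight::Red => TrafficLight::Green,
        TrafficLight::Green => TrafficLight::Yellow,
        TrafficLight::Yellow => TrafficLight::Red,
    }
}
```

Whether to use a catch-all arm is a design choice: it keeps the code compiling when the enum grows, but it also gives up the compiler’s reminder to handle new values.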

Another important feature of Rust enumerations is that they can carry values, implementing discriminated unions safely.

enum Color {
    Red, Green, Blue,
    RGB(u32, u32, u32),
    CMYK(u32, u32, u32, u32),
}

Pattern matching can be used to match against possible options and extract values stored in a union:

match some_color {
    Color::Red => println!("Pure red"),
    Color::Green => println!("Pure green"),
    Color::Blue => println!("Pure blue"),
    Color::RGB(red, green, blue) => {
        println!("Red: {}, green: {}, blue: {}", red, green, blue);
    }
    Color::CMYK(cyan, magenta, yellow, black) => {
        println!("Cyan: {}, magenta: {}, yellow: {}, black: {}",
            cyan, magenta, yellow, black);
    }
}

Unlike C and C++ unions, Rust makes it impossible to choose an incorrect branch when unpacking a union.
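As a self-contained variation on the example above, the following sketch returns a description instead of printing it. The Color enum is repeated from the text, while the describe function is a hypothetical helper introduced here. Values stored in a variant can only be reached through the arm that matches that variant.

```rust
// The enum from the text, repeated so that the sketch compiles on its own.
#[allow(dead_code)]
enum Color {
    Red, Green, Blue,
    RGB(u32, u32, u32),
    CMYK(u32, u32, u32, u32),
}

// Hypothetical helper: builds a description instead of printing it.
fn describe(color: &Color) -> String {
    match color {
        Color::Red => String::from("Pure red"),
        Color::Green => String::from("Pure green"),
        Color::Blue => String::from("Pure blue"),
        // The payload is bound to names only inside the matching arm.
        Color::RGB(red, green, blue) => {
            format!("Red: {}, green: {}, blue: {}", red, green, blue)
        }
        Color::CMYK(cyan, magenta, yellow, black) => {
            format!("Cyan: {}, magenta: {}, yellow: {}, black: {}",
                cyan, magenta, yellow, black)
        }
    }
}
```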

Type Inference

Rust uses a static type system, which means that types of variables, function arguments, structure fields, and so on must be known at compile time; the compiler will check that correct types are used everywhere. However, Rust also uses type inference, which allows the compiler to automatically deduce types based on how variables are used.

This is very convenient because you no longer need to explicitly state types, which in some cases may be cumbersome (or impossible) to write. The auto keyword in C++ serves the same purpose:

std::vector<std::map<std::string, std::vector<Object>>> some_map;
 
// Iterator types can easily become a mess:
for (const auto &it : some_map)
{
    /* ... */
}
 
// Lambda functions can only be used with auto;
// their exact type cannot be expressed in C++
auto compare_by_cost = [](const Foo &lhs, const Foo &rhs) { return lhs.cost < rhs.cost; };

However, Rust also considers future uses of a variable to deduce its type – not only the initializer – allowing programmers to write code like this:

let v = 10;               // v’s type is some integer (based on the constant),
                          // but the exact type (i32, u8, etc.) is not yet known
 
let mut vec = Vec::new(); // vec’s type is some Vec<T>, where T may be anything
 
vec.push(v);              // after this line, the compiler knows that T == v’s type
 
let s = v + vec.len();    // vec.len() returns “usize”, so this must be the type
                          // of v (as another addend) and s (as a sum), and vec
                          // is now also known to have type Vec<usize>
 
println!("{}: {:?}", s, vec); // prints 11: [10]

Rust’s type inference is based on the widely known and thoroughly researched Hindley-Milner algorithm, which is most commonly used in functional programming languages. The algorithm can handle global type inference (inferring all types in an entire program, including function arguments, return values, structure fields, and so on). But global type inference can be slow in large projects and can cause types to change after unrelated changes elsewhere in the code base. Thus, Rust infers types only inside function bodies, for local variables and closures. You must explicitly write types for function arguments, return values, and structure fields. This strikes a good balance between expressiveness, speed, and robustness. Explicit types also make good documentation for functions, methods, and structures.
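The boundary between explicit and inferred types can be sketched as follows. The total_length function is a hypothetical example: its signature must be fully annotated, while everything inside the body is left to the compiler.

```rust
// The signature is the contract: argument and return types are mandatory.
fn total_length(words: &[&str]) -> usize {
    // `Vec<_>` asks the compiler to fill in the element type; it is
    // deduced from the closure, which returns the `usize` of `len()`.
    // The type of the closure parameter `word` is inferred as well.
    let lengths: Vec<_> = words.iter().map(|word| word.len()).collect();

    // The result type of `sum()` is taken from the function's return type.
    lengths.iter().sum()
}
```

For instance, calling total_length(&["ab", "cde"]) adds up the lengths 2 and 3.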

Minimal Runtime

A runtime is the language support library that’s embedded into every program and provides essential features of the programming language. The Java Virtual Machine can be thought of as the runtime of the Java language, for example, as it provides features like class loading and garbage collection. The size and complexity of the runtime contribute significantly to start-up and run-time overhead. For example, the JVM requires a non-negligible amount of time to load classes, warm up the JIT compiler, collect garbage, and so on.

Rust doesn’t have a garbage collector, a virtual machine, a bytecode interpreter, or a lightweight thread scheduler running in the background. Your code is compiled directly to native machine code, and nothing else runs alongside it. Some parts of the Rust standard library can be considered the “runtime,” providing support for heap allocation, backtraces, stack unwinding, and stack overflow guards. The standard library also has a small amount of global initialization code, similar to the initialization code of a C runtime library that sets up the stack, calls global constructors, and so on before control is transferred to the main() function. (You can compile Rust programs without the standard library if you don’t need it, thus avoiding this overhead.)

In short, Rust can be used for really low-level work like bare-metal programming, device drivers, and operating system kernels.

Furthermore, the absence of a complex runtime simplifies embedding Rust modules into programs written in other languages. For example, you can easily write JNI code for Java or extensions for dynamic languages like Python, Ruby, or Lua.

Efficient C Bindings

There’s more than one programming language in the world, so it’s not surprising that you might want to use libraries written in languages other than Rust. Conventionally, libraries provide a C API because C is a ubiquitous language, the common denominator of programming languages. Rust is able to easily communicate with C APIs, without any overhead, and use its ownership system to provide significantly stronger safety guarantees for them.

Calling C from Rust

Let’s look at a simple example. Consider the following C library for adding numbers (here we take it easy and use a regular for loop, but we could do something clever with AVX instructions):

/**
 * Sum some numbers.
 *
 * @param numbers [in] pointer to the numbers to be summed
 *                      must not be NULL and must point to at least
 *                      `count` elements
 * @param count [in]    number of numbers to be summed
 *
 * @returns sum of the provided numbers.
 */
int sum_numbers(const int *numbers, size_t count)
{
    int sum = 0;
 
    for (size_t i = 0; i < count; i++)
    {
        sum += numbers[i];
    }
 
    return sum;
}

Note that some parts of this function’s API are described formally by the argument types, but some things are only specified in the documentation. For example, only the documentation tells us that we can’t pass NULL for the numbers argument and that there must be at least count numbers available. And only the common sense of a C programmer tells us that the function won’t call free() on the numbers array.

Here’s how we can call this function from Rust:

extern crate libc;

// Declare the prototype of the C function, using the C ABI
extern "C" {
    fn sum_numbers(numbers: *const libc::c_int, count: libc::size_t)
        -> libc::c_int;
}

fn main() {
    let array = [1, 2, 3, 4, 5];
    let sum = unsafe { sum_numbers(array.as_ptr(), array.len()) };
    println!("Sum: {}", sum); // ===> prints "Sum: 15"
}

As you can see, there’s no syntactical overhead in calling an external function written in C (other than spelling out the prototype of the function). It’s just like calling a native Rust function. If you look at the generated assembly code, you can see that this function call has no runtime overhead either:

    leaq    32(%rsp), %rdi
    movl    $5, %esi
    callq   sum_numbers@PLT
    movl    %eax, 12(%rsp)

There’s no hidden boxing and unboxing, re-allocating of the array, obligatory safety checks, or other things. We see exactly the same machine code that a C compiler would have generated for the same library function call.

The Libc Crate and Unsafe Blocks

However, there are some details in the above code that require further explanation – first of all, the libc crate. This is a wrapper library that provides types and functions of the C standard library to Rust. Here you can find all the usual types, constants, and functions:

  • libc::c_uint (unsigned int type)
  • libc::stat (struct stat structure)
  • libc::pthread_mutex_t (pthread_mutex_t typedef)
  • libc::open (open(2) system call)
  • libc::reboot (reboot(2) system call)
  • libc::EINVAL (EINVAL constant)
  • libc::SIGSEGV (SIGSEGV constant)
  • and many more, depending on the platform you compile on

Not only can you use “normal” C libraries via the Rust Foreign Function Interface – you can also readily use the system API via the libc crate.

Another catch lies in the unsafe block:

  
let sum = unsafe { sum_numbers(array.as_ptr(), array.len()) };

As the sum_numbers() function is external, it doesn’t automatically provide the degree of safety provided by native Rust functions. For example, Rust will allow you to pass a NULL pointer as the first argument, and this will cause undefined behavior (just as it would in C). The function call isn’t safe, so it must be wrapped in an unsafe block, which effectively says “Compiler, you have my word that this function call is safe. I have verified that the arguments are okay, that the function won’t compromise Rust safety guarantees, and that it won’t cause undefined behavior.”

Just as in C, the programmer is ultimately responsible for guaranteeing that the program doesn’t cause undefined behavior. The difference here is that with C you must manually do this at all times, in all parts of the code, for every library you use. On the other hand, in Rust you must manually verify safety only inside unsafe blocks. All other Rust code (outside unsafe blocks) is automatically safe, as routinely verified by the Rust compiler.

Herein lies the power of Rust: you can provide safe wrappers for unsafe code and thus avoid tedious, manual safety verifications in the consumer code. For example, the sum_numbers function can be wrapped like this:

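// The wrapper needs its own name: it can't share one with the
// external sum_numbers() declared in the same scope
fn sum_numbers_safe(numbers: &[libc::c_int]) -> libc::c_int {
    // This is safe because Rust slices are always non-NULL
    // and are guaranteed to be long enough
    unsafe { sum_numbers(numbers.as_ptr(), numbers.len()) }
}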

Now the external function has a safe interface. It can be readily used by idiomatic Rust code without unsafe blocks. Callers of the function don’t need to be aware of the actual safety requirements of its native C implementation. And it’s still as fast as the original!
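The wrapper pattern can be tried out without a C toolchain. In the sketch below, c_style_sum is a hypothetical stand-in written in Rust with the C ABI and the same raw-pointer contract as the C function; safe_sum is likewise an illustrative name. A real project would declare the C function in an extern block instead.

```rust
use std::os::raw::c_int;

// Stand-in for the external C function (an assumption made so that the
// sketch runs on its own). The contract matches the C version: `numbers`
// must be non-NULL and point to at least `count` elements.
unsafe extern "C" fn c_style_sum(numbers: *const c_int, count: usize) -> c_int {
    let mut sum = 0;
    for i in 0..count {
        // Safety: relies on the caller upholding the contract above.
        sum += unsafe { *numbers.add(i) };
    }
    sum
}

// Safe wrapper: a slice is never NULL and always carries its own length,
// so callers cannot violate the pointer contract.
fn safe_sum(numbers: &[c_int]) -> c_int {
    unsafe { c_style_sum(numbers.as_ptr(), numbers.len()) }
}
```

The unsafe block is confined to one line of the wrapper, which is the only place that needs a manual safety review.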

Beyond Primitive Types

Aside from primitive types like libc::c_int and pointers, Rust can use other C types as well.

Rust structs can be made compatible with C structs via a #[repr] annotation:

// Rust
#[repr(C)]
struct UUID
{
    time_low:  u32,
    time_mid:  u16,
    time_high: u16,
    sequence:  u16,
    node:      [u8; 6],
}

// C
struct UUID
{
    uint32_t time_low;
    uint16_t time_mid;
    uint16_t time_high;
    uint16_t sequence;
    uint8_t node[6];
};

Such structures can be passed by value or by pointer to C code, as they’ll have the same memory layout as their C counterparts used by a C compiler. (Obviously, the fields can only have types that C can understand.)

C unions can also be directly represented in Rust:

// Rust
#[repr(C)]
union TypePun
{
    f: f32,
    i: i32,
}

// C
union TypePun
{
    float f;
    int   i;
};

As in C, unions in Rust are untagged. That is, they don’t store the runtime type of the value inside them. The programmer is responsible for accessing union fields correctly. The compiler can’t check this automatically, so Rust requires an explicit unsafe block for reading union fields. (Writing to a field of a Copy type is considered safe, since a write by itself can’t expose invalid data.)
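Here is a small sketch of working with such a union. The float_bits function is a hypothetical helper that stores a float and reads the same bytes back as an integer, which is the classic type-punning use case.

```rust
// The union from the text, with a C-compatible layout.
#[repr(C)]
union TypePun {
    f: f32,
    i: i32,
}

// Hypothetical helper: reinterprets the bits of a float as an integer.
fn float_bits(value: f32) -> i32 {
    // Creating the union with one field initialized is a safe operation.
    let pun = TypePun { f: value };
    // Reading is unsafe: the compiler doesn't track which field was
    // written last. Here both fields are 4 bytes wide and every bit
    // pattern is a valid i32, so the read is sound.
    unsafe { pun.i }
}
```

In ordinary Rust code, the standard f32::to_bits method is the safer way to get the same bits; the union form matters mainly when the type is shared with C.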

Simple enumerations are also compatible with C:

// Rust
#[repr(C)]
enum Options
{
    ONE = 0,
    TWO,
    THREE,
}

// C
enum Options
{
    ONE,
    TWO,
    THREE,
};

However, you can’t use advanced features of Rust enum types when calling C code. For instance, you can’t directly pass arbitrary Option<T> or Result<T, E> values to C.
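One common way around this limitation is to translate an Option into a status code plus an out-parameter at the boundary. The sketch below assumes a hypothetical search function, find_first_negative, and a hypothetical C-facing shim, find_first_negative_c; neither name comes from a real library.

```rust
use std::os::raw::c_int;
use std::slice;

// Native Rust function with an idiomatic Option result (hypothetical).
fn find_first_negative(values: &[c_int]) -> Option<usize> {
    values.iter().position(|&v| v < 0)
}

// C-friendly shim: returns 1 and stores the index in `*out_index` when a
// negative number is found; returns 0 otherwise (or on NULL arguments).
// The caller must guarantee that `values` points to `count` elements.
pub unsafe extern "C" fn find_first_negative_c(
    values: *const c_int,
    count: usize,
    out_index: *mut usize,
) -> c_int {
    if values.is_null() || out_index.is_null() {
        return 0;
    }
    let slice = unsafe { slice::from_raw_parts(values, count) };
    match find_first_negative(slice) {
        Some(index) => {
            unsafe { *out_index = index };
            1
        }
        None => 0,
    }
}
```

On the C side, this would look like an ordinary function that takes a pointer, a length, and a size_t out-pointer.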

Rust functions can be converted into C function pointers given that the argument types are actually compatible and the C ABI is used:

use std::{mem, ptr};

fn launch_native_thread() {
    let name = "Ferris";
    // We’re going to launch a native thread via pthread_create() from libc.
    // This is an external function, so calling it is unsafe in Rust (think
    // about exception boundaries, for example).
    unsafe {
        let mut thread = 0;
        libc::pthread_create(&mut thread, // out-argument for pthread_t
            ptr::null(),                  // in-argument of pthread_attr_t
            thread_body,                  // thread body (as a C callback)
            mem::transmute(&name)         // thread argument (requires a cast)
        );
        libc::pthread_join(thread, ptr::null_mut());
    }
}
 
// Here’s our thread body with C ABI written in Rust
extern "C" fn thread_body(arg: *mut libc::c_void) -> *mut libc::c_void {
    // We need to cast the argument back to the original reference to &str.
    // This is unsafe (from the Rust compiler’s point of view), but we know
    // what kind of data we have put into this void*
    let name: &&str = unsafe { mem::transmute(arg) };
    println!("Hello {} from Rust thread!", name);
    return ptr::null_mut();
}
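The same callback-plus-void-pointer pattern can be sketched without threads. In the following example, for_each_int is a hypothetical stand-in for a C API that invokes a callback for every element; add_to_total and sum_via_callback are likewise illustrative names. Plain pointer casts are used here instead of transmute.

```rust
use std::os::raw::{c_int, c_void};

// Stand-in for a C API (an assumption for this sketch): it calls
// `callback` once per element and passes `user_data` through untouched.
fn for_each_int(
    values: &[c_int],
    callback: extern "C" fn(c_int, *mut c_void),
    user_data: *mut c_void,
) {
    for &value in values {
        callback(value, user_data);
    }
}

// Callback with the C ABI, written in Rust.
extern "C" fn add_to_total(value: c_int, user_data: *mut c_void) {
    // Safety: relies on the convention that `user_data` points to a
    // live c_int, as set up in sum_via_callback() below.
    let total = unsafe { &mut *(user_data as *mut c_int) };
    *total += value;
}

fn sum_via_callback(values: &[c_int]) -> c_int {
    let mut total: c_int = 0;
    // The typed pointer is cast to void* on the way in and back to
    // *mut c_int inside the callback.
    for_each_int(values, add_to_total, &mut total as *mut c_int as *mut c_void);
    total
}
```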

Calling Rust from C

Native Rust functions and types can be made available to C code just as easily as you can call C from Rust. Let’s reverse the example with the sum_numbers() function and implement it in Rust instead:

use std::slice;

#[no_mangle]
pub extern "C" fn sum_numbers(numbers: *const libc::c_int, count: libc::size_t)
    -> libc::c_int
{
    // Convert the C pointer-to-array into a native Rust slice of an array.
    // This is not safe per se because the "numbers" pointer may be NULL
    // and the "count" value may not match the actual array length.
    //
    // As with C, we'll require the caller of this function to ensure
    // that these safety requirements are observed and will not check
    // them explicitly here.
    let rust_slice = unsafe { slice::from_raw_parts(numbers, count) };

    // Rust iterators already have a handy method for summing
    // elements. Let's use it here.
    return rust_slice.iter().sum();
}

And that’s it. The #[no_mangle] attribute prevents symbol mangling (so that the function is exported with the exact name “sum_numbers”). The extern "C" qualifier specifies that the function should have the C ABI instead of the native Rust ABI. With this, any C program can link to a library written in Rust and can easily use our function:

#include <stddef.h>
#include <stdio.h>

// Declare the function prototype for C
int sum_numbers(const int *numbers, size_t count);

int main()
{
    int numbers[] = { 1, 2, 3, 4, 5 };
    int sum = sum_numbers(numbers, 5);
    printf("Sum is %d\n", sum);
}

Calling a Rust library in C is as easy as calling a native C library. There are no required conversions, no Rust VM context needs to be initialized and passed as an additional argument, and there’s no overhead aside from the regular function call.
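If you would rather not trust every C caller, the exported function can defend itself to some degree. The sketch below is a hypothetical variant named sum_numbers_checked that tolerates a NULL pointer and uses wrapping addition so that an overflow can’t trigger a panic at the language boundary. The export attribute is left out to keep the sketch short; it still can’t verify that count matches the real array length.

```rust
use std::os::raw::c_int;
use std::slice;

// Hypothetical defensive variant of the exported function.
// The caller must still guarantee that a non-NULL `numbers` pointer
// refers to at least `count` elements.
pub unsafe extern "C" fn sum_numbers_checked(
    numbers: *const c_int,
    count: usize,
) -> c_int {
    // Treat NULL or an empty range as "nothing to sum".
    if numbers.is_null() || count == 0 {
        return 0;
    }
    let values = unsafe { slice::from_raw_parts(numbers, count) };
    // Wrapping addition: overflow wraps around instead of panicking.
    values.iter().fold(0, |acc: c_int, &v| acc.wrapping_add(v))
}
```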

Conclusion

As you can see, Rust is designed to combine safety, concurrency, and speed. This is achieved through the absence of a garbage collector and a heavy runtime, compile-time prevention of data races, efficient bindings to other languages, and other useful features.

In our next article dedicated to the Rust language, we’ll compare Rust with another popular programming language: C++.

This Rust programming tutorial is based on the experience of our Apriorit software development team. We would be glad to assist you with software programming in Rust. Get in touch with us!
