Introduction #
My knowledge of Rust is truly at zero. I want to learn Rust by reading code and researching when I encounter things I don’t understand. I’m writing this to document my own thought process. It will serve both as a reference I can look back on and potentially help others who are learning Rust from scratch.
The Project #
The project I’ve chosen to read is this one. It’s a simple in-memory key-value store. I thought it would be a nice starting point since it both sets up a server and deals with aspects like concurrency/parallelization.
fn main() -> std::io::Result<()> {
    let port = env::args()
        .nth(1)
        .expect("Port not provided")
        .parse::<u16>()
        .expect("Incorrect port");

    let listener = TcpListener::bind(SocketAddrV4::new(Ipv4Addr::new(127, 0, 0, 1), port))?;
    println!("Listening on {}", port);

    let pool = ThreadPool::new(12);
    let inmem = Arc::new(Mutex::new(InMem::new()));

    for stream in listener.incoming() {
        let arc = Arc::clone(&inmem);
        pool.execute(move || handle_client(&mut stream.unwrap(), arc));
    }
    Ok(())
}
The code looked familiar up until the definition of the inmem variable. Here we encounter the use of Arc and
Mutex. The only thing I knew about Rust was its ownership model, and I had a hunch these two were related to that
rule. When I asked Claude about it, it explained as best it could, but since I didn’t have a complete understanding of
ownership yet, I felt the need to ask about it first.
Ownership #
To determine when a resource should be released, each resource is assigned a single owner, and when that owner goes out of scope, it takes all the resources associated with it along. This reminded me quite a bit of the RAII idiom; in fact, Rust’s entire ownership concept was developed drawing inspiration from it. To explain RAII in a few sentences: a resource’s lifetime is tied to the lifetime of a stack object, so when that object is destroyed at scope exit or during stack unwinding, it cleans up the resource it holds. That resource could be memory on the heap, a socket, or a database connection.
Destructors in C++ are perfect for this job:
#include <iostream>
#include <fstream>
#include <string>

class FileHandler {
private:
    std::ofstream file;

public:
    // Constructor acquires the resource
    FileHandler(const std::string& filename) {
        file.open(filename);
        if (file.is_open()) {
            std::cout << "File opened successfully.\n";
        } else {
            std::cout << "Failed to open file.\n";
        }
    }

    // Write to the file
    void write(const std::string& text) {
        if (file.is_open()) {
            file << text << std::endl;
        }
    }

    // Destructor releases the resource
    ~FileHandler() {
        if (file.is_open()) {
            file.close();
            std::cout << "File closed automatically.\n";
        }
    }
};
Whenever the scope where the FileHandler is defined is exited, the destructor of this FileHandler is called, thereby closing the file associated with this handler. I actually really like this object-oriented approach in C++. The defer statement in Go gives a more procedural feeling.
I think because I was slightly familiar with these concepts before, the ownership system didn’t seem complicated to me. To restate ownership here, it’s a rule specifying that a resource can have only one owner at a given time. Otherwise, it would be impossible to determine the lifecycle of this resource.
Still, thinking from first principles, I couldn’t help but ask what if it really did have multiple owners and was cleaned up when the last owner died? In that case, we’d have invented reference counting. Questions like why Rust didn’t prioritize RC over single ownership, and what would happen if it did, crossed my mind. Could the compiler be so advanced that it could provide the same degree of safety by adopting RC? Of course, I’m aware that it would make the compiler code much more complex and could significantly extend compilation times. I was just curious if it was possible.
I smiled when I saw that Arc actually stands for Atomically Reference Counted. This reminds me of the famous programming adage: “There is no problem in computer science that can’t be solved by adding another level of indirection.” That’s exactly what Arc does - it adds a layer of indirection to get around the single-ownership rule, allowing multiple owners through reference counting.
Threading and Concurrency #
I continued reading the code a bit more. move caught my interest here. The move keyword makes it so that the closure
takes ownership of the variables it dynamically refers to. What would happen if it didn’t? To understand this, I
preferred a simpler code snippet:
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    let counter = Arc::new(Mutex::new(0));
    let mut handles = vec![];

    for _ in 0..10 {
        let counter = Arc::clone(&counter);
        let handle = thread::spawn(move || {
            let mut num = counter.lock().unwrap();
            *num += 1;
        });
        handles.push(handle);
    }

    for handle in handles {
        handle.join().unwrap();
    }

    println!("Result: {}", *counter.lock().unwrap());
}
What would happen if we removed Arc and Mutex entirely from the code above? In that case, we could give ownership of counter to only a single thread. What if we instead borrowed it mutably from every thread? Since the compiler can’t prove that the spawned threads won’t outlive the stack frame that owns counter, it wouldn’t allow this code to compile (and mutable borrows are exclusive anyway, so only one thread could hold one at a time).
I’m jumping from branch to branch a bit, but I wonder what this would look like with structured concurrency? Here’s an enlightening article about structured concurrency.
I found the support for structured concurrency in Rust’s ecosystem to be a positive sign; here’s a version using the crossbeam crate:
use crossbeam::scope;
use std::sync::Mutex;

fn main() {
    let counter = Mutex::new(0);

    // Create a scope for structured concurrency
    scope(|s| {
        // Spawn 10 threads that all access the counter through the mutex
        for _ in 0..10 {
            s.spawn(|_| {
                // We still need the mutex for synchronization between threads
                let mut num = counter.lock().unwrap();
                *num += 1;
            });
        }
        // All threads are automatically joined at the end of the scope
    })
    .unwrap();

    println!("Result: {}", *counter.lock().unwrap());
}
In the original code, out of curiosity, I wanted to see if the compiler could catch it if I removed the mutex and kept only the arc.
use std::sync::Arc;
use std::thread;

fn main() {
    let counter = Arc::new(0);
    let mut handles = vec![];

    for _ in 0..10 {
        let counter = Arc::clone(&counter);
        let handle = thread::spawn(move || {
            let mut num = counter;
            *num += 1;
        });
        handles.push(handle);
    }

    for handle in handles {
        handle.join().unwrap();
    }

    println!("Result: {}", *counter);
}
Good job, Rust:
error[E0594]: cannot assign to data in an `Arc`
--> src/main.rs:12:13
|
12 | *num += 1;
| ^^^^^^^^^ cannot assign
|
= help: trait `DerefMut` is required to modify through a dereference, but it is not implemented for `Arc<i32>`
Since the Arc type doesn’t implement the DerefMut trait, we can’t perform a mutable operation through the * operator. I think a trait is roughly what other languages call an interface. There must be minor differences, which we’ll uncover later.
Now let’s see how the mutex provides this. In let mut num = counter.lock().unwrap();, it’s not the mutex itself but its lock method that returns a MutexGuard wrapped in a Result. The reason it returns a Result is that if a thread panics while holding the lock, the data protected by the mutex may be left in an inconsistent state. In that case, the mutex is marked as poisoned, and it’s important for subsequent threads to know this: they can inspect the Err variant and try to recover the data, or they can stop working entirely.
In short, the Arc type is a wrapper type that allows us to do atomic reference counting. We use this type when we want
multiple owners in a multithreaded environment. However, when it comes to exclusive writing, we compose it with a mutex
to finish the job.
Reflections #
Looking back at what I’ve covered today, I’m surprised at how quickly I dove into some advanced Rust concepts. Starting from zero knowledge, I’ve explored:
- The ownership model - the cornerstone of Rust’s memory safety guarantees
- Relationships between Rust’s ownership and C++’s RAII pattern
- Thread safety with Arc (Atomic Reference Counting) and Mutex
- How the compiler enforces safety through traits like DerefMut
- Structured concurrency concepts in Rust
Even though I’ve just scratched the surface, I feel like I’m starting to grasp Rust’s philosophy. The language seems to make you think carefully about data ownership and sharing from the beginning, turning what could be runtime errors in other languages into compile-time checks.
For tomorrow, I want to understand more about borrowing, lifetimes, and how Rust’s type system enforces these concepts. I’m particularly interested in diving deeper into how Mutex works internally and exploring more real-world examples of thread-safe code.
This approach of learning by reading actual code seems promising. Instead of just memorizing syntax, I’m encountering concepts in their natural context and thinking through the design decisions that led to Rust’s unique features.