Add documentation on how objects are persisted in memory #563

## Memory Layout in Ruby

### Overview
Ruby uses a garbage collector (GC) to manage memory. The GC automatically reclaims objects that are no longer reachable, so Ruby code does not have to free memory by hand.
This page explains how objects are laid out in memory, and which optimizations Ruby applies to use that memory efficiently.

### Object Header
The way objects are laid out in memory depends on the underlying processor architecture.
The layouts below assume a 64-bit x86_64 architecture, where pointers are 8 bytes wide.

Every object starts with a 16-byte header defined by the `RBasic` struct:

```
+-------------+-------------+
|    flags    |    klass    |
+-------------+-------------+
|<- 8 bytes ->|<- 8 bytes ->|
```

Here `flags` stores information about the object, and `klass` is a reference to the Class object representing the object's class.
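To make the 16-byte figure concrete, here is a minimal sketch that mirrors the two `RBasic` fields with a plain C struct. It is not the real CRuby header: `value_t` and `struct basic_header` are hypothetical stand-ins, and the code assumes a 64-bit target where a pointer-sized integer is 8 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for CRuby's VALUE: a pointer-sized unsigned integer. */
typedef uintptr_t value_t;

/* Mirrors the two fields of struct RBasic (illustration only, not ruby.h). */
struct basic_header {
    value_t flags; /* object type plus ruby_fl_type bits */
    value_t klass; /* reference to the object's Class */
};

/* On a 64-bit target each field is 8 bytes, so the header is 16 bytes. */
static_assert(sizeof(value_t) == 8, "this sketch assumes a 64-bit target");
static_assert(sizeof(struct basic_header) == 16, "header should be 16 bytes");
static_assert(offsetof(struct basic_header, klass) == 8, "klass follows flags");
```

Because both fields are pointer-sized, the header size doubles to 16 bytes on 64-bit machines; on a 32-bit build the same struct would be 8 bytes.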

The flags are made up of several distinct fields:
```
+----------------+------------------------------------+--------------------------+
|  Object type   |            ruby_fl_type            |         reserved         |
+----------------+------------------------------------+--------------------------+
|<- bits 0 to 4->|<---------- bits 5 to 31 ---------->|<---- bits 32 to 63 ----->|
```

The object type identifies the kind of object (Class, Array, Hash, Float, ...).
The `ruby_fl_type` bits are flags used by the Ruby VM and the garbage collector. For example, when an object is frozen, the `RUBY_FL_FREEZE` flag is set.
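Reading these fields is plain bit masking. The sketch below assumes constants that we believe match CRuby's `RUBY_T_MASK` (0x1f), `RUBY_T_ARRAY` (0x07) and `RUBY_FL_FREEZE` (bit 11); they are copied here under hypothetical `SKETCH_` names and should be checked against the headers of your Ruby version. The helper functions are illustrative, not CRuby API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uintptr_t value_t; /* hypothetical stand-in for VALUE */

/* Assumed values mirrored from CRuby headers; verify for your version. */
#define SKETCH_T_MASK    ((value_t)0x1f)      /* object type: bits 0 to 4 */
#define SKETCH_T_ARRAY   ((value_t)0x07)
#define SKETCH_FL_FREEZE ((value_t)1 << 11)   /* one ruby_fl_type bit */

/* Flag bits must never overlap the type bits. */
static_assert((SKETCH_FL_FREEZE & SKETCH_T_MASK) == 0, "no overlap");

/* Extract the object type from the low bits of flags. */
static value_t object_type(value_t flags) { return flags & SKETCH_T_MASK; }

/* Test whether the freeze flag is set. */
static bool is_frozen(value_t flags) { return (flags & SKETCH_FL_FREEZE) != 0; }

/* Freezing sets one bit and leaves the type bits untouched. */
static value_t freeze(value_t flags) { return flags | SKETCH_FL_FREEZE; }
```

Since the type and the flags occupy disjoint bit ranges, setting a flag never changes what `object_type` reports.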

### Heaps

To store objects, Ruby uses 5 heaps with fixed-size slots, plus one general heap.

If an object needs 640 bytes or less, it can be stored in one of the 5 fixed-size slot heaps. This is called _embedded_ object allocation.
The heap selected is the smallest one whose slots can contain the object:

| Slot ID | Slot size |
| -- | -- |
| 0 | 40 bytes |
| 1 | 80 bytes |
| 2 | 160 bytes |
| 3 | 320 bytes |
| 4 | 640 bytes |

If the object is larger than 640 bytes, it still occupies a slot, but its contents are kept in the general heap (memory obtained with `malloc`). This is called _extended_ object allocation.
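The "smallest heap that fits" rule can be sketched as a linear scan over the slot sizes from the table above. `pick_slot` is a hypothetical helper written for illustration; it returns -1 when the object is too large to embed and would use extended allocation instead.

```c
#include <assert.h>
#include <stddef.h>

/* Slot sizes from the table above (x86_64). */
static const size_t slot_sizes[] = {40, 80, 160, 320, 640};
#define SLOT_COUNT (sizeof(slot_sizes) / sizeof(slot_sizes[0]))

static_assert(SLOT_COUNT == 5, "five fixed-size slot heaps");

/* Return the ID of the smallest slot that can hold `size` bytes,
 * or -1 when the object is too large to be embedded. */
static int pick_slot(size_t size) {
    for (size_t i = 0; i < SLOT_COUNT; i++) {
        if (size <= slot_sizes[i]) {
            return (int)i;
        }
    }
    return -1;
}
```

For example, a 41-byte object does not fit in a 40-byte slot, so it lands in slot ID 1 (80 bytes); anything above 640 bytes gets -1.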

The benefit of fixed-size slots is reduced memory fragmentation: when an object is freed, its slot can be reused by any other object of the same slot size, with no risk of overlapping a neighbouring object.
The general heap, on the other hand, serves allocations of arbitrary sizes, so it can become fragmented over time, leaving free gaps too small for new allocations.
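Slot reuse is easy to see with a toy free list. The sketch below is a simplified model written for illustration, not CRuby's allocator: `struct page`, `page_alloc` and `page_free` are hypothetical names, and a real heap page holds many more slots. Because every slot has the same size, a freed slot can be handed to the next object as-is.

```c
#include <assert.h>
#include <stddef.h>

#define SLOT_SIZE  40  /* one fixed slot size, e.g. slot ID 0 */
#define SLOT_COUNT 4   /* a tiny page, for illustration */

/* A toy page of equally sized slots; free slots are threaded into a list. */
struct page {
    union slot {
        union slot *next_free;          /* used while the slot is free */
        unsigned char bytes[SLOT_SIZE]; /* used while it holds an object */
    } slots[SLOT_COUNT];
    union slot *free_list;
};

static_assert(sizeof(union slot) == SLOT_SIZE, "slot keeps its fixed size");

static void page_init(struct page *p) {
    p->free_list = NULL;
    /* Push slots in reverse so the first allocation returns slot 0. */
    for (int i = SLOT_COUNT - 1; i >= 0; i--) {
        p->slots[i].next_free = p->free_list;
        p->free_list = &p->slots[i];
    }
}

/* Pop a free slot; returns NULL when the page is full. */
static void *page_alloc(struct page *p) {
    union slot *s = p->free_list;
    if (s != NULL) {
        p->free_list = s->next_free;
    }
    return s;
}

/* Push a slot back; the next allocation will reuse it. */
static void page_free(struct page *p, void *ptr) {
    union slot *s = ptr;
    s->next_free = p->free_list;
    p->free_list = s;
}
```

Freeing a slot and allocating again returns the same slot, and no size bookkeeping is needed because every slot in the page is interchangeable.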

### Optimizations
Since the smallest slot size is 40 bytes and the object header takes only 16 bytes, an object in the smallest slot has 24 bytes left over for its own data.

Each object type tries to use these spare bytes to store its data inline when possible.
For example, an Array can keep up to 3 values there:
```
+-------------+-------------+-------------+-------------+-------------+
|    flags    |    klass    | idx 0 value | idx 1 value | idx 2 value |
+-------------+-------------+-------------+-------------+-------------+
|<- 8 bytes ->|<- 8 bytes ->|<- 8 bytes ->|<- 8 bytes ->|<- 8 bytes ->|
|<----------------------------- 40 bytes ---------------------------->|
```
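The layout above can be written down as a C struct, together with the arithmetic that gives the inline capacity of any slot size. This is a sketch under the same 64-bit assumption; `struct embedded_array40` and `EMBED_CAPACITY` are hypothetical names, not CRuby's `struct RArray`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t value_t; /* hypothetical stand-in for VALUE */

struct basic_header {
    value_t flags;
    value_t klass;
};

/* Hypothetical embedded array in a 40-byte slot: header, then 3 inline values. */
struct embedded_array40 {
    struct basic_header basic;
    value_t ary[3];
};

#define HEADER_SIZE sizeof(struct basic_header)
/* Number of 8-byte values that fit after the header in a slot of `slot` bytes. */
#define EMBED_CAPACITY(slot) (((slot) - HEADER_SIZE) / sizeof(value_t))

static_assert(sizeof(struct embedded_array40) == 40, "fills a 40-byte slot");
static_assert(EMBED_CAPACITY(40) == 3, "24 spare bytes hold 3 values");
```

The same formula gives 8 values for an 80-byte slot and 78 values for a 640-byte slot.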

The astute reader might wonder where the length of that array is stored. The answer: _in the flags_!
For Array, Ruby uses 7 of the `ruby_fl_type` bits to store the length of an embedded array.
Why 7 bits? The largest fixed-size slot is 640 bytes, so an embedded array can hold at most (640 - 16) / 8 = 78 values (slot size minus the 16-byte header, divided by 8 bytes per value), and 78 fits in 7 bits since 2^7 = 128.
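Packing the length into flags is another masking exercise. In this sketch the 7-bit field starts at bit 15; that shift is an illustrative assumption (we believe recent CRuby headers place it there, but check your version), and `set_embed_len`, `get_embed_len` and `bits_needed` are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t value_t; /* hypothetical stand-in for VALUE */

/* Assumed position of the 7-bit embedded length inside flags. */
#define LEN_SHIFT 15
#define LEN_BITS  7
#define LEN_MASK  ((((value_t)1 << LEN_BITS) - 1) << LEN_SHIFT)

/* The length field must sit above the object type bits (0 to 4). */
static_assert(LEN_SHIFT >= 5, "length bits must not overlap the type bits");

/* Number of bits needed to represent n (0 for n == 0). */
static unsigned bits_needed(unsigned n) {
    unsigned bits = 0;
    while (n > 0) {
        bits++;
        n >>= 1;
    }
    return bits;
}

/* Store len in the length field, leaving all other flag bits unchanged. */
static value_t set_embed_len(value_t flags, unsigned len) {
    return (flags & ~LEN_MASK) | (((value_t)len << LEN_SHIFT) & LEN_MASK);
}

/* Read the length field back out of flags. */
static unsigned get_embed_len(value_t flags) {
    return (unsigned)((flags & LEN_MASK) >> LEN_SHIFT);
}
```

`bits_needed((640 - 16) / 8)` evaluates to 7, matching the reasoning above, and a round trip through `set_embed_len` and `get_embed_len` leaves the object type bits intact.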
