How to Convert Load and Store Instructions to Machine Code in ARM

This post walks through converting ARM load and store instructions into machine code. ARM assembly is a low-level language that maps directly onto the ARM architecture, and knowing how its instructions are encoded is useful for anyone working with ARM processors, especially in embedded systems or performance-critical applications. Load and store instructions transfer data between registers and memory. Encoding them requires understanding the instruction format, its bit fields, and how each addressing mode is represented.
Instruction Format Overview
ARM load/store instructions follow this general bit-field layout (32-bit):
Field | Bits | Description |
---|---|---|
cond | 31–28 | Condition: Determines if the instruction will execute. |
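op | 27–26 | Opcode bits: Identify the instruction type (01 for load/store). |
I | 25 | Immediate bit: 0 if the offset is a 12-bit immediate, 1 if it is a (possibly shifted) register. This is the opposite of the I bit in data-processing instructions. |
p | 24 | Pre/Post indexing: 1 for offset or pre-indexed addressing, 0 for post-indexed addressing. |
u | 23 | Up/Down bit: 1 to add the offset to the base, 0 to subtract it. |
b | 22 | Byte/Word: 1 for byte access (LDRB/STRB), 0 for word access. |
w | 21 | Write-back: 1 if the base register is updated in pre-indexed addressing (the ! suffix). Post-indexed addressing always updates the base and leaves this bit at 0. |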
L | 20 | Load/Store: 1 for load (LDR), 0 for store (STR). |
Rn | 19–16 | Base register. |
Rd | 15–12 | Destination (for LDR) or source (for STR) register. |
offset | 11–0 | The offset, which can be an immediate value or a register offset (possibly with shift). |
Summary of the fields:
31–28 | 27–26 | 25 | 24 | 23 | 22 | 21 | 20 | 19–16 | 15–12 | 11–0 |
---|---|---|---|---|---|---|---|---|---|---|
cond. | op | I | p | u | b | w | L | Rn | Rd | offset |
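
The layout above maps directly onto shifts and ORs. The following is a minimal Python sketch rather than an official tool; the function name `pack_load_store` and its parameter names are made up for illustration. It assumes the A32 encoding for LDR/STR/LDRB/STRB, in which bit 25 is 0 for an immediate offset and 1 for a register offset.

```python
def pack_load_store(cond, i, p, u, b, w, l, rn, rd, offset):
    """Pack the load/store bit fields into one 32-bit word (illustrative helper)."""
    # Reject values that do not fit in their fields.
    assert 0 <= cond < 16 and 0 <= rn < 16 and 0 <= rd < 16
    assert all(bit in (0, 1) for bit in (i, p, u, b, w, l))
    assert 0 <= offset < 4096
    return (
        (cond << 28)
        | (0b01 << 26)   # op: 01 marks a load/store instruction
        | (i << 25)      # 0 = immediate offset, 1 = register offset
        | (p << 24)
        | (u << 23)
        | (b << 22)
        | (w << 21)
        | (l << 20)
        | (rn << 16)
        | (rd << 12)
        | offset
    )


# LDR R2, [R0, #4]: always execute, immediate offset, pre-indexed,
# add the offset, word access, no write-back, load.
word = pack_load_store(0b1110, 0, 1, 1, 0, 0, 1, rn=0, rd=2, offset=4)
print(f"0x{word:08X}")
```

Each field is shifted to the position of its lowest bit, so the shift amounts are simply the right-hand bit numbers from the table.
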
Addressing Modes in ARM Load/Store Instructions
Addressing Mode | Assembly Mnemonic | Effective Address | Final Value in R1 |
---|---|---|---|
Offset (base unchanged) | LDR R0, [R1, #x] | R1 + x | R1 |
Pre-indexed (base updated) | LDR R0, [R1, #x]! | R1 + x | R1 + x |
Post-indexed (base updated) | LDR R0, [R1], #x | R1 | R1 + x |
Post-indexed addressing always updates the base register, so it takes no ! suffix.
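
ARM load/store instructions have three addressing forms: plain offset, pre-indexed with write-back, and post-indexed. The sketch below models only the address arithmetic of each form, not a real memory access. The function name `resolve_address` and the mode labels are hypothetical names chosen for this example.

```python
def resolve_address(base, offset, mode):
    """Return (effective_address, final_base) for one addressing mode.

    `mode` is an illustrative label, not ARM syntax:
    "offset", "pre" (pre-indexed with write-back), or "post" (post-indexed).
    """
    if mode == "offset":   # LDR R0, [R1, #x]
        return base + offset, base
    if mode == "pre":      # LDR R0, [R1, #x]!
        return base + offset, base + offset
    if mode == "post":     # LDR R0, [R1], #x
        return base, base + offset
    raise ValueError(f"unknown mode: {mode}")


# Walk through all three forms with R1 = 0x1000 and x = 8.
for mode in ("offset", "pre", "post"):
    address, final_r1 = resolve_address(0x1000, 8, mode)
    print(f"{mode:6} address=0x{address:X} final R1=0x{final_r1:X}")
```
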
Example 1: Encoding LDR R2, [R0, #4]
This instruction loads a word from memory into R2. The address is calculated as the sum of R0 and an immediate offset of 4.
Breakdown:
- cond (31–28): 1110 (always execute)
- op (27–26): 01 (load/store, which distinguishes these from data-processing instructions)
- I (25): 0 (immediate offset)
- p (24): 1 (pre-indexed addressing)
- u (23): 1 (positive offset, since #4 is added to the base)
- b (22): 0 (word access)
- w (21): 0 (no write-back, since no exclamation mark is used)
- L (20): 1 (load instruction)
- Rn (19–16): base register R0 → 0000
- Rd (15–12): destination register R2 → 0010
- offset (11–0): immediate 4 → 000000000100
31–28 | 27–26 | 25 | 24 | 23 | 22 | 21 | 20 | 19–16 | 15–12 | 11–0 |
---|---|---|---|---|---|---|---|---|---|---|
cond. | op | I | p | u | b | w | L | R0 | R2 | offset |
1110 | 01 | 0 | 1 | 1 | 0 | 0 | 1 | 0000 | 0010 | 000000000100 |
Full 32-bit Binary:
1110 0101 1001 0000 0010 0000 0000 0100
This corresponds to the hexadecimal value:
0xE5902004
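
A quick way to check an encoding is to run it backwards and read the fields out of the word. The sketch below does that for 0xE5902004, the encoding of LDR R2, [R0, #4]. The helper name `decode_load_store` and the dictionary keys are assumptions made for this example.

```python
def decode_load_store(word):
    """Split a 32-bit load/store word into named fields (illustrative helper)."""
    return {
        "cond": (word >> 28) & 0xF,
        "op": (word >> 26) & 0x3,
        "I": (word >> 25) & 1,
        "P": (word >> 24) & 1,
        "U": (word >> 23) & 1,
        "B": (word >> 22) & 1,
        "W": (word >> 21) & 1,
        "L": (word >> 20) & 1,
        "Rn": (word >> 16) & 0xF,
        "Rd": (word >> 12) & 0xF,
        "offset": word & 0xFFF,
    }


fields = decode_load_store(0xE5902004)
for name, value in fields.items():
    print(f"{name:6} = {value:b}")
```

Every field is recovered by shifting its lowest bit down to position 0 and masking off the field width.
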
Example 2: Encoding LDRB R2, [R0, #4]!
The LDRB instruction loads a byte from memory. Compared to Example 1, the main differences are:
- The b (22) bit is set to 1 (byte access instead of word access).
- The w (21) bit is set to 1 (write-back enabled by the ! suffix).
- R0 is updated to R0 + 4, the same address the byte is loaded from.
The I (25) bit stays 0 because the offset is still an immediate.
31–28 | 27–26 | 25 | 24 | 23 | 22 | 21 | 20 | 19–16 | 15–12 | 11–0 |
---|---|---|---|---|---|---|---|---|---|---|
cond. | op | I | p | u | b | w | L | R0 | R2 | offset |
1110 | 01 | 0 | 1 | 1 | 1 | 1 | 1 | 0000 | 0010 | 000000000100 |
Example 3: STRB R2, [R0, #4]!
For a store byte instruction with write-back, the fields change as follows:
- L (20): 0 (store operation)
- b (22): 1 (byte access)
- w (21): 1 (write-back, indicated by the exclamation mark)
Other fields:
- I (25): 0 (immediate offset)
- Rn: R0 → 0000
- Rd: R2 → 0010
- offset: 4 → 000000000100
31–28 | 27–26 | 25 | 24 | 23 | 22 | 21 | 20 | 19–16 | 15–12 | 11–0 |
---|---|---|---|---|---|---|---|---|---|---|
cond. | op | I | p | u | b | w | L | R0 | R2 | offset |
1110 | 01 | 0 | 1 | 1 | 1 | 1 | 0 | 0000 | 0010 | 000000000100 |
Example 4: LDRB R2, [R0, #-4]
When the offset is negative (here, -4), the u bit (bit 23) is set to 0 to indicate subtraction from the base register. The offset field still holds the magnitude, 4, rather than a two's-complement value.
- u (23): 0 (negative offset)
- I (25): 0 (immediate offset)
- offset (11–0): 4 → 000000000100
31–28 | 27–26 | 25 | 24 | 23 | 22 | 21 | 20 | 19–16 | 15–12 | 11–0 |
---|---|---|---|---|---|---|---|---|---|---|
cond. | op | I | p | u | b | w | L | R0 | R2 | offset |
1110 | 01 | 0 | 1 | 0 | 1 | 0 | 1 | 0000 | 0010 | 000000000100 |
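
The immediate-offset examples differ only in the u, b, w and L bits, so they can be generated by one small function. This is a sketch under stated assumptions: `encode_immediate` and its parameters are hypothetical names, it covers only the `[Rn, #imm]` and `[Rn, #imm]!` forms (p = 1), and it uses I = 0 for an immediate offset as in the A32 encoding.

```python
def encode_immediate(load, byte, rd, rn, imm, writeback=False, cond=0b1110):
    """Encode LDR/STR/LDRB/STRB Rd, [Rn, #imm] with optional '!' (illustrative helper)."""
    magnitude = abs(imm)
    if magnitude > 0xFFF:
        raise ValueError("immediate offset must fit in 12 bits")
    u = 1 if imm >= 0 else 0          # U selects add or subtract
    return (
        (cond << 28)
        | (0b01 << 26)                # op: load/store
        | (0 << 25)                   # I = 0: immediate offset
        | (1 << 24)                   # P = 1: offset applied before the access
        | (u << 23)
        | (int(byte) << 22)
        | (int(writeback) << 21)
        | (int(load) << 20)
        | (rn << 16)
        | (rd << 12)
        | magnitude                   # the field stores |imm|, not two's complement
    )


examples = {
    "LDRB R2, [R0, #4]!": encode_immediate(True, True, 2, 0, 4, writeback=True),
    "STRB R2, [R0, #4]!": encode_immediate(False, True, 2, 0, 4, writeback=True),
    "LDRB R2, [R0, #-4]": encode_immediate(True, True, 2, 0, -4),
}
for text, word in examples.items():
    print(f"{text:20} 0x{word:08X}")
```

Keeping the sign in the u bit and the magnitude in the offset field is why #4 and #-4 produce the same low 12 bits.
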
Example 5: LDR R2, [R0, R1, LSL #3]!
This example shows using a register offset with a shift. Here, the offset is not an immediate value but a shifted register.
Breakdown of the Offset Field:
For register-offset addressing, I (25) is set to 1 and the 12-bit offset field is divided into subfields instead of holding a plain immediate:
- Shift amount (bits 11–7): a 5-bit immediate. For #3, that is 00011.
- Shift type (bits 6–5): for LSL, this is 00.
- Bit 4: always 0 for load/store instructions. The shift amount is always an immediate here; register-specified shift amounts are available only in data-processing instructions.
- Rm (bits 3–0): the offset register R1 → 0001.
The remaining fields are filled in as before:
- Rn: base register R0 → 0000
- Rd: destination register R2 → 0010
- w: the instruction ends with an exclamation mark (!), so write-back is set to 1.
Putting it all together, with write-back (w = 1) and a load (L = 1), the fields become:
- cond (31–28): 1110 (always execute)
- op (27–26): 01 (load/store)
- I (25): 1 (the offset is specified by a register)
- p (24): 1 (pre-indexing)
- u (23): 1 (positive offset)
- b (22): 0 (word access, not byte)
- w (21): 1 (write-back due to the exclamation mark)
- L (20): 1 (load instruction)
- Rn (19–16): R0 → 0000
- Rd (15–12): R2 → 0010
- offset (11–0): the shifted register offset, made up of 00011 (shift amount 3), 00 (LSL), 0 (bit 4) and 0001 (Rm = R1) → 000110000001
More details on shifting operations: Instruction Encoding with Shift Operations
LDR R2, [R0, R1, LSL #3]!
31–28 | 27–26 | 25 | 24 | 23 | 22 | 21 | 20 | 19–16 | 15–12 | 11–0 |
---|---|---|---|---|---|---|---|---|---|---|
cond. | op | I | p | u | b | w | L | R0 | R2 | offset |
1110 | 01 | 1 | 1 | 1 | 0 | 1 | 1 | 0000 | 0010 | 000110000001 |
This corresponds to the hexadecimal value 0xE7B02181.
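
The shifted-register offset can be assembled the same way as the outer fields. The following sketch builds the 12-bit offset field and then the full word for LDR R2, [R0, R1, LSL #3]!. The names `SHIFT_TYPES` and `pack_register_offset` are hypothetical, and the sketch deliberately rejects the special encodings (LSR #32, ASR #32, RRX) that reuse a shift amount of zero.

```python
# Two-bit shift type codes used in bits 6-5 of the offset field.
SHIFT_TYPES = {"LSL": 0b00, "LSR": 0b01, "ASR": 0b10, "ROR": 0b11}


def pack_register_offset(rm, shift_type="LSL", shift_amount=0):
    """Build the 12-bit offset field for a shifted register offset (illustrative helper)."""
    if not 0 <= shift_amount < 32:
        raise ValueError("shift amount must fit in 5 bits")
    if shift_type != "LSL" and shift_amount == 0:
        raise ValueError("zero shift on LSR/ASR/ROR has a special meaning not covered here")
    return (
        (shift_amount << 7)               # bits 11-7: immediate shift amount
        | (SHIFT_TYPES[shift_type] << 5)  # bits 6-5: shift type
        | (0 << 4)                        # bit 4: always 0 for load/store
        | rm                              # bits 3-0: offset register
    )


offset = pack_register_offset(rm=1, shift_type="LSL", shift_amount=3)

# LDR R2, [R0, R1, LSL #3]!
word = (
    (0b1110 << 28) | (0b01 << 26)
    | (1 << 25)    # I = 1: register offset
    | (1 << 24)    # P = 1: pre-indexed
    | (1 << 23)    # U = 1: add the offset
    | (0 << 22)    # B = 0: word access
    | (1 << 21)    # W = 1: write-back
    | (1 << 20)    # L = 1: load
    | (0 << 16)    # Rn = R0
    | (2 << 12)    # Rd = R2
    | offset
)
print(f"offset={offset:012b} word=0x{word:08X}")
```
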
Conclusion
In this post, we’ve covered the ARM load/store instruction format and demonstrated how different instructions are encoded:
- LDR R2, [R0, #4] uses an immediate positive offset (I = 0) with no write-back.
- LDRB R2, [R0, #4]! changes the byte/word bit (b) to 1 and enables write-back.
- STRB R2, [R0, #4]! sets write-back (w) and clears the load/store bit (L) to 0.
- LDRB R2, [R0, #-4] uses a negative offset by setting u to 0.
- LDR R2, [R0, R1, LSL #3]! uses a shifted register offset (I = 1), combining a shift amount, a shift type and write-back.
Understanding these bit fields and how they are arranged in the 32-bit machine code can help you better grasp how ARM processors access memory and manage data transfers. Experiment with these examples to get a deeper insight into ARM’s low-level operations. 🚀