Bytes vs. Runes
26 January 2025 / 5 min read
Bytes and runes can be confusing concepts for someone starting out with Go, especially when coming from a programming language that doesn't have such aliases.
You already encountered them when learning about Numbers and Strings in Go, but let's take a deeper look at the differences:
Byte
Definition
- A byte in Go is an alias for the uint8 type, which is an unsigned 8-bit integer.
- It represents a single byte of data, which can store values from 0 to 255.
- The byte type is commonly used to handle raw binary data, ASCII characters, or any data that fits within 8 bits.
Declaration
var b byte = 65 // 65 is the ASCII code for 'A'
Underlying Representation
- A byte is stored as a single byte (8 bits) in memory.
- It is essentially a number, but it can also represent a character in the ASCII table.
Usage
- ASCII Characters: Since ASCII characters are represented by 7 bits, a byte is sufficient to store any ASCII character.
- Binary Data: byte is often used to read or write binary data, such as files or network streams.
data := []byte{72, 101, 108, 108, 111} // Represents "Hello" in ASCII
- Slices of Bytes: The []byte type is a slice of bytes, commonly used for string manipulation or I/O operations.
str := "Hello"
bytes := []byte(str) // Converts string to a slice of bytes
Key Points
- A byte is 8 bits wide.
- It can represent ASCII characters or raw binary data.
- It is an alias for uint8.
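To make the "a byte is a number that can also stand for an ASCII character" point concrete, here is a minimal sketch. The helper name toUpperASCII is hypothetical and only for illustration; it relies on the fact that ASCII lower-case and upper-case letters are 32 apart.

```go
package main

import "fmt"

// toUpperASCII is a hypothetical helper: it upper-cases a single ASCII letter
// by treating the byte as a number and subtracting the 'a'-'A' offset (32).
func toUpperASCII(b byte) byte {
	if b >= 'a' && b <= 'z' {
		return b - ('a' - 'A')
	}
	return b // anything that is not a lower-case ASCII letter is returned unchanged
}

func main() {
	var b byte = 'a'
	fmt.Println(b)                      // 97, the numeric view
	fmt.Printf("%c\n", toUpperASCII(b)) // A, the character view
}
```

This only works for ASCII; for general text the standard unicode and strings packages are the usual choice.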
Rune
Definition
- A rune in Go is an alias for the int32 type, which is a 32-bit signed integer.
- It represents a Unicode code point, which is a unique number assigned to each character in the Unicode standard.
- A rune can represent any character from any language, symbol, or emoji in the Unicode standard.
Declaration
var r rune = 'A' // 'A' is a Unicode code point with value 65
var r2 rune = '世' // '世' is a Unicode code point with value 19990
Underlying Representation
- A rune is stored as a 32-bit integer (4 bytes) in memory.
- It can represent any Unicode code point, which ranges from 0 to 0x10FFFF (1,114,111 in decimal).
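Because rune is just an int32, a rune variable can technically hold numbers that are not Unicode code points at all. The sketch below uses the standard unicode/utf8 package to check this; isCodePoint is a hypothetical wrapper name introduced only for this example.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// isCodePoint is a hypothetical wrapper around utf8.ValidRune, which reports
// whether r can be legally encoded as UTF-8 (in range and not a surrogate half).
func isCodePoint(r rune) bool {
	return utf8.ValidRune(r)
}

func main() {
	fmt.Println(utf8.MaxRune == 0x10FFFF) // true: the largest valid code point
	fmt.Println(isCodePoint('世'))         // true
	fmt.Println(isCodePoint(0x110000))    // false: one past the Unicode range
	fmt.Println(isCodePoint(-1))          // false: fits in int32, but is not a code point
}
```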
Usage
- Unicode Characters: rune is used to handle characters beyond the ASCII set, such as non-Latin scripts, symbols, and emojis.
var r rune = '😊' // '😊' is a Unicode code point with value 128522
If you try to do this with byte:
var b byte = '😊'
fmt.Println(b)
You get this error:
cannot use '😊' (untyped rune constant 128522) as byte value in variable declaration (overflows)
You'd need multiple bytes to represent what can be represented with just one rune.
- String Iteration: When you iterate over a string with for ... range, Go decodes it as a sequence of rune values, not byte values. This ensures proper handling of multi-byte Unicode characters. Indexing with str[i], by contrast, still yields individual bytes.
str := "Hello, 世界"
for _, r := range str {
	fmt.Printf("%c ", r) // Prints each character, including multi-byte ones
}
- Slices of Runes: The []rune type is a slice of runes, which can be used for advanced string manipulation.
str := "Hello, 世界"
runes := []rune(str) // Converts string to a slice of runes
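One detail of range iteration is easy to miss: the index it yields is the byte offset where each rune starts, not a character count. A small sketch, using a hypothetical helper named runeOffsets, makes the jump visible once the multi-byte characters begin.

```go
package main

import "fmt"

// runeOffsets is a hypothetical helper that records the byte index at which
// each rune of s starts, using the index that for ... range provides.
func runeOffsets(s string) []int {
	var offsets []int
	for i := range s {
		offsets = append(offsets, i)
	}
	return offsets
}

func main() {
	// The ASCII part advances one byte at a time; '世' starts at 7 and,
	// being 3 bytes long in UTF-8, pushes '界' to offset 10.
	fmt.Println(runeOffsets("Hello, 世界")) // [0 1 2 3 4 5 6 7 10]
}
```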
Key Points
- A rune is 32 bits wide.
- It represents a Unicode code point.
- It is an alias for int32.
Differences Between Byte and Rune
Feature | Byte (byte) | Rune (rune) |
---|---|---|
Type Alias | Alias for uint8 | Alias for int32 |
Size | 8 bits (1 byte) | 32 bits (4 bytes) |
Range | 0 to 255 | Valid code points: 0 to 0x10FFFF (Unicode range) |
Purpose | Represents ASCII characters or raw data | Represents Unicode code points |
Character Handling | One byte holds a whole character only for ASCII | One rune holds any Unicode character |
Memory Usage | 1 byte per value | 4 bytes per value, even for ASCII |
String Iteration | Indexing (str[i]) yields bytes | for ... range yields runes |
Common Use Cases | Binary data, ASCII strings | Multi-language text, emojis, symbols |
Practical implications
String Representation
In Go, a string is essentially a read-only slice of bytes ([]byte), conventionally holding UTF-8 encoded text. However, when you iterate over a string with for ... range, Go decodes those bytes into a sequence of rune values so that multi-byte Unicode characters are handled correctly.
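The "read-only" part has a practical consequence: you cannot assign to str[i], so changing a byte means copying into a []byte first. The following is a minimal sketch; withFirstByte is a hypothetical helper, and it assumes the first character is a single-byte (ASCII) one.

```go
package main

import "fmt"

// withFirstByte is a hypothetical helper that returns a copy of s whose first
// byte is replaced by b. It assumes the first character of s is ASCII.
func withFirstByte(s string, b byte) string {
	if len(s) == 0 {
		return s
	}
	buf := []byte(s) // the conversion copies the string's bytes
	buf[0] = b
	return string(buf)
}

func main() {
	s := "Hello"
	// s[0] = 'J' // would not compile: strings are read-only
	fmt.Println(s[0])                  // 72: indexing yields a byte, not a rune
	fmt.Println(withFirstByte(s, 'J')) // Jello
	fmt.Println(s)                     // Hello: the original is unchanged
}
```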
Conversion Between Byte and Rune
You can convert between byte and rune, but be cautious:
- Converting a rune to a byte may result in data loss if the rune value exceeds 255.
- Converting a byte to a rune is safe since all byte values fit within the rune range.
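The data loss mentioned above is a silent truncation to the low 8 bits, which the sketch below demonstrates. The helper runeToByte is hypothetical; it simply refuses the conversion when the value would not fit.

```go
package main

import "fmt"

// runeToByte is a hypothetical checked conversion: it reports false instead
// of silently truncating when r does not fit in a byte.
func runeToByte(r rune) (byte, bool) {
	if r < 0 || r > 255 {
		return 0, false
	}
	return byte(r), true
}

func main() {
	r := '世' // 19990, which is 0x4E16

	// Converting a rune variable keeps only the low 8 bits: 0x16 == 22.
	fmt.Println(byte(r)) // 22

	b, ok := runeToByte(r)
	fmt.Println(b, ok) // 0 false

	// The other direction always fits.
	fmt.Println(rune(byte('A'))) // 65
}
```

Note that the truncation only happens silently for variables; converting an out-of-range constant, as in byte('世'), is rejected at compile time.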
Example: Byte vs Rune in String Iteration
str := "Hello, 世界"
// Iterating as bytes (may break multi-byte characters)
for i := 0; i < len(str); i++ {
	fmt.Printf("%c ", str[i]) // May print garbage for multi-byte characters
}
// Iterating as runes (correctly handles Unicode)
for _, r := range str {
	fmt.Printf("%c ", r) // Prints each character correctly
}
You'd need multiple bytes to represent what can be represented with just one rune.
Here's why:
- A rune in Go represents a Unicode code point, which can be any character in the Unicode standard (including multi-byte characters like emojis or non-Latin scripts).
- A byte is only 8 bits wide and can represent a maximum of 256 values (0 to 255), which is sufficient for ASCII characters but not for most Unicode characters.
- Many Unicode characters (e.g., “世”, “😊”) require more than one byte to be represented in UTF-8, the encoding Go uses for source code and string literals. For example:
  - The character “世” is represented by 3 bytes in UTF-8.
  - The emoji “😊” is represented by 4 bytes in UTF-8.
Thus, a single rune can hold any Unicode character, even one that requires multiple bytes in its UTF-8 representation.
Example:
str := "😊"
fmt.Println(len(str)) // Output: 4 (bytes)
fmt.Println(len([]rune(str))) // Output: 1 (rune)
This demonstrates that the emoji “😊” is represented by 4 bytes but only 1 rune.
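To see which bytes those actually are, you can encode a rune yourself with the standard unicode/utf8 package. This is a minimal sketch; utf8Bytes is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// utf8Bytes is a hypothetical helper that returns the UTF-8 encoding of r.
func utf8Bytes(r rune) []byte {
	buf := make([]byte, utf8.UTFMax) // UTFMax is 4, the longest possible encoding
	n := utf8.EncodeRune(buf, r)
	return buf[:n]
}

func main() {
	fmt.Printf("% X\n", utf8Bytes('A'))  // 41
	fmt.Printf("% X\n", utf8Bytes('世')) // E4 B8 96
	fmt.Printf("% X\n", utf8Bytes('😊')) // F0 9F 98 8A

	// Decoding the bytes gives the original rune back, plus its width in bytes.
	r, size := utf8.DecodeRune(utf8Bytes('😊'))
	fmt.Println(r == '😊', size) // true 4
}
```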
When to use byte & rune vs. uint8 & int32
The choice between using byte and rune versus uint8 and int32 in Go depends on the semantic meaning you want to convey in your code. While byte and rune are aliases for uint8 and int32, respectively, they are used in different contexts to make your code more readable and expressive.
Context | Use byte or rune | Use uint8 or int32 |
---|---|---|
Text/Character Handling | Use byte for ASCII, rune for Unicode | Not appropriate |
Binary Data | Use byte for raw binary data | Not appropriate |
Numeric Data | Not appropriate | Use uint8 or int32 for pure numbers |
Semantic Clarity | Use byte/rune for text/binary contexts | Use uint8/int32 for numeric contexts |
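Since these are true aliases, the compiler treats each pair as one type, so the choice really is only about what the name tells the reader. The sketch below illustrates this; the function sum and the variable names are hypothetical.

```go
package main

import "fmt"

// sum is a hypothetical checksum-style helper written in terms of uint8,
// because here the values are treated as plain numbers.
func sum(data []uint8) uint8 {
	var total uint8
	for _, v := range data {
		total += v // unsigned arithmetic wraps around at 256
	}
	return total
}

func main() {
	payload := []byte("Hi") // named byte because it holds raw text data
	// No conversion is needed: []byte and []uint8 are the same type.
	fmt.Println(sum(payload)) // 177 (72 + 105)

	var r rune = 'A'
	var n int32 = r // also no conversion: rune and int32 are the same type
	fmt.Println(n)  // 65
}
```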