Bytes vs. Runes

Bytes and runes can be confusing when you are starting out with Go, especially if you come from a programming language that has no such type aliases.

You already encountered them when learning about Numbers and Strings in Go, but let’s have a deeper look and see the differences:

Byte

Definition

A byte in Go is an alias for the uint8 type, which is an unsigned 8-bit integer.
It represents a single byte of data, which can store values from 0 to 255.
The byte type is commonly used to handle raw binary data, ASCII characters, or any data that fits within 8 bits.

Declaration

var b byte = 65 // 65 is the ASCII code for 'A'

Underlying Representation

A byte is stored as a single byte (8 bits) in memory.
It is essentially a number, but it can also represent a character in the ASCII table.

Usage

ASCII Characters: Since ASCII characters are represented by 7 bits, a byte is sufficient to store any ASCII character.
Binary Data: byte is often used to read or write binary data, such as files or network streams.
```
data := []byte{72, 101, 108, 108, 111} // Represents "Hello" in ASCII
```
Slices of Bytes: The []byte type is a slice of bytes, commonly used for string manipulation or I/O operations.
```
str := "Hello"
bytes := []byte(str) // Converts string to a slice of bytes
```
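As a minimal sketch of what the conversion implies: []byte(str) produces a copy of the string's bytes, so changing the slice leaves the original string untouched. The helper name replaceFirst is hypothetical and only serves the illustration.

```go
package main

import "fmt"

// replaceFirst is a hypothetical helper: it swaps the first byte of s for c
// by working on a byte-slice copy, because strings themselves are immutable.
func replaceFirst(s string, c byte) string {
	if len(s) == 0 {
		return s
	}
	b := []byte(s) // copies the bytes of s
	b[0] = c       // modifies only the copy
	return string(b)
}

func main() {
	s := "Hello"
	t := replaceFirst(s, 'J')
	fmt.Println(s, t) // Hello Jello
}
```

This is only safe when the first character is a single-byte (ASCII) character; for arbitrary text, a []rune conversion would be the more careful choice.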

Key Points

A byte is 8 bits wide
It can represent ASCII characters or raw binary data
It is an alias for uint8
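The points above can be sketched in a few lines. The example assumes nothing beyond the standard library; isASCII is a hypothetical helper that relies on ASCII fitting into 7 bits, so every ASCII byte is below 0x80.

```go
package main

import "fmt"

// isASCII is a hypothetical helper: it reports whether every byte of s
// is below 0x80, i.e. fits into the 7-bit ASCII range.
func isASCII(s string) bool {
	for i := 0; i < len(s); i++ {
		if s[i] >= 0x80 { // s[i] is a byte
			return false
		}
	}
	return true
}

func main() {
	var b byte = 65
	var u uint8 = b // no conversion needed: byte and uint8 are the same type

	fmt.Println(b, u)           // 65 65
	fmt.Printf("%c %T\n", b, b) // A uint8

	fmt.Println(isASCII("Hello")) // true
	fmt.Println(isASCII("世界"))    // false
}
```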

Rune

Definition

A rune in Go is an alias for the int32 type, which is a 32-bit signed integer.
It represents a Unicode code point, which is a unique number assigned to each character in the Unicode standard.
A rune can represent any character from any language, symbol, or emoji in the Unicode standard.

Declaration

var r rune = 'A' // 'A' is a Unicode code point with value 65
var r2 rune = '世' // '世' is a Unicode code point with value 19990

Underlying Representation

A rune is stored as a 32-bit integer (4 bytes) in memory.
It can represent any Unicode code point, which ranges from 0 to 0x10FFFF (1,114,111 in decimal).
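The following sketch illustrates the representation. Note that the underlying int32 can hold values that are not valid code points, so the example uses utf8.ValidRune from the standard library to check; isCodePoint is a hypothetical wrapper added only for this illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
	"unsafe"
)

// isCodePoint is a hypothetical wrapper around utf8.ValidRune, which reports
// whether r can be legally encoded as UTF-8 (in range and not a surrogate).
func isCodePoint(r rune) bool {
	return utf8.ValidRune(r)
}

func main() {
	var r rune = '世'

	fmt.Printf("%T %d %U %c\n", r, r, r, r) // int32 19990 U+4E16 世
	fmt.Println(unsafe.Sizeof(r))           // 4 (bytes)

	fmt.Println(isCodePoint(0x10FFFF)) // true: highest valid code point
	fmt.Println(isCodePoint(0x110000)) // false: beyond the Unicode range
	fmt.Println(isCodePoint(-1))       // false: negative values fit in int32 but are not code points
}
```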

Usage

Unicode Characters: rune is used to handle characters beyond the ASCII set, such as non-Latin scripts, symbols, and emojis.
```
var r rune = '😊' // '😊' is a Unicode code point with value 128522
```
If you try to do this with byte:
```
var b byte = '😊'
fmt.Println(b)
```
You get this error:
```
cannot use '😊' (untyped rune constant 128522) as byte value in variable declaration (overflows)
```
You’d need multiple bytes to represent what can be represented with just one rune.
String Iteration: When you iterate over a string with for ... range, Go decodes the UTF-8 bytes and yields one rune per iteration, not one byte. This ensures proper handling of multi-byte Unicode characters.
```
str := "Hello, 世界"
for _, r := range str {
    fmt.Printf("%c ", r) // Prints each character, including multi-byte ones
}
```
Slices of Runes: The []rune type is a slice of runes, which can be used for advanced string manipulation.
```
str := "Hello, 世界"
runes := []rune(str) // Converts string to a slice of runes
```
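One example of such manipulation is reversing a string. The sketch below uses a hypothetical helper, reverseRunes, and assumes that one rune corresponds to one visible character; text with combining marks would need more care.

```go
package main

import "fmt"

// reverseRunes is a hypothetical helper: it reverses s code point by code
// point, so multi-byte characters stay intact.
func reverseRunes(s string) string {
	r := []rune(s)
	for i, j := 0, len(r)-1; i < j; i, j = i+1, j-1 {
		r[i], r[j] = r[j], r[i]
	}
	return string(r)
}

func main() {
	fmt.Println(reverseRunes("Hello, 世界")) // 界世 ,olleH
}
```

Reversing the []byte form instead would scramble the bytes of 世 and 界 and produce invalid UTF-8.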

Key Points

A rune is 32 bits wide.
It represents a Unicode code point.
It is an alias for int32

Differences Between Byte and Rune

| Feature | Byte (`byte`) | Rune (`rune`) |
| --- | --- | --- |
| Type Alias | Alias for `uint8` | Alias for `int32` |
| Size | 8 bits (1 byte) | 32 bits (4 bytes) |
| Range | `0` to `255` | `0` to `0x10FFFF` for valid Unicode code points (the underlying `int32` is wider) |
| Purpose | Represents ASCII characters or raw data | Represents Unicode code points |
| Character Handling | Limited to ASCII | Supports all Unicode characters |
| Memory Usage | More memory-efficient for ASCII | Requires more memory for Unicode |
| String Iteration | Indexing (`str[i]`) yields bytes | `for ... range` yields runes |
| Common Use Cases | Binary data, ASCII strings | Multi-language text, emojis, symbols |

Practical implications

String Representation

In Go, a string is essentially a read-only sequence of bytes, much like an immutable []byte. Indexing a string (str[i]) returns a single byte. However, when you iterate over a string with for ... range, Go decodes the bytes as UTF-8 and yields rune values, so multi-byte Unicode characters are handled correctly.
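A small sketch makes the two views visible. In a range loop the index is the byte offset at which each rune starts, which is why it jumps past multi-byte characters. The helper runeOffsets is hypothetical and exists only to expose those offsets.

```go
package main

import "fmt"

// runeOffsets is a hypothetical helper: it returns the byte offset at which
// each rune of s begins, as reported by a range loop.
func runeOffsets(s string) []int {
	var offs []int
	for i := range s {
		offs = append(offs, i)
	}
	return offs
}

func main() {
	s := "a世b"

	for i, r := range s {
		fmt.Printf("%d:%c ", i, r) // 0:a 1:世 4:b
	}
	fmt.Println()

	fmt.Println(len(s), s[1])   // 5 228 (length in bytes, and the first byte of 世)
	fmt.Println(runeOffsets(s)) // [0 1 4]
}
```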

Conversion Between Byte and Rune

You can convert between byte and rune, but be cautious:

Converting a rune to a byte keeps only the low 8 bits, so data is lost if the rune value exceeds 255.
Converting a byte to a rune is safe since all byte values fit within the rune range.
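Both directions can be sketched as follows. The helper toByte is hypothetical; it shows one way to guard the lossy direction by refusing values that do not fit into a byte.

```go
package main

import "fmt"

// toByte is a hypothetical helper: it converts r to a byte only when the
// value fits, and reports whether the conversion was lossless.
func toByte(r rune) (byte, bool) {
	if r < 0 || r > 255 {
		return 0, false
	}
	return byte(r), true
}

func main() {
	r := '世'       // rune with value 19990 (0x4E16)
	fmt.Println(byte(r)) // 22: only the low 8 bits (0x16) survive

	var b byte = 'A'
	fmt.Println(rune(b)) // 65: byte to rune never loses information

	_, ok := toByte(r)
	fmt.Println(ok) // false
}
```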

Example: Byte vs Rune in String Iteration

str := "Hello, 世界"

// Iterating as bytes (may break multi-byte characters)
for i := 0; i < len(str); i++ {
    fmt.Printf("%c ", str[i]) // May print garbage for multi-byte characters
}

// Iterating as runes (correctly handles Unicode)
for _, r := range str {
    fmt.Printf("%c ", r) // Prints each character correctly
}

You’d need multiple bytes to represent what can be represented with just one rune.

Here’s why:

A rune in Go represents a Unicode code point, which can be any character in the Unicode standard (including multi-byte characters like emojis or non-Latin scripts).
A byte is only 8 bits wide and can represent a maximum of 256 values (0–255), which is sufficient for ASCII characters but not for most Unicode characters.
Many Unicode characters (e.g., ‘世’, ‘😊’) require more than one byte to be represented in UTF-8 encoding (the default encoding for strings in Go). For example:
- The character ‘世’ is represented by 3 bytes in UTF-8.
- The emoji ‘😊’ is represented by 4 bytes in UTF-8.

Thus, a single rune can hold any Unicode character, even when that character takes several bytes in its UTF-8 representation.

Example:

str := "😊"
fmt.Println(len(str))         // Output: 4 (bytes)
fmt.Println(len([]rune(str))) // Output: 1 (rune)

This demonstrates that the emoji ‘😊’ is represented by 4 bytes but only 1 rune.
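The same counts can be obtained without allocating a []rune by using the standard unicode/utf8 package. In the sketch below, byteAndRuneCounts is a hypothetical helper; utf8.RuneCountInString and utf8.RuneLen are standard library functions.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteAndRuneCounts is a hypothetical helper: it returns the length of s in
// bytes and in runes.
func byteAndRuneCounts(s string) (int, int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	// utf8.RuneLen reports how many bytes a rune needs in UTF-8.
	for _, r := range "A世😊" {
		fmt.Printf("%c needs %d byte(s)\n", r, utf8.RuneLen(r)) // 1, 3 and 4
	}

	b, r := byteAndRuneCounts("Hello, 世界")
	fmt.Println(b, r) // 13 9
}
```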

When to use byte & rune vs. uint8 & int32

The choice between using byte and rune versus uint8 and int32 in Go depends on the semantic meaning you want to convey in your code. While byte and rune are aliases for uint8 and int32, respectively, they are used in different contexts to make your code more readable and expressive.

| Context | Use `byte` or `rune` | Use `uint8` or `int32` |
| --- | --- | --- |
| Text/Character Handling | Use `byte` for ASCII, `rune` for Unicode | Compiles, but obscures intent |
| Binary Data | Use `byte` for raw binary data | Compiles, but obscures intent |
| Numeric Data | Compiles, but obscures intent | Use `uint8` or `int32` for pure numbers |
| Semantic Clarity | Use `byte`/`rune` for text/binary contexts | Use `uint8`/`int32` for numeric contexts |
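As a rough illustration of this convention, the sketch below picks the type name by what the value means. Both functions are hypothetical examples; the compiler treats byte and uint8 (and rune and int32) as identical types, so the choice only affects readability.

```go
package main

import "fmt"

// checksum is a hypothetical example: the input is raw data, so it is typed
// []byte; the result is a small number, so it is typed uint8.
func checksum(data []byte) uint8 {
	var sum uint8
	for _, b := range data {
		sum += b // wraps around modulo 256
	}
	return sum
}

// isVowel is a hypothetical example: the parameter is a character, so it is
// typed rune rather than int32.
func isVowel(r rune) bool {
	switch r {
	case 'a', 'e', 'i', 'o', 'u':
		return true
	}
	return false
}

func main() {
	fmt.Println(checksum([]byte{1, 2, 3})) // 6
	fmt.Println(isVowel('e'), isVowel('世')) // true false

	var n int32 = 'e'       // legal: rune and int32 are the same type
	fmt.Println(isVowel(n)) // true
}
```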