Go Diary: Bought "Learning Go" and started reading. Copying slices; Strings and Runes and Bytes

April 30, 2021 11:32

These are my study notes from reading "Learning Go".

Copying slices

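When several references share the same underlying slice and overwriting a value through one of them could cause confusion, you can use copy to create a slice that is independent of the original.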

x := []int{1, 2, 3}
y := make([]int, len(x))
n := copy(y, x)

copy returns the number of elements copied, but if you don't need that value,

copy(y, x)

is all you need.
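To see the independence for myself, here is a minimal sketch. The helper name cloneInts is made up for this note and is not from the book. Also note that copy copies only as many elements as the shorter of the two slices holds, which is why the destination is created with len(x).

```go
package main

import "fmt"

// cloneInts returns a copy of src that is backed by its own array.
// (cloneInts is a hypothetical helper name used only for this sketch.)
func cloneInts(src []int) []int {
	dst := make([]int, len(src))
	copy(dst, src) // the returned element count is ignored here
	return dst
}

func main() {
	x := []int{1, 2, 3}
	shared := x[:]              // shares the backing array with x
	independent := cloneInts(x) // has its own backing array

	x[0] = 100

	fmt.Println(shared[0])      // 100: the write through x is visible
	fmt.Println(independent[0]) // 1: the copy is unaffected
}
```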

Strings and Runes and Bytes

You might think Go represents a string as a collection of runes, but it doesn't. Go represents a string as an arbitrary sequence of bytes.

These bytes don’t have to be in any particular character encoding, but several Go library functions (and the for-range loop that we discuss in the next chapter) assume that a string is composed of a sequence of UTF-8-encoded code points.

Bodner, Jon. Learning Go (pp.79-80). O'Reilly Media. Kindle Edition.

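These bytes are not tied to any particular character encoding. However, several Go library functions, as well as the for-range loop, assume that a string is UTF-8.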
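A small sketch of what "arbitrary bytes" means in practice. The helper runesOf is a name I made up. The string "G\xff" is a perfectly legal Go string even though it is not valid UTF-8, and a for-range loop reports the bad byte as the replacement character U+FFFD (utf8.RuneError).

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runesOf collects the runes that a for-range loop yields for s.
// (runesOf is a hypothetical helper name used only for this sketch.)
func runesOf(s string) []rune {
	var rs []rune
	for _, r := range s {
		rs = append(rs, r)
	}
	return rs
}

func main() {
	valid := "Go"
	invalid := "G\xff" // the byte 0xff never appears in valid UTF-8

	fmt.Println(len(invalid))              // 2: a string holds any bytes
	fmt.Println(utf8.ValidString(valid))   // true
	fmt.Println(utf8.ValidString(invalid)) // false

	// for-range decodes UTF-8; the invalid byte comes back as U+FFFD.
	fmt.Println(runesOf(invalid)[1] == utf8.RuneError) // true
}
```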

Slice expression syntax also works on strings.

gore> hello := "Hello, Strings!"
"Hello, Strings!"
gore> helloSlice := hello[:]
"Hello, Strings!"

However, trying to rewrite part of a string, or to extend a string with append, results in an error.

gore> helloSlice[0] = "h"
cannot assign to helloSlice[0] (strings are immutable)
gore> helloSlice = append(helloSlice, " and Slices!")
first argument to append must be slice; have string
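For reference, one way to get the effect those two failed attempts were after is to build a new string instead of modifying the old one. This is only a sketch; replaceFirstByte is a hypothetical helper name, and it works on bytes, so it is only meaningful when the first character is a single-byte (ASCII) character.

```go
package main

import "fmt"

// replaceFirstByte returns a new string whose first byte is b.
// The original string is left untouched because []byte(s) copies the bytes.
// (replaceFirstByte is a hypothetical helper name used only for this sketch.)
func replaceFirstByte(s string, b byte) string {
	if len(s) == 0 {
		return s
	}
	buf := []byte(s)
	buf[0] = b
	return string(buf)
}

func main() {
	hello := "Hello, Strings!"

	changed := replaceFirstByte(hello, 'h')
	extended := hello + " and Slices!" // concatenation also creates a new string

	fmt.Println(changed)  // hello, Strings!
	fmt.Println(extended) // Hello, Strings! and Slices!
	fmt.Println(hello)    // Hello, Strings!
}
```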

Since strings are immutable, they don’t have the modification problems that slices of slices do. There is a different problem, though. A string is composed of a sequence of bytes, while a code point in UTF-8 can be anywhere from one to four bytes long.

Bodner, Jon. Learning Go (pp.80-81). O'Reilly Media. Kindle Edition.

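Because strings are immutable, they don't have the overwrite problems that slices have. But there is a different problem. A string is a sequence of bytes, whereas a single code point in UTF-8 can be anywhere from one to four bytes long.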

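If you only handle strings made of characters that each fit in a single byte, such as ASCII, slice syntax extracts exactly the characters you intended. In code that has to expect arbitrary UTF-8 characters, that no longer holds.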

gore> var ohayo string = "おはよう"
"おはよう"
gore> len(ohayo)
12 // not 4
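A sketch of what goes wrong, and one way around it. Slice indices on a string are byte offsets, so cutting in the middle of a character produces invalid UTF-8. The helper firstRunes is a name I made up; it converts to []rune first so that the index counts characters.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstRunes returns the first n code points of s as a string.
// (firstRunes is a hypothetical helper name used only for this sketch.)
func firstRunes(s string, n int) string {
	rs := []rune(s)
	if n < 0 {
		n = 0
	}
	if n > len(rs) {
		n = len(rs)
	}
	return string(rs[:n])
}

func main() {
	ohayo := "おはよう"

	fmt.Println(len(ohayo))                    // 12: bytes
	fmt.Println(utf8.RuneCountInString(ohayo)) // 4: code points

	fmt.Println(ohayo[:3])                   // お: happens to end on a character boundary
	fmt.Println(utf8.ValidString(ohayo[:2])) // false: cut in the middle of お

	fmt.Println(firstRunes(ohayo, 2)) // おは
}
```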

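Because rune and byte are integer types under the hood, string, rune, and byte can be converted to one another in various ways on the basis that "in the end they are all numbers." That convenience, however, makes it easy for beginners to make the following naive mistake.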

Warning

A common bug for new Go developers is to try to make an int into a string by using a type conversion:

var x int = 65
var y = string(x)
fmt.Println(y)

This results in y having the value “A,” not “65.”

Bodner, Jon. Learning Go (p.82). O'Reilly Media. Kindle Edition.

To convert an integer to its decimal string representation, use the Itoa function from the strconv package.

gore> :import strconv
gore> var i int = 65
65
gore> var iStr string = strconv.Itoa(i)
"65"
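The two conversions side by side, as a sketch. The helper names codePointString and decimalString are made up for this note. I write the conversion as string(rune(n)) so that the intent (treat the number as a code point) is explicit.

```go
package main

import (
	"fmt"
	"strconv"
)

// codePointString treats n as a Unicode code point.
// (codePointString is a hypothetical helper name used only for this sketch.)
func codePointString(n int) string {
	return string(rune(n))
}

// decimalString formats n as decimal text.
// (decimalString is a hypothetical helper name used only for this sketch.)
func decimalString(n int) string {
	return strconv.Itoa(n)
}

func main() {
	x := 65
	fmt.Println(codePointString(x)) // A
	fmt.Println(decimalString(x))   // 65

	// strconv.Atoi goes the other way and returns an error for malformed input.
	n, err := strconv.Atoi("65")
	fmt.Println(n, err) // 65 <nil>
}
```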

Let's compare the same string represented as a byte sequence and as a rune sequence.

gore> var bOhayo []byte = []byte(ohayo)
[]uint8{
 0xe3, 0x81, 0x8a, 0xe3, 0x81, 0xaf, 0xe3, 0x82, 0x88, 0xe3, 0x81, 0x86,
}
gore> var rOhayo []rune = []rune(ohayo)
[]int32{
 12362,
 12399,
 12424,
 12358,
}
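The 12 bytes and 4 runes above line up three bytes per character. A for-range loop shows the correspondence directly, because its index is the byte offset where each rune starts. A minimal sketch follows; runeOffsets is a hypothetical helper name.

```go
package main

import "fmt"

// runeOffsets returns the byte offset at which each rune of s starts.
// (runeOffsets is a hypothetical helper name used only for this sketch.)
func runeOffsets(s string) []int {
	var offsets []int
	for i := range s {
		offsets = append(offsets, i)
	}
	return offsets
}

func main() {
	ohayo := "おはよう"

	// i is a byte offset, r is the decoded code point.
	for i, r := range ohayo {
		fmt.Printf("%d %c %U\n", i, r, r)
	}

	fmt.Println(runeOffsets(ohayo)) // [0 3 6 9]
}
```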

In Go, strings are in most cases read and written as UTF-8 byte sequences.

When working with strings, consider using the functions provided by standard packages such as strings and unicode/utf8.
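A few of those functions in action, as a rough sketch rather than a survey of the packages. The helper dropLastRune is a name I made up; it uses utf8.DecodeLastRuneInString to find how many bytes the last character occupies instead of guessing.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// dropLastRune returns s without its last code point.
// (dropLastRune is a hypothetical helper name used only for this sketch.)
func dropLastRune(s string) string {
	_, size := utf8.DecodeLastRuneInString(s) // size is 0 for an empty string
	return s[:len(s)-size]
}

func main() {
	s := "おはよう"

	fmt.Println(strings.HasPrefix(s, "おは")) // true
	fmt.Println(strings.Index(s, "よ"))      // 6 (a byte offset, not a character index)
	fmt.Println(utf8.RuneCountInString(s))  // 4
	fmt.Println(dropLastRune(s))            // おはよ
}
```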
