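Data Race Detector

Introduction

Data races are among the most common and hardest-to-debug types of bugs in concurrent systems. A data race occurs when two goroutines access the same variable concurrently and at least one of the accesses is a write. See the Go Memory Model for details.

Here is an example of a data race that can lead to crashes and memory corruption: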

func main() {
	c := make(chan bool)
	m := make(map[string]string)
	go func() {
		m["1"] = "a" // First conflicting access.
		c <- true
	}()
	m["2"] = "b" // Second conflicting access.
	<-c
	for k, v := range m {
		fmt.Println(k, v)
	}
}
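
For comparison, one way to remove the race in the program above is to guard every map access with a mutex. This is a minimal sketch, not the only possible fix; the helper name fillMap is hypothetical and exists only so the example is easy to call.

```go
package main

import (
	"fmt"
	"sync"
)

// fillMap (hypothetical helper) populates a map from two goroutines.
// Every write holds mu, so the two writes no longer conflict.
func fillMap() map[string]string {
	var mu sync.Mutex
	m := make(map[string]string)
	c := make(chan bool)
	go func() {
		mu.Lock()
		m["1"] = "a"
		mu.Unlock()
		c <- true
	}()
	mu.Lock()
	m["2"] = "b"
	mu.Unlock()
	<-c // The goroutine is done writing once this receive completes.
	return m
}

func main() {
	for k, v := range fillMap() {
		fmt.Println(k, v)
	}
}
```

Running this version with go run -race should produce no report, because the mutex orders the two writes and the channel receive orders the goroutine's write before the final range loop.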

Usage

To help diagnose such bugs, Go includes a built-in data race detector. To use it, add the -race flag to the go command:

$ go test -race mypkg    # to test the package
$ go run -race mysrc.go  # to run the source file
$ go build -race mycmd   # to build the command
$ go install -race mypkg # to install the package

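Report Format

When the race detector finds a data race in the program, it prints a report. The report contains stack traces for the conflicting accesses, as well as the stacks where the involved goroutines were created. Here is an example: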

WARNING: DATA RACE
Read by goroutine 185:
  net.(*pollServer).AddFD()
      src/net/fd_unix.go:89 +0x398
  net.(*pollServer).WaitWrite()
      src/net/fd_unix.go:247 +0x45
  net.(*netFD).Write()
      src/net/fd_unix.go:540 +0x4d4
  net.(*conn).Write()
      src/net/net.go:129 +0x101
  net.func·060()
      src/net/timeout_test.go:603 +0xaf

Previous write by goroutine 184:
  net.setWriteDeadline()
      src/net/sockopt_posix.go:135 +0xdf
  net.setDeadline()
      src/net/sockopt_posix.go:144 +0x9c
  net.(*conn).SetDeadline()
      src/net/net.go:161 +0xe3
  net.func·061()
      src/net/timeout_test.go:616 +0x3ed

Goroutine 185 (running) created at:
  net.func·061()
      src/net/timeout_test.go:609 +0x288

Goroutine 184 (running) created at:
  net.TestProlongTimeout()
      src/net/timeout_test.go:618 +0x298
  testing.tRunner()
      src/testing/testing.go:301 +0xe8

Options

The GORACE environment variable sets race detector options. The format is:

GORACE="option1=val1 option2=val2"

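The options are:

log_path (default stderr): The race detector writes its report to a file named log_path.pid. The special names stdout and stderr cause reports to be written to standard output and standard error, respectively.
exitcode (default 66): The exit status to use when exiting after a detected race.
strip_path_prefix (default ""): Strip this prefix from all reported file paths, to make reports more concise.
history_size (default 1): The per-goroutine memory access history is 32K * 2**history_size elements. Increasing this value can avoid a "failed to restore the stack" error in reports, at the cost of increased memory usage.
halt_on_error (default 0): Controls whether the program exits after reporting the first data race.
atexit_sleep_ms (default 1000): The number of milliseconds to sleep in the main goroutine before exiting.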

Example:

$ GORACE="log_path=/tmp/race/report strip_path_prefix=/my/go/sources/" go test -race
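
To make the history_size formula from the options list concrete, the sketch below evaluates 32K * 2**history_size for a few values. It assumes "32K" means 32*1024, which the text does not state, and the function name historyElements is hypothetical; it is not part of any Go API.

```go
package main

import "fmt"

// historyElements (hypothetical) returns 32K * 2**historySize,
// reading "32K" as 32*1024. That reading is an assumption.
func historyElements(historySize uint) int {
	return (32 * 1024) << historySize
}

func main() {
	for _, h := range []uint{0, 1, 2, 3} {
		fmt.Printf("history_size=%d -> %d elements\n", h, historyElements(h))
	}
}
```

Under that assumption, each increment of history_size doubles the per-goroutine history, which is why raising it trades memory for fewer "failed to restore the stack" errors.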

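Excluding Tests

When you build with the -race flag, the go command defines the additional build tag race. You can use this tag to exclude some code and tests when running the race detector. Some examples: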

//go:build !race

package foo

// The test contains a data race. See issue 123.
func TestFoo(t *testing.T) {
	// ...
}

// The test fails under the race detector due to timeouts.
func TestBar(t *testing.T) {
	// ...
}

// The test takes too long under the race detector.
func TestBaz(t *testing.T) {
	// ...
}

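How To Use

To start, run your tests using the race detector (go test -race). The race detector only finds races that happen at runtime, so it cannot find races in code paths that are not executed. If your tests have incomplete coverage, you may find more races by running a binary built with -race under a realistic workload.

Typical Data Races

Here are some typical data races. All of them can be detected with the race detector.

Race on loop counter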

func main() {
	var wg sync.WaitGroup
	wg.Add(5)
	var i int
	for i = 0; i < 5; i++ {
		go func() {
			fmt.Println(i) // Not the 'i' you are looking for.
			wg.Done()
		}()
	}
	wg.Wait()
}

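The variable i in the function literal is the same variable used by the loop, so the read in the goroutine races with the loop increment. (This program typically prints 55555, not 01234.) The program can be fixed by making a copy of the variable: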

func main() {
	var wg sync.WaitGroup
	wg.Add(5)
	var i int
	for i = 0; i < 5; i++ {
		go func(j int) {
			fmt.Println(j) // Good. Read local copy of the loop counter.
			wg.Done()
		}(i)
	}
	wg.Wait()
}

Accidentally shared variable

// ParallelWrite writes data to file1 and file2, returns the errors.
func ParallelWrite(data []byte) chan error {
	res := make(chan error, 2)
	f1, err := os.Create("file1")
	if err != nil {
		res <- err
	} else {
		go func() {
			// This err is shared with the main goroutine,
			// so the write races with the write below.
			_, err = f1.Write(data)
			res <- err
			f1.Close()
		}()
	}
	f2, err := os.Create("file2") // The second conflicting write to err.
	if err != nil {
		res <- err
	} else {
		go func() {
			_, err = f2.Write(data)
			res <- err
			f2.Close()
		}()
	}
	return res
}

The fix is to introduce new variables in the goroutines (note the use of :=):

			...
			_, err := f1.Write(data)
			...
			_, err := f2.Write(data)
			...

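Unprotected global variable

If the following code is called from several goroutines, it leads to races on the service map. Concurrent reads and writes of the same map are not safe: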

var service = make(map[string]net.Addr) // initialized; writing to a nil map would panic

func RegisterService(name string, addr net.Addr) {
	service[name] = addr
}

func LookupService(name string) net.Addr {
	return service[name]
}

To make the code safe, protect the accesses with a mutex:

var (
	service   = make(map[string]net.Addr) // initialized; writing to a nil map would panic
	serviceMu sync.Mutex
)

func RegisterService(name string, addr net.Addr) {
	serviceMu.Lock()
	defer serviceMu.Unlock()
	service[name] = addr
}

func LookupService(name string) net.Addr {
	serviceMu.Lock()
	defer serviceMu.Unlock()
	return service[name]
}

Primitive unprotected variable

Data races can happen on variables of primitive types as well (bool, int, int64, etc.), as in this example:

type Watchdog struct{ last int64 }

func (w *Watchdog) KeepAlive() {
	w.last = time.Now().UnixNano() // First conflicting access.
}

func (w *Watchdog) Start() {
	go func() {
		for {
			time.Sleep(time.Second)
			// Second conflicting access.
			if w.last < time.Now().Add(-10*time.Second).UnixNano() {
				fmt.Println("No keepalives for 10 seconds. Dying.")
				os.Exit(1)
			}
		}
	}()
}

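Even such "innocent" data races can lead to hard-to-debug problems caused by non-atomicity of the memory accesses, interference with compiler optimizations, or reordering issues when accessing processor memory.

A typical fix for this race is to use a channel or a mutex. To preserve the lock-free behavior, one can also use the sync/atomic package.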

type Watchdog struct{ last int64 }

func (w *Watchdog) KeepAlive() {
	atomic.StoreInt64(&w.last, time.Now().UnixNano())
}

func (w *Watchdog) Start() {
	go func() {
		for {
			time.Sleep(time.Second)
			if atomic.LoadInt64(&w.last) < time.Now().Add(-10*time.Second).UnixNano() {
				fmt.Println("No keepalives for 10 seconds. Dying.")
				os.Exit(1)
			}
		}
	}()
}

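Unsynchronized send and close operations

As this example demonstrates, unsynchronized send and close operations on the same channel can also be a race condition: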

c := make(chan struct{}) // or buffered channel

// The race detector cannot derive the happens before relation
// for the following send and close operations. These two operations
// are unsynchronized and happen concurrently.
go func() { c <- struct{}{} }()
close(c)

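According to the Go memory model, a send on a channel happens before the corresponding receive from that channel completes. To synchronize send and close operations, use a receive operation that guarantees the send is done before the close: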

c := make(chan struct{}) // or buffered channel

go func() { c <- struct{}{} }()
<-c
close(c)
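
With several senders, a single receive is no longer enough to order every send before the close. One common arrangement, sketched here under the assumption that the number of senders is known up front, is to wait on a sync.WaitGroup and close the channel only after every sender has finished. The function name produce is hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// produce (hypothetical) starts n senders and closes the channel
// only after all of them have completed their send.
func produce(n int) <-chan int {
	c := make(chan int)
	var wg sync.WaitGroup
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func(v int) {
			defer wg.Done()
			c <- v
		}(i)
	}
	go func() {
		wg.Wait() // every send has completed at this point
		close(c)
	}()
	return c
}

func main() {
	sum := 0
	for v := range produce(5) {
		sum += v
	}
	fmt.Println(sum)
}
```

Each wg.Done call runs after its goroutine's send returns, and wg.Wait returns only after all Done calls, so the close is ordered after every send.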

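Requirements

The race detector requires cgo to be enabled, and on non-Darwin systems it requires an installed C compiler. The race detector supports linux/amd64, linux/ppc64le, linux/arm64, linux/s390x, linux/loong64, freebsd/amd64, netbsd/amd64, darwin/amd64, darwin/arm64, and windows/amd64.

On Windows, the race detector runtime is sensitive to the version of the installed C compiler; as of Go 1.21, building a program with -race requires a C compiler that incorporates version 8 or later of the mingw-w64 runtime libraries. You can test your C compiler by invoking it with the arguments --print-file-name libsynchronization.a. A newer, compliant C compiler will print the full path to this library, while older C compilers will just echo the argument.

Runtime Overhead

The cost of race detection varies by program, but for a typical program, memory usage may increase by 5-10x and execution time by 2-20x.

The race detector currently allocates an extra 8 bytes per defer and recover statement. Those extra allocations are not recovered until the goroutine exits. This means that if you have a long-running goroutine that periodically issues defer and recover calls, the program's memory usage may grow without bound. These memory allocations will not show up in the output of runtime.ReadMemStats or runtime/pprof.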