Implements ARM64 support using ARM Generic Timer (CNTVCT_EL0) with a hybrid calibration approach and multiple optimized variants.

New files:
- tsc_arm64.go: Platform-specific initialization and calibration
- tsc_arm64.s: Assembly implementations for counter reading and timestamp calculation
- tsc_arm64_test.go: ARM64-specific tests and benchmarks

Core features:
- Uses CNTVCT_EL0 virtual counter for high-precision timestamps
- Hybrid calibration: reads CNTFRQ_EL0 for frequency, refines with linear regression
- Three assembly variants:
  * unixNanoARM16B: fast unordered path
  * unixNanoARMFMADD: fused multiply-add optimization
  * unixNanoARM16Bfence: ISB barriers for strict ordering
- Conservative feature detection with Linux clock source fallback

Shared infrastructure changes:
- Moved simpleLinearRegression and getClosestTSCSys to tsc.go for cross-platform sharing
- Replaced cpu.X86FalseSharingRange with platform-agnostic CacheLineSize constant (64 bytes)
- Updated tsc_generic.go build tags to exclude arm64

Testing:
- Builds successfully on Linux ARM64, Darwin ARM64, and AMD64
- Maintains backward compatibility with existing AMD64 code
- Test suite covers store/load operations, ordered execution, frequency detection, and benchmarks

Documentation:
- Added Architecture Support section to README
- Updated Key Features and Limitations
- Documents ARM64-specific implementation details
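The hybrid calibration above can be illustrated with a small sketch: seed the nanoseconds-per-tick coefficient from a frequency register, then refine it by least-squares fitting over (counter, wall-clock) sample pairs. The function below is illustrative only; its name mirrors the package's `simpleLinearRegression`, but the real signature and sampling strategy may differ.

```go
package main

import "fmt"

// simpleLinearRegressionSketch fits wall = coeff*tsc + offset by ordinary
// least squares over paired samples. Illustrative of the shared calibration
// helper the PR moves into tsc.go; not the package's actual implementation.
func simpleLinearRegressionSketch(tscs, walls []float64) (coeff, offset float64) {
	n := float64(len(tscs))
	var sumX, sumY float64
	for i := range tscs {
		sumX += tscs[i]
		sumY += walls[i]
	}
	meanX, meanY := sumX/n, sumY/n
	var num, den float64
	for i := range tscs {
		num += (tscs[i] - meanX) * (walls[i] - meanY)
		den += (tscs[i] - meanX) * (tscs[i] - meanX)
	}
	coeff = num / den
	offset = meanY - coeff*meanX
	return coeff, offset
}

func main() {
	// Synthetic samples: a 1 GHz counter (1.0 ns per tick) with a fixed
	// wall-clock offset, standing in for real CNTVCT_EL0 / clock readings.
	var tscs, walls []float64
	for i := 0; i < 10; i++ {
		tick := float64(i * 1000)
		tscs = append(tscs, tick)
		walls = append(walls, tick*1.0+5e9) // wall = 1.0*tsc + 5e9
	}
	coeff, offset := simpleLinearRegressionSketch(tscs, walls)
	fmt.Printf("coeff=%.3f offset=%.0f\n", coeff, offset)
	// → coeff=1.000 offset=5000000000
}
```

In the hybrid scheme, CNTFRQ_EL0 gives a cheap initial `coeff` (1e9 / frequency), and the regression corrects it against the system clock over a measurement window.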
Pull request overview
This pull request adds comprehensive ARM64 architecture support to the TSC library, enabling high-performance timestamp generation using ARM Generic Timer (CNTVCT_EL0). The implementation includes a hybrid calibration approach, multiple optimized assembly variants for different performance/ordering trade-offs, and extensive refactoring to share calibration logic between AMD64 and ARM64.
Changes:
- Implements ARM64 support with three optimized assembly variants (unordered fast path, FMADD optimization, and ISB-fenced strict ordering)
- Refactors calibration infrastructure (simpleLinearRegression, getClosestTSCSys) into shared tsc.go for cross-platform use
- Introduces platform-agnostic CacheLineSize constant replacing architecture-specific constants
- Adds comprehensive CI/CD workflows for multi-architecture testing including native ARM64 macOS and QEMU-based Linux ARM64
- Updates tooling (longdrift, calibrate) and adds development infrastructure (justfile, treefmt, golangci-lint config)
Reviewed changes
Copilot reviewed 23 out of 24 changed files in this pull request and generated 11 comments.
Show a summary per file
| File | Description |
|---|---|
| tsc_arm64.go | Core ARM64 implementation with hardware detection and calibration |
| tsc_arm64.s | Assembly implementations of counter reading and computation variants |
| tsc_arm64_test.go | ARM64-specific unit tests and benchmarks |
| tsc.go | Refactored shared calibration functions and platform-agnostic constants |
| tsc_generic.go | Updated build tags to exclude arm64 |
| tsc_amd64.go | Refactored to use shared calibration functions |
| tsc_amd64_test.go | Code cleanup and style updates |
| tsc_test.go | Enhanced drift testing with race detector awareness |
| tools/longdrift/main.go | Refactored flag handling and improved code organization |
| tools/calibrate/main.go | Refactored into smaller functions for better maintainability |
| race_enabled.go / race_disabled.go | Build tag detection for race detector |
| treefmt.toml | Code formatting configuration |
| .golangci.toml | Linter configuration |
| justfile | Build and test automation recipes |
| .github/workflows/*.yaml | Multi-architecture CI/CD pipelines |
| README.md | Updated documentation with ARM64 support details |
| examples/with-calibration.go | Code cleanup and modernization |
```go
}

// Out-of-Order test, GetInOrder should be in order as we assume.
func TestGetInOrder(t *testing.T) {
```
Missing t.Parallel() call. This test should include t.Parallel() for consistency with other test files and to enable parallel test execution.
```go
	}
}

func TestReadCounterFrequency(t *testing.T) {
```
Missing t.Parallel() call. This test should include t.Parallel() for consistency with other test files and to enable parallel test execution.
```go
}

func printInitialResults(cost time.Duration, avgCoeff, avgFreq float64) {
	cpuFlag := fmt.Sprintf("%s_%d", cpu.X86.Signature, cpu.X86.SteppingID)
```
The code references cpu.X86.Signature and cpu.X86.SteppingID, which will not work correctly on ARM64 platforms. Since this tool should now support ARM64 (as indicated by the PR), this should be updated to handle both architectures appropriately, or use a platform-agnostic way to identify the CPU.
Suggested change:

```diff
-	cpuFlag := fmt.Sprintf("%s_%d", cpu.X86.Signature, cpu.X86.SteppingID)
+	var cpuFlag string
+	switch runtime.GOARCH {
+	case "amd64", "386":
+		cpuFlag = fmt.Sprintf("%s_%d", cpu.X86.Signature, cpu.X86.SteppingID)
+	default:
+		cpuFlag = runtime.GOARCH
+	}
```
```text
timestamps using the processor's Time Stamp Counter register. It's 6-10x faster
than and can significantly improve performance for time-sensitive
applications. `time.Now().UnixNano()`
```
Grammar issue: The sentence structure is awkward. The phrase "It's 6-10x faster than and can significantly improve performance" is incomplete. It should read "It's 6-10x faster than time.Now().UnixNano() and can significantly improve performance" with the backticked text moved before "and".
Suggested change:

```diff
-timestamps using the processor's Time Stamp Counter register. It's 6-10x faster
-than and can significantly improve performance for time-sensitive
-applications. `time.Now().UnixNano()`
+timestamps using the processor's Time Stamp Counter register. It's 6-10x faster than
+`time.Now().UnixNano()` and can significantly improve performance for time-sensitive
+applications.
```
```diff
 if IsOutOfOrder() {
 	if cpu.X86.HasFMA {
 		start := GetInOrder()
-		for i := 0; i < 1000; i++ {
+		for range 1000 {
 			_ = unixNanoTSCFMA()
 		}
 		fmaCost := GetInOrder() - start
 		start = GetInOrder()
-		for i := 0; i < 1000; i++ {
+		for range 1000 {
 			_ = unixNanoTSC16B()
 		}
 		tscCost := GetInOrder() - start
 		if fmaCost < tscCost {
 			UnixNano = unixNanoTSCFMA
 		}
 	}
 	UnixNano = unixNanoTSC16B

 	return true
```
Logic error: when FMA is available and wins the benchmark, UnixNano is set to unixNanoTSCFMA, but it is then unconditionally overwritten with unixNanoTSC16B on line 49. The code is missing a return statement or else clause after line 47, so UnixNano always ends up as unixNanoTSC16B when IsOutOfOrder() is true, even when FMA is faster. The selection should return early (or branch) so the fall-through assignment on line 49 cannot clobber the FMA choice.
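The corrected selection can be sketched with stub variants; the function bodies below are placeholders standing in for the package's real assembly-backed implementations, and `pickVariant` is a hypothetical name for the init path. The key point is the early return after choosing the FMA variant:

```go
package main

import (
	"fmt"
	"time"
)

// Stand-ins for the package's timestamp variants. The real ones are
// implemented in assembly; these stubs just exercise the control flow.
func unixNanoTSCFMA() int64 { return time.Now().UnixNano() }
func unixNanoTSC16B() int64 { return time.Now().UnixNano() }

var UnixNano func() int64

// pickVariant benchmarks both variants when FMA is available, keeps the
// faster one, and returns early so the default assignment below cannot
// overwrite the FMA choice -- the fix the review comment asks for.
func pickVariant(hasFMA bool) {
	if hasFMA {
		start := time.Now()
		for i := 0; i < 1000; i++ { // benchmark loop
			_ = unixNanoTSCFMA()
		}
		fmaCost := time.Since(start)

		start = time.Now()
		for i := 0; i < 1000; i++ {
			_ = unixNanoTSC16B()
		}
		tscCost := time.Since(start)

		if fmaCost < tscCost {
			UnixNano = unixNanoTSCFMA
			return // early return: skip the fall-through default
		}
	}
	UnixNano = unixNanoTSC16B
}

func main() {
	pickVariant(true)
	fmt.Println(UnixNano() > 0)
}
```

An `else` branch around the default assignment would work equally well; the early return just keeps the happy path flat.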
```go
		}
	}
	if cnt > 0 {
		t.Fatal(fmt.Sprintf("GetInOrder is not in order: %d aren't in order", cnt))
```
The test uses fmt.Sprintf unnecessarily when t.Fatalf can handle formatting directly. Replace 't.Fatal(fmt.Sprintf(...))' with 't.Fatalf(...)' for cleaner code.
```go
	"github.com/templexxx/tsc/internal/xbytes"
)

func TestStoreOffsetCoeff(t *testing.T) {
```
Missing t.Parallel() call. The tests in tsc_test.go and tsc_amd64_test.go have been updated to include t.Parallel() for tests that can run in parallel. This test should also include t.Parallel() for consistency and to enable parallel test execution.
```diff
-// Using false sharing range as aligned size & total size for avoiding cache pollution.
-OffsetCoeff = xbytes.MakeAlignedBlock(cpu.X86FalseSharingRange, cpu.X86FalseSharingRange)
+// Using cache line size for alignment to avoid cache pollution.
+OffsetCoeff = xbytes.MakeAlignedBlock(CacheLineSize, CacheLineSize)
```
cpu.X86FalseSharingRange is 128 bytes, for reducing L2 cache pollution.
Could you add the benchmark results on your ARM machine? That would be helpful, thanks.