vector-pi-rotation article
This commit is contained in:
parent
f2020f7463
commit
efa3391dbd
141
articles/vector-pi-rotation/page.mmd
Normal file
141
articles/vector-pi-rotation/page.mmd
Normal file
@ -0,0 +1,141 @@
|
||||
Title: Optimized Vector Rotation
|
||||
Brief: Specialized rotation methods over Pi and Pi/2 in 2D and 3D.
|
||||
Date: 1699548646
|
||||
Tags: Programming, Zig, Optimization
|
||||
CSS: /style.css
|
||||
|
||||
Came up with some useful optimization for 90 and 180 degree rotations while making a grid walker,
|
||||
below implementations are given, ripped straight from source, lol.
|
||||
|
||||
Compared to generic cos/sin method of rotation it's magnitudes of times less work in many cases, especially if glibc implementation is used.
|
||||
|
||||
Note: Given example assumes coordinate system where Y grows downwards and X to the right.
|
||||
|
||||
## Two dimensions
|
||||
|
||||
```zig
|
||||
pub fn rotateByHalfPiClockwise(self: Self) Self {
|
||||
return .{ .components = .{ -self.y(), self.x() } };
|
||||
}
|
||||
|
||||
pub fn rotateByHalfPiCounterClockwise(self: Self) Self {
|
||||
return .{ .components = .{ self.y(), -self.x() } };
|
||||
}
|
||||
|
||||
pub fn rotateByPi(self: Self) Self {
|
||||
return .{ .components = .{ -self.x(), -self.y() } };
|
||||
}
|
||||
|
||||
```
|
||||
|
||||
## Three dimensions
|
||||
|
||||
```zig
|
||||
pub fn rotateByHalfPiClockwiseAroundAxis(self: Self, axis: Axis3) Self {
|
||||
return .{ .components = switch (axis) {
|
||||
.x => .{ self.x(), -self.z(), self.y() },
|
||||
.y => .{ -self.z(), self.y(), self.x() },
|
||||
.z => .{ -self.y(), self.x(), self.z() },
|
||||
} };
|
||||
}
|
||||
|
||||
pub fn rotateByHalfPiCounterClockwiseAroundAxis(self: Self, axis: Axis3) Self {
|
||||
return .{ .components = switch (axis) {
|
||||
.x => .{ self.x(), self.z(), -self.y() },
|
||||
.y => .{ self.z(), self.y(), -self.x() },
|
||||
.z => .{ self.y(), -self.x(), self.z() },
|
||||
} };
|
||||
}
|
||||
|
||||
pub fn rotateByPiAroundAxis(self: Self, axis: Axis3) Self {
|
||||
return .{ .components = switch (axis) {
|
||||
.x => .{ self.x(), -self.x(), -self.y() },
|
||||
.y => .{ -self.x(), self.y(), -self.z() },
|
||||
.z => .{ -self.x(), -self.y(), self.z() },
|
||||
} };
|
||||
}
|
||||
|
||||
```
|
||||
|
||||
## Generated amd64 assembly
|
||||
Note: Procedure prelude/epilogue is omitted. Zig's calling convention is used, which is roughly equivalent to C's static marked function in effect.
|
||||
|
||||
Note: It's for vectors stored packed for use in SSE, array/separate scalar passing produces worse result, at least when not inlined.
|
||||
|
||||
### rotateByHalfPiClockwise
|
||||
Notice how it's one instruction longer than coutner-clockwise case,
|
||||
so, choice of coordinate system effects costs of particular direction to rotate around.
|
||||
|
||||
```asm
|
||||
vmovlpd qword ptr [rsp], xmm0
|
||||
vmovshdup xmm1, xmm0
|
||||
vpbroadcastd xmm2, dword ptr [rip + .LCPI2_0]
|
||||
vpxor xmm1, xmm1, xmm2
|
||||
vbroadcastss xmm0, xmm0
|
||||
vblendps xmm0, xmm0, xmm1, 1
|
||||
```
|
||||
### rotateByHalfPiCounterClockwise
|
||||
|
||||
```asm
|
||||
vmovlpd qword ptr [rsp], xmm0
|
||||
vpbroadcastd xmm1, dword ptr [rip + .LCPI1_0]
|
||||
vpxor xmm1, xmm0, xmm1
|
||||
vmovshdup xmm0, xmm0
|
||||
vinsertps xmm0, xmm0, xmm1, 16
|
||||
```
|
||||
|
||||
### rotateByPi
|
||||
```asm
|
||||
vmovlpd qword ptr [rsp], xmm0
|
||||
vpermilps xmm0, xmm0, 212
|
||||
vpbroadcastd xmm1, dword ptr [rip + .LCPI3_0]
|
||||
vpxor xmm0, xmm0, xmm1
|
||||
```
|
||||
|
||||
### rotateByHalfPiClockwiseAroundAxis (X)
|
||||
```asm
|
||||
sub rsp, 24
|
||||
vmovq qword ptr [rsp], xmm0
|
||||
vpermilpd xmm1, xmm0, 1
|
||||
vmovaps xmm2, xmm1
|
||||
vmovss dword ptr [rsp + 8], xmm2
|
||||
vpbroadcastd xmm2, dword ptr [rip + .LCPI4_0]
|
||||
vpxor xmm1, xmm1, xmm2
|
||||
vpermilps xmm0, xmm0, 212
|
||||
vinsertps xmm0, xmm0, xmm1, 16
|
||||
add rsp, 24
|
||||
```
|
||||
|
||||
### rotateByHalfPiCounterClockwiseAroundAxis (X)
|
||||
Again, one instruction shorter.
|
||||
|
||||
```asm
|
||||
sub rsp, 24
|
||||
vextractps dword ptr [rsp + 8], xmm0, 2
|
||||
vmovq qword ptr [rsp], xmm0
|
||||
vmovshdup xmm1, xmm0
|
||||
vpbroadcastd xmm2, dword ptr [rip + .LCPI5_0]
|
||||
vpxor xmm1, xmm1, xmm2
|
||||
vpermilps xmm0, xmm0, 232
|
||||
vinsertps xmm0, xmm0, xmm1, 32
|
||||
add rsp, 24
|
||||
```
|
||||
|
||||
### rotateByPiAroundAxis (X)
|
||||
Now it's more work.
|
||||
|
||||
```asm
|
||||
sub rsp, 24
|
||||
vmovq qword ptr [rsp], xmm0
|
||||
vpermilpd xmm1, xmm0, 1
|
||||
vmovaps xmm2, xmm1
|
||||
vmovss dword ptr [rsp + 8], xmm2
|
||||
vmovshdup xmm2, xmm0
|
||||
vbroadcastss xmm3, dword ptr [rip + .LCPI6_0]
|
||||
vpxor xmm2, xmm2, xmm3
|
||||
vpxor xmm1, xmm1, xmm3
|
||||
vinsertps xmm0, xmm0, xmm2, 16
|
||||
vinsertps xmm0, xmm0, xmm1, 32
|
||||
add rsp, 24
|
||||
```
|
||||
|
Loading…
Reference in New Issue
Block a user