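About SIMD types

Overview

SIMD types, a new family of primitive types, and their API are being implemented in V8.
A SIMD value is a data type that packs several numbers side by side into a single value.
CPUs support this kind of data efficiently:
just as 1 + 2 -> 3 is one operation, [ 1, 2, 3, 4 ] + [ 2, 3, 4, 5 ] -> [ 3, 5, 7, 9 ] can be computed in one operation.
In other words, using SIMD types where many numbers are processed can be expected to improve performance severalfold.
(Note: SIMD is now slated to go into WASM and has been removed from ES for the time being.)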
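
The lane-wise addition described above can be mimicked in plain JavaScript. The sketch below is only a scalar emulation for illustration: `addLanes` is a hypothetical helper, not part of the SIMD API, and it loops over the lanes instead of issuing a single CPU instruction.

```javascript
// Scalar emulation of a lane-wise add. `addLanes` is a hypothetical helper,
// not part of the SIMD API; real SIMD hardware does this in one instruction.
function addLanes(a, b) {
  if (a.length !== b.length) {
    throw new TypeError('lane counts differ');
  }
  const out = new Float32Array(a.length);
  for (let lane = 0; lane < a.length; lane++) {
    out[lane] = a[lane] + b[lane];
  }
  return out;
}

const sum = addLanes(Float32Array.of(1, 2, 3, 4), Float32Array.of(2, 3, 4, 5));
console.log(Array.from(sum)); // [ 3, 5, 7, 9 ]
```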

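Implemented types

float32x4

　A 128-bit data type made of four 32-bit floating-point numbers
　float32 has lower precision than float64, which is JS's ordinary number type

int32x4

　A 128-bit data type made of four 32-bit signed integers
　Each lane can represent values from -2,147,483,648 to 2,147,483,647

int16x8

　Each lane can represent values from -32,768 to 32,767

int8x16

　Each lane can represent values from -128 to 127

uint32x4

　A 128-bit data type made of four 32-bit unsigned integers
　Each lane can represent values from 0 to 4,294,967,295

uint16x8

　Each lane can represent values from 0 to 65,535

uint8x16

　Each lane can represent values from 0 to 255

bool32x4

　A 128-bit data type made of 4 boolean values

bool16x8

　A 128-bit data type made of 8 boolean values

bool8x16

　A 128-bit data type made of 16 boolean values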
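
The precision and range notes above can be checked with standard JavaScript, without any SIMD support. The sketch below uses `Math.fround` to round a number to float32, and a hypothetical helper `laneRange` (not part of any API) to derive the integer ranges from the lane width.

```javascript
// float32 keeps fewer significant bits than float64 (the JS number type).
const asFloat32 = Math.fround(0.1);
console.log(asFloat32 === 0.1);      // false
console.log(Math.fround(16777217));  // 16777216 (2^24 + 1 has no float32 form)

// Range of an integer lane, computed from its bit width.
// `laneRange` is a hypothetical helper for illustration only.
function laneRange(bits, signed) {
  return signed
    ? [-(2 ** (bits - 1)), 2 ** (bits - 1) - 1]
    : [0, 2 ** bits - 1];
}

console.log(laneRange(32, true));  // [ -2147483648, 2147483647 ]
console.log(laneRange(16, true));  // [ -32768, 32767 ]
console.log(laneRange(8, false));  // [ 0, 255 ]
```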

Implemented constructors (%SIMDConstructor%)

Floating point SIMD constructors

　SIMD.Float32x4( l1, l2, l3, l4 )

　　A function that creates a new float32x4 value whose lanes hold the numbers (float32) l1, l2, l3 and l4

Signed integer SIMD constructors　

　SIMD.Int32x4( l1, l2, l3, l4 )

　SIMD.Int16x8( l1, l2, l3, ......, l8 )

　SIMD.Int8x16( l1, l2, l3, ......, l16 )

Unsigned integer SIMD constructors　

　SIMD.Uint32x4( l1, l2, l3, l4 )

　SIMD.Uint16x8( l1, l2, l3, ......, l8 )

　SIMD.Uint8x16( l1, l2, l3, ......, l16 )

Boolean SIMD constructors　

　SIMD.Bool32x4( l1, l2, l3, l4 )

　SIMD.Bool16x8( l1, l2, l3, ......, l8 )

　SIMD.Bool8x16( l1, l2, l3, ......, l16 )

Implemented methods

SIMD.%SIMDConstructor%.～

splat( n )

　Same as SIMD.%SIMDConstructor%( n, n, n, ...... )

check( s )

　Returns s unchanged if its type corresponds to %SIMDConstructor%; otherwise throws an exception

extractLane( s, l )

　Returns the value in lane l of s

replaceLane( s, l, n )

　Returns a SIMD value equal to s with lane l set to n
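
To make the behavior of splat, extractLane and replaceLane concrete, here is a minimal plain-JavaScript sketch that models a 4-lane float32 value with a Float32Array. The functions are hypothetical stand-ins, not the SIMD API itself, and lanes are assumed to be numbered from 0, as in the swizzle example in this article.

```javascript
// Hypothetical stand-ins for splat / extractLane / replaceLane,
// modeling a 4-lane float32 value with a Float32Array.
const LANES = 4;

function checkLane(s, lane) {
  if (!Number.isInteger(lane) || lane < 0 || lane >= s.length) {
    throw new RangeError('lane out of range');
  }
}

function splat(n) {
  return new Float32Array(LANES).fill(n); // every lane gets the same value
}

function extractLane(s, lane) {
  checkLane(s, lane);
  return s[lane];
}

function replaceLane(s, lane, n) {
  checkLane(s, lane);
  const copy = Float32Array.from(s); // the input value is left untouched
  copy[lane] = n;
  return copy;
}

const v = splat(7);
const w = replaceLane(v, 2, 1.5);
console.log(Array.from(v));     // [ 7, 7, 7, 7 ]
console.log(Array.from(w));     // [ 7, 7, 1.5, 7 ]
console.log(extractLane(w, 2)); // 1.5
```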

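neg( s )

　Returns a SIMD value with the sign of each lane inverted

abs( s )

　Absolute value

sqrt( s )

　Square root

reciprocalApproximation( s )

　Reciprocal (an approximation, as the name indicates)

reciprocalSqrtApproximation( s )

　Reciprocal of the square root (an approximation, as the name indicates)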

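add( s1, s2 )

　【 s1 + s2 】 Addition

sub( s1, s2 )

　【 s1 - s2 】 Subtraction

addSaturate( s1, s2 )

　Addition that does not raise an error when a lane's result exceeds the maximum or falls below the minimum of the lane type;
　the lane becomes the maximum or the minimum, respectively

subSaturate( s1, s2 )

　The subtraction counterpart of "addSaturate( s1, s2 )"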
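
The difference between saturating and ordinary addition is easiest to see on 8-bit lanes. The following sketch emulates both in plain JavaScript; `addWrapInt8` and `addSaturateInt8` are hypothetical names, and the wrapping variant simply relies on how an Int8Array stores out-of-range numbers.

```javascript
const INT8_MIN = -128;
const INT8_MAX = 127;

// Wrapping add: storing into an Int8Array keeps only the low 8 bits.
function addWrapInt8(a, b) {
  const out = new Int8Array(a.length);
  for (let i = 0; i < a.length; i++) {
    out[i] = a[i] + b[i];
  }
  return out;
}

// Saturating add: results are clamped to the int8 range instead of wrapping.
function addSaturateInt8(a, b) {
  const out = new Int8Array(a.length);
  for (let i = 0; i < a.length; i++) {
    out[i] = Math.min(INT8_MAX, Math.max(INT8_MIN, a[i] + b[i]));
  }
  return out;
}

const a = Int8Array.of(100, -100, 5, 127);
const b = Int8Array.of(100, -100, 5, 1);
console.log(Array.from(addWrapInt8(a, b)));     // [ -56, 56, 10, -128 ]
console.log(Array.from(addSaturateInt8(a, b))); // [ 127, -128, 10, 127 ]
```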

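mul( s1, s2 )

　【 s1 * s2 】 Multiplication

div( s1, s2 )

　【 s1 / s2 】 Division

min( s1, s2 )

　Returns a SIMD value that takes the smaller of each pair of lanes
　If either lane of a pair is NaN, the result lane is NaN

max( s1, s2 )

　The "larger" counterpart of "min( s1, s2 )"

minNum( s1, s2 )

　Returns a SIMD value that takes the smaller of each pair of lanes
　If one lane of a pair is NaN, the other lane is used

maxNum( s1, s2 )

　The "larger" counterpart of "minNum( s1, s2 )"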
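
The only difference between min and minNum is the treatment of NaN, which a per-lane emulation makes visible. This is a sketch in plain JavaScript; `minLane`, `minNumLane` and `mapLanes` are hypothetical helpers, and plain arrays stand in for SIMD values.

```javascript
// min semantics: NaN in either lane gives NaN (Math.min already does this).
function minLane(x, y) {
  return Math.min(x, y);
}

// minNum semantics: if one lane is NaN, the other lane is used.
function minNumLane(x, y) {
  if (Number.isNaN(x)) return y;
  if (Number.isNaN(y)) return x;
  return Math.min(x, y);
}

// Apply a two-argument lane function across two lane arrays.
function mapLanes(fn, s1, s2) {
  return s1.map((x, i) => fn(x, s2[i]));
}

const s1 = [1, NaN, 3, NaN];
const s2 = [2, 5, NaN, NaN];
console.log(mapLanes(minLane, s1, s2));    // [ 1, NaN, NaN, NaN ]
console.log(mapLanes(minNumLane, s1, s2)); // [ 1, 5, 3, NaN ]
```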

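or( s1, s2 )

　【 s1 | s2 】 Performs a bitwise OR on each lane and returns the resulting SIMD value

xor( s1, s2 )

　【 s1 ^ s2 】

not( s )

　【 ~s 】

anyTrue( s )

　Returns a boolean indicating whether any lane is true

allTrue( s )

　Returns a boolean indicating whether all lanes are true

shiftLeftByScalar( s1, s2 )

　【 s1 << s2 】

shiftRightByScalar( s1, s2 )

　【 s1 >> s2 】 for signed integer SIMD types (sign-propagating shift)

　【 s1 >>> s2 】 for unsigned integer SIMD types (zero-fill shift)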
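
The two notations given for shiftRightByScalar correspond to JavaScript's two right-shift operators. The sketch below emulates both on 32-bit lanes, assuming that signed types shift with sign propagation and unsigned types shift with zero fill; `shiftRightSigned` and `shiftRightUnsigned` are hypothetical names.

```javascript
// Sign-propagating shift on signed 32-bit lanes (>>).
function shiftRightSigned(lanes, bits) {
  return Int32Array.from(lanes, (x) => x >> bits);
}

// Zero-fill shift on unsigned 32-bit lanes (>>>).
function shiftRightUnsigned(lanes, bits) {
  return Uint32Array.from(lanes, (x) => x >>> bits);
}

// The two inputs hold the same bit patterns, read as signed and as unsigned.
const signedLanes = Int32Array.of(-8, 8, -1, 16);
const unsignedLanes = Uint32Array.of(4294967288, 8, 4294967295, 16);

console.log(Array.from(shiftRightSigned(signedLanes, 1)));
// [ -4, 4, -1, 8 ]
console.log(Array.from(shiftRightUnsigned(unsignedLanes, 1)));
// [ 2147483644, 4, 2147483647, 8 ]
```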

lessThan( s1, s2 )

　【 s1 < s2 】 Tests, lane by lane, whether s1 is less than s2 and returns a Boolean SIMD value whose lanes hold the results
　The returned Boolean SIMD value has the same shape as s1 and s2; for example, applying this function to Int16x8 values returns a Bool16x8 value

lessThanOrEqual( s1, s2 )

　【 s1 <= s2 】

greaterThan( s1, s2 )

　【 s1 > s2 】

greaterThanOrEqual( s1, s2 )

　【 s1 >= s2 】

equal( s1, s2 )

　【 s1 == s2 】

notEqual( s1, s2 )

　【 s1 != s2 】
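
A comparison produces one boolean per lane, which anyTrue and allTrue can then reduce to a single boolean. The following plain-JavaScript sketch emulates that flow for eight 16-bit lanes; the functions are hypothetical stand-ins that use an ordinary array of booleans in place of a Bool16x8 value.

```javascript
// Lane-wise "less than": one boolean per lane, same lane count as the inputs.
function lessThan(s1, s2) {
  return Array.from(s1, (x, i) => x < s2[i]);
}

// Reductions over a boolean lane array.
function anyTrue(mask) {
  return mask.some(Boolean);
}

function allTrue(mask) {
  return mask.every(Boolean);
}

const s1 = Int16Array.of(1, 5, 3, 9, 0, 7, 2, 8);
const s2 = Int16Array.of(2, 4, 3, 10, 1, 6, 3, 8);
const mask = lessThan(s1, s2);

console.log(mask);
// [ true, false, false, true, true, false, true, false ]
console.log(anyTrue(mask)); // true
console.log(allTrue(mask)); // false
```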

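swizzle( s, l1, l2, l3, ...... )

　Returns a new SIMD value built from arbitrarily chosen lanes of s
　For example, for a 4-lane SIMD value, "swizzle( s, 0, 0, 0, 0 )" returns
　a SIMD value in which all four lanes hold the value of the first lane of s

shuffle( s1, s2, l1, l2, l3, ...... )

　Returns a new SIMD value built from arbitrarily chosen lanes of s1 or s2
　For example, for 4-lane SIMD values, "shuffle( s1, s2, 0, 0, 4, 4 )" returns
　a SIMD value whose lanes 0 and 1 hold the first lane of s1 and whose lanes 2 and 3 hold the first lane of s2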
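
The lane indexing of swizzle and shuffle can be sketched with plain arrays. In this emulation (hypothetical helper functions, not the SIMD API), shuffle treats s1 and s2 as one concatenated lane list, so indices 0 to 3 select from s1 and 4 to 7 select from s2, which matches the example above.

```javascript
// Pick lanes from a lane list, rejecting indices outside it.
function pickLanes(source, lanes) {
  return lanes.map((l) => {
    if (!Number.isInteger(l) || l < 0 || l >= source.length) {
      throw new RangeError('lane index out of range: ' + l);
    }
    return source[l];
  });
}

// swizzle: every output lane is chosen from s.
function swizzle(s, ...lanes) {
  return pickLanes(s, lanes);
}

// shuffle: output lanes are chosen from s1 followed by s2.
function shuffle(s1, s2, ...lanes) {
  return pickLanes([...s1, ...s2], lanes);
}

const s1 = [10, 11, 12, 13];
const s2 = [20, 21, 22, 23];
console.log(swizzle(s1, 0, 0, 0, 0));     // [ 10, 10, 10, 10 ]
console.log(swizzle(s1, 3, 2, 1, 0));     // [ 13, 12, 11, 10 ]
console.log(shuffle(s1, s2, 0, 0, 4, 4)); // [ 10, 10, 20, 20 ]
```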

fromFloat32x4( s )

　Float32x4 -> other SIMD
　Reads each lane of the Float32x4 value s as a number and creates a SIMD value whose lanes hold the equivalent values
　For example, "Int32x4.fromFloat32x4( Float32x4( -1.4, -1, 2, 2.6 ) )" yields "Int32x4( -1, -1, 2, 2 )"
　(When a signed integer SIMD value is created, the source number of each lane is truncated toward zero)

fromInt32x4( s )

　Int32x4 -> other SIMD

fromUint32x4( s )

　Uint32x4 -> other SIMD

fromFloat32x4Bits( s )

　Float32x4 ~> other SIMD
　Copies the bit pattern of s unchanged and reinterprets it in the target layout to create a new SIMD value
　For example, "Int16x8.fromFloat32x4Bits( Float32x4( 0, 0, 0, 1 ) )" yields "Int16x8( 0, 0, 0, 0, 0, 0, 0, 16256 )"
　1(float32) == 00000000 00000000 10000000 00111111 (bits in 8-bit units, lowest byte first)
　           == 0000000000000000 0011111110000000 (bits in 16-bit units) == 0 16256 (int16)

fromInt32x4Bits( s )

　Int32x4 ~> other SIMD

fromUint32x4Bits( s )

　Uint32x4 ~> other SIMD

fromInt16x8Bits( s )

　Int16x8 ~> other SIMD

fromUint16x8Bits( s )

　Uint16x8 ~> other SIMD

fromInt8x16Bits( s )

　Int8x16 ~> other SIMD

fromUint8x16Bits( s )

　Uint8x16 ~> other SIMD
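
The two families of conversions (value conversion with "->" and bit reinterpretation with "~>") can both be reproduced with standard typed-array features. The sketch below is an emulation with hypothetical function names; it assumes little-endian byte order for the bit reinterpretation, which is what the 16256 example above implies.

```javascript
// Value conversion: each float32 lane is truncated toward zero.
function int32FromFloat32(lanes) {
  return Int32Array.from(Float32Array.from(lanes), Math.trunc);
}

// Bit reinterpretation: write four float32 lanes, read eight int16 lanes
// from the same 16 bytes. Little-endian order is made explicit via DataView.
function int16x8FromFloat32x4Bits(lanes) {
  const view = new DataView(new ArrayBuffer(16));
  lanes.forEach((x, i) => view.setFloat32(i * 4, x, true));
  const out = [];
  for (let i = 0; i < 8; i++) {
    out.push(view.getInt16(i * 2, true));
  }
  return out;
}

console.log(Array.from(int32FromFloat32([-1.4, -1, 2, 2.6])));
// [ -1, -1, 2, 2 ]
console.log(int16x8FromFloat32x4Bits([0, 0, 0, 1]));
// [ 0, 0, 0, 0, 0, 0, 0, 16256 ]
```

The float32 value 1 is 0x3F800000, so its upper 16 bits are 0x3F80, which is 16256 in decimal.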

load( ta, i )

　Reads all lanes of the target SIMD type (16 bytes) from the typed array ta starting at index i and returns them as a new SIMD value

load1( ta, i )

　Reads 1 lane (4 bytes)

load2( ta, i )

　Reads 2 lanes (8 bytes)

load3( ta, i )

　Reads 3 lanes (12 bytes)

store( ta, i, s )

　Writes all lanes of the target SIMD type (16 bytes) to the typed array ta starting at index i

store1( ta, i, s )

　Writes 1 lane (4 bytes)

store2( ta, i, s )

　Writes 2 lanes (8 bytes)

store3( ta, i, s )

　Writes 3 lanes (12 bytes)
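
As a rough model of the full and partial loads and stores, the following sketch emulates them for a 4-lane float32 value backed by a Float32Array. `loadLanes` and `storeLanes` are hypothetical helpers, and leaving the lanes that were not read at 0 is an assumption made here, not something stated in this article.

```javascript
const LANE_COUNT = 4;

// Emulates load (count = 4) and load1..load3 (count = 1..3).
function loadLanes(ta, index, count = LANE_COUNT) {
  if (index < 0 || index + count > ta.length) {
    throw new RangeError('read is out of bounds');
  }
  const s = new Float32Array(LANE_COUNT); // unread lanes stay 0 (assumption)
  s.set(ta.subarray(index, index + count));
  return s;
}

// Emulates store (count = 4) and store1..store3 (count = 1..3).
function storeLanes(ta, index, s, count = LANE_COUNT) {
  if (index < 0 || index + count > ta.length) {
    throw new RangeError('write is out of bounds');
  }
  ta.set(s.subarray(0, count), index); // only the first `count` lanes are written
}

const input = Float32Array.of(1, 2, 3, 4, 5, 6);
const full = loadLanes(input, 0);    // lanes 1, 2, 3, 4
const tail = loadLanes(input, 4, 2); // lanes 5, 6, 0, 0

const output = new Float32Array(6);
storeLanes(output, 0, full);
storeLanes(output, 4, tail, 2);
console.log(Array.from(tail));   // [ 5, 6, 0, 0 ]
console.log(Array.from(output)); // [ 1, 2, 3, 4, 5, 6 ]
```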

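Working with SIMD types

Creating

Create values with the constructors under the SIMD object.

s1 = SIMD.Float32x4(1, 2, 3, 4)

Comparing

Because SIMD values are primitives, two values with the same contents are considered equal.

s2 = SIMD.Float32x4(1, 2, 3, 4)
s1 === s2  // true

Type information

String conversion and the typeof operator both return a value that identifies the SIMD type.

typeof s1  // "float32x4"
'' + s1    // "SIMD.Float32x4(1, 2, 3, 4)"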

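Reading and writing typed arrays

SIMD shows its real value when reading and writing typed arrays.
When the same kind of operation has to be applied to every element of a typed array, SIMD types let you process 4 to 16 elements at a time, which can be expected to speed up that part of the code.

For example, to write the sum of the Uint8Arrays a and b into c, the conventional approach looks like this:

for (let i = 0; i < c.length; i++) {
  c[i] = a[i] + b[i]
}

With SIMD it becomes:

let {load, store, add} = SIMD.Uint8x16
// assumes c.length is a multiple of 16
for (let i = 0; i < c.length; i += 16) {
  store( c, i, add( load(a, i), load(b, i) ) )
}

If the size of the typed array is not a multiple of the lane count, use load{1,2,3} and store{1,2,3} to handle the remainder. Going by the byte counts listed above (4, 8 and 12 bytes), these methods work on 32-bit lanes, so the remainder of a Uint8Array like the one in this example would need another approach, such as a scalar loop.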
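
The blocked loop with remainder handling can be sketched in plain JavaScript as follows. This is an emulation, not SIMD code: `addBlock` is a hypothetical stand-in for one `store( c, i, add( load(a, i), load(b, i) ) )` step on 16 elements, and the leftover elements are handled by a scalar loop.

```javascript
const LANES_PER_BLOCK = 16; // lanes in a Uint8x16 value

// Stand-in for one 16-lane load/add/store step, written as a scalar loop.
function addBlock(a, b, c, start) {
  for (let lane = 0; lane < LANES_PER_BLOCK; lane++) {
    c[start + lane] = a[start + lane] + b[start + lane];
  }
}

function addArrays(a, b, c) {
  const n = c.length;
  const blocked = n - (n % LANES_PER_BLOCK); // largest multiple of 16 that fits
  let i = 0;
  for (; i < blocked; i += LANES_PER_BLOCK) {
    addBlock(a, b, c, i);
  }
  for (; i < n; i++) {
    c[i] = a[i] + b[i]; // scalar tail for the remaining elements
  }
}

const a = Uint8Array.from({ length: 20 }, (_, i) => i);
const b = Uint8Array.from({ length: 20 }, (_, i) => 2 * i);
const c = new Uint8Array(20);
addArrays(a, b, c);
console.log(c[0], c[15], c[19]); // 0 45 57
```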

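Implemented versions

V8: 4.6.10 (Float32x4 and a few methods), 4.6.49 (the other SIMDConstructors), 4.6.56 (many methods), 4.7.6 (Unsigned integer SIMDs and several methods), 4.7.12 (load methods), 4.7.13 (store methods)

External references

The specification on which the V8 implementation is based (version currently covered by this article: v0.9.2)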
