Personal NumPy notes - 工作と競馬2

NumPy notes for my own reference.

Slicing
keepdims
Passing -1 to reshape
Hadamard product
dot
inner
outer
matmul
pad
repeat
random
empty
zeros_like
Converting between one-hot vectors and categorical variables
- Categorical variable -> one-hot vector
- One-hot vector -> categorical variable

import numpy as np
x = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

Slicing

>>> z = np.arange(0, 10)

z[start:stop]

Elements from index start up to index stop-1 (stop itself is excluded).

>>> z[3:5]
array([3, 4])

z[start:stop:step]

>>> z[3:7:2]
array([3, 5])

z[start::step]

Whatever is omitted defaults to the beginning or the end of the array.

>>> z[3::1]
array([3, 4, 5, 6, 7, 8, 9])
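
A few more cases of the "omitted means beginning or end" rule, as a minimal sketch on the same z. The variable names are only for illustration.

```python
import numpy as np

z = np.arange(0, 10)

first_five = z[:5]    # start omitted: begins at index 0
evens = z[::2]        # start and stop omitted: every second element
last_three = z[-3:]   # negative start counts from the end
backwards = z[::-1]   # negative step: walks from the end to the beginning

print(first_five, evens, last_three, backwards)
```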

keepdims

Keeps the number of dimensions of the array: the reduced axis stays in the result with length 1.

>>> y1 = np.sum(x, axis=1, keepdims=True)
>>> y1
array([[ 6],
       [15],
       [24]])
>>> y2 = np.sum(x, axis=1)
>>> y2
array([ 6, 15, 24])
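
Where this matters in practice is broadcasting. A small sketch, assuming the goal is to divide each row by its own sum (the variable names are mine):

```python
import numpy as np

x = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

row_sums = np.sum(x, axis=1, keepdims=True)   # shape (3, 1)
normalized = x / row_sums                     # (3, 3) / (3, 1): row i divided by row_sums[i]

flat_sums = np.sum(x, axis=1)                 # shape (3,)
by_column = x / flat_sums                     # (3, 3) / (3,): column j divided by flat_sums[j]

print(normalized.sum(axis=1))                 # each row now sums to 1
```

Without keepdims the division still runs, but it lines the sums up with the columns instead of the rows, which is probably not what was intended.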

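Passing -1 to reshape

The size of that dimension is computed automatically from the total number of elements. Naturally, -1 can be given for only one dimension.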

>>> x2 = x.reshape(1, -1)
>>> x2
array([[1, 2, 3, 4, 5, 6, 7, 8, 9]])
>>> x3 = x.reshape(-1, 1)
>>> x3
array([[1],
       [2],
       [3],
       [4],
       [5],
       [6],
       [7],
       [8],
       [9]])
>>> x4 = x.reshape(-1)
>>> x4
array([1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> x5 = x.reshape(-1, -1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: can only specify one unknown dimension
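
A typical use, sketched under the assumption that you want to keep the first axis and flatten everything else. The array batch is made up for the example.

```python
import numpy as np

batch = np.arange(24).reshape(2, 3, 4)

# Keep the first axis and let NumPy infer the rest (3 * 4 = 12)
flat = batch.reshape(batch.shape[0], -1)
print(flat.shape)

# The total size must be divisible by the known dimensions,
# otherwise the inferred size would not be an integer
try:
    np.arange(10).reshape(3, -1)
    inferred_ok = True
except ValueError:
    inferred_ok = False
print(inferred_ok)
```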

Hadamard product

The * operator multiplies element by element (the Hadamard product).
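
For comparison with the matrix product, a short sketch on the same x:

```python
import numpy as np

x = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

hadamard = x * x            # element-wise: hadamard[i, j] == x[i, j] * x[i, j]
same = np.multiply(x, x)    # function form of the * operator
matrix_product = x @ x      # matrix product, a different operation

print(hadamard)
```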

dot

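Computes the inner product of vectors or the product of matrices. The inner product is written with a dot symbol, which is presumably where the name dot comes from.
For 1-D and 2-D arrays the result is the same as matmul.
For N-D arrays, note that the result differs from matmul.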

>>> y = np.dot(x[0], x[:, 0])
>>> y
30
>>> y = np.dot(x, x)
>>> y
array([[ 30,  36,  42],
       [ 66,  81,  96],
       [102, 126, 150]])
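
To make the N-D difference concrete, here is a sketch with two 3-D arrays. The shapes are arbitrary choices for the example. For N-D inputs, dot sums over the last axis of the first array and the second-to-last axis of the second, for every combination of the remaining indices.

```python
import numpy as np

a = np.arange(2 * 3 * 4).reshape(2, 3, 4)
b = np.arange(2 * 4 * 5).reshape(2, 4, 5)

d = np.dot(a, b)       # shape (2, 3, 2, 5)
m = np.matmul(a, b)    # shape (2, 3, 5)

# d[i, j, k, l] == sum(a[i, j, :] * b[k, :, l])
check = np.sum(a[1, 2, :] * b[0, :, 3])

print(d.shape, m.shape)
```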

inner

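Computes the inner product. For vectors it is the same as dot. For matrices, the result collects the inner products of every pair of rows: entry (i, j) is the inner product of row i of the first argument and row j of the second.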

>>> y = np.inner(x, x)
>>> y
array([[ 14,  32,  50],
       [ 32,  77, 122],
       [ 50, 122, 194]])
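
For 2-D inputs this amounts to multiplying by the transpose, which the following sketch checks on the same x:

```python
import numpy as np

x = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

inner = np.inner(x, x)
via_transpose = x @ x.T            # rows of x against rows of x
row_pair = np.dot(x[0], x[1])      # inner product of row 0 and row 1

print(np.array_equal(inner, via_transpose))
```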

outer

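Computes the outer product of vectors. Inputs that are not 1-D are flattened first, so the 3x3 x below is treated as a vector of length 9.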

>>> np.outer(x, x)
array([[ 1,  2,  3,  4,  5,  6,  7,  8,  9],
       [ 2,  4,  6,  8, 10, 12, 14, 16, 18],
       [ 3,  6,  9, 12, 15, 18, 21, 24, 27],
       [ 4,  8, 12, 16, 20, 24, 28, 32, 36],
       [ 5, 10, 15, 20, 25, 30, 35, 40, 45],
       [ 6, 12, 18, 24, 30, 36, 42, 48, 54],
       [ 7, 14, 21, 28, 35, 42, 49, 56, 63],
       [ 8, 16, 24, 32, 40, 48, 56, 64, 72],
       [ 9, 18, 27, 36, 45, 54, 63, 72, 81]])
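
With actual 1-D vectors the structure is easier to see. A minimal sketch, with u and v chosen just for illustration:

```python
import numpy as np

u = np.array([1, 2, 3])
v = np.array([10, 20])

o = np.outer(u, v)                     # shape (3, 2), o[i, j] == u[i] * v[j]
by_broadcast = u[:, None] * v[None, :] # same result via broadcasting

# 2-D inputs are flattened before the outer product is taken
x = np.arange(1, 10).reshape(3, 3)
flattened_equal = np.array_equal(np.outer(x, x), np.outer(x.ravel(), x.ravel()))

print(o)
```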

matmul

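Computes the inner product of vectors or the product of matrices.
For 1-D and 2-D arrays the result is the same as dot.
For N-D arrays (N>2), the last two axes (axis N-2 and axis N-1) are treated as matrices, and the result is an array of their matrix products.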
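
A sketch of the stacked behaviour, with made-up shapes. The @ operator is the operator form of matmul.

```python
import numpy as np

a = np.arange(2 * 3 * 4).reshape(2, 3, 4)   # stack of two 3x4 matrices
b = np.arange(2 * 4 * 2).reshape(2, 4, 2)   # stack of two 4x2 matrices

m = np.matmul(a, b)                         # same as a @ b, shape (2, 3, 2)

# Each slice along the leading axis is an ordinary 2-D matrix product
per_slice_equal = all(
    np.array_equal(m[i], np.dot(a[i], b[i])) for i in range(a.shape[0])
)

# A single 2-D matrix is broadcast against the whole stack
w = np.arange(4 * 2).reshape(4, 2)
mw = a @ w                                  # shape (2, 3, 2)

print(m.shape, per_slice_equal, mw.shape)
```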

pad

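Pads an array with a given value. The second argument specifies the size and layout of the padding.

Passing a scalar

Pads by the same amount at the start and end of every dimension.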

>>> np.pad(x, 2, "constant", constant_values=1)
array([[1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 2, 3, 1, 1],
       [1, 1, 4, 5, 6, 1, 1],
       [1, 1, 7, 8, 9, 1, 1],
       [1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1]])

Passing a single tuple

The tuple is (start, end) and is applied to every dimension.

>>> np.pad(x, (1,2), "constant", constant_values=1)
array([[1, 1, 1, 1, 1, 1],
       [1, 1, 2, 3, 1, 1],
       [1, 4, 5, 6, 1, 1],
       [1, 7, 8, 9, 1, 1],
       [1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1]])

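Passing a list of two tuples

Each list element corresponds to one dimension, and each tuple gives the padding sizes at the start and end of that dimension. For a 4-D array, for example, padding only the last two axes would be np.pad(x, [(0,0), (0,0), (pad, pad), (pad, pad)], "constant", constant_values=1).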

>>> np.pad(x, [(0,2), (1,3)], "constant", constant_values=1)
array([[1, 1, 2, 3, 1, 1, 1],
       [1, 4, 5, 6, 1, 1, 1],
       [1, 7, 8, 9, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1]])
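
The four-tuple form needs a 4-D input, so it does not run on the 3x3 x. A sketch, assuming a (batch, channel, height, width) layout; that layout and the names images and pad are my assumptions:

```python
import numpy as np

images = np.ones((2, 3, 4, 4))   # hypothetical (batch, channel, height, width)
pad = 1

# Leave the first two axes alone, pad height and width by `pad` on both sides
padded = np.pad(
    images,
    [(0, 0), (0, 0), (pad, pad), (pad, pad)],
    "constant",
    constant_values=0,
)

print(padded.shape)
```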

repeat

Creates an array in which each element is repeated a given number of times.

>>> x.repeat(3, axis=0)
array([[1, 2, 3],
       [1, 2, 3],
       [1, 2, 3],
       [4, 5, 6],
       [4, 5, 6],
       [4, 5, 6],
       [7, 8, 9],
       [7, 8, 9],
       [7, 8, 9]])
>>> x.repeat(3, axis=1)
array([[1, 1, 1, 2, 2, 2, 3, 3, 3],
       [4, 4, 4, 5, 5, 5, 6, 6, 6],
       [7, 7, 7, 8, 8, 8, 9, 9, 9]])
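
Two variations that are easy to forget, as a short sketch: without axis the array is flattened first, and the repeat count can also be given per row (or per column).

```python
import numpy as np

x = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

flat = x.repeat(2)                  # no axis: flattened, then each element twice
rows = x.repeat([1, 0, 2], axis=0)  # row 0 once, row 1 dropped, row 2 twice

print(flat)
print(rows)
```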

random

np.random.permutation

Returns the elements in random order.

# Return the integers 0 through 9 in random order
>>> np.random.permutation(10)
array([9, 1, 5, 8, 3, 0, 4, 7, 2, 6])

# Return the elements of a in random order
>>> a = [1,2,3,4,5,6]
>>> np.random.permutation(a)
array([3, 4, 6, 2, 1, 5])

np.random.choice

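Randomly selects the given number of elements. Note that the arguments differ from those of choice in the standard random module.

# a is assumed to be a 1-D sequence (array)

# Select one element from a
np.random.choice(a)

# Select 2 elements from a
np.random.choice(a, size=2)
# Returns an array of shape (2, 3)
np.random.choice(a, size=(2,3))

# With or without replacement (replace)
np.random.choice(a, size=3, replace=True) # duplicates allowed
np.random.choice(a, size=3, replace=False) # no duplicates

# Selection probabilities (p)
# Must be a list of the same length as a that sums to 1
np.random.choice(a, 3, replace=False, p=[0.1, 0.1, 0.3, 0.3, 0.1, 0.1])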
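
A runnable version of the calls above, as a sketch. The seed is there only so reruns give the same draws, and the degenerate p is just a way to make the effect of p obvious.

```python
import numpy as np

np.random.seed(0)  # fixed seed only for reproducibility

a = [1, 2, 3, 4, 5, 6]

grid = np.random.choice(a, size=(2, 3))               # array of shape (2, 3)
unique3 = np.random.choice(a, size=3, replace=False)  # three distinct elements

# All probability on the last element: every draw returns 6
always_six = np.random.choice(a, size=4, p=[0, 0, 0, 0, 0, 1])

# Without replacement, size cannot exceed len(a)
try:
    np.random.choice(a, size=7, replace=False)
    oversample_ok = True
except ValueError:
    oversample_ok = False

print(grid.shape, unique3, always_six, oversample_ok)
```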

np.random.randn

Random numbers from the standard normal distribution.

>>> np.random.randn()
-0.783296997915273

np.random.random_sample

Uniform random numbers in [0, 1).

>>> np.random.random_sample()
0.17152866495265917
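
The two functions take their shape differently, which is easy to mix up. A sketch (seed and variable names are only for the example):

```python
import numpy as np

np.random.seed(0)  # fixed seed only for reproducibility

normal = np.random.randn(2, 3)              # dimensions as separate arguments
uniform = np.random.random_sample((2, 3))   # shape as a single tuple

# Shift and scale standard normal draws to get mean 5, standard deviation 2
shifted = 5 + 2 * np.random.randn(1000)

print(normal.shape, uniform.shape, shifted.mean())
```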

empty

Allocates an array without initializing it, so the contents are whatever happened to be in memory.

# np.empty(shape, dtype=float)
>>> np.empty((3,3))
array([[0.000e+000, 0.000e+000, 0.000e+000],
       [0.000e+000, 0.000e+000, 1.877e-321],
       [0.000e+000, 0.000e+000, 0.000e+000]])
>>> np.empty((3,3), dtype=np.int32)
array([[1697542254, 2037674093,  741550120],
       [ 539765043, 1887007844, 1886272869],
       [1953392942,  220803635,         10]])
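
Since the initial values are arbitrary, empty only makes sense when every element is written before it is read. A minimal sketch of that pattern (the computation itself is made up):

```python
import numpy as np

x = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

out = np.empty(x.shape, dtype=x.dtype)   # contents are arbitrary at this point
for i in range(x.shape[0]):
    out[i] = x[i] * (i + 1)              # every row is overwritten before use

print(out)
```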

zeros_like

Creates an array of zeros with the same shape and dtype as the argument array.

>>> a
array([[0.3720223 , 0.49858174, 0.79575115],
       [0.32627888, 0.57125729, 0.25105339],
       [0.32911995, 0.5271041 , 0.78931682]])
>>> np.zeros_like(a)
array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])
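
The dtype follows the input as well, which matters for integer arrays. A short sketch with a made-up int32 array:

```python
import numpy as np

ints = np.array([[1, 2], [3, 4]], dtype=np.int32)

z = np.zeros_like(ints)                      # same shape, dtype int32
z_float = np.zeros_like(ints, dtype=float)   # dtype can be overridden

print(z.dtype, z_float.dtype)
```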

Converting between one-hot vectors and categorical variables

Categorical variable -> one-hot vector

# Categorical variable (class indices)
a = [0, 1, 2, 1, 0, 2]
ohv = np.identity(3)[a]

↓

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.],
       [0., 1., 0.],
       [1., 0., 0.],
       [0., 0., 1.]])

One-hot vector -> categorical variable

ohv = [
    [1, 0, 0],
    [0, 1, 0], 
    [1, 0, 0],
    [1, 0, 0],
    [0, 1, 0], 
    [0, 0, 1], 
]

a = np.argmax(ohv, axis=1)

↓

array([0, 1, 0, 0, 1, 2], dtype=int64)
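
The two conversions can be wrapped up and checked as a round trip. This is only a sketch: to_one_hot and from_one_hot are hypothetical helper names, and the labels are assumed to be integers from 0 to num_classes-1.

```python
import numpy as np

def to_one_hot(labels, num_classes):
    """Hypothetical helper: integer class indices -> one-hot rows."""
    # Row k of the identity matrix is the one-hot vector for class k
    return np.identity(num_classes)[np.asarray(labels)]

def from_one_hot(one_hot):
    """Hypothetical helper: one-hot rows -> integer class indices."""
    # The position of the single 1 in each row is the class index
    return np.argmax(one_hot, axis=1)

a = [0, 1, 2, 1, 0, 2]
ohv = to_one_hot(a, num_classes=3)
restored = from_one_hot(ohv)

print(restored.tolist())
```

Passing num_classes explicitly keeps the width fixed even when the largest class does not appear in the data.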