List vs numpy array¶

Germain Salvato Vallverdu¶

In [9]:

import numpy as np
import math as m

Utilisation d'une fonction¶

In [3]:

def np_func(x):
    return (3 * x ** 2 + 2 * x - 1) * np.exp(- x / 2.3) * np.sin(2 * x)

def m_func(x):
    return (3 * x ** 2 + 2 * x - 1) * m.exp(- x / 2.3) * m.sin(2 * x)

In [5]:

x = np.linspace(0, 10, 1000)
xl = x.tolist()

In [7]:

%timeit y = np_func(x)

37.7 µs ± 815 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [8]:

%%timeit
y = list()
for xi in xl:
    yi = m_func(xi)
    y.append(yi)

642 µs ± 6.67 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [32]:

642 / 37.7

Out[32]:

17.02917771883289

Produit scalaire¶

Ou produit de matrices ou matrices-vecteurs ou opération sur des vecteurs (sommes, produit) en général.

In [14]:

x = np.random.random(1000)
y = np.random.random(1000)

In [15]:

%timeit np.dot(x, y)

1.38 µs ± 18.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [16]:

xl = x.tolist()
yl = y.tolist()

In [17]:

%%timeit
scal = 0
for xi, yi in zip(xl, yl):
    scal += xi * yi

68.3 µs ± 719 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [33]:

68.3 / 1.38

Out[33]:

49.492753623188406

Centre de gravité¶

In [36]:

coords = np.random.uniform(0, 30, size=(100, 3))
weight = np.random.random(100)
lcoords = coords.tolist()
lweight = weight.tolist()

In [37]:

%%timeit
w_coords = coords * weight[:, np.newaxis]
G = w_coords.sum(axis=0) / len(w_coords)

10.4 µs ± 140 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [38]:

w_coords = coords * weight[:, np.newaxis]
G = w_coords.sum(axis=0) / len(w_coords)
print(G)

[7.81686019 8.63915013 7.2122051 ]

In [39]:

%%timeit
G = [0, 0, 0]
for w_i, coords_i in zip(lweight, lcoords):
    w_coords_i = [w_i * xi for xi in coords_i]
    for i in range(3):
        G[i] += w_coords_i[i]

nat = len(lweight)
for i in range(3):
    G[i] /= nat

106 µs ± 3.02 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [35]:

107 / 10.4

Out[35]:

10.288461538461538

In [40]:

G = [0, 0, 0]
for w_i, coords_i in zip(lweight, lcoords):
    w_coords_i = [w_i * xi for xi in coords_i]
    for i in range(3):
        G[i] += w_coords_i[i]

nat = len(lweight)
for i in range(3):
    G[i] /= nat

print(G)

[7.816860188194455, 8.639150131333633, 7.2122051018216045]

Matrice de distances¶

Un peu plus sioux ... calculer toutes les distances entre atomes. En ajoutant un axe dans le tableau numpy il fait le calcul de toutes les opérations.

In [66]:

coords = np.random.uniform(0, 30, (5, 3))

Je découpe un peu l'ajout des axes. L'idée c'est d'avoir quelque chose de 5 x 5 à la fin (la matrice).

In [58]:

print(coords[:, np.newaxis, :].shape)
print(coords[:, np.newaxis, :])

(5, 1, 3)
[[[23.40646835 18.6866633   2.21638428]]

 [[ 4.66508894  1.69156122 15.23312689]]

 [[14.324572    9.34691585 20.53275304]]

 [[18.0876806  19.05955999 28.26381997]]

 [[10.03703844 21.41928076  6.58185643]]]

In [60]:

print(coords[np.newaxis, :, :].shape)
print(coords[np.newaxis, :, :])

(1, 5, 3)
[[[23.40646835 18.6866633   2.21638428]
  [ 4.66508894  1.69156122 15.23312689]
  [14.324572    9.34691585 20.53275304]
  [18.0876806  19.05955999 28.26381997]
  [10.03703844 21.41928076  6.58185643]]]

In [61]:

rij = coords[:, np.newaxis, :] - coords[np.newaxis, :, :]
print(rij.shape)
print(rij)

(5, 5, 3)
[[[  0.           0.           0.        ]
  [ 18.74137941  16.99510209 -13.01674261]
  [  9.08189635   9.33974745 -18.31636877]
  [  5.31878775  -0.37289669 -26.04743569]
  [ 13.36942991  -2.73261746  -4.36547215]]

 [[-18.74137941 -16.99510209  13.01674261]
  [  0.           0.           0.        ]
  [ -9.65948306  -7.65535464  -5.29962616]
  [-13.42259166 -17.36799878 -13.03069308]
  [ -5.3719495  -19.72771955   8.65127046]]

 [[ -9.08189635  -9.33974745  18.31636877]
  [  9.65948306   7.65535464   5.29962616]
  [  0.           0.           0.        ]
  [ -3.76310861  -9.71264414  -7.73106693]
  [  4.28753355 -12.07236491  13.95089662]]

 [[ -5.31878775   0.37289669  26.04743569]
  [ 13.42259166  17.36799878  13.03069308]
  [  3.76310861   9.71264414   7.73106693]
  [  0.           0.           0.        ]
  [  8.05064216  -2.35972077  21.68196354]]

 [[-13.36942991   2.73261746   4.36547215]
  [  5.3719495   19.72771955  -8.65127046]
  [ -4.28753355  12.07236491 -13.95089662]
  [ -8.05064216   2.35972077 -21.68196354]
  [  0.           0.           0.        ]]]

Du coup le calcul des distances avec numpy ça donne :

In [64]:

rij2 = (coords[:, np.newaxis, :] - coords[np.newaxis, :, :]) ** 2
d = np.sum(rij2, axis=2) ** 0.5
print(d)

[[ 0.         28.45186084 22.47667877 26.58754335 14.3271142 ]
 [28.45186084  0.         13.4162627  25.526698   22.20101891]
 [22.47667877 13.4162627   0.         12.97173228 18.94076173]
 [26.58754335 25.526698   12.97173228  0.         23.24841208]
 [14.3271142  22.20101891 18.94076173 23.24841208  0.        ]]

Avec plus d'atomes

In [74]:

coords = np.random.uniform(0, 30, (100, 3))
lcoords = coords.tolist()

In [75]:

%%timeit
rij2 = (coords[:, np.newaxis, :] - coords[np.newaxis, :, :]) ** 2
d = np.sum(rij2, axis=2) ** 0.5

378 µs ± 30.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [76]:

%%timeit
nat = len(lcoords)
distances = [[0. for iat in range(nat)] for jat in range(nat)]
for iat in range(nat):
    for jat in range(iat + 1, nat):
        ix = lcoords[iat]
        jx = lcoords[jat]
        rij2 = [(ix[i] - jx[i]) **2 for i in range(3)]
        d2 = sum(rij2)
        distances[iat][jat] = m.sqrt(d2)
        distances[jat][iat] = distances[iat][jat]

7.94 ms ± 137 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [77]:

7940 / 378

Out[77]:

21.005291005291006

Tu noteras qu'en plus, dans ce cas, l'implémentation numpy fait le double de calculs que l'implémentation standard... Dans le calcul numpy tu calcules la distance i-j et j-i alors que tu évites de faire ça dans l'implementation standard.

In [78]:

%%timeit
# en calculant toutes les distances
nat = len(lcoords)
distances = [[0. for iat in range(nat)] for jat in range(nat)]
for iat in range(nat):
    for jat in range(nat):
        ix = lcoords[iat]
        jx = lcoords[jat]
        rij2 = [(ix[i] - jx[i]) **2 for i in range(3)]
        d2 = sum(rij2)
        distances[iat][jat] = m.sqrt(d2)

14.6 ms ± 52.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

C'est 2 fois plus long ... normal tu as le double de calcul. Du coup numpy est environ 40 fois plus rapide.

In [79]:

14.6e3 / 378

Out[79]:

38.62433862433863