## Saturday, July 4, 2015

### Relax Laplace! I

In a region with no free charge, the electrostatic potential obeys the Laplace equation:

grad^2 phi = 0

Jacobi iteration is a simple numerical technique to find solutions to problems like this. For example, if you have a 2D equal-spaced cartesian mesh of points, the numerical gradient

(del^2 phi)/(del x^2) + (del^2 phi)/(del y^2) can be approximated as:

(phi[i+1,j] + phi[i-1,j] - 2*phi[i,j]) + (phi[i,j+1] + phi[i,j-1] - 2*phi[i,j])

If that is set to zero, then one can iterate the expression:

next_phi[i,j] = 0.25 * (phi[i+1,j] + phi[i-1,j] +  phi[i,j+1] + phi[i,j-1])

It is slow, but here's a basic example. Scroll down to see cool ways to make it much faster!

Shown as a contour using mostly Matplotlib defaults:

 1 2 CS = ax.contour(phi, origin = 'lower') ax.clabel(CS, inline=1, fontsize=11) 

The script to set up the problem (phido_me, and sphere) and then iterate is quite short. Instead of looping over all the indices in the array as you would in a compiled language like C or FORTRAN, in Python we try to just use existing NumPy methods like np.roll in line 29.

  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 import numpy as np import matplotlib.pyplot as plt nx, ny = 101, 101 phi = np.zeros((ny, nx), dtype = 'float') do_me = np.ones_like(phi, dtype='bool') x0, y0, r0 = 40, 65, 12 x = np.arange(nx, dtype = 'float')[None,:] y = np.arange(ny, dtype = 'float')[:,None] rsq = (x-x0)**2 + (y-y0)**2 circle = rsq <= r0**2 phi[circle] = 1.0 do_me[circle] = False do_me[0,:], do_me[-1,:], do_me[:,0], do_me[:,-1] = False, False, False, False n, nper = 100, 100 phi_hold = np.zeros((n+1, ny, nx)) phi_hold[0] = phi for i in range(n): for j in range(nper): phi2 = 0.25*(np.roll(phi, 1, axis=0) + np.roll(phi, -1, axis=0) + np.roll(phi, 1, axis=1) + np.roll(phi, -1, axis=1) ) phi[do_me] = phi2[do_me] phi_hold[i+1] = phi change = phi_hold[1:] - phi_hold[:-1] cmin, cmax = [], [] for thing in change: cmin.append(thing[do_me].min()) cmax.append(thing[do_me].max()) 

The actual calculation (lines 26 to 36) takes about 2.2 seconds on my laptop. That's 100*100*101*101 multiplications plus three times that many sums, among other things, or about 6ns per flop. Since a lap-top-CPU-flop is more like about 1 or 1.5ns, there's definitely some room for improvement. (That GPU is just sitting there hungry for work, and could probably do it much faster!)

There's an upcoming post on lap-top-CPU-flops with NumPy; stay tuned!

So, let's make the problem a little harder by making it 3D, Instead of ~10,000 mesh points we'll have ~1,000,000. At 8 bytes per point, that's beyond the cache, so it's going to be slower still.

and two slices below the sphere:

This took 525 seconds. That's 100*100*101*101*101 multiplications plus five summations, or about 9ns per flop. It's creeping up there! ,And we're still changing at about 3E-04 per 100 iterations. The next Relax Laplace! post will cover some ways to bring that back down.

The 3D version of the script for reference:

  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 nx, ny, nz = 101, 101, 101 phi = np.zeros((nz, ny, nx), dtype = 'float') do_me = np.ones_like(phi, dtype='bool') x0, y0, z0, r0 = 40, 65, 50, 12 x = np.arange(nx, dtype = 'float')[None,None,:] y = np.arange(ny, dtype = 'float')[None,:,None] z = np.arange(ny, dtype = 'float')[:,None,None] rsq = (x-x0)**2 + (y-y0)**2 + (z-z0)**2 sphere = rsq <= r0**2 phi[sphere] = 1.0 do_me[sphere] = False do_me[0,:,:], do_me[-1,:,:] = False, False do_me[:,0,:], do_me[:,-1,:] = False, False do_me[:,:,0], do_me[:,:,-1] = False, False n, nper = 100, 100 phi_hold = np.zeros((n+1, nz, ny, nx)) phi_hold[0] = phi one_sixth = (1./6.) for i in range(n): for j in range(nper): phi2 = one_sixth*(np.roll(phi, 1, axis=0) + np.roll(phi, -1, axis=0) + np.roll(phi, 1, axis=1) + np.roll(phi, -1, axis=1) + np.roll(phi, 1, axis=2) + np.roll(phi, -1, axis=2) ) phi[do_me] = phi2[do_me] print i phi_hold[i+1] = phi