Polymorphic Parallel Prefix by Jiahao Chen is licensed under a Creative Commons Attribution 4.0 International License.
This notebook accompanies the workshop paper by Jiahao Chen and Alan Edelman entitled "Parallel Prefix Polymorphism Permits Parallelization, Presentation & Proof", Proceedings of the First Workshop for High Performance Technical Computing in Dynamic Languages, held in conjunction with SC14: The International Conference on High Performance Computing, Networking, Storage and Analysis.
The latest version of this notebook can be downloaded from https://github.com/jiahao/ijulia-notebooks/blob/master/2014-08-06-parallel-prefix.ipynb.
Tested to work on Julia v0.6.0.
#Uncomment these lines to install the necessary packages
#Pkg.update()
#map(Pkg.add, ("BenchmarkTools", "Cairo", "Compose", "Gadfly", "Interact", "ProgressMeter"));
using BenchmarkTools, Compose, Gadfly, Interact, ProgressMeter
reduce()
Reduction repeatedly applies a binary operator to the elements of a vector, returning a single value.
reduce(+, 1:8)
36
#Result is the same as
sum(1:8)
36
reduce(*, 1:6)
720
#Result is the same as
prod(1:6)
720
boring_left(a, b)=a
reduce(boring_left, 1:8)
1
boring_right(a, b)=b
reduce(boring_right, 1:8)
8
You can also use reduce to compute Fibonacci numbers from their matrix recurrence.
$$\begin{pmatrix} f_2 \\ f_1 \end{pmatrix} = \begin{pmatrix} f_1 + f_0 \\ f_1 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} f_1 \\ f_0 \end{pmatrix}$$

$$\begin{pmatrix} f_{n+1} \\ f_n \end{pmatrix} = \dots = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}^n \begin{pmatrix} f_1 \\ f_0 \end{pmatrix}$$

From this, you can show that

$$\begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}^n = \begin{pmatrix} f_{n+1} & f_n \\ f_n & f_{n-1} \end{pmatrix}$$

fib(j)=reduce(*, [[1 1; 1 0] for i=1:j])[1,2]
map(fib, [1, 2, 3, 4, 5, 11, 100])
7-element Array{Int64,1}:
                   1
                   1
                   2
                   3
                   5
                  89
 3736710778780434371
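Note that the last entry has overflowed: the true value of fib(100) is 354224848179261915075, but Int64 arithmetic silently wraps modulo 2^64, giving 3736710778780434371 above. A minimal fix (my addition, not in the original notebook; fib_big is a hypothetical name) is to reduce over BigInt matrices instead:

#Exact Fibonacci numbers via arbitrary-precision integer matrices
fib_big(j)=reduce(*, [BigInt[1 1; 1 0] for i=1:j])[1,2]
fib_big(100) #354224848179261915075, exactly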
You can also solve recurrences of any complexity using reduce. For example, reduce can compute a Hadamard matrix from its definition in terms of its submatrices:

$$H_{2n} = \begin{pmatrix} H_n & H_n \\ H_n & -H_n \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \otimes H_n$$

and so on.
Hadamard(n)=reduce(kron, [[1 1; 1 -1] for i=1:n])
Hadamard(3)
8×8 Array{Int64,2}:
 1   1   1   1   1   1   1   1
 1  -1   1  -1   1  -1   1  -1
 1   1  -1  -1   1   1  -1  -1
 1  -1  -1   1   1  -1  -1   1
 1   1   1   1  -1  -1  -1  -1
 1  -1   1  -1  -1   1  -1   1
 1   1  -1  -1  -1  -1   1   1
 1  -1  -1   1  -1   1   1  -1
ans*ans' #This is a legitimate Hadamard matrix
8×8 Array{Int64,2}:
 8  0  0  0  0  0  0  0
 0  8  0  0  0  0  0  0
 0  0  8  0  0  0  0  0
 0  0  0  8  0  0  0  0
 0  0  0  0  8  0  0  0
 0  0  0  0  0  8  0  0
 0  0  0  0  0  0  8  0
 0  0  0  0  0  0  0  8
#Reduction of matrix multiplications
M=[randn(2,2) for i=1:4]
printnice(x)=display(round.(x, 3))
printnice(M[4]*M[3]*M[2]*M[1])
printnice(reduce((A,B)->B*A, M)) #backward multiply <-- left associativity
printnice(reduce(*, M)) #forward multiply
2×2 Array{Float64,2}:
 -2.556  -2.636
 -2.769  -2.892
2×2 Array{Float64,2}:
 -2.556  -2.636
 -2.769  -2.892
2×2 Array{Float64,2}:
 -0.093   0.036
  1.715  -1.659
The first two agree because reduce left-folds: with the operator (A,B)->B*A, the fold unwinds to M[4]*M[3]*M[2]*M[1]. Plain reduce(*, M) instead computes M[1]*M[2]*M[3]*M[4], which is why the third result differs.
In the following example, we apply reduce() to function composition:
h=reduce((f,g)->(x->f(g(x))), [sin cos tan])
h(π)
0.8414709848078965
sin(cos(tan(π)))
0.8414709848078965
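As an aside (my addition, not in the original notebook): assuming Julia v0.6's Base, which provides the composition operator ∘, the same fold can be written as follows (h2 is a hypothetical name):

#Equivalent composition using Base's ∘ operator
h2=reduce(∘, [sin, cos, tan])
h2(π) #0.8414709848078965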
prefix
Having discussed reduce, we are now ready for the idea behind prefix sum.
Suppose you wanted to compute the partial sums of a vector, i.e. given
y[1:n]
, we want to overwrite the vector y
with the vector of partial sums
new_y[1] = y[1]
new_y[2] = y[1] + y[2]
new_y[3] = y[1] + y[2] + y[3]
...
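(Aside, my note: this is exactly what the built-in cumsum computes serially.)

cumsum([1, 2, 3]) #[1, 3, 6]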
At first blush, it seems impossible to parallelize this, since
new_y[1] = y[1]
new_y[2] = new_y[1] + y[2]
new_y[3] = new_y[2] + y[3]
...
which is an intrinsically serial process.
function prefix_serial!(y, +)
for i=2:length(y)
y[i] = y[i-1] + y[i]
end
y
end
prefix_serial! (generic function with 1 method)
#Compute the first 8 triangular numbers
prefix_serial!([1:8;], +)'
1×8 RowVector{Int64,Array{Int64,1}}:
 1  3  6  10  15  21  28  36
One particularly neat trick in Julia is that we are able to pass the + operator as an argument to prefix_serial!. As a result, we can use this function to compute other things too, just by swapping out the operator.
#Compute the first 8 factorials
prefix_serial!([1:8;], *)'
1×8 RowVector{Int64,Array{Int64,1}}:
 1  2  6  24  120  720  5040  40320
However, it turns out that because addition (+) is associative, we can regroup the order in which these sums are carried out. (This of course extends to other associative operations, such as multiplication.) Another ordering of 8 associative operations is provided by prefix8!:
function prefix8!(y, ⋅)
length(y)==8 || error("length 8 only")
for i in [2,4,6,8]; y[i] = y[i-1] ⋅ y[i]; end
for i in [ 4, 8]; y[i] = y[i-2] ⋅ y[i]; end
for i in [ 8]; y[i] = y[i-4] ⋅ y[i]; end
for i in [ 6 ]; y[i] = y[i-2] ⋅ y[i]; end
for i in [ 3,5,7 ]; y[i] = y[i-1] ⋅ y[i]; end
y
end
prefix8! (generic function with 1 method)
#Compute the first 8 triangular numbers
prefix8!([1:8;], +)'
1×8 RowVector{Int64,Array{Int64,1}}:
 1  3  6  10  15  21  28  36
#Compute the first 8 factorials
prefix8!([1:8;], *)'
1×8 RowVector{Int64,Array{Int64,1}}:
 1  2  6  24  120  720  5040  40320
The general case is a bit more convoluted to write out:
function prefix!(y, ⋅)
l=length(y)
k=ceil(Int, log2(l))
for j=1:k, i=2^j:2^j:min(l, 2^k) #"reduce"
y[i] = y[i-2^(j-1)] ⋅ y[i]
end
for j=(k-1):-1:1, i=3*2^(j-1):2^j:min(l, 2^k) #"broadcast"
y[i] = y[i-2^(j-1)] ⋅ y[i]
end
y
end
prefix! (generic function with 1 method)
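As a quick sanity check (my addition), prefix! should agree with the built-in cumsum for every length, not just powers of two:

#prefix! must reproduce cumsum for all lengths
for n=1:100
    @assert prefix!([1:n;], +) == cumsum(1:n)
end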
We can visualize the operations with a little bit of trickery. In Julia, arrays are simply types that expose the array protocol. In particular, they need to implement methods for the generic functions length, getindex and setindex!. The last two are used in indexing operations, since the statements
y = A[1]
A[3] = y
get desugared to
y = getindex(A, 1)
setindex!(A, y, 3)
respectively.
We can trace through the algorithm by introducing a dummy array type, AccessArray, which stores no useful information but records every access to getindex and setindex!.

Specifically:

- length(A::AccessArray) returns an integer that is stored internally in the A.length field.
- getindex(A::AccessArray, i) records read access to the index i in the A.read field and always returns nothing.
- setindex!(A::AccessArray, x, i) records write access to the index i: the A.history field is appended with a new tuple consisting of the current A.read field and the index i.

The way AccessArray works assumes an association between a single setindex! call and all the preceding getindex calls since the previous setindex! call, which is sufficient for the purposes of tracing through prefix calls.
import Base: getindex, setindex!, length, size
if !isdefined(:AccessArray) #guard so this cell can be re-run without redefining the type
type AccessArray <: AbstractArray{Void, 1}
    length :: Int
    read :: Vector
    history :: Vector{Any}
    AccessArray(length, read=[], history=[]) = new(length, read, history)
end
end
length(A::AccessArray) = A.length
size(A::AccessArray) = (A.length,)
function getindex(A::AccessArray, i)
push!(A.read, i)
nothing
end
function setindex!(A::AccessArray, x, i)
push!(A.history, (A.read, [i]))
A.read = []
end
setindex! (generic function with 202 methods)
We also need a dummy associative operator to pass to the prefix call.
import Base.+
+(a::Void, b::Void)=a
+ (generic function with 279 methods)
A=prefix8!(AccessArray(8),+)
MethodError: no method matching size(::AccessArray)
Closest candidates are:
  size(::AbstractArray{T,N}, ::Any) where {T, N} at abstractarray.jl:29
  size(::Any, ::Integer, ::Integer, ::Integer...) where N at abstractarray.jl:30
  size(::Char) at char.jl:13
  ...
Stacktrace:
 [1] #showarray#263(::Bool, ::Function, ::IOContext{Base.AbstractIOBuffer{Array{UInt8,1}}}, ::AccessArray, ::Bool) at ./show.jl:1690
 [2] limitstringmime(::MIME{Symbol("text/plain")}, ::AccessArray) at /Users/cqm540/.julia/v0.6/IJulia/src/inline.jl:25
 [3] display_dict(::AccessArray) at /Users/cqm540/.julia/v0.6/IJulia/src/execute_request.jl:27
 [4] execute_request(::ZMQ.Socket, ::IJulia.Msg) at /Users/cqm540/.julia/v0.6/IJulia/src/execute_request.jl:188
 [5] eventloop(::ZMQ.Socket) at /Users/cqm540/.julia/v0.6/IJulia/src/eventloop.jl:8
 [6] (::IJulia.##11#14)() at ./task.jl:335
(The MethodError is raised while IJulia tries to display the returned AccessArray; the prefix8! call itself succeeded, and the access trace is stored in A.history. Ending the cell with a semicolon would suppress the display.)
Now let's visualize this! Each entry in A.history is rendered by a gate object:
type gate
ins :: Vector
outs:: Vector
end
import Gadfly.render
function render(G::gate, x₁, y₁, y₀; rᵢ=0.1, rₒ=0.25)
ipoints = [(i, y₀+rᵢ) for i in G.ins]
opoints = [(i, y₀+0.5) for i in G.outs]
igates = [circle(i..., rᵢ) for i in ipoints]
ogates = [circle(i..., rₒ) for i in opoints]
lines = [line([i, j]) for i in ipoints, j in opoints]
compose(context(units=UnitBox(0.5,0,x₁,y₁+1)),
compose(context(), stroke(colorant"black"), fill(colorant"white"),
igates..., ogates...),
compose(context(), linewidth(0.3mm), stroke(colorant"black"),
lines...))
end
A=gate([1,2],[2])
render(A,2,0,0)
Now we render the whole algorithm. We have to scan through the trace twice; the first time merely calculates the maximum depth that needs to be drawn and the second time actually generates the objects.
function render(A::AccessArray)
#Scan to find maximum depth
olast = depth = 0
for y in A.history
(any(y[1] .≤ olast)) && (depth += 1)
olast = maximum(y[2])
end
maxdepth = depth
olast = depth = 0
C = []
for y in A.history
(any(y[1] .≤ olast)) && (depth += 1)
push!(C, render(gate(y...), A.length, maxdepth, depth))
olast = maximum(y[2])
end
push!(C, compose(context(units=UnitBox(0.5,0,A.length,1)),
[line([(i,0), (i,1)]) for i=1:A.length]...,
linewidth(0.1mm), stroke(colorant"grey")))
compose(context(), C...)
end
render (generic function with 40 methods)
render(prefix!(AccessArray(20), +))
Now we can see that prefix! rearranges the operations to form two spanning trees:
render(prefix!(AccessArray(120),+))
Here we redefine prefix! with @inbounds annotations to elide bounds checking, ahead of the timing runs below:
function prefix!(y, +)
l=length(y)
k=ceil(Int, log2(l))
@inbounds for j=1:k, i=2^j:2^j:min(l, 2^k) #"reduce"
y[i] = y[i-2^(j-1)] + y[i]
end
@inbounds for j=(k-1):-1:1, i=3*2^(j-1):2^j:min(l, 2^k) #"broadcast"
y[i] = y[i-2^(j-1)] + y[i]
end
y
end
@show prefix!(AccessArray(8),+)
render(prefix!(AccessArray(9),+))
MethodError: no method matching size(::AccessArray)
Closest candidates are:
  size(::AbstractArray{T,N}, ::Any) where {T, N} at abstractarray.jl:29
  size(::Any, ::Integer, ::Integer, ::Integer...) where N at abstractarray.jl:30
  size(::Char) at char.jl:13
  ...
Stacktrace:
 [1] array_eltype_show_how(::AccessArray) at ./show.jl:1748
 [2] show_vector(::Base.AbstractIOBuffer{Array{UInt8,1}}, ::AccessArray, ::String, ::String) at ./show.jl:1752
 [3] #showarray#263(::Bool, ::Function, ::Base.AbstractIOBuffer{Array{UInt8,1}}, ::AccessArray, ::Bool) at ./show.jl:1681
 [4] showall(::Base.AbstractIOBuffer{Array{UInt8,1}}, ::AccessArray) at ./show.jl:1724
 [5] repr(::AccessArray) at ./strings/io.jl:146
(As before, the MethodError arises from trying to print the AccessArray, here via @show; the prefix computation itself completed.)
#Look at the compose tree
#render(prefix8!(AccessArray(8),+)) |> introspect
This contrasts with the serial code:
render(prefix_serial!(AccessArray(8),+))
@manipulate for npp=1:180
render(prefix!(AccessArray(npp),+))
end
@manipulate for npp=1:180
render(prefix_serial!(AccessArray(npp),+))
end
draw(PDF("serial.pdf", 3.5inch, 3.5inch), render(prefix_serial!(AccessArray(8),+)))
draw(PDF("tree.pdf", 3.5inch, 3.5inch), render(prefix8!(AccessArray(8),+)))
Now let's run prefix in parallel on every core on the CPU. Use addprocs to attach additional worker processes.
import Base: fetch, length, +, *
fetch(t::Vector) = map(fetch, t) #Vectorize fetch
#Define elementary operations on remote data
length(r1::Future)=length(fetch(r1))
+(r1::Future, r2::Future)=@spawnat r2.where fetch(r1)+fetch(r2)
*(r1::Future, r2::Future)=@spawnat r2.where fetch(r1)*fetch(r2)
* (generic function with 284 methods)
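A minimal sanity check of these lifted operators (my example; it only assumes the master process is available):

#Adding two Futures yields a Future for the sum, computed where r2 lives
r1 = @spawnat 1 2.0
r2 = @spawnat 1 3.0
fetch(r1 + r2) #5.0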
For 8 processes, the serial version requires 7 operations. The parallel version uses 11 operations (4 + 2 + 1 + 1 + 3, one batch per loop in prefix8!), but they are grouped into 5 simultaneous rounds. Hence we expect the parallel version to run in 5/7 of the time needed for the naïve serial code (on a machine with 8 processors).
println("This computer has $(nprocs()) processors")
This computer has 8 processors
max_procs = nprocs() #Number of processes to benchmark over
#Before timing, force Julia to JIT compile the functions
[f([randn(1,1) for i=1:2], *) for f in (prefix_serial!, prefix!)]
addprocs(max(0, Sys.CPU_CORES-max_procs))
Base.zero(::Type{Tuple{Float64,Float64}}) = (0.0, 0.0) #needed so zeros() below can allocate the timings array
timings=zeros(Tuple{Float64,Float64}, max_procs)
@showprogress for np in 1:max_procs
gc_enable(false) #Disable garbage collector for timing purposes
n=2048 #matrix size; other values tried: 128, 512, 3072, 4096
#Create Futures on np processes
r=[@spawnat p randn(n,n) for p in procs()[1:np]]
s=fetch(r) #These are now the same matrices in memory on the master process
#Timings use @belapsed from BenchmarkTools.jl; note that BenchmarkTools was
#designed for fast serial kernels, so the parallel timings are imperfect (see caveats below)
t_ser = @belapsed prefix_serial!($s, *)
t_par = @belapsed @sync prefix!($r, *)
timings[np] = (t_ser, t_par)
gc_enable(true)
end
timings
Progress: 100%|█████████████████████████████████████████| Time: 0:03:43
8-element Array{Tuple{Float64,Float64},1}:
 (6.335e-9, 7.55627e-7)
 (0.136874, 0.447522)
 (0.312059, 0.98023)
 (0.47621, 1.35074)
 (0.611405, 1.77155)
 (0.785864, 2.03217)
 (1.05001, 2.52966)
 (1.07574, 2.73774)
#timings = #Data from julia.mit.edu, a large shared memory machine with 80 cores
#[
# ( 15.585, 15.721), ( 16.325, 15.893), ( 32.079, 31.678), ( 46.037, 33.257),
# ( 61.947, 51.412), ( 77.247, 50.769), ( 92.588, 67.796), (108.057, 65.185),
# (123.556, 68.197), (138.408, 66.658), (153.392, 82.580), (168.874, 83.373),
# (184.227, 82.875), (199.741, 83.098), (215.881, 101.271), (230.531, 104.860),
# (246.576, 103.229), (261.412, 102.497), (276.926, 103.367), (294.404, 104.162),
# (308.995, 104.944), (323.287, 104.838), (340.173, 121.495), (353.717, 119.729),
# (369.113, 120.281), (384.638, 118.710), (402.484, 119.237), (417.036, 119.980),
# (431.273, 120.463), (446.288, 120.560), (461.756, 137.209), (476.653, 135.873),
# (493.989, 138.150), (508.089, 137.998), (523.703, 139.059), (539.305, 135.201),
# (554.201, 138.242), (569.276, 137.470), (585.268, 137.261), (599.921, 137.871),
# (615.718, 137.887), (632.463, 137.579), (647.355, 138.181), (664.344, 139.917),
# (678.899, 138.382), (695.815, 138.354), (717.068, 154.826), (730.902, 155.603),
# (745.608, 154.549), (756.012, 154.244), (769.611, 154.000), (793.350, 152.893),
# (801.568, 155.276), (818.174, 153.436), (832.994, 155.923), (849.526, 155.721),
# (863.337, 155.289), (881.577, 156.188), (893.735, 157.889), (911.597, 155.557),
# (925.117, 157.048), (940.897, 153.338), (954.825, 173.942), (990.757, 173.236),
# (987.457, 172.005), (1009.688, 173.869), (1022.357, 173.303), (1043.536, 175.044),
# (1052.156, 173.361), (1067.174, 172.230), (1079.543, 173.491), (1101.516, 174.311),
# (1119.993, 174.057), (1136.092, 175.804), (1154.758, 175.075), (1170.732, 173.453),
# (1184.598, 171.936), (1190.922, 172.809), (1204.645, 172.878),
#]
speedup(p::Integer) = p==1 ? 1.0 : ((p-1)/(floor(Int, log2(p)) + 1 + floor(Int, log2(p/3))))
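To connect this with the 5/7 claim above (my check): for p = 8, the serial kernel performs p − 1 = 7 operations, while the parallel kernel needs ⌊log₂ 8⌋ + 1 + ⌊log₂(8/3)⌋ = 3 + 1 + 1 = 5 rounds:

@assert speedup(8) == 7/5 #1.4× theoretical speedup on 8 processors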
#for (np, (t_ser, t_par)) in sort!([x for x in timings])
# @printf("np: %3d Serial: %.3f sec Parallel: %.3f sec speedup: %.3fx (theory=%.3fx)\n",
# np, t_ser, t_par, t_ser/t_par, speedup(np))
#end
xend = 1+max_procs
theplot=plot(
#Theoretical speedup curve
layer(x=1:xend, y=map(speedup, 1:xend), Geom.line(), Theme(default_color=colorant"darkkhaki")),
#Empirical data
layer(x=1:max_procs,
y=[t_ser/t_par for (t_ser, t_par) in timings],
Geom.point,
Theme(default_color=colorant"firebrick",
point_size=1.25pt, highlight_width=0pt)),
Guide.xlabel("Number of processors, p"), Guide.ylabel("Speedup, r")
)
If you run this notebook on your local machine, you will probably notice that a lot of things can go wrong in benchmarking:

- BenchmarkTools.jl was specifically written to handle the timing of fast serial kernels; developing proper statistics for parallel kernels remains an unsolved problem.
- It is easy to drop the @sync barrier, and then the parallel code will magically look faster because execution flow goes back to the master node immediately after the Future is created.

# This plot is specifically for the timing data in the paper
#
# labeledplot = compose(render(theplot),
# compose(context(), stroke(colorant"purple"), #Powers of 2
# text(0.21, 0.73, "2"), text(0.24, 0.72, "4"), text(0.27, 0.7, "8"), text(0.33, 0.63, "16"),
# text(0.47, 0.53, "32"), text(0.76, 0.35, "64")),
# compose(context(), stroke(colorant"green"), #3x powers of 2
# text(0.25, 0.62, "6"), text(0.28, 0.57, "12"), text(0.39, 0.49, "24"), text(0.63, 0.32, "48"),
# )
# )
# draw(SVG(3.5inch, 3.2inch), labeledplot)
#draw(SVG("scaling.svg", 3.5inch, 3.2inch), labeledplot)
#draw(PDF("scaling.pdf", 3.5inch, 3inch), theplot)
The exact same serial code now runs in parallel thanks to multiple dispatch!
Julia makes it very easy to implement the interval-of-summations monoid technique for verifying the correctness of prefix sums (doi:10.1145/2535838.2535882).
#Definition 4.3 of Chong et al., 2014
abstract type ⊤ end
abstract type Id end
const 𝕊 = Union{UnitRange, Type{⊤}, Type{Id}};
⊕(I::UnitRange, J::UnitRange) = I.stop+1==J.start ? (I.start:J.stop) : ⊤
⊕(::Type{Id}, ::Type{Id}) = Id
⊕(I::𝕊, ::Type{Id}) = I
⊕(::Type{Id}, I::𝕊) = I
⊕(I::𝕊, J::𝕊) = ⊤
⊕ (generic function with 5 methods)
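A quick illustration of the monoid (my examples): adjacent intervals fuse into one, while any gap collapses to the error value ⊤.

(1:3) ⊕ (4:7) #1:7
(1:3) ⊕ (5:7) #⊤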
for kernel in (prefix_serial!, prefix!), n=1:1000
@assert kernel([k:k for k=1:n], ⊕) == [1:k for k=1:n]
end
Wrong kernels produce type errors!
function prefix_wrong!(y, .+)
l = length(y)
offset = 1
z = copy(y)
while offset < l
for i=offset+1:l
z[i] = y[i-offset] .+ z[i]
end
offset += 2
y = copy(z)
end
y
end
@assert prefix_wrong!([k:k for k=1:8], ⊕) #Should produce conversion error
MethodError: Cannot `convert` an object of type Array{Int64,1} to an object of type UnitRange{Int64}
This may have arisen from a call to the constructor UnitRange{Int64}(...), since type constructors fall back to convert methods.
Stacktrace:
 [1] prefix_wrong!(::Array{UnitRange{Int64},1}, ::Function) at ./In[44]:7
@manipulate for npp=1:10
render(prefix_wrong!(AccessArray(npp),+))
end
In this section I play with other parallel prefix algorithms. All the ones here suffer from read antidependencies, which break the simple read/write association that AccessArray uses to draw the gates.
#Kogge-Stone variant
#Works but doesn't render properly
function prefix_ks!(y, .+)
l = length(y)
offset = 1
z = copy(y)
while offset < l
for i=offset+1:l
z[i] = y[i-offset] .+ z[i]
end
offset *= 2
y = copy(z)
end
y
end
@assert prefix_ks!(ones(8), +) == [1:8;]
render(prefix_ks!(AccessArray(8),+))
#Hillis-Steele = Kogge-Stone
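#Note (my annotation): as written, this in-place variant is buggy, since Hillis-Steele
#requires double buffering; the output below is [1.0, 2.0, 3.0, 6.0] rather than
#the correct prefix sums [1.0, 2.0, 3.0, 4.0]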
function prefix_hs!(y, +)
l = length(y)
lk = floor(Int, log2(l))
for d=1:lk
for k=2d:l
y[k] = y[k-2^(d-1)] + y[k]
end
end
y
end
prefix_hs!(ones(4), +)
4-element Array{Float64,1}:
 1.0
 2.0
 3.0
 6.0
render(prefix_hs!(AccessArray(4), +))
#Sklansky
function prefix_s!(y, +)
n = length(y)
for d=1:log2(n) #bug: log2(n) is a Float64, so d and the indices below are Floats, triggering the ArgumentError shown below; 1:Int(log2(n)) would fix it
for k=1:n÷2+1
block = 2 * (k - (mod(k, 2^d)))
me = block + mod(k, 2^d) + 2^d
spine = block + 2^d - 1
if spine ≤ n && me ≤ n
y[me] = y[me] + y[spine]
end
end
end
y
end
prefix_s!(ones(4), +)
ArgumentError: invalid index: 3.0
Stacktrace:
 [1] prefix_s!(::Array{Float64,1}, ::Base.#+) at ./In[61]:10
function prefix(v,⊕)
@show v
k=length(v)
k≤1 && return v
v[2:2:k] = @show v[1:2:k-1]⊕v[2:2:k] # Line 1
v[2:2:k] = @show prefix(v[2:2:k],⊕) # Line 2
v[3:2:k] = @show v[2:2:k-1]⊕v[3:2:k] # Line 3
v
end
prefix (generic function with 1 method)
prefix(ones(8),+)
v = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
v[1:2:k - 1] ⊕ v[2:2:k] = [2.0, 2.0, 2.0, 2.0]
v = [2.0, 2.0, 2.0, 2.0]
v[1:2:k - 1] ⊕ v[2:2:k] = [4.0, 4.0]
v = [4.0, 4.0]
v[1:2:k - 1] ⊕ v[2:2:k] = [8.0]
v = [8.0]
prefix(v[2:2:k], ⊕) = [8.0]
v[2:2:k - 1] ⊕ v[3:2:k] = Float64[]
prefix(v[2:2:k], ⊕) = [4.0, 8.0]
v[2:2:k - 1] ⊕ v[3:2:k] = [6.0]
prefix(v[2:2:k], ⊕) = [2.0, 4.0, 6.0, 8.0]
v[2:2:k - 1] ⊕ v[3:2:k] = [3.0, 5.0, 7.0]
8-element Array{Float64,1}:
 1.0
 2.0
 3.0
 4.0
 5.0
 6.0
 7.0
 8.0
function ⊕{T<:Range}(v1::Vector{T}, v2::Vector{T})
for v in (v1, v2), i in 1:length(v)-1
@assert v[i].stop + 1 == v[i+1].start
end
if length(v1)>0 && length(v2)>0
@assert v1[end].stop + 1 == v2[1].start
v1[1].start:v2[end].stop
elseif length(v1)>0 #&& length(v2)==0
v1[1].start:v1[end].stop
elseif length(v2)>0 #&& length(v1)==0
v2[1].start:v2[end].stop
else #Both are empty
v1
end
end
⊕ (generic function with 6 methods)
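A quick check of the vectorized monoid (my example): two blocks of adjacent intervals fuse into a single covering interval.

@assert ([1:2, 3:4] ⊕ [5:6]) == 1:6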
for kernel in (prefix,), n=1:100 #also try prefix_serial!, prefix!, prefix_ks!
@assert kernel([k:k for k=1:n], ⊕) == [1:k for k=1:n]
end
v = UnitRange{Int64}[1:1]
v = UnitRange{Int64}[1:1, 2:2]
v[1:2:k - 1] ⊕ v[2:2:k] = 1:2
DimensionMismatch("tried to assign 2 elements to 1 destinations")
Stacktrace:
 [1] throw_setindex_mismatch(::UnitRange{Int64}, ::Tuple{Int64}) at ./indices.jl:92
 [2] setindex_shape_check(::UnitRange{Int64}, ::Int64) at ./indices.jl:143
 [3] setindex!(::Array{UnitRange{Int64},1}, ::UnitRange{Int64}, ::StepRange{Int64,Int64}) at ./array.jl:563
 [4] prefix(::Array{UnitRange{Int64},1}, ::#⊕) at ./In[62]:5
 [5] macro expansion at ./In[65]:2 [inlined]
 [6] anonymous at ./<missing>:?