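This time we use the Bank Marketing Dataset to try out a few Julia packages for data handling, plotting, and machine learning.
First, obtain the dataset and make bank.csv and bank-full.csv available in JuliaBox.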
using DataFrames
table = readtable("bank.csv", separator=';')
Row | age | job | marital | education | default | balance | housing | loan | contact | day | month | duration | campaign | pdays | previous | poutcome | y |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 30 | unemployed | married | primary | no | 1787 | no | no | cellular | 19 | oct | 79 | 1 | -1 | 0 | unknown | no |
2 | 33 | services | married | secondary | no | 4789 | yes | yes | cellular | 11 | may | 220 | 1 | 339 | 4 | failure | no |
3 | 35 | management | single | tertiary | no | 1350 | yes | no | cellular | 16 | apr | 185 | 1 | 330 | 1 | failure | no |
4 | 30 | management | married | tertiary | no | 1476 | yes | yes | unknown | 3 | jun | 199 | 4 | -1 | 0 | unknown | no |
5 | 59 | blue-collar | married | secondary | no | 0 | yes | no | unknown | 5 | may | 226 | 1 | -1 | 0 | unknown | no |
6 | 35 | management | single | tertiary | no | 747 | no | no | cellular | 23 | feb | 141 | 2 | 176 | 3 | failure | no |
7 | 36 | self-employed | married | tertiary | no | 307 | yes | no | cellular | 14 | may | 341 | 1 | 330 | 2 | other | no |
8 | 39 | technician | married | secondary | no | 147 | yes | no | cellular | 6 | may | 151 | 2 | -1 | 0 | unknown | no |
9 | 41 | entrepreneur | married | tertiary | no | 221 | yes | no | unknown | 14 | may | 57 | 2 | -1 | 0 | unknown | no |
10 | 43 | services | married | primary | no | -88 | yes | yes | cellular | 17 | apr | 313 | 1 | 147 | 2 | failure | no |
11 | 39 | services | married | secondary | no | 9374 | yes | no | unknown | 20 | may | 273 | 1 | -1 | 0 | unknown | no |
12 | 43 | admin. | married | secondary | no | 264 | yes | no | cellular | 17 | apr | 113 | 2 | -1 | 0 | unknown | no |
13 | 36 | technician | married | tertiary | no | 1109 | no | no | cellular | 13 | aug | 328 | 2 | -1 | 0 | unknown | no |
14 | 20 | student | single | secondary | no | 502 | no | no | cellular | 30 | apr | 261 | 1 | -1 | 0 | unknown | yes |
15 | 31 | blue-collar | married | secondary | no | 360 | yes | yes | cellular | 29 | jan | 89 | 1 | 241 | 1 | failure | no |
16 | 40 | management | married | tertiary | no | 194 | no | yes | cellular | 29 | aug | 189 | 2 | -1 | 0 | unknown | no |
17 | 56 | technician | married | secondary | no | 4073 | no | no | cellular | 27 | aug | 239 | 5 | -1 | 0 | unknown | no |
18 | 37 | admin. | single | tertiary | no | 2317 | yes | no | cellular | 20 | apr | 114 | 1 | 152 | 2 | failure | no |
19 | 25 | blue-collar | single | primary | no | -221 | yes | no | unknown | 23 | may | 250 | 1 | -1 | 0 | unknown | no |
20 | 31 | services | married | secondary | no | 132 | no | no | cellular | 7 | jul | 148 | 1 | 152 | 1 | other | no |
21 | 38 | management | divorced | unknown | no | 0 | yes | no | cellular | 18 | nov | 96 | 2 | -1 | 0 | unknown | no |
22 | 42 | management | divorced | tertiary | no | 16 | no | no | cellular | 19 | nov | 140 | 3 | -1 | 0 | unknown | no |
23 | 44 | services | single | secondary | no | 106 | no | no | unknown | 12 | jun | 109 | 2 | -1 | 0 | unknown | no |
24 | 44 | entrepreneur | married | secondary | no | 93 | no | no | cellular | 7 | jul | 125 | 2 | -1 | 0 | unknown | no |
25 | 26 | housemaid | married | tertiary | no | 543 | no | no | cellular | 30 | jan | 169 | 3 | -1 | 0 | unknown | no |
26 | 41 | management | married | tertiary | no | 5883 | no | no | cellular | 20 | nov | 182 | 2 | -1 | 0 | unknown | no |
27 | 55 | blue-collar | married | primary | no | 627 | yes | no | unknown | 5 | may | 247 | 1 | -1 | 0 | unknown | no |
28 | 67 | retired | married | unknown | no | 696 | no | no | telephone | 17 | aug | 119 | 1 | 105 | 2 | failure | no |
29 | 56 | self-employed | married | secondary | no | 784 | no | yes | cellular | 30 | jul | 149 | 2 | -1 | 0 | unknown | no |
30 | 53 | admin. | married | secondary | no | 105 | no | yes | cellular | 21 | aug | 74 | 2 | -1 | 0 | unknown | no |
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
To make DataFrames more convenient to work with, let's use DataFramesMeta.
It lets you write queries in a more Pandas-like (or SQL-like) style.
Pkg.add("DataFramesMeta")
INFO: Nothing to be done INFO: METADATA is out-of-date — you may not have the latest version of DataFramesMeta INFO: Use `Pkg.update()` to get the latest versions of your packages
using DataFramesMeta
x_thread = @linq table |>
    where(:age .> 60) |>
    where(:housing .== "yes") |>
    orderby(:job)
Row | age | job | marital | education | default | balance | housing | loan | contact | day | month | duration | campaign | pdays | previous | poutcome | y |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 61 | admin. | married | unknown | no | 4629 | yes | no | cellular | 27 | jan | 181 | 1 | 92 | 1 | success | yes |
2 | 61 | blue-collar | married | primary | no | 625 | yes | no | unknown | 19 | may | 158 | 2 | -1 | 0 | unknown | no |
3 | 65 | housemaid | married | primary | no | 2179 | yes | no | cellular | 4 | sep | 112 | 7 | -1 | 0 | unknown | no |
4 | 66 | management | married | tertiary | no | 1048 | yes | no | cellular | 23 | jun | 971 | 2 | -1 | 0 | unknown | no |
5 | 63 | retired | married | secondary | no | 415 | yes | no | cellular | 7 | oct | 323 | 1 | -1 | 0 | unknown | no |
6 | 75 | retired | divorced | tertiary | no | 3810 | yes | no | cellular | 16 | nov | 262 | 1 | 183 | 1 | failure | yes |
7 | 63 | retired | married | tertiary | no | 133 | yes | no | cellular | 13 | feb | 104 | 2 | -1 | 0 | unknown | no |
8 | 71 | retired | married | tertiary | no | 14220 | yes | no | cellular | 9 | sep | 397 | 1 | -1 | 0 | unknown | yes |
9 | 61 | retired | married | primary | no | 1060 | yes | no | unknown | 13 | may | 118 | 1 | -1 | 0 | unknown | no |
10 | 68 | retired | married | secondary | no | 19317 | yes | no | cellular | 4 | aug | 249 | 1 | -1 | 0 | unknown | yes |
11 | 61 | retired | married | secondary | no | 76 | yes | no | cellular | 15 | jul | 195 | 7 | -1 | 0 | unknown | no |
12 | 62 | self-employed | divorced | tertiary | no | 6 | yes | no | cellular | 13 | oct | 216 | 1 | 183 | 4 | success | yes |
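To make the query above less magical, here is a minimal sketch of the same filter-then-sort idea in plain Julia, without DataFramesMeta. It is written for current Julia (1.x) rather than the 0.3-era environment used in this post, and the `rows` data is a hypothetical stand-in for a handful of table rows, not the real file.

```julia
# Hypothetical mini-table: a vector of named tuples standing in for a few rows.
rows = [
    (age = 61, job = "retired",    housing = "yes"),
    (age = 30, job = "unemployed", housing = "no"),
    (age = 66, job = "management", housing = "yes"),
    (age = 67, job = "retired",    housing = "no"),
    (age = 61, job = "admin.",     housing = "yes"),
]

# Like where(:age .> 60) |> where(:housing .== "yes"): keep rows passing both tests.
selected = filter(r -> r.age > 60 && r.housing == "yes", rows)

# Like orderby(:job): sort the remaining rows by the job column.
ordered = sort(selected, by = r -> r.job)

# Like select(:age): keep only one column.
ages = [r.age for r in ordered]
```

Chaining two `where` calls behaves like a logical AND of the two conditions, which is why a single `filter` with `&&` plays the same role here.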
age_balance = @linq table |>
    where(:age .> 60) |>
    where(:housing .== "yes") |>
    select(:age, :balance)
Row | age | balance |
---|---|---|
1 | 61 | 4629 |
2 | 63 | 415 |
3 | 75 | 3810 |
4 | 66 | 1048 |
5 | 61 | 625 |
6 | 63 | 133 |
7 | 71 | 14220 |
8 | 61 | 1060 |
9 | 65 | 2179 |
10 | 68 | 19317 |
11 | 61 | 76 |
12 | 62 | 6 |
using Gadfly
plot(age_balance, x=:age, y=:balance, Geom.point)
Next, let's try some machine learning with SVM.jl (which is actively looking for maintainers).
bank = readtable("bank-full.csv", separator = ';')
Row | age | job | marital | education | default | balance | housing | loan | contact | day | month | duration | campaign | pdays | previous | poutcome | y |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 58 | management | married | tertiary | no | 2143 | yes | no | unknown | 5 | may | 261 | 1 | -1 | 0 | unknown | no |
2 | 44 | technician | single | secondary | no | 29 | yes | no | unknown | 5 | may | 151 | 1 | -1 | 0 | unknown | no |
3 | 33 | entrepreneur | married | secondary | no | 2 | yes | yes | unknown | 5 | may | 76 | 1 | -1 | 0 | unknown | no |
4 | 47 | blue-collar | married | unknown | no | 1506 | yes | no | unknown | 5 | may | 92 | 1 | -1 | 0 | unknown | no |
5 | 33 | unknown | single | unknown | no | 1 | no | no | unknown | 5 | may | 198 | 1 | -1 | 0 | unknown | no |
6 | 35 | management | married | tertiary | no | 231 | yes | no | unknown | 5 | may | 139 | 1 | -1 | 0 | unknown | no |
7 | 28 | management | single | tertiary | no | 447 | yes | yes | unknown | 5 | may | 217 | 1 | -1 | 0 | unknown | no |
8 | 42 | entrepreneur | divorced | tertiary | yes | 2 | yes | no | unknown | 5 | may | 380 | 1 | -1 | 0 | unknown | no |
9 | 58 | retired | married | primary | no | 121 | yes | no | unknown | 5 | may | 50 | 1 | -1 | 0 | unknown | no |
10 | 43 | technician | single | secondary | no | 593 | yes | no | unknown | 5 | may | 55 | 1 | -1 | 0 | unknown | no |
11 | 41 | admin. | divorced | secondary | no | 270 | yes | no | unknown | 5 | may | 222 | 1 | -1 | 0 | unknown | no |
12 | 29 | admin. | single | secondary | no | 390 | yes | no | unknown | 5 | may | 137 | 1 | -1 | 0 | unknown | no |
13 | 53 | technician | married | secondary | no | 6 | yes | no | unknown | 5 | may | 517 | 1 | -1 | 0 | unknown | no |
14 | 58 | technician | married | unknown | no | 71 | yes | no | unknown | 5 | may | 71 | 1 | -1 | 0 | unknown | no |
15 | 57 | services | married | secondary | no | 162 | yes | no | unknown | 5 | may | 174 | 1 | -1 | 0 | unknown | no |
16 | 51 | retired | married | primary | no | 229 | yes | no | unknown | 5 | may | 353 | 1 | -1 | 0 | unknown | no |
17 | 45 | admin. | single | unknown | no | 13 | yes | no | unknown | 5 | may | 98 | 1 | -1 | 0 | unknown | no |
18 | 57 | blue-collar | married | primary | no | 52 | yes | no | unknown | 5 | may | 38 | 1 | -1 | 0 | unknown | no |
19 | 60 | retired | married | primary | no | 60 | yes | no | unknown | 5 | may | 219 | 1 | -1 | 0 | unknown | no |
20 | 33 | services | married | secondary | no | 0 | yes | no | unknown | 5 | may | 54 | 1 | -1 | 0 | unknown | no |
21 | 28 | blue-collar | married | secondary | no | 723 | yes | yes | unknown | 5 | may | 262 | 1 | -1 | 0 | unknown | no |
22 | 56 | management | married | tertiary | no | 779 | yes | no | unknown | 5 | may | 164 | 1 | -1 | 0 | unknown | no |
23 | 32 | blue-collar | single | primary | no | 23 | yes | yes | unknown | 5 | may | 160 | 1 | -1 | 0 | unknown | no |
24 | 25 | services | married | secondary | no | 50 | yes | no | unknown | 5 | may | 342 | 1 | -1 | 0 | unknown | no |
25 | 40 | retired | married | primary | no | 0 | yes | yes | unknown | 5 | may | 181 | 1 | -1 | 0 | unknown | no |
26 | 44 | admin. | married | secondary | no | -372 | yes | no | unknown | 5 | may | 172 | 1 | -1 | 0 | unknown | no |
27 | 39 | management | single | tertiary | no | 255 | yes | no | unknown | 5 | may | 296 | 1 | -1 | 0 | unknown | no |
28 | 52 | entrepreneur | married | secondary | no | 113 | yes | yes | unknown | 5 | may | 127 | 1 | -1 | 0 | unknown | no |
29 | 46 | management | single | secondary | no | -246 | yes | no | unknown | 5 | may | 255 | 2 | -1 | 0 | unknown | no |
30 | 36 | technician | single | secondary | no | 265 | yes | yes | unknown | 5 | may | 348 | 1 | -1 | 0 | unknown | no |
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
bank[:y] = [y == "yes" ? 1.0 : -1.0 for y in bank[:y]]
45211-element Array{Float64,1}: -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 ⋮ -1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 -1.0 -1.0
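The cell above recodes the string target as +1.0 / -1.0, the label convention commonly used for binary SVMs. A small sketch of the same mapping on a hypothetical label vector follows (current Julia 1.x syntax; the `labels` values are made up for illustration). It also counts the share of positive labels, which is worth knowing before judging a classifier.

```julia
# Hypothetical label column with the same "yes"/"no" strings as bank[:y].
labels = ["no", "yes", "no", "no", "yes"]

# Map "yes" to +1.0 and everything else to -1.0.
y = [label == "yes" ? 1.0 : -1.0 for label in labels]

# Fraction of positive examples, useful for checking class balance.
positive_rate = count(==(1.0), y) / length(y)
```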
For the dummy-variable conversion, we borrow code posted on the julia-users mailing list:
https://groups.google.com/d/msg/julia-users/7-Vtpi8w4YI/KvMlKAZSwDkJ
categorical_keys = [:job, :marital, :education, :default, :housing, :loan, :contact, :month, :poutcome]
9-element Array{Symbol,1}: :job :marital :education :default :housing :loan :contact :month :poutcome
numerical_keys = setdiff(names(bank), [categorical_keys; :y])
7-element Array{Symbol,1}: :age :balance :day :duration :campaign :pdays :previous
bank_normalized = deepcopy(bank)
bank_normalized[numerical_keys]
Row | age | balance | day | duration | campaign | pdays | previous |
---|---|---|---|---|---|---|---|
1 | 58 | 2143 | 5 | 261 | 1 | -1 | 0 |
2 | 44 | 29 | 5 | 151 | 1 | -1 | 0 |
3 | 33 | 2 | 5 | 76 | 1 | -1 | 0 |
4 | 47 | 1506 | 5 | 92 | 1 | -1 | 0 |
5 | 33 | 1 | 5 | 198 | 1 | -1 | 0 |
6 | 35 | 231 | 5 | 139 | 1 | -1 | 0 |
7 | 28 | 447 | 5 | 217 | 1 | -1 | 0 |
8 | 42 | 2 | 5 | 380 | 1 | -1 | 0 |
9 | 58 | 121 | 5 | 50 | 1 | -1 | 0 |
10 | 43 | 593 | 5 | 55 | 1 | -1 | 0 |
11 | 41 | 270 | 5 | 222 | 1 | -1 | 0 |
12 | 29 | 390 | 5 | 137 | 1 | -1 | 0 |
13 | 53 | 6 | 5 | 517 | 1 | -1 | 0 |
14 | 58 | 71 | 5 | 71 | 1 | -1 | 0 |
15 | 57 | 162 | 5 | 174 | 1 | -1 | 0 |
16 | 51 | 229 | 5 | 353 | 1 | -1 | 0 |
17 | 45 | 13 | 5 | 98 | 1 | -1 | 0 |
18 | 57 | 52 | 5 | 38 | 1 | -1 | 0 |
19 | 60 | 60 | 5 | 219 | 1 | -1 | 0 |
20 | 33 | 0 | 5 | 54 | 1 | -1 | 0 |
21 | 28 | 723 | 5 | 262 | 1 | -1 | 0 |
22 | 56 | 779 | 5 | 164 | 1 | -1 | 0 |
23 | 32 | 23 | 5 | 160 | 1 | -1 | 0 |
24 | 25 | 50 | 5 | 342 | 1 | -1 | 0 |
25 | 40 | 0 | 5 | 181 | 1 | -1 | 0 |
26 | 44 | -372 | 5 | 172 | 1 | -1 | 0 |
27 | 39 | 255 | 5 | 296 | 1 | -1 | 0 |
28 | 52 | 113 | 5 | 127 | 1 | -1 | 0 |
29 | 46 | -246 | 5 | 255 | 2 | -1 | 0 |
30 | 36 | 265 | 5 | 348 | 1 | -1 | 0 |
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
for key in numerical_keys
    bank_normalized[key] = (bank[:, key] - mean(bank[key])) / std(bank[key])
end
bank_normalized[numerical_keys]
Row | age | balance | day | duration | campaign | pdays | previous |
---|---|---|---|---|---|---|---|
1 | 1.6069471864824068 | 0.25641641627596995 | -1.2984619713129057 | 0.011015976074886151 | -0.5693443410168078 | -0.4114485561028586 | -0.25193758438383734 |
2 | 0.28852607995632196 | -0.437889851794998 | -1.2984619713129057 | -0.41612235524730484 | -0.5693443410168078 | -0.4114485561028586 | -0.25193758438383734 |
3 | -0.7473762180284589 | -0.4467575288233406 | -1.2984619713129057 | -0.7073530356942532 | -0.5693443410168078 | -0.4114485561028586 | -0.25193758438383734 |
4 | 0.5710448884976258 | 0.04720492490359217 | -1.2984619713129057 | -0.6452238238655709 | -0.5693443410168078 | -0.4114485561028586 | -0.25193758438383734 |
5 | -0.7473762180284589 | -0.44708596130587175 | -1.2984619713129057 | -0.2336177955005505 | -0.5693443410168078 | -0.4114485561028586 | -0.25193758438383734 |
6 | -0.5590303456675897 | -0.3715464903236945 | -1.2984619713129057 | -0.4627192641188166 | -0.5693443410168078 | -0.4114485561028586 | -0.25193758438383734 |
7 | -1.218240898930632 | -0.3006050740969542 | -1.2984619713129057 | -0.15983935645399025 | -0.5693443410168078 | -0.4114485561028586 | -0.25193758438383734 |
8 | 0.10018020759545271 | -0.4467575288233406 | -1.2984619713129057 | 0.47310198905071094 | -0.5693443410168078 | -0.4114485561028586 | -0.25193758438383734 |
9 | 1.6069471864824068 | -0.40767406340212714 | -1.2984619713129057 | -0.808313004915862 | -0.5693443410168078 | -0.4114485561028586 | -0.25193758438383734 |
10 | 0.19435314377588733 | -0.25265393164739824 | -1.2984619713129057 | -0.7888976262193987 | -0.5693443410168078 | -0.4114485561028586 | -0.25193758438383734 |
11 | 0.006007271415018085 | -0.35873762350497757 | -1.2984619713129057 | -0.14042397775752702 | -0.5693443410168078 | -0.4114485561028586 | -0.25193758438383734 |
12 | -1.1240679627501975 | -0.3193257256012329 | -1.2984619713129057 | -0.47048541559740187 | -0.5693443410168078 | -0.4114485561028586 | -0.25193758438383734 |
13 | 1.1360825055802335 | -0.4454437988932157 | -1.2984619713129057 | 1.0050833653338034 | -0.5693443410168078 | -0.4114485561028586 | -0.25193758438383734 |
14 | 1.6069471864824068 | -0.42409568752868737 | -1.2984619713129057 | -0.7267684143907165 | -0.5693443410168078 | -0.4114485561028586 | -0.25193758438383734 |
15 | 1.512774250301972 | -0.39420833161834773 | -1.2984619713129057 | -0.326811613243574 | -0.5693443410168078 | -0.4114485561028586 | -0.25193758438383734 |
16 | 0.9477366332193643 | -0.372203355288757 | -1.2984619713129057 | 0.3682589440898095 | -0.5693443410168078 | -0.4114485561028586 | -0.25193758438383734 |
17 | 0.38269901613675655 | -0.4431447715154973 | -1.2984619713129057 | -0.621925369429815 | -0.5693443410168078 | -0.4114485561028586 | -0.25193758438383734 |
18 | 1.512774250301972 | -0.4303359046967803 | -1.2984619713129057 | -0.8549099137873737 | -0.5693443410168078 | -0.4114485561028586 | -0.25193758438383734 |
19 | 1.7952930588432758 | -0.42770844483653064 | -1.2984619713129057 | -0.15207320497540494 | -0.5693443410168078 | -0.4114485561028586 | -0.25193758438383734 |
20 | -0.7473762180284589 | -0.447414393788403 | -1.2984619713129057 | -0.7927807019586914 | -0.5693443410168078 | -0.4114485561028586 | -0.25193758438383734 |
21 | -1.218240898930632 | -0.20995770891834156 | -1.2984619713129057 | 0.014899051814178797 | -0.5693443410168078 | -0.4114485561028586 | -0.25193758438383734 |
22 | 1.4186013141215374 | -0.19156548989659405 | -1.2984619713129057 | -0.36564237063650046 | -0.5693443410168078 | -0.4114485561028586 | -0.25193758438383734 |
23 | -0.8415491542088935 | -0.43986044669018526 | -1.2984619713129057 | -0.381174673593671 | -0.5693443410168078 | -0.4114485561028586 | -0.25193758438383734 |
24 | -1.5007597074719359 | -0.4309927696618427 | -1.2984619713129057 | 0.3255451109575904 | -0.5693443410168078 | -0.4114485561028586 | -0.25193758438383734 |
25 | -0.08816566476541654 | -0.447414393788403 | -1.2984619713129057 | -0.29963008306852545 | -0.5693443410168078 | -0.4114485561028586 | -0.25193758438383734 |
26 | 0.28852607995632196 | -0.5695912772900114 | -1.2984619713129057 | -0.3345777647221593 | -0.5693443410168078 | -0.4114485561028586 | -0.25193758438383734 |
27 | -0.18233860094585116 | -0.36366411074294563 | -1.2984619713129057 | 0.14692362695012873 | -0.5693443410168078 | -0.4114485561028586 | -0.25193758438383734 |
28 | 1.0419095693997988 | -0.4103015232623768 | -1.2984619713129057 | -0.5093161729903283 | -0.5693443410168078 | -0.4114485561028586 | -0.25193758438383734 |
29 | 0.4768719523171912 | -0.5282087844910794 | -1.2984619713129057 | -0.012282478360869719 | -0.24655762082480956 | -0.4114485561028586 | -0.25193758438383734 |
30 | -0.46485740948715504 | -0.3603797859176336 | -1.2984619713129057 | 0.34884356539334627 | -0.5693443410168078 | -0.4114485561028586 | -0.25193758438383734 |
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
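The loop above standardizes each numerical column to zero mean and unit standard deviation (a z-score). As a rough standalone illustration in current Julia (1.x), where `mean` and `std` live in the `Statistics` standard library, the same formula can be applied to a short vector. The helper name `zscore` is made up for this sketch, and because only five values are used, the results differ from the table, which is computed over the full column.

```julia
using Statistics

# Standardize a vector to zero mean and unit (sample) standard deviation.
zscore(v) = (v .- mean(v)) ./ std(v)

# The first five balance values from bank-full.csv shown earlier.
balance = [2143.0, 29.0, 2.0, 1506.0, 1.0]
z = zscore(balance)
```

Putting all numerical features on a comparable scale matters for a linear SVM, since otherwise a wide-ranging column such as balance would dominate the inner products.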
function getdummy{R}(df::DataFrame, cname::Symbol, ::Type{R})
    darr = df[cname]
    # drop the first sorted level; it serves as the reference category
    vals = sort(levels(darr))[2:end]
    namedict = Dict(vals, 1:length(vals))
    arr = zeros(R, length(darr), length(namedict))
    for i=1:length(darr)
        if haskey(namedict, darr[i])
            arr[i, namedict[darr[i]]] = 1
        end
    end
    newdf = convert(DataFrame, arr)
    # name each dummy column "<column>_<level>"
    names!(newdf, [symbol("$(cname)_$k") for k in vals])
    return newdf
end

function convertdummy{R}(df::DataFrame, cnames::Array{Symbol}, ::Type{R})
    # consider every variable from cnames as categorical
    # and convert them into set of dummy variables,
    # return new dataframe
    newdf = DataFrame()
    for cname in names(df)
        if !in(cname, cnames)
            newdf[cname] = df[cname]
        else
            dummydf = getdummy(df, cname, R)
            for dummyname in names(dummydf)
                newdf[dummyname] = dummydf[dummyname]
            end
        end
    end
    return newdf
end

# default to Int32 for the 0/1 dummy columns
convertdummy(df::DataFrame, cnames::Array{Symbol}) = convertdummy(df, cnames, Int32)
convertdummy (generic function with 2 methods)
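The functions above turn each categorical column into 0/1 indicator columns and drop the first sorted level as the reference category. The following is a minimal sketch of that idea on a single string vector, written for current Julia (1.x) and without DataFrames. The function name `dummy_columns` is hypothetical and is not part of the borrowed code.

```julia
# Encode a categorical vector as 0/1 indicator columns, dropping the first
# sorted level so that it acts as the reference category.
function dummy_columns(values::Vector{String})
    levels = sort(unique(values))[2:end]
    index = Dict(level => j for (j, level) in enumerate(levels))
    encoded = zeros(Int, length(values), length(levels))
    for (i, value) in enumerate(values)
        j = get(index, value, 0)   # 0 means "reference level": row stays all zeros
        if j != 0
            encoded[i, j] = 1
        end
    end
    return levels, encoded
end

levels, encoded = dummy_columns(["married", "single", "divorced", "married"])
```

Here "divorced" sorts first and becomes the reference level, so it is represented by a row of zeros. Dropping one level keeps the indicator columns from being linearly dependent.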
bank_dummy = convertdummy(bank_normalized[:, 1:16], categorical_keys)
Row | age | job_blue-collar | job_entrepreneur | job_housemaid | job_management | job_retired | job_self-employed | job_services | job_student | job_technician | job_unemployed | job_unknown | marital_married | marital_single | education_secondary | education_tertiary | education_unknown | default_yes | balance | housing_yes | loan_yes | contact_telephone | contact_unknown | day | month_aug | month_dec | month_feb | month_jan | month_jul | month_jun | month_mar | month_may | month_nov | month_oct | month_sep | duration | campaign | pdays | previous | poutcome_other | poutcome_success | poutcome_unknown |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1.6069471864824068 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0.25641641627596995 | 1 | 0 | 0 | 1 | -1.2984619713129057 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0.011015976074886151 | -0.5693443410168078 | -0.4114485561028586 | -0.25193758438383734 | 0 | 0 | 1 |
2 | 0.28852607995632196 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | -0.437889851794998 | 1 | 0 | 0 | 1 | -1.2984619713129057 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | -0.41612235524730484 | -0.5693443410168078 | -0.4114485561028586 | -0.25193758438383734 | 0 | 0 | 1 |
3 | -0.7473762180284589 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | -0.4467575288233406 | 1 | 1 | 0 | 1 | -1.2984619713129057 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | -0.7073530356942532 | -0.5693443410168078 | -0.4114485561028586 | -0.25193758438383734 | 0 | 0 | 1 |
4 | 0.5710448884976258 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0.04720492490359217 | 1 | 0 | 0 | 1 | -1.2984619713129057 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | -0.6452238238655709 | -0.5693443410168078 | -0.4114485561028586 | -0.25193758438383734 | 0 | 0 | 1 |
5 | -0.7473762180284589 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | -0.44708596130587175 | 0 | 0 | 0 | 1 | -1.2984619713129057 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | -0.2336177955005505 | -0.5693443410168078 | -0.4114485561028586 | -0.25193758438383734 | 0 | 0 | 1 |
6 | -0.5590303456675897 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | -0.3715464903236945 | 1 | 0 | 0 | 1 | -1.2984619713129057 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | -0.4627192641188166 | -0.5693443410168078 | -0.4114485561028586 | -0.25193758438383734 | 0 | 0 | 1 |
7 | -1.218240898930632 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | -0.3006050740969542 | 1 | 1 | 0 | 1 | -1.2984619713129057 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | -0.15983935645399025 | -0.5693443410168078 | -0.4114485561028586 | -0.25193758438383734 | 0 | 0 | 1 |
8 | 0.10018020759545271 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | -0.4467575288233406 | 1 | 0 | 0 | 1 | -1.2984619713129057 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0.47310198905071094 | -0.5693443410168078 | -0.4114485561028586 | -0.25193758438383734 | 0 | 0 | 1 |
9 | 1.6069471864824068 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | -0.40767406340212714 | 1 | 0 | 0 | 1 | -1.2984619713129057 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | -0.808313004915862 | -0.5693443410168078 | -0.4114485561028586 | -0.25193758438383734 | 0 | 0 | 1 |
10 | 0.19435314377588733 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | -0.25265393164739824 | 1 | 0 | 0 | 1 | -1.2984619713129057 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | -0.7888976262193987 | -0.5693443410168078 | -0.4114485561028586 | -0.25193758438383734 | 0 | 0 | 1 |
11 | 0.006007271415018085 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | -0.35873762350497757 | 1 | 0 | 0 | 1 | -1.2984619713129057 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | -0.14042397775752702 | -0.5693443410168078 | -0.4114485561028586 | -0.25193758438383734 | 0 | 0 | 1 |
12 | -1.1240679627501975 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | -0.3193257256012329 | 1 | 0 | 0 | 1 | -1.2984619713129057 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | -0.47048541559740187 | -0.5693443410168078 | -0.4114485561028586 | -0.25193758438383734 | 0 | 0 | 1 |
13 | 1.1360825055802335 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | -0.4454437988932157 | 1 | 0 | 0 | 1 | -1.2984619713129057 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1.0050833653338034 | -0.5693443410168078 | -0.4114485561028586 | -0.25193758438383734 | 0 | 0 | 1 |
14 | 1.6069471864824068 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | -0.42409568752868737 | 1 | 0 | 0 | 1 | -1.2984619713129057 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | -0.7267684143907165 | -0.5693443410168078 | -0.4114485561028586 | -0.25193758438383734 | 0 | 0 | 1 |
15 | 1.512774250301972 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | -0.39420833161834773 | 1 | 0 | 0 | 1 | -1.2984619713129057 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | -0.326811613243574 | -0.5693443410168078 | -0.4114485561028586 | -0.25193758438383734 | 0 | 0 | 1 |
16 | 0.9477366332193643 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | -0.372203355288757 | 1 | 0 | 0 | 1 | -1.2984619713129057 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0.3682589440898095 | -0.5693443410168078 | -0.4114485561028586 | -0.25193758438383734 | 0 | 0 | 1 |
17 | 0.38269901613675655 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | -0.4431447715154973 | 1 | 0 | 0 | 1 | -1.2984619713129057 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | -0.621925369429815 | -0.5693443410168078 | -0.4114485561028586 | -0.25193758438383734 | 0 | 0 | 1 |
18 | 1.512774250301972 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | -0.4303359046967803 | 1 | 0 | 0 | 1 | -1.2984619713129057 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | -0.8549099137873737 | -0.5693443410168078 | -0.4114485561028586 | -0.25193758438383734 | 0 | 0 | 1 |
19 | 1.7952930588432758 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | -0.42770844483653064 | 1 | 0 | 0 | 1 | -1.2984619713129057 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | -0.15207320497540494 | -0.5693443410168078 | -0.4114485561028586 | -0.25193758438383734 | 0 | 0 | 1 |
20 | -0.7473762180284589 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | -0.447414393788403 | 1 | 0 | 0 | 1 | -1.2984619713129057 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | -0.7927807019586914 | -0.5693443410168078 | -0.4114485561028586 | -0.25193758438383734 | 0 | 0 | 1 |
21 | -1.218240898930632 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | -0.20995770891834156 | 1 | 1 | 0 | 1 | -1.2984619713129057 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0.014899051814178797 | -0.5693443410168078 | -0.4114485561028586 | -0.25193758438383734 | 0 | 0 | 1 |
22 | 1.4186013141215374 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | -0.19156548989659405 | 1 | 0 | 0 | 1 | -1.2984619713129057 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | -0.36564237063650046 | -0.5693443410168078 | -0.4114485561028586 | -0.25193758438383734 | 0 | 0 | 1 |
23 | -0.8415491542088935 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | -0.43986044669018526 | 1 | 1 | 0 | 1 | -1.2984619713129057 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | -0.381174673593671 | -0.5693443410168078 | -0.4114485561028586 | -0.25193758438383734 | 0 | 0 | 1 |
24 | -1.5007597074719359 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | -0.4309927696618427 | 1 | 0 | 0 | 1 | -1.2984619713129057 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0.3255451109575904 | -0.5693443410168078 | -0.4114485561028586 | -0.25193758438383734 | 0 | 0 | 1 |
25 | -0.08816566476541654 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | -0.447414393788403 | 1 | 1 | 0 | 1 | -1.2984619713129057 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | -0.29963008306852545 | -0.5693443410168078 | -0.4114485561028586 | -0.25193758438383734 | 0 | 0 | 1 |
26 | 0.28852607995632196 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | -0.5695912772900114 | 1 | 0 | 0 | 1 | -1.2984619713129057 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | -0.3345777647221593 | -0.5693443410168078 | -0.4114485561028586 | -0.25193758438383734 | 0 | 0 | 1 |
27 | -0.18233860094585116 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | -0.36366411074294563 | 1 | 0 | 0 | 1 | -1.2984619713129057 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0.14692362695012873 | -0.5693443410168078 | -0.4114485561028586 | -0.25193758438383734 | 0 | 0 | 1 |
28 | 1.0419095693997988 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | -0.4103015232623768 | 1 | 1 | 0 | 1 | -1.2984619713129057 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | -0.5093161729903283 | -0.5693443410168078 | -0.4114485561028586 | -0.25193758438383734 | 0 | 0 | 1 |
29 | 0.4768719523171912 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | -0.5282087844910794 | 1 | 0 | 0 | 1 | -1.2984619713129057 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | -0.012282478360869719 | -0.24655762082480956 | -0.4114485561028586 | -0.25193758438383734 | 0 | 0 | 1 |
30 | -0.46485740948715504 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | -0.3603797859176336 | 1 | 1 | 0 | 1 | -1.2984619713129057 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0.34884356539334627 | -0.5693443410168078 | -0.4114485561028586 | -0.25193758438383734 | 0 | 0 | 1 |
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
X = convert(Array, bank_dummy[:, 1:42])'
42x45211 Array{Real,2}: 1.60695 0.288526 -0.747376 … 2.92537 1.51277 -0.370684 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 … 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 … 0 0 0 0 0 0 0 0 0 1 0 1 1 1 1 ⋮ ⋱ ⋮ 0 0 0 … 0 0 0 1 1 1 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0.011016 -0.416122 -0.707353 … 3.37376 0.970136 0.399324 -0.569344 -0.569344 -0.569344 0.721803 0.399016 -0.246558 -0.411449 -0.411449 -0.411449 1.43617 -0.411449 1.47612 -0.251938 -0.251938 -0.251938 1.05046 -0.251938 4.52353 0 0 0 0 0 1 0 0 0 … 1 0 0 1 1 1 0 1 0
Y = convert(Array, bank_normalized[:y])
45211-element Array{Float64,1}: -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 -1.0 ⋮ -1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 -1.0 -1.0
attribute_num, sample_num = size(X)
(42,45211)
train_flags = randbool(sample_num)
45211-element BitArray{1}: false true false false true false true true false true true false false ⋮ false true true false false false false false true false true true
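`randbool(sample_num)` assigns each sample to the training set with probability 1/2, so the split is roughly, not exactly, half and half. In later Julia versions `randbool` was replaced by `rand(Bool, n)`. Here is a small sketch of the same mask-based split in current Julia (1.x), using a hypothetical 2 × 10 matrix and a fixed seed so the split is reproducible.

```julia
using Random

Random.seed!(1234)                              # fixed seed for a reproducible split
sample_num = 10                                 # hypothetical sample count
X = reshape(collect(1.0:20.0), 2, sample_num)   # 2 features × 10 samples (one column per sample)

train_flags = rand(Bool, sample_num)   # true means "use this sample for training"
X_train = X[:, train_flags]            # columns where the flag is true
X_test  = X[:, .!train_flags]          # the remaining columns
```

Because samples are stored as columns, the boolean mask indexes the second dimension.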
using SVM
model_svm = svm(X[:, train_flags], Y[train_flags])
Fitted linear SVM
 * Non-zero weights: 41
 * Iterations: 100
 * Converged: true
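For readers wondering what a linear SVM fit involves, the following is a generic sketch of sub-gradient descent on the regularized hinge loss, written for current Julia (1.x). It is not SVM.jl's implementation, and the names `train_linear_svm` and `predict_linear`, the toy data, and the default parameters are all assumptions made for illustration.

```julia
using LinearAlgebra

# Minimize lambda/2 * ||w||^2 + mean(max(0, 1 - y_i * dot(w, x_i))) by
# full-batch sub-gradient descent. X is features × samples; y is in {-1.0, +1.0}.
function train_linear_svm(X::AbstractMatrix, y::AbstractVector; lambda = 0.1, epochs = 100)
    w = zeros(size(X, 1))
    n = size(X, 2)
    for t in 1:epochs
        eta = 1 / (lambda * t)                  # decaying step size
        grad = lambda .* w                      # gradient of the regularizer
        for i in 1:n
            if y[i] * dot(w, X[:, i]) < 1       # margin violated: hinge term is active
                grad .-= (y[i] / n) .* X[:, i]
            end
        end
        w .-= eta .* grad
    end
    return w
end

# Predict +1.0 or -1.0 from the sign of the linear score.
predict_linear(w, X) = [dot(w, X[:, i]) >= 0 ? 1.0 : -1.0 for i in 1:size(X, 2)]

# Hypothetical, linearly separable toy data (one column per sample).
X_toy = [2.0 3.0 -2.0 -3.0;
         1.0 2.0 -1.0 -2.0]
y_toy = [1.0, 1.0, -1.0, -1.0]
w = train_linear_svm(X_toy, y_toy)
```

The learned weight vector has one entry per feature, which is how a summary such as "Non-zero weights" can be read: it counts the features that actually contribute to the decision.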
accuracy = countnz(predict(model_svm, X[:, ~train_flags]) .== Y[~train_flags]) / countnz(~train_flags)
0.8872026979055733
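The expression above is the fraction of held-out samples whose predicted label equals the true label. When one class is much more common than the other, it helps to compare that number with the accuracy of always predicting the majority class. Below is a minimal sketch of both quantities in current Julia (1.x). The function names and the label vectors are hypothetical and are not taken from the bank data.

```julia
# Fraction of predictions that match the true labels.
accuracy(predicted, actual) = count(predicted .== actual) / length(actual)

# Accuracy obtained by always predicting the most frequent label in `actual`.
function majority_baseline(actual)
    positives = count(==(1.0), actual)
    return max(positives, length(actual) - positives) / length(actual)
end

# Hypothetical labels and predictions for illustration only.
actual    = [-1.0, -1.0, -1.0,  1.0, -1.0, 1.0, -1.0, -1.0]
predicted = [-1.0, -1.0, -1.0, -1.0, -1.0, 1.0, -1.0, -1.0]

acc  = accuracy(predicted, actual)   # 7 of 8 predictions are correct
base = majority_baseline(actual)     # 6 of 8 labels are -1.0
```

A model is only doing useful work to the extent that `acc` exceeds `base`.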