For details, see https://www.kaggle.com/c/mlsp-2014-mri/overview
import numpy as np
import pandas as pd
from os import path
For more details, see https://www.kaggle.com/c/mlsp-2014-mri/data.
Functional Network Connectivity (FNC) are correlation values that summarize the overall connection between independent brain maps over time. Therefore, the FNC feature gives a picture of the connectivity pattern over time between independent networks (or brain maps). The provided FNC information was obtained from functional magnetic resonance imaging (fMRI) from a set of schizophrenic patients and healthy controls at rest, using group independent component analysis (GICA). The GICA decomposition of the fMRI data resulted in a set of brain maps, and corresponding timecourses. These timecourses indicated the activity level of the corresponding brain map at each point in time. The FNC feature are the correlations between these timecourses. In a way, FNC indicates a subject's overall level of 'synchronicity' between brain areas. Because this information is derived from functional MRI scans, FNCs are considered a functional modality feature (i.e., they describe patterns of the brain function). More about FNCs can be found here: FNC paper.
FNC_train = pd.read_csv(path.join('Train', 'train_FNC.csv'), index_col=0)
Source-Based Morphometry (SBM) loadings correspond to the weights of brain maps obtained from the application of independent component analysis (ICA) on the gray-matter concentration maps of all subjects. Gray-matter corresponds to the outer-sheet of the brain; it is the brain region in which much of the brain signal processing actually occurs. In a way, the concentration of gray-matter is indicative of the "computational power" available in a certain region of the brain. Processing gray-matter concentration maps with ICA yields independent brain maps whose expression levels (i.e., loadings) vary across subjects. Simply put, a near-zero loading for a given ICA-derived brain map indicates that the brain regions outlined in that map are lowly present in the subject (i.e., the gray-matter concentration in those regions are very low in that subject). Because this information is derived from structural MRI scans, SBM loadings are considered a structural modality feature (i.e., they describe patterns of the brain structure). More about SBM loadings can be found here: SBM paper.
SBM_train = pd.read_csv(path.join('Train', 'train_SBM.csv'), index_col=0)
The labels are indicated in the "Class" column. 0 = 'Healthy Control', 1 = 'Schizophrenic Patient'
labels = pd.read_csv(path.join('Train', 'train_labels.csv'), index_col=0)
%matplotlib inline
import matplotlib.pyplot as plt
from sklearn.feature_selection import SelectKBest, f_classif
FNC_selector = SelectKBest(f_classif).fit(FNC_train, labels['Class'])
SBM_selector = SelectKBest(f_classif).fit(SBM_train, labels['Class'])
fig = plt.figure(figsize=(18,6))
plot = fig.add_subplot(111, xlabel='feature number', ylabel='score', title='Univariate FNC feature scores')
indices = np.arange(FNC_train.shape[-1])
FNC_scores = -np.log10(FNC_selector.pvalues_)
FNC_scores /= FNC_scores.max()
plot.plot(indices, FNC_scores, 'r^')
plot.vlines(indices, [0], FNC_scores)
plt.show()
fig = plt.figure(figsize=(18,6))
plot = fig.add_subplot(111, xlabel='feature number', ylabel='score', title='Univariate SBM feature scores')
indices = np.arange(SBM_train.shape[-1])
SBM_scores = -np.log10(SBM_selector.pvalues_)
SBM_scores /= SBM_scores.max()
plot.plot(indices, SBM_scores, 'b^')
plot.vlines(indices, [0], SBM_scores)
plt.show()
For this submission I selected 60 FNC features and 8 SBM features.
FNC_selector.k = 60
SBM_selector.k = 8
FNC_features = FNC_train.columns[FNC_selector.get_support()]
SBM_features = SBM_train.columns[SBM_selector.get_support()]
X_train = pd.concat([ FNC_train[FNC_features], SBM_train[SBM_features]], axis=1)
y_train = labels['Class']
X_train
FNC13 | FNC30 | FNC33 | FNC35 | FNC37 | FNC38 | FNC40 | FNC41 | FNC42 | FNC43 | ... | FNC353 | FNC368 | SBM_map7 | SBM_map17 | SBM_map36 | SBM_map52 | SBM_map61 | SBM_map64 | SBM_map67 | SBM_map75 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Id | |||||||||||||||||||||
120873 | 0.270490 | 0.036615 | 0.21516 | 0.069346 | -0.086613 | 0.054857 | -0.365520 | -0.273410 | -0.275500 | -0.035595 | ... | -0.23049 | -0.060204 | -0.264192 | 0.137624 | -1.062109 | 0.791762 | -0.982331 | 1.070363 | 0.220316 | -0.002006 |
135376 | -0.088119 | 0.450290 | 0.70298 | 0.543640 | 0.244000 | 0.512400 | 0.439300 | 0.125780 | 0.191420 | -0.058085 | ... | -0.31699 | 0.369300 | -0.466051 | 0.972934 | 0.044317 | -0.073326 | -0.057543 | 0.371701 | -0.513081 | -0.295125 |
139149 | -0.361020 | 0.203270 | 0.51565 | 0.114280 | 0.262910 | 0.018740 | 0.088855 | -0.211420 | 0.026982 | 0.093953 | ... | -0.17144 | 0.521330 | 1.439242 | -1.488153 | 0.414747 | -0.910225 | 0.597229 | 1.220756 | -0.059213 | 0.350434 |
146791 | -0.060777 | 0.658060 | 0.64302 | 0.589520 | 0.485810 | 0.574150 | 0.344040 | 0.255670 | 0.091637 | 0.183470 | ... | 0.27329 | 0.144460 | -0.492673 | 0.187573 | -0.026555 | -3.013096 | 0.829697 | -0.450726 | -0.791032 | 0.448966 |
153870 | 0.048705 | 0.158000 | 0.25707 | 0.152580 | -0.105510 | -0.234190 | -0.127320 | 0.143880 | -0.286530 | -0.333980 | ... | -0.49929 | -0.179310 | -1.105922 | 1.961955 | -1.027496 | 0.474353 | -0.978412 | 0.158492 | 0.889753 | -0.551440 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
934330 | -0.373800 | 0.706610 | 0.53835 | 0.428510 | 0.533970 | 0.317520 | -0.042524 | 0.249370 | 0.191810 | 0.184660 | ... | -0.46915 | 0.108670 | -0.515736 | -0.112326 | 0.458916 | -0.516803 | 0.826910 | -0.225792 | 0.369724 | 0.567717 |
950671 | -0.029000 | 0.516310 | 0.41210 | 0.354370 | 0.311470 | -0.067289 | 0.011223 | 0.173140 | -0.039187 | 0.137150 | ... | -0.42424 | 0.304690 | -0.933527 | -0.347191 | 0.183240 | 0.914623 | 0.482055 | 0.073327 | -0.455141 | 0.977018 |
963924 | -0.197060 | 0.011246 | 0.35735 | 0.623080 | 0.317980 | -0.081810 | -0.102280 | 0.029382 | 0.205780 | 0.374430 | ... | 0.42983 | 0.357230 | -0.523021 | 1.369376 | -0.976704 | -1.429466 | -0.114078 | -0.476524 | -0.556896 | -0.424864 |
993348 | -0.087478 | 0.420390 | 0.29361 | 0.026402 | -0.041024 | 0.355390 | 0.163750 | -0.186250 | -0.297480 | 0.179980 | ... | 0.15890 | 0.462660 | 0.462689 | -1.749746 | -0.385862 | 0.839745 | -0.163926 | 0.953385 | 0.402673 | -0.421040 |
993946 | -0.061554 | -0.125080 | 0.15613 | 0.269830 | -0.178200 | -0.133630 | 0.019102 | 0.383180 | -0.082001 | -0.058647 | ... | -0.17677 | 0.307260 | 1.522190 | -0.281454 | -0.907096 | 1.241959 | -0.626201 | 0.229092 | 1.358773 | -1.125141 |
86 rows × 68 columns
labels
Class | |
---|---|
Id | |
120873 | 1 |
135376 | 0 |
139149 | 0 |
146791 | 0 |
153870 | 1 |
... | ... |
934330 | 0 |
950671 | 0 |
963924 | 1 |
993348 | 0 |
993946 | 1 |
86 rows × 1 columns
from sklearn.neighbors import KNeighborsClassifier
clf = KNeighborsClassifier(n_neighbors=35).fit(X_train, y_train)
print(clf.score(X_train, labels))
0.872093023255814
FNC_test = pd.read_csv(path.join('Test', 'test_FNC.csv'), index_col=0)
SBM_test = pd.read_csv(path.join('Test', 'test_SBM.csv'), index_col=0)
X_test = pd.concat([FNC_test[FNC_features], SBM_test[SBM_features]], axis=1)
X_test
FNC13 | FNC30 | FNC33 | FNC35 | FNC37 | FNC38 | FNC40 | FNC41 | FNC42 | FNC43 | ... | FNC353 | FNC368 | SBM_map7 | SBM_map17 | SBM_map36 | SBM_map52 | SBM_map61 | SBM_map64 | SBM_map67 | SBM_map75 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Id | |||||||||||||||||||||
100004 | 0.113166 | 0.304551 | 0.678723 | 0.676364 | 0.102174 | 0.022148 | 0.317406 | 0.464976 | 0.236121 | -0.047108 | ... | -0.483997 | 0.150698 | -2.404130 | 2.256762 | -2.011786 | 0.139369 | 1.123770 | 2.083006 | 1.145440 | 0.192076 |
100015 | -0.054457 | 0.315034 | 0.770686 | 0.787717 | 0.366940 | -0.299074 | 0.699394 | 0.452051 | 0.032888 | 0.262658 | ... | -0.252089 | 0.318086 | -0.612468 | 1.711094 | 0.185261 | -2.084801 | 1.397832 | 1.046136 | -0.191733 | 0.174160 |
100026 | 0.002372 | -0.108333 | -0.058818 | 0.316538 | 0.081580 | 0.113453 | -0.050571 | 0.068699 | 0.275885 | 0.562116 | ... | 0.294130 | 0.093333 | -0.752907 | 1.386814 | -0.123830 | 0.046525 | 1.906989 | -2.661633 | -0.193911 | -0.476647 |
100030 | 0.040945 | 0.675230 | 0.537128 | 0.124338 | 0.073727 | 0.278834 | 0.161171 | 0.316369 | 0.067719 | -0.046476 | ... | -0.270263 | 0.442313 | 1.041755 | -0.949757 | 0.167515 | -1.693663 | -1.997087 | -2.083782 | 1.154107 | 2.790871 |
100047 | -0.245284 | 0.624100 | 0.294196 | 0.332166 | 0.447499 | 0.264495 | 0.446376 | 0.400933 | 0.041004 | 0.538643 | ... | -0.204068 | 0.394492 | -1.775324 | 0.415556 | 2.410666 | 0.021838 | 1.578984 | 1.402592 | -1.230440 | -1.544345 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
999956 | 0.182524 | 0.241864 | 0.606688 | -0.000529 | 0.566684 | -0.232798 | 0.733727 | -0.056436 | -0.227445 | 0.171759 | ... | -0.184820 | 0.407409 | -1.123315 | 0.428509 | 2.027816 | 0.849052 | 1.079632 | -1.525021 | 2.479249 | 0.548342 |
999979 | -0.172684 | 0.259993 | 0.571281 | -0.075842 | -0.262201 | 0.019818 | 0.504592 | -0.185592 | -0.809122 | 0.389232 | ... | -0.310955 | 0.473550 | -1.270540 | 0.013607 | -0.047793 | -0.755896 | 0.192909 | -0.336587 | 0.062809 | -0.690686 |
999989 | -0.408067 | 0.862003 | 0.646698 | 0.700784 | 0.572094 | -0.678907 | 0.192104 | -0.556889 | -0.246365 | -0.199630 | ... | -0.167875 | -0.437320 | -0.501659 | -0.680883 | 1.595677 | 0.466304 | 0.430025 | 1.452216 | 0.759654 | 1.607317 |
999992 | -0.160676 | -0.055138 | 0.247362 | 0.478874 | 0.195440 | 0.358874 | -0.124268 | 0.284184 | 0.330500 | -0.258015 | ... | -0.497435 | 0.783766 | -1.248628 | 0.815399 | 1.427952 | -0.305261 | -2.599313 | -0.161822 | 0.183669 | -1.313079 |
999994 | 0.005356 | 0.403151 | 0.538632 | 0.792229 | -0.226379 | -0.229716 | -0.044973 | -0.262501 | -0.039308 | -0.207124 | ... | 0.300068 | 0.562554 | -1.980608 | -1.398193 | -1.263830 | -0.303426 | 0.785046 | 2.369759 | -0.168919 | -0.668412 |
119748 rows × 68 columns
y_test = clf.predict_proba(X_test)
submission = pd.DataFrame(y_test, index=X_test.index, columns=['Healthy Control', 'Schizophrenic Patient'])
submission
Healthy Control | Schizophrenic Patient | |
---|---|---|
Id | ||
100004 | 0.342857 | 0.657143 |
100015 | 0.542857 | 0.457143 |
100026 | 0.600000 | 0.400000 |
100030 | 0.685714 | 0.314286 |
100047 | 0.600000 | 0.400000 |
... | ... | ... |
999956 | 0.485714 | 0.514286 |
999979 | 0.428571 | 0.571429 |
999989 | 0.600000 | 0.400000 |
999992 | 0.342857 | 0.657143 |
999994 | 0.514286 | 0.485714 |
119748 rows × 2 columns
submission.to_csv('submission.csv', columns=['Schizophrenic Patient'], header=['Probability'])