SNaCK is an algorithm that generates concept embeddings of your data.
As input, give it:
- X_np, an (N, D) matrix of features, one row per object, and
- triplets, a (T, 3) matrix of triplet constraints.
SNaCK will output an embedding that tries to satisfy the neighbor relationships of your data in feature space (like t-SNE), as well as the triplet constraints.
Each row of the triplet matrix represents a single "triplet constraint." A row like (3, 5, 8) specifies that in the final embedding, X[3] should be closer to X[5] than X[3] is to X[8].
Using triplet constraints allows you to capture properties of your data that you can't quite model directly, such as subjective measurements of human taste.
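For example, here is a toy sketch in plain NumPy (not part of the snack API) of what a triplet matrix looks like and how to check whether an embedding satisfies a given constraint:

```python
import numpy as np

# Toy illustration: a triplet matrix and a satisfaction check.
triplets = np.array([
    [3, 5, 8],   # X[3] should end up closer to X[5] than to X[8]
    [0, 1, 2],   # X[0] should end up closer to X[1] than to X[2]
], dtype=np.int64)  # integer (long) indices, as snack expects

def is_satisfied(Y, triplet):
    """True if embedding Y places triplet[0] closer to triplet[1] than to triplet[2]."""
    a, b, c = triplet
    return np.linalg.norm(Y[a] - Y[b]) < np.linalg.norm(Y[a] - Y[c])
```

A perfect embedding would satisfy every row; in practice the optimization satisfies as many as it can while also respecting the feature-space neighborhoods.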
The embedding corresponds to minimizing $$\arg\min_Y \lambda_1 C_{triplets}(Y) + \lambda_2 C_{t-SNE}(Y) ,$$ where $\lambda_1$ controls how much the expert triplet constraints influence the embedding and $\lambda_2$ controls how much the t-SNE neighbor information influences the embedding.
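Conceptually, each optimization step descends the weighted sum of the two gradients. The sketch below is purely illustrative (the function name, signature, and momentum handling are hypothetical, not the library's internals); dY_triplets and dY_tsne stand for the gradients of $C_{triplets}$ and $C_{t-SNE}$ at the current embedding:

```python
import numpy as np

# Hypothetical sketch of one momentum-SGD step on the combined objective.
# lambda_triplets and lambda_tsne play the roles of lambda_1 and lambda_2.
def snack_step(Y, velocity, dY_triplets, dY_tsne,
               lambda_triplets, lambda_tsne,
               learning_rate=1.0, momentum=0.5):
    grad = lambda_triplets * dY_triplets + lambda_tsne * dY_tsne
    velocity = momentum * velocity - learning_rate * grad
    return Y + velocity, velocity
```

This is why contrib_cost_triplets and contrib_cost_tsne interact with learning_rate: scaling both weights scales the whole gradient step.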
Returns an array with shape (len(X_np), no_dims). The i-th row of this array is the no_dims-dimensional coordinate of the i-th input point.
snack.snack_embed(
    # Important parameters
    X_np,                                 # Features
    contrib_cost_tsne,                    # Feature trade-off
    triplets,                             # Triplet constraints
    contrib_cost_triplets,                # Triplet trade-off
    perplexity = 30.0,                    # (magic) t-SNE neighborhood size
    theta = 0,                            # Approximation trade-off

    # Fine-grained optimization control:
    no_dims = 2,                          # Output size
    alpha = None,                         # Degrees of freedom in student-T PDF
    max_iter = 300,                       # Iterations to run for
    early_exaggeration = 4,               # Perplexity exaggeration at beginning of optimization
    early_exaggeration_switch_iter = 100, # How long to use early exaggeration
    momentum = 0.5,                       # SGD parameter
    momentum_switch_iter = 250,           # SGD: when to switch to final momentum
    final_momentum = 0.8,                 # SGD parameter
    learning_rate = 1.0,                  # SGD parameter; interacts with contrib_cost_* above
    initial_Y = None,                     # Use this to initialize the embedding

    # Other:
    each_fun = None,                      # Callback to run at each iteration
    verbose = True,                       # Pretty output
    num_threads = None,                   # Parallelization
)
X_np: numpy array with shape (N, D) and type double
contrib_cost_tsne: double
triplets: numpy array with shape (T, 3) and type long
contrib_cost_triplets: double
perplexity: double
theta: double
no_dims: int
alpha: int
max_iter: int
early_exaggeration: double
early_exaggeration_switch_iter: int
momentum: double
final_momentum: double
momentum_switch_iter: int
learning_rate: double
each_fun: function
verbose: boolean
num_threads: int

We will demonstrate SNaCK and its parameters by recreating the food embeddings from the Food-10k dataset. You can download the dataset on our companion site. This dataset consists of 10,000 recipes from Yummly, along with CNN features and 900,000 crowdsourced relative triplet comparisons. Please see our paper for details, available from the companion site linked above.
import numpy as np
import simplejson
import snack

DATASET_HOME = "/Users/michael/food-10k/release/Food-10k/"

dset = simplejson.load(open(DATASET_HOME + "/dataset.json"))
ftrs = np.load(DATASET_HOME + "/features.npy")
uuid_map = {uuid: i for i, uuid in enumerate(dset['image_uuids'])}

triplets = []
for line in open(DATASET_HOME + "/all-triplets.txt").readlines():
    (a, b, c) = line.replace("\n", "").split(" ")
    triplets.append((uuid_map[a], uuid_map[b], uuid_map[c]))
triplets = np.array(triplets)
# Standard embedding
Y = snack.snack_embed(
    ftrs.astype('float'), 500.0,
    triplets, 0.05,
    theta = 0.5,
)
from matplotlib.pyplot import subplots

fig, ax = subplots(figsize=(10, 10))
ax.set_xlim(-20, 20); ax.set_ylim(-20, 20)
ax.scatter(*Y.T, lw=0, s=1)
_ = ax.axis('off')
The two most important parameters to SNaCK are contrib_cost_tsne and contrib_cost_triplets, which control the trade-off between valuing human expert constraints and neighborhood constraints from machine-generated features.
Unfortunately, these parameters must be chosen by hand using a grid search.
Note that the algorithm is equivalent to t-SNE alone when contrib_cost_triplets == 0, and equivalent to t-STE alone when contrib_cost_tsne == 0.
In the example below, we show five different values for this trade-off on the food-10k dataset. On the left, t-SNE alone produces a sensible embedding of the food dataset, but it does not capture human taste structure in a global sense. On the right, t-STE does its best to respect expert constraints using no prior information about the objects' visual similarity, but since the 900,000 triplets are a small fraction of the $10000^3 \approx 10^{12}$ we could sample, the embedding is severely underconstrained and looks like two Gaussian blobs. Some balanced combination of these two objectives can better recover the missing fine-grained information while still respecting the human taste prior.
params = [{"tsne": 1000.0, "tste": 0.0},
          {"tsne":  750.0, "tste": 0.025},
          {"tsne":  500.0, "tste": 0.05},
          {"tsne":  250.0, "tste": 0.075},
          {"tsne":    0.0, "tste": 0.1}]
fig, axes = subplots(1, len(params), figsize=(len(params)*6, 6))
for pp, ax in zip(params, axes):
    ax.scatter(*snack.snack_embed(
        ftrs.astype('float'), pp['tsne'],
        triplets, pp['tste'],
        theta=0.5).T, lw=0, s=1, alpha=0.5)
    ax.set_title("tSNE: %r\ntSTE: %r" % (pp['tsne'], pp['tste']), fontsize=18)
    ax.set_xticks([]); ax.set_yticks([])
(Not in the paper)
The t-STE portion of the calculation can be sped up greatly by calculating an approximation of the gradient. This usually makes the optimization much faster at a sensible accuracy cost.
The theta parameter controls the quality of the approximation. Small values of theta lead to a higher-quality embedding; large values of theta create a faster, lower-quality embedding.
If there are $N$ points in $D$-dimensional space with $T$ triplet constraints, the exact calculation of the gradient takes $O(DN^2 + DT)$ time and space complexity. This can be reduced to $O(DN\log N + DT)$ by using the Barnes-Hut approach.
By default, theta == 0 and the answer is exact (the Barnes-Hut approximation is disabled).
In the examples below, the exact solution takes 950 seconds to calculate. Values for theta as high as 0.5 still look sensible. Setting theta to 1.0 saves several seconds, but the visual quality of the embedding is heavily degraded. For this example, setting theta any higher creates unusable garbage.
import time

thetas = [2.0, 1.5, 1.0, 0.5, 0.1, 0.0]
fig, axes = subplots(2, 3, figsize=(15, 10))
for theta, ax in zip(thetas, axes.ravel()):
    begin = time.time()
    ax.scatter(*snack.snack_embed(
        ftrs.astype('float'), 500,
        triplets, 0.05,
        theta = theta,
        verbose = False
    ).T, lw=0, s=1, alpha=0.5)
    ax.set_title("Theta: %r\nSeconds: %r" % (theta, time.time() - begin), fontsize=18)
    ax.set_xticks([]); ax.set_yticks([])
The perplexity of the optimization roughly controls the expected number of neighbors around each point in the high-dimensional distribution. (This is not exactly what perplexity controls, but it is one useful approximation.) This parameter is specific to t-SNE and is briefly documented in the t-SNE FAQ.
For lower values of perplexity, the t-SNE optimization will attempt to force each point to have few neighbors, moving each point away from the others. For very high values of perplexity, each point will try to be part of the same group, which may discard some fine-grained structure. The best balance is somewhere in the middle. See the examples below to get a sense of good perplexity values.
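As a rough intuition (this helper is not part of the snack API), perplexity is 2 raised to the Shannon entropy of a point's neighbor distribution, so a uniform distribution over k neighbors has perplexity exactly k:

```python
import numpy as np

# Perplexity of a discrete distribution: 2**H(P), with H measured in bits.
# For t-SNE's conditional neighbor distributions this behaves like an
# "effective number of neighbors".
def perplexity_of(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # treat 0 * log(0) as 0
    entropy_bits = -np.sum(p * np.log2(p))
    return 2.0 ** entropy_bits
```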
perplexities = [3.0, 10.0, 30.0, 100.0, 300.0, 1000.0]
fig, axes = subplots(2, 3, figsize=(15, 10))
for perplexity, ax in zip(perplexities, axes.ravel()):
    ax.scatter(*snack.snack_embed(
        ftrs.astype('float'), 500,
        triplets, 0.05,
        theta = 0.5,
        perplexity = perplexity,
        verbose = False
    ).T, lw=0, s=1, alpha=0.5)
    ax.set_title("Perplexity: %r" % perplexity)
    ax.set_xticks([]); ax.set_yticks([])
If you wish to inspect the state of the optimization as it converges, you can optionally supply a callback function that will be called at every iteration. This function receives the iteration number, the current embedding Y, the current t-STE and t-SNE costs, and the t-STE and t-SNE gradients.
In the example below, we use this functionality to save an image of the embedding at each iteration, which can then be combined into an animation of the optimization.
def each_fun(iter, Y, C_tSTE, C_tSNE, dY_tSTE, dY_tSNE):
    fig, ax = subplots(figsize=(8, 8))
    ax.scatter(*Y.T, lw=0, s=1)
    sz = Y.ravel().std()
    ax.set_xlim(-4*sz, 4*sz)
    ax.set_ylim(-4*sz, 4*sz)
    ax.set_title("Iteration %r" % iter, fontsize=20)
    fig.tight_layout()
    fig.savefig("/tmp/conv-%04d.png" % iter)
    clf()
Y = snack.snack_embed(
    ftrs.astype('float'), 500.0,
    triplets, 0.05,
    theta = 0.5,
    each_fun = each_fun,
    verbose = False,
)
The following animation shows 300 iterations of SNaCK on the Food 10k dataset, generated from the above code: