Essentials of Caffe Tutorial

This post summarizes and reorganizes the key points of the Caffe Tutorial for later use and quick reference.

  • 3 Basic Components: Blob, Layer, Net
    • Blob: an N-dimensional C-contiguous array that wraps the actual data and provides synchronization between CPU and GPU.
      • Conceals the overhead of mixed CPU/GPU operation by synchronizing from the CPU host to the GPU device as needed. Memory on the CPU (host) and GPU (device) is allocated on demand (lazily) for efficient memory usage.
      • Conventional dimensions for batches of image data: batch size N * channels K * height H * width W. Memory is row-major in layout, so the last (rightmost) dimension changes fastest. The value at index (n, k, h, w) is physically located at offset ((n*K + k)*H + h)*W + w.
      • Blobs are not restricted to 4D; other shapes such as 3D or 2D are fine when the data is not a batch of images.
      • Parameter blob dimensions vary according to the type and configuration of the layer.
      • A Blob stores two chunks of memory: data (the values passed along) and diff (the gradient).
    • Layer:
      • Supports fundamental operations and computation: convolve filters, pool, take inner products, apply activation function, element-wise transformation, normalize, load data, compute losses.
      • Each layer takes a set of input (bottom) blobs and produces a set of output (top) blobs.
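      • Each layer has 3 critical computations:
        • setup: initializes the layer and its connections once at model initialization. (Model initialization creates the blobs and layers, calls each layer’s setup() function, and validates the correctness of the overall network architecture.)
        • forward: given input from the bottom blobs, computes the output and sends it to the top blobs (one implementation for CPU and one for GPU)
        • backward: given the gradient with respect to the top output, computes the gradient with respect to the input and sends it to the bottom blobs (one implementation for CPU and one for GPU)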
      • Vision layers: usually take images as input and produce other images as output
        • A typical image may have one channel (grayscale) or three channels (RGB). In the context of Caffe, however, what distinguishes an image is its spatial structure (2D geometry): a non-trivial height and width.
        • Most vision layers apply a particular operation to some region of the input to produce a corresponding region of the output. In contrast, most other layers ignore the spatial structure and treat the input as one big vector.
        • Types: Convolution, Pooling, Local Response Normalization(LRN), im2col.
      • Loss layers: Softmax, Sum-of-Squares, Hinge, Sigmoid Cross-Entropy, Infogain, Accuracy and Top-k. (Accuracy scores the output as the accuracy of the output with respect to the target; it is not actually a loss and has no backward step.)
      • Activation/Neuron layers: element-wise operators, taking one bottom blob and producing one top blob of the same size
        • Types: ReLU/Leaky-ReLU, Sigmoid, Tanh, Absolute Value, Power, BNLL
      • Data layers: at the bottom of nets.
        • Common input preprocessing (mean subtraction, scaling, random cropping, and mirroring) is available by specifying TransformationParameters.
        • Data can come from:
          • efficient databases (LevelDB or LMDB) -> Data type layer
          • directly from memory -> MemoryData type layer
          • when efficiency is not critical, from files on disk in HDF5 or common image formats -> HDF5Data or ImageData layer
          • Other types are: WindowData (reads windows, i.e. cropped regions, of image files) and DummyData (for development and debugging)
        • Prefetching: for throughput, data layers fetch the next batch of data and prepare it in the background while the Net computes the current batch.
        • Multiple inputs: a Net can have multiple inputs of any number and type. Define as many data layers as needed, giving each a unique name and top.
      • Other common layer types:
        • InnerProduct: essentially a fully connected layer. It treats the input as a simple vector and produces its output as a single vector.
        • Split: a utility layer that splits one input blob into multiple output blobs
        • Concat: concatenates multiple blobs into one blob
        • Flatten: flattens an input of shape n * c * h * w to a simple vector output of shape n * (c*h*w)
        • Reshape: used to change the dimensions of input, without changing data. Just like the Flatten layer, only the dimensions are changed
        • Slice: slices an input layer to multiple output layers along a given dimension
      • Details can be seen at
    • Net: a set of layers connected in a directed acyclic graph (DAG)
      • defined in a plaintext modeling language
      • yields loss and gradients
      • Example: a simple logistic regression classifier
        • [Screenshot: the logistic regression network]
        • is defined by
        • [Screenshot: its net definition]
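
The row-major offset formula given for blobs can be checked in a few lines of Python. This is a minimal sketch and not Caffe code: `blob_offset` and the example shape are hypothetical names and values chosen for illustration.

```python
from itertools import product


def blob_offset(n, k, h, w, K, H, W):
    """Flat offset of element (n, k, h, w) in a row-major N x K x H x W blob."""
    return ((n * K + k) * H + h) * W + w


# Hypothetical blob shape: 2 images, 3 channels, 4 rows, 5 columns.
N, K, H, W = 2, 3, 4, 5

# product() varies the last index fastest, which matches row-major order,
# so walking the indices this way should visit offsets 0, 1, 2, ...
offsets = [
    blob_offset(n, k, h, w, K, H, W)
    for n, k, h, w in product(range(N), range(K), range(H), range(W))
]
```

Because every index tuple maps to a distinct offset in increasing order, the formula describes one contiguous block of N*K*H*W values with no gaps.
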
  • Model:
    • The models are defined in plaintext protocol buffer schema (prototxt) while the learned models are serialized as binary protocol buffer (binaryproto) .caffemodel files.
    • To create a Caffe model you need to define the model architecture in a protocol buffer definition file (prototxt), with layers and their parameters defined in it.
  • Loss:
    • By convention, Caffe layer types with the suffix Loss contribute to the loss function, and other layers are assumed to be used for intermediate computations
    • However, any layer can be used as a loss by adding a field loss_weight: <float> to a layer definition for each top blob produced by the layer
    • Layers with the suffix Loss have an implicit loss_weight: 1 for the first top blob (and loss_weight: 0 for any additional tops); other layers have an implicit loss_weight: 0 for all tops.
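
The loss_weight convention amounts to a weighted sum over all top blobs in the net. The sketch below mimics that bookkeeping in plain Python; the dictionary layout and the toy values are assumptions made for illustration and are not how Caffe stores layers.

```python
def total_loss(layers):
    """Sum loss_weight * sum(top) over every top blob of every layer."""
    loss = 0.0
    for layer in layers:
        for top_values, loss_weight in zip(layer["tops"], layer["loss_weights"]):
            loss += loss_weight * sum(top_values)
    return loss


# Hypothetical toy network: each top blob is a flat list of floats.
layers = [
    # An intermediate layer: implicit loss_weight of 0 on its top.
    {"tops": [[0.5, 1.5]], "loss_weights": [0.0]},
    # A layer with the Loss suffix: implicit loss_weight of 1 on its first top.
    {"tops": [[0.25]], "loss_weights": [1.0]},
    # Any other layer contributes once it is given a non-zero loss_weight.
    {"tops": [[2.0, 2.0]], "loss_weights": [0.5]},
]

loss = total_loss(layers)
```

Here only the second and third layers contribute, since the first keeps its implicit weight of 0.
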
  • Solver: oversees the optimization and generates parameter updates
    • Available solver methods:
      • Stochastic Gradient Descent (type: "SGD"),
      • AdaDelta (type: "AdaDelta"),
      • Adaptive Gradient (type: "AdaGrad"),
      • Adam (type: "Adam"),
      • Nesterov’s Accelerated Gradient (type: "Nesterov") and
      • RMSprop (type: "RMSProp")
      • Details can be seen at
    • The solver:
      1. scaffolds the optimization bookkeeping and creates the training network for learning and test network(s) for evaluation.
      2. iteratively optimizes by calling forward / backward and updating parameters
      3. (periodically) evaluates the test networks
      4. snapshots the model and solver state throughout the optimization
    • In each iteration:
      1. calls network forward to compute the output and loss
      2. calls network backward to compute the gradients
      3. incorporates the gradients into parameter updates according to the solver method
      4. updates the solver state according to learning rate, history, and method
    • Solving can run in CPU or GPU mode.
    • Snapshots and resumes:
      • Weight snapshots are saved with the .caffemodel extension, while solver states are saved with the .solverstate extension.
      • Snapshotting can be configured in the solver definition prototxt

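The four steps the solver performs in each iteration can be sketched on a toy problem. This is a minimal illustration, not Caffe's implementation: `forward` and `backward` are hypothetical stand-ins for a network with a single parameter, and the update rule assumed here is SGD with momentum (update = momentum * previous update - learning rate * gradient).

```python
def forward(w):
    """Toy 'network forward': loss of a single parameter w (hypothetical)."""
    return 0.5 * (w - 3.0) ** 2


def backward(w):
    """Toy 'network backward': gradient of the loss with respect to w."""
    return w - 3.0


def solve(w=0.0, base_lr=0.1, momentum=0.9, max_iter=200):
    """Run a fixed number of solver iterations and return (w, losses)."""
    history = 0.0  # solver state: the previous update (momentum buffer)
    losses = []
    for _ in range(max_iter):
        losses.append(forward(w))  # 1. forward: compute the output and loss
        grad = backward(w)         # 2. backward: compute the gradient
        # 3. turn the gradient into an update according to the solver method;
        # 4. the new update is also stored as solver state (history).
        history = momentum * history - base_lr * grad
        w += history
    return w, losses


final_w, losses = solve()
```

With these assumed hyperparameters the parameter oscillates toward the minimum at w = 3, so the recorded loss ends far below its starting value.
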

  • Interfaces: command line (cmdcaffe), Python (pycaffe), and MATLAB (matcaffe) interfaces
    • Command Line:
      • “caffe train”:
        • 1. learns models from scratch; 2. resumes learning from saved snapshots; 3. fine-tunes models to new data and tasks
        • The 3 types of training and their arguments:
          • All training requires a solver configuration through the -solver solver.prototxt argument.
          • Resuming requires the -snapshot model_iter_1000.solverstate argument to load the solver snapshot.
          • Fine-tuning requires the -weights model.caffemodel argument for the model initialization.
        • for example:
          • caffe train -solver examples/mnist/lenet_solver.prototxt
          • caffe train -solver examples/mnist/lenet_solver.prototxt -gpu 2
          • caffe train -solver examples/mnist/lenet_solver.prototxt -snapshot examples/mnist/lenet_iter_5000.solverstate
          • caffe train -solver examples/finetuning_on_flickr_style/solver.prototxt -weights models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel
      • “caffe test”:
        • scores models by running them in the test phase and reports the net output as its score
        • The per-batch score is reported and then the grand average is reported last
        • # score the learned LeNet model on the validation set as defined in the
          # model architecture lenet_train_test.prototxt
          caffe test -model examples/mnist/lenet_train_test.prototxt -weights examples/mnist/lenet_iter_10000.caffemodel -gpu 0 -iterations 100
      • “caffe time”: (Benchmarking)
        • benchmarks model execution layer-by-layer through timing and synchronization
        • useful to check system performance and measure relative execution times for models
        • # (These example calls require you complete the LeNet / MNIST example first.)
          # time LeNet training on CPU for 10 iterations
          caffe time -model examples/mnist/lenet_train_test.prototxt -iterations 10
          # time LeNet training on GPU for the default 50 iterations
          caffe time -model examples/mnist/lenet_train_test.prototxt -gpu 0
          # time a model architecture with the given weights on the first GPU for 10 iterations
          caffe time -model examples/mnist/lenet_train_test.prototxt -weights examples/mnist/lenet_iter_10000.caffemodel -gpu 0 -iterations 10
      • “caffe device_query”: for Diagnostics
        • reports GPU details for reference and checking device ordinals for running on a given device in multi-GPU machines
        • # query the first device
          caffe device_query -gpu 0
      • “-gpu” for Parallelism:
        • A solver and net will be instantiated for each GPU, so the batch size is effectively multiplied by the number of GPUs. To reproduce single GPU training, reduce the batch size in the network definition accordingly.
        • # train on GPUs 0 & 1 (doubling the batch size)
          caffe train -solver examples/mnist/lenet_solver.prototxt -gpu 0,1
          # train on all GPUs (multiplying batch size by number of devices)
          caffe train -solver examples/mnist/lenet_solver.prototxt -gpu all
    • Python interface (pycaffe): the caffe module and its scripts in caffe/python
      • import caffe to load models, do forward and backward, handle IO, visualize networks, and even instrument model solving. All model data, derivatives, and parameters are exposed for reading and writing.
        • caffe.Net is the central interface for loading, configuring, and running models. caffe.Classifier and caffe.Detector provide convenience interfaces for common tasks.
        • caffe.SGDSolver exposes the solving interface.
        • caffe.io handles input / output with preprocessing and protocol buffers.
        • caffe.draw visualizes network architectures.
        • Caffe blobs are exposed as numpy ndarrays for ease-of-use and efficiency.
      • Detailed examples to be seen below

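The three kinds of `caffe train` invocation differ only in which optional argument accompanies -solver. The helper below is a hypothetical sketch that assembles the argument list from the flags listed above (-solver, -snapshot, -weights, -gpu); it does not call Caffe itself, and the guard against passing both a snapshot and weights is an assumption of this sketch.

```python
def caffe_train_command(solver, snapshot=None, weights=None, gpu=None):
    """Build the argument list for `caffe train` (hypothetical helper)."""
    if snapshot and weights:
        # Resuming and fine-tuning are treated as mutually exclusive here.
        raise ValueError("give a snapshot to resume or weights to fine-tune, not both")
    cmd = ["caffe", "train", "-solver", solver]
    if snapshot:
        cmd += ["-snapshot", snapshot]  # resume from a saved solver state
    if weights:
        cmd += ["-weights", weights]    # initialize from a trained model
    if gpu is not None:
        cmd += ["-gpu", str(gpu)]       # device id, "0,1", or "all"
    return cmd


from_scratch = caffe_train_command("examples/mnist/lenet_solver.prototxt", gpu=2)
resumed = caffe_train_command(
    "examples/mnist/lenet_solver.prototxt",
    snapshot="examples/mnist/lenet_iter_5000.solverstate",
)
```

The resulting list could be handed to `subprocess.run` on a machine where the caffe binary is on the PATH, or joined with spaces to print the command.
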

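The batch-size effect of the -gpu option is simple arithmetic, worked through below. The function names and the batch size of 64 are hypothetical values for illustration only.

```python
def effective_batch_size(batch_size, num_gpus):
    """Each GPU gets its own net, so every iteration sees batch_size per device."""
    return batch_size * num_gpus


def per_gpu_batch_size(single_gpu_batch_size, num_gpus):
    """Batch size to write in the net definition to match single-GPU training."""
    if single_gpu_batch_size % num_gpus != 0:
        raise ValueError("batch size is not divisible by the number of GPUs")
    return single_gpu_batch_size // num_gpus


# A net defined with batch_size 64, trained with -gpu 0,1:
doubled = effective_batch_size(64, 2)

# To keep an effective batch of 64 on two GPUs, define the net with:
reduced = per_gpu_batch_size(64, 2)
```
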
  • Examples:
    • Fine-tuning a pre-trained model using command line:
      • Task: predict Flickr image style from “caffenet” pre-trained on ImageNet data
      • Main steps: modification to prototxt files
        • Since we are predicting 20 classes instead of 1000, the last layer needs to be changed. Its name also needs to change from fc8 to fc8_flickr in the prototxt: since there is no layer with that name in bvlc_reference_caffenet, the layer will begin training with random weights.
        • decrease the overall learning rate base_lr in the solver prototxt, but boost the lr_mult on the newly introduced layer, so that the rest of the model changes very slowly with new data while the new layer learns fast
        • set stepsize in the solver to a lower value than if we were training from scratch, since we’re virtually far along in training and therefore want the learning rate to go down faster
        • then run as
          ./build/tools/caffe train -solver models/finetune_flickr_style/solver.prototxt -weights models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel -gpu 0

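How base_lr, lr_mult and stepsize interact during fine-tuning can be seen with a small calculation. The sketch assumes the solver uses a step learning rate policy, lr = base_lr * gamma^floor(iter / stepsize), which the notes above do not state explicitly; all numeric values are hypothetical.

```python
def step_lr(iteration, base_lr, gamma, stepsize):
    """Learning rate under an assumed 'step' policy: drop by gamma every stepsize."""
    return base_lr * gamma ** (iteration // stepsize)


def layer_lr(solver_lr, lr_mult):
    """Effective learning rate of a layer whose parameters carry lr_mult."""
    return solver_lr * lr_mult


# Hypothetical settings: low base_lr for the pre-trained layers,
# boosted lr_mult for the new fc8_flickr layer.
base_lr, gamma = 0.001, 0.1

# After 20000 iterations, a short stepsize has already decayed the rate,
# while a long stepsize (as when training from scratch) has not.
fine_tune_rate = step_lr(20000, base_lr, gamma, stepsize=20000)
scratch_rate = step_lr(20000, base_lr, gamma, stepsize=100000)

# The new layer learns ten times faster than the rest of the model.
new_layer_rate = layer_lr(base_lr, lr_mult=10)
```
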

