Dr. Iris Fu and I recently published a post on the AWS AI Blog 🙂
Object detection is one of the classic computer vision problems that has improved significantly with the adoption of convolutional neural networks (CNNs). When CNNs first rose to popularity for image classification, many detection pipelines relied on crude and expensive preprocessing routines to generate region proposals. Algorithms like Selective Search were used to generate candidate regions ranked by their “objectness” (how likely they are to contain an object), and those regions were then fed to a CNN trained for classification. While this approach produces accurate results, it carries a significant runtime cost. CNN architectures like Faster R-CNN, You Only Look Once (YOLO), and Single Shot MultiBox Detector (SSD) address this cost by embedding the localization task into the network itself. We set out to compare the experience of training SSD with Apache MXNet and with Caffe, motivated by the goal of training these newer architectures in a distributed fashion without suffering a reduction in accuracy.