In 2014, Amazon Web Services (AWS) released their function as a service (FaaS) platform, Lambda. Since then, a burgeoning community of developers has formed around this technology, generating countless web frameworks, debugging/monitoring tools, and storage backends specifically designed for the serverless world. The capabilities of this technology are best understood by examining the EC2 dashboards of startups such as A Cloud Guru, Instant, and Serverless (figure 1); these companies have been built using nothing but serverless technologies like Lambda and DynamoDB. The FaaS paradigm has even worked its way into many major businesses outside of the tech industry; for example, Nordstrom has migrated much of its on-premises and existing cloud infrastructure to serverless platforms, which has helped it improve efficiency in both developer time and operational cost.
There are three things that make FaaS platforms like AWS Lambda different from other auto-scaling web infrastructure services:
- Fully-managed operations: developers using FaaS platforms only need to write code that follows the simple request-response software pattern. The jobs of provisioning servers, load balancing, and patching security vulnerabilities are all outsourced to the FaaS provider. By using these platforms, you are essentially renting the best operations talent on Earth from Amazon, Google, IBM, and Microsoft.
- Instantaneous startup and infinite scalability: for many applications, users expect an interactive experience, and delays above a few hundred milliseconds can significantly degrade a user's perceived quality of the service. In the past, obtaining interactive speeds required resources to be over-provisioned so that delays could be minimized even under spikes in load; however, keeping lots of idle servers is both expensive and adds complexity to the system. FaaS platforms provide an interactive experience without over-provisioning by having developers encapsulate their functionality in Linux containers; these containers can be started almost instantly and distributed to additional nodes in a data center quickly during spikes in load.
- Micro-scale billing granularity: AWS Lambda (and other competing services) bill based on invocation time rounded up to the nearest 100ms increment, so you pay only for what you use. Cloud VMs and container management services provide this same benefit, but with much coarser granularity. For example, EC2 bills a minimum of 1 hour and GCE VMs bill a minimum of 10 minutes. As a result, it is frequently much less expensive to run a system on a FaaS platform than with cloud VMs or on-premises servers, where machines may sit idle much of the time.
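To make the granularity difference concrete, here is a small back-of-the-envelope sketch. The increments are the ones quoted above; the workload (1000 invocations of 250ms each) is a hypothetical example, not a measurement from our system.

```python
import math

# Billing increments from the discussion above (values as of this writing).
LAMBDA_INCREMENT_MS = 100        # Lambda rounds each invocation up to 100ms
VM_MINIMUM_MS = 3_600_000        # EC2 bills a minimum of one hour

def billed_ms(runtime_ms, increment_ms):
    """Round a runtime up to the provider's billing increment."""
    return math.ceil(runtime_ms / increment_ms) * increment_ms

# A hypothetical bursty job: 1000 parallel invocations of 250ms each.
lambda_total = 1000 * billed_ms(250, LAMBDA_INCREMENT_MS)
vm_total = billed_ms(1000 * 250, VM_MINIMUM_MS)

print(lambda_total)  # 300000 ms billed in total across all Lambda workers
print(vm_total)      # 3600000 ms billed on a VM with a one-hour minimum
```

Even ignoring per-second rate differences, the 100ms round-up means the bursty job pays for roughly 5 minutes of compute on Lambda versus a full hour on an EC2 instance.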
The benefits of AWS Lambda make it perfect for web microservices, but these same benefits also make it a great tool for other workloads. In fact, it is easy to imagine taking any bursty, highly parallelizable application and deploying it on a FaaS platform; in doing so, that application would gain all of the advantages of these services. To demonstrate this, the remainder of this blog post is dedicated to explaining how I used AWS Lambda to perform massively parallel face recognition using a deep convolutional neural network.
In the first panel of figure 2, you see a six-hour-long video I recorded with a GoPro (strapped to my head!) at USENIX NSDI'17, a computer systems and networking conference. I was at the conference with Keith Winstein and Sadjad Fouladi to help present ExCamera, and I wanted to record every minute of the conference! After recording this video, our goal was to scan over the faces in the video and stitch together a montage of all of the times I encountered our UCSD collaborator, George Porter (the second panel of figure 2); we wanted to make sure he made it to the conference safely! And most importantly, we wanted to do all of this on AWS Lambda so that thousands of instances of our deep-neural-network-based face recognizer could (1) run in parallel across the video, (2) be billed at a 100ms granularity, and (3) start up instantly after invocation.
To perform the face recognition and stitch the montage together, our system performs the following steps (see the AWSLambdaFace github page for more details and a simplified demo you can try at home!):
- Upload an image with the face of interest to an EC2 coordination server and perform standard image augmentation techniques to generate a training set.
- Use a deep neural network (DNN) to locate and generate 128-dimensional feature vectors for the face in each augmented image in the training set.
- Train a kNN classifier with (1) the augmented image feature vectors and (2) Labeled Faces in the Wild (LFW) feature vectors.
- Run the DNN featurizer and kNN classifier in parallel across the entire video using 3000+ AWS Lambda workers to perform recognition.
- Aggregate all the frames where the face of interest was recognized.
- Launch ExCamera to encode the frames into a montage!
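The fan-out in step 4 can be sketched as follows. This is an illustrative sketch, not our actual code: the chunk size and the worker stub are made up for the example, and the real system invokes thousands of AWS Lambda functions (e.g. via boto3's `invoke`) rather than local threads, with each worker running the DNN featurizer and kNN classifier over its slice of frames.

```python
import concurrent.futures

def chunk_frames(total_frames, chunk_size):
    """Split the video's frame range into per-worker (start, end) chunks."""
    return [(start, min(start + chunk_size, total_frames))
            for start in range(0, total_frames, chunk_size)]

def recognize_chunk(chunk):
    """Stand-in for one Lambda worker. The real worker featurizes each
    frame with the DNN and classifies it with kNN; here we use a dummy
    rule purely so the sketch runs end to end."""
    start, end = chunk
    return [f for f in range(start, end) if f % 1000 == 0]

# Six hours at 30 fps is 648,000 frames; ~200 frames per worker gives
# the 3000+ workers mentioned above.
chunks = chunk_frames(648_000, 200)

# Locally we fan out over threads; on Lambda, each chunk would be the
# payload of one function invocation.
with concurrent.futures.ThreadPoolExecutor(max_workers=64) as pool:
    results = pool.map(recognize_chunk, chunks)

# Step 5: aggregate all matching frames before handing them to ExCamera.
matched_frames = sorted(f for frames in results for f in frames)
print(len(chunks))  # 3240 chunks, i.e. 3240 workers
```

Because each chunk is independent, the wall-clock time of the scan is roughly the runtime of a single worker, which is what makes the 100ms billing granularity pay off.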
It is unlikely that the major cloud providers (AWS, Google, IBM, or Microsoft) would have predicted that their systems would be used to perform computations like deep learning. However, I hope that after reading this post you will agree that virtually any bursty, highly parallelizable application could benefit from using FaaS platforms. It is time that we re-examine the ways we build mobile and desktop applications, since it is now possible to affordably and interactively ask for thousands of cores in the cloud!