Businesses scale in two ways: up, by growing in size, and out, by opening more branches. IT infrastructure in a growing business follows the same pattern. The amount of compute (servers, and with them cores, RAM, and disks) grows rapidly, and the business looks for new ways to support that growth.
The first option is to scale up the existing infrastructure: upgrade the servers the company already owns and maintains, then buy new “boxes” to mount in the existing racks. But data center (DC) capacity runs out at some point, and so does the investment budget. Once the required capacity exceeds what the company owns, what then? That leads to the second scenario: lift and shift.
The business moves to the cloud in the hope that the provider can offer more capacity, or even (cloud sales teams love this) infinite capacity. Neither claim holds for a single machine. A cloud instance (VM) is limited in exactly the same way as your own “boxes”, because the cloud provider buys the same servers you could buy directly from your vendor. There is one significant difference, though: a real cloud provider operates at scale. Instead of scaling up, infrastructure needs to scale out. If you want more computing power, stop choosing bigger instances and start multiplying them.
The bad news (which is good news from a broader perspective) is that you have to re-architect your workload so that it can be distributed across many machines. Your application needs to be redesigned to scale out from one machine to two or more. Once it can, you can scale to 3, 10, or 100 servers. But lifting and shifting to the cloud is not only about infrastructure; it is also about mindset. 100 VMs is a small scale if you want to get something done quickly, delivering your product to customers much faster than your competitors without keeping a massive reserve of computing power. You can scale out to 100,000 cores and hundreds of terabytes of RAM in minutes, use them only for as long as you need them, and pay only for what you used. This is the cloud.
Report and billing generation in telecommunications companies or the banking sector is a great example of a parallel workload. Rather than calculating 100,000 bills serially, the work can be divided into as many as 100,000 smaller tasks and processed in parallel as a batch, without maintaining the infrastructure underneath your workload.
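The key property of such a workload is that each bill depends only on its own inputs, so the work can be split, run independently, and merged at the end. The sketch below shows that fan-out/fan-in pattern locally, assuming a hypothetical `bill_customer` rule and grouping customers into chunks (one chunk per task) to limit per-task overhead; `chunk_size=1` would give one task per bill. A local thread pool only stands in for compute nodes here: in Azure Batch each chunk would become a task running on its own node.

```python
from concurrent.futures import ThreadPoolExecutor


def make_chunks(items, chunk_size):
    """Split a list of work items into independent chunks (one chunk = one task)."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def bill_customer(customer_id: int) -> float:
    # Hypothetical billing rule: a flat fee plus a usage charge derived from the ID.
    return 10.0 + (customer_id % 7) * 1.5


def run_task(chunk):
    # Each task needs only its own inputs and never talks to other tasks.
    return [(cid, bill_customer(cid)) for cid in chunk]


def run_billing(customer_ids, chunk_size=1000, workers=8):
    chunks = make_chunks(customer_ids, chunk_size)
    # Fan out: submit every chunk; the pool stands in for Batch compute nodes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(run_task, chunks)
    # Fan in: merge the per-task results into one dictionary.
    bills = {}
    for part in results:
        bills.update(part)
    return bills


if __name__ == "__main__":
    bills = run_billing(list(range(100_000)))
    print(len(bills), "bills generated")
```

Because no chunk depends on another, the same code scales from 8 local workers to thousands of nodes without changing the per-task logic.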
Batch processing began with mainframe computers. Today, it still plays a central role in business, engineering, science, and other areas that require running lots of automated tasks: not only processing bills and payroll, but also calculating portfolio risk, designing new products, rendering animated films, testing software, exploring for energy, predicting the weather, and finding new cures for disease. Microsoft Azure Batch is a tool built for exactly these scenarios. With Azure Batch, that computing power is available when you need it, without any capital investment.
Azure Batch runs the same applications you use on your workstations and clusters, which makes it an easy way to cloud-enable your “exe” files and scripts and run them at hyperscale. Azure Batch queues the work you want to run and executes your applications. You only need to describe the data that must be moved to the cloud for processing, how the data should be distributed, which parameters to use for each task, and the command that starts the process. Only two elements are mandatory:
1: A client application or script that interacts with the Batch and Storage services to execute a parallel workload on compute nodes (virtual machines). It runs on your local workstation or on a “master node” you configure.
2: The program that runs on compute nodes in Azure to perform the actual work. In the sample below, TaskApplication.exe parses the text in a file downloaded from Azure Storage (the input file). Then it produces a text file (the output file) that contains a list of the top three words that appear in the input file. After it creates the output file, TaskApplication uploads the file to Azure Storage. This makes it available to the client application for download. TaskApplication runs in parallel on multiple compute nodes in the Batch service.
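The core of the task program can be sketched in a few lines of Python. This is a hypothetical stand-in for TaskApplication.exe, not the sample's actual code: it assumes the input file has already been downloaded to the node, leaves the upload to Azure Storage out, and defines a "word" (an assumption) as a case-insensitive run of letters and apostrophes.

```python
import re
import sys
from collections import Counter


def top_words(text: str, n: int = 3) -> list[tuple[str, int]]:
    """Return the n most frequent words (case-insensitive) with their counts."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(n)


def process_file(input_path: str, output_path: str, n: int = 3) -> None:
    # Read the input file (in the real sample, downloaded from Azure Storage).
    with open(input_path, encoding="utf-8") as f:
        text = f.read()
    # Write one "word count" line per top word; uploading the file is out of scope.
    with open(output_path, "w", encoding="utf-8") as f:
        for word, count in top_words(text, n):
            f.write(f"{word} {count}\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python task_app.py input.txt output.txt
    process_file(sys.argv[1], sys.argv[2])
```

Every task runs this same program against a different input file, which is what lets Batch spread the work across many nodes.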
- Step 1: Create containers in Azure Blob Storage.
- Step 2: Upload the task application files and input files to the containers.
- Step 3: Create a Batch pool. The pool's StartTask downloads the task binary files (TaskApplication) to nodes as they join the pool.
- Step 4: Create a Batch job.
- Step 5: Add tasks to the job. 5a. The tasks are scheduled to execute on nodes. 5b. Each task downloads its input data from Azure Storage, then begins execution.
- Step 6: Monitor tasks. As tasks complete, they upload their output data to Azure Storage.
- Step 7: Download the task output from Storage.
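Step 5 is where the per-task parameters and the start command are described. The SDK-independent sketch below builds one task definition per input file. `TaskSpec` and `build_tasks` are hypothetical names, and the command-line argument order (input file, number of words, output container URL) is an assumption modeled loosely on the sample; `%AZ_BATCH_NODE_SHARED_DIR%` is the Batch environment variable pointing at a node's shared directory, where the StartTask from step 3 is assumed to have placed TaskApplication.exe.

```python
from dataclasses import dataclass


@dataclass
class TaskSpec:
    """Hypothetical, SDK-independent description of one Batch task."""
    task_id: str
    command_line: str
    input_blob: str


def build_tasks(input_blobs, output_container_url, top_n=3):
    """Create one task per input file (step 5 of the workflow)."""
    tasks = []
    for i, blob in enumerate(input_blobs):
        # Assumes the StartTask copied TaskApplication.exe into the shared directory.
        command = (
            f"cmd /c %AZ_BATCH_NODE_SHARED_DIR%\\TaskApplication.exe "
            f'{blob} {top_n} "{output_container_url}"'
        )
        tasks.append(TaskSpec(task_id=f"task{i}", command_line=command, input_blob=blob))
    return tasks


if __name__ == "__main__":
    for t in build_tasks(["taskdata0.txt", "taskdata1.txt"], "https://example.invalid/output"):
        print(t.task_id, t.command_line)
```

With the real Batch SDK, each definition would become a task object added to the job created in step 4; the service then schedules it on a free node.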
With automatic scaling, you can have the Batch service dynamically adjust the number of compute nodes in a pool according to the current workload and resource usage of your compute scenario. This allows you to lower the overall cost of running your application by using only the resources you need, and releasing those you don’t need.
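In Batch itself, the scaling rule is written as an autoscale formula that the service evaluates periodically, using service-defined variables such as `$PendingTasks` and `$TargetDedicatedNodes`. The Python function below only models the idea behind a common rule: size the pool to the recent average backlog of pending tasks, clamped to a minimum and maximum. The sample values, `tasks_per_node`, and the default bounds are assumptions for illustration.

```python
import math


def target_nodes(pending_samples, tasks_per_node=1, min_nodes=0, max_nodes=25):
    """Toy autoscale rule: size the pool to the recent average of pending tasks."""
    if not pending_samples:
        # No metrics yet: stay at the floor.
        return min_nodes
    avg_pending = sum(pending_samples) / len(pending_samples)
    # Round up so that a partial node's worth of pending work still gets a node.
    needed = math.ceil(avg_pending / tasks_per_node)
    # Clamp to the configured range so cost stays bounded.
    return max(min_nodes, min(max_nodes, needed))


if __name__ == "__main__":
    print(target_nodes([10, 20, 30]))  # backlog averages 20 tasks
    print(target_nodes([0, 0]))        # idle pool shrinks to the floor
```

The upper bound caps spending, while a floor of zero lets an idle pool release all of its nodes.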
As an example, we can use 1,000 H16r (high-performance computing) Azure VM instances with RDMA, each with 16 Xeon E5-2667 v3 Haswell cores at 3.2 GHz (3.6 GHz with turbo), 112 GB of DDR4 RAM, and 2,000 GB of local storage, to run your business intelligence process once a year. That gives us 16,000 cores, 112 TB of RAM, and 2,000 TB of local storage. This computing power costs around 2,800 EUR per hour. Building the same capacity in your own DC would require an investment of at least hundreds of thousands of euros, just to use it once a year.
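The totals above follow from simple multiplication, which the snippet below makes explicit. The per-VM figures and the 2,800 EUR/h pool price are taken from the text; the run length passed to `run_cost_eur` is a hypothetical input, and real prices vary by region and over time.

```python
INSTANCES = 1000
CORES_PER_VM = 16
RAM_GB_PER_VM = 112
DISK_GB_PER_VM = 2000
POOL_COST_EUR_PER_HOUR = 2800  # figure quoted in the text


def pool_totals(instances=INSTANCES):
    """Aggregate capacity of the pool (decimal units: 1 TB = 1,000 GB)."""
    return {
        "cores": instances * CORES_PER_VM,
        "ram_tb": instances * RAM_GB_PER_VM / 1000,
        "disk_tb": instances * DISK_GB_PER_VM / 1000,
    }


def run_cost_eur(hours, runs_per_year=1):
    """Yearly cost when the pool exists only while the job runs."""
    return POOL_COST_EUR_PER_HOUR * hours * runs_per_year


if __name__ == "__main__":
    print(pool_totals())                         # cores, RAM (TB), disk (TB)
    print(POOL_COST_EUR_PER_HOUR / INSTANCES)    # EUR per VM-hour
    print(run_cost_eur(hours=8))                 # hypothetical 8-hour yearly run
```

With these figures, one VM costs about 2.80 EUR per hour, and a hypothetical 8-hour yearly run costs 22,400 EUR, because nothing is paid for while the pool does not exist.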
Processing parallel workloads with Azure Batch is typically done programmatically by using one of the Batch APIs. Your client application or service can use the Batch APIs to communicate with the Batch service. With the Batch APIs, you can create and manage pools of compute nodes, either virtual machines or Cloud Services. You can then schedule jobs and tasks to run on those nodes.