Tuesday, 28 February 2017

Microsoft Azure for Research in Data Science

I attended the one-day Microsoft Azure workshop organized by the Microsoft Azure Education team on the Iowa State University campus. I am noting down my takeaways from it here. Manaswi, who also attended the workshop, provided some input, too.

Microsoft Azure is an open, flexible, enterprise-grade cloud computing platform. (https://azure.microsoft.com/)

I found the product, i.e. Azure, to be more industry-oriented, somewhat useful in academia/teaching, and not so useful for research purposes.

It is a very powerful product and easy to use when it comes to cloud-based web applications (such as Office 365, IoT applications, etc.), which are mostly industrial requirements.

It provides cloud computing and virtual machine facilities, which in some ways are useful for academic/teaching purposes. But within a university that already has excellent computing infrastructure, I feel it makes less sense, as the computation costs in Azure are significant. (To give you a rough idea, a standard dual-core machine with 7 GB RAM and 100 GB storage costs about $110 a month when running 24x7.)

It also provides GPU computing machines with NVIDIA GPUs, but at costs as high as 10 times that (roughly $1 an hour for a 56 GB GPU machine). The benefit of using Azure services is that you are free from the pain of purchasing and setting up all the hardware. (http://gpu.azure.com/)
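
As a rough sanity check on the prices quoted above, here is a small Python sketch that puts the two figures on the same footing (per hour and per month). The 30-day month and the function names are my own assumptions, and the prices are only the rounded ones I noted at the workshop, so treat the result as an order-of-magnitude estimate, not an actual quote.

```python
# Rough cost comparison using the rounded prices from the workshop notes.
# Assumption: a 30-day month with the machine running 24x7.
HOURS_PER_MONTH = 24 * 30


def hourly_from_monthly(monthly_usd, hours=HOURS_PER_MONTH):
    """Convert a monthly price into an equivalent hourly rate."""
    return monthly_usd / hours


def monthly_from_hourly(hourly_usd, hours=HOURS_PER_MONTH):
    """Convert an hourly rate into the cost of running for a full month."""
    return hourly_usd * hours


cpu_monthly = 110.0  # dual-core VM, 7 GB RAM, 100 GB storage (rough figure)
gpu_hourly = 1.0     # GPU machine (rough figure)

cpu_hourly = hourly_from_monthly(cpu_monthly)
gpu_monthly = monthly_from_hourly(gpu_hourly)
ratio = gpu_monthly / cpu_monthly

print(f"CPU VM: ${cpu_hourly:.2f}/hour, ${cpu_monthly:.0f}/month")
print(f"GPU VM: ${gpu_hourly:.2f}/hour, ${gpu_monthly:.0f}/month")
print(f"GPU costs about {ratio:.1f}x the CPU VM under these assumptions")
```

For real numbers, the Azure pricing calculator is the place to look, since actual rates depend on the region and the machine size.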

To me, the Machine Learning lab available for Azure is to machine learning what WordPress is to websites. (https://studio.azureml.net/) It comes with a lot of preloaded datasets and algorithms and an easy-to-use drag-and-drop workflow. Though Azure gives us an option to execute custom Python code, I am not sure how useful it is when it comes to customizing algorithms or feeding in our own code. That is a direction to explore. Access to the Azure ML lab is free to some extent.

That said, I found a few things about Azure that can be directly applicable to research:

1) Database Management: it provides a variety of data storage and management services, which can be useful for storing and managing large datasets.

2) Data analytics: powerful data analysis tools are available.

3) PowerBI: it provides data visualization tools. You just upload your data, and it can generate a variety of visualizations in seconds that are easily publishable, real-time and interactive (https://powerbi.microsoft.com/en-us/). Also, PowerBI is free for ISU students. It can be useful for creating visualizations of experimental results and publishing them.

4) There is a Hadoop framework (Hadoop, MapReduce, Spark, etc.) on Azure. It saves us the trouble of setting up and maintaining a cluster.

5) There are tools to do HPC (High Performance Computing).

6) There are some cool APIs provided by Microsoft for Computer Vision and NLP (https://www.microsoft.com/cognitive-services/en-us/apis).

Tutorials about all of the above services can be found in the 'Content' folder at: https://github.com/MSRConnections/Azure-training-course

If we want to know how much each Azure service costs, we can use the pricing calculator. (https://azure.microsoft.com/en-us/pricing/calculator/)

Easan Selvan is the contact person for ISU for any questions. I can share his email address and contact details if anyone is interested.

Those who attended the workshop were given a free Azure pass that can be used for one month. I believe it shouldn't be difficult for an ISU student to get a free one-month pass by contacting the Azure team.

More info about Azure services can be found on their blog: https://azure.microsoft.com/en-us/blog/