Azure HDInsight, a Data Analytics Service

Azure HDInsight is Microsoft’s open-source analytics platform on Microsoft Azure. It was introduced in 2014 as an analytics service for enterprises that lets them process, analyze, and report on huge amounts of data quickly and cost-effectively. As a cloud distribution of Hadoop components, Azure HDInsight lets users take advantage of popular open-source frameworks such as Hadoop, Spark, Hive, LLAP, Kafka, Storm, and R. It offers specific cluster types, and clusters can be extended with additional utilities, components, and languages.


With these frameworks, users can cover a wide range of scenarios, such as ETL, machine learning, data warehousing, and IoT. The data can be both historical and real-time.


  • In batch processing (ETL), users extract structured and unstructured data, transform it into the required structured format, and load it into a data store. The data can later be used for data warehousing or data science.
  • Developers can use HDInsight to build applications that extract critical insights from business data and use them to predict future business trends.
  • In IoT, Azure HDInsight can process streams of data received in real time from a large number of different devices.
  • HDInsight can run interactive queries at massive scale over data in any format, both structured and unstructured.
  • HDInsight can help users extend their existing on-premises big data infrastructure to take advantage of the sophisticated cloud analytics capabilities of Azure.
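
The batch ETL scenario in the list above can be illustrated with a minimal, single-machine sketch. It is not HDInsight-specific: the sample data, the column names, and the helper functions (extract, transform, load, run_etl) are all assumptions made for illustration, and an in-memory SQLite database stands in for the data store. On a real cluster the same three steps would typically be expressed with a framework such as Spark or Hive.

```python
import csv
import io
import sqlite3

# Hypothetical raw input. On a real cluster this would be read from
# cluster storage rather than from an inline string.
RAW_CSV = """device_id,reading,unit
sensor-1,21.5,C
sensor-2,,C
sensor-3,70.7,F
"""


def extract(text):
    """Extract: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows without a reading and normalize values to Celsius."""
    cleaned = []
    for row in rows:
        if not row["reading"]:
            continue  # skip incomplete records
        value = float(row["reading"])
        if row["unit"] == "F":
            value = (value - 32.0) * 5.0 / 9.0
        cleaned.append((row["device_id"], round(value, 2)))
    return cleaned


def load(records, conn):
    """Load: write the cleaned records into a table (SQLite as a stand-in store)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings (device_id TEXT, celsius REAL)"
    )
    conn.executemany("INSERT INTO readings VALUES (?, ?)", records)
    conn.commit()


def run_etl(text, conn):
    """Run extract, transform, and load, then return the stored rows."""
    load(transform(extract(text)), conn)
    query = "SELECT device_id, celsius FROM readings ORDER BY device_id"
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    for device_id, celsius in run_etl(RAW_CSV, connection):
        print(device_id, celsius)
```

Keeping the three stages as separate functions mirrors how the steps are described above: each stage can be swapped for a distributed equivalent without touching the others.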


Java and Python are supported by default, but HDInsight clusters support many other programming languages as well. Developers can install additional languages through script actions, which run on the cluster nodes to install software or change configuration.


BI tools can be integrated with HDInsight through the Hive ODBC driver or the Power Query add-in to retrieve, analyze, and report on data.