Azure Data Lake, a Hyper-scale Big Data Repository

Microsoft’s Azure Data Lake Store, released in November 2016, was designed to support tools built for the Hadoop Distributed File System (HDFS). It also integrates with other Azure services. Data Lake Store is a hyper-scale repository for big data analytics workloads. It gives enterprises a place to store big data whose future use is not yet determined and to apply analytics to it later. Data can be shared for collaboration within an enterprise while remaining secured.


Data Lake Store can store any data in its native format, with no prior transformation required. There are no fixed limits on file or account size. It handles unstructured, semi-structured, and structured data. Users can work with the stored data through Azure PowerShell, the Azure Portal, and SDKs. Multiple copies of the data can be stored to provide redundancy.


Azure Data Lake Store works with the Hadoop ecosystem and is compatible with most of its components, for example Apache Hive, Apache Storm, Apache Spark, and Apache Tez. Through REST APIs compatible with WebHDFS, users can access Azure Data Lake Store from Hadoop, which is available with an HDInsight cluster. Data Lake Store also has its own new file system, called AzureDataLakeFilesystem.
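To make the WebHDFS compatibility more concrete, here is a minimal Python sketch of how a WebHDFS-style request URL for a Data Lake Store account might be assembled. It assumes the endpoint pattern https://<account>.azuredatalakestore.net/webhdfs/v1/<path> and an OAuth bearer token obtained elsewhere. The account name "examplestore", the path, and the helper names webhdfs_url and auth_headers are hypothetical and used only for illustration. The sketch builds the request but does not send it.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(account: str, path: str, op: str, **params: str) -> str:
    """Build a WebHDFS-style URL for a Data Lake Store account.

    Assumption: the account is reachable at
    https://<account>.azuredatalakestore.net/webhdfs/v1/<path>.
    """
    # Drop leading/trailing slashes and percent-encode the path segments.
    clean_path = quote(path.strip("/"))
    # WebHDFS selects the operation through the "op" query parameter.
    query = urlencode({"op": op, **params})
    return (
        f"https://{account}.azuredatalakestore.net"
        f"/webhdfs/v1/{clean_path}?{query}"
    )


def auth_headers(token: str) -> dict:
    """Return the header carrying a bearer token (token acquisition not shown)."""
    return {"Authorization": f"Bearer {token}"}


# Hypothetical account and folder: list the contents of /logs/2016.
url = webhdfs_url("examplestore", "/logs/2016", "LISTSTATUS")
print(url)
```

An HTTP client would then issue a GET against this URL with the headers from auth_headers. LISTSTATUS and OPEN are standard WebHDFS operation names, so the same helper covers directory listings and file reads.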


Data Lake Store enables analytics on the stored data and offers enterprise-grade capabilities such as reliability, scalability, manageability, and security, all of which matter for real-world enterprise use cases. Users can analyze the data they store in Data Lake Store with a Hadoop framework such as Hive or MapReduce.


Users can get started with Azure Data Lake Store by creating an account in the Azure Portal. Another option is to build a .NET application that creates a Data Lake Store account.