U-SQL, a Data Processing Language

U-SQL is the data processing language of the Azure Data Lake Analytics service. It lets users analyze data in Azure Storage Blobs, Azure Data Lake Store, and relational stores such as Azure SQL DB and Azure SQL Data Warehouse.

 

Microsoft announced this new query language in September 2015, when the company revealed that it was launching Azure Data Lake Store in preview.

 

U-SQL grew out of SCOPE, Microsoft's internal Big Data language. It combines features of SQL and C# and is designed to process data at any scale. Developers get a familiar SQL-like declarative language, the programmability and extensibility of C# expressions and C# types, and big data processing concepts such as custom reducers, custom processors, and "schema on read". Microsoft also aligned the U-SQL metadata system, the SQL syntax, and the semantics of the language with ANSI SQL and T-SQL.

 

U-SQL lets developers query and combine data from various sources, such as Azure SQL DB, Azure Blob Storage, Azure Data Lake Store, and Azure SQL Data Warehouse. The language makes it possible to process unstructured data by applying a schema on read and by inserting user-defined functions (UDFs) and other custom logic. In addition, its extensibility model gives developers control over how their code executes at scale.

 

U-SQL allows SQL keywords and C# expressions to be combined in a single script. With U-SQL, a programmer can impose a schema on data from an unstructured source, then apply SQL to aggregate the data into the desired form, and finally write the output to a table or a file.
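The three steps above (extract with a schema, transform with SQL and C#, write the output) can be sketched as a short U-SQL script. This is a minimal illustration, not a reference implementation: the input path, the output path, and the column names are assumptions made up for the example, and the input is assumed to be a tab-separated file with exactly these three columns.

```usql
// Step 1: apply a schema on read to a raw file.
// The path and columns below are hypothetical.
@searchlog =
    EXTRACT UserId   int,
            Region   string,
            Duration int
    FROM "/input/searchlog.tsv"
    USING Extractors.Tsv();

// Step 2a: use a C# expression (string.ToUpperInvariant) inside a SELECT.
@normalized =
    SELECT Region.ToUpperInvariant() AS Region,
           Duration
    FROM @searchlog;

// Step 2b: aggregate with ordinary SQL.
@totals =
    SELECT Region,
           SUM(Duration) AS TotalDuration
    FROM @normalized
    GROUP BY Region;

// Step 3: write the result to a file (hypothetical output path).
OUTPUT @totals
TO "/output/duration_by_region.csv"
USING Outputters.Csv();
```

Each statement assigns a rowset to a named variable, so later statements can build on earlier ones. Keywords are written in upper case, while expressions such as `Region.ToUpperInvariant()` follow C# syntax.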

 

With U-SQL, programmers can work on the data in multiple steps, with each step preparing the input for the next stage of a complex analysis. U-SQL lets data scientists and developers process many types of data using custom code and scale their scripts to large data volumes.