Sunday, January 18, 2009

M Data Transformation Part 1

Many blogs and articles on M focus on its modeling and DSL aspects, yet people keep asking about data transformation. So I'm going to spend some time on transformation. On a data-oriented platform, transformation is a key enabler.

Let's start with a couple of principles that M transforms live by:
- Functional. Functional programming is the right paradigm for writing transformations because functional code is compositional and side-effect free.
- Compositional. A corollary of being functional: building transforms on top of transforms is powerful and enables reuse. It also means that clients/consumers do the same thing regardless of whether they are consuming a graph or a transform over a graph.
- Consistent. Queries are expressions that produce new values. Constraints are also expressions. We wanted the query language to be consistent with the constraint language.
- Familiar. The syntax should be familiar to folks already writing transforms in T-SQL or in LINQ.
- Ease. A number of shorthand forms for queries make writing transforms even easier.
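To make the first two principles concrete, here is a rough sketch in Python rather than M (an analogy only, not how M is implemented). The names all_zips, zips_below, and consume are made up for illustration. It shows a transform built on another transform, with no side effects, and a consumer that behaves the same whether it is handed the raw collection or a transform over it.

```python
from typing import Iterable, List

# Made-up sample data standing in for a stored collection.
ZIPS: List[int] = [98052, 44114, 44115]


def all_zips() -> Iterable[int]:
    """Base transform: yields the stored values unchanged."""
    return iter(ZIPS)


def zips_below(limit: int) -> Iterable[int]:
    """Transform built on top of another transform; it never mutates ZIPS."""
    return (z for z in all_zips() if z < limit)


def consume(source: Iterable[int]) -> List[int]:
    """Consumer that treats raw data and transforms identically."""
    return sorted(source)


raw_result = consume(ZIPS)                    # consuming the collection itself
derived_result = consume(zips_below(50000))   # consuming a transform over it
```

The consumer never needs to know which one it got, which is the point of the compositional principle.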

OK - let's write some transforms. I'm going to use a very simple data model for my examples. Here's a model for Contacts (aka Outlook):

module Contacts
{
    export People, Addresses, Zips;

    People : 
    {
        Name : Text#128;
        Age : Integer32;
        MyAddresses : Addresses*;
    }* where identity(Name);
    
    Addresses : 
    {
        Id : Integer32 = AutoNumber();
        Street : Text;
        Country : Text;
        Zip : Zips;
    }* where identity(Id);

    Zips : Integer32* { 98052, 44114, 44115};
}

Here are a couple of projection queries, i.e., queries that select values from a collection. Notice that there is a long syntax (Q1) and a comprehension syntax (Q2) that uses value to refer to the current item.

module CollectionQueries
{
    import Contacts;
    
    Q1()
    {
        from z in Zips
        select z
    }
    
    Q2()
    {
        Zips select value
    }
}

These are exactly equivalent. Check out the generated SQL:

create view [Queries].[Q1]
(
  [Item]
)
as
  select [z].[Item] as [Item]
  from [Contacts].[Zips] as [z];
go

create view [Queries].[Q2]
(
  [Item]
)
as
  select [$value].[Item] as [Item]
  from [Contacts].[Zips] as [$value];
go
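If you want to convince yourself that the two views return the same rows, here is a small sketch that runs them in an in-memory SQLite database from Python. This is an adaptation, not the SQL Server output above: the schema prefixes, the view column lists, and the go separators are dropped, and the table is created by hand with the three values from the model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hand-built stand-in for the [Contacts].[Zips] table.
conn.execute("create table Zips (Item integer)")
conn.executemany(
    "insert into Zips (Item) values (?)",
    [(98052,), (44114,), (44115,)],
)

# Same select bodies as the generated views; only the table alias differs.
conn.execute("create view Q1 as select [z].[Item] as [Item] from Zips as [z]")
conn.execute(
    "create view Q2 as select [$value].[Item] as [Item] from Zips as [$value]"
)

rows_q1 = sorted(conn.execute("select Item from Q1").fetchall())
rows_q2 = sorted(conn.execute("select Item from Q2").fetchall())
conn.close()
```

The only difference between the two views is the alias name, so the row sets match.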

Now, let's write some projections using entity collections. 

module EntityQueries
{
    import Contacts;
    
    Q1()
    {
        from p in People
        select p.Age
    }
    
    Q2()
    {
        People select value.Age
    }

    Q3()
    {
        People.Age
    }
}

Again, we have a full query syntax version (Q1) and a comprehension form (Q2). There's also a third syntax, called a projector (Q3). It returns the same results, but is written as the field name applied directly to the collection.
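For readers who think in other languages, here is a loose Python analogy for the three forms (not M, and not what the M compiler generates). The people rows are made-up sample data. Each line projects the Age field, once with a named iteration variable, once with a parameter named value, and once by naming only the field.

```python
from operator import itemgetter

# Made-up sample rows; the field names follow the People entity above.
people = [
    {"Name": "Ann", "Age": 34},
    {"Name": "Bob", "Age": 51},
]

# Roughly like: from p in People select p.Age
q1 = [p["Age"] for p in people]

# Roughly like: People select value.Age
q2 = list(map(lambda value: value["Age"], people))

# Roughly like: People.Age (only the field name is spelled out)
q3 = list(map(itemgetter("Age"), people))
```

All three produce the same list of ages; the difference is only in how much of the iteration you have to spell out.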

Those are the very basics of projection. Stay tuned for posts on more complex projections, plus selection, join, and other interesting query language features.


PS. If you want to see lots of examples, check out the set of M sample queries in the SDK. We wrote all of the LINQ samples in M so you can compare.

Enjoy!