Wednesday, February 11, 2009

Friday, January 30, 2009

Oslo CTP 2

At PDC I told folks that we would try to release a new version of Oslo every 3-6 months.

Well, it's been 3 months and here it is!

Sells should be blogging more about this on Monday.

Sunday, January 18, 2009

I'm boring

My buddy says that my blog is boring.

Yo - Jon - why you gotta dis' me like that? At least my friends look better than yours.

M Data Transformation Part 1

Lots of blogs and content on M spend a bunch of time focused on the modeling and DSL aspects of M. And lots of folks ask about data transformation. So, I'm going to spend some time on transformation. Clearly, if you're working on a data-oriented platform, transformation is a key enabler.

Let's start with a couple of principles that M transforms live by:
- Functional. Functional programming is the right paradigm for writing transformations because it is compositional and side-effect free.
- Compositional. A corollary of functional: building transforms on top of transforms is powerful and enables reuse. It also means that clients/consumers do the same thing regardless of whether they are consuming a graph or a transform over a graph.
- Consistent. Queries are expressions that produce new values. Constraints are also expressions. We wanted the query language to be consistent with the constraint language.
- Familiar. The syntax should be familiar to folks already writing transforms in T-SQL or in LINQ.
- Ease. There are a number of shorthand forms for queries that make writing transforms even easier.

OK - let's write some transforms. I'm going to use a very simple data model for my examples. Here's a model for Contacts (aka Outlook):

module Contacts
{
    export People, Addresses, Zips;

    People : 
    {
        Name : Text#128;
        Age : Integer32;
        MyAddresses : Addresses*;
    }* where identity(Name);
    
    Addresses : 
    {
        Id : Integer32 = AutoNumber();
        Street : Text;
        Country : Text;
        Zip : Zips;
    }* where identity(Id);

    Zips : Integer32* { 98052, 44114, 44115};
}

Here are a couple of queries for projections, i.e., selecting values from a collection. Notice that there is a long syntax and a comprehension syntax that uses the value keyword.

module CollectionQueries
{
    import Contacts;
    
    Q1()
    {
        from z in Zips
        select z
    }
    
    Q2()
    {
        Zips select value
    }
}

These are exactly equivalent. Check out the generated SQL:

create view [Queries].[Q1]
(
  [Item]
)
as
  select [z].[Item] as [Item]
  from [Contacts].[Zips] as [z];
go

create view [Queries].[Q2]
(
  [Item]
)
as
  select [$value].[Item] as [Item]
  from [Contacts].[Zips] as [$value];
go

Now, let's write some projections using entity collections. 

module EntityQueries
{
    import Contacts;
    
    Q1()
    {
        from p in People
        select p.Age
    }
    
    Q2()
    {
        People select value.Age
    }
    
    
    Q3()
    {
        People.Age
    }
        
}

Again, we have a full query syntax version and a comprehension form. There's also a third syntax called a projector. It returns the same results, but is written more like a function of the field name.

That's some very basics around projection. Stay tuned for posts on more complex projections, plus selection, join, and other interesting query language features.


PS: If you want to see lots of examples, check out the set of M sample queries in the SDK. We wrote all of the LINQ samples in M so you can compare.

Enjoy!


Sunday, January 11, 2009

Metadata or data

I get quite irritated (sorry - no patience) when I hear others talk of a very distinct difference between metadata and transactional data. I really don't agree.

The arguments generally go something like this: "Metadata is mostly read-only. Transactional data is written much more frequently. Metadata has different access patterns." That last one -- I don't even know what it means :).

I find that to be hogwash. That describes usage, not kinds of data. I don't like categorizing data; like nominal typing, it limits the data's broader viability and usability after the fact. Any data at any given time can be more like metadata or more like transactional data. For example:

- To an engineer, a bill of materials is transactional data when designing a product. But to a resource planner, the bill of materials is metadata that drives materials planning, purchasing, and manufacturing scheduling. 
- A web page is transactional data during development, but metadata at runtime (unless it is self-modifying code) 

So, it just depends on the usage. Don't categorize the data, just understand the usage.

As for Oslo, I assert that we are building a broad set of capabilities to describe, validate, transform, access, and store data. Sure, Oslo's primary scenarios and our investments right now are targeted at data that describes runtimes. However, our ambitions are bigger, and our architecture and designs are not limited or myopic. If they are - please help us. After all, data is just data.

Tuesday, January 6, 2009

MGrammar + MSchema example

Justin asked a question about what it means to bring MSchema and MGrammar together. Let me do a simple example to clarify.

Today I can write my semantic model in MSchema as such:

module Contacts {
    type Person { Name : Text; Age : Integer32; }
    People : Person*;
}

I can then write a grammar like this to generate those values:

module Contacts {
    language PeopleLang {
        syntax Main = p:Person* => People { valuesof(p) };
        syntax Person = n:Name a:Age => { Name = n, Age = a };
        // elided details
    }
}

There are a bunch of things that come to mind when comparing the MGraph produced by the DSL with the values expected by the semantic model, such as:
Type Checking
How do we statically (or dynamically) check that the values are typed correctly? For example, I want to do something like this:

syntax Person = n:Name a:Age => { Name = n : Text, Age = a : Integer32} : Contacts.Person ;

Expression support
I can build values up in MSchema using expressions and queries. Shouldn't I be able to do that on the RHS of grammar productions? For example:

syntax Name = f:FirstName l:LastName => f + " " + l;

I also want to use functions and other things in the semantic model to both construct values as well as validate structural correctness.
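As a sketch of what that could look like, here's a hypothetical computed value declared in the MSchema style used above (FullName is made up, not part of the sample model):

module Contacts
{
    // Hypothetical computed value; a grammar's RHS could call
    // Contacts.FullName(f, l) instead of inlining the concatenation.
    FullName(f : Text, l : Text)
    {
        f + " " + l
    }
}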

Constraint support
Semantic models have constraints. Shouldn't values produced by MGrammar be validated against them? This is really a superset of the type-checking question, since typing is just a form of constraint checking on the structure of a value.
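For example, adding a hypothetical constraint to the semantic model (a sketch in the MSchema style above) would give MGrammar-produced values something to be validated against:

module Contacts
{
    // Hypothetical constraint: a value parsed by PeopleLang with a
    // non-positive Age should fail validation against Person.
    type Person { Name : Text; Age : Integer32 where value > 0; }
    People : Person*;
}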


So, when I talked about MSchema and MGrammar integration in previous posts, I was hinting at the idea of bringing together the semantic model with the DSL declarations to ensure that the DSL output aligns with the model.

I hope that helps.