Friday, January 30, 2009

Oslo CTP 2

At PDC I told folks that we would try to release a new version of Oslo every 3-6 months.

Well, it's been 3 months and here it is!

Sells should be blogging more about this on Monday.

Sunday, January 18, 2009

I'm boring

My buddy says that my blog is boring.

Yo - Jon - why you gotta dis' me like that? At least my friends look better than yours.

M Data Transformation Part 1

Lots of blogs and content on M spend a bunch of time focused on the modeling and DSL aspects of M. And lots of folks ask about data transformation. So, I'm going to spend some time on transformation. Clearly, if you're working on a data-oriented platform, transformation is a key enabler.

Let's start with a couple of principles that M transforms live by:
- Functional. Functional programming is the right paradigm for writing transformations because transforms are compositional and side-effect free.
- Compositional. A corollary of functional: building transforms on top of transforms is powerful and enables reuse. It also means that clients/consumers do the same thing regardless of whether they are consuming a graph or a transform over a graph (see the sketch after this list).
- Consistent. Queries are expressions that produce new values. Constraints are also expressions. We wanted the query language to be consistent with the constraint language.
- Familiar. The syntax should be familiar to folks already writing transforms in T-SQL or in LINQ.
- Ease. There are a number of shorthand forms for queries that make writing transforms even easier.
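
To make the compositional point concrete, here's a minimal sketch. The module and query names are hypothetical, it assumes the Contacts model introduced below, and it assumes a LINQ-style where clause and that one computed query can be referenced from another just like a stored extent:

module CompositionSketch
{
    import Contacts;

    // a transform over the People extent
    Adults()
    {
        from p in People
        where p.Age >= 18
        select p
    }

    // a transform written over the transform above; the consumer writes the
    // same kind of query whether the source is a graph or a transform over a graph
    AdultNames()
    {
        from a in Adults()
        select a.Name
    }
}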

OK - let's write some transforms. I'm going to use a very simple data model for my examples. Here's a model for Contacts (aka Outlook):

module Contacts
{
    export People, Addresses, Zips;

    People : 
    {
        Name : Text#128;
        Age : Integer32;
        MyAddresses : Addresses*;
    }* where identity(Name);
    
    Addresses : 
    {
        Id : Integer32 = AutoNumber();
        Street : Text;
        Country : Text;
        Zip : Zips;
    }* where identity(Id);

    Zips : Integer32* { 98052, 44114, 44115};
}

Here are a couple of queries for projections, i.e., selecting values from a collection. Notice that there is a long syntax and a comprehension syntax that uses value.

module CollectionQueries
{
    import Contacts;
    
    Q1()
    {
        from z in Zips
        select z
    }
    
    Q2()
    {
        Zips select value
    }
}

These are exactly equivalent. Check out the generated SQL:

create view [Queries].[Q1]
(
  [Item]
)
as
  select [z].[Item] as [Item]
  from [Contacts].[Zips] as [z];
go

create view [Queries].[Q2]
(
  [Item]
)
as
  select [$value].[Item] as [Item]
  from [Contacts].[Zips] as [$value];
go

Now, let's write some projections using entity collections. 

module EntityQueries
{
    import Contacts;
    
    Q1()
    {
        from p in People
        select p.Age
    }
    
    Q2()
    {
        People select value.Age
    }
    
    
    Q3()
    {
        People.Age
    }
        
}

Again, we have a full query syntax version and a comprehension form. There's also a third syntax called a projector. It returns the same results, but is written more like a function of the field name.

Those are the very basics of projection. Stay tuned for posts on more complex projections, plus selection, join, and other interesting query language features.


PS: If you want to see lots of examples, check out the set of M sample queries in the SDK. We wrote all of the LINQ samples in M so you can compare.

Enjoy!

Why Oslo remix

This post is awesome!

Sunday, January 11, 2009

Metadata or data

I get quite irritated (sorry - no patience) when I hear others talk of a very distinct difference between metadata and transactional data. I really don't agree.

The arguments generally go something like this: "Metadata is mostly read-only. Transactional data is written much more frequently. Metadata has different access patterns." I don't even know what that last one means :).

I find that to be hogwash. That describes usage, not kinds of data. I do not like categorizing data. It's like nominal typing: it limits the data's broader viability and usability after the fact. Any data at any given time can be more like metadata or more like transactional data. For example:

- To an engineer, a bill of materials is transactional data when designing a product. But to a resource planner, the bill of materials is metadata that drives materials planning, purchasing, and manufacturing scheduling.
- A web page is transactional data during development, but metadata at runtime (unless it is self-modifying code).

So, it just depends on the usage. Don't categorize the data, just understand the usage.

As for Oslo, I assert that we are building a broad set of capabilities to describe, validate, transform, access, and store data. Sure, Oslo's primary scenarios and our investments right now are targeted at data that describes runtimes. However, our ambitions are bigger, and our architecture and designs are not limited or myopic. If they are - please help us. After all, data is just data.

Tuesday, January 6, 2009

MGrammar + MSchema example

Justin asked a question about what it means to bring MSchema and MGrammar together. Let me do a simple example to clarify.

Today I can write my semantic model in MSchema like this:

module Contacts
{
    type Person { Name : Text; Age : Integer32; }
    People : Person*;
}

I can then write a grammar like this to generate values for that model:

module Contacts
{
    language PeopleLang
    {
        syntax Main = p:Person* => People { valuesof(p) };
        syntax Person = n:Name a:Age => { Name = n, Age = a };
        // elided details
    }
}
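
To make that concrete, here's a hypothetical input for PeopleLang and the MGraph value it might produce. The Name and Age token rules are among the elided details, so this is a sketch of the expected shape, not actual tool output:

// hypothetical input text:
//   Don 42
//   Gudge 21
//
// MGraph value the Main rule would build from it, shaped like the People extent:
People
{
    { Name = "Don", Age = 42 },
    { Name = "Gudge", Age = 21 }
}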

There are a bunch of things that come to mind when comparing the MGraph produced by the DSL with the values expected by the semantic model, such as:
Type Checking
How do we statically (or dynamically) check that the values are typed correctly? For example, I want to do something like this:

syntax Person = n:Name a:Age => { Name = n : Text, Age = a : Integer32} : Contacts.Person ;

Expression support
I can build values up in MSchema using expressions and queries. Shouldn't I be able to do that on the RHS of grammars? For example:

syntax Name = f:FirstName l:LastName=> f + " " + l;

I also want to use functions and other things in the semantic model to both construct values as well as validate structural correctness.

Constraint support
Semantic models have constraints. Shouldn't values produced by MGrammar be validated against those? This is really a superset of the type-checking question, since typing is just a form of constraint checking over the structure of a value.
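
As a sketch of the kind of constraint I mean, here's a hypothetical tightening of the Person type from above; ideally, any value produced by PeopleLang would be checked against it:

module Contacts
{
    type Person
    {
        Name : Text;
        // hypothetical constraint; values built by the grammar should satisfy it too
        Age : Integer32 where value >= 0;
    }
    People : Person*;
}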


So, when I talked about MSchema and MGrammar integration in previous posts, I was hinting at the idea of bringing together the semantic model with the DSL declarations to ensure that the DSL output aligns with the model.

I hope that helps.

MGraph as a data rep

Today we have lots of XML on the wire.

There's also lots of JSON.

I want to see MGraph get there as well. So, over the holidays I wrote GraphReader and GraphWriter. Think of these in the same equivalence class as TextReader/Writer or XmlReader/Writer.

Now - you may ask yourself - what's the difference between this and what the lexer/parser does in the M toolchain to process MGraphs? Conceptually, there's no difference. But, if you really want to use MGraph on the wire, then you need something that is highly tuned, capable of streaming, and supports wire representations other than text.

The way to think about this is serialization format versus program text. We want both with the same textual syntax, but the usage scenarios are different.
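
For a sense of what such a wire payload might look like in the textual representation, here's a hypothetical contact serialized as an MGraph value (the names and values are invented for illustration). Whether the nested braces get interpreted as records and sequences is up to the reader, not the format:

{
    Name = "Gudge",
    Age = 21,
    MyAddresses =
    {
        { Street = "One Microsoft Way", Country = "USA", Zip = 98052 }
    }
}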

So, I started off building the Writer. It was easy, except that I went down one path first and then had to simplify. I began by baking our interpretation of Record and Sequence into the serializer. It was something like this:

class GraphWriter
{
    void WriteStartEntity();
    void WriteStartSequence();
    void WriteStartNode();
    // ends and other basic write statements elided
}

But then, with the help of my great colleagues, I realized that I do not want this, because it bakes the interpretation into the serialized format. I actually want the interpretation to be done by the reader. It also makes the reader much harder to write.

So, now it's just basically WriteStartNode :) It's that simple.

I also built a serializer on top of the writer that takes an IGraphBuilder and a graph node and makes the appropriate calls to serialize.

I then started on the Reader as the inverse/deserialization process. I haven't finished this yet because my hard drive is failing. I'll provide more details later.

I have no idea if we will productize this kind of thing, but I hope we do in V1.

I miss chrome

My hard drive is crashing. It didn't totally die, but let's just say that at random points while starting up Win7, things get interesting. And chkdsk hangs forever.

Anyways - Gudge was kind enough to lend me a machine while I get it fixed.

It doesn't have Chrome. I miss it.

Saturday, January 3, 2009

Intro to Formal Type Systems

Cardelli has a great introductory paper on the formalism of type systems and type theory. The math guy in me loves the theory.


Friday, January 2, 2009

M == Semantic Model + DSL + values

I want to make sure I'm clear wrt my post from the other day about Fowler's work. Here's a simple formula to translate M concepts into Fowler speak:

MGrammar == DSL
MSchema == semantic model
MGraph == values of DSL/semantic model
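
Or, lining the three up in a tiny, hypothetical example:

// MSchema (semantic model):           type Person { Name : Text; }
// MGrammar (DSL):                     syntax Person = n:Name => { Name = n };
// MGraph (value of the DSL/model):    { Name = "Don" }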