Thursday, November 6, 2008

API for MGraph

In M we talk about 3 pieces of technology: MGraph is the data model, MSchema is the type system, and MGrammar is the transformation engine.

MGraph is at the core of all of these. MGraph is a labeled directed graph as defined in the System.Dataflow assembly. Interesting enough, rather than force you to implement a class or interface, we designed MGraph to follow a visitor pattern, like a simplified XML navigator. So, you keep or build whatever data structures you have, and then implement IGraphBuilder to expose those values as an MGraph.

Here is IGraphBuilder:

public interface IGraphBuilder

{

IEqualityComparer<object> NodeComparer { get; }

object DefineNode(object label);

void DefineSuccessors(object protoNode, IEnumerable<object> successors);

object GetLabel(object node);

IEnumerable<object> GetSuccessors(object node);

bool IsNode(object value);

}

I'm only going to talk about the navigation part of IGraphBuilder, not the construction protocol.

A node in the graph is identified by your implementation of GraphBuilder by calling IsNode. If this returns true, then the GetLabel and GetSuccessors methods must accept the same object and produce the label and successors, respectively. The Label is optional.

I recently experimented with building an MGraph over an existing SQL database. I defined a SQLGraphBuilder to take a connection and the name of a SQL object, i.e., table or view.

The SQLGraphBuilder queries the database using that name. I constructs a node for the SQL object with the label being the name of the sql object, and each successor being a row in that object. In turn, each row has a label of the primary key (concatenated if there are multiple) followed by each successor, which is itself a node that has the field name as the label and 1 successor being the value.

Here's the base definition for a node in the graph:

abstract class Node : IEquatable<Node>

{

protected string label;

protected SqlDataReader reader;

public Node(string label, SqlDataReader reader)

{

this.label = label;

this.reader = reader;

}

public string Label { get { return label; } }

public SqlDataReader Reader { get { return reader; } }

public abstract IEnumerable<object> GetSuccessors();

public abstract bool Equals(Node other);

}


So then I have a subtype of this for each kind (SqlObject, Row, and Cell). Here's an example:


class SqlObjectNode : Node

{

public SqlObjectNode(string sqlObjectName, SqlDataReader reader) : base(sqlObjectName, reader)

{

}

public override IEnumerable

{

return new Enumerable(new SqlObjectEnumerator(label, reader));

}

public override bool Equals(Node other)

{

var sqlNode = other as SqlObjectNode;

return sqlNode != null

&& sqlNode.label == this.label;

}

}


This is all quite easy except for the definition of identity via the node comparer. You need to make sure that you think through the identity story so that you can easily identity equivalent graph nodes. In my prototype, I use a concatenation of sqlobject + row id + column name. See the Equals method above as an example.

Now here's the tricky part. Some columns are actually foreign keys. So, I maintain a general foreign key lookup table, and when I hit a column name that is actually a foreign key, in stead of returning a value for that cell, I return a new row representing the values from the related table.

Here's the query to find foreign keys:

const string ForeignKeyQuery =

@"select ssource.name as SourceSchemaName,

tsource.name as SourceTableName,

csource.name as SourceColumnName,

fk.name as ForeignKeyName,

starget.name as TargetSchemaName,

ttarget.name as TargetTableName,

ctarget.name as TargetColumnName

from sys.foreign_keys fk

inner join sys.foreign_key_columns fkc on fk.object_id = fkc.constraint_object_id

inner join sys.tables tsource on fkc.parent_object_id = tsource.object_id

inner join sys.tables ttarget on fkc.referenced_object_id = ttarget.object_id

inner join sys.columns csource on (fkc.parent_column_id = csource.column_id and csource.object_id = tsource.object_id)

inner join sys.columns ctarget on (fkc.referenced_column_id = ctarget.column_id and ctarget.object_id = ttarget.object_id)

inner join sys.schemas ssource on ssource.schema_id = tsource.schema_id

inner join sys.schemas starget on starget.schema_id = ttarget.schema_id

and fk.type = 'F'

order by SourceSchemaName, SourceTableName, ForeignKeyName";

And here's the code to construct the row in the reference table. Notice I'm still working on it :)

StringBuilder query = new StringBuilder(@"select * from " + fk.TargetSqlObjectName + " where ");

int i = 0;

foreach (var targetCol in fk.TargetColumnNames)

{

if (i > 0) { query.Append(" and "); }

query.Append(targetCol + " = @var" + i);

i++;

}

i = 0;

//TODO: fix this hack to support non MARs connections; need to dispose connection + reader

var newConnection = new SqlConnection(connection.ConnectionString);

newConnection.Open();

using (var command = new SqlCommand(query.ToString(), newConnection))

{

foreach (var sourceCol in fk.SourceColumnNames)

{

object foo = reader[sourceCol];

command.Parameters.AddWithValue("@var" + i, foo);

i++;

}

var newReader = command.ExecuteReader();

return new ForeignKeyCellNode(fk.TargetSqlObjectName, rowName, fieldName, newReader);

}

GraphBuilder is fairly straightforward to use. It and Mgraphs are at the base of the modeling platform, so learning it now will only help as we introduce more tools and APIs to consume them.

1 comment:

Anonymous said...

Thanks for sharing this!
I'm also interested in the
"contruction protocol" you mentioned. Could also you tell us how/where to interact with a custom GraphBuilder? Where does it fit in?